Cartesian product in Python

I recently worked on a side project that used a pandas DataFrame with 2-level MultiIndex columns. The column labels could be represented as a list of tuples, for example:

[("calendar", "year"), ("calendar", "month"), ("calendar", "day")]

A DataFrame with such an index would look like this, with calendar on the upper level and year, month and day on the lower level:

      calendar          
      year month day
0     2019     2   1
1     2019     3  10

To create such a MultiIndex, I needed to generate the Cartesian product of calendar and (year, month, day).
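As a quick reminder of the definition: the Cartesian product of two sets A and B is the set of all ordered pairs whose first element comes from A and whose second element comes from B. Writing the two levels as sets purely for illustration (the column order itself comes from the order of the inputs, not from the sets):

```latex
A \times B = \{(a, b) \mid a \in A,\ b \in B\}

\{\text{calendar}\} \times \{\text{year}, \text{month}, \text{day}\}
  = \{(\text{calendar}, \text{year}),\ (\text{calendar}, \text{month}),\ (\text{calendar}, \text{day})\}
```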

The pandas MultiIndex class provides constructor methods for this, for example pd.MultiIndex.from_product:

import pandas as pd

multi_index = pd.MultiIndex.from_product([("calendar",), ("year", "month", "day")])

But I also wondered whether the same could be achieved with just Python's standard library (and without nested for loops).
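For reference, here is a minimal sketch of the nested-loop baseline the post wants to avoid. The variable names upper_level, lower_level and columns are hypothetical, chosen only for illustration.

```python
# Hypothetical names for the two levels of the column index.
upper_level = ("calendar",)
lower_level = ("year", "month", "day")

# Plain nested loops: pair every upper-level label with every lower-level label.
columns = []
for upper in upper_level:
    for lower in lower_level:
        columns.append((upper, lower))

print(columns)
# [('calendar', 'year'), ('calendar', 'month'), ('calendar', 'day')]
```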

zip & itertools.zip_longest

My initial hunch was to check whether the built-in zip function has a parameter that would make it fit my use case. However, zip stops at the shortest iterable, so in my case

list(zip(("calendar",), ("year", "month", "day")))

would return:

[('calendar', 'year')]

Google suggested taking a look at itertools.zip_longest. Quoting the official documentation's definition:

Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.

Let's compare the results using the default and a custom fillvalue.

Without providing the fillvalue argument:

from itertools import zip_longest

list(zip_longest(("calendar",), ("year", "month", "day")))

gives

[('calendar', 'year'), (None, 'month'), (None, 'day')]

With fillvalue = "calendar":

from itertools import zip_longest

list(zip_longest(("calendar",), ("year", "month", "day"), fillvalue="calendar"))

gives

[('calendar', 'year'), ('calendar', 'month'), ('calendar', 'day')]

As you can see, zip_longest with fillvalue gives us the desired result, but only because the upper level has a single label that we repeat by hand via fillvalue. Is there a better way?
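The zip_longest trick does not generalize: fillvalue only pads the shorter iterable, it does not pair every element with every other element. A quick check with a second upper-level label makes this visible (the "time" label is hypothetical, added only to show the difference):

```python
from itertools import zip_longest

upper_level = ("calendar", "time")  # "time" is a hypothetical second label
lower_level = ("year", "month", "day")

# zip_longest pairs elements position by position and pads the shorter
# iterable with fillvalue once it runs out.
padded = list(zip_longest(upper_level, lower_level, fillvalue="calendar"))
print(padded)
# [('calendar', 'year'), ('time', 'month'), ('calendar', 'day')]
```

A real Cartesian product of these inputs would contain 2 × 3 = 6 tuples, so this approach only works when the upper level has exactly one label.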

itertools.product

There is! And it's called itertools.product.

The official documentation describes itertools.product simply as: "Cartesian product of input iterables."

from itertools import product

list(product(("calendar",), ("year", "month", "day")))

gives

[('calendar', 'year'), ('calendar', 'month'), ('calendar', 'day')]

Exactly what we needed!
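Unlike the zip_longest trick, product keeps working when the upper level has more than one label, and it yields tuples in the same order as the nested loops (the rightmost iterable advances fastest). A minimal sketch, again with a hypothetical second label "time":

```python
from itertools import product

upper_level = ("calendar", "time")  # "time" is a hypothetical second label
lower_level = ("year", "month", "day")

columns = list(product(upper_level, lower_level))
print(columns)
# [('calendar', 'year'), ('calendar', 'month'), ('calendar', 'day'),
#  ('time', 'year'), ('time', 'month'), ('time', 'day')]

# Same result and same order as the equivalent nested loops.
nested = [(upper, lower) for upper in upper_level for lower in lower_level]
assert columns == nested
```

Since product accepts any number of iterables, a 3-level index would only need a third argument.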

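Side note: itertools.product does not discard duplicates, so repeated elements in an input produce repeated tuples in the output. If you want to remove them, you can pass the result to set, but keep in mind that a set does not preserve the column order.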
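If the order of the columns matters, one order-preserving alternative to set is dict.fromkeys, since dict keys are unique and keep insertion order (guaranteed since Python 3.7). A small sketch, with a deliberately duplicated "month" label made up for illustration:

```python
from itertools import product

upper_level = ("calendar",)
lower_level = ("year", "month", "month", "day")  # "month" duplicated on purpose

# product keeps the duplicate: ('calendar', 'month') appears twice.
with_duplicates = list(product(upper_level, lower_level))

# Tuples are hashable, so they can be dict keys; dict.fromkeys drops
# repeats while keeping the first-seen order.
unique = list(dict.fromkeys(product(upper_level, lower_level)))
print(unique)
# [('calendar', 'year'), ('calendar', 'month'), ('calendar', 'day')]
```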

That's it for this post. Hope you learnt something new today.

Best Regards,
Kuba
