Fri Nov 29, 2013
itertools is pretty much the coolest thing ever. Despite a vaguely technical name and a decreased emphasis in most introductory Python materials, it's the kind of built-in module that makes code full of list comprehensions much less of a syntactic mess.
The biggest barrier to using itertools is that there are, well, a lot of methods that tend to all do similar things. With that in mind, this post is a showcase of some of the more basic (yet completely rad) things you can do with these methods.
Setup and a Disclaimer
First, let’s get the boring part out of the way:
import itertools
letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]
numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]
Well, that was easy.
chain() does exactly what you'd expect it to do: give it any number of lists/tuples/iterables as arguments and it chains them together for you. Remember making links of paper with tape as a kid? This is that, but in Python.
Let’s try it out!
print itertools.chain(letters, booleans, decimals)
>>> <itertools.chain object at 0x2c7ff0>
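Oh god, what happened?
The "iter" in itertools is short for iterator, which is hopefully a term you've run into before: chain() doesn't build a new list, it returns a lazy iterator object that produces elements only when asked. Printing one isn't exactly the hardest thing in the world, since you just need to convert it to a list: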
print list(itertools.chain(letters, booleans, decimals))
>>> ['a', 'b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1, 0.1, 0.7, 0.4, 0.4, 0.5]
Yay, much better!
chain() also works, as you’d imagine, with lists/iterables of varying lengths:
print list(itertools.chain(letters, letters[3:]))
>>> ['a', 'b', 'c', 'd', 'e', 'f', 'd', 'e', 'f']
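Two things about chain() tend to trip people up: the object it returns is a one-shot iterator, so once you've drained it, it's empty; and handing it a single list of lists won't flatten that list (that's what the standard chain.from_iterable() is for). Here's a small sketch, written in Python 3 syntax (print() is a function there) and reusing the setup lists:

```python
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]

# chain() returns a lazy iterator; nothing is copied up front.
chained = itertools.chain(letters, booleans)
first = next(chained)      # pulls a single element
rest = list(chained)       # drains everything that is left
leftover = list(chained)   # the iterator is now exhausted, so this is empty

# chain.from_iterable() takes ONE iterable whose items are themselves iterables,
# which is handy when you already have a list of lists.
nested = [letters[:2], booleans[:2], ['x']]
flat = list(itertools.chain.from_iterable(nested))

print(first)     # a
print(rest)      # ['b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1]
print(leftover)  # []
print(flat)      # ['a', 'b', 1, 0, 'x']
```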
(For the purposes of making this a readable post I'll be surrounding most of the methods with print list().)
Let's say you're trying to do a sensitivity analysis of a super important business simulation. Your entire super important business simulation hinges on the hope that the average cost of a widget is $10, but demand for that widget might explode over the next few months, and you want to make sure you won't hemorrhage money if it costs more. So you want a list of theoretical widget costs to pass to
With list comprehensions, that might look something like:
[(i * 0.25) + 10 for i in range(100)]
>>> [10.0, 10.25, 10.5, 10.75, ...]
Which isn't bad at all! Except that reading it is difficult, especially if you're nesting that list comprehension inside another list comprehension.
With itertools it looks like:
itertools.count(10, 0.25)
Whee! Now, if you’re a smart little Pythonista you might be thinking to yourself:
Well, I pass the function a starting point and a step size, but how does it know when to stop?
And the answer is it never stops.
count() and many other itertools methods generate values infinitely, until aborted (via, say, break). No, really: again, itertools is all about iterators, and infinite iterators might be scary right now, but they are incredibly helpful down the road.
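Infinite is fine as long as you only ask for what you need. This is a minimal Python 3 sketch: it pulls values from count() one at a time with next(), then uses islice() (a standard itertools function that isn't one of the five covered here) to take a finite prefix. It checks that the prefix matches the list comprehension from the widget example:

```python
import itertools

# count(start, step) yields start, start + step, start + 2*step, ... forever.
costs = itertools.count(10, 0.25)
first_three = [next(costs) for _ in range(3)]  # pull values one at a time

# islice() takes a finite prefix of an infinite iterator without looping forever.
first_hundred = list(itertools.islice(itertools.count(10, 0.25), 100))

# Same values as [(i * 0.25) + 10 for i in range(100)].
comprehension = [(i * 0.25) + 10 for i in range(100)]

print(first_three == [10.0, 10.25, 10.5])  # True
print(first_hundred == comprehension)      # True
```

Because 0.25 is exactly representable in binary floating point, the repeated additions inside count() land on exactly the same values as the multiplications in the comprehension.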
So let's say we only want the values of the above method up until $20 (this widget has very elastic demand, apparently). How do we cut off count() like a stern mother scolding a sugar-addled child?
ifilter() is a simple invocation of a simple use case:
print list(itertools.ifilter(lambda x: x % 2, numbers))
>>> [23, 7]
Simple, right? You pass in a function and an iterable: it returns an iterator over those elements that, when passed into the function, return a true value.
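If you're on Python 3, ifilter() no longer exists, because the built-in filter() is already lazy there. Its counterpart ifilterfalse() was renamed filterfalse(). Here's a minimal Python 3 sketch of the same odd-number filter, its complement, and the None shortcut that keeps only truthy elements:

```python
import itertools

numbers = [23, 20, 44, 32, 7, 12]
booleans = [1, 0, 1, 0, 0, 1]

# Built-in filter() in Python 3 behaves like Python 2's itertools.ifilter().
odds = list(filter(lambda x: x % 2, numbers))

# filterfalse() (ifilterfalse() in Python 2) keeps the elements the function rejects.
evens = list(itertools.filterfalse(lambda x: x % 2, numbers))

# Passing None instead of a function keeps only the truthy elements.
truthy = list(filter(None, booleans))

print(odds)    # [23, 7]
print(evens)   # [20, 44, 32, 12]
print(truthy)  # [1, 1, 1]
```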
So, to solve our little widget problem from earlier:
print list(itertools.ifilter(lambda x: x < 20, itertools.count(10, 0.25)))
>>> ...
>>> ...
Yeah, this is still going to keep on going infinitely, because count() will keep giving you values, and even though ifilter() throws away every one of them past $20, it still has to pull and test each one, forever.
So how do we do this? A common pattern is this:
for i in itertools.count(10, 0.25):
    if i < 20:
        do_something()  # placeholder for whatever you do with each cost
    else:
        break
(Look how readable that is. Isn’t that wonderful?)
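itertools also has a tool aimed squarely at this problem: takewhile(), which isn't one of the five methods covered here. Unlike a filter, it stops for good the first time the condition fails, so it can safely sit on top of an infinite count() as long as the values keep increasing. Here's a minimal Python 3 sketch of the same $20 cutoff:

```python
import itertools

# takewhile() yields items while the predicate holds, then stops permanently.
costs = itertools.takewhile(lambda x: x < 20, itertools.count(10, 0.25))
cost_list = list(costs)

print(len(cost_list))  # 40
print(cost_list[-1])   # 19.75
```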
compress() is by far the method I use most. It's perfect: given two lists a and b, it returns the elements of a for which the corresponding elements of b are True (or, more precisely, truthy).
print list(itertools.compress(letters, booleans))
>>> ['a', 'c', 'f']
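The selectors don't have to be a hand-written list of 1s and 0s. Any iterable of truthy or falsy values works, including a generator expression computed from another list. compress() also stops as soon as either input runs out. A minimal Python 3 sketch of both behaviors, reusing the setup lists:

```python
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
numbers = [23, 20, 44, 32, 7, 12]

# Select letters whose matching number is above 20; selectors are computed lazily.
big = list(itertools.compress(letters, (n > 20 for n in numbers)))

# Only two selectors, so only the first two letters are even considered.
short = list(itertools.compress(letters, [True, True]))

print(big)    # ['a', 'c', 'd']
print(short)  # ['a', 'b']
```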
The final method I’m going to go over is one that should be a simple addition for readers well-versed in the functional programming staples of
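imap() is just a version of map() that produces an iterator instead of a list. You pass it a function and one or more iterables; it takes one element from each iterable at a time, throws them at the function as arguments, and yields the results. It stops as soon as the shortest iterable runs out, which is why six numbers and five decimals give five results: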
import operator
print list(itertools.imap(operator.mul, numbers, decimals))
>>> [2.3000000000000003, 14.0, 17.6, 12.8, 3.5]
Or (perhaps even better), you can use None in lieu of a function and get the iterables grouped as tuples back!
print list(itertools.imap(None, numbers, decimals))
>>> [(23, 0.1), (20, 0.7), (44, 0.4), (32, 0.4), (7, 0.5)]
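On Python 3, imap() is gone because the built-in map() is already lazy. The None trick doesn't carry over either: map(None, ...) raises a TypeError once you consume it, and zip() does the tuple grouping instead. Here's a minimal Python 3 sketch of both, using operator.mul as the multiplying function:

```python
import operator

numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]

# map() stops at the shortest input: six numbers, five decimals, five products.
products = list(map(operator.mul, numbers, decimals))

# zip() groups the iterables into tuples, also stopping at the shortest.
pairs = list(zip(numbers, decimals))

print(len(products))  # 5
print(pairs)          # [(23, 0.1), (20, 0.7), (44, 0.4), (32, 0.4), (7, 0.5)]
```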
Okay, so now what?
These are, in my opinion, the five most helpful elements of itertools. But there are way more. Play around with the above five, then five more (permutations(), I'd argue, wins the award for highest fun-to-usefulness ratio). The big takeaway, though, is that while these methods are cool on their own, saving a few lines and characters by migrating away from list comprehensions is a benefit that pales in comparison to what you can do by combining these methods together.
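Here's a quick look at permutations() (the itertools function name is plural), plus one small example of combining two of the methods above. It's a Python 3 sketch. The combination works because compress() stops when its selectors run out, so it can safely pull from an infinite count() to find the positions of the truthy flags:

```python
import itertools

# permutations(iterable, r) yields every ordered arrangement of r items as tuples.
pairs = list(itertools.permutations(['a', 'b', 'c'], 2))
print(pairs)
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

# count() supplies 0, 1, 2, ...; compress() keeps the ones whose flag is truthy
# and stops when the six flags are used up.
indexes = list(itertools.compress(itertools.count(), [1, 0, 1, 0, 0, 1]))
print(indexes)  # [0, 2, 5]
```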
The official documentation has a bunch of great recipes showing how powerful itertools is when you pair its methods with each other. My favorite is below:
from itertools import ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add  # bind once to avoid repeated attribute lookups
    if key is None:
        # skip anything already seen; ifilterfalse is lazy, so new additions count
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
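That recipe is written for Python 2 (ifilterfalse). A minimal Python 3 port only needs the renamed filterfalse(). The two examples from its docstring work as a quick check:

```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    """Yield unique elements in first-seen order, remembering every element seen."""
    seen = set()
    seen_add = seen.add  # bind the method once to skip repeated attribute lookups
    if key is None:
        # filterfalse() is lazy, so elements added to `seen` during the loop
        # are already filtered out when their duplicates arrive later.
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)  # compare on the key, but yield the original element
            if k not in seen:
                seen_add(k)
                yield element

print(''.join(unique_everseen('AAAABBBCCDAABBB')))     # ABCD
print(''.join(unique_everseen('ABBCcAD', str.lower)))  # ABCD
```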
(Thanks to redditor bwalk for pointing out a typo!)