zip function, which knits together two iterables, is indispensable for me. It works like this:
list_one = [1, 2, 3] list_two = ["a", "b", "c"] for pair in zip(list_one, list_two): print(pair) # (1, 'a') # (2, 'b') # (3, 'c')
If the two iterables differ in length, zip halts after the shortest is exhausted. If we add an additional element to one of the lists above, we get the same results:
list_one = [1, 2, 3] list_two = ["a", "b", "c", "d"] # Note extra item, "d" for pair in zip(list_one, list_two): print(pair) # (1, 'a') # (2, 'b') # (3, 'c')
But the actual mechanics of this surprised me. Today I was working on the “chunked” problem from Python Morsels (which is great and you should totally try out if you write Python), and was left scratching my head after elements of my iterable started disappearing.
The basic problem for chunked is this: given some iterable, return its elements in
count-length lists. Trey likes you to think in terms of “any iterable” so you can’t depend on list-like behaviour, such as being able to index into the iterable or check its length without consuming it.
It’s safer to assume you get one traversal. So, my solution starts like this, creating an iterator from the iterable.
def chunked(iterable, count): iterator = iter(iterable) ...
Then (eliding the scaffolding) I build up a new
count-length chunk using
zip in a comprehension:
temp = [item for item, _ in zip(iterator, range(count))]
Here I use the “earliest finish” behaviour of
zip paired with
range — the amount of numbers in the range (
count-many of them) determines how many items I fetch from the iterator.
Let’s give this a try, using your imagination to flesh out the rest of
for chunk in chunked(iterable=range(10), count=4): print(chunk) # [0, 1, 2, 3] # [5, 6, 7, 8]
Er, hm. Not what I was expecting, which was:
# [0, 1, 2, 3] # [4, 5, 6, 7] # [8, 9]
Somehow, the program is consuming an extra item from
iterator each time I create a chunk. But that list comprehension is the only place where I touch
iterator. What gives?
Well, how does
zip know when to terminate? If you take a look in the documentation, you’ll see a handy code sample that is “equivalent to” the implementation of
zip. There we see that
zip builds up a list of results by taking an item from each of the given iterables, but if any of those iterables are finished, it just returns — and discards the result list!
So what happens with
zip(longer, shorter) is that it takes from
longer, stashes the item, discovers
shorter is exhausted, and discards the item from
longer. And that’s what happens to the missing numbers in the example above.
This situation arises because I’m zipping the same iterable repeatedly, until it’s empty, and because the iterator is the first argument to
zip. This small change works fine:
# Old, broken temp = [item for item, _ in zip(iterator, range(count))] # New, fixed temp = [item for _, item in zip(range(count), iterator)]
In the new version,
zip discovers that the iterator over the range is exhausted first, before it takes an item from
iterator, so no items are ever discarded.
So, is this OK? Really, really not! This is super-fragile. It’s not obvious that switching the arguments will break the code. And really it just looks wrong, because surely the ignored tuple element (assigned to the underscore) should come after the item that we care about?
itertools module has what we need (as always!). The reason I originally used the list comprehension-zip-range combo is because you can’t slice every iterable. For example:
(x**2 for x in range(10))[:4] # --------------------------------------------------------------------------- # TypeError Traceback (most recent call last) # <ipython-input-2-17f2a627cc7c> in <module> # ----> 1 (x**2 for x in range(10))[:4] # # TypeError: 'generator' object is not subscriptable
But you can with
list(islice((x**2 for x in range(10)), 4)) # [0, 1, 4, 9]
And this works great with iterators where you care about the current state:
to_10_sq = (x**2 for x in range(10)) list(islice(to_10_sq, 4)) # [0, 1, 4, 9] list(islice(to_10_sq, 4)) # [16, 25, 36, 49] list(islice(to_10_sq, 4)) # [64, 81]
Which leads us to the most straightforward way of building up those chunks.
chunk = list(islice(iterator, count))
(The chunks have to be “concrete” sequences as the problem requires some length-checking for one of the bonus parts, hence the
Thanks for reading. If I have some key messages, they’re these: