Why does zip(*[iter(s)]*n) chunk s into n-length chunks in Python?
This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).
During this year's Advent of Code, I've been learning Python, and that snippet above blew my tiny little Python mind. Let's take a look at everything that's going on here!
chunk = lambda s, n: zip(*[iter(s)] * n)
chunk(range(0, 9), 3) # returns iterator((0, 1, 2), (3, 4, 5), (6, 7, 8))
-
iter(s) turns s into an iterator. Every call to next(iterator) consumes one value, and once all the values have been consumed, the iterator is used up.
my_iterator = iter([1, 2, 3, 4])
for x in my_iterator:
    print(x)  # prints 1, 2, 3, 4
for y in my_iterator:
    print(y)  # does nothing! there's nothing left in my_iterator
next(my_iterator)  # raises StopIteration exception
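To see next() doing the consuming directly, here is a small sketch that pulls values by hand. The variable names are just illustrative. It also uses the two-argument form next(iterator, default), which returns the default instead of raising StopIteration once the iterator is exhausted.

```python
numbers = iter([1, 2, 3])

first = next(numbers)   # 1
second = next(numbers)  # 2
rest = list(numbers)    # [3]: list() drains whatever is left

# the iterator is now used up; passing a default avoids StopIteration
leftover = next(numbers, None)  # None

# calling iter() on an iterator hands back the very same object
same = iter(numbers) is numbers  # True
```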
-
Multiplying a list by n creates a new list with the original's elements repeated n times. The elements themselves aren't copied: the new list just holds n references to each one, so in [iter(range(0, 9))] * n every entry is the very same iterator.
my_iterator = iter([1, 2, 3])
my_iterator_list = [my_iterator] * 9
next(my_iterator_list[0])  # 1
next(my_iterator_list[1])  # 2
next(my_iterator_list[8])  # 3
next(my_iterator_list[5])  # raises StopIteration exception
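One way to convince yourself that every slot holds the same iterator is to compare identities with is, and to contrast list multiplication with a list comprehension, which evaluates iter(s) once per slot. This is a small sketch; shared and separate are illustrative names.

```python
s = range(0, 6)

# list multiplication: three references to ONE iterator
shared = [iter(s)] * 3
# list comprehension: iter(s) runs three times, giving THREE iterators
separate = [iter(s) for _ in range(3)]

all_same = all(it is shared[0] for it in shared)        # True
all_distinct = len({id(it) for it in separate}) == 3   # True

# zip over the shared iterator chunks the data...
chunked = list(zip(*shared))      # [(0, 1, 2), (3, 4, 5)]
# ...while zip over independent iterators lines s up with itself,
# giving (0, 0, 0), (1, 1, 1), ... (5, 5, 5)
lined_up = list(zip(*separate))
```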
-
fn(*[1, 2, 3]) calls fn with the entries of the list unpacked as its positional arguments, so it is the same as fn(1, 2, 3).
# behold! a useless example
def takes_two_arguments(a, b):
    return a > b
takes_two_arguments(1, 2) == takes_two_arguments(*[1, 2])  # True
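The same unpacking works for any function that accepts a variable number of positional arguments, zip included. This sketch, with made-up lists a, b and c, checks that passing a list of iterables with * is equivalent to writing them out one by one.

```python
a, b, c = [1, 2], [3, 4], [5, 6]
iterables = [a, b, c]

# zip(*iterables) expands to zip(a, b, c)
spread = list(zip(*iterables))
written_out = list(zip(a, b, c))

same_result = spread == written_out  # True; both are [(1, 3, 5), (2, 4, 6)]
```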
-
zip grabs one value from each of its arguments at a time and yields them together as a tuple.
zip([1, 2, 3], [4, 5, 6], [7, 8, 9])  # iterator<(1, 4, 7), (2, 5, 8), (3, 6, 9)>
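One detail matters a lot for chunking: zip stops as soon as any of its arguments runs out, so it never yields a partial tuple. A quick check with two lists of different lengths (the names are illustrative):

```python
shorter = [1, 2]
longer = [10, 20, 30, 40]

# only two tuples come out; 30 and 40 are never paired with anything
pairs = list(zip(shorter, longer))  # [(1, 10), (2, 20)]
```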
-
When you ask zip to pull from the same iterator multiple times, you end up chunking your data! If you pass the same iterator to zip three times, zip calls next() on it three times to build the first tuple that it yields, then three more times to build the second tuple, and so on until the iterator runs out.
# chunk = lambda s, n: zip(*[iter(s)] * n)
l = range(0, 10)
iterable_l = iter(l)
chunked_l = zip(iterable_l, iterable_l, iterable_l)
# yields (0, 1, 2), (3, 4, 5), (6, 7, 8); the leftover 9 can't fill a tuple, so it's dropped
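To make the pulling order concrete, here is a hand-written, simplified stand-in for zip. It is only a sketch for illustration: the name simple_zip is hypothetical, and the real built-in zip is implemented in C with more features. It does follow the same left-to-right order, though: for each tuple it calls next() on each argument in turn, and it stops as soon as one of them is exhausted.

```python
def simple_zip(*iterators):
    """Simplified zip: assumes every argument is already an iterator."""
    if not iterators:
        return  # like zip(), produce nothing when given no arguments
    while True:
        row = []
        for it in iterators:
            try:
                row.append(next(it))  # pull one value from this argument
            except StopIteration:
                return  # any argument running out ends the whole zip
        yield tuple(row)


iterable_l = iter(range(0, 10))
# the same iterator three times: each tuple is three consecutive next() calls
chunks = list(simple_zip(iterable_l, iterable_l, iterable_l))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]; 9 is pulled, then the next call fails
```

Note that the 9 is not left behind in the iterator: it gets pulled for the fourth tuple and then thrown away when the second pull fails.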
-
The order of operations for fn(*[iter(s)] * n) is fn(*([iter(s)] * n)): the list multiplication happens first, and the argument unpacking is applied to the resulting list.
-
Putting that all together, we end up with an "idiomatic" way to chunk a list:
chunk = lambda s, n: zip(*[iter(s)] * n)
chunk(range(0, 10), 3)  # iter((0, 1, 2), (3, 4, 5), (6, 7, 8)); the trailing 9 is dropped
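Here is a slightly more explicit version of the same idiom as a named function, with the precedence spelled out in a comment, plus checks that it works on any iterable (not just a range) and that a short final chunk is dropped. The padded variant, chunk_padded, is an illustrative addition rather than part of the original idiom; it swaps in itertools.zip_longest, a standard-library function that fills missing slots with fillvalue instead of stopping early.

```python
from itertools import zip_longest


def chunk(s, n):
    """Group the iterable s into tuples of length n, dropping any short remainder."""
    shared = [iter(s)] * n  # one iterator, referenced n times
    return zip(*shared)     # same as zip(*([iter(s)] * n))


def chunk_padded(s, n, fill=None):
    """Illustrative variant: pad the last chunk with fill instead of dropping it."""
    return zip_longest(*([iter(s)] * n), fillvalue=fill)


by_threes = list(chunk(range(0, 10), 3))     # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
letters = list(chunk("abcdefg", 2))          # [('a', 'b'), ('c', 'd'), ('e', 'f')]
padded = list(chunk_padded(range(0, 10), 3)) # last tuple is (9, None, None)
```

Which one you want depends on whether a partial group at the end is an error, noise to ignore, or data to keep.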
I think this is a coding idiom I'm going to stay away from for now. I'm not yet fluent enough with Python to be confident reading or writing a line like zip(*[iter(range(0, 100))] * 10), but I'm glad I can at least puzzle through it!