14.6 Generators and Iterators

It is possible to write functions that may be resumed after they send a value back. Such functions are known as generators because they generate a sequence of values over time. Unlike normal functions that return a value and exit, generator functions automatically suspend and resume their execution and state around the point of value generation. Because of that, they are often a useful alternative to both computing an entire series of values up front, and manually saving and restoring state in classes.

The chief code difference between generator and normal functions is that generators yield a value, rather than returning one—the yield statement suspends the function and sends a value back to the caller, but retains enough state to allow the function to resume from where it left off. This allows functions to produce a series of values over time, rather than computing them all at once, and sending them back in something like a list.

Generator functions are bound up with the notion of iterator protocols in Python. In short, functions containing a yield statement are compiled specially as generators; when called, they return a generator object that supports the iterator object interface.

Iterator objects, in turn, define a next method, which either returns the next item in the iteration or raises a special exception (StopIteration) to end it. An object's iterator is fetched with the iter built-in function. Python for loops use this iteration protocol to step through a sequence (or a sequence generator) if the protocol is supported; if not, for falls back on repeatedly indexing the sequence instead.
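
As a minimal sketch of this protocol, the following steps through a small list by hand. It assumes Python 2.6 or later, where the built-in next function calls the iterator's next method for us; the names items, it, first, second, and ended are ours, chosen only for illustration:

```python
# Step through a list manually with the iterator protocol.
# Assumption: Python 2.6+ or 3.x, where the built-in next(it) is available.
items = ['spam', 'eggs']
it = iter(items)            # fetch an iterator object for the list

first = next(it)            # first item in the iteration
second = next(it)           # second item in the iteration

try:
    next(it)                # nothing left, so the iterator signals the end
    ended = False
except StopIteration:
    ended = True            # this is the signal a for loop watches for
```

A for loop over items performs the same steps internally and stops quietly when StopIteration is raised.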

14.6.1 Generator Example

Generators and iterators are an advanced language feature, so please see the Python library manuals for the full story on generators.

To illustrate the basics, though, the following code defines a generator function that can be used to generate the squares of a series of numbers over time:[4]

[4] Generators are available in Python releases after version 2.2; in 2.2, they must be enabled with a special import statement of the form: from __future__ import generators. (See Chapter 18 for more on this statement form.) Iterators were already available in 2.2, largely because the underlying protocol did not require the new, non-backward-compatible keyword, yield.

>>> def gensquares(N):
...     for i in range(N):
...         yield i ** 2               # Resume here later.
...

This function yields a value, and so returns to its caller, each time through the loop; when it is resumed, its prior state is restored, and control picks up again immediately after the yield statement. For example, when the generator is used as the sequence in a for loop, control resumes the function after its yield statement each time through the loop:

>>> for i in gensquares(5):        # Resume the function. 
...     print i, ':',              # Print last yielded value.
...
0 : 1 : 4 : 9 : 16 :
>>>

To end the generation of values, functions use either a return statement with no value, or simply fall off the end of the function body. If you want to see what is going on inside the for, call the generator function directly:

>>> x = gensquares(10)
>>> x
<generator object at 0x0086C378>

You get back a generator object that supports the iterator protocol—it has a next method, which starts the function, or resumes it from where it last yielded a value:

>>> x.next()
0
>>> x.next()
1
>>> x.next()
4

for loops work with generators in the same way: they call the next method repeatedly until the StopIteration exception is caught. If the object to be iterated over does not support this protocol, for loops instead use the indexing protocol to iterate.
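
To make this concrete, here is a rough sketch of what a for loop does on our behalf, applied to a generator that ends with a bare return. Both squares_until and manual_for are hypothetical names invented for this illustration, and the code assumes Python 2.6 or later for the built-in next function:

```python
def squares_until(limit):
    # Hypothetical generator: yields squares no larger than limit.
    i = 0
    while True:
        if i * i > limit:
            return              # a bare return ends the generation
        yield i * i             # suspend here; resume on the next request
        i += 1

def manual_for(iterable):
    # Roughly the steps a for loop performs, collecting items in a list.
    results = []
    it = iter(iterable)         # ask the object for its iterator
    while True:
        try:
            value = next(it)    # start or resume the generator
        except StopIteration:   # raised once the generator has returned
            break
        results.append(value)
    return results

collected = manual_for(squares_until(20))    # [0, 1, 4, 9, 16]
```

The result matches what an ordinary for loop over squares_until(20) would collect; the loop statement simply hides the iter call, the next calls, and the exception handling.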

Note that in this example, we could also simply build the list of yielded values all at once:

>>> def buildsquares(n):
...     res = []
...     for i in range(n): res.append(i**2)
...     return res
...
>>> for x in buildsquares(5): print x, ':',
...
0 : 1 : 4 : 9 : 16 :

For that matter, we could simply use any of the for loop, map, or list comprehension techniques:

>>> for x in [n**2 for n in range(5)]:
...     print x, ':',
...
0 : 1 : 4 : 9 : 16 :

>>> for x in map((lambda x:x**2), range(5)):
...     print x, ':',
...
0 : 1 : 4 : 9 : 16 :

However, especially when result lists are large, or when it takes much computation to produce each value, generators allow functions to avoid doing all the work up front. They distribute the time required to produce the series of values among loop iterations. Moreover, for more advanced generator uses, they provide a simpler alternative to manually saving the state between iterations in class objects (more on classes later in Part VI); with generators, function variables are saved and restored automatically.
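
The following sketch illustrates the point about deferred work. It uses an instrumented variant of gensquares that records each item it actually computes; the calls list, the loop limit of 1000, and the threshold of 50 are assumptions made only for this demonstration:

```python
calls = []                      # records each item the generator computes

def gensquares(N):
    for i in range(N):
        calls.append(i)         # note that work was done for this item
        yield i ** 2

def buildsquares(n):
    res = []
    for i in range(n):
        res.append(i ** 2)      # all n squares are computed up front
    return res

found = None
for sq in gensquares(1000):
    if sq > 50:                 # stop as soon as we have what we need
        found = sq
        break

generated = len(calls)                  # items the generator computed
built = len(buildsquares(1000))         # items the list version computed
```

Here the generator computes only the squares requested before the break, whereas buildsquares produces all 1000 results before the caller sees the first one.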

14.6.2 Iterators and Built-in Types

Built-in datatypes are designed to produce iterator objects in response to the iter built-in function. Dictionary iterators, for instance, produce one key from the dictionary on each iteration:

>>> D = {'a':1, 'b':2, 'c':3}
>>> x = iter(D)
>>> x.next()
'a'
>>> x.next()
'c'

In addition, all iteration contexts, including for loops, map calls, and list comprehensions, are in turn designed to automatically call the iter function to see if the protocol is supported. That's why you can loop through a dictionary's keys without calling its keys method, step through lines in a file without calling readlines or xreadlines, and so on:

>>> for key in D: 
...     print key, D[key]
...
a 1
c 3
b 2
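
As a small sketch of the iteration contexts just mentioned, each of the following steps through the same dictionary's keys without calling keys. The variable names are ours, and the list call around map is an assumption added so the result is a list on newer Pythons, where map no longer returns one:

```python
D = {'a': 1, 'b': 2, 'c': 3}

# A for loop fetches the dictionary's iterator automatically.
via_for = []
for key in D:
    via_for.append(key.upper())

# A list comprehension uses the same protocol.
via_comprehension = [key.upper() for key in D]

# So does map; list() is harmless where map already returns a list.
via_map = list(map(str.upper, D))
```

All three contexts visit the keys in the same order, because each one obtains its items from the dictionary's iterator.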

For file iterators, Python 2.2 simply uses the result of the file xreadlines method; this method returns an object that loads lines from the file on demand, reading a chunk of lines at a time instead of loading the entire file at once:

>>> for line in open('temp.txt'):
...     print line,
...
Tis but
a flesh wound.
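
For a version of this session that can be run without an existing temp.txt, the sketch below first writes a throwaway file and then steps through it with the file iterator. The use of the tempfile module and the cleanup code are our additions, and the with statement assumes Python 2.6 or later:

```python
import os
import tempfile

# Create a small throwaway file to read back (a stand-in for temp.txt).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('Tis but\na flesh wound.\n')

lines = []
f = open(path)
try:
    for line in f:              # the file object supplies lines on demand
        lines.append(line)      # each line keeps its trailing newline
finally:
    f.close()
    os.remove(path)             # clean up the throwaway file
```

Because each line retains its newline character, the original session prints with a trailing comma to avoid double spacing.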

It is also possible to use classes to implement arbitrary objects that conform to the iterator protocol, and so may be used in for loops and other iteration contexts. Such classes define a special __iter__ method that returns an iterator object (preferred over the __getitem__ indexing method). However, this is well beyond the scope of this chapter; see Part VI for more on classes in general, and Chapter 21 for an example of a class that implements the iterator protocol.
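
As a brief preview only, here is one way such a class might look. The Squares class is a hypothetical example of ours, not the one presented later in the book; it defines the method under both the next name used in this chapter and the __next__ name that Python 3 expects:

```python
class Squares(object):
    """Hypothetical iterator class: squares of start..stop, inclusive."""

    def __init__(self, start, stop):
        self.value = start - 1
        self.stop = stop

    def __iter__(self):
        return self                 # this object is its own iterator

    def __next__(self):
        if self.value == self.stop:
            raise StopIteration     # signal the end of the iteration
        self.value += 1
        return self.value ** 2

    next = __next__                 # the method name used by Python 2

squares = [x for x in Squares(1, 5)]    # [1, 4, 9, 16, 25]
```

Unlike a generator function, the class must store its position in instance attributes and restore it by hand on each call, which is the bookkeeping that generators perform automatically.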
