
27.8 Debugging, Testing, Timing, Profiling

To wrap up our overview of common Python tasks, we'll cover some tasks that are common for Python programmers even though they're not programming tasks per se—debugging, testing, timing, and optimizing Python programs.

27.8.1 Debugging with pdb

The first task is, not surprisingly, debugging. Python's standard distribution includes a debugger called pdb. Using pdb is fairly straightforward: you import the pdb module and call its run function with a string containing the Python code the debugger should execute. For example, if you're debugging the program in spam.py, do this:

>>> import spam                       # Import the module we want to debug.
>>> import pdb                        # Import pdb.
>>> pdb.run('instance = spam.Spam()') # Start pdb with a statement to run.
> <string>(0)?()
(Pdb) break spam.Spam.__init__        # We can set breakpoints.
(Pdb) next
> <string>(1)?()
(Pdb) n                               # 'n' is short for 'next'.
> spam.py(3)__init__()
-> def __init__(self):
(Pdb) n
> spam.py(4)__init__()
-> Spam.numInstances = Spam.numInstances + 1
(Pdb) list                            # Show the source code listing.
  1    class Spam:
  2        numInstances = 0
  3 B      def __init__(self):                  # Note the B for Breakpoint.
  4  ->        Spam.numInstances = Spam.numInstances + 1  # Where we are
  5        def printNumInstances(self):
  6            print "Number of instances created: ", Spam.numInstances
  7
[EOF]
(Pdb) where                           # Show the calling stack.
  <string>(1)?()
> spam.py(4)__init__()
-> Spam.numInstances = Spam.numInstances + 1
(Pdb) Spam.numInstances = 10          # Note that we can modify variables
(Pdb) print Spam.numInstances         # while the program is being debugged.
10
(Pdb) continue                        # This continues until the next break-
--Return--                            # point, but there is none, so we're
-> <string>(1)?()->None               # done.
(Pdb) c                               # This ends up quitting pdb.
<spam.Spam instance at 80ee60>        # This is the returned instance.
>>> instance.numInstances             # Note that the change to numInstances
11                                    # was made before the increment.

As the session above shows, with pdb you can list the code being debugged (an arrow points to the line about to be executed), examine and modify variables, and set breakpoints. Chapter 9 in the Library Reference covers the debugger in detail. Alternative debuggers abound, from the one in IDLE to the more full-featured debuggers you'll find in commercial IDEs for Python.
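
You don't have to start a program from inside the debugger, either: a call to pdb.set_trace() anywhere in your code stops execution at that point and drops you into a (Pdb) prompt. The set_trace call below is standard pdb; the surrounding function is just a hypothetical illustration:

import pdb

def buggy_function(items):            # Hypothetical function, used only for illustration.
    total = 0
    for item in items:
        pdb.set_trace()               # Execution pauses here with a (Pdb) prompt.
        total = total + item
    return total

buggy_function([1, 2, 3])             # Step through each iteration interactively.

Similarly, after an uncaught exception has printed a traceback at the interactive prompt, calling pdb.pm() starts a post-mortem session in the frame where the exception occurred.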

27.8.2 Testing with unittest

Testing software is, in the general case, a very hard problem. For software that takes user input or otherwise interacts with the outside world, comprehensive testing of even a medium-sized program quickly becomes impractical. Luckily, you can get many of the benefits of testing without being exhaustive. The easiest kind of testing to do is called unit testing, and it is supported in Python by the unittest module. In unit testing, you write very small scripts, each of which tests one fact about the program at a time. The trick is to write lots of these simple tests, to learn how to write useful tests as opposed to silly ones, and to run them between every change to the program. If you have a test suite with good coverage, you'll gain confidence that each change you make isn't going to break another part of the system.

unittest is documented as part of the standard library, as well as on the PyUnit web site (http://pyunit.sourceforge.net).
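
As a small illustration, here is a sketch of what a test module might look like. The average function and the test names are invented for this example, but the unittest calls (TestCase, assertEqual, assertRaises, and main) are all part of the module's standard interface:

import unittest

def average(values):                      # Hypothetical function under test.
    return float(sum(values)) / len(values)

class AverageTest(unittest.TestCase):
    def test_simple_average(self):        # Each method tests one fact.
        self.assertEqual(average([1, 2, 3]), 2.0)
    def test_empty_list(self):            # An empty list should raise an error.
        self.assertRaises(ZeroDivisionError, average, [])

if __name__ == '__main__':
    unittest.main()                       # Run every method whose name starts with 'test'.

Running the module as a script executes each test and reports a summary of passes and failures; rerunning it after every change is what builds the confidence mentioned above.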

27.8.3 Timing

Even when a program is working, it can sometimes be too slow. If you know what the bottleneck in your program is, and you know of alternative ways to code the same algorithm, then you can time the alternatives to find out which is fastest. The time module, which is part of the standard distribution, provides many time-manipulation routines. We'll use just one of them, time.clock, which returns a timer value with the highest precision available on your machine. Since we compare only relative times between algorithms, the precision isn't all that important anyway. Here are two different ways to create a list of 10,000 zeros, saved in a file called makezeros.py:

def lots_of_appends():
    zeros = []
    for i in range(10000):
        zeros.append(0)

def one_multiply():
    zeros = [0] * 10000

How can we time these two solutions? Here's a simple way:

import time, makezeros

def do_timing(num_times, *funcs):
    totals = {}
    for func in funcs:
        totals[func] = 0.0
        starttime = time.clock()            # Record starting time.
        for x in range(num_times):
            func()                          # Call just this function.
        stoptime = time.clock()             # Record ending time.
        elapsed = stoptime - starttime      # The difference is the elapsed time.
        totals[func] = totals[func] + elapsed
    for func in funcs:
        print "Running %s %d times took %.3f seconds" % (func.__name__,
                                                         num_times, totals[func])

if __name__ == '__main__':
    do_timing(100, makezeros.lots_of_appends, makezeros.one_multiply)

And running this program yields:

csh> python timings.py
Running lots_of_appends 100 times took 7.891 seconds
Running one_multiply 100 times took 0.120 seconds

As you might have suspected, a single list multiplication is much faster than lots of appends. Note that when timing code, it's always a good idea to measure many runs of each function rather than just one; otherwise the results are likely to be heavily influenced by things that have nothing to do with the algorithm, such as network traffic on the computer or GUI events. Python 2.3 introduces a new module called timeit that provides a very simple way to do precise code timing correctly.
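
For example, here is a sketch of how the same comparison might be written with timeit; the choice of 100 runs and the print formatting are ours, but Timer and its timeit method are the module's standard interface:

import timeit

# Each Timer takes a statement to time and setup code that runs once, untimed.
append_timer   = timeit.Timer('lots_of_appends()', 'from makezeros import lots_of_appends')
multiply_timer = timeit.Timer('one_multiply()', 'from makezeros import one_multiply')

print "lots_of_appends: %.3f seconds" % append_timer.timeit(100)
print "one_multiply:    %.3f seconds" % multiply_timer.timeit(100)

Among other details, timeit picks the most appropriate clock for your platform and temporarily disables garbage collection while it measures.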

What if you've written a complex program, and it's running slower than you'd like, but you're not sure where the problem spots are? In that case, you need to profile the program: determine which parts of it are the time-sinks and see whether they can be optimized, or whether the program's structure can be modified to even out the bottlenecks. The Python distribution includes just the right tools for that: the profile module, documented in the Library Reference, and another module, hotshot, which is unfortunately not well documented as of this writing. Assuming that you want to profile a given function in the current namespace, do this:

>>> from timings import *
>>> from makezeros import *
>>> import profile
>>> profile.run('do_timing(100, lots_of_appends, one_multiply)')
Running lots_of_appends 100 times took 8.773 seconds
Running one_multiply 100 times took 0.090 seconds
         203 function calls in 8.823 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    8.574    0.086    8.574    0.086 makezeros.py:1(lots_of_appends)
      100    0.101    0.001    0.101    0.001 makezeros.py:6(one_multiply)
        1    0.001    0.001    8.823    8.823 profile:0(do_timing(100, lots_of_appends, one_multiply))
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    8.821    8.821 python:0(194.C.2)
        1    0.147    0.147    8.821    8.821 timings.py:2(do_timing)

As you can see, this gives a fairly complicated listing, which includes such things as per-call time spent in each function and the number of calls made to each function. In complex programs, the profiler can help find surprising inefficiencies. Optimizing Python programs is beyond the scope of this book; if you're interested, however, check the Python newsgroup: periodically, a user asks for help speeding up a program and a spontaneous contest starts up, with interesting advice from expert users.
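
If the full listing is more than you want to wade through, the profiling data can also be saved to a file and examined selectively with the pstats module. Continuing the session above (the file name timings.prof is just an example):

>>> import pstats
>>> profile.run('do_timing(100, lots_of_appends, one_multiply)', 'timings.prof')
>>> stats = pstats.Stats('timings.prof')            # Load the saved statistics.
>>> stats.sort_stats('cumulative').print_stats(5)   # Show the five biggest time-sinks.

sort_stats accepts keys such as 'time', 'calls', and 'cumulative', which makes it easy to look at the same data from several angles.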
