
18.7 Module Gotchas

Here is the usual collection of boundary cases that make life interesting for beginners. Some are so obscure that it was hard to come up with examples, but most illustrate something important about Python.

18.7.1 Importing Modules by Name String

The module name in an import or from statement is a hardcoded variable name. Sometimes, though, your program will get the name of a module to be imported as a string at runtime (e.g., if a user selects a module name from within a GUI). Unfortunately, you can't use import statements directly to load a module given its name as a string; Python expects a variable name here, not a string. For instance:

>>> import "string"
  File "<stdin>", line 1
    import "string"
                  ^
SyntaxError: invalid syntax

It also won't work to assign the string to a variable and import that variable's name:

x = "string"
import x

Here, Python will try to import a file x.py, not the string module.

To get around this, you need to use special tools to load modules dynamically from a string that exists at runtime. The most general approach is to construct an import statement as a string of Python code and pass it to the exec statement to run:

>>> modname = "string"
>>> exec "import " + modname       # Run a string of code.
>>> string                         # Imported in this namespace
<module 'string'>

The exec statement (and its cousin for expressions, the eval function) compiles a string of code, and passes it to the Python interpreter to be executed. In Python, the byte code compiler is available at runtime, so you can write programs that construct and run other programs like this. By default, exec runs the code in the current scope, but you can get more specific by passing in optional namespace dictionaries.
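As a rough sketch of the namespace-dictionary option just mentioned, the following assumes Python 3, where exec is a built-in function rather than a statement. The names namespace and string_module are ours, chosen only for illustration. The import lands in the dictionary instead of the current scope:

```python
modname = "string"             # e.g., a name chosen by a user at runtime
namespace = {}                 # private dictionary that receives the import

# Python 3 form: exec is a function, and the dictionary is its namespace.
exec("import " + modname, namespace)

string_module = namespace[modname]
print(string_module.digits)    # 0123456789
```

Keeping generated code in its own dictionary avoids accidentally overwriting names in the importer.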

The only real drawback to exec is that it must compile the import statement each time it runs; if it runs many times, your code may run quicker if it uses the built-in __import__ function to load from a name string instead. The effect is similar, but __import__ returns the module object, so assign it to a name here to keep it:

>>> modname = "string"
>>> string = __import__(modname)
>>> string
<module 'string'>
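In Python 3, the standard library's importlib.import_module function is generally preferred over calling __import__ directly. The following is a minimal sketch under that assumption; load_by_name is a hypothetical helper name, not part of any library:

```python
import importlib

def load_by_name(modname):
    """Import and return the module whose name is the string modname."""
    return importlib.import_module(modname)

string = load_by_name("string")       # same effect as: import string
print(string.ascii_lowercase[:3])     # abc

# Given a dotted name, import_module returns the submodule itself,
# whereas __import__ would return the top-level package.
path_module = load_by_name("os.path")
print(path_module.basename("/tmp/spam.py"))   # spam.py
```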

18.7.2 from Copies Names but Doesn't Link

The from statement is really an assignment to names in the importer's scope: a name-copy operation, not name aliasing. The implications are the same as for all assignments in Python, but they are subtle, especially because the code that shares the objects lives in different files. For instance, suppose we define the following module (nested1.py):

X = 99
def printer(): print X

If we import its two names using from in another module (nested2.py), we get copies of those names, not links to them. Changing a name in the importer resets only the binding of the local version of that name, not the name in nested1.py:

from nested1 import X, printer      # Copy names out.
X = 88                              # Changes my "X" only!
printer()                           # nested1's X is still 99

% python nested2.py
99

If you use import to get the whole module and assign to a qualified name, you change the name in nested1.py. Qualification directs Python to a name in the module object, rather than a name in the importer (nested3.py):

import nested1                    # Get module as a whole.
nested1.X = 88                    # Okay: change nested1's X
nested1.printer()

% python nested3.py
88
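Both behaviors can be seen side by side in one runnable sketch. It assumes Python 3, and, to stay self-contained, it builds a stand-in for nested1.py in memory rather than on disk. The function get_x is a hypothetical replacement for printer that returns the value instead of printing it:

```python
import sys
import types

# In-memory stand-in for nested1.py (an assumption made for this sketch;
# a real file on the module search path behaves the same way).
nested1 = types.ModuleType("nested1")
exec("X = 99\ndef get_x():\n    return X\n", nested1.__dict__)
sys.modules["nested1"] = nested1

from nested1 import X, get_x    # copies the current bindings
X = 88                          # rebinds only this file's X
local_result = get_x()          # nested1's X is untouched

nested1.X = 77                  # qualified assignment changes the module
module_result = get_x()         # the function now sees the new value

print(local_result, module_result)   # 99 77
```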

18.7.3 Statement Order Matters in Top-Level Code

When a module is first imported (or reloaded), Python executes its statements one by one, from the top of the file to the bottom. This has a few subtle implications regarding forward references that are worth underscoring here:

  • Code at the top level of a module file (not nested in a function) runs as soon as Python reaches it during an import; because of that, it can't reference names assigned lower in the file.

  • Code inside a function body doesn't run until the function is called; because names in a function aren't resolved until the function actually runs, they can usually reference names anywhere in the file.

Generally, forward references are only a concern in top-level module code that executes immediately; functions can reference names arbitrarily. Here's an example that illustrates forward references:

func1()                 # Error: "func1" not yet assigned

def func1():
    print func2()       # Okay:  "func2" looked up later

func1()                 # Error: "func2" not yet assigned

def func2():
    return "Hello"

func1()                 # Okay:  "func1" and "func2" assigned

When this file is imported (or run as a standalone program), Python executes its statements from top to bottom. The first call to func1 fails because the func1 def hasn't run yet. The call to func2 inside func1 works as long as func2's def has been reached by the time func1 is called (it hasn't when the second top-level func1 call is run). The last call to func1 at the bottom of the file works, because func1 and func2 have both been assigned.

Mixing defs with top-level code is not only hard to read, it's dependent on statement ordering. As a rule of thumb, if you need to mix immediate code with defs, put your defs at the top of the file and top-level code at the bottom. That way, your functions are defined and assigned by the time code that uses them runs.
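The ordering rules can be checked in a single run by trapping the early failure instead of letting it end the program. This is a small sketch assuming Python 3 syntax; early_error and late_result are illustrative names:

```python
def func1():
    return func2()           # func2 is looked up only when func1 runs

try:
    func1()                  # func2's def has not been reached yet
except NameError as exc:
    early_error = str(exc)   # the message names the missing func2

def func2():
    return "Hello"

late_result = func1()        # both defs have run by now
print(late_result)           # Hello
```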

18.7.4 Recursive "from" Imports May Not Work

Because imports execute a file's statements from top to bottom, you sometimes need to be careful when using modules that import each other (something called recursive imports). Since the statements in a module have not all been run when it imports another module, some of its names may not yet exist. If you use import to fetch a module as a whole, this may or may not matter; the module's names won't be accessed until you later use qualification to fetch their values. But if you use from to fetch specific names, you only have access to names already assigned.

For instance, take the following modules recur1 and recur2. recur1 assigns a name X, and then imports recur2, before assigning name Y. At this point, recur2 can fetch recur1 as a whole with an import (it already exists in Python's internal modules table), but it can see only name X if it uses from; the name Y below the import in recur1 doesn't yet exist, so you get an error:

#File: recur1.py
X = 1
import recur2             # Run recur2 now if it doesn't exist.
Y = 2


#File: recur2.py
from recur1 import X      # Okay: "X" already assigned
from recur1 import Y      # Error: "Y" not yet assigned

>>> import recur1
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "recur1.py", line 2, in ?
    import recur2
  File "recur2.py", line 2, in ?
    from recur1 import Y   # Error: "Y" not yet assigned
ImportError: cannot import name Y

Python avoids rerunning recur1's statements when recur1 is imported recursively from recur2 (otherwise, the imports would send the script into an infinite loop), but recur1's namespace is still incomplete at the moment recur2 imports it.

Don't use from in recursive imports . . . really! Python won't get stuck in a cycle, but your programs will once again be dependent on the order of statements in modules. There are two ways out of this gotcha:

  • You can usually eliminate import cycles like this by careful design; maximizing cohesion and minimizing coupling are good first steps.

  • If you can't break the cycles completely, postpone module name access by using import and qualification (instead of from), or by running your froms either inside functions (instead of at the top level of the module) or near the bottom of your file to defer their execution.
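The second suggestion can be sketched as follows. This is only an illustration, assuming Python 3; the module names recur1_demo and recur2_demo and the function total are hypothetical, and the files are written to a temporary directory so that the example is self-contained. Here, recur2_demo fetches recur1_demo as a whole and defers access to Y until total is called, by which time recur1_demo has finished running:

```python
import importlib
import os
import sys
import tempfile

# Two modules that import each other (hypothetical names and contents).
sources = {
    "recur1_demo.py": "X = 1\nimport recur2_demo\nY = 2\n",
    "recur2_demo.py": (
        "import recur1_demo          # whole module: fine mid-import\n"
        "def total():\n"
        "    return recur1_demo.X + recur1_demo.Y   # resolved at call time\n"
    ),
}

tmpdir = tempfile.mkdtemp()
for filename, text in sources.items():
    with open(os.path.join(tmpdir, filename), "w") as f:
        f.write(text)
sys.path.insert(0, tmpdir)           # make the temporary files importable

recur1_demo = importlib.import_module("recur1_demo")
recur2_demo = importlib.import_module("recur2_demo")
print(recur2_demo.total())           # 3
```

Had recur2_demo used a top-level from recur1_demo import Y instead, the import would fail just as in the traceback shown earlier.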

18.7.5 reload May Not Impact from Imports

The from statement is the source of all sorts of gotchas in Python. Here's another: because from copies (assigns) names when run, there's no link back to the module where the names came from. Names imported with from simply become references to objects, which happen to have been referenced by the same names in the importee when the from ran.

Because of this behavior, reloading the importee has no effect on clients that use from; the client's names still reference the original objects fetched with from, even though names in the original module have been reset:

from module import X       # X may not reflect any module reloads!
 . . . 
reload(module)             # Changes module, but not my names
X                          # Still references old object

Don't do it that way. To make reloads more effective, use import and name qualification, instead of from. Because qualifications always go back to the module, they will find the new bindings of module names after reloading:

import module              # Get module, not names.
 . . . 
reload(module)             # Changes module in-place.
module.X                   # Get current X: reflects module reloads
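The difference between the two styles can be observed directly. The sketch below assumes Python 3, where reload lives in the importlib module rather than being a built-in. The module name reload_demo and the helper write_module are hypothetical, and the module file is created in a temporary directory and then rewritten to simulate an edit:

```python
import importlib
import os
import sys
import tempfile

# Skip bytecode caching so that a rewrite within the same second
# cannot be masked by a stale cache file.
sys.dont_write_bytecode = True

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "reload_demo.py")

def write_module(value):
    """Write (or rewrite) the demo module so that it assigns X = value."""
    with open(path, "w") as f:
        f.write("X = %d\n" % value)

write_module(1)
sys.path.insert(0, tmpdir)
import reload_demo
from reload_demo import X        # copies the binding: X is 1

write_module(2)                  # simulate editing the file
importlib.reload(reload_demo)    # reruns the file in the same module object

print(X, reload_demo.X)          # 1 2
```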

18.7.6 reload and from and Interactive Testing

Chapter 3 warned readers that it's usually better to not launch programs with imports and reloads, because of the complexities involved. Things get even worse with from. Python beginners often encounter this gotcha: after opening a module file in a text edit window, they launch an interactive session to load and test their module with from:

from module import function
function(1, 2, 3)

After finding a bug, they jump back to the edit window, make a change, and try to reload this way:

reload(module)

Except this doesn't work—the from statement assigned the name function, not module. To refer to the module in a reload, you have to first load it with an import statement, at least once:

import module
reload(module)
function(1, 2, 3)

Except this doesn't quite work either—reload updates the module object, but names like function copied out of the module in the past still refer to old objects (in this case, the original version of the function). To really get the new function, either call it module.function after the reload, or rerun the from:

import module
reload(module)
from module import function
function(1, 2, 3)

And now, the new version of the function finally runs. But there are problems inherent in using reload with from: not only do you have to remember to reload after imports, you also have to remember to rerun your from statements after reloads. This is complex enough to trip up even an expert once in a while.

You should not expect reload and from to play together nicely. Better yet, don't combine them at all—use reload with import, or launch programs other ways, as suggested in Chapter 3 (e.g., use the Edit/Runscript option in IDLE, file icon clicks, or system command lines).

18.7.7 reload Isn't Applied Transitively

When you reload a module, Python only reloads that particular module's file; it doesn't automatically reload modules that the file being reloaded happens to import. For example, if you reload some module A, and A imports modules B and C, the reload only applies to A, not B and C. The statements inside A that import B and C are rerun during the reload, but they'll just fetch the already loaded B and C module objects (assuming they've been imported before). In actual code, here's file A.py:

import B              # Not reloaded when A is
import C              # Just an import of an already loaded module

% python
>>> . . . 
>>> reload(A)

Don't depend on transitive module reloads. Use multiple reload calls to update subcomponents independently. If desired, you can design your systems to reload their subcomponents automatically by adding reload calls in parent modules like A.

Better still, you could write a general tool to do transitive reloads automatically, by scanning module __dict__s (see Section 18.6.1 earlier in this chapter), and checking each item's type() (see Chapter 7) to find nested modules to reload recursively. Such a utility function could call itself, recursively, to navigate arbitrarily shaped import dependency chains.

Module reloadall.py, listed below, has a reload_all function that automatically reloads a module, every module that the module imports, and so on, all the way to the bottom of the import chains. It uses a dictionary to keep track of modules already reloaded, recursion to walk the import chains, and the standard library's types module (introduced at the end of Chapter 7), which simply predefines type() results for built-in types.

To use this utility, import its reload_all function, and pass it an already loaded module object (not a name string), much like the built-in reload function. When the file runs stand-alone, its self-test code tests itself; it has to import itself, because its own name is not defined in the file without an import. We encourage you to study and experiment with this example on your own:

import types

def status(module):
    print 'reloading', module.__name__

def transitive_reload(module, visited):
    if module not in visited:                    # Trap cycles, dups.
        status(module)                           # Reload this module
        reload(module)                           # and visit children.
        visited[module] = None
        for attrobj in module.__dict__.values():     # For all attrs
            if type(attrobj) == types.ModuleType:    # Recur if module
                transitive_reload(attrobj, visited)

def reload_all(*args):
    visited = {}
    for arg in args:
        if type(arg) == types.ModuleType:
            transitive_reload(arg, visited)

if __name__ == '__main__':
    import reloadall                # Test code: reload myself
    reload_all(reloadall)           # Should reload this, types
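For readers on Python 3, the same idea might be adapted roughly as follows. This is a sketch under stated assumptions, not a drop-in replacement: it uses importlib.reload in place of the built-in reload, a set in place of the dictionary, and it returns the names of the reloaded modules so that the result can be inspected. The modules parent_demo and child_demo are hypothetical, and are written to a temporary directory purely for the demonstration:

```python
import importlib
import os
import sys
import tempfile
import types

def transitive_reload(module, visited):
    """Reload module, then every module bound to a name inside it."""
    if module in visited:                    # trap cycles and duplicates
        return
    visited.add(module)
    importlib.reload(module)
    for attrobj in list(vars(module).values()):
        if isinstance(attrobj, types.ModuleType):
            transitive_reload(attrobj, visited)

def reload_all(*modules):
    """Reload each module passed in, plus everything reachable from it."""
    visited = set()
    for module in modules:
        if isinstance(module, types.ModuleType):
            transitive_reload(module, visited)
    return sorted(m.__name__ for m in visited)

# Demonstration with two throwaway modules (hypothetical names).
tmpdir = tempfile.mkdtemp()
for name, text in [("child_demo", "VALUE = 1\n"),
                   ("parent_demo", "import child_demo\n")]:
    with open(os.path.join(tmpdir, name + ".py"), "w") as f:
        f.write(text)
sys.path.insert(0, tmpdir)

import parent_demo
reloaded = reload_all(parent_demo)
print(reloaded)    # names of every module that was reloaded
```

As in the original, reloading is driven by the module objects found in each namespace, so a module imported only with from would not be followed.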