Learning Python 2nd Edition

15.3 How Imports Work

The prior section talked about importing modules, without really explaining what happens when you do so. Since imports are at the heart of program structure in Python, this section goes into more detail on the import operation to make this process less abstract.

Some C programmers like to compare the Python module import operation to a C #include, but they really shouldn't—in Python, imports are not just textual insertions of one file into another. They are really runtime operations that perform three distinct steps the first time a file is imported by a program:

1. Find the module's file.
2. Compile it to byte-code (if needed).
3. Run the module's code to build the objects it defines.

All three of these steps are only run the first time a module is imported during a program's execution; later imports of the same module bypass all of these and simply fetch the already-loaded module object in memory. To better understand module imports, let's explore each of these steps in turn.
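To make the three steps less abstract, here is a rough sketch that performs each one by hand. It is only an approximation of what an import statement does, not the interpreter's actual implementation. It assumes a current Python 3 interpreter, and the module name steps_demo and its contents are made up for the demonstration.

```python
import importlib.util
import os
import sys
import tempfile
import types

# Hypothetical module, written to a temporary directory for this demo.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "steps_demo.py"), "w") as f:
    f.write("greeting = 'hello'\ndef shout():\n    return greeting.upper()\n")
sys.path.insert(0, demo_dir)

# Step 1: find the module's file on the module search path.
spec = importlib.util.find_spec("steps_demo")

# Step 2: compile the source text to a byte-code object.
with open(spec.origin) as f:
    code = compile(f.read(), spec.origin, "exec")

# Step 3: run the byte code in a new module's namespace to build its objects.
module = types.ModuleType("steps_demo")
exec(code, module.__dict__)

print(os.path.basename(spec.origin))   # steps_demo.py
print(module.shout())                  # HELLO
```

A real import also records the finished module so that later imports can reuse it; this sketch leaves that part out.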

15.3.1 1. Find It

First off, Python must locate the module file referenced by your import statement. Notice the import statement in the prior section's example names the file without a .py suffix and without its directory path. It says just import b, instead of something like import c:\dir1\b.py. Import statements omit path and suffix details like this on purpose; you can only list a simple name.^[1] Instead, Python uses a standard module search path to locate the module file corresponding to an import statement.

^[1] In fact, it's syntactically illegal to include path and suffix detail in an import. In Chapter 17, we'll meet package imports, which allow import statements to include part of the directory path leading to a file, as a set of period-separated names. However, package imports still rely on the normal module search path, to locate the leftmost directory in a package path. They also cannot make use of any platform-specific directory syntax in the import statement; such syntax only works on the search path. Also note that module file search path issues are not as relevant when you run frozen executables (discussed in Chapter 2); they typically embed byte code in the binary image.

15.3.1.1 The module search path

In many cases, you can rely on the automatic nature of the module import search path and need not configure this path at all. If you want to be able to import files across user-defined directory boundaries, though, you will need to know how the search path works, in order to customize it. Roughly, Python's module search path is automatically composed as the concatenation of these major components:

1. The home directory of the top-level file.
2. PYTHONPATH directories (if set).
3. Standard library directories.
4. The contents of any .pth files (if present).

The first and third of these are defined automatically. Because Python searches the concatenation of these from first to last, the second and fourth can be used to extend the module search path to include your own directories. Here is how Python uses each of these path components:

Home directory

Python first looks for the imported file in the home directory. Depending on how you are launching code, this is either the directory containing your program's top-level file, or the directory in which you are working interactively. Because this is always searched first, if a program is located entirely in a single directory, all its imports will work automatically, with no path configuration required.

PYTHONPATH directories

Next, Python searches all directories listed in your PYTHONPATH environment variable setting, from left to right (assuming you have set this at all). In brief, PYTHONPATH is simply set to a list of user-defined and platform-specific names of directories that contain Python code files. Add all the directories that you wish to be able to import from; Python uses your setting to extend the module search path.

Because Python searches the home directory first, you only need to make this setting to import files across directory boundaries—that is, to import a file that is stored in a different directory than the file that imports it. In practice, you probably will make this setting once you start writing substantial programs. When you are first starting out, though, if you save all your module files in the directory that you are working in (i.e., the home directory), your imports will work without making this setting at all.
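One way to watch PYTHONPATH take effect without touching your real shell settings is to launch a child interpreter with the variable set just for that process. The following is a minimal sketch along those lines; it assumes a current Python 3 interpreter, and extra_dir is a throwaway directory standing in for one of your own code directories.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical directory standing in for one of your own code directories.
extra_dir = tempfile.mkdtemp()

# Copy the current environment and set PYTHONPATH for the child process only.
env = dict(os.environ)
env["PYTHONPATH"] = extra_dir

# Ask a fresh interpreter to print its search path, one directory per line.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env, capture_output=True, text=True, check=True,
)
child_path = result.stdout.splitlines()

# Compare by file identity, since path spellings can differ across platforms.
found = any(os.path.exists(p) and os.path.samefile(p, extra_dir)
            for p in child_path)
print(found)   # True
```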

Standard library directories

Next, Python will automatically search the directories where the standard library modules are installed on your machine. Because these are always searched, they normally do not need to be added to your PYTHONPATH.

.pth file directories

Finally, a relatively new feature of Python allows users to add valid directories to the module search path by simply listing them, one per line, in a text file whose name ends in a .pth suffix (for "path"). These path configuration files are a somewhat advanced installation-related feature, which we will not discuss fully here.

In short, a text file of directory names, dropped in an appropriate directory, can serve roughly the same role as the PYTHONPATH environment variable setting. For instance, a file named myconfig.pth may be placed at the top level of the Python install directory on Windows (e.g., in C:\Python22) to extend the module search path. Python will add the directories listed on each line of the file, from first to last, near the end of the module search path list. Because they are based on files instead of shell settings, path files can also apply to all users of an installation, instead of just one user or shell.

This feature is more sophisticated than we will describe here. We recommend that beginners use either PYTHONPATH or a single .pth file, and then only if you must import across directories. See the Python library manual for more details on this feature, especially its documentation for standard library module site.
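As a small illustration of path files, the sketch below builds a .pth file in a temporary directory and asks the standard library's site module to process that directory with site.addsitedir. It assumes a current Python 3 interpreter; the directories and the pth_demo module are invented for the example, and a real installation would instead place the .pth file where Python looks at startup.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: site_dir holds the .pth file, code_dir holds a module.
site_dir = tempfile.mkdtemp()
code_dir = tempfile.mkdtemp()

with open(os.path.join(code_dir, "pth_demo.py"), "w") as f:
    f.write("answer = 42\n")

# A path configuration file lists directories, one per line.
with open(os.path.join(site_dir, "myconfig.pth"), "w") as f:
    f.write(code_dir + "\n")

# Process site_dir as a site directory: site_dir itself and every existing
# directory named in its .pth files are appended to sys.path.
site.addsitedir(site_dir)

import pth_demo
print(pth_demo.answer)   # 42
```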

See also Appendix A for examples of common ways to extend your module search path with PYTHONPATH or .pth files on various platforms. Depending on your platform, additional directories may be automatically added to the module search path as well. In fact, this description of the module search path is accurate but generic; the exact configuration of the search path varies across both platforms and Python releases.

For instance, Python may add an entry for the current working directory—the directory from which you launched your program—in the search path, after the PYTHONPATH directories, and before standard library entries. When launching from a command line, the current working directory may not be the same as the home directory of your top-level file—the directory where your program file resides. (See Chapter 3 for more on command lines.) Since the current working directory can vary each time your program runs, you normally shouldn't depend on its value for import purposes.

15.3.1.2 The sys.path list

If you want to see how the path is truly configured on your machine, you can always inspect the module search path as it is known to Python, by printing the built-in sys.path list (that is, the path attribute of the built-in sys module). This Python list of directory name strings is the actual search path; on imports, Python searches each directory on this list, from left to right.

Really, sys.path is the module search path. It is configured by Python at program startup, using the four path components just described. Python automatically merges any PYTHONPATH and .pth file path settings you've made into this list, and always sets the first entry to identify the home directory of the top-level file, possibly as an empty string.
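The difference between the home directory and the current working directory is easy to check for yourself. The sketch below runs a tiny script stored in one directory while launching it from another, and reports what the script sees. It assumes a current Python 3 interpreter with default path settings; main_demo.py and both directories are made up for the demonstration.

```python
import os
import subprocess
import sys
import tempfile

home_dir = tempfile.mkdtemp()    # where the top-level script lives
other_dir = tempfile.mkdtemp()   # where the program is launched from

script = os.path.join(home_dir, "main_demo.py")
with open(script, "w") as f:
    f.write("import os, sys\nprint(sys.path[0])\nprint(os.getcwd())\n")

# Assumption: drop PYTHONSAFEPATH, which would suppress the home-directory
# entry on recent Python 3 releases.
env = dict(os.environ)
env.pop("PYTHONSAFEPATH", None)

result = subprocess.run(
    [sys.executable, script],
    cwd=other_dir, env=env, capture_output=True, text=True, check=True,
)
first_entry, working_dir = result.stdout.splitlines()

# The first search path entry follows the script, not the launch directory.
print(os.path.samefile(first_entry, home_dir))    # True
print(os.path.samefile(working_dir, other_dir))   # True
```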

Python exposes this list for two good reasons. First of all, it provides a way to verify the search path settings you've made—if you don't see your settings somewhere on this list, you need to recheck your work. Secondly, if you know what you're doing, this list also provides a way for scripts to tailor their search paths manually. As you'll see later in this part, by modifying the sys.path list, you can modify the search path for all future imports. Such changes only last for the duration of the script, however; PYTHONPATH and .pth files are more permanent ways to modify the path.^[2]

^[2] Some programs really need to change sys.path, though. Scripts that run on web servers, for example, usually run as user "nobody" to limit machine access. Because such scripts cannot usually depend on "nobody" to have set PYTHONPATH in any particular way, they often set sys.path manually to include required source directories, prior to running any import statements.
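Here is a minimal sketch of a script tailoring its own search path before importing, in the way the footnote describes. It assumes a current Python 3 interpreter; lib_dir and the pathmod_demo module are hypothetical stand-ins for a real source directory and one of its files.

```python
import os
import sys
import tempfile

# Hypothetical source directory that is not on the search path by default.
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "pathmod_demo.py"), "w") as f:
    f.write("def greet():\n    return 'imported from lib_dir'\n")

# Extend the search path before running the import statement.
if lib_dir not in sys.path:
    sys.path.append(lib_dir)

import pathmod_demo
print(pathmod_demo.greet())   # imported from lib_dir
```

The change applies only to the running process; the next program you start gets a freshly configured sys.path.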

15.3.1.3 Module file selection

Keep in mind that filename suffixes (e.g., .py) are omitted in import statements, intentionally. Python chooses the first file it can find on the search path that matches the imported name. For example, an import statement of the form import b might load:

Source file b.py
Byte-code file b.pyc
A directory named b, for package imports
A C extension module (e.g., b.so on Linux)
An in-memory image, for frozen executables
A Java class, in the Jython system
A zip file component, using the zipimport module

Some standard library modules are actually coded in C. C extensions, Jython, and package imports all extend imports beyond simple files. To importers, though, the difference in loaded file type is completely transparent, both when importing and fetching module attributes. Saying import b gets whatever module b is, according to your module search path, and b.attr fetches an item in the module, be that a Python variable or a linked-in C function. Some standard modules we will use in this book, for example, are coded in C, not Python; their clients don't have to care.
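If you are curious where a given module actually came from, you can ask the module itself. The sketch below assumes a current Python 3 interpreter, where each imported module carries a __spec__ attribute recording its origin; origin_of is a helper name made up for this example, and the exact values printed vary by platform and build.

```python
import json
import math
import sys

def origin_of(module):
    """Return where the import system says this module came from."""
    spec = getattr(module, "__spec__", None)
    return spec.origin if spec is not None else None

# sys is compiled into the interpreter; json is Python source;
# math may be built in or a separate C extension, depending on the build.
for mod in (sys, math, json):
    print(mod.__name__, "->", origin_of(mod))

# Attribute access looks the same however the module is implemented.
print(callable(sys.exit), callable(math.sqrt), callable(json.dumps))
```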

If you have both a b.py and a b.so in different directories, Python will always load the one in the first (leftmost) directory on your module search path, during the left to right search of sys.path. But what happens if there are both a b.py and a b.so in the same directory? Python follows a standard picking order, but it is not guaranteed to stay the same over time. In general, you should not depend on which type of file Python will choose within a given directory; make your module names distinct, or use module search path configuration to make module selection more obvious.

It is also possible to redefine much of what an import operation does in Python, with what are known as import hooks. These hooks can be used to make imports do useful things such as load files from zip archives, perform decryption, and so on (in fact, Python 2.3 includes a zipimport standard module, which allows files to be directly imported from zip archives). Normally, though, imports work as described in this section.

Python also supports the notion of .pyo optimized byte-code files, created and run with the -O Python command-line flag; because these run only slightly faster than normal .pyc files (typically 5% faster), they are infrequently used. The Psyco system (see Chapter 2) provides more substantial speedups.
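To see zip-archive imports in action, the following sketch packs one module into an archive and then lists the archive itself on the search path. It assumes a current Python 3 interpreter, where zip support is enabled by default; bundle_demo.zip and zipped_demo are names invented for the example.

```python
import os
import sys
import tempfile
import zipfile

# Build a zip archive holding one hypothetical module.
zip_path = os.path.join(tempfile.mkdtemp(), "bundle_demo.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("zipped_demo.py", "message = 'loaded from a zip file'\n")

# Listing the archive on the search path lets normal imports look inside it.
sys.path.insert(0, zip_path)

import zipped_demo
print(zipped_demo.message)                         # loaded from a zip file
print("bundle_demo.zip" in zipped_demo.__file__)   # True
```

The import statement itself is unchanged; only the search path entry differs.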

15.3.2 2. Compile It (Maybe)

After finding a source code file that matches an import statement according to the module search path, Python next compiles it to byte code, if necessary. (We discussed byte code in Chapter 2.)

Python checks file timestamps and skips the source-to-byte-code compile step if it finds a .pyc byte code file that is not older than the corresponding .py source file. In addition, if Python finds only a byte code file on the search path and no source, it simply loads the byte code directly. In other words, the compile step is bypassed whenever possible, to speed program startup. If you change the source code, Python will automatically regenerate the byte code the next time your program is run. Moreover, you can ship a program as just byte code files, and avoid sending source.
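The claim about shipping byte code alone can be tried directly. The sketch below compiles a module with the standard py_compile module, deletes the source, and imports what is left. It assumes a current Python 3 interpreter, which by default caches byte code in a __pycache__ subdirectory, so the example names the .pyc file explicitly to place it beside the source; shipped_demo is a made-up module name.

```python
import importlib
import os
import py_compile
import sys
import tempfile

ship_dir = tempfile.mkdtemp()
source = os.path.join(ship_dir, "shipped_demo.py")
with open(source, "w") as f:
    f.write("version = '1.0'\n")

# Compile the source to a byte-code file stored next to it, then drop the source.
py_compile.compile(source, cfile=os.path.join(ship_dir, "shipped_demo.pyc"))
os.remove(source)

sys.path.insert(0, ship_dir)
importlib.invalidate_caches()

import shipped_demo                    # loaded from byte code alone
print(os.listdir(ship_dir))            # ['shipped_demo.pyc']
print(shipped_demo.version)            # 1.0
```

Keep in mind that byte code files are tied to the Python release that created them, so this kind of shipping only works when the target machine runs a matching interpreter.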

Notice that compilation happens when a file is being imported. Because of this, you will not usually see a .pyc byte code file for the top-level file of your program, unless it is also imported elsewhere—only imported files leave behind a .pyc on your machine. The byte code of top-level files is used internally and discarded; byte-code of imported files is saved in files to speed future imports.

Top-level files are often designed to be executed directly and not imported at all. Later, we'll see that it is possible to design a file that serves both as the top-level code of a program, and a module of tools to be imported. Such files may be both executed and imported, and thus generate a .pyc. To learn how, watch for the discussion of the special __name__ attribute and "__main__" in Chapter 18.

15.3.3 3. Run It

The final step of an import operation executes the byte code of the module. All statements in the file execute in turn, from top to bottom, and any assignments made to names during this step generate attributes of the resulting module object. This execution step generates all the tools that the module's code defines. For instance, def statements in a file are run at import time to create functions, and assign attributes within the module to those functions. The functions are called later in the program by importers.

Because this last import step actually runs the file's code, if any top-level code in a module file does real work, you'll see its results at import time. For example, top-level print statements in a module show output when the file is imported. Function def statements simply define objects for later use.
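The following sketch shows the run step at work: a top-level print in a module fires during the import, while assignments and def statements simply leave attributes behind on the module object. It assumes a current Python 3 interpreter (so print is a function call here), and run_demo is a module invented for the demonstration.

```python
import contextlib
import io
import os
import sys
import tempfile

run_dir = tempfile.mkdtemp()
with open(os.path.join(run_dir, "run_demo.py"), "w") as f:
    f.write(
        "print('top-level code is running')\n"
        "count = 3\n"
        "def double(x):\n"
        "    return x * 2\n"
    )
sys.path.insert(0, run_dir)

# Capture whatever the module prints while it is being imported.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import run_demo

print(captured.getvalue().strip())       # top-level code is running
print(run_demo.count)                    # 3
print(run_demo.double(run_demo.count))   # 6
```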

As you can see, import operations involve quite a bit of work—they search for files, possibly run a compiler, and run Python code. A given module is only imported once per process by default. Future imports skip all three import steps, and reuse the already-loaded module in memory.^[3]

^[3] Technically, Python keeps already-loaded modules in the built-in sys.modules dictionary, and checks that at the start of an import operation to know if the module is already loaded. If you want to see which modules are loaded, import sys, and print sys.modules.keys( ). More on this internal table in Chapter 18.
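You can verify the import-once behavior with the sys.modules table. In this sketch, a module appends to a list each time its code runs; importing it twice leaves a single entry, because the second import just fetches the loaded module. It assumes a current Python 3 interpreter, and cache_demo is a hypothetical module created for the test.

```python
import importlib
import os
import sys
import tempfile

cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, "cache_demo.py"), "w") as f:
    f.write("loads = []\nloads.append('ran')\n")
sys.path.insert(0, cache_dir)

loaded_before = "cache_demo" in sys.modules
print(loaded_before)                          # False

import cache_demo                             # first import: runs the file
first = cache_demo

again = importlib.import_module("cache_demo")  # later import: table lookup

print(again is first)                         # True
print(first is sys.modules["cache_demo"])     # True
print(cache_demo.loads)                       # ['ran']
```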

As you can also see, the import operation is at the heart of program architecture in Python. Larger programs are divided into multiple files, which are linked together at runtime by imports. Imports in turn use module search paths to locate your files, and modules define attributes for external use.
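As a small picture of this linking, the sketch below creates two files in which one imports the other. Each file keeps its own names, and the client reaches the tool module's attributes only through the module object its import created. It assumes a current Python 3 interpreter; tools_demo and client_demo are made-up file names.

```python
import os
import sys
import tempfile

link_dir = tempfile.mkdtemp()

# A module of tools, defining attributes for external use.
with open(os.path.join(link_dir, "tools_demo.py"), "w") as f:
    f.write("label = 'tools'\ndef square(n):\n    return n * n\n")

# A client file that links to the tools at runtime with an import.
with open(os.path.join(link_dir, "client_demo.py"), "w") as f:
    f.write(
        "import tools_demo\n"
        "label = 'client'\n"
        "result = tools_demo.square(7)\n"
    )

sys.path.insert(0, link_dir)
import client_demo

print(client_demo.result)             # 49
print(client_demo.label)              # client
print(client_demo.tools_demo.label)   # tools
```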

Of course, the whole point of imports and modules is to provide a structure to your program, which divides its logic into self-contained software components. Code in one module is isolated from code in another; in fact, no file can ever see the names defined in another, unless explicit import statements are run. To see what this all means in terms of actual code, let's move on to Chapter 16.
