[ Team LiB ] |
15.3 How Imports Work

The prior section talked about importing modules without really explaining what happens when you do so. Since imports are at the heart of program structure in Python, this section goes into more detail on the import operation to make this process less abstract.

Some C programmers like to compare the Python module import operation to a C #include, but they really shouldn't: in Python, imports are not just textual insertions of one file into another. They are really runtime operations that perform three distinct steps the first time a file is imported by a program:

1. Find the module's file.
2. Compile it to byte code (if needed).
3. Run the module's code to build the objects it defines.
All three of these steps are only run the first time a module is imported during a program's execution; later imports of the same module bypass all of these and simply fetch the already-loaded module object in memory. To better understand module imports, let's explore each of these steps in turn.

15.3.1 1. Find It

First off, Python must locate the module file referenced by your import statement. Notice the import statement in the prior section's example names the file without a .py suffix and without its directory path. It says just import b, instead of something like import c:\dir1\b.py. Import statements omit path and suffix details like this on purpose; you can only list a simple name.[1] Instead, Python uses a standard module search path to locate the module file corresponding to an import statement.
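To make the search step concrete, here is a minimal sketch, assuming a modern Python 3 interpreter, that asks the import machinery where it would find a module without actually loading it. The use of importlib.util.find_spec and of the standard json module are illustrative choices rather than part of the prior section's example, and the name no_such_module_b is hypothetical.

```python
import importlib.util

# Ask where the module named "json" would be found. As in an import
# statement, we give only a simple name: no directory path, no .py suffix.
spec = importlib.util.find_spec("json")
print(spec.name)      # the simple name we asked for
print(spec.origin)    # the file Python located; the path varies by installation

# A name that matches nothing on the search path yields None here
# (a real import statement would raise an exception instead).
missing = importlib.util.find_spec("no_such_module_b")  # hypothetical name
print(missing)
```

The path printed for spec.origin depends entirely on where Python is installed, which is exactly why import statements leave it out.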
15.3.1.1 The module search path

In many cases, you can rely on the automatic nature of the module import search path and need not configure this path at all. If you want to be able to import files across user-defined directory boundaries, though, you will need to know how the search path works in order to customize it. Roughly, Python's module search path is automatically composed as the concatenation of these major components:

1. The home directory of the top-level file
2. PYTHONPATH directories (if set)
3. Standard library directories
4. The contents of any .pth files (if present)
The first and third of these are defined automatically. Because Python searches the concatenation of these from first to last, the second and fourth can be used to extend the module search path to include your own directories. Here is how Python uses each of these path components:
See also Appendix A for examples of common ways to extend your module search path with PYTHONPATH or .pth files on various platforms. Depending on your platform, additional directories may be automatically added to the module search path as well. In fact, this description of the module search path is accurate, but generic; the exact configuration of the search path is prone to change over both platforms and Python releases.

For instance, Python may add an entry for the current working directory (the directory from which you launched your program) in the search path, after the PYTHONPATH directories and before the standard library entries. When launching from a command line, the current working directory may not be the same as the home directory of your top-level file (the directory where your program file resides). (See Chapter 3 for more on command lines.) Since the current working directory can vary each time your program runs, you normally shouldn't depend on its value for import purposes.

15.3.1.2 The sys.path list

If you want to see how the path is truly configured on your machine, you can always inspect the module search path as it is known to Python by printing the built-in sys.path list (that is, the path attribute of the built-in sys module). This Python list of directory name strings is the actual search path; on imports, Python searches each directory on this list, from left to right.

Really, sys.path is the module search path. It is configured by Python at program startup, using the four path components just described. Python automatically merges any PYTHONPATH and .pth file path settings you've made into this list, and always sets the first entry to identify the home directory of the top-level file, possibly as an empty string.

Python exposes this list for two good reasons. First of all, it provides a way to verify the search path settings you've made: if you don't see your settings somewhere on this list, you need to recheck your work.
Secondly, if you know what you're doing, this list also provides a way for scripts to tailor their search paths manually. As you'll see later in this part, by modifying the sys.path list, you can modify the search path for all future imports. Such changes only last for the duration of the script, however; PYTHONPATH and .pth files are more permanent ways to modify the path.[2]
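As a minimal sketch of both uses of sys.path, assuming a modern Python 3 interpreter, the following prints the search path and then extends it for the rest of the run. The directory name stored in extra_dir is hypothetical; substitute a directory that holds your own modules.

```python
import sys

# Inspect the search path exactly as Python sees it, in search order.
for directory in sys.path:
    print(repr(directory))

# Tailor the path manually: later imports will also search this directory.
# The change lasts only until this program exits.
extra_dir = "/tmp/my_modules"   # hypothetical directory
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
```

Appending puts the new directory last in the search order; inserting it nearer the front would instead give its modules priority over same-named files found later on the path.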
15.3.1.3 Module file selection

Keep in mind that filename suffixes (e.g., .py) are intentionally omitted in import statements. Python chooses the first file it can find on the search path that matches the imported name. For example, an import statement of the form import b might load:
Some standard library modules are actually coded in C. C extensions, Jython, and package imports all extend imports beyond simple files. To importers, though, the difference in loaded file type is completely transparent, both when importing and fetching module attributes. Saying import b gets whatever module b is, according to your module search path, and b.attr fetches an item in the module, be that a Python variable or a linked-in C function. Some standard modules we will use in this book, for example, are coded in C, not Python; their clients don't have to care.

If you have both a b.py and a b.so in different directories, Python will always load the one on the first (leftmost) directory on your module search path, during the left to right search of sys.path. But what happens if there is both a b.py and b.so in the same directory? Python follows a standard picking order, but it is not guaranteed to stay the same over time. In general, you should not depend on which type of file Python will choose within a given directory; make your module names distinct, or use module search path configuration to make module selection more obvious.

It is also possible to redefine much of what an import operation does in Python, with what are known as import hooks. These hooks can be used to make imports do useful things such as load files from zip archives, perform decryption, and so on (in fact, Python 2.3 includes a zipimport standard module, which allows files to be directly imported from zip archives). Normally, though, imports work as described in this section.

Python also supports the notion of .pyo optimized byte code files, created and run with the -O Python command-line flag; because these run only slightly faster than normal .pyc files (typically 5% faster), they are infrequently used. The Psyco system (see Chapter 2) provides more substantial speedups.

15.3.2 2. Compile It (Maybe)

After finding a source code file that matches an import statement according to the module search path, Python next compiles it to byte code, if necessary. (We discussed byte code in Chapter 2.) Python checks file timestamps and skips the source to byte code compile step, if it finds a .pyc byte code file that is not older than the corresponding .py source file. In addition, if Python finds only a byte code file on the search path and no source, it simply loads the byte code directly.

Because of this, the compile step is bypassed if possible, to speed program startup. If you change the source code, Python will automatically regenerate the byte code the next time your program is run. Moreover, you can ship a program as just byte code files, and avoid sending source.

Notice that compilation happens when a file is being imported. Because of this, you will not usually see a .pyc byte code file for the top-level file of your program, unless it is also imported elsewhere; only imported files leave behind a .pyc on your machine. The byte code of top-level files is used internally and discarded; byte code of imported files is saved in files to speed future imports.

Top-level files are often designed to be executed directly and not imported at all. Later, we'll see that it is possible to design a file that serves both as the top-level code of a program, and a module of tools to be imported. Such files may be both executed and imported, and thus generate a .pyc. To learn how, watch for the discussion of the special __name__ attribute and "__main__" in Chapter 18.

15.3.3 3. Run It

The final step of an import operation executes the byte code of the module. All statements in the file execute in turn, from top to bottom, and any assignments made to names during this step generate attributes of the resulting module object. This execution step generates all the tools that the module's code defines.
For instance, def statements in a file are run at import time to create functions, and assign attributes within the module to those functions. The functions are called later in the program by importers. Because this last import step actually runs the file's code, if any top-level code in a module file does real work, you'll see its results at import time. For example, top-level print statements in a module show output when the file is imported. Function def statements simply define objects for later use. As you can see, import operations involve quite a bit of work—they search for files, possibly run a compiler, and run Python code. A given module is only imported once per process by default. Future imports skip all three import steps, and reuse the already-loaded module in memory.[3]
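The three steps can be watched end to end with a small experiment. This is only a sketch, assuming a modern Python 3 interpreter (where print is a function and byte code is saved in a __pycache__ subdirectory); the module name demo_mod, its contents, and the temporary directory are all made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module file into a throwaway directory.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_mod.py"), "w") as f:
    f.write(
        "print('running demo_mod top-level code')\n"
        "def spam():\n"
        "    return 'spam result'\n"
    )

# Step 1 (find): put the directory on the module search path.
sys.path.insert(0, workdir)
importlib.invalidate_caches()   # make sure the new file is noticed

# Steps 2 and 3 (compile, run): the top-level print fires here, and the
# def statement creates the module attribute demo_mod.spam.
import demo_mod
print(demo_mod.spam())

# Where the byte code for this module is (or would be) saved.
print(demo_mod.__spec__.cached)

# A second import skips all three steps: nothing is printed, and the
# name is bound to the module object already in memory.
import demo_mod
print(sys.modules["demo_mod"] is demo_mod)
```

Running this script shows the module's top-level message once, even though demo_mod is imported twice.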
As you can also see, the import operation is at the heart of program architecture in Python. Larger programs are divided into multiple files, which are linked together at runtime by imports. Imports in turn use module search paths to locate your files, and modules define attributes for external use.

Of course, the whole point of imports and modules is to provide a structure to your program, which divides its logic into self-contained software components. Code in one module is isolated from code in another; in fact, no file can ever see the names defined in another, unless explicit import statements are run. To see what this all means in terms of actual code, let's move on to Chapter 16.