[ Team LiB ] |
27.6 Internet-Related Modules

Python is used in a wide variety of Internet-related tasks, from building web servers to crawling the Web to "screen-scraping" web sites for data. This section briefly describes the modules most often used for such tasks that ship with Python's core. For more detailed examples of their use, we recommend Lundh's Python Standard Library and Martelli and Ascher's Python Cookbook (O'Reilly). There are also many third-party add-ons worth knowing about before embarking on a significant web- or Internet-related project.

27.6.1 The Common Gateway Interface: The cgi Module

Python programs often process forms from web pages. To make this task easy, the standard Python distribution includes a module called cgi. Chapter 28 includes an example of a Python script that uses CGI, so we won't cover it any further here.

27.6.2 Manipulating URLs: The urllib and urlparse Modules

Uniform resource locators (URLs) are strings such as http://www.python.org that are now ubiquitous. Three modules (urllib, urllib2, and urlparse) provide tools for processing URLs. The urllib module defines a few functions for writing programs that must be active users of the Web (robots, agents, etc.). These are listed in Table 27-9.
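Table 27-9 is not reproduced in this section, so as a rough illustration, here is a minimal sketch of three urllib functions (quote, unquote, and urlencode) that such programs commonly rely on. The path and the query fields are made up for the example. The try/except import is an assumption added so the sketch also runs on Python 3, where these functions live in urllib.parse.

```python
try:
    # Module layout described in the text (Python 2)
    from urllib import quote, unquote, urlencode
except ImportError:
    # Python 3 moved these functions into urllib.parse
    from urllib.parse import quote, unquote, urlencode

# Escape characters that are not safe inside a URL path
# (the path is a made-up example).
escaped = quote("/docs/python cookbook.html")

# Reverse the escaping to recover the original string.
restored = unquote(escaped)

# Build a query string from a list of (name, value) pairs
# (hypothetical field names).
query = urlencode([("q", "screen scraping"), ("page", "2")])

print(escaped)   # /docs/python%20cookbook.html
print(restored)  # /docs/python cookbook.html
print(query)     # q=screen+scraping&page=2
```

Note that quote leaves the slashes alone by default and escapes the space as %20, while urlencode encodes spaces in values as plus signs, which is the convention for form data.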
The urllib2 module focuses on opening URLs that the simpler urllib doesn't know how to deal with, and provides an extensible framework for new kinds of URLs and protocols. It is what you should use if you need to deal with passwords, digest authentication, proxies, HTTPS URLs, and other URLs that need special handling.

The urlparse module defines a few functions that simplify taking URLs apart and putting new URLs together. These are listed in Table 27-10.
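To make "taking URLs apart and putting new URLs together" concrete, the following sketch uses three urlparse functions: urlparse, urlunparse, and urljoin. The specific paths under www.python.org are illustrative only. As an added assumption, the import falls back to urllib.parse so the same code runs on Python 3, where the urlparse module was renamed.

```python
try:
    # Module name used in the text (Python 2)
    from urlparse import urlparse, urlunparse, urljoin
except ImportError:
    # Python 3 renamed the module to urllib.parse
    from urllib.parse import urlparse, urlunparse, urljoin

# Split a URL into its six components (illustrative URL).
parts = urlparse("http://www.python.org/doc/current/index.html?lang=en#top")
scheme, netloc, path, params, query, fragment = parts

# Put a new URL together from components, reusing scheme and host.
rebuilt = urlunparse((scheme, netloc, "/download/", "", "", ""))

# Resolve a relative reference against a base URL, as a browser would.
joined = urljoin("http://www.python.org/doc/current/index.html", "../FAQ.html")

print(path)      # /doc/current/index.html
print(query)     # lang=en
print(rebuilt)   # http://www.python.org/download/
print(joined)    # http://www.python.org/doc/FAQ.html
```

urljoin is usually the function a crawler needs most, since links found in a page are often relative to the page's own URL.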
27.6.3 Specific Internet Protocols

The most commonly used protocols built on top of TCP/IP are supported with modules named after them. The telnetlib module lets you act as a Telnet client. The httplib module lets you talk to web servers using HTTP. The ftplib module is for transferring files using FTP. The gopherlib module is for browsing Gopher servers (now fairly rare). In the domains of mail and news, you can use the poplib and imaplib modules for reading mail on POP3 and IMAP servers, respectively; the smtplib module for sending mail; and the nntplib module for reading and posting Usenet news on NNTP servers.

There are also modules for building Internet servers: a generic socket-based IP server (SocketServer), a simple web server (SimpleHTTPServer), a CGI-compliant HTTP server (CGIHTTPServer), and a module for building asynchronous socket-handling services (asyncore).

Support for web services currently consists of a core library to process XML-RPC client-side calls (xmlrpclib), as well as a simple XML-RPC server implementation (SimpleXMLRPCServer). Support for SOAP is likely to be added when the SOAP standard becomes more stable.

27.6.4 Processing Internet Data

Once you use an Internet protocol to obtain files from the Internet (or before you serve them to the Internet), you often must process these files. They come in many different formats. Table 27-11 lists each module in the standard library that processes a specific kind of Internet-related file format (there are others for sound and image format processing; see the library reference manual).
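Table 27-11 is not reproduced in this section, so the choice of module below is an assumption about the kind of processing it covers. As one small example of handling an Internet data format, this sketch uses the standard base64 module to encode bytes into the ASCII-safe form used in mail attachments and then decode them again. The payload is made up.

```python
import base64

# Binary payload to send over a text-only channel (made-up content).
raw = b"Python and the Internet"

# Encode to Base64: the result contains only ASCII characters.
encoded = base64.b64encode(raw)

# Decode back to the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)  # True: the round trip preserves the data
```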
27.6.5 XML Processing

Python comes with a rich set of XML-processing tools. These include parsers, DOM interfaces, SAX interfaces, and more, as shown in Table 27-12.
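As a small taste of the DOM style of interface, here is a minimal sketch using xml.dom.minidom, a lightweight DOM implementation in the standard library. Table 27-12 is not reproduced in this section, so treating minidom as representative is an assumption, and the XML document and its element names are invented for the example.

```python
from xml.dom import minidom

# A tiny XML document (invented for this example).
source = (
    "<library>"
    "<book title='Python and XML'/>"
    "<book title='Python Cookbook'/>"
    "</library>"
)

# Parse the string into a DOM tree held entirely in memory.
document = minidom.parseString(source)

# Walk the tree: find every <book> element and read its title attribute.
titles = [node.getAttribute("title")
          for node in document.getElementsByTagName("book")]

print(titles)
```

A DOM parser builds the whole tree before you can query it, which is convenient for small documents; a SAX parser instead reports elements as it reads them and suits large inputs better.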
See the standard library reference for details, or the Python Cookbook (O'Reilly) for example tasks easily solved using the standard XML libraries. The XML facilities are developed by the XML Special Interest Group, which publishes versions of the XML package between Python releases. See http://www.python.org/topics/xml for details and the latest version of the code. For expanded coverage, consider Python and XML, by Christopher A. Jones and Fred L. Drake, Jr. (O'Reilly).