[ Team LiB ] |
2.1 Reading Data

Before you learn about reading XML, you must learn how to read a file. In this section, I'll cover basic filesystem and network input in .NET. If you're already familiar with basic I/O types and methods in .NET, feel free to skip to the next section.

I/O classes in .NET are located in the System.IO namespace. The basic object used for reading and writing data, regardless of the source, is the Stream object. Stream is an abstract base class that represents a sequence of bytes; it has a Read( ) method to read bytes from the Stream, a Write( ) method to write bytes to the Stream, and a Seek( ) method to set the current position within the Stream.

Not all instances or subclasses of Stream support all these operations; for example, you cannot write to a FileStream representing a read-only file, and you cannot Seek( ) to a position in a NetworkStream. The properties CanRead, CanWrite, and CanSeek can be interrogated to determine whether the respective operations are supported by the instance of Stream you're dealing with. Table 2-1 shows the Stream type's subclasses and the methods each type supports.
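The capability properties can be checked at run time before you call an operation that a particular Stream may not support. The following is a minimal sketch; the StreamCapabilities class and its Describe( ) and TryRewind( ) methods are hypothetical names introduced for illustration, not part of the framework:

```csharp
using System;
using System.IO;

public static class StreamCapabilities
{
    // Summarize which operations a given Stream instance supports
    public static string Describe(Stream stream)
    {
        return string.Format("CanRead={0}, CanWrite={1}, CanSeek={2}",
            stream.CanRead, stream.CanWrite, stream.CanSeek);
    }

    // Move back to the start of the Stream, but only if it supports seeking
    public static bool TryRewind(Stream stream)
    {
        if (!stream.CanSeek)
        {
            return false;
        }
        stream.Seek(0, SeekOrigin.Begin);
        return true;
    }
}
```

For example, a MemoryStream created as read-only with new MemoryStream(bytes, false) reports CanWrite=False from Describe( ), while TryRewind( ) still succeeds because a MemoryStream supports seeking.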
After Stream, the most important .NET I/O type is TextReader. TextReader is optimized for reading characters, and provides a level of specialization one step beyond Stream. Unlike Stream, which provides access to data at the level of bytes, TextReader provides string-oriented methods such as ReadLine( ) and ReadToEnd( ). Like Stream, TextReader is also an abstract base class; its subclasses include StreamReader and StringReader.

Most .NET XML types receive their input from Stream or TextReader. You can often pass filenames and URLs directly to their constructors and Load( ) methods; however, you'll sometimes find it necessary to manipulate a data source before dealing with its XML content. For that reason, I talk first about handling files and Streams before delving into XML.

2.1.1 Filesystem I/O

.NET provides two types that allow you to deal directly with files: File and FileInfo. A FileInfo instance represents an actual file and its metadata, whereas the File class contains only static methods used to manipulate files. That is, you instantiate a FileInfo object when you want to hold on to a file and query its contents and metadata, but you call File's static methods to access files transiently.

The following C# code snippet shows how you can use FileInfo to determine the length of a file and its last access time. Note that both Length and LastAccessTime are properties of the FileInfo object:

    // Create an instance of FileInfo and query it
    FileInfo fileInfo = new FileInfo(@"C:\data\file.xml");
    long length = fileInfo.Length;
    DateTime lastAccessTime = fileInfo.LastAccessTime;
You can also use the File type to get the file's last access time, but you cannot get the file's length this way. The GetLastAccessTime( ) method returns the last access time for the filename passed to it, but there is no GetLength( ) method equivalent to the FileInfo object's Length property:

    // Get the last access time of a file transiently
    DateTime lastAccessTime = File.GetLastAccessTime(@"C:\data\file.xml");
In general, you should use the File class to get or set the attributes of a file that can be obtained from the operating system, such as its creation and last access times; to open a file for reading or writing; or to move, copy, or delete a file. You may want to use the FileInfo class when you wish to open a file for reading or writing, and hold on to it for a longer period of time. Or you may just skip the File and FileInfo classes and construct a FileStream or StreamReader directly, as I show you later.

You may read the contents of a file by getting a FileStream for it, via the File or FileInfo classes' OpenRead( ) methods. FileStream, one of the subclasses of Stream, has a Read( ) method that allows you to read bytes from the file into a buffer. The following code snippet opens a file for reading and reads up to 1024 bytes of data at a time into a buffer, echoing the text to the console as it does so:

    // Requires: using System; using System.IO; using System.Text;
    Stream stream = File.OpenRead(@"C:\data\file.xml");
    byte[] buffer = new byte[1024];
    int bytesRead;

    // Read( ) returns the number of bytes actually read, which may be
    // fewer than requested; it returns 0 only at the end of the file
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0) {
      Console.Write(Encoding.ASCII.GetChars(buffer, 0, bytesRead));
    }
    stream.Close( );
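ASCII decoding is fine for a quick demonstration, but XML files are frequently encoded as UTF-8, and decoding each chunk on its own could split a multi-byte character across two reads. One way around this, sketched below, is to collect the bytes first and decode them once at the end. StreamText and ReadAllText( ) are hypothetical names, and the choice of encoding is left to the caller because the file's actual encoding is an assumption here:

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamText
{
    // Read every byte from the Stream, then decode the whole buffer at once
    public static string ReadAllText(Stream stream, Encoding encoding)
    {
        byte[] buffer = new byte[1024];
        int bytesRead;
        using (MemoryStream collected = new MemoryStream())
        {
            // Read() returns 0 only when the end of the Stream is reached
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, bytesRead);
            }
            return encoding.GetString(collected.ToArray());
        }
    }
}
```

You might call it as StreamText.ReadAllText(File.OpenRead(path), Encoding.UTF8). Holding the entire file in memory is a reasonable trade-off for small documents; for large ones, a StreamReader is likely the better tool.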
Another way to access the data from a file is to use TextReader. File.OpenText( ) returns a StreamReader, a subclass of TextReader, which includes methods such as ReadLine( ), which lets you read an entire line of text at a time, and ReadToEnd( ), which lets you read the file's entire contents in one fell swoop. As you can see, TextReader makes for much simpler file access, at least when the file's contents can be dealt with as text:

    TextReader reader = File.OpenText(@"C:\data\file.xml");

    // Read a line at a time until we reach the end of file
    while (reader.Peek( ) != -1) {
      string line = reader.ReadLine( );
      Console.WriteLine(line);
    }
    reader.Close( );

The Peek( ) method returns the next character that would be read, without consuming it or moving the current position. It returns -1 when there are no more characters to read, which for a file means the end of the file has been reached. Other methods, such as Read( ) and ReadBlock( ), allow you to access the file in chunks of various sizes, from a single character to a block of user-defined size.
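Because ReadLine( ) is defined on TextReader, code written against the base class works the same way whether the text comes from a file or from a string in memory. A minimal sketch follows; LineCounter and CountLines( ) are hypothetical names. It tests the return value of ReadLine( ), which is null at the end of the input, as an alternative to calling Peek( ):

```csharp
using System;
using System.IO;

public static class LineCounter
{
    // Count the lines available from any TextReader
    public static int CountLines(TextReader reader)
    {
        int count = 0;
        // ReadLine() returns null once the input is exhausted
        while (reader.ReadLine() != null)
        {
            count++;
        }
        return count;
    }
}
```

The same method accepts new StringReader("<a>\n</a>"), for which it returns 2, or the StreamReader returned by File.OpenText( ).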
2.1.2 Network I/O

Network I/O is generally similar to file I/O, and both Stream and TextReader types are used to access data from a network connection. The System.Net namespace contains additional classes that are useful in dealing with common network protocols such as HTTP, while the System.Net.Sockets namespace contains generalized classes for dealing with network sockets.

To create a connection to a web server, you will typically use the abstract WebRequest class and its Create( ) and GetResponse( ) methods. Create( ) is a static factory method that returns a new instance of a subclass of WebRequest to handle the URL passed in to Create( ). GetResponse( ) returns a WebResponse object, which provides a method called GetResponseStream( ). The GetResponseStream( ) method returns a Stream object, which you can wrap in a TextReader. As you've already seen, you can use a TextReader to read from an I/O stream.

The following code snippet shows a typical sequence for creating a connection to a network data source and displaying its contents to the console device. StreamReader is a concrete implementation of the abstract TextReader base class:

    WebRequest request = WebRequest.Create("http://www.oreilly.com/");
    WebResponse response = request.GetResponse( );
    Stream stream = response.GetResponseStream( );
    StreamReader reader = new StreamReader(stream);

    // Read a line at a time and write it to the console. ReadLine( )
    // returns null at the end of the stream; Peek( ) is not a reliable
    // end-of-data test on a network stream
    string line;
    while ((line = reader.ReadLine( )) != null) {
      Console.WriteLine(line);
    }
    reader.Close( );
    response.Close( );
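Wrapping the response, stream, and reader in using blocks guarantees that they are released even if reading throws an exception. A sketch of the same sequence in that style might look like the following. UrlText and Download( ) are hypothetical names. Recent .NET versions also mark WebRequest as obsolete in favor of HttpClient, so treat this as an illustration of the API described here rather than a recommendation for new code:

```csharp
using System;
using System.IO;
using System.Net;

public static class UrlText
{
    // Fetch the resource at the given URL and return its contents as text
    public static string Download(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as UrlText.Download("http://www.oreilly.com/") returns the page body as a single string; StreamReader assumes UTF-8 here unless it detects a byte order mark.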
This pattern works fine when the data source is a URL that adheres to the file, http, or https scheme. Here's an example of a web request that uses a URL with a file scheme:

    WebRequest request = WebRequest.Create("file:///C:/data/file.xml");

Here's a request that has no URL scheme at all:

    WebRequest request = WebRequest.Create("file.xml");

In the absence of a valid scheme name at the beginning of a URL, WebRequest assumes that you are referring to a file on the local filesystem and translates the filename to file://localhost/path/to/file. On Windows, the path C:\data\file.xml thus becomes the URL file://localhost/C:/data/file.xml. Technically, a URL using the file scheme does not require a network connection, but it behaves as if it does, as far as .NET is concerned. Therefore, your code can safely treat a file scheme URL just the same as any other URL. (For more on the URL file scheme, see http://www.w3.org/Addressing/URL/4_1_File.html.)

Don't try this with an ftp URL scheme in version 1.x of the .NET Framework, however. While there's nothing to stop you from writing your own FTP client using the Socket class, Microsoft did not provide a means to access an FTP data source with a WebRequest until the FtpWebRequest class arrived in version 2.0.
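Which concrete subclass Create( ) returns depends on the URL's scheme, and you can check this before relying on a particular type. The sketch below is illustrative only; RequestKind and Describe( ) are hypothetical names. Creating the request does not open a connection or touch the file, so it can run without network access:

```csharp
using System;
using System.Net;

public static class RequestKind
{
    // Report which family of WebRequest the factory chose for a URL
    public static string Describe(string url)
    {
        WebRequest request = WebRequest.Create(url);
        if (request is HttpWebRequest)
        {
            return "http";   // covers both http and https URLs
        }
        if (request is FileWebRequest)
        {
            return "file";
        }
        return "other";
    }
}
```

With this helper, RequestKind.Describe("file:///C:/data/file.xml") returns "file", and an http or https URL returns "http".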
2.1.3 Network Access Through a Web Proxy

Another useful feature of the WebRequest class is its ability to read data through a web proxy. A web proxy is a server located on the network between your code and a web server. Its job is to intercept all traffic headed for the web server and attempt to fulfill as many requests as it can without contacting the web server. If a web proxy cannot fulfill a request itself, it forwards the request to the web server for processing. Web proxies serve two primary purposes: speeding up access, by answering repeated requests itself rather than contacting the web server each time, and controlling access, by preventing requests from reaching specific sites.
The .NET Framework provides the WebProxy class to help you incorporate the use of web proxy servers into your application. WebProxy is an implementation of IWebProxy, and can only be used to proxy HTTP and HTTPS (secure HTTP) requests. It's important that you know the type of URL you are requesting data from: casting a FileWebRequest to an HttpWebRequest will cause an InvalidCastException to be thrown.

To make use of a proxy server that is already set up on your network, you first create the WebRequest just as before. You can then instantiate a WebProxy object with the address of the proxy server, and assign it to the request's Proxy property so that the request is routed through the proxy server. The WebProxy constructor has many overloads for many different situations. In the following example, I'm using a constructor that lets me specify that the address of the proxy server is http://proxy.mydomain.com. Setting the constructor's second parameter, BypassOnLocal, to true causes local network requests to be sent directly to the destination, circumventing the proxy server:

    HttpWebRequest request = (HttpWebRequest)
      WebRequest.Create("http://www.oreilly.com/");
    request.Proxy = new WebProxy("http://proxy.mydomain.com", true);

Any data that goes through this WebRequest to a destination external to the local network will now use the proxy server.

Why is this important? Imagine that you wish to read XML from an external web page, but your network administrator has installed a web proxy to speed general access and prevent access to some specific sites. Although the XmlTextReader has the ability to read an XML file directly from a URL, it does not have the built-in ability to access the web through a web proxy. Since XmlTextReader can read data from any Stream or TextReader, you now have the ability to access XML documents through the proxy.

In the next section, I'll tell you more about the XmlReader class.
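Putting the pieces together, the sketch below attaches a proxy only when the request really is an HTTP or HTTPS request, which sidesteps the InvalidCastException, and then hands the response stream to an XmlTextReader. ProxiedXml, CreateRequest( ), and ReadRootName( ) are hypothetical names introduced for illustration, and the proxy address is whatever your network actually uses:

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

public static class ProxiedXml
{
    // Create a request and attach the proxy only for HTTP and HTTPS URLs
    public static WebRequest CreateRequest(string url, string proxyAddress)
    {
        WebRequest request = WebRequest.Create(url);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // The second argument, true, bypasses the proxy for local addresses
            httpRequest.Proxy = new WebProxy(proxyAddress, true);
        }
        return request;
    }

    // Return the name of the document element from any Stream of XML
    public static string ReadRootName(Stream xml)
    {
        XmlTextReader reader = new XmlTextReader(xml);
        try
        {
            reader.MoveToContent();   // skip the declaration and comments
            return reader.Name;
        }
        finally
        {
            reader.Close();
        }
    }

    // Fetch a document through the proxy and return its root element name
    public static string ReadRootName(string url, string proxyAddress)
    {
        WebRequest request = CreateRequest(url, proxyAddress);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            return ReadRootName(stream);
        }
    }
}
```

Using the as operator instead of a cast means a file URL simply passes through without a proxy, so the same helper can serve both local and remote documents.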