2.1 Reading Data

Before you learn about reading XML, you must learn how to read a file. In this section, I'll cover basic filesystem and network input in .NET. If you're already familiar with basic I/O types and methods in .NET, feel free to skip to the next section.

I/O classes in .NET are located in the System.IO namespace. The basic object used for reading and writing data, regardless of the source, is the Stream object. Stream is an abstract base class, which represents a sequence of bytes; the Stream has a Read( ) method to read the bytes from the Stream, a Write( ) method to write bytes to the Stream, and a Seek( ) method to set the current location within the Stream. Not all instances or subclasses of Stream support all these operations; for example, you cannot write to a FileStream representing a read-only file, and you cannot Seek( ) to a position in a NetworkStream. The properties CanRead, CanWrite, and CanSeek can be interrogated to determine whether the respective operations are supported by the instance of Stream you're dealing with.
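As a minimal sketch of how these properties might be checked, the following program reports the capabilities of two MemoryStream instances, one of them created read-only. The StreamCapabilities class and its Describe( ) helper are names invented for this illustration, not part of .NET:

```csharp
using System;
using System.IO;

public static class StreamCapabilities
{
  // Hypothetical helper: summarizes which operations a Stream supports
  public static string Describe(Stream stream)
  {
    return "CanRead=" + stream.CanRead +
           ", CanWrite=" + stream.CanWrite +
           ", CanSeek=" + stream.CanSeek;
  }

  public static void Main( )
  {
    // A MemoryStream created with no arguments is readable, writable, and seekable
    Stream writable = new MemoryStream( );
    Console.WriteLine(Describe(writable));

    // Passing false as the second argument makes the stream read-only
    Stream readOnly = new MemoryStream(new byte[16], false);
    Console.WriteLine(Describe(readOnly));

    // Check before writing rather than catching NotSupportedException
    if (!readOnly.CanWrite) {
      Console.WriteLine("Skipping write: stream is read-only");
    }
  }
}
```

Testing the property first is usually cheaper and clearer than attempting the operation and handling the exception afterward.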

Table 2-1 shows the Stream type's subclasses and the members each type supports.

Table 2-1. Stream subclasses and their supported members

Type                                                 Length                 Position               Flush( )            Read( )  Seek( )                Write( )
System.IO.BufferedStream                             Yes                    Yes                    Yes                 Yes      Yes                    Yes
System.IO.FileStream                                 Yes                    Yes                    Yes                 Yes      Yes                    Yes
System.IO.IsolatedStorage.IsolatedStorageFileStream  Yes                    Yes                    Yes                 Yes      Yes                    Yes
System.IO.MemoryStream                               Yes                    Yes                    Yes (does nothing)  Yes      Yes                    Yes
System.Net.Sockets.NetworkStream                     No (throws exception)  No (throws exception)  Yes (does nothing)  Yes      No (throws exception)  Yes
System.Security.Cryptography.CryptoStream            No (throws exception)  No (throws exception)  Yes                 Yes      No (throws exception)  Yes
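The members listed in Table 2-1 can be tried out on a MemoryStream, which supports all of them. The following is only a sketch: the SeekDemo class and its WriteThenReadFirst( ) helper are invented names, and the helper assumes that the stream it is given is writable and seekable:

```csharp
using System;
using System.IO;

public static class SeekDemo
{
  // Hypothetical helper: writes data, rewinds, and reads back the first byte
  public static int WriteThenReadFirst(Stream stream, byte[] data)
  {
    stream.Write(data, 0, data.Length);   // Position now equals Length
    stream.Flush( );                      // does nothing for a MemoryStream
    stream.Seek(0, SeekOrigin.Begin);     // rewind to the start
    return stream.ReadByte( );            // -1 would mean end of stream
  }

  public static void Main( )
  {
    MemoryStream stream = new MemoryStream( );
    int first = WriteThenReadFirst(stream, new byte[] { 10, 20, 30 });
    Console.WriteLine("First byte: " + first);
    Console.WriteLine("Length: " + stream.Length);
    Console.WriteLine("Position: " + stream.Position);
    stream.Close( );
  }
}
```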

After Stream, the most important .NET I/O type is TextReader. TextReader is optimized for reading characters from a Stream, and provides a level of specialization one step beyond Stream. Unlike Stream, which provides access to data at the level of bytes, TextReader provides string-oriented methods such as ReadLine( ) and ReadToEnd( ). Like Stream, TextReader is also an abstract base class; its subclasses include StreamReader and StringReader.
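To make the difference concrete, here is a small sketch that uses StringReader, one of the TextReader subclasses just mentioned, so that no file is needed. The TextReaderDemo class and its FirstLine( ) helper are hypothetical names chosen for this example:

```csharp
using System;
using System.IO;

public static class TextReaderDemo
{
  // Hypothetical helper: returns the first line, leaving the rest unread
  public static string FirstLine(TextReader reader)
  {
    return reader.ReadLine( );
  }

  public static void Main( )
  {
    TextReader reader = new StringReader("<doc>\n  <item/>\n</doc>");
    Console.WriteLine(FirstLine(reader));     // the first line only
    Console.WriteLine(reader.ReadToEnd( ));   // everything that remains
    reader.Close( );
  }
}
```

Because FirstLine( ) is declared against TextReader rather than a concrete subclass, the same helper would accept a StreamReader as well.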

Most .NET XML types receive their input from Stream or TextReader. You can often pass filenames and URLs directly to their constructors and Load( ) methods; however, you'll sometimes find it necessary to manipulate a data source before dealing with its XML content. For that reason, I talk first about handling files and streams before delving into XML.

2.1.1 Filesystem I/O

.NET provides two types that allow you to deal directly with files: File and FileInfo. A FileInfo instance represents an actual file and its metadata, whereas the File class contains only static methods used to manipulate files. That is, you instantiate a FileInfo object when you want to hold on to a file and query its contents and metadata, but you call File's static methods for one-off operations on a file identified by its path.

The following C# code snippet shows how you can use FileInfo to determine the length of a file and its last access time. Note that both Length and LastAccessTime are properties of the FileInfo object:

// Create an instance of FileInfo and query it
FileInfo fileInfo = new FileInfo(@"C:\data\file.xml");
long length = fileInfo.Length;
DateTime lastAccessTime = fileInfo.LastAccessTime;

Since the FileInfo and File types are contained in the System.IO namespace, to compile a class containing this code snippet you must include the following using statement:

using System.IO;

I skip the using statements in code snippets, but I include them in full code listings.

You can also use the File type to get the file's last access time, but you cannot get the file's length this way. The GetLastAccessTime( ) method returns the last access time for the filename passed to it, but there is no GetLength( ) method equivalent to the FileInfo object's Length property:

// Get the last access time of a file transiently
DateTime lastAccessTime = File.GetLastAccessTime(@"C:\data\file.xml");

In C#, as in many programming languages, the backslash character (\) has special meaning within a string. In C#, you can either double up on the backslashes to represent a literal backslash within a string, or precede the string with an at sign character (@), as I've done, to indicate that any backslashes within the string are to be treated literally.
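A quick way to convince yourself that the two spellings are equivalent is to compare them. In this sketch, the PathLiterals class and its field names are invented for illustration:

```csharp
using System;

public static class PathLiterals
{
  public const string Escaped = "C:\\data\\file.xml";   // backslashes doubled
  public const string Verbatim = @"C:\data\file.xml";   // @ prefix, backslashes literal

  public static void Main( )
  {
    // Both literals produce exactly the same string
    Console.WriteLine(Escaped == Verbatim);
  }
}
```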


In general, you should use the File class to get or set the attributes of a file that can be obtained from the operating system, such as its creation and last access times; to open a file for reading or writing; or to move, copy, or delete a file. You may want to use the FileInfo class when you wish to open a file for reading or writing, and hold on to it for a longer period of time. Or you may just skip the File and FileInfo classes and construct a FileStream or StreamReader directly, as I show you later.
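The division of labor between the two classes might look something like the following sketch, which works on a temporary file so that no real data is touched. The FileOperations class and its CopyThenDelete( ) helper are hypothetical names:

```csharp
using System;
using System.IO;

public static class FileOperations
{
  // Hypothetical helper: copies a file, then removes the original
  public static void CopyThenDelete(string source, string destination)
  {
    File.Copy(source, destination, true);   // true: overwrite if present
    File.Delete(source);
  }

  public static void Main( )
  {
    // GetTempFileName( ) creates an empty file and returns its path
    string source = Path.GetTempFileName( );
    string destination = source + ".bak";

    // One-off operations go through File's static methods
    CopyThenDelete(source, destination);
    Console.WriteLine(File.Exists(source));
    Console.WriteLine(File.Exists(destination));

    // FileInfo holds on to one file and exposes its metadata as properties
    FileInfo info = new FileInfo(destination);
    Console.WriteLine(info.Name + ": " + info.Length + " bytes");
    info.Delete( );
  }
}
```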

You may read the contents of a file by getting a FileStream for it, via the File or FileInfo classes' OpenRead( ) methods. FileStream, one of the subclasses of Stream, has a Read( ) method that allows you to read bytes from the file into a buffer.

The following code snippet opens a file for reading and reads it into a buffer, up to 1024 bytes at a time, echoing the text to the console as it does so:

Stream stream = File.OpenRead(@"C:\data\file.xml");
int bytesToRead = 1024;
int bytesRead = 0;
byte[] buffer = new byte[bytesToRead];

// Fill the buffer repeatedly until we reach the end of file.
// Read( ) may return fewer bytes than requested before the end,
// so stop only when it returns 0.
while ((bytesRead = stream.Read(buffer, 0, bytesToRead)) > 0) {
  Console.Write(Encoding.ASCII.GetChars(buffer, 0, bytesRead));
}
stream.Close( );

The Encoding class is contained in the System.Text namespace. Encoding provides several useful methods for converting strings to byte arrays and byte arrays to strings. It also knows about several common encodings, such as ASCII. I'll talk more about encodings in Chapter 3.
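As a brief sketch of those conversions, the following program turns a string into ASCII bytes and back again. The EncodingDemo class and its RoundTrip( ) helper are names made up for this example:

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
  // Hypothetical helper: encodes a string to bytes and decodes it again
  public static string RoundTrip(string text, Encoding encoding)
  {
    byte[] bytes = encoding.GetBytes(text);
    return encoding.GetString(bytes, 0, bytes.Length);
  }

  public static void Main( )
  {
    // ASCII uses one byte per character
    byte[] bytes = Encoding.ASCII.GetBytes("<doc/>");
    Console.WriteLine(bytes.Length);

    // Plain ASCII text survives the round trip unchanged
    Console.WriteLine(RoundTrip("<doc/>", Encoding.ASCII));

    // A character outside the ASCII range is replaced with '?'
    Console.WriteLine(RoundTrip("caf\u00e9", Encoding.ASCII));
  }
}
```

The last line is a reminder that choosing the wrong encoding can silently lose data.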


Another way to access the data from a file is to use TextReader. File.OpenText( ) returns a StreamReader, a subclass of TextReader, which includes methods such as ReadLine( ), which lets you read one line of text at a time, and ReadToEnd( ), which lets you read all of the file's remaining contents in one fell swoop. As you can see, TextReader makes for much simpler file access, at least when the file's contents can be dealt with as text:

TextReader reader = File.OpenText(@"C:\data\file.xml");

// Read a line at a time until we reach the end of file
while (reader.Peek( ) != -1) {
  string line = reader.ReadLine( );
  Console.WriteLine(line);
}
reader.Close( );

The Peek( ) method returns the next character that would be read, without actually consuming it, and it returns -1 if there are no more characters to read. Other methods, such as Read( ) and ReadBlock( ), allow you to access the file in chunks of various sizes, from a single character to a block of user-defined size.
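The following sketch exercises Peek( ), Read( ), and ReadBlock( ) on a StringReader, which behaves like any other TextReader but needs no file. The PeekDemo class and its ReadUpTo( ) helper are hypothetical names:

```csharp
using System;
using System.IO;

public static class PeekDemo
{
  // Hypothetical helper: reads at most maxChars characters into a string
  public static string ReadUpTo(TextReader reader, int maxChars)
  {
    char[] buffer = new char[maxChars];
    // ReadBlock( ) returns how many characters it actually read
    int count = reader.ReadBlock(buffer, 0, buffer.Length);
    return new string(buffer, 0, count);
  }

  public static void Main( )
  {
    TextReader reader = new StringReader("<a/>");

    // Peek( ) reports the next character without consuming it
    Console.WriteLine((char) reader.Peek( ));
    // Read( ) returns that same character and advances past it
    Console.WriteLine((char) reader.Read( ));

    // Read whatever is left, up to eight characters
    Console.WriteLine(ReadUpTo(reader, 8));

    // At the end of the input, Peek( ) returns -1
    Console.WriteLine(reader.Peek( ));
    reader.Close( );
  }
}
```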

So far, I've used types from the System, System.IO, and System.Text namespaces without specifying the namespaces, for the sake of brevity. In reality, you'll need to either specify the fully-qualified namespace for each class as it's used, or include a using statement in the appropriate place for each namespace.


2.1.2 Network I/O

Network I/O is generally similar to file I/O, and both Stream and TextReader types are used to access data from a network connection. The System.Net namespace contains additional classes that are useful in dealing with common network protocols such as HTTP, while the System.Net.Sockets namespace contains generalized classes for dealing with network sockets.

To create a connection to a web server, you will typically use the abstract WebRequest class and its Create( ) and GetResponse( ) methods. Create( ) is a static factory method that returns a new instance of a subclass of WebRequest to handle the URL passed in to Create( ). GetResponse( ) returns a WebResponse object, which provides a method called GetResponseStream( ). The GetResponseStream( ) method returns a Stream object, which you can wrap in a TextReader. As you've already seen, you can use a TextReader to read from an I/O stream.

The following code snippet shows a typical sequence for creating a connection to a network data source and displaying its contents on the console. StreamReader is a concrete implementation of the abstract TextReader base class:

WebRequest request = WebRequest.Create("http://www.oreilly.com/");
WebResponse response = request.GetResponse( );
Stream stream = response.GetResponseStream( );
StreamReader reader = new StreamReader(stream);

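// Read a line at a time and write it to the console.
// ReadLine( ) returns null at the end of the stream; Peek( ) is not a
// reliable end-of-data test here, because it can return -1 while the
// network is merely waiting for more data.
string line;
while ((line = reader.ReadLine( )) != null) {
  Console.WriteLine(line);
}
reader.Close( );
response.Close( );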

A network connection isn't initiated until you call the GetResponse( ) method. This gives you the opportunity to set other properties of the WebRequest right up until the time you make the connection. Properties that can be set include the HTTP headers, connection timeout, and security credentials.
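As a sketch of that idea, the following program configures a request and inspects it without ever connecting. The RequestSetup class, its Prepare( ) helper, and the particular timeout and header values are assumptions chosen for illustration:

```csharp
// Newer .NET releases flag WebRequest as obsolete; this silences that warning
#pragma warning disable SYSLIB0014
using System;
using System.Net;

public static class RequestSetup
{
  // Hypothetical helper: configures a request without sending it
  public static WebRequest Prepare(string url)
  {
    WebRequest request = WebRequest.Create(url);
    request.Timeout = 10000;                                    // in milliseconds
    request.Headers.Add("Accept-Language", "en");               // an extra HTTP header
    request.Credentials = CredentialCache.DefaultCredentials;   // current user's credentials
    return request;
  }

  public static void Main( )
  {
    // No connection is made here, because GetResponse( ) is never called
    WebRequest request = Prepare("http://www.oreilly.com/");
    Console.WriteLine(request.Timeout);
    Console.WriteLine(request.Headers["Accept-Language"]);
  }
}
```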


This pattern works fine when the data source is a URL that adheres to the file, http, or https scheme. Here's an example of a web request that uses a URL with a file scheme:

WebRequest request = WebRequest.Create("file:///C:/data/file.xml");

Here's a request that has no URL scheme at all:

WebRequest request = WebRequest.Create("file.xml");

In the absence of a valid scheme name at the beginning of a URL, WebRequest assumes that you are referring to a file on the local filesystem and translates the filename to file://localhost/path/to/file. On Windows, the path C:\data\file.xml thus becomes the URL file://localhost/C:/data/file.xml. Technically, a URL using the file scheme does not require a network connection, but it behaves as if it does, as far as .NET is concerned. Therefore, your code can safely treat a file scheme URL just the same as any other URL. (For more on the URL file scheme, see http://www.w3.org/Addressing/URL/4_1_File.html.)
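To illustrate treating a file URL like any other, here is a sketch that reads a temporary file through WebRequest. The FileUrlDemo class and its ReadAll( ) helper are invented names, and the example builds the URL with the Uri class from an absolute path rather than relying on a bare filename:

```csharp
// Newer .NET releases flag WebRequest as obsolete; this silences that warning
#pragma warning disable SYSLIB0014
using System;
using System.IO;
using System.Net;

public static class FileUrlDemo
{
  // Hypothetical helper: reads everything a URL returns, whatever its scheme
  public static string ReadAll(Uri url)
  {
    WebRequest request = WebRequest.Create(url);
    using (WebResponse response = request.GetResponse( ))
    using (StreamReader reader = new StreamReader(response.GetResponseStream( )))
    {
      return reader.ReadToEnd( );
    }
  }

  public static void Main( )
  {
    // Create a small temporary file so that the example needs no network
    string path = Path.GetTempFileName( );
    File.WriteAllText(path, "<doc/>");

    // Uri turns an absolute local path into a file scheme URL
    Uri url = new Uri(path);
    Console.WriteLine(url.Scheme);
    Console.WriteLine(ReadAll(url));
    File.Delete(path);
  }
}
```

Nothing in ReadAll( ) mentions files, so the same helper could be handed an http URL instead.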

Be careful with the ftp URL scheme, however. Versions 1.0 and 1.1 of the .NET Framework provide no means to access an FTP data source with a WebRequest, although there's nothing to stop you from writing your own FTP client using the Socket class. Built-in support arrived only in version 2.0, with the FtpWebRequest class.

One difference between file URLs and http URLs is that a file on the local filesystem can be opened for writing, whereas a file on a web server cannot. When using file and http schemes interchangeably, you should try to be aware of what resources your code is trying to access.


2.1.3 Network Access Through a Web Proxy

Another useful feature of the WebRequest class is its ability to read data through a web proxy. A web proxy is a server located on the network between your code and a web server. Its job is to intercept all traffic headed for the web server and attempt to fulfill as many requests as it can without contacting the web server. If a web proxy cannot fulfill a request itself, it forwards the request to the web server for processing.

Web proxies serve two primary purposes:


Improving performance

A proxy server can cache data locally to speed network performance. Rather than sending two identical requests from different clients to the same web resource, the results of the first request are saved, and sent back to any other clients requesting the same data. Typical web proxies have configurable parameters that control how long cached data is retained before new requests are sent on to the web server. The HTTP protocol can also specify this cache refresh period. Many large online services, such as America Online, use caching to improve their network performance.


Filtering

A proxy server can be used to filter access to certain sites. Filtering is usually used by businesses to prevent employees from accessing web sites that have no business-related content, or by parents to prevent children from accessing web sites that may have material they believe is inappropriate. Filters can be as strict or loose as necessary, preventing access to entire IP subnets or to single URLs.

The .NET Framework provides the WebProxy class to help you incorporate the use of web proxy servers into your application. WebProxy is an implementation of IWebProxy, and can only be used to proxy HTTP and HTTPS (secure HTTP) requests. It's important that you know the type of URL you are requesting data from: casting a FileWebRequest to an HttpWebRequest will cause an InvalidCastException to be thrown.

To make use of a proxy server that is already set up on your network, you first create the WebRequest just as before. You can then instantiate a WebProxy object, set the address of the proxy server, and assign it to the Proxy property of the WebRequest so that the request is routed through the proxy server. The WebProxy constructor has many overloads for many different situations. In the following example, I'm using a constructor that lets me specify that the address of the proxy server is http://proxy.mydomain.com. Setting the constructor's second parameter, BypassOnLocal, to true causes local network requests to be sent directly to the destination, circumventing the proxy server:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://www.oreilly.com/");
request.Proxy = new WebProxy("http://proxy.mydomain.com",true);

Any data that this request sends to a destination external to the local network will now go through the proxy server.
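The cast in the preceding snippet assumes that the URL uses the http or https scheme. When the URL comes from elsewhere, one defensive option is the as operator, which yields null rather than throwing InvalidCastException. This is only a sketch; the ProxySetup class and its TrySetProxy( ) helper are hypothetical names:

```csharp
// Newer .NET releases flag WebRequest as obsolete; this silences that warning
#pragma warning disable SYSLIB0014
using System;
using System.Net;

public static class ProxySetup
{
  // Hypothetical helper: applies a proxy only when the request is an HTTP request.
  // Returns true if the proxy was set.
  public static bool TrySetProxy(WebRequest request, IWebProxy proxy)
  {
    // The as operator yields null instead of throwing InvalidCastException
    HttpWebRequest httpRequest = request as HttpWebRequest;
    if (httpRequest == null) {
      return false;
    }
    httpRequest.Proxy = proxy;
    return true;
  }

  public static void Main( )
  {
    // proxy.mydomain.com is a placeholder address
    IWebProxy proxy = new WebProxy("http://proxy.mydomain.com", true);
    Console.WriteLine(TrySetProxy(WebRequest.Create("http://www.oreilly.com/"), proxy));
    Console.WriteLine(TrySetProxy(WebRequest.Create("file:///C:/data/file.xml"), proxy));
  }
}
```

Neither call to Create( ) opens a connection or touches the filesystem, so the program can be run without a proxy server being present.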

Why is this important? Imagine that you wish to read XML from an external web page, but your network administrator has installed a web proxy to speed general access and prevent access to some specific sites. Although the XmlTextReader has the ability to read an XML file directly from a URL, it does not have the built-in ability to access the web through a web proxy. Since XmlTextReader can read data from any Stream or TextReader, you now have the ability to access XML documents through the proxy. In the next section, I'll tell you more about the XmlReader class.
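As a rough preview of how the pieces might fit together, the following sketch obtains a Stream through a proxy and hands it to an XmlTextReader. The ProxiedXml class and both of its helpers are invented for this illustration, and the URL and proxy address are taken from the command line rather than assumed:

```csharp
// Newer .NET releases flag WebRequest as obsolete; this silences that warning
#pragma warning disable SYSLIB0014
using System;
using System.IO;
using System.Net;
using System.Xml;

public static class ProxiedXml
{
  // Hypothetical helper: requests a URL through the given proxy server
  public static Stream OpenThroughProxy(string url, string proxyAddress)
  {
    WebRequest request = WebRequest.Create(url);
    request.Proxy = new WebProxy(proxyAddress, true);
    return request.GetResponse( ).GetResponseStream( );
  }

  // Hypothetical helper: returns the name of the document's root element
  public static string RootElementName(Stream stream)
  {
    XmlTextReader reader = new XmlTextReader(stream);
    try {
      reader.MoveToContent( );   // skip the XML declaration, comments, and whitespace
      return reader.Name;
    } finally {
      reader.Close( );
    }
  }

  public static void Main(string[] args)
  {
    // Both arguments are supplied by the caller; nothing is fetched without them
    if (args.Length == 2) {
      Stream stream = OpenThroughProxy(args[0], args[1]);
      Console.WriteLine(RootElementName(stream));
    }
  }
}
```

Because RootElementName( ) accepts any Stream, it can be tried out on a MemoryStream or a FileStream before a proxy is involved at all.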
