.NET and XML

4.1 Reading Non-XML Documents with XmlReader

To read any sort of document using a non-XML format as though it were XML, you can extend XmlReader by writing a custom XmlReader subclass. Among the advantages of writing your own XmlReader subclass is that you can use your custom XmlReader wherever you would use any of the built-in XmlReaders. For example, even if the underlying data isn't formatted using standard XML syntax, you can pass any instance of a custom XmlReader to XmlDocument.Load( ) to load the XML document into a DOM (more on XmlDocument in Chapter 5). You could load a DOM tree from the data, use XPath to query the data, even transform the data with XSLT, all this even though the original data does not look anything like XML.

As long as an alternative syntax provides a hierarchical structure similar to XML, you can create an XmlReader for it that presents its content in a way that looks like XML. In this chapter you'll learn how to write a custom XmlReader implementation which will enable you to read data formatted in PYX, a line-oriented XML format, as if it were XML.

4.1.1 Reading a PYX Document

Before you can write an XmlPyxReader, you first need to understand PYX syntax. PYX is a line-oriented XML syntax, developed by Sean McGrath, which reflects XML's SGML heritage. PYX is based on Element Structure Information Set (ESIS), a popular alternative syntax for SGML.

Unlike many of the terms in this book, PYX is not an acronym for anything. A pyx is a container used in certain religious rites, and the PYX notation was developed mostly using the Python programming language.

In a line-oriented format, each XML node occurs on a new line. The XML nodes that PYX can represent include start element, end element, attribute, character data, and processing instruction. The first character of each line indicates what sort of node the line represents. Table 4-1 shows the prefix characters and what node type each represents.

Table 4-1. PYX prefix characters and their corresponding XmlNodeType values

PYX prefix character    XmlNodeType value
(                       Element
)                       EndElement
A                       Attribute
-                       Text
?                       ProcessingInstruction

As you can see by the limited number of node types it contains, PYX represents only the logical structure of an XML document, not the physical structure. There are no DocumentType, EntityReference, Comment, or CDATA XmlNodeTypes in a PYX document. This lack of certain nodes is consistent with PYX's ESIS ancestry; in SGML, the separation between document structure and document content is enforced more rigidly than in XML.

None of this should stop you from using PYX to represent basic XML documents. In fact, PYX's structure makes it very easy to parse using the XmlReader model.

To test your XmlPyxReader, you'll need a file in PYX format. Example 4-1 shows the same purchase order we dealt with in Chapter 2, reformatted in PYX. A few lines are highlighted; I'll discuss these after the example.

Example 4-1. A purchase order expressed in PYX

(po 
Aid PO1456
(date
Ayear 2002
Amonth 6
Aday 14
)date
(address 
Atype shipping
(name 
-Frits Mendels
)name
(street
-152 Cherry St
)street
(city
-San Francisco
)city
(state
-CA
)state
(zip
-94045
)zip
)address
(address 
Atype billing
(name
-Frits Mendels
)name
(street
-PO Box 6789
)street
(city
-San Francisco
)city
(state
-CA
)state
(zip
-94123-6798
)zip
)address
(items 
(item 
Aquantity 1
AproductCode R-273
Adescription 14.4 Volt Cordless Drill
AunitCost 189.95
)item
(item 
Aquantity 1
AproductCode 1632S
Adescription 12 Piece Drill Bit Set
AunitCost 14.95
)item
)items
)po

Notice that all the data matches the data from Example 2-1, although the format is clearly very different.

Each line that begins with ( is a start element, as in the first highlighted line:

(po

This is equivalent to the <po> element start tag. The next highlighted line is an attribute:

Ayear 2002

This is equivalent to year="2002" in standard XML syntax. After the A, the next whitespace-delimited word is the name of the attribute, and the rest of the line contains the attribute value. Multiple attributes on the same element are just listed in order, on separate lines.
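
The splitting rule just described can be sketched in a few lines. This is only an illustration: the PyxAttribute helper is a hypothetical name, not part of XmlPyxReader, and it assumes that a single space separates the attribute name from its value.

```csharp
using System;

// Hypothetical helper for illustration; not part of the chapter's XmlPyxReader.
public static class PyxAttribute {
  // Splits a PYX attribute line such as "Ayear 2002" into { name, value }.
  public static string[] Parse(string line) {
    if (string.IsNullOrEmpty(line) || line[0] != 'A') {
      throw new ArgumentException("not a PYX attribute line", "line");
    }
    int space = line.IndexOf(' ');
    if (space < 0) {
      // Assumption: with no separator, the remainder is the name and the value is empty.
      return new string[] { line.Substring(1), string.Empty };
    }
    // The name sits between the prefix and the first space; the value is the rest of the line.
    return new string[] { line.Substring(1, space - 1), line.Substring(space + 1) };
  }
}
```

Because only the first space is significant, a value such as "14.4 Volt Cordless Drill" survives intact, spaces and all.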

Although PYX doesn't really support XML namespaces, there's no reason you can't recognize them yourself. The following PYX fragment shows a way to represent namespaces in PYX:

(myElement
Axmlns http://www.mynamespaceuri.com/
Axmlns:foo http://www.anothernamespaceuri.com/
)myElement

That PYX fragment is equivalent to the following XML fragment:

<myElement xmlns="http://www.mynamespaceuri.com/"
  xmlns:foo="http://www.anothernamespaceuri.com/" />

The next highlighted line in Example 4-1 is an EndElement node:

)date

The name of the element is given after the ) prefix character. This is equivalent to the </date> end tag. Note that there is no PYX shorthand for an empty element, like <item />.

The last highlighted line is text:

-Frits Mendels

After the -, the rest of the line contains the element's text value. Because only the prefix character on any line is significant, the rest of the line can contain any characters, including the PYX prefix characters (, A, -, ), and ?, and XML reserved characters <, >, and &. CDATA sections are thus irrelevant in PYX.

PYX is a fairly simple format, and XmlPyxReader will be correspondingly simple. Writing a more complex XmlReader is certainly possible, but it would take several chapters' worth of examples to show all the details. If, after reading this chapter, you're interested in a considerably more complex model for writing XmlReader subclasses, I urge you to read Ralf Westphal's article, "Implementing XmlReader Classes for Non-XML Data Structures and Formats." You can view the article online at http://msdn.microsoft.com/library/en-us/dndotnet/html/Custxmlread.asp.

4.1.2 Writing an XmlPyxReader

To read a PYX file, you need to write a subclass of XmlReader. The basic process for writing a subclass of XmlReader follows.

First, you'll want to write a skeleton class that implements all the abstract properties and methods of XmlReader. Initially, you'll want to stub them out so that your code always compiles, even though it may not be fully functional yet. I recommend having the stub methods and properties throw a NotImplementedException rather than returning a default value, so that you don't depend on some default value that the unfinished stub code returns. Returning a default value might fool you into thinking that the code is working properly when all it's doing is returning some hard-coded value!
Next, you need to define the underlying mechanism that your XmlReader subclass will use to traverse its data source. Although it appears to the user that the XmlReader.Read( ) method moves the node pointer to the next node, what that really means in terms of the XmlReader subclass's internal state may be completely different. This step may include defining a struct, a private class, or several data members to hold the reader's state.
You may find it useful to write some tests for the code so that you'll know how well your XmlReader subclass works. As part of your tests, you should read the equivalent data using XmlTextReader and your XmlReader subclass, to make sure they both behave in the same way.
Finally, you can fill in the stub properties and methods with real implementation code. Each time you implement a property or method, more of the tests should pass. Once you've implemented all the properties and methods, all the tests should pass, and you'll know that the implementation of your XmlReader subclass is complete.

I'll lead you through these steps in the sections that follow.

4.1.2.1 Writing the skeleton

The first step in writing any custom XmlReader is to create a class that derives from XmlReader and implements all of its abstract members. Example 4-2 shows a partial listing of the skeleton of an XmlPyxReader type. I've implemented the abstract properties and methods of XmlReader by causing each to throw a NotImplementedException. I'll go back and fill in this skeleton in a later step.

Example 4-2. The XmlPyxReader skeleton

using System;
using System.Xml;

public class XmlPyxReader : XmlReader {

  public override XmlNodeType NodeType { 
    get { throw new NotImplementedException( ); } 
  }

  public override string Name { 
    get { throw new NotImplementedException( ); } 
  }

  public override string LocalName { 
    get { throw new NotImplementedException( ); } 
  }

  public override string NamespaceURI { 
    get { throw new NotImplementedException( ); } 
  }

...

  public override string LookupNamespace(string prefix) { 
    throw new NotImplementedException( ); 
  }

  public override void ResolveEntity( ) { 
    throw new NotImplementedException( ); 
  }

  public override bool ReadAttributeValue( ) { 
    throw new NotImplementedException( ); 
  }
}

The full source code for the skeleton and the completed XmlPyxReader are available, along with all the other example files from the rest of the book, on the book's web site.

4.1.2.2 Defining the PYX traversal mechanism

Because XmlPyxReader reads PYX nodes, I've decided to define a private class that can be used to store the properties of each PYX node as it is read. The Node class in our implementation stores the name and value of each node read, its type, a list of its attribute names and their values, an index indicating which attribute has been read, and an indicator that shows whether the node represents an element whose close tag has been read. The last three fields of Node are referenced later in the program. They contain the values of three special attributes: xml:space, xml:lang, and xml:base.

The xml: prefix always maps to the URI http://www.w3.org/XML/1998/namespace.

Since Node uses the NameValueCollection type, and XmlPyxReader stores its nodes in a Stack (shown shortly), you'll need using directives for the System.Collections.Specialized and System.Collections namespaces at the head of your source file. Here is the complete definition of the Node type:

private class Node {
  internal XmlNodeType nodeType = XmlNodeType.None;
  internal string name = string.Empty;
  internal string value = string.Empty;
  internal NameValueCollection attributes = new NameValueCollection( );
  internal int currentAttribute = -1;
  internal bool isEnd = false;
  internal XmlSpace xmlSpace = XmlSpace.Default;
  internal string xmlLang =
    System.Globalization.CultureInfo.CurrentCulture.ThreeLetterISOLanguageName;
  internal string xmlBase = string.Empty;
}

4.1.2.3 Storing the Node instance data

XmlPyxReader needs a way to read the data from a file and a place to store each PYX node in memory as it is read. When I define the constructors, I'll make sure that they all eventually funnel down to a TextReader, which I'll simply call reader:

private TextReader reader;

Because XML is hierarchical, it will be useful to have a Stack of Node objects to store the nodes. As a node is read from the PYX data, it is pushed onto the Stack, and when the node's end is reached, it is popped from the Stack:

private Stack nodes = new Stack( );
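
To see why a Stack fits hierarchical data, consider this small sketch. It is not part of XmlPyxReader; the PyxDepth class is a hypothetical name used only to show how pushing on each start element and popping on each end element keeps the stack's Count equal to the current nesting level.

```csharp
using System;
using System.Collections;

// Illustrative only: tracks open elements the way a Stack of nodes would.
public static class PyxDepth {
  // Returns the deepest element nesting found in the given PYX lines.
  public static int MaxDepth(string[] lines) {
    Stack open = new Stack();
    int max = 0;
    foreach (string line in lines) {
      if (line.Length == 0) continue;
      if (line[0] == '(') {
        // Start element: push its name and record the new depth.
        open.Push(line.Substring(1).Trim());
        if (open.Count > max) max = open.Count;
      } else if (line[0] == ')') {
        // End element: pop the most recently opened element.
        if (open.Count > 0) open.Pop();
      }
    }
    return max;
  }
}
```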

XmlPyxReader also requires a number of instance variables to store specific information that is returned by certain of the abstract methods derived from XmlReader. First, the ReadState enumeration is used to hold the state of the XmlReader:

private ReadState readState = ReadState.Initial;

Finally, the XmlNameTable, discussed in Chapter 2, holds atomized strings used to compare element and attribute names efficiently:

private XmlNameTable nameTable = new NameTable( );

A couple of private methods will also be useful. Keep in mind that every time you call XmlPyxReader.Read( ), you're reading another node from the underlying document. The next thing you'll want to do is to examine the properties of the node that's been read. Internally, you'll need a way to examine the node in order to return data to the user. For this purpose, a Peek( ) method will come in very handy. Calling Stack.Peek( ) on an empty Stack will cause an InvalidOperationException to be thrown, so this method should return a null instance to prevent that condition from arising:

private Node Peek( ) {
  Node node = null;
  if (nodes.Count > 0) {
    node = (Node)nodes.Peek( );
  }
  return node;

}

Similarly, removing the current Node from the Stack can be done with a Pop( ) method:

private Node Pop( ) {
  Node node = null;
  if (nodes.Count > 0) {
    node = (Node)nodes.Pop( );
  }
  return node;

}

The final private method, ReadAttributes( ), reads all attributes for the current element. It uses the TextReader object's Peek( ) and ReadLine( ) methods to check the first character of each line, and read the entire line if the prefix is A. Once it has read the line, it uses the string type's Substring( ) method to read the attribute's name and value, and adds them to the Node object's attributes collection (a NameValueCollection). If the attribute name is xml:space, xml:lang, or xml:base, the value is stored in the appropriately named field of the Node so that it can be accessed by an XmlReader property.

Once ReadAttributes( ) has read all the attributes, if the first character of the next line is ), the element must be empty, and its Node's isEnd field can be set to true; a prefix of - or ( would indicate that the element either had character content or sub-elements. Finally, the method calls ReadLine( ) one last time to consume the close tag:

private void ReadAttributes( ) {
  Node node = Peek( );
  while (reader.Peek( ) == 'A') {
    string line = reader.ReadLine( );
    string key = line.Substring(1, line.IndexOf(" ") - 1);
    string value = line.Substring(line.IndexOf(" ") + 1);
    node.attributes.Add(key,value);
    nameTable.Add(key);
    if (key == "xml:space") {
      if (value == "default") {
        node.xmlSpace = XmlSpace.Default;
      } else if (value == "preserve") {
        node.xmlSpace = XmlSpace.Preserve;
      }
    }
    if (key == "xml:lang") {
      node.xmlLang = value;
    }
    if (key == "xml:base") {
      node.xmlBase= value;
    }      
  }
  if (reader.Peek( ) == ')') {
    node.isEnd = true;
    reader.ReadLine( );
  }
}

The ReadAttributes( ) method is your first chance to see an XmlNameTable in action. Although XmlPyxReader doesn't do anything earth-shattering with its XmlNameTable, it does add attribute and element names to the table as they are encountered. You'll see another use of the XmlNameTable shortly in the Read( ) method. As I mentioned in Chapter 2, the XmlNameTable can be used by other XML classes in .NET, so maintaining the table is worthwhile.
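
As a reminder of what atomization buys you, the following sketch shows that adding two equal strings to a NameTable yields the same object, so later comparisons can use reference equality. The AtomDemo class is a hypothetical name for illustration only.

```csharp
using System;
using System.Xml;

// Illustrative only: demonstrates string atomization with NameTable.
public static class AtomDemo {
  // Returns true when both names resolve to the same atomized string object.
  public static bool SameAtom(string first, string second) {
    XmlNameTable table = new NameTable();
    string a = table.Add(first);    // returns the atomized instance for first
    string b = table.Add(second);   // returns the existing atom if the text is equal
    return object.ReferenceEquals(a, b);
  }
}
```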

4.1.2.4 Writing the tests

There are several useful tests that I can think of to ensure that XmlPyxReader is working correctly. You could:

Use the XmlPyxReader to load an XmlDocument and print it out to the console. The resulting XML document should be equivalent to the original PYX document, but in standard XML syntax. Since I won't introduce XmlDocument until Chapter 5, I won't use this one yet.
Use the Microsoft.XmlDiffPatch.XmlDiff type to compare an original XML document, read with an XmlTextReader, to a PYX document, read with XmlPyxReader. To learn about XmlDiff and the Microsoft.XmlDiffPatch namespace, see Chapter 13.
Simply read the PYX document and write it to the console in a simple-to-understand, non-XML format. This is the easiest way to test the code, and this is the method I use.

Example 4-3 shows a very simple test program that uses the third approach to read data from the PYX document, and write it to the console, one node at a time. I've highlighted some lines, and I'll discuss them in a moment.

Example 4-3. Source code for ReadToConsole test class

using System;
using System.IO;
using System.Xml;

public class ReadToConsole {
   
  public static void Main(string [ ] args) {

    string filename = args[0];
    using (TextReader textReader = File.OpenText(filename)) {
      XmlReader reader = null;
      string extension = Path.GetExtension(filename);
      switch (extension) {
        case ".pyx":
          reader = new XmlPyxReader(textReader);
          break;
        case ".xml":
          XmlTextReader xmlReader = new XmlTextReader(textReader);
          xmlReader.WhitespaceHandling = WhitespaceHandling.None;
          reader = xmlReader;
          break;
        default:
          Console.Error.WriteLine("unknown file type: {0}", extension);
          Environment.Exit(1);
          break;
      }
      while (reader.Read( )) {
        Console.WriteLine("NodeType={0} Name=\"{1}\" Value=\"{2}\"", 
          reader.NodeType, reader.Name, reader.Value);
      }
    }
  }
}

In the highlighted lines, I'm using Path.GetExtension( ) to get the extension of the filename passed in to the Main( ) method; the returned string includes the leading period, as in .pyx or .xml. The program should behave exactly the same way no matter what sort of XmlReader subclass is used. I'm using the file extension to determine whether I'm reading a PYX document or a standard XML document, and instantiating the appropriate XmlReader subclass. In the case of standard XML, I'm additionally setting the WhitespaceHandling property to WhitespaceHandling.None so that any empty lines or carriage returns that might clutter the output aren't printed. The beginning of the expected output for the XML file po1456.xml is shown below:

NodeType=XmlDeclaration Name="xml" Value="version="1.0""
NodeType=DocumentType Name="po" Value=""
NodeType=Element Name="po" Value=""
NodeType=Element Name="date" Value=""
NodeType=Element Name="address" Value=""
NodeType=Element Name="name" Value=""
...
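
If you would rather have the comparison automated than compare console output by eye, something along these lines may help. It is a rough sketch, not the book's test harness: ReaderComparison is a hypothetical name, it compares only node type, name, and value, it ignores attributes, and it skips node types that PYX cannot represent (such as XmlDeclaration and DocumentType). Note also that a standard reader reports an element written as <item /> without a matching EndElement, so documents containing empty elements would need extra handling.

```csharp
using System;
using System.Xml;

// Illustrative only: compares the node streams of two XmlReader instances.
public static class ReaderComparison {
  // Advances to the next node that PYX can represent; returns false at end of input.
  private static bool ReadComparable(XmlReader reader) {
    while (reader.Read()) {
      switch (reader.NodeType) {
        case XmlNodeType.Element:
        case XmlNodeType.EndElement:
        case XmlNodeType.Text:
        case XmlNodeType.ProcessingInstruction:
          return true;
      }
    }
    return false;
  }

  // Returns true when both readers report the same sequence of node types, names, and values.
  public static bool SameNodes(XmlReader expected, XmlReader actual) {
    while (true) {
      bool more = ReadComparable(expected);
      if (more != ReadComparable(actual)) return false;   // one stream ended early
      if (!more) return true;                             // both streams ended together
      if (expected.NodeType != actual.NodeType ||
          expected.Name != actual.Name ||
          expected.Value != actual.Value) {
        return false;
      }
    }
  }
}
```

In practice you would pass an XmlTextReader over the XML file as the first argument and an XmlPyxReader over the PYX file as the second.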

4.1.2.5 Filling in the stubs

Now that the infrastructure has been created and the test program has been written, you can begin to implement the public properties and methods required for an XmlReader.

The XmlReader base class does not require any particular constructors, but I want XmlPyxReader to be able to accept TextReader, Stream and string types as input. This requirement calls for three constructors: one that takes a TextReader, one that takes a Stream, and one that takes a string. These inputs give you the flexibility to read data from any source, whether it is a local file, a network resource, or a buffer in memory. In my implementation, each of the constructors initializes the reader instance variable. reader is then used to read data from the underlying data source. Here are the three constructors:

public XmlPyxReader(TextReader reader) {
  this.reader = reader;
}

public XmlPyxReader(Stream stream) {
  reader = new StreamReader(stream);
}

public XmlPyxReader(string source) {
  reader = new StringReader(source);
}

The Stream and string constructors are interesting because they both still allow you to use the TextReader internally. The former creates a new StreamReader around the Stream, while the latter creates a new StringReader for the PYX content. Now the rest of the code doesn't care where the data came from originally; it's all a TextReader internally.

The next step is to implement some of the abstract XmlReader properties. NodeType should return the XmlNodeType of the current node. However, it's not quite as simple as returning the current Node object's NodeType; you must account for the attributes and the special XmlNodeType values EndElement and None. The NodeType property uses the Peek( ) method defined earlier.

Note that before it does anything, NodeType checks to make sure that the XmlPyxReader is in the Interactive ReadState. Many of the other properties and methods will also check the ReadState. The ReadState will be set later, in the Read( ) method:

public override XmlNodeType NodeType { 
  get {
    if (readState != ReadState.Interactive || nodes.Count <= 0)
      return XmlNodeType.None;
    
    Node node = Peek( );
    XmlNodeType nodeType = node.nodeType;
    if (node.currentAttribute > -1 && 
      node.currentAttribute < node.attributes.Count) {
      nodeType = XmlNodeType.Attribute;
    } else if (node.value != null && node.value != string.Empty) {
      nodeType = XmlNodeType.Text;
    } else if (node.isEnd) {
      nodeType = XmlNodeType.EndElement;
    }
    return nodeType;
  }
}

The Name property returns the name of the current node, whether it's an element or an attribute:

public override string Name { 
  get {
      if (readState != ReadState.Interactive || nodes.Count <= 0)
        return string.Empty;
      
      Node node = Peek( );
      string name = node.name;
      if (NodeType == XmlNodeType.Attribute) {
        name = node.attributes.AllKeys[node.currentAttribute];
      }
      return name;
  } 
}

As I demonstrated earlier, PYX can support namespaces in a roundabout way. LocalName will just call Name, and determine if there is a namespace prefix using the string.IndexOf( ) and Split( ) methods. Prefix uses a similar method to return the current node's namespace prefix. Note that since these properties call the Name property, there's no need to check the ReadState; it'll be checked within the Name property:

public override string LocalName { 
  get { 
    int index = Name.IndexOf(':');
    if (index > -1) {
      return Name.Split(':')[1];
    } else {
      return Name;
    }
  } 
}

public override string Prefix { 
  get {
    int index = Name.IndexOf(':');
    if (index > -1) {
      return Name.Split(':')[0];
    } else {
      return string.Empty;
    }
  }
}
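
The same split can be written with a single IndexOf( ) call and Substring( ), which avoids allocating an array and keeps everything after the first colon together. This is only an alternative sketch; the QName class is a hypothetical name and is not used by XmlPyxReader.

```csharp
using System;

// Illustrative only: splits a qualified name at its first colon.
public static class QName {
  // Returns the part before the first colon, or an empty string if there is none.
  public static string Prefix(string name) {
    int index = name.IndexOf(':');
    return index > -1 ? name.Substring(0, index) : string.Empty;
  }

  // Returns the part after the first colon, or the whole name if there is none.
  public static string LocalName(string name) {
    int index = name.IndexOf(':');
    return index > -1 ? name.Substring(index + 1) : name;
  }
}
```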

Because of the unusual namespace handling in XmlPyxReader, the NamespaceURI property doesn't really need to return anything useful. In an XmlReader that handled namespaces properly, you'd want to return the real namespace URI here:

public override string NamespaceURI { 
  get { return string.Empty; } 
}

The BaseURI, XmlSpace, and XmlLang properties, which I described earlier, will return a default value, or the value of the relevant field for the current Node:

public override string BaseURI { 
  get { 
    if (readState == ReadState.Interactive && nodes.Count > 0) {
      return Peek( ).xmlBase; 
    } else {
      return string.Empty;
    }
  }
}

public override XmlSpace XmlSpace {
  get { 
    if (readState == ReadState.Interactive && nodes.Count > 0) {
      return Peek( ).xmlSpace; 
    } else {
      return XmlSpace.Default;
    }
  }
}

public override string XmlLang {
  get { 
    if (readState == ReadState.Interactive && nodes.Count > 0) {
      return Peek( ).xmlLang; 
    } else {
      return System.Globalization.CultureInfo.CurrentCulture.ThreeLetterISOLanguageName;
    }
  }
}

The NameTable and ReadState properties will simply return the values of the nameTable and readState instance variables, respectively:

public override XmlNameTable NameTable { 
  get { return nameTable; }
}

public override ReadState ReadState { 
  get { return readState; } 
}

The Depth property returns the depth of the current node in the document. If the current node is an element, Depth will be the number of nodes in the stack. If, however, the current node is an attribute, you must add the current attribute's position to obtain the true depth. Conveniently, the current attribute's position is stored in the Node:

public override int Depth { 
  get {
    if (readState != ReadState.Interactive)
      return 0;
    
    int depth = nodes.Count;
    Node node = Peek( ); 
    if (node != null && node.currentAttribute != -1) {
      depth += node.currentAttribute;
    }
    return depth;
  }
}

The Value property returns the value of either a text node or an attribute. Since the Node type keeps track of its current attribute, you can use Node to determine whether the reader is currently positioned on an attribute, and return the value accordingly:

public override string Value { 
  get { 
    Node node = Peek( );
    // Outside the Interactive state, or with no current node, there is no value
    if (readState != ReadState.Interactive || node == null) {
      return string.Empty;
    }
    string value = node.value;
    if (node.currentAttribute > -1 
      && node.currentAttribute < node.attributes.Count) {
      value = node.attributes[node.currentAttribute];
    }
    return value;
  }
}

The HasValue property simply indicates whether the current node has a value:

public override bool HasValue { 
  get { 
    if (readState != ReadState.Interactive)
      return false;
    
    Node node = Peek( );
    // Peek( ) returns null once the last node has been popped
    return node != null && node.value != string.Empty;
  }
}

The IsEmptyElement property indicates whether the current node is an empty element; that is, an element of the form <element/>. In PYX, elements are never empty, so implementing IsEmptyElement is particularly easy:

public override bool IsEmptyElement { 
  get { return false; }
}

The Node type's attributes collection holds the current element's attribute names and values. This means you can fill in the various attribute-related methods. The AttributeCount property simply returns the number of attributes for the current node:

public override int AttributeCount {
  get { 
    if (readState != ReadState.Interactive)
      return 0;
    
    Node node = Peek( );
    int count = 0;
    if (node != null) {
      count = node.attributes.Count;
    }
    return count;
  }
}

That takes care of XmlReader's abstract properties. Next I'll start implementing the methods.

The various GetAttribute( ) method overloads and the various indexers all do very similar things. Many of them can be factored out to their most basic level, which is to return one specific attribute value, based either on the attribute's name or its index. The NameValueCollection type is ideally suited for this sort of thing:

public override string GetAttribute(string name) {
  Node node = Peek( );
  if (node == null || readState != ReadState.Interactive)
    return string.Empty;
  else
    return node.attributes[name];
}

public override string GetAttribute(string name, string namespaceURI) {
  return GetAttribute(name);
}

public override string GetAttribute(int i) {
  Node node = Peek( );
  if (node == null || readState != ReadState.Interactive)
    return string.Empty;
  else
    return node.attributes[i];
}

public override string this[int i] { 
  get { return GetAttribute(i); }
}

public override string this[string name] { 
  get { return GetAttribute(name); } 
}

public override string this[string name, string namespaceURI] { 
  get { return GetAttribute(name, namespaceURI); }
}

The this property is called an indexer. An indexer is a special sort of property that takes parameters. In C#, the parameters are enclosed in square brackets.

Indexers are used when the class contains a collection of some sort. In the case of an XmlReader, the indexers reference the collection of attributes for the current node. I included the indexers with the GetAttribute( ) methods because their behavior is more similar to a GetAttribute( ) method than to any of the other properties. In fact, you can see from the code that this really is just a proxy for GetAttribute( ).
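
If indexers are new to you, the pattern is easier to see in isolation. The sketch below uses a hypothetical AttributeBag class, unrelated to XmlReader, whose indexers simply forward to GetAttribute( ) overloads backed by a NameValueCollection.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical class for illustration; it only mimics the indexer pattern.
public class AttributeBag {
  private NameValueCollection attributes = new NameValueCollection();

  public void Add(string name, string value) {
    attributes.Add(name, value);
  }

  // Lookup by position, in the order the attributes were added.
  public string GetAttribute(int i) { return attributes[i]; }

  // Lookup by name; returns null when no such attribute exists.
  public string GetAttribute(string name) { return attributes[name]; }

  // The indexers are thin proxies for the methods above.
  public string this[int i] { get { return GetAttribute(i); } }
  public string this[string name] { get { return GetAttribute(name); } }
}
```

A caller can then write bag[0] or bag["productCode"] instead of calling GetAttribute( ) directly.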

The various MoveToAttribute( ), MoveToFirstAttribute( ), and MoveToNextAttribute( ) methods move the current attribute pointer to the specified attribute. If the attribute doesn't exist, some of these methods will return false:

public override bool MoveToAttribute(string name) { 
  Node node = Peek( );
  if (node == null || readState != ReadState.Interactive)
    return false;

  string value = node.attributes[name];
  if (value != null) {
    MoveToAttribute(Array.IndexOf(node.attributes.AllKeys, name));
    return true;
  } else {
    return false;
  }
}

public override bool MoveToAttribute(string name, string namespaceURI) { 
  return MoveToAttribute(name);
}

public override void MoveToAttribute(int i) {
  if (readState != ReadState.Interactive)
    return;

  Node node = Peek( );
  if (i < node.attributes.Count)
    node.currentAttribute = i;
}

public override bool MoveToFirstAttribute( ) {
  if (readState != ReadState.Interactive)
    return false;
    
  Node node = Peek( );
  if (node.attributes.Count > 0) {
    MoveToAttribute(0);
    return true;
  } else {
    return false;
  }
}

public override bool MoveToNextAttribute( ) { 
  if (readState != ReadState.Interactive)
    return false;
    
  Node node = Peek( );
  // Advance only if another attribute follows the current one
  if (node.currentAttribute + 1 < node.attributes.Count) {
    node.currentAttribute++;
    return true;
  } else {
    return false;
  }
}

If the current node is an attribute, the MoveToElement( ) method moves the XmlReader's current node pointer to the element containing the current attribute and returns true. Otherwise, MoveToElement( ) returns false:

public override bool MoveToElement( ) { 
  if (readState != ReadState.Interactive)
    return false;
    
  Node node = Peek( );
  if (node.currentAttribute != -1) {
    node.currentAttribute = -1;
    if (node.isEnd) 
      node.isEnd = false;
    return true;
  } else {
    return false;
  }
}
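
From the caller's side, these methods are typically used together in a loop. The following sketch works with any XmlReader; the AttributeWalker class is a hypothetical name used only for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

// Illustrative only: walks the attributes of the reader's current element.
public static class AttributeWalker {
  // Lists the current element's attributes as "name=value" pairs separated by semicolons.
  public static string Describe(XmlReader reader) {
    StringBuilder result = new StringBuilder();
    if (reader.MoveToFirstAttribute()) {
      do {
        if (result.Length > 0) result.Append(';');
        result.Append(reader.Name).Append('=').Append(reader.Value);
      } while (reader.MoveToNextAttribute());
      // Return the cursor to the element that owns the attributes.
      reader.MoveToElement();
    }
    return result.ToString();
  }
}
```

A loop like this depends on MoveToNextAttribute( ) returning false after the last attribute, which is why the bounds check in that method matters.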

Next is Read( ), one of the most complex methods in the XmlPyxReader class. This method reads a line from the PYX document, and takes various actions based on the first character of the line. I'll give the method definition, with comments interspersed:

public override bool Read( ) {

To begin with, you need to set the ReadState to Interactive, indicating that document reading is underway:

if (readState == ReadState.Initial) {
  readState = ReadState.Interactive;
}

The next step is to look at the previous Node on the Stack. If there is one, it's been marked as having ended, and it either has a text value or the next line in the TextReader is a start element, it should be removed from the Stack. This logic may seem strange, but the reason for it is that you need to know when NodeType should return EndElement; EndElement is only returned when a non-empty element is encountered. A non-empty element is one that has a text value or a child element. By popping the current Node, you prevent an EndElement XmlNodeType from being returned:

Node node = Peek( );
if (node != null && node.isEnd && 
    (node.value != string.Empty || reader.Peek( ) == '(')) {
    Pop( );
}

Only now can you begin reading lines from the TextReader using ReadLine( ):

string line = reader.ReadLine( );

As each line is read, you need to check to see if it's null or empty. If it's null, the TextReader has reached the end of the Stream. If the line is empty, there's something wrong with the PYX data, because every line in a PYX file must have at least a prefix. Either way, this signals the end of the file, as you're only interested in lines with content:

if (line != string.Empty && line != null) {

Now you need to examine the first character of each line. As noted in Table 4-1, an open parenthesis (() in the prefix indicates the beginning of an element; in this case, you can create a new element Node, set its name, and read its attributes using the private ReadAttributes( ) method I defined earlier. This is also a good place to add the element name to the XmlNameTable. Finally, don't forget to push the Node onto the stack:

switch (line[0]) {
  case '(':
    Node elementNode = new Node( );
    elementNode.nodeType = XmlNodeType.Element;
    elementNode.name = line.Substring(1).Trim( );
    nameTable.Add(elementNode.name);
    nodes.Push(elementNode);
    ReadAttributes( );
    break;

If the line's prefix is the close parenthesis ()), the line represents an element's closing tag. Because the element has ended, you can pop it off the Stack:

case ')':
  Pop( );
  node = Peek( ); 
  break;

A hyphen (-) in the line's prefix indicates a text node. You should create a text Node to hold it, and instantiate a StringBuilder (from the System.Text namespace) to accumulate the text. Because a text line could be followed by any number of additional text lines, you should accumulate each of these text lines in the StringBuilder. Finally, when the first character of the next line is not a hyphen, you can set the text Node's value to the accumulated value of the StringBuilder and push the Node onto the stack:

case '-':
  Node textNode = new Node( );
  textNode.nodeType = XmlNodeType.Text;
  StringBuilder text = new StringBuilder( );
  text.Append(line.Substring(1));
  while (reader.Peek( ) == '-') {
    line = reader.ReadLine( );
    text.Append(line.Substring(1));
  }
  node.isEnd = true;
  textNode.value = text.ToString( );
  nodes.Push(textNode);
  break;
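To try the accumulation logic on its own, here is a minimal sketch that works against any TextReader. The class and method names (PyxTextSketch, ReadTextRun) are hypothetical and are not part of XmlPyxReader; the sketch only assumes, as above, that every text line starts with a hyphen:

```csharp
using System.IO;
using System.Text;

public static class PyxTextSketch {
  // Hypothetical helper: reads one run of consecutive '-' lines and
  // returns their combined content, or an empty string if the next
  // line is not a text line.
  public static string ReadTextRun(TextReader reader) {
    StringBuilder text = new StringBuilder( );
    // Peek( ) returns -1 at the end of the input, so the loop stops there too.
    while (reader.Peek( ) == '-') {
      string line = reader.ReadLine( );
      text.Append(line.Substring(1)); // drop the '-' prefix
    }
    return text.ToString( );
  }
}
```

Because the loop only peeks at the next character, the line that ends the run (a closing parenthesis line, for example) is left unread for the caller.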

If the first character of the line is a question mark (?), the line represents a processing instruction. You should create a new Node, set its name and value, and push it onto the stack. For our purposes, the name or target of the PI is everything before the first whitespace, and the data is everything after it:

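case '?':
  Node piNode = new Node( );
  piNode.nodeType = XmlNodeType.ProcessingInstruction;
  // Substring(start, length) takes a length, so the target is
  // space - 1 characters long; the data starts after the space.
  int space = line.IndexOf(' ');
  if (space < 0) space = line.Length; // a PI with no data
  piNode.name = line.Substring(1, space - 1);
  piNode.value = space < line.Length ? line.Substring(space + 1) : string.Empty;
  nodes.Push(piNode);
  break;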

Any other case is considered an error. You should set the XmlPyxReader's readState to ReadState.Error and return false. All the other cases are fine, and the Read( ) method should return true:

  default:
    readState = ReadState.Error;
    return false;
}
return true;

The last step is to handle the cases where a null or empty line was read from the PYX Stream. These cases should indicate the end of the Stream, so you should set the XmlPyxReader's readState to ReadState.EndOfFile and return false:

  } else {
    readState = ReadState.EndOfFile;
    return false;
  }      
}
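The prefix dispatch that Read( ) performs can be summarized as a small lookup. The following is only a sketch: the names PyxPrefixSketch and Classify are hypothetical, the 'A' prefix for attributes comes from the PYX notation (XmlPyxReader handles those lines in ReadAttributes( ) rather than in the switch), and returning XmlNodeType.None for null, empty, or unrecognized lines is my own choice for illustration:

```csharp
using System.Xml;

public static class PyxPrefixSketch {
  // Hypothetical helper: maps a PYX line to the XmlNodeType that its
  // prefix character stands for.
  public static XmlNodeType Classify(string line) {
    if (string.IsNullOrEmpty(line))
      return XmlNodeType.None;   // end of input or malformed line
    switch (line[0]) {
      case '(': return XmlNodeType.Element;
      case ')': return XmlNodeType.EndElement;
      case 'A': return XmlNodeType.Attribute;
      case '-': return XmlNodeType.Text;
      case '?': return XmlNodeType.ProcessingInstruction;
      default:  return XmlNodeType.None; // unrecognized prefix
    }
  }
}
```

For example, Classify("(po") yields XmlNodeType.Element, and Classify("-CA") yields XmlNodeType.Text.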

The EOF property returns true if the XmlPyxReader is positioned at the end of the Stream. Since you already know that you've set readState to ReadState.EndOfFile if the reader is at the end of the Stream, you can use that knowledge here:

public override bool EOF { 
  get { return readState == ReadState.EndOfFile; }
}

The Close( ) method closes the underlying TextReader and sets the readState instance variable to ReadState.Closed:

public override void Close( ) { 
  reader.Close( );
  readState = ReadState.Closed;
}

The ReadString( ) method has different behavior depending on the current Node's XmlNodeType. If the Node is an element, this method will read lines from the Stream as long as they begin with a hyphen. It then sets the Node's value instance variable to the value read in, in much the same way that the Read( ) method did.

If, on the other hand, the current Node is already a text node, ReadString( ) simply returns the node's value. In all other cases, the method returns an empty string:

public override string ReadString( ) {
  if (readState != ReadState.Interactive)
    return string.Empty;

  Node node = Peek( );
  switch (node.nodeType) {
    case XmlNodeType.Element:
      StringBuilder text = new StringBuilder( );
      while (reader.Peek( ) == '-') {
        string line = reader.ReadLine( );
        text.Append(line.Substring(1));
      }
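      string result = text.ToString( );
      node.value = result;
      if (reader.Peek( ) == ')') {
        reader.ReadLine( ); // consume the element's closing line
        Pop( );
        node = Peek( ); 
        if (node != null) {
          node.isEnd = true;
        }
      }
      // node may now be the parent Node, or null; return the text just read
      return result;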
    case XmlNodeType.Text:
      return node.value;
    default:
      return string.Empty;
  }
}

Since all the attributes are read when Read( ) reads an element, this implementation of ReadAttributeValue( ) does not actually read anything. It returns true if the current Node is an attribute, and false otherwise:

public override bool ReadAttributeValue( ) {
  if (readState != ReadState.Interactive)
    return false;

  Node node = Peek( );
  if (node.nodeType == XmlNodeType.Attribute) {
    return true;
  } else {
    return false;
  }
}

All the remaining abstract properties and methods listed in Example 4-2 have no real meaning in XmlPyxReader, so you can let them keep their current implementation, which is to throw the NotImplementedException.

Finally, XmlReader has virtual methods and properties that you may choose to override. HasAttributes simply indicates whether the current node has attributes:

public override bool HasAttributes {
  get { 
    if (readState != ReadState.Interactive)
      return false;

    Node node = Peek( );
    return node.attributes.Count != 0; 
  }
}

Skip( ) moves the XmlReader's current position to the next sibling of the most recent element node. This is done simply by popping the current node:

public override void Skip( ) {
  if (readState != ReadState.Interactive)
    return;
  Pop( );
}
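To see what callers expect from Skip( ), you can compare against one of the built-in readers. This sketch uses XmlTextReader rather than XmlPyxReader, and the names SkipSketch and NameAfterSkip are hypothetical. It reads forward to the first element with a given name, skips it, and reports where the reader lands:

```csharp
using System.IO;
using System.Xml;

public static class SkipSketch {
  // Hypothetical helper: finds the first element called elementName,
  // calls Skip( ) on it, and returns the name of the node the reader is
  // then positioned on. Returns an empty string if no such element exists.
  public static string NameAfterSkip(string xml, string elementName) {
    using (XmlReader reader = new XmlTextReader(new StringReader(xml))) {
      while (reader.Read( )) {
        if (reader.NodeType == XmlNodeType.Element &&
            reader.LocalName == elementName) {
          reader.Skip( ); // moves past the element and all of its children
          return reader.LocalName;
        }
      }
    }
    return string.Empty;
  }
}
```

Given the document <po><address><name>Frits</name></address><items/></po>, skipping address leaves the reader on items, which is the behavior PoToPickList relies on when it skips a billing address.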

And that's it: you've just written XmlPyxReader. Now you're ready to test it.

4.1.3 Testing XmlPyxReader

You could have been using the ReadToConsole program to test your work as you were going along, and I certainly encourage that practice. Now that XmlPyxReader is done, though, you definitely should test it. Here's the output I got when I ran it:

NodeType=Element Name="po" Value=""
NodeType=EndElement Name="date" Value=""
NodeType=Element Name="address" Value=""
NodeType=Element Name="name" Value=""
NodeType=Text Name="" Value="Frits Mendels"
NodeType=EndElement Name="name" Value=""
NodeType=Element Name="street" Value=""
NodeType=Text Name="" Value="152 Cherry St"
NodeType=EndElement Name="street" Value=""
NodeType=Element Name="city" Value=""
NodeType=Text Name="" Value="San Francisco"
NodeType=EndElement Name="city" Value=""
NodeType=Element Name="state" Value=""
NodeType=Text Name="" Value="CA"
NodeType=EndElement Name="state" Value=""
NodeType=Element Name="zip" Value=""
NodeType=Text Name="" Value="94045"
NodeType=EndElement Name="zip" Value=""
NodeType=EndElement Name="address" Value=""
NodeType=Element Name="address" Value=""
NodeType=Element Name="name" Value=""
NodeType=Text Name="" Value="Frits Mendels"
NodeType=EndElement Name="name" Value=""
NodeType=Element Name="street" Value=""
NodeType=Text Name="" Value="PO Box 6789"
NodeType=EndElement Name="street" Value=""
NodeType=Element Name="city" Value=""
NodeType=Text Name="" Value="San Francisco"
NodeType=EndElement Name="city" Value=""
NodeType=Element Name="state" Value=""
NodeType=Text Name="" Value="CA"
NodeType=EndElement Name="state" Value=""
NodeType=Element Name="zip" Value=""
NodeType=Text Name="" Value="94123-6798"
NodeType=EndElement Name="zip" Value=""
NodeType=EndElement Name="address" Value=""
NodeType=Element Name="items" Value=""
NodeType=EndElement Name="item" Value=""
NodeType=EndElement Name="item" Value=""
NodeType=EndElement Name="items" Value=""
NodeType=EndElement Name="po" Value=""

Except for the absence of the XML declaration and the document type declaration (which don't exist in PYX), this output looks just like the output using XmlTextReader. Since everything is working as expected, it's time to use XmlPyxReader in a real application.
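If you want to produce the same kind of listing from any XmlReader, a loop along the following lines will do it. This is a sketch reconstructed from the output format shown above, not the ReadToConsole source itself; the names DumpSketch and Dump are hypothetical:

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class DumpSketch {
  // Hypothetical helper: walks the reader to the end and returns one
  // line per node in the NodeType/Name/Value format used above.
  public static string Dump(XmlReader reader) {
    StringBuilder output = new StringBuilder( );
    while (reader.Read( )) {
      // A fixed "\n" keeps the result identical on every platform.
      output.AppendFormat("NodeType={0} Name=\"{1}\" Value=\"{2}\"\n",
        reader.NodeType, reader.Name, reader.Value);
    }
    return output.ToString( );
  }
}
```

You could pass it an XmlTextReader over the XML purchase order and an XmlPyxReader over the PYX version, then compare the two strings to spot any differences.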

4.1.4 Using XmlPyxReader

Luckily, you already have just such an application. In Chapter 2, I showed you how to write PoToPickList, which generates the PO pick list from an XML file. You can now plug XmlPyxReader into PoToPickList to generate a pick list from a PYX document. Example 4-4 shows the Main( ) method of PoToPickList again, with the change highlighted.

Example 4-4. Main( ) method of PoToPickList, using XmlPyxReader

public static void Main(string[ ] args) {

  string filename = args[0];

  TextReader textReader = File.OpenText(filename);
  XmlReader reader = new XmlPyxReader(textReader);

  StringBuilder pickList = new StringBuilder( );
  pickList.Append("Angus Hardware PickList").Append(Environment.NewLine);
  pickList.Append("=======================").Append(Environment.NewLine)
    .Append(Environment.NewLine);

  while (reader.Read( )) {
    if (reader.NodeType == XmlNodeType.Element) {
      switch (reader.LocalName) {
        case "po":
          pickList.Append(POElementToString(reader));
          break;
        case "date":
          pickList.Append(DateElementToString(reader));
          break;
        case "address":
          reader.MoveToAttribute("type");
          if (reader.Value == "shipping") {
            pickList.Append(AddressElementToString(reader));
          } else {
            reader.Skip( );
          }
          break;
        case "items":
          pickList.Append(ItemsElementToString(reader));
          break;
      }
    }
  }

  Console.WriteLine(pickList);
}

If you run the PYX purchase order in Example 4-1 through PoToPickList again, you'll see exactly the same results you saw in Chapter 2, reproduced here in Example 4-5.

Example 4-5. Output of PoToPickList, using XmlPyxReader

Angus Hardware PickList
=======================

PO Number: PO1456

Date: Friday, June 14, 2002

Shipping Address:
Frits Mendels
152 Cherry St
San Francisco, CA 94045

Quantity Product Code Description
======== ============ ===========
      1        R-273  14.4 Volt Cordless Drill
      1        1632S  12 Piece Drill Bit Set
