.NET and XML

8.2 Using the XSD Tool

Microsoft provides a tool with the .NET Framework SDK called XSD, the XML Schemas/DataTypes support utility. With this tool, you can generate schemas from compiled assemblies, XML documents, and XDR schemas, and you can generate source code in various .NET languages from schemas.

The XSD command-line syntax shown here is explained in Table 8-1:

xsd.exe filename.ext [argument [...]]

Table 8-1. XSD tool command-line syntax

Argument

Description

filename.ext

This argument specifies the name of the file to use as input. This can be either a CLR DLL (.dll) or executable file (.exe); an XML (.xml) file; or an XML Schema Definition (.xsd) or XML-Data-Reduced (.xdr) file. The type of the input file determines what other arguments are allowed.

/classes /c

When an XSD file is specified, generates source code for classes as determined by the schema. This argument is mutually exclusive with /dataset.

/dataset /d

When an XSD file is specified, generates source code for classes that are subclasses of System.Data.DataSet as determined by the schema. This argument is mutually exclusive with /classes.

/element:element /e:element

When an XSD file is specified, this argument specifies which elements to generate types for. /element may be specified more than once. The default is to generate types for all elements.

/language:language /l:language

When an XSD file is specified, this argument specifies the language to generate types in. language may be one of cs (C#), vb (Visual Basic .NET), or js (JScript). You may also specify the fully qualified name of any class that implements System.CodeDom.Compiler.CodeDomProvider. The default is cs.

/namespace:namespace /n:namespace

When an XSD file is specified, this argument specifies the namespace to create types in. The default is Schemas.

/outputdir:outputdir /o:outputdir

Specifies the directory in which to put output files. The default is the current directory. Generated schema files are named schemaN.xsd, where N is a sequential number starting with 0 (schema0.xsd, schema1.xsd, and so on). Generated source files are named with the schema filename and the appropriate extension for the language.

/uri:uri /u:uri

When an XSD file is specified, this argument specifies the URI for the elements in the schema to generate code for.

/type:type /t:type

When a DLL or EXE file is specified, this argument specifies what types to generate a schema for. If type is a fully qualified type name, the schema for that type is generated. If type is a type name without a namespace, schemas for all types with that name are generated. If type ends with a *, schemas are generated for all types with names starting with the name up to the *. /type may be specified more than once. The default is to generate schemas for all types.

/h /?

Prints information on how to use XSD.

/nologo

Suppresses printing of the XSD copyright statement and version information banner.

As you can see from the command-line syntax, xsd can be used to generate either source code or an XML Schema, based on a variety of inputs. It's a very useful tool for generating an XSD based on a .NET type or XML document you have already written, as well as generating source code for a .NET type based on an existing XSD.

When an XML-Data-Reduced (XDR) document is specified on the command line, xsd generates an equivalent XSD document. XDR was introduced by Microsoft and the University of Edinburgh in 1998, based on the earlier XML-Data standard which formed the basis for much of XML Schema.

The most likely reason anyone would need to convert an XDR to an XSD is to support an application using Microsoft's BizTalk Server. BizTalk Server is an application server supporting workflow and process management.

8.2.1 Generating a Schema from an XML Document

xsd can be used to generate a best-guess schema from any XML document. It will make certain assumptions about the structure of your document, based on the data found in the example you provide. For example, it will always set minOccurs to 1 and maxOccurs to unbounded for each element. It will also always use the xs:sequence compositor for lists of elements, even if your example XML document has elements in various orders. This can present the odd situation of the sample document used to generate the XSD failing validation with the XSD generated from it. Finally, the type attribute of each xs:element and xs:attribute element defaults to xs:string.

For these reasons, you should never take the generated XSD for granted. Always edit it to make sure it will fit your real requirements.
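
The same kind of best-guess inference can also be tried from code. The following is a minimal sketch, assuming .NET Framework 2.0 or later, where the XmlSchemaInference class is available; it is not a description of how xsd itself is implemented, and the sample document is a made-up fragment rather than the purchase order from Chapter 2. Printing the inferred schema makes it easy to see which guesses need to be edited by hand.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class InferSchema {
  // Infers a best-guess schema from an XML string and returns it as text.
  public static string Infer(string xml) {
    XmlSchemaInference inference = new XmlSchemaInference();
    using (XmlReader reader = XmlReader.Create(new StringReader(xml))) {
      XmlSchemaSet schemas = inference.InferSchema(reader);
      StringWriter writer = new StringWriter();
      foreach (XmlSchema schema in schemas.Schemas()) {
        schema.Write(writer);
      }
      return writer.ToString();
    }
  }

  public static void Main(string[] args) {
    // Hypothetical sample input, used only for illustration.
    string xml = "<po id='PO1456'><item quantity='1' /><item quantity='2' /></po>";
    Console.WriteLine(Infer(xml));
  }
}
```

As with the schema that xsd produces, treat the result as a starting point and review every type and occurrence constraint before relying on it.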

Using the purchase order document from Chapter 2, you can generate an XSD with the following command line:

xsd po1456.xml

You can go ahead and run xsd to generate the schema. I've already done so, and tweaked the generated XSD so that it validates the PO correctly. My edits are shown in Example 8-1. I also intentionally introduced a couple of mistakes in my edits, to point out how XmlSchema validates an XSD; I'll explain that more in a moment.

Example 8-1. Generated XSD for purchase orders

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="po">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID" />
      <xs:sequence>
        <xs:element name="date">
          <xs:complexType>        
            <xs:attribute name="year" type="xs:string" />
            <xs:attribute name="month" type="xs:string" />
            <xs:attribute name="day" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" msdata:Ordinal="0" />
              <xs:element name="street" type="xs:string" maxOccurs="3" msdata:Ordinal="1" />
              <xs:element name="city" type="xs:string" msdata:Ordinal="2" />
              <xs:element name="state" type="xs:string" msdata:Ordinal="3" />
              <xs:element name="zip" type="xs:string" msdata:Ordinal="4" />
            </xs:sequence>
            <xs:attribute name="type" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="items" minOccurs="2" maxOccurs="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="quantity" type="xs:string" />
                  <xs:attribute name="productCode" type="xs:string" />
                  <xs:attribute name="description" type="xs:string" />
                  <xs:attribute name="unitCost" type="xs:string" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="NewDataSet" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="po" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

There are a few pieces of this generated XSD that you should note. First is the inclusion of the namespace prefix msdata in the attributes msdata:Ordinal and msdata:IsDataSet. The urn:schemas-microsoft-com:xml-msdata namespace provides hints to the DataSet class when serializing an XML instance to a database.

Second is the NewDataSet element itself. This is used when generating source code for the XSD with the /dataset flag; the resulting source code will provide the definition of a subclass of System.Data.DataSet.

I'll address both of these issues in depth in Chapter 9 and Chapter 11.

Given the generated XSD and the modifications to it, you can do two things. First, you can verify that it is a valid XML Schema after the changes. The program shown in Example 8-2 will do just that.

Example 8-2. Validation of an XML Schema

using System;
using System.IO;
using System.Xml.Schema;

public class ValidateSchema {
  public static void Main(string [ ] args) {
    ValidationEventHandler handler = new ValidationEventHandler(ValidateSchema.Handler);
    XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);
    schema.Compile(handler);
  }

  public static void Handler(object sender, ValidationEventArgs e) {
    Console.WriteLine(e.Message);
  }
}

A ValidationEventHandler can be called in two places. The first, checking the XML Schema itself, happens on the following line:

XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);

XmlSchema.Read( ) reads the content of the XSD from a Stream, TextReader, or XmlReader, and takes a ValidationEventHandler delegate as its second parameter; the ValidationEventHandler is covered in Chapter 2. Any XML validation errors that arise while reading in the file will be reported to the ValidationEventHandler.

It's important to note that the ValidationEventHandler handles two different aspects of checking a schema's content: checking whether it contains valid XML, and verifying whether it constitutes an acceptable XSD. In Example 8-2, I'm using the same ValidationEventHandler for both checks, but they could be two separate delegates.

The second phase, validating the content of the XSD, happens here:

schema.Compile(handler);

In this phase, the content of the XSD is checked to make sure that it is really a valid instance of XML Schema. Its errors will also be reported to the ValidationEventHandler. With the XSD in Example 8-1, running this validator will produce the following output:

C:\Chapter 8>ValidateSchema po.xsd
The content model of a complex type must consist of 'annotation'(if present) 
followed by zero or one of 'simpleContent' or 'complexContent' or 'group' or 'choice' 
or 'sequence' or 'all' followed by zero or more attributes or attributeGroups followed by 
zero or one anyAttribute. An error occurred at (6, 8).
minOccurs value cannot be greater than maxOccurs value. An error occurred at (25, 10).

Looking back, I made two mistakes. First, the id attribute of the po element is in the wrong place; within an xs:complexType, the xs:attribute element must come after the xs:sequence element. You can move the attribute into its proper place to avoid this error. This validation error was caught by the Read( ) method, because it is a case of the XML structure itself being invalid.

Granted, this error is a little contrived. xsd generated the elements in the correct order, but I moved the xs:attribute element to make a point.

Second, the items element has minOccurs set to 2 and maxOccurs set to 1. In this case, the Compile( ) method caught my error, because the XSD was a well-formed XML document, although it did not constitute a sane XML Schema instance.

At the end of the program, you'll notice that the entire XSD is loaded. Although it is not valid, it sits in memory, ready to be used. Rather than editing the schema on disk, you could have used the XmlSchema type's methods to work with it and make it valid, as you'll see later in this chapter.
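
As a rough illustration of repairing a schema in memory, the sketch below reads a small schema with the same minOccurs/maxOccurs mistake, lowers minOccurs through the schema object model, and then compiles the result. It assumes .NET Framework 2.0 or later and uses XmlSchemaSet for compilation; the schema string and the RepairSchema class are hypothetical, invented for this example, and the walk only looks at elements directly inside a top-level element's xs:sequence.

```csharp
using System;
using System.IO;
using System.Xml.Schema;

public class RepairSchema {
  // A deliberately broken schema: minOccurs is greater than maxOccurs.
  public const string Broken =
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
    "<xs:element name='po'><xs:complexType><xs:sequence>" +
    "<xs:element name='items' type='xs:string' minOccurs='2' maxOccurs='1' />" +
    "</xs:sequence></xs:complexType></xs:element></xs:schema>";

  public static XmlSchema Load(string xsd) {
    return XmlSchema.Read(new StringReader(xsd), null);
  }

  // Compiles the schema in a fresh XmlSchemaSet and returns the error count.
  public static int CountErrors(XmlSchema schema) {
    int errors = 0;
    XmlSchemaSet set = new XmlSchemaSet();
    set.ValidationEventHandler += delegate(object sender, ValidationEventArgs e) {
      errors++;
    };
    set.Add(schema);
    set.Compile();
    return errors;
  }

  // Lowers minOccurs to maxOccurs wherever it is too large.
  public static void Repair(XmlSchema schema) {
    foreach (XmlSchemaObject item in schema.Items) {
      XmlSchemaElement element = item as XmlSchemaElement;
      if (element == null) continue;
      XmlSchemaComplexType type = element.SchemaType as XmlSchemaComplexType;
      if (type == null) continue;
      XmlSchemaSequence sequence = type.Particle as XmlSchemaSequence;
      if (sequence == null) continue;
      foreach (XmlSchemaObject child in sequence.Items) {
        XmlSchemaParticle particle = child as XmlSchemaParticle;
        if (particle != null && particle.MinOccurs > particle.MaxOccurs) {
          particle.MinOccurs = particle.MaxOccurs;
        }
      }
    }
  }

  public static void Main(string[] args) {
    XmlSchema schema = Load(Broken);
    Repair(schema);
    Console.WriteLine("Errors after repair: " + CountErrors(schema));
  }
}
```

Whether lowering minOccurs is the right repair depends on what the document is supposed to allow; the point is only that the loaded XmlSchema can be changed and compiled again without touching the file on disk.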

You can now use the generated XSD, with the changes to correct my errors, to validate the document that was used to generate it. Example 8-3 shows a program that validates an XML document with an XSD, with a couple of interesting lines highlighted.

Example 8-3. Validation of an XML file with an XML Schema

using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class Validate {

  private static bool valid = true;

  public static void Main(string [ ] args) {

    using (Stream stream = File.OpenRead(args[0])) {
      XmlValidatingReader reader = new XmlValidatingReader(new XmlTextReader(stream));
      reader.ValidationType = ValidationType.Schema;
      reader.Schemas.Add("", args[1]);
      reader.ValidationEventHandler += new ValidationEventHandler(Handler);

      while (reader.Read( )) {
        // do nothing
      }
    }
    if (valid) {
      Console.WriteLine("Document is valid.");
    }
  }

  public static void Handler(object sender, ValidationEventArgs e) {
    valid = false;
    Console.WriteLine(e.Message);
  }
}

Take a look at the lines that are highlighted in the example:

reader.ValidationType = ValidationType.Schema;

This line sets the XmlValidatingReader's ValidationType property to ValidationType.Schema. As I mentioned in the discussion of validation by DTD in Chapter 2, this alone is not enough to cause the document to be validated; the following line takes care of that:

reader.Schemas.Add("", args[1]);

This line adds the XSD whose name is passed in on the command line to the XmlSchemaCollection in XmlValidatingReader's Schemas property. XmlSchemaCollection is just what it sounds like, a collection of schemas. Its Add( ) method has four overloads. The one used here takes two strings; the first is the namespace URI to which the schema applies, and the second is the name of the XSD file which will be read. Other overloads allow you to add an XmlSchema instance, an XmlReader, or an entire XmlSchemaCollection to the list. The document will be validated with each schema in the XmlSchemaCollection:

while (reader.Read( )) {
  // do nothing
}

These lines read and validate the XML document. Once XmlValidatingReader is told to validate the document, all you have to do is read it and it will be validated. The while loop need not do anything else.

It's worth noting that, had you not validated my faulty XSD before attempting to validate an XML document with it, the same errors would have been found. There are two differences, however. First, only the first error would have been reported, and it would have been thrown as an XmlSchemaException rather than passed to the ValidationEventHandler. Since exceptions are not being caught in this program, that exception would have short-circuited the XmlReader's processing.

Second, the XSD is not explicitly being loaded into memory, so you would not have been given the opportunity to attempt to correct it (assuming your program had a way to do that, of course).
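
XmlValidatingReader was marked obsolete in later versions of the framework. The following sketch shows the same read-and-validate loop using XmlReaderSettings, assuming .NET Framework 2.0 or later. The schema and documents are small hypothetical strings so that the example is self-contained, and the ValidateWithSettings class name is invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidateWithSettings {
  // Hypothetical schema: a po element with a required id attribute.
  public const string Xsd =
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
    "<xs:element name='po'><xs:complexType>" +
    "<xs:attribute name='id' type='xs:string' use='required' />" +
    "</xs:complexType></xs:element></xs:schema>";

  // Validates the document against the schema and returns the error count.
  public static int CountErrors(string xsd, string xml) {
    int errors = 0;
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add("", XmlReader.Create(new StringReader(xsd)));
    settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e) {
      errors++;
      Console.WriteLine(e.Message);
    };
    using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings)) {
      while (reader.Read()) {
        // Reading is enough; validation happens as the reader advances.
      }
    }
    return errors;
  }

  public static void Main(string[] args) {
    Console.WriteLine(CountErrors(Xsd, "<po id='PO1456' />") == 0
      ? "Document is valid." : "Document is invalid.");
  }
}
```

The shape is the same as in Example 8-3: set the validation type, add the schema, attach a handler, and read to the end of the document.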

8.2.2 Generating a Schema from a DLL or Executable

The XSD tool also knows how to generate an XSD from compiled types in a DLL or executable file. When generating a schema, xsd makes certain assumptions about the XSD types of instance variables. For any given CLR type, xsd chooses an XSD type for the schema. Table 8-2 lists each XSD type and its corresponding common language runtime type. In the cases where more than one XSD type maps to a single CLR type, xsd uses one of them as the default; Example 8-5 shows that string fields become xs:string and decimal fields become xs:decimal.

Table 8-2. XSD-to-CLR type mappings

XSD type                                          CLR type
xs:hexBinary, xs:base64Binary                     System.Byte[ ]
xs:boolean                                        System.Boolean
xs:byte                                           System.SByte
xs:normalizedString, xs:ENTITY, xs:ID, xs:IDREF,  System.String
  xs:language, xs:Name, xs:NCName, xs:NMTOKEN,
  xs:NOTATION, xs:string, xs:token
xs:date, xs:gMonthDay, xs:gDay, xs:gYear,         System.DateTime
  xs:gYearMonth, xs:month, xs:time, xs:timePeriod
xs:decimal, xs:integer, xs:negativeInteger,       System.Decimal
  xs:nonNegativeInteger, xs:nonPositiveInteger,
  xs:positiveInteger
xs:double                                         System.Double
xs:ENTITIES, xs:IDREFS, xs:NMTOKENS               System.String[ ]
xs:float                                          System.Single
xs:int                                            System.Int32
xs:long                                           System.Int64
xs:QName                                          System.Xml.XmlQualifiedName
xs:short                                          System.Int16
xs:unsignedByte                                   System.Byte
xs:unsignedInt                                    System.UInt32
xs:unsignedLong                                   System.UInt64
xs:unsignedShort                                  System.UInt16
xs:anyURI                                         System.Uri

Angus Hardware might have a class structure for product listings, such as is shown in Example 8-4. This code can be compiled into the library Product.dll.

Example 8-4. Product type in C#

using System;

public class Address {
  public string [ ] Street;
  public string City;
  public string State;
  public string Zip;
}

public class Manufacturer {
  public string Name;
  public Address [ ] Addresses;
}

public class Product {
  public string Name;
  public string ProductCode;
  public Manufacturer Manufacturer;
  public DateTime DateIntroduced;
  public decimal UnitCost;
}

When you run the command xsd Product.dll, you get the generated XSD shown in Example 8-5.

Example 8-5. Generated XML Schema for Product.dll

<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address" nillable="true" type="Address" />
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="1" name="Street" type="ArrayOfString" />
      <xs:element minOccurs="0" maxOccurs="1" name="City" type="xs:string" />
      <xs:element minOccurs="0" maxOccurs="1" name="State" type="xs:string" />
      <xs:element minOccurs="0" maxOccurs="1" name="Zip" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ArrayOfString">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="string" nillable="true" 
type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Manufacturer" nillable="true" type="Manufacturer" />
  <xs:complexType name="Manufacturer">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="1" name="Name" type="xs:string" />
      <xs:element minOccurs="0" maxOccurs="1" name="Addresses" type="ArrayOfAddress" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ArrayOfAddress">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="Address" nillable="true" 
type="Address" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Product" nillable="true" type="Product" />
  <xs:complexType name="Product">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="1" name="Name" type="xs:string" />
      <xs:element minOccurs="0" maxOccurs="1" name="ProductCode" type="xs:string" />
      <xs:element minOccurs="0" maxOccurs="1" name="Manufacturer" type="Manufacturer" />
      <xs:element minOccurs="1" maxOccurs="1" name="DateIntroduced" type="xs:dateTime" />
      <xs:element minOccurs="1" maxOccurs="1" name="UnitCost" type="xs:decimal" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>

As with the XSD generated from an XML instance, xsd makes a few assumptions here. For example, although you know from your previous usage that an Address element can have at most three Street elements, the XSD does nothing to constrain the number; xsd has created a type called ArrayOfString, whose content is an unbounded number of string elements.
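
A comparable schema can be produced from code with the XmlReflectionImporter and XmlSchemaExporter classes in System.Xml.Serialization. The sketch below is one way to experiment with the type-to-schema mapping without running the tool; it is not a claim about how xsd is implemented. The Address class is a trimmed copy of the one in Example 8-4, and the ExportSchema class name is invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

// Trimmed copy of the Address type from Example 8-4.
public class Address {
  public string[] Street;
  public string City;
}

public class ExportSchema {
  // Builds a schema for the given type and returns it as text.
  public static string Export(Type type) {
    XmlSchemas schemas = new XmlSchemas();
    XmlSchemaExporter exporter = new XmlSchemaExporter(schemas);
    XmlTypeMapping mapping = new XmlReflectionImporter().ImportTypeMapping(type);
    exporter.ExportTypeMapping(mapping);

    StringWriter writer = new StringWriter();
    foreach (XmlSchema schema in schemas) {
      schema.Write(writer);
    }
    return writer.ToString();
  }

  public static void Main(string[] args) {
    Console.WriteLine(Export(typeof(Address)));
  }
}
```

Running it against a type with a string array field is a quick way to see the unbounded ArrayOfString type appear, and to check the effect of any serialization attributes you add.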

You can affect the generated XSD with the judicious use of C# attributes. There are a number of attributes that affect XSD generation, located in the System.Xml.Serialization namespace; a small subset is listed in Table 8-3. Refer to the .NET Framework SDK documentation section entitled "Attributes That Control XML Serialization" for the complete list.

Table 8-3. Attributes affecting XSD generation

Attribute name         Purpose                                         Properties
XmlRootAttribute       Identifies the class, structure, enumeration,   DataType, ElementName,
                       or interface as the root element of an XML      IsNullable, Namespace
                       instance
XmlElementAttribute    Identifies a public field or property as an     DataType, ElementName, Form,
                       element in an XML instance                      IsNullable, Namespace, Type
XmlAttributeAttribute  Identifies a public field or property as an     AttributeName, DataType, Form,
                       attribute in an XML instance                    Namespace, Type

With this information, you can alter the original source code to force the generated schema into a form more to your liking. To take just the Product type from Product.cs, you can alter xsd's output significantly by marking some of its fields as attributes (the file also needs a using directive for System.Xml.Serialization):

public class Product {
  [XmlAttributeAttribute(AttributeName="name")]
  public string Name;
  [XmlAttributeAttribute(AttributeName="productCode")]
  public string ProductCode;
  [XmlElementAttribute(IsNullable=false, ElementName="manufacturer")]
  public Manufacturer Manufacturer;
  [XmlAttributeAttribute(AttributeName="dateIntroduced")]
  public DateTime DateIntroduced;
  [XmlAttributeAttribute(AttributeName="unitCost")]
  public decimal UnitCost;
}

The corresponding element in the generated schema0.xsd now looks like this:

<xs:element name="product" type="Product" />
<xs:complexType name="Product">
  <xs:sequence>
    <xs:element minOccurs="0" maxOccurs="1" name="manufacturer" type="Manufacturer" />
  </xs:sequence>
  <xs:attribute name="name" type="xs:string" />
  <xs:attribute name="productCode" type="xs:string" />
  <xs:attribute name="dateIntroduced" type="xs:dateTime" />
  <xs:attribute name="unitCost" type="xs:decimal" />
</xs:complexType>
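
To see what these attributes do to an actual document, you can serialize an instance with XmlSerializer. The sketch below reuses most of the attributed Product fields shown above; the Manufacturer class is reduced to a single field, and the sample values and the SerializeProduct class are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Manufacturer {
  public string Name;
}

public class Product {
  [XmlAttributeAttribute(AttributeName="name")]
  public string Name;
  [XmlAttributeAttribute(AttributeName="productCode")]
  public string ProductCode;
  [XmlElementAttribute(IsNullable=false, ElementName="manufacturer")]
  public Manufacturer Manufacturer;
  [XmlAttributeAttribute(AttributeName="unitCost")]
  public decimal UnitCost;
}

public class SerializeProduct {
  // Serializes a Product to an XML string.
  public static string Serialize(Product product) {
    XmlSerializer serializer = new XmlSerializer(typeof(Product));
    StringWriter writer = new StringWriter();
    serializer.Serialize(writer, product);
    return writer.ToString();
  }

  public static void Main(string[] args) {
    // Hypothetical sample data.
    Product product = new Product();
    product.Name = "Hammer";
    product.ProductCode = "HW-100";
    product.UnitCost = 9.99m;
    product.Manufacturer = new Manufacturer();
    product.Manufacturer.Name = "Acme";
    Console.WriteLine(Serialize(product));
  }
}
```

The fields marked with XmlAttributeAttribute come out as attributes on the root element, while the Manufacturer field becomes a child element named manufacturer.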

There's much more to learn about serialization, and I'll cover the topic in much more depth in Chapter 9.

8.2.3 Generating Types from a Schema

Once you have an XSD, whether generated by the XSD tool, produced by some other XML editor, or written by hand, the XSD tool can generate source code for types that represent instances of the documents it defines. Running the command xsd customer.xsd /classes generates the C# code shown in Example 8-6.

Example 8-6. Generated C# code for customer.xsd

//------------------------------------------------------------------------------
// <autogenerated>
//     This code was generated by a tool.
//     Runtime Version: 1.0.3705.209
//
//     Changes to this file may cause incorrect behavior and will be lost if 
//     the code is regenerated.
// </autogenerated>
//------------------------------------------------------------------------------

// 
// This source code was auto-generated by xsd, Version=1.0.3705.209.
// 
using System.Xml.Serialization;


/// <remarks/>
[System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
public class Customer {
    
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(DataType="token")]
    public string Name;        
    
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("Address")]
    public CustomerAddress[ ] Address;
    
    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string Id;
}

/// <remarks/>
public class CustomerAddress {
    
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("Street")]
    public string[ ] Street;
    
    /// <remarks/>
    public string City;
    
    /// <remarks/>
    public string State;
    
    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(DataType="token")]
    public string Zip;
}

Notice that although xsd has simply created the types necessary to read and write customer.xml, it has also inserted attributes that serve as hints to the XmlSerializer. These hints enable the XmlSerializer to properly read and write XML documents corresponding to the object instances in memory. They do not affect the storage of the object instance in memory, however. Even though Customer.Name is decorated with an XmlElementAttribute with DataType="token", there is no constraint on the data in memory; however, a document with non-token data in the Customer.Name element is invalid according to the XSD.
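
The following sketch shows how such a generated type might be used. It declares a cut-down Customer class carrying the same kinds of attributes as Example 8-6 and reads an instance with XmlSerializer; the document string and the ReadCustomer class are hypothetical. The last assignment in Main illustrates that nothing checks the in-memory value against the token type.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Cut-down version of the generated Customer type from Example 8-6.
[XmlRootAttribute(Namespace="", IsNullable=false)]
public class Customer {
  [XmlElementAttribute(DataType="token")]
  public string Name;

  [XmlAttributeAttribute(DataType="ID")]
  public string Id;
}

public class ReadCustomer {
  // Deserializes a Customer from an XML string.
  public static Customer Read(string xml) {
    XmlSerializer serializer = new XmlSerializer(typeof(Customer));
    return (Customer) serializer.Deserialize(new StringReader(xml));
  }

  public static void Main(string[] args) {
    // Hypothetical sample document.
    Customer customer = Read("<Customer Id='C42'><Name>Angus Hardware</Name></Customer>");
    Console.WriteLine(customer.Id + ": " + customer.Name);

    // The DataType hint places no constraint on the field itself.
    customer.Name = "  not a\ttoken  ";
  }
}
```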

When I first started building customer.xsd, I pointed out the initial capital letters on element and attribute names. It should be clear now that the fields of the generated Customer and CustomerAddress types have exactly the same names as the elements and attributes in the XSD. By capitalizing the first letters of the names, I've managed to comply with .NET naming conventions without having to change the generated code.

Another way to handle the case issue is through the XmlElementAttribute's ElementName property (or, for attributes, the XmlAttributeAttribute's AttributeName property). If the XML schema has lowercase names, you can conform to the .NET naming standards by setting this property. You would have to edit the generated source code, however, so it's important to consider carefully whether going to this length is worthwhile.
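
A minimal sketch of that renaming follows, using the ElementName and AttributeName properties listed in Table 8-3. It assumes a schema whose names are all lowercase; the Customer class here is hand-written for illustration rather than generated, and the sample values are made up.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// .NET-style member names mapped to lowercase XML names.
[XmlRootAttribute(ElementName="customer")]
public class Customer {
  [XmlElementAttribute(ElementName="name")]
  public string Name;

  [XmlAttributeAttribute(AttributeName="id")]
  public string Id;
}

public class LowercaseNames {
  // Serializes a Customer to an XML string.
  public static string Serialize(Customer customer) {
    XmlSerializer serializer = new XmlSerializer(typeof(Customer));
    StringWriter writer = new StringWriter();
    serializer.Serialize(writer, customer);
    return writer.ToString();
  }

  public static void Main(string[] args) {
    Customer customer = new Customer();
    customer.Name = "Angus Hardware"; // hypothetical sample data
    customer.Id = "C42";
    Console.WriteLine(Serialize(customer));
  }
}
```

The C# members keep their capitalized names, while the serialized document uses customer, name, and id.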

8.2.4 Generating a DataSet Subclass from a Schema

Much like generating classes, xsd can generate DataSet subclasses from an XSD. System.Data.DataSet is a type that represents a group of database tables cached in memory. The System.Data namespace constitutes the ADO.NET architecture, which we'll talk about in Chapter 11.
