[ Team LiB ] |
8.2 Using the XSD Tool

Microsoft provides a tool with the .NET Framework SDK called XSD, the XML Schemas/DataTypes support utility. With this tool, you can generate schemas from source code, compiled assemblies, and XML documents, as well as generating source code in various .NET languages from schemas. The XSD command-line syntax shown here is explained in Table 8-1:

    xsd.exe filename.ext [argument [...]]

As you can see from the command-line syntax, xsd can be used to generate either source code or an XML Schema, based on a variety of inputs. It's a very useful tool for generating an XSD based on a .NET type or XML document you have already written, as well as generating source code for a .NET type based on an existing XSD.

8.2.1 Generating a Schema from an XML Document

xsd can be used to generate a best-guess schema from any XML document. It makes certain assumptions about the structure of your document, based on the data found in the example you provide. For example, it will always set minOccurs to 1 and maxOccurs to unbounded for each element. It will also always use the xs:sequence compositor for lists of elements, even if your example XML document has elements in various orders. This can present the odd situation of the sample document used to generate the XSD failing validation with the XSD generated from it. Finally, the type attribute of each xs:element and xs:attribute element defaults to xs:string. For these reasons, you should never take the generated XSD for granted. Always edit it to make sure it will fit your real requirements.

Using the purchase order document from Chapter 2, you can generate an XSD with the following command line:

    xsd po1456.xml

You can go ahead and run xsd to generate the schema yourself. I've already done so, and tweaked the generated schema to ensure that this XSD validates the PO correctly. These edits are highlighted in Example 8-1. I intentionally introduced a couple of mistakes in my edits. I've done this to point out how XmlSchema validates an XSD, and I'll explain that more in a moment.

Example 8-1. Generated XSD for purchase orders

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema id="NewDataSet" xmlns=""
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
      <xs:element name="po">
        <xs:complexType>
          <xs:attribute name="id" type="xs:ID" />
          <xs:sequence>
            <xs:element name="date">
              <xs:complexType>
                <xs:attribute name="year" type="xs:string" />
                <xs:attribute name="month" type="xs:string" />
                <xs:attribute name="day" type="xs:string" />
              </xs:complexType>
            </xs:element>
            <xs:element name="address" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="name" type="xs:string" msdata:Ordinal="0" />
                  <xs:element name="street" type="xs:string" maxOccurs="3" msdata:Ordinal="1" />
                  <xs:element name="city" type="xs:string" msdata:Ordinal="2" />
                  <xs:element name="state" type="xs:string" msdata:Ordinal="3" />
                  <xs:element name="zip" type="xs:string" msdata:Ordinal="4" />
                </xs:sequence>
                <xs:attribute name="type" type="xs:string" />
              </xs:complexType>
            </xs:element>
            <xs:element name="items" minOccurs="2" maxOccurs="1">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:attribute name="quantity" type="xs:string" />
                      <xs:attribute name="productCode" type="xs:string" />
                      <xs:attribute name="description" type="xs:string" />
                      <xs:attribute name="unitCost" type="xs:string" />
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="NewDataSet" msdata:IsDataSet="true">
        <xs:complexType>
          <xs:choice maxOccurs="unbounded">
            <xs:element ref="po" />
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:schema>

There are a few pieces of this generated XSD that you should note. First is the inclusion of the namespace prefix msdata in the attributes msdata:Ordinal and msdata:IsDataSet.
The urn:schemas-microsoft-com:xml-msdata namespace provides hints to the DataSet class when serializing an XML instance to a database. Second is the NewDataSet element itself. This is used when generating source code for the XSD with the /dataset flag; the resulting source code will provide the definition of a subclass of System.Data.DataSet. I'll address both of these issues in depth in Chapter 9 and Chapter 11.

Given the generated XSD and the modifications to it, you can do two things. First, you can verify that it is a valid XML Schema after the changes. The program shown in Example 8-2 will do just that.

Example 8-2. Validation of an XML Schema

    using System;
    using System.IO;
    using System.Xml.Schema;

    public class ValidateSchema {
      public static void Main(string[] args) {
        ValidationEventHandler handler = new ValidationEventHandler(ValidateSchema.Handler);
        XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);
        schema.Compile(handler);
      }

      public static void Handler(object sender, ValidationEventArgs e) {
        Console.WriteLine(e.Message);
      }
    }

A ValidationEventHandler can be called in two places. The first, checking the XML Schema itself, happens on the following line:

    XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);

XmlSchema.Read( ) reads the content of the XSD from a Stream, TextReader, or XmlReader, and takes a ValidationEventHandler delegate as its second parameter; the ValidationEventHandler is covered in Chapter 2. Any XML validation errors that arise while reading in the file will be reported to the ValidationEventHandler.
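
The handler pattern in Example 8-2 can also be exercised without a schema file on disk. What follows is a minimal sketch, not the book's code. It assumes a current C# compiler (lambdas and generics postdate the framework version discussed here), reads the schema through the TextReader overload of XmlSchema.Read( ), and compiles it with XmlSchemaSet, which later framework versions recommend in place of calling Compile( ) on the XmlSchema directly. The SchemaCheck class, its Check method, and both schema strings are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Schema;

public static class SchemaCheck {
  // Hypothetical sample: a small schema with nothing wrong in it.
  public const string GoodSchema = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='items'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='item' type='xs:string' minOccurs='0' maxOccurs='unbounded' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

  // Hypothetical sample: well-formed XML, but minOccurs exceeds maxOccurs.
  public const string BadSchema = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='items'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='item' type='xs:string' minOccurs='2' maxOccurs='1' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

  // Collects every message raised while reading and compiling the schema text.
  public static List<string> Check(string xsdText) {
    List<string> messages = new List<string>();
    ValidationEventHandler handler = (sender, e) => messages.Add(e.Message);

    // Reading: the handler receives problems found while parsing the XSD.
    XmlSchema schema = XmlSchema.Read(new StringReader(xsdText), handler);

    // Compiling: the same handler receives problems found in the schema itself.
    XmlSchemaSet set = new XmlSchemaSet();
    set.ValidationEventHandler += handler;
    set.Add(schema);
    set.Compile();

    return messages;
  }

  public static void Main() {
    Console.WriteLine("good schema messages: " + Check(GoodSchema).Count);
    foreach (string message in Check(BadSchema)) {
      Console.WriteLine("bad schema: " + message);
    }
  }
}
```

Collecting the messages in a list rather than printing them makes it easy for a caller to decide what to do when a schema turns out to be unusable.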
The second phase, validating the content of the XSD, happens here:

    schema.Compile(handler);

In this phase, the content of the XSD is checked to make sure that it is really a valid instance of XML Schema. Its errors will also be reported to the ValidationEventHandler.

With the XSD in Example 8-1, running this validator will produce the following output:

    C:\Chapter 8>ValidateSchema po.xsd
    The content model of a complex type must consist of 'annotation'(if present) followed by zero or one of 'simpleContent' or 'complexContent' or 'group' or 'choice' or 'sequence' or 'all' followed by zero or more attributes or attributeGroups followed by zero or one anyAttribute. An error occurred at (6, 8).
    minOccurs value cannot be greater than maxOccurs value. An error occurred at (25, 10).

Looking back, I made two mistakes. First, the id attribute of the po element is in the wrong place; within a complex type definition, the xs:attribute element must come after the xs:sequence element. You can move the attribute into its proper place to avoid this error. This validation error was caught by the Read( ) method, because it is a case of the XML itself being invalid.

Second, the items element has minOccurs set to 2 and maxOccurs set to 1. In this case, the Compile( ) method caught my error, because the XSD was a well-formed XML document, although it did not constitute a sane XML Schema instance.

At the end of the program, you'll notice that the entire XSD is loaded. Although it is not valid, it sits in memory, ready to be used. Rather than editing the schema on disk, you could have used the XmlSchema type's methods to work with it and make it valid, as you'll see later in this chapter.

You can now use the generated XSD, with the changes to correct my errors, to validate the document that was used to generate it. Example 8-3 shows a program that validates an XML document with an XSD, with a couple of interesting lines highlighted.

Example 8-3. Validation of an XML file with an XML Schema

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Schema;

    public class Validate {

      private static bool valid = true;

      public static void Main(string[] args) {
        using (Stream stream = File.OpenRead(args[0])) {
          XmlValidatingReader reader = new XmlValidatingReader(new XmlTextReader(stream));
          reader.ValidationType = ValidationType.Schema;
          reader.Schemas.Add("", args[1]);
          reader.ValidationEventHandler += new ValidationEventHandler(Handler);

          while (reader.Read()) {
            // do nothing
          }
        }
        if (valid) {
          Console.WriteLine("Document is valid.");
        }
      }

      public static void Handler(object sender, ValidationEventArgs e) {
        valid = false;
        Console.WriteLine(e.Message);
      }
    }

Take a look at the lines that are highlighted in the example:

    reader.ValidationType = ValidationType.Schema;

This line sets the XmlValidatingReader's ValidationType property to ValidationType.Schema.
As I mentioned in the discussion of validation by DTD in Chapter 2, this alone is not enough to cause the document to be validated; the following line takes care of that:

    reader.Schemas.Add("", args[1]);

This line adds the XSD whose name is passed in on the command line to the XmlSchemaCollection in XmlValidatingReader's Schemas property. XmlSchemaCollection is just what it sounds like, a collection of schemas. Its Add( ) method has four overloads. The one used here takes two strings; the first is the namespace URI to which the schema applies, and the second is the name of the XSD file which will be read. Other overloads allow you to add an XmlSchema instance, an XmlReader, or an entire XmlSchemaCollection to the list. The document will be validated with each schema in the XmlSchemaCollection.

    while (reader.Read()) {
      // do nothing
    }

These lines read and validate the XML document. Once XmlValidatingReader is told to validate the document, all you have to do is read it and it will be validated. The while loop need not do anything else.

It's worth noting that, had you not validated my faulty XSD before attempting to validate an XML document with it, the same errors would have been found. There are two differences, however. First, only the first error would have been reported via an XmlSchemaException, rather than being handled with the ValidationEventHandler. Since exceptions are not being caught in this program, the errors would have short-circuited the XmlReader's processing. Second, the XSD is not explicitly being loaded into memory, so you would not have been given the opportunity to attempt to correct it (assuming your program had a way to do that, of course).

8.2.2 Generating a Schema from a DLL or Executable

The XSD tool also knows how to generate an XSD from compiled types in a DLL or executable file. When generating a schema, xsd makes certain assumptions about the XSD types of instance variables.
For any given CLR type, xsd chooses an XSD type for the schema. Table 8-2 lists each XSD type and its corresponding common language runtime type. In the cases where more than one XSD type maps to a single CLR type, the bold one will be used.
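
The mapping from CLR types to XSD types can also be observed from code. The sketch below is an illustration under stated assumptions rather than a description of how the xsd tool itself is implemented: it uses the XmlReflectionImporter and XmlSchemaExporter classes from System.Xml.Serialization to export a schema for a type and returns the result as text. The Sample type, the SchemaExport class, and the Export method are hypothetical names invented for this example.

```csharp
using System;
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type with one field for each CLR type of interest.
public class Sample {
  public string Name;
  public int Quantity;
  public bool InStock;
  public DateTime DateIntroduced;
  public decimal UnitCost;
}

public static class SchemaExport {
  // Exports an XML Schema for the given type and returns it as a string.
  public static string Export(Type type) {
    XmlSchemas schemas = new XmlSchemas();
    XmlSchemaExporter exporter = new XmlSchemaExporter(schemas);

    // Reflect over the type's public fields and properties,
    // then add the resulting mapping to the schema collection.
    XmlTypeMapping mapping = new XmlReflectionImporter().ImportTypeMapping(type);
    exporter.ExportTypeMapping(mapping);

    StringWriter writer = new StringWriter();
    foreach (XmlSchema schema in schemas) {
      schema.Write(writer);
    }
    return writer.ToString();
  }

  public static void Main() {
    // The type attributes in the output show which XSD type was chosen
    // for each field of the hypothetical Sample type.
    Console.WriteLine(Export(typeof(Sample)));
  }
}
```

Reading the type attribute of each exported element is a quick way to check which XSD type a given field ends up with before committing to a schema.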

Angus Hardware might have a class structure for product listings, such as is shown in Example 8-4. This code can be compiled into the library Product.dll.

Example 8-4. Product type in C#

    using System;

    public class Address {
      public string[] Street;
      public string City;
      public string State;
      public string Zip;
    }

    public class Manufacturer {
      public string Name;
      public Address[] Addresses;
    }

    public class Product {
      public string Name;
      public string ProductCode;
      public Manufacturer Manufacturer;
      public DateTime DateIntroduced;
      public decimal UnitCost;
    }

When you run the command xsd Product.dll, you get the generated XSD shown in Example 8-5.

Example 8-5. Generated XML Schema for Product.dll

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Address" nillable="true" type="Address" />
      <xs:complexType name="Address">
        <xs:sequence>
          <xs:element minOccurs="0" maxOccurs="1" name="Street" type="ArrayOfString" />
          <xs:element minOccurs="0" maxOccurs="1" name="City" type="xs:string" />
          <xs:element minOccurs="0" maxOccurs="1" name="State" type="xs:string" />
          <xs:element minOccurs="0" maxOccurs="1" name="Zip" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="ArrayOfString">
        <xs:sequence>
          <xs:element minOccurs="0" maxOccurs="unbounded" name="string" nillable="true" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
      <xs:element name="Manufacturer" nillable="true" type="Manufacturer" />
      <xs:complexType name="Manufacturer">
        <xs:sequence>
          <xs:element minOccurs="0" maxOccurs="1" name="Name" type="xs:string" />
          <xs:element minOccurs="0" maxOccurs="1" name="Addresses" type="ArrayOfAddress" />
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="ArrayOfAddress">
        <xs:sequence>
          <xs:element minOccurs="0" maxOccurs="unbounded" name="Address" nillable="true" type="Address" />
        </xs:sequence>
      </xs:complexType>
      <xs:element name="Product" nillable="true" type="Product" />
      <xs:complexType name="Product">
        <xs:sequence>
          <xs:element minOccurs="0" maxOccurs="1" name="Name" type="xs:string" />
          <xs:element minOccurs="0" maxOccurs="1" name="ProductCode" type="xs:string" />
          <xs:element minOccurs="0" maxOccurs="1" name="Manufacturer" type="Manufacturer" />
          <xs:element minOccurs="1" maxOccurs="1" name="DateIntroduced" type="xs:dateTime" />
          <xs:element minOccurs="1" maxOccurs="1" name="UnitCost" type="xs:decimal" />
        </xs:sequence>
      </xs:complexType>
    </xs:schema>

As with the XSD generated for an XML instance, a few assumptions are made. For example, although you know from your previous usage that an Address element can only have up to three Street elements, the XSD does nothing to constrain the number; it has created a type called ArrayOfString, whose content is an unbounded number of string elements.

You can affect the generated XSD with the judicious use of C# attributes. There are a number of attributes that affect XSD generation, located in the System.Xml.Serialization namespace; a small subset is listed in Table 8-3. Refer to the .NET Framework SDK documentation section entitled "Attributes That Control XML Serialization" for the complete list.

With this information, you can alter the original source code to force the generated code to appear in a form more to your liking. To take just the Product type from Product.cs, you can alter xsd's output significantly by marking some of its fields as attributes:

    // The serialization attributes live in this namespace.
    using System.Xml.Serialization;

    public class Product {
      [XmlAttributeAttribute(AttributeName="name")]
      public string Name;

      [XmlAttributeAttribute(AttributeName="productCode")]
      public string ProductCode;

      [XmlElementAttribute(IsNullable=false, ElementName="manufacturer")]
      public Manufacturer Manufacturer;

      [XmlAttributeAttribute(AttributeName="dateIntroduced")]
      public DateTime DateIntroduced;

      [XmlAttributeAttribute(AttributeName="unitCost")]
      public decimal UnitCost;
    }

The corresponding element in the generated schema0.xsd now looks like this:

    <xs:element name="product" type="Product" />
    <xs:complexType name="Product">
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="1" name="manufacturer" type="Manufacturer" />
      </xs:sequence>
      <xs:attribute name="name" type="xs:string" />
      <xs:attribute name="productCode" type="xs:string" />
      <xs:attribute name="dateIntroduced" type="xs:dateTime" />
      <xs:attribute name="unitCost" type="xs:decimal" />
    </xs:complexType>

There's much more to learn about serialization, and I'll cover the topic in much more depth in Chapter 9.

8.2.3 Generating Types from a Schema

Once you have an XSD, whether generated by the XSD tool, produced from some other XML editor, or written by hand, the XSD tool can generate source code to use an instance of the document it defines. Running the command xsd customer.xsd /classes generates the C# code shown in Example 8-6.

Example 8-6. Generated C# code for customer.xsd

    //------------------------------------------------------------------------------
    // <autogenerated>
    //     This code was generated by a tool.
    //     Runtime Version: 1.0.3705.209
    //
    //     Changes to this file may cause incorrect behavior and will be lost if
    //     the code is regenerated.
    // </autogenerated>
    //------------------------------------------------------------------------------

    //
    // This source code was auto-generated by xsd, Version=1.0.3705.209.
    //
    using System.Xml.Serialization;

    /// <remarks/>
    [System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
    public class Customer {

      /// <remarks/>
      [System.Xml.Serialization.XmlElementAttribute(DataType="token")]
      public string Name;

      /// <remarks/>
      [System.Xml.Serialization.XmlElementAttribute("Address")]
      public CustomerAddress[] Address;

      /// <remarks/>
      [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
      public string Id;
    }

    /// <remarks/>
    public class CustomerAddress {

      /// <remarks/>
      [System.Xml.Serialization.XmlElementAttribute("Street")]
      public string[] Street;

      /// <remarks/>
      public string City;

      /// <remarks/>
      public string State;

      /// <remarks/>
      [System.Xml.Serialization.XmlElementAttribute(DataType="token")]
      public string Zip;
    }

Notice that xsd has not only created the types necessary to read and write customer.xml; it has also inserted attributes that serve as hints to the XmlSerializer. These hints enable the XmlSerializer to properly read and write XML documents corresponding to the object instances in memory. They do not affect the storage of the object instance in memory, however. Even though Customer.Name is decorated with an XmlElementAttribute with DataType="token", there is no constraint on the data in memory; however, a document with non-token data in the Customer.Name element is invalid according to the XSD.

When I first started building customer.xsd, I pointed out the initial capital letters on element and attribute names. It should be clear now that the fields of the generated Customer and CustomerAddress types have exactly the same names as the elements and attributes in the XSD. By capitalizing the first letters of the names, I've managed to comply with .NET naming convention, without having to change the generated code.
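
To see those serialization hints at work, here is a minimal sketch that round-trips an object through the XmlSerializer. It is an illustration under stated assumptions, not generated code: the Customer type below is a trimmed, hand-written copy of the one in Example 8-6 (the Address field is omitted), and the SerializeDemo class, its ToXml and FromXml methods, and the sample values are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Trimmed copy of the generated Customer type; the Address field is left out.
[XmlRoot(Namespace="", IsNullable=false)]
public class Customer {
  [XmlElement(DataType="token")]
  public string Name;

  [XmlAttribute(DataType="ID")]
  public string Id;
}

public static class SerializeDemo {
  // Writes the object as XML text, following the attribute hints on the type.
  public static string ToXml(Customer customer) {
    XmlSerializer serializer = new XmlSerializer(typeof(Customer));
    StringWriter writer = new StringWriter();
    serializer.Serialize(writer, customer);
    return writer.ToString();
  }

  // Reads XML text back into a new Customer instance.
  public static Customer FromXml(string xml) {
    XmlSerializer serializer = new XmlSerializer(typeof(Customer));
    return (Customer) serializer.Deserialize(new StringReader(xml));
  }

  public static void Main() {
    Customer customer = new Customer();
    customer.Id = "c42";                  // hypothetical sample value
    customer.Name = "Example Customer";   // hypothetical sample value

    string xml = ToXml(customer);
    Console.WriteLine(xml);

    Customer copy = FromXml(xml);
    Console.WriteLine(copy.Id + ": " + copy.Name);
  }
}
```

In the serialized text, Id appears as an attribute of the Customer element and Name as a child element, which is the split the XmlAttributeAttribute and XmlElementAttribute hints ask for.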

8.2.4 Generating a DataSet Subclass from a Schema

Much like generating classes, xsd can generate DataSet subclasses from an XSD. System.Data.DataSet is a type that represents a group of database tables cached in memory. The System.Data namespace constitutes the ADO.NET architecture, which we'll talk about in Chapter 11.