
Chapter 5: XML Integration with ADO .NET

XML in the .NET Framework

XML stands for Extensible Markup Language and was developed by the World Wide Web Consortium (W3C). XML was designed mainly to overcome the limitations of HTML. Microsoft has embraced XML, and it plays a major part in the .NET Framework.

In the previous chapter, we saw how XML is used transparently for communication between XML Web Services. In this chapter, we will learn more about how XML fits into the .NET Framework in general, and we will go into more detail about the integration of XML with ADO .NET.

Architectural Overview and Design Goals

XML integration in the .NET Framework was designed to meet certain goals:

  • Compliance with the W3C standards

  • Extensibility

  • Pluggable architecture

  • Performance

  • Tight integration with ADO .NET

Standards Compliance

.NET fully conforms to the W3C recommended standards of XML, Namespaces, XSLT, XPath, Schema, and the Document Object Model (DOM). Compliance is essential to ensure interoperability across platforms.

  • XSLT: XSL Transformations (XSLT), part of the Extensible Stylesheet Language (XSL), is used to transform the content of a source XML document into a presentation that is tailored specifically to a particular user, medium, or client.

  • XPath: XPath is a query language used for addressing parts of an XML document.

The .NET Framework contains sets of XML classes that support the W3C XML Schema Definition (XSD) language 1.0 recommendation.

Extensibility

Extensibility is achieved through the use of abstract base classes and virtual methods. This extensibility is also referred to as subclassing and is illustrated by the XmlReader, XmlWriter, and XPathNavigator abstract classes. These classes enable new implementations to be developed over different data sources and stores, exposing them as XML. The existing data source and stores can include any file systems, registries, flat file legacy databases, and relational databases. The new implementations not only display the data as XML but also provide XPath query support for those stores.

Pluggable Architecture

XML in the .NET Framework is a stream-based architecture. Pluggable in this architecture means that components that are based on the abstract .NET XML classes can easily be substituted. It also means that if you have data streaming between the components, new components inserted or plugged into the stream can alter the processing. For example, you can plug components together using different data stores, such as an XPathDocument and an XmlDocument, in the transformation process. You could plug in an implementation of your own XmlReader or XmlWriter for processing the output, allowing transformation to and from virtually any data source. To allow the processing of a new data source, simply implement your own XmlReader or XmlWriter for that data source and plug it in.
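As a rough illustration of this substitutability, the sketch below runs the same XPath query over two different stores. It relies on the fact that both XmlDocument and XPathDocument implement IXPathNavigable. The module and function names are hypothetical and chosen only for this example.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath

Public Module PluggableStores

    ' Sample data (a trimmed copy of the chapter's product XML).
    Private Const ProductsXml As String = _
        "<products>" & _
        "<product><productname>Smelly Cheese</productname>" & _
        "<price format=""dollar"">100.99</price></product>" & _
        "</products>"

    ' Works against any store that can hand out an XPathNavigator.
    Public Function FirstProductName(ByVal store As IXPathNavigable) As String
        Dim nav As XPathNavigator = store.CreateNavigator()
        Dim iter As XPathNodeIterator = nav.Select("/products/product/productname")
        If iter.MoveNext() Then
            Return iter.Current.Value
        End If
        Return Nothing
    End Function

    ' Store 1: the editable DOM.
    Public Function FromXmlDocument() As String
        Dim doc As New XmlDocument()
        doc.LoadXml(ProductsXml)
        Return FirstProductName(doc)
    End Function

    ' Store 2: the read-only cache optimized for XPath.
    Public Function FromXPathDocument() As String
        Dim doc As New XPathDocument(New StringReader(ProductsXml))
        Return FirstProductName(doc)
    End Function

End Module
```

The query code in FirstProductName does not know which store it is reading; swapping one store for the other requires no change to it.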

Performance

The XML classes in the .NET Framework represent low-level processing components and are required to have high performance. They are designed to support a streaming-based architecture. For improved performance, they have the following characteristics:

  • Minimal caching for forward-only, pull model parsing with the XmlReader

  • Forward-only validation with the XmlValidatingReader

  • Cursor style navigation of the XPathNavigator, which minimizes node creation to a single virtual node, yet provides random access to the document. It does not require a complete node tree to be built in memory like the DOM.

  • Incremental streaming output from the XslTransform class
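The forward-only, pull model in the first item can be sketched as follows. This is a minimal example that uses XmlTextReader, a concrete XmlReader; the module and function names are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Xml

Public Module PullParsing

    ' Counts element start tags, pulling one node at a time from the reader.
    ' Nothing is cached: once Read() moves on, the previous node is gone.
    Public Function CountElements(ByVal xml As String) As Integer
        Dim reader As New XmlTextReader(New StringReader(xml))
        Dim count As Integer = 0
        Try
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element Then
                    count += 1
                End If
            End While
        Finally
            reader.Close()
        End Try
        Return count
    End Function

End Module
```

Called with the string "<products><product><productname>Smelly Cheese</productname><price>100.99</price></product></products>", CountElements returns 4, and at no point is a node tree held in memory.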

Tight Integration with ADO .NET

In the .NET Framework, relational data and XML are coupled through tight integration between the XML classes and ADO .NET. The DataSet component in ADO .NET can read and write XML using the XmlReader and XmlWriter classes. It can also persist its relational schema as an XML Schema and infer the schema structure from an XML document. A DataSet and an XmlDataDocument can be synchronized so that changes in one are reflected in the other. We will learn more about XML integration with ADO .NET later in this chapter.
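A minimal sketch of this round trip is shown below, assuming a DataSet that is given no schema in advance and is therefore left to infer one. The second product row ("Mild Cheese") and the module and function names are hypothetical and exist only for illustration.

```vb
Imports System
Imports System.Data
Imports System.IO

Public Module DataSetXml

    ' Reads XML into a DataSet; with no schema supplied,
    ' ReadXml infers the tables and columns from the document.
    Public Function LoadProducts() As DataSet
        Dim xml As String = _
            "<products>" & _
            "<product><productname>Smelly Cheese</productname><price>100.99</price></product>" & _
            "<product><productname>Mild Cheese</productname><price>20.50</price></product>" & _
            "</products>"
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(xml))
        Return ds
    End Function

    ' Returns the relational schema of the DataSet as an XML Schema (XSD) string.
    Public Function SchemaAsXsd(ByVal ds As DataSet) As String
        Return ds.GetXmlSchema()
    End Function

    ' Writes the current rows back out as XML.
    Public Function DataAsXml(ByVal ds As DataSet) As String
        Dim writer As New StringWriter()
        ds.WriteXml(writer)
        Return writer.ToString()
    End Function

End Module
```

After LoadProducts runs, the repeating product elements appear as rows of a table named product, with productname and price as columns.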

DOM: The XML Document Object Model

The Document Object Model (DOM) is an in-memory representation of an XML document, which allows you to programmatically read, manipulate, and modify XML documents. In .NET, the DOM is represented by the XmlDocument class. Editing is the primary function of the DOM. It is the structured way that XML data is represented in memory, even though the actual XML data is stored in a linear fashion when in a file or in an XML stream from another object.

The DOM is represented as a tree. The basic element of the DOM tree is a node, which is represented in .NET by an XmlNode object. Consider the following XML data.

<?xml version="1.0"?>
  <products>

    <product>
      <productname>Smelly Cheese</productname>
      <price format="dollar">100.99</price>
      <expirydate>01/01/2009</expirydate>
    </product>

    <supplierinfo>
      <supplier>Good Cheese Express</supplier>
      <state>WA</state>
    </supplierinfo>

  </products>

Figure 5-1 shows the DOM tree for this XML data.

In the illustration, each circle represents a node. Node objects have a set of methods and properties, as well as some basic characteristics:

  • Nodes have a single parent and most can have multiple child nodes.

  • There are different types of nodes that can have multiple child nodes:

    • Document

    • DocumentFragment

    • EntityReference

    • Element

    • Attribute

  • There are a few types of nodes that cannot have child nodes:

    • XmlDeclaration

    • Notation

    • Entity

    • CDATASection

    • Text

    • Comment

    • ProcessingInstruction

    • DocumentType

  • An Attribute is a special case: it has no parent node and no siblings in the tree, because it belongs to its element rather than to the tree structure.

    Note 

    Attributes, although defined as nodes by the W3C standards, are better considered a property of an element node. Attributes are made up of a name and value pair (for example, format="dollar").

Nodes on the same level in the DOM tree are siblings, such as the product node and the supplierinfo node in Figure 5-1.

Figure 5-1: The DOM tree

The XmlDocument class extends XmlNode and supports methods for performing operations on the document as a whole, such as loading it into memory or saving the XML to a file. In addition, XmlDocument provides a means to view and manipulate the nodes in the entire XML document.

Note 

For optimization purposes, if you do not require the structure or editing capabilities provided by the XmlDocument class, the XmlReader and XmlWriter classes provide non-cached, forward-only stream access to XML.
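The node relationships described above can be explored with a short sketch like the following, which loads the chapter's sample data (minus the expirydate element, for brevity). The module and function names are hypothetical.

```vb
Imports System
Imports System.Xml

Public Module DomWalk

    ' Builds the DOM tree for a trimmed copy of the sample data.
    Public Function LoadSample() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<products>" & _
            "<product><productname>Smelly Cheese</productname>" & _
            "<price format=""dollar"">100.99</price></product>" & _
            "<supplierinfo><supplier>Good Cheese Express</supplier>" & _
            "<state>WA</state></supplierinfo>" & _
            "</products>")
        Return doc
    End Function

    ' Name of the node that follows <product> at the same level of the tree.
    Public Function SiblingOfProduct(ByVal doc As XmlDocument) As String
        Dim product As XmlNode = doc.DocumentElement.FirstChild
        Return product.NextSibling.Name
    End Function

    ' The format attribute as a node. It is reached through its element,
    ' not by walking child or sibling links.
    Public Function PriceFormat(ByVal doc As XmlDocument) As XmlAttribute
        Dim price As XmlElement = _
            CType(doc.SelectSingleNode("/products/product/price"), XmlElement)
        Return price.GetAttributeNode("format")
    End Function

End Module
```

For the attribute returned by PriceFormat, ParentNode is Nothing, while OwnerElement leads back to the price element. This is the sense in which an attribute behaves more like a property of its element than like a member of the tree.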

Nodes in .NET

Since a node is the basic structure for the DOM, let’s look at the different node types that .NET supports in more detail.

| W3C DOM Node Type | .NET Class | Description |
|---|---|---|
| Document | XmlDocument | The container of all the nodes in the tree. It is also known as the document root, which is not always the same as the root element. |
| DocumentFragment | XmlDocumentFragment | A temporary bag containing one or more nodes without any tree structure |
| DocumentType | XmlDocumentType | Represents the <!DOCTYPE…> node |
| EntityReference | XmlEntityReference | Represents the non-expanded entity reference text |
| Element | XmlElement | Represents an element node |
| Attr | XmlAttribute | An attribute of an element. The GetAttributeNode method of an XmlElement returns it as a node; GetAttribute returns only its value. |
| ProcessingInstruction | XmlProcessingInstruction | A processing instruction node |
| Comment | XmlComment | A comment node |
| Text | XmlText | Text belonging to an element or attribute |
| CDATASection | XmlCDataSection | Represents CDATA |
| Entity | XmlEntity | Represents the <!ENTITY…> declarations in an XML document, either from an internal document type definition (DTD) subset or from external DTDs and parameter entities |
| Notation | XmlNotation | Represents a notation declared in the DTD |
| Not in W3C specification | XmlDeclaration | Represents the declaration node <?xml version="1.0"…> |
| Not in W3C specification | XmlSignificantWhitespace | Represents significant white space, which is white space in mixed content |
| Not in W3C specification | XmlWhitespace | Represents the white space in the content of an element |
| Not in W3C specification | EndElement (not a class) | Returned when XmlReader gets to the end of an element (for example, XML: </item>) |
| Not in W3C specification | EndEntity (not a class) | Returned when XmlReader gets to the end of the entity replacement as a result of a call to ResolveEntity |

Loading XML Documents in the DOM

XML information is read into memory from different formats or sources. These can be a stream, URL, text reader, XmlReader object, or derived class of the reader. The Load method loads the document into memory. It is an overloaded method that can take data from each of the different formats. There is also a LoadXml method that reads XML from a string, which is the method we will be using in the following example.

Imports System
Imports System.IO
Imports System.Xml

Public Class Sample

    Public Shared Sub Main()
        'Create the XmlDocument.
        Dim doc As New XmlDocument()
        Dim XmlString As String
        'Define the XmlString
        XmlString = _
            "<?xml version=""1.0""?>" & _
            "<products>" & _
            "<product>" & _
            "<productname>Smelly Cheese</productname>" & _
            "<price format=""dollar"">100.99</price>"  & _
            "<expirydate>01/01/2009</expirydate>" & _
            "</product>" & _
            "<supplierinfo>" & _
            "<supplier>Good Cheese Express</supplier>" & _
            "<state>WA</state>" & _
            "</supplierinfo>" & _
            "</products>"

        'Load the DOM
        doc.LoadXml(XmlString)

        'Save the document to a file.
        doc.Save("Smelly Cheese data.xml")

    End Sub 'Main

End Class 'Sample

The above example does not do much; it just creates the DOM from a string and saves it to a file. Notice that you need the System.Xml namespace to be able to use the XML classes. System.IO is imported as well, although calling Save with a file name does not strictly require it.
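The other overloads of the Load method mentioned earlier can be sketched in the same way. The example below feeds the same XML to Load through a text reader, a stream, and an XmlReader; the module and function names are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Public Module LoadOverloads

    Private Const SampleXml As String = _
        "<products><product><productname>Smelly Cheese</productname></product></products>"

    ' Load from a TextReader.
    Public Function FromTextReader() As XmlDocument
        Dim doc As New XmlDocument()
        doc.Load(New StringReader(SampleXml))
        Return doc
    End Function

    ' Load from a Stream (here an in-memory stream of UTF-8 bytes).
    Public Function FromStream() As XmlDocument
        Dim doc As New XmlDocument()
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(SampleXml)
        doc.Load(New MemoryStream(bytes))
        Return doc
    End Function

    ' Load from an XmlReader (here an XmlTextReader).
    Public Function FromXmlReader() As XmlDocument
        Dim doc As New XmlDocument()
        doc.Load(New XmlTextReader(New StringReader(SampleXml)))
        Return doc
    End Function

End Module
```

Each function produces an equivalent DOM; only the source of the XML differs.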

Validating XML Documents

Validation makes sure that an XML document, in addition to being well-formed, follows the rules required of it. XML documents can be validated against a document type definition (DTD) or an XML Schema.

DTD: The XML Document Type Definition

The document type definition (DTD) is used to validate XML documents. It is the original schema definition language for XML. DTDs have their own syntax and rules, which are different from XML. In XML documents, the <!DOCTYPE> declaration is used to link the document to a DTD. DTDs are somewhat limited when compared to the more flexible XML Schema.

In .NET, the XmlValidatingReader class is used to validate an XML document against an inline DTD section or an external DTD file. To perform validation against a document type definition, XmlValidatingReader uses the DTD defined in the DOCTYPE declaration of an XML document. The DOCTYPE declaration can either contain an inline DTD or reference an external DTD file.
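A minimal sketch of inline DTD validation follows. The DTD content, the module name, and the function names are assumptions made for this example. Note also that later versions of the .NET Framework mark XmlValidatingReader as obsolete in favor of XmlReader.Create with XmlReaderSettings, so this code may compile with a warning there.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Public Module DtdValidation

    ' Hypothetical inline DTD: every product needs a productname and a price.
    Private Const InlineDtd As String = _
        "<!DOCTYPE products [" & _
        "<!ELEMENT products (product*)>" & _
        "<!ELEMENT product (productname, price)>" & _
        "<!ELEMENT productname (#PCDATA)>" & _
        "<!ELEMENT price (#PCDATA)>" & _
        "]>"

    Private errorCount As Integer

    ' Called once for each validation problem the reader finds.
    Private Sub OnValidationEvent(ByVal sender As Object, ByVal e As ValidationEventArgs)
        errorCount += 1
    End Sub

    ' Returns the number of validation problems reported for the given body.
    Public Function CountDtdErrors(ByVal body As String) As Integer
        errorCount = 0
        Dim source As New XmlTextReader(New StringReader(InlineDtd & body))
        Dim validator As New XmlValidatingReader(source)
        validator.ValidationType = ValidationType.DTD
        AddHandler validator.ValidationEventHandler, AddressOf OnValidationEvent
        Try
            ' Validation happens as a side effect of reading through the document.
            While validator.Read()
            End While
        Finally
            validator.Close()
        End Try
        Return errorCount
    End Function

End Module
```

Because a handler is attached, validation problems are reported through the event rather than thrown as exceptions, so the whole document is read even when it is invalid. A product element without a price, for instance, would be reported as a problem.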

SOM: The XML Schema Object Model

The Schema Object Model (SOM) classes provide an in-memory representation of an XML Schema, which allows you to create and validate XML documents. XML Schemas are similar to data modeling in a relational database in that they provide a way to define the structure of XML documents. This is achieved by specifying the elements that can be used in the documents, including the structure and types that these elements must follow. The schema itself is an XML file, typically with an .xsd file extension. XML Schemas provide some advantages over document type definitions:

  • Additional data types

  • Ability to create custom data types

  • Schema uses XML syntax

  • Schema supports object-oriented concepts like polymorphism and inheritance

In .NET, SOM facilities are provided by a set of classes in the System.Xml.Schema namespace.

The World Wide Web Consortium (W3C) schema recommendation specifies the data types that can be used in XML Schemas. In .NET, these data types are represented as XmlSchemaDatatype objects. An XmlSchemaDatatype object contains the ValueType property, which holds the equivalent .NET data type, and the TokenizedType property, which holds the token type of the string as specified in the W3C XML 1.0 recommendation. The table below shows the equivalent .NET data type for each XML Schema data type:

| XML Schema Data Type | .NET Framework Data Type |
|---|---|
| anyURI | System.Uri |
| base64Binary | System.Byte[] |
| boolean | System.Boolean |
| byte | System.SByte |
| date | System.DateTime |
| dateTime | System.DateTime |
| decimal | System.Decimal |
| double | System.Double |
| duration | System.TimeSpan |
| ENTITIES | System.String[] |
| ENTITY | System.String |
| float | System.Single |
| gDay | System.DateTime |
| gMonthDay | System.DateTime |
| gYear | System.DateTime |
| gYearMonth | System.DateTime |
| hexBinary | System.Byte[] |
| ID | System.String |
| IDREF | System.String |
| IDREFS | System.String[] |
| int | System.Int32 |
| integer | System.Decimal |
| language | System.String |
| long | System.Int64 |
| month | System.DateTime |
| Name | System.String |
| NCName | System.String |
| negativeInteger | System.Decimal |
| NMTOKEN | System.String |
| NMTOKENS | System.String[] |
| nonNegativeInteger | System.Decimal |
| nonPositiveInteger | System.Decimal |
| normalizedString | System.String |
| NOTATION | System.String |
| positiveInteger | System.Decimal |
| QName | System.Xml.XmlQualifiedName |
| short | System.Int16 |
| string | System.String |
| time | System.DateTime |
| timePeriod | System.DateTime |
| token | System.String |
| unsignedByte | System.Byte |
| unsignedInt | System.UInt32 |
| unsignedLong | System.UInt64 |
| unsignedShort | System.UInt16 |

The XmlSchemaElement class has an ElementType property and the XmlSchemaAttribute class has an AttributeType property. Once the schema has been validated and compiled, each of these can contain an XmlSchemaDatatype object.
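The mapping from XML Schema types to .NET types can be checked with a sketch like the one below. It assumes a later framework version than the classes named above: XmlSchemaSet and the ElementSchemaType property were introduced in .NET Framework 2.0. The schema content (including the quantity element) and the module and function names are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Public Module SchemaTypes

    ' Hypothetical schema with three global elements of built-in types.
    Private Const SampleXsd As String = _
        "<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">" & _
        "<xs:element name=""price"" type=""xs:decimal""/>" & _
        "<xs:element name=""quantity"" type=""xs:int""/>" & _
        "<xs:element name=""expirydate"" type=""xs:date""/>" & _
        "</xs:schema>"

    ' Returns the .NET type that a global element's schema type maps to.
    Public Function ClrTypeOf(ByVal elementName As String) As Type
        Dim schemas As New XmlSchemaSet()
        schemas.Add(XmlSchema.Read(New StringReader(SampleXsd), Nothing))
        ' Type information is only available after compilation.
        schemas.Compile()
        Dim element As XmlSchemaElement = _
            CType(schemas.GlobalElements(New XmlQualifiedName(elementName)), XmlSchemaElement)
        Return element.ElementSchemaType.Datatype.ValueType
    End Function

End Module
```

With this schema, ClrTypeOf("price") yields System.Decimal, ClrTypeOf("quantity") yields System.Int32, and ClrTypeOf("expirydate") yields System.DateTime, in line with the table.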
