XML stands for Extensible Markup Language and was developed by the World Wide Web Consortium (W3C). XML was designed mainly to overcome the limitations of HTML. Microsoft has embraced XML, and it plays a major part in the .NET Framework.
In the previous chapter, we saw how XML is transparently used for communication between XML Web Services. In this chapter, we will learn more about how XML fits into the .NET Framework in general, and we will go into more detail about the integration of XML in ADO .NET.
XML integration in the .NET Framework was designed to meet certain goals:
Compliance with the W3C standards
Extensibility
Pluggable architecture
Performance
.NET fully conforms to the W3C recommended standards of XML, Namespaces, XSLT, XPath, Schema, and the Document Object Model (DOM). Compliance is essential to ensure interoperability across platforms.
XSLT: Extensible Stylesheet Language Transformations (XSLT) is used to transform the content of a source XML document into a presentation that is tailored specifically to a particular user, medium, or client.
XPath: XPath is a query language used for addressing parts of an XML document.
Schema: The .NET Framework contains a set of XML classes that support the W3C XML Schema Definition (XSD) language 1.0 recommendation.
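As a rough illustration of XPath addressing, the sketch below runs one query over a small in-memory document. The class name `XPathSample`, the second product ("Mild Cheese"), and its price are hypothetical values invented for the example; the query selects only those `productname` elements whose sibling `price` is greater than 50.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath

Public Class XPathSample
    Public Shared Sub Main()
        ' Hypothetical sample data, modeled on the product document in this chapter.
        Dim xmlText As String = _
            "<products>" & _
            "<product><productname>Smelly Cheese</productname><price>100.99</price></product>" & _
            "<product><productname>Mild Cheese</productname><price>20.50</price></product>" & _
            "</products>"

        ' XPathDocument is a read-only store intended for XPath queries.
        Dim doc As New XPathDocument(New StringReader(xmlText))
        Dim nav As XPathNavigator = doc.CreateNavigator()

        ' Address every productname whose sibling price is above 50.
        Dim matches As XPathNodeIterator = _
            nav.Select("/products/product[price > 50]/productname")

        ' Print the text value of each selected node.
        While matches.MoveNext()
            Console.WriteLine(matches.Current.Value)
        End While
    End Sub
End Class
```

With the sample data above, only "Smelly Cheese" satisfies the predicate.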
Extensibility is achieved through the use of abstract base classes and virtual methods. This form of extensibility is also referred to as subclassing and is illustrated by the XmlReader, XmlWriter, and XPathNavigator abstract classes. These classes enable new implementations to be developed over different data sources and stores, exposing them as XML. Such data sources and stores can include file systems, registries, flat-file legacy databases, and relational databases. The new implementations not only expose the data as XML but also provide XPath query support for those stores.
XML in the .NET Framework is a stream-based architecture. Pluggable in this architecture means that components that are based on the abstract .NET XML classes can easily be substituted for one another. It also means that if you have data streaming between components, new components inserted or plugged into the stream can alter the processing. For example, you can plug in different data stores, such as an XPathDocument or an XmlDocument, as the input to the transformation process. You could also plug in your own implementation of XmlReader or XmlWriter to process the input or output, allowing transformation to and from virtually any data source. To allow the processing of a new data source, simply implement your own XmlReader or XmlWriter for that data source and plug it in.
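The idea of swapping stores can be sketched as follows. This is a minimal example, assuming the XslTransform class named in this chapter; the class name `PluggableSample`, the helper `Render`, and the tiny stylesheet are made up for illustration. Both XmlDocument and XPathDocument implement IXPathNavigable, so the same transformation code accepts either one. Later versions of the .NET Framework mark XslTransform and some of these overloads as obsolete, so expect compiler warnings there.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath
Imports System.Xml.Xsl

Public Class PluggableSample
    ' Runs the same stylesheet over any store that implements IXPathNavigable.
    Private Shared Function Render(ByVal store As IXPathNavigable, _
                                   ByVal xslt As XslTransform) As String
        Dim output As New StringWriter()
        xslt.Transform(store, New XsltArgumentList(), output)
        Return output.ToString()
    End Function

    Public Shared Sub Main()
        Dim xmlText As String = _
            "<products><product><productname>Smelly Cheese</productname></product></products>"

        ' Hypothetical stylesheet that writes the product name as plain text.
        Dim xslText As String = _
            "<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">" & _
            "<xsl:output method=""text""/>" & _
            "<xsl:template match=""/"">" & _
            "<xsl:value-of select=""products/product/productname""/>" & _
            "</xsl:template>" & _
            "</xsl:stylesheet>"

        Dim xslt As New XslTransform()
        xslt.Load(New XmlTextReader(New StringReader(xslText)))

        ' Store 1: the editable DOM.
        Dim dom As New XmlDocument()
        dom.LoadXml(xmlText)

        ' Store 2: the read-only, XPath-oriented cache.
        Dim cache As New XPathDocument(New StringReader(xmlText))

        ' The transformation code does not care which store is plugged in.
        Console.WriteLine(Render(dom, xslt))
        Console.WriteLine(Render(cache, xslt))
    End Sub
End Class
```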
The XML classes in the .NET Framework represent low-level processing components and are required to have high performance. They are designed to support a streaming-based architecture. For improved performance, they have the following characteristics:
Minimal caching for forward-only, pull model parsing with the XmlReader
Forward-only validation with the XmlValidatingReader
Cursor style navigation of the XPathNavigator, which minimizes node creation to a single virtual node, yet provides random access to the document. It does not require a complete node tree to be built in memory like the DOM.
Incremental streaming output from the XslTransform class
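The forward-only, pull-model parsing in the first item of the list above can be sketched with XmlTextReader, a concrete XmlReader. The class name `PullSample` is hypothetical. The caller pulls one node at a time with Read, and the reader never builds a tree of the nodes it has already passed.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Public Class PullSample
    Public Shared Sub Main()
        Dim xmlText As String = _
            "<products><product>" & _
            "<productname>Smelly Cheese</productname>" & _
            "<price format=""dollar"">100.99</price>" & _
            "</product></products>"

        Dim reader As New XmlTextReader(New StringReader(xmlText))
        Try
            ' Each Read() call advances to the next node; there is no way back.
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element Then
                    Console.WriteLine("Element: " & reader.Name)
                ElseIf reader.NodeType = XmlNodeType.Text Then
                    Console.WriteLine("  Text: " & reader.Value)
                End If
            End While
        Finally
            reader.Close()
        End Try
    End Sub
End Class
```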
In the .NET Framework, relational data and XML are coupled through tight integration between the XML classes and ADO .NET. The DataSet component in ADO .NET can read and write XML using the XmlReader and XmlWriter classes. It can also persist its relational schema as an XML Schema and infer the schema structure from an XML document. A DataSet and an XmlDataDocument can be synchronized so that changes in one are reflected in the other. We will learn more about XML integration with ADO .NET later in this chapter.
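A minimal sketch of this round trip, assuming nothing beyond the DataSet methods ReadXml, WriteXmlSchema, and WriteXml, might look like the following. The class name `DataSetXmlSample` is hypothetical, and the exact tables that the DataSet infers depend on the shape of the input document.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Public Class DataSetXmlSample
    Public Shared Sub Main()
        Dim xmlText As String = _
            "<products><product>" & _
            "<productname>Smelly Cheese</productname>" & _
            "<price>100.99</price>" & _
            "</product></products>"

        ' With no schema supplied, ReadXml infers a relational structure from the XML.
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(xmlText))

        For Each table As DataTable In ds.Tables
            Console.WriteLine("Table: " & table.TableName & _
                              ", rows: " & table.Rows.Count.ToString())
        Next

        ' Persist the inferred relational schema as an XML Schema (XSD).
        Dim schemaWriter As New StringWriter()
        ds.WriteXmlSchema(schemaWriter)
        Console.WriteLine(schemaWriter.ToString())

        ' Write the data itself back out as XML.
        Dim dataWriter As New StringWriter()
        ds.WriteXml(dataWriter)
        Console.WriteLine(dataWriter.ToString())
    End Sub
End Class
```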
The Document Object Model (DOM) is an in-memory representation of an XML document that allows you to programmatically read, manipulate, and modify the document. In .NET, the DOM is represented by the XmlDocument class. Editing is the primary function of the DOM. It is the structured way that XML data is represented in memory, even though the actual XML data is stored linearly in a file or arrives as a linear XML stream from another object.
The DOM is represented as a tree. The basic element of the DOM tree is a node, which is represented in .NET by an XmlNode object. Consider the following XML data.
    <?xml version="1.0"?>
    <products>
      <product>
        <productname>Smelly Cheese</productname>
        <price format="dollar">100.99</price>
        <expirydate>01/01/2009</expirydate>
      </product>
      <supplierinfo>
        <supplier>Good Cheese Express</supplier>
        <state>WA</state>
      </supplierinfo>
    </products>
The illustration below shows the DOM tree for the XML data:
In the illustration, each circle represents a node. Node objects have a set of methods and properties, as well as some basic characteristics:
Nodes have a single parent and most can have multiple child nodes.
There are different types of nodes that can have multiple child nodes:
Document
DocumentFragment
EntityReference
Element
Attribute
There are a few types of nodes that cannot have child nodes:
XmlDeclaration
Notation
Entity
CDATASection
Text
Comment
ProcessingInstruction
DocumentType
Attribute is one special case: an attribute node can have child nodes (its text and entity references), but it does not have a parent or siblings.
Note |
Attributes, although defined as nodes by the W3C standards, are better considered a property of an element node. Attributes are made up of a name and value pair (for example, format="dollar"). |
Nodes on the same level in the DOM tree are siblings, such as the product node and the supplierinfo node in Figure 5-1.
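These relationships can be checked in code. The sketch below is only an illustration; the class name `NodeRelationsSample` and the variable names are hypothetical. It uses a trimmed-down version of the sample document to follow parent and sibling links and to show that an attribute node reports no parent, even though it belongs to an element.

```vbnet
Imports System
Imports System.Xml

Public Class NodeRelationsSample
    Public Shared Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<products>" & _
            "<product><price format=""dollar"">100.99</price></product>" & _
            "<supplierinfo><state>WA</state></supplierinfo>" & _
            "</products>")

        ' The first child of the root element is the product node.
        Dim product As XmlNode = doc.DocumentElement.FirstChild
        Console.WriteLine("Node:    " & product.Name)
        Console.WriteLine("Parent:  " & product.ParentNode.Name)
        Console.WriteLine("Sibling: " & product.NextSibling.Name)

        ' An attribute is reached through its element, not through ChildNodes.
        Dim price As XmlElement = CType(product.FirstChild, XmlElement)
        Dim formatAttr As XmlAttribute = price.GetAttributeNode("format")

        ' ParentNode is Nothing for attributes; OwnerElement gives the element.
        Console.WriteLine("Attribute has parent: " & _
                          (Not formatAttr.ParentNode Is Nothing).ToString())
        Console.WriteLine("Owner element: " & formatAttr.OwnerElement.Name)

        ' The attribute value is held in a child text node.
        Console.WriteLine("Attribute child type: " & _
                          formatAttr.FirstChild.NodeType.ToString())
    End Sub
End Class
```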
The XmlDocument class extends XmlNode and supports methods for performing operations on the document as a whole, such as loading the XML into memory or saving it to a file. In addition, XmlDocument provides a means to view and manipulate the nodes in the entire XML document.
Since a node is the basic structure for the DOM, let’s look at the different node types that .NET supports in more detail.
W3C DOM Node Type | .NET Class | Description |
---|---|---|
Document | XmlDocument | The container of all the nodes in the tree. It is also known as the document root, which is not always the same as the root element. |
DocumentFragment | XmlDocumentFragment | A temporary bag containing one or more nodes without any tree structure |
DocumentType | XmlDocumentType | Represents the <!DOCTYPE…> node |
EntityReference | XmlEntityReference | Represents the non-expanded entity reference text |
Element | XmlElement | Represents an element node |
Attr | XmlAttribute | An attribute of an element, accessed using the GetAttributeNode method of an XmlElement (GetAttribute returns only the attribute value) |
ProcessingInstruction | XmlProcessingInstruction | A processing instruction node |
Comment | XmlComment | A comment node |
Text | XmlText | Text belonging to an element or attribute |
CDATASection | XmlCDataSection | Represents CDATA |
Entity | XmlEntity | Represents the <!ENTITY…> declarations in an XML document, either from an internal document type definition (DTD) subset or from external DTDs and parameter entities |
Notation | XmlNotation | Represents a notation declared in the DTD |
Not in W3C specification | XmlDeclaration | Represents the declaration node <?xml version="1.0"…> |
Not in W3C specification | XmlSignificantWhitespace | Represents significant white space, which is white space in mixed content |
Not in W3C specification | XmlWhitespace | Represents the white space in the content of an element |
Not in W3C specification | EndElement (not a class) | Returned when XmlReader gets to the end of an element (for example, XML: </item>) |
Not in W3C specification | EndEntity (not a class) | Returned when XmlReader gets to the end of the entity replacement as a result of a call to ResolveEntity |
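To see how these node types map onto the .NET classes for a concrete document, a small recursive walk is enough. The following is a sketch only; the class name `NodeTypeSample` and the helper `Dump` are hypothetical. It prints the NodeType value and the runtime class of every node reachable through ChildNodes. Attributes are not part of ChildNodes, so they do not appear in this listing.

```vbnet
Imports System
Imports System.Xml

Public Class NodeTypeSample
    ' Prints one line per node, indented by depth, then recurses into children.
    Private Shared Sub Dump(ByVal node As XmlNode, ByVal depth As Integer)
        Console.WriteLine(New String(" "c, depth * 2) & _
                          node.NodeType.ToString() & _
                          " (" & node.GetType().Name & "): " & node.Name)
        For Each child As XmlNode In node.ChildNodes
            Dump(child, depth + 1)
        Next
    End Sub

    Public Shared Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<?xml version=""1.0""?>" & _
            "<!-- sample -->" & _
            "<products><product>" & _
            "<productname>Smelly Cheese</productname>" & _
            "</product></products>")

        ' Start at the document node, the container of all other nodes.
        Dump(doc, 0)
    End Sub
End Class
```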
XML information is read into memory from different formats or sources. These can be a stream, a URL, a text reader, an XmlReader object, or a class derived from XmlReader. The Load method loads the document into memory. It is an overloaded method that can take data from each of these sources. There is also a LoadXml method that reads XML from a string, which is the method we will be using in the following example.
    Imports System
    Imports System.IO
    Imports System.Xml

    Public Class Sample
        Public Shared Sub Main()
            'Create the XmlDocument.
            Dim doc As New XmlDocument()
            Dim XmlString As String

            'Define the XmlString
            XmlString = _
                "<?xml version=""1.0""?>" & _
                "<products>" & _
                "<product>" & _
                "<productname>Smelly Cheese</productname>" & _
                "<price format=""dollar"">100.99</price>" & _
                "<expirydate>01/01/2009</expirydate>" & _
                "</product>" & _
                "<supplierinfo>" & _
                "<supplier>Good Cheese Express</supplier>" & _
                "<state>WA</state>" & _
                "</supplierinfo>" & _
                "</products>"

            'Load the DOM
            doc.LoadXml(XmlString)

            'Save the document to a file.
            doc.Save("Smelly Cheese data.xml")
        End Sub 'Main
    End Class 'Sample
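The above example does not do much; it just builds the DOM from a string and saves it to a file. Notice that you need to import the System.Xml namespace to be able to use the XML classes. System.IO is imported as well, although the Save overload that takes a file name does not itself require it; you need System.IO when you load from or save to a Stream, TextReader, or TextWriter.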
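The other Load overloads work the same way. As a minimal sketch (the class name `LoadSample` and the variable names are hypothetical), the following loads the same XML text through a TextReader, a Stream, and an XmlReader, and then compares the resulting documents.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Public Class LoadSample
    Public Shared Sub Main()
        Dim xmlText As String = _
            "<products><product><productname>Smelly Cheese</productname></product></products>"

        ' Load from a TextReader.
        Dim fromTextReader As New XmlDocument()
        fromTextReader.Load(New StringReader(xmlText))

        ' Load from a Stream.
        Dim fromStream As New XmlDocument()
        fromStream.Load(New MemoryStream(Encoding.UTF8.GetBytes(xmlText)))

        ' Load from an XmlReader.
        Dim fromXmlReader As New XmlDocument()
        fromXmlReader.Load(New XmlTextReader(New StringReader(xmlText)))

        ' All three sources produce the same in-memory document.
        Console.WriteLine(fromTextReader.OuterXml = fromStream.OuterXml)
        Console.WriteLine(fromStream.OuterXml = fromXmlReader.OuterXml)
    End Sub
End Class
```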
Schemas are used to validate XML documents, that is, to make sure that a well-formed document also follows certain required rules about its structure and content. XML documents can be validated using a document type definition (DTD) or an XML Schema.
The document type definition (DTD) is the original schema definition language for XML. DTDs have their own syntax and rules, which are different from XML. In XML documents, the <!DOCTYPE> declaration is used to link the document to a DTD. DTDs are somewhat limited when compared to the more flexible XML Schema.
In .NET, the XmlValidatingReader class is used to validate an XML document against an inline DTD section or an external DTD file. To perform validation against a document type definition, XmlValidatingReader uses the DTD defined in the DOCTYPE declaration of the XML document. The DOCTYPE declaration can either contain an inline DTD or be a reference to an external DTD file.
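A minimal sketch of DTD validation with an inline DTD is shown below. The class name `DtdSample`, the handler `OnValidationEvent`, and the DTD itself are made up for the example; the document deliberately omits the required price element so that the handler has something to report. Later versions of the .NET Framework mark XmlValidatingReader as obsolete, so this sketch assumes the version of the framework described in this chapter.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Public Class DtdSample
    Private Shared issueCount As Integer = 0

    ' Called by the validating reader for each validation problem it finds.
    Private Shared Sub OnValidationEvent(ByVal sender As Object, _
                                         ByVal e As ValidationEventArgs)
        issueCount += 1
        Console.WriteLine("Validation issue: " & e.Message)
    End Sub

    Public Shared Sub Main()
        ' Inline DTD: a product must contain productname followed by price.
        ' The document below leaves out price on purpose.
        Dim xmlText As String = _
            "<!DOCTYPE product [" & _
            "<!ELEMENT product (productname, price)>" & _
            "<!ELEMENT productname (#PCDATA)>" & _
            "<!ELEMENT price (#PCDATA)>" & _
            "]>" & _
            "<product><productname>Smelly Cheese</productname></product>"

        Dim validator As New XmlValidatingReader( _
            New XmlTextReader(New StringReader(xmlText)))
        validator.ValidationType = ValidationType.DTD
        AddHandler validator.ValidationEventHandler, AddressOf OnValidationEvent

        ' Validation is forward-only: it happens as the document is read.
        While validator.Read()
        End While
        validator.Close()

        Console.WriteLine("Issues found: " & issueCount.ToString())
    End Sub
End Class
```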
The Schema Object Model (SOM) classes provide an in-memory representation of an XML Schema, which allows you to create and validate XML documents. XML Schemas are similar to data modeling in a relational database in that they provide a way to define the structure of XML documents. This is achieved by specifying the elements that can be used in the documents, including the structure and types that these elements must follow. The schema itself is an XML file, typically with an .xsd file extension. XML Schemas provide some advantages over document type definitions:
Additional data types
Ability to create custom data types
Schema uses XML syntax
Schema supports object-oriented concepts like polymorphism and inheritance
In .NET, SOM facilities are provided by a set of classes in the System.Xml.Schema namespace.
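As a small illustration of the SOM, the sketch below reads a schema from a string into an XmlSchema object and lists its top-level element declarations. The class name `SomSample`, the handler `OnSchemaEvent`, and the two-element schema are hypothetical and exist only for this example.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Public Class SomSample
    ' Called if the schema text itself contains problems.
    Private Shared Sub OnSchemaEvent(ByVal sender As Object, _
                                     ByVal e As ValidationEventArgs)
        Console.WriteLine("Schema issue: " & e.Message)
    End Sub

    Public Shared Sub Main()
        ' Hypothetical schema declaring two top-level elements.
        Dim xsdText As String = _
            "<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">" & _
            "<xs:element name=""productname"" type=""xs:string""/>" & _
            "<xs:element name=""price"" type=""xs:decimal""/>" & _
            "</xs:schema>"

        ' Build the in-memory representation of the schema.
        Dim productSchema As XmlSchema = _
            XmlSchema.Read(New StringReader(xsdText), AddressOf OnSchemaEvent)

        ' Items holds the top-level schema objects in document order.
        For Each item As XmlSchemaObject In productSchema.Items
            If TypeOf item Is XmlSchemaElement Then
                Dim element As XmlSchemaElement = CType(item, XmlSchemaElement)
                Console.WriteLine(element.Name & " : " & _
                                  element.SchemaTypeName.ToString())
            End If
        Next
    End Sub
End Class
```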
The World Wide Web Consortium (W3C) XML Schema recommendation specifies the data types that can be used in XML Schemas. In .NET, these data types are represented as XmlSchemaDatatype objects. An XmlSchemaDatatype object exposes the TokenizedType property, which holds the type of the string as specified in the W3C XML 1.0 recommendation, and the ValueType property, which holds the equivalent .NET data type. The table below shows the equivalent .NET data type for each XML Schema data type:
XML Schema Data Type | .NET Framework Data Type |
---|---|
anyURI | System.Uri |
base64Binary | System.Byte[] |
boolean | System.Boolean |
byte | System.SByte |
date | System.DateTime |
dateTime | System.DateTime |
decimal | System.Decimal |
double | System.Double |
duration | System.TimeSpan |
ENTITIES | System.String[] |
ENTITY | System.String |
float | System.Single |
gDay | System.DateTime |
gMonthDay | System.DateTime |
gYear | System.DateTime |
gYearMonth | System.DateTime |
hexBinary | System.Byte[] |
ID | System.String |
IDREF | System.String |
IDREFS | System.String[] |
int | System.Int32 |
integer | System.Decimal |
language | System.String |
long | System.Int64 |
month | System.DateTime |
Name | System.String |
NCName | System.String |
negativeInteger | System.Decimal |
NMTOKEN | System.String |
NMTOKENS | System.String[] |
nonNegativeInteger | System.Decimal |
nonPositiveInteger | System.Decimal |
normalizedString | System.String |
NOTATION | System.String |
positiveInteger | System.Decimal |
QName | System.Xml.XmlQualifiedName |
short | System.Int16 |
string | System.String |
time | System.DateTime |
timePeriod | System.DateTime |
token | System.String |
unsignedByte | System.Byte |
unsignedInt | System.UInt32 |
unsignedLong | System.UInt64 |
unsignedShort | System.UInt16 |
The XmlSchemaElement class has an ElementType property and the XmlSchemaAttribute class has an AttributeType property. For built-in data types, each of these properties contains an XmlSchemaDatatype object once the schema has been validated and compiled.
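The following sketch shows one way to inspect the compiled data type of each element. It assumes the Compile method of XmlSchema and the ElementType property as described in this chapter; later versions of the .NET Framework mark both as obsolete in favor of XmlSchemaSet and ElementSchemaType, so expect warnings there. The class name `DatatypeSample`, the handler, and the schema text are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Public Class DatatypeSample
    Private Shared Sub OnSchemaEvent(ByVal sender As Object, _
                                     ByVal e As ValidationEventArgs)
        Console.WriteLine("Schema issue: " & e.Message)
    End Sub

    Public Shared Sub Main()
        ' Hypothetical schema using two built-in data types.
        Dim xsdText As String = _
            "<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">" & _
            "<xs:element name=""productname"" type=""xs:string""/>" & _
            "<xs:element name=""price"" type=""xs:decimal""/>" & _
            "</xs:schema>"

        Dim productSchema As XmlSchema = _
            XmlSchema.Read(New StringReader(xsdText), AddressOf OnSchemaEvent)

        ' The type information is only filled in after compilation.
        productSchema.Compile(AddressOf OnSchemaEvent)

        ' Elements holds the post-compilation top-level element declarations.
        For Each entry As XmlSchemaElement In productSchema.Elements.Values
            Dim compiledType As Object = entry.ElementType

            ' Built-in types surface as XmlSchemaDatatype; skip anything else.
            If TypeOf compiledType Is XmlSchemaDatatype Then
                Dim dataType As XmlSchemaDatatype = _
                    CType(compiledType, XmlSchemaDatatype)
                Console.WriteLine(entry.Name & _
                    ": .NET type = " & dataType.ValueType.FullName & _
                    ", tokenized type = " & dataType.TokenizedType.ToString())
            End If
        Next
    End Sub
End Class
```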