
13.1 Using the XmlDiffPatch Namespace

Since XmlDiffPatch is a .NET assembly, it can be included in any .NET project by adding it as a reference in Visual Studio .NET, or by specifying the /r:xmldiffpatch.dll command-line switch to the C# compiler. If you plan to use XmlDiffPatch in multiple projects, you may find it useful to add it to your Global Assembly Cache using the gacutil executable or the Microsoft .NET Framework Configuration tool.

Once you've added the XmlDiffPatch reference to your project or Makefile, you can include it in your own source code with the using directive:

using Microsoft.XmlDiffPatch;

Example 13-1 shows a program which constructs two XmlDocument instances in memory, then compares them using XmlDiff.

Example 13-1. Program to construct and compare two XmlDocument instances
using System;
using System.Text;
using System.Xml;

using Microsoft.XmlDiffPatch;

public class DoDiff {
  public static void Main(string [ ] args) {
    XmlDocument doc1 = new XmlDocument( );
    doc1.AppendChild(doc1.CreateXmlDeclaration("1.0", null, null));
    doc1.AppendChild(doc1.CreateElement("foo"));
    
    XmlDocument doc2 = new XmlDocument( );
    doc2.AppendChild(doc2.CreateXmlDeclaration("1.0", null, null));
    doc2.AppendChild(doc2.CreateElement("bar"));
    doc2.DocumentElement.AppendChild(doc2.CreateElement("baz"));
    
    XmlTextWriter diffgram = new XmlTextWriter(Console.Out);
    diffgram.Formatting = Formatting.Indented;
    
    XmlDiff diff = new XmlDiff(XmlDiffOptions.None);
    diff.Compare(doc1, doc2, diffgram);
    
    diffgram.Flush( );
  }
}

This program simply creates two XmlDocument instances, each with a single root element. The original document's root element is named foo, and the modified document's root element is named bar. The modified document's bar element is then given a child element named baz. The program then creates an instance of the XmlDiff class and calls its Compare( ) method, which sends the following output to the console. Note that I've formatted it for readability:

<?xml version="1.0" encoding="IBM437"?>
<xd:xmldiff 
  version="1.0" srcDocHash="1260031300178880892" 
  options="None" fragments="no" 
    xmlns:xd="http://schemas.microsoft.com/xmltools/2002/xmldiff">
  <xd:node match="1" />
  <xd:remove match="2" subtree="no" />
  <xd:add type="1" name="bar">
    <xd:add type="1" name="baz" />
  </xd:add>
</xd:xmldiff>

The XmlDiff constructor has an XmlDiffOptions parameter which can be used to specify that certain information should be ignored while computing the difference. Things to be ignored can include the order of child nodes, comments, DTDs, XML declarations, processing instructions, XML namespaces and namespace prefixes, and whitespace.
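
As a minimal sketch of how these options might be combined, the following program assumes that XmlDiffOptions values can be joined with the bitwise OR operator, and that the two-argument Compare( ) overload returns true when it finds no differences. The class name, the AreEquivalent( ) helper, and the sample XML are invented for illustration:

```csharp
using System;
using System.Xml;

using Microsoft.XmlDiffPatch;

public class DiffOptionsSketch {
  // Hypothetical helper: returns true if XmlDiff reports no differences
  // between the two XML strings under the given options.
  public static bool AreEquivalent(string xml1, string xml2,
                                   XmlDiffOptions options) {
    XmlDocument doc1 = new XmlDocument();
    doc1.LoadXml(xml1);
    XmlDocument doc2 = new XmlDocument();
    doc2.LoadXml(xml2);

    XmlDiff diff = new XmlDiff(options);
    return diff.Compare(doc1, doc2);
  }

  public static void Main(string[] args) {
    // The documents differ only by a comment and by child order.
    string original = "<foo><!-- draft --><a /><b /></foo>";
    string modified = "<foo><b /><a /></foo>";

    XmlDiffOptions relaxed =
      XmlDiffOptions.IgnoreComments | XmlDiffOptions.IgnoreChildOrder;

    Console.WriteLine("Strict:  {0}",
      AreEquivalent(original, modified, XmlDiffOptions.None));
    Console.WriteLine("Relaxed: {0}",
      AreEquivalent(original, modified, relaxed));
  }
}
```

Running the comparison once with XmlDiffOptions.None and once with the relaxed options is a quick way to see which differences a given option actually suppresses.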

Compare( ) has six overloads to deal with the various types of inputs, including URI, XmlNode, and XmlReader. It is also capable of dealing with non-well-formed XML fragments and partial documents. The overload of Compare( ) which I've used here takes two XmlNode instances as input and sends the output to an XmlWriter. The output is an XML Diff Language (XDL) Diffgram, about which I'll talk more in a minute.
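
To illustrate one of the other overloads, here is a sketch that feeds Compare( ) two XmlReader instances instead of XmlNode instances. It assumes the overload takes the source reader, the changed reader, and an XmlWriter for the Diffgram, in that order; the class name and the DiffToString( ) helper are hypothetical:

```csharp
using System;
using System.IO;
using System.Xml;

using Microsoft.XmlDiffPatch;

public class ReaderDiffSketch {
  // Hypothetical helper: diffs two XML strings through XmlReader inputs
  // and returns the resulting Diffgram as text.
  public static string DiffToString(string sourceXml, string changedXml) {
    XmlTextReader source = new XmlTextReader(new StringReader(sourceXml));
    XmlTextReader changed = new XmlTextReader(new StringReader(changedXml));

    StringWriter text = new StringWriter();
    XmlTextWriter diffgram = new XmlTextWriter(text);
    diffgram.Formatting = Formatting.Indented;

    XmlDiff diff = new XmlDiff(XmlDiffOptions.None);
    diff.Compare(source, changed, diffgram);

    // Make sure everything has reached the StringWriter before reading it.
    diffgram.Flush();
    return text.ToString();
  }

  public static void Main(string[] args) {
    Console.WriteLine(DiffToString("<foo />", "<bar><baz /></bar>"));
  }
}
```

Reader-based input is convenient when the documents come from files or network streams and you have no other reason to build an XmlDocument yourself.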

Once you've run Compare( ) and written the XDL Diffgram to an XmlWriter, you can use XmlPatch to apply the Diffgram to the original XML document, and duplicate the changes necessary to produce the modified document.

Example 13-2 shows modifications to the original program to patch the original XML document. The changes are the added using System.IO directive and most of the code from the creation of the MemoryStream onward.

Example 13-2. Program to patch an XML document
using System;
using System.IO;
using System.Text;
using System.Xml;

using Microsoft.XmlDiffPatch;

public class DoDiff {
  public static void Main(string [ ] args) {
    XmlDocument doc1 = new XmlDocument( );
    doc1.AppendChild(doc1.CreateXmlDeclaration("1.0", null, null));
    doc1.AppendChild(doc1.CreateElement("foo"));
    
    XmlDocument doc2 = new XmlDocument( );
    doc2.AppendChild(doc2.CreateXmlDeclaration("1.0", null, null));
    doc2.AppendChild(doc2.CreateElement("bar"));
    doc2.DocumentElement.AppendChild(doc2.CreateElement("baz"));
    
    Stream stream = new MemoryStream( );
    XmlTextWriter diffgram = new XmlTextWriter(new StreamWriter(stream));
    diffgram.Formatting = Formatting.Indented;
    
    XmlDiff diff = new XmlDiff(XmlDiffOptions.None);
    diff.Compare(doc1, doc2, diffgram);
    
    // Make sure all Diffgram text has reached the MemoryStream before rewinding it
    diffgram.Flush( );
    stream.Seek(0, SeekOrigin.Begin);
    
    XmlPatch patch = new XmlPatch( );
    patch.Patch(doc1, new XmlTextReader(stream));
    
    XmlTextWriter writer = new XmlTextWriter(Console.Out);
    writer.Formatting = Formatting.Indented;
    doc1.WriteTo(writer);
    writer.Flush( );
  }
}

In the first group of changed lines, the Diffgram is written to an XmlTextWriter which wraps a MemoryStream instead of to the console. Then the MemoryStream is reset back to the beginning, and a new XmlPatch instance is created. The XmlPatch.Patch( ) method is called to patch the doc1 XmlDocument according to the Diffgram. Finally, the patched doc1 is written to the console with an XmlTextWriter.

The output from the modified program is below. It should look just the same as doc2 would have looked if it were serialized to the console:

<?xml version="1.0"?>
<bar>
  <baz />
</bar>
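
Rather than comparing the output by eye, you could run the patched document through XmlDiff a second time. The sketch below does that, assuming that the two-argument Compare( ) overload returns true when the documents are identical; the class name and the RoundTrips( ) helper are hypothetical. It keeps the Diffgram in a StringWriter rather than a MemoryStream, so there is no stream to rewind:

```csharp
using System;
using System.IO;
using System.Xml;

using Microsoft.XmlDiffPatch;

public class PatchCheckSketch {
  // Hypothetical helper: diffs original against modified, patches the
  // original with the Diffgram, and reports whether the result now
  // matches the modified document.
  public static bool RoundTrips(string originalXml, string modifiedXml) {
    XmlDocument doc1 = new XmlDocument();
    doc1.LoadXml(originalXml);
    XmlDocument doc2 = new XmlDocument();
    doc2.LoadXml(modifiedXml);

    // Write the Diffgram into an in-memory string.
    StringWriter text = new StringWriter();
    XmlTextWriter diffgram = new XmlTextWriter(text);
    new XmlDiff(XmlDiffOptions.None).Compare(doc1, doc2, diffgram);
    diffgram.Flush();

    // Apply the Diffgram to the original document.
    XmlPatch patch = new XmlPatch();
    patch.Patch(doc1, new XmlTextReader(new StringReader(text.ToString())));

    // A second comparison should now find no differences.
    return new XmlDiff(XmlDiffOptions.None).Compare(doc1, doc2);
  }

  public static void Main(string[] args) {
    Console.WriteLine(RoundTrips("<foo />", "<bar><baz /></bar>"));
  }
}
```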

13.1.1 The XDL Diffgram format

Despite its similar name and appearance, this XML format is not directly related to the Diffgram format used in ADO.NET. The XDL Diffgram contains a list of the changes made to the original document. The root element of the XDL Diffgram, xd:xmldiff, has a namespace of http://schemas.microsoft.com/xmltools/2002/xmldiff. Its attributes are listed below:


version

Indicates the version of the XDL Diffgram format being used.


srcDocHash

Provides a hashed value which is used to verify that the XML document the Diffgram is being used to patch is the same as the original XML document used to generate the Diffgram.


options

Contains the names of the XmlDiffOptions passed into the XmlDiff constructor, if any. The names are separated by spaces.


fragments

If the value is yes, the Diffgram contains differences between XML fragments. If the value is no, the Diffgram contains differences between complete XML documents.


xmlns:xd

Indicates the XML namespace URI for the xd: prefix.

The xd:xmldiff element may have any number of xd:node child nodes. Each xd:node element represents a node in the original document that has changed in the changed document, or the position of a new or deleted node in the changed document. The xd:node element's match attribute contains a path descriptor for the node.

Although they look similar, the XDL path syntax is not XPath. XDL paths are based on the original XML document after it has been loaded into a DOM tree. An XDL path consists of a combination of numerals, the characters /, -, and |, and attribute names preceded with the character @. Table 13-1 lists the meanings of the XDL path values.

Table 13-1. XDL path values

Value    Description
n        nth child node of the current node
n1-n2    n1st through n2nd child node of the current node
n1|n2    n1st and n2nd child node of the current node
/n       nth child node of the root node
n1/n2    n2nd child node of the n1st child node of the current node
@name    attribute named name of the current node

The various XDL path values may be combined to produce a path describing each node in the original XML document.
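
To make the table concrete, here is a small resolver for these path values, written against System.Xml only. It is a sketch of one plausible reading of Table 13-1, not the code XmlPatch itself uses. It assumes that positions are 1-based, that the root node is the document node, and that ranges, lists, and attribute names appear only in the last step of a path. The class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public class XdlPathSketch {
  // Returns the nth child (1-based) of parent, or throws if there is none.
  static XmlNode Child(XmlNode parent, int n) {
    XmlNode child = (n >= 1) ? parent.ChildNodes[n - 1] : null;
    if (child == null)
      throw new ArgumentException("No child " + n + " under " + parent.Name);
    return child;
  }

  // Resolves a simplified XDL-style path relative to the current node.
  public static List<XmlNode> Resolve(XmlNode current, string path) {
    if (path.Length > 0 && path[0] == '/') {
      // A leading slash restarts from the root (document) node.
      XmlDocument root = current as XmlDocument;
      current = (root != null) ? root : current.OwnerDocument;
      path = path.Substring(1);
    }

    // Every step but the last selects a single child to descend into.
    string[] steps = path.Split('/');
    for (int i = 0; i < steps.Length - 1; i++)
      current = Child(current, int.Parse(steps[i]));

    List<XmlNode> result = new List<XmlNode>();
    string last = steps[steps.Length - 1];

    if (last.StartsWith("@")) {
      // @name selects an attribute of the current node.
      XmlAttributeCollection attrs = current.Attributes;
      XmlNode attr = (attrs == null) ? null : attrs[last.Substring(1)];
      if (attr == null)
        throw new ArgumentException("No attribute " + last);
      result.Add(attr);
      return result;
    }

    // n1|n2 lists positions; each item may itself be a range n1-n2.
    foreach (string part in last.Split('|')) {
      string[] range = part.Split('-');
      int first = int.Parse(range[0]);
      int end = int.Parse(range[range.Length - 1]);
      for (int n = first; n <= end; n++)
        result.Add(Child(current, n));
    }
    return result;
  }

  public static void Main(string[] args) {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<?xml version=\"1.0\"?><foo id=\"x\"><a /><b /><c /></foo>");

    foreach (string path in new string[] { "2", "2/1-2", "/2/1|3", "2/@id" }) {
      Console.Write(path + " ->");
      foreach (XmlNode node in Resolve(doc, path))
        Console.Write(" " + node.Name);
      Console.WriteLine();
    }
  }
}
```

Note that in the sample document the XML declaration counts as the first child of the document node, which is why foo is addressed as 2, just as in the Diffgram produced by Example 13-1.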

The child elements of xd:node may be any combination of xd:add, which indicates a node that was added in the new XML document; xd:change, which indicates that the node was changed in the new XML document; and xd:remove, which indicates that the node was removed in the new XML document. The list below describes the attributes of the xd:add element:


type

The node type, matching the XmlNodeType enumeration values (see Table 2-1).


name

Name of the new node. If the new node is an element or attribute, this is the local name.


ns

Namespace URI of the new node.


prefix

Namespace prefix of the new node.


systemId

System ID of the new node if it is a document type declaration.


publicId

Public ID of the new node if it is a document type declaration.


opId

The operation ID of the addition. The opId value is used to tie changes in different parts of the Diffgram together.

The xd:add element's content may be a text value if the new node is an attribute or XML declaration, a CDATA section if the new node is a document type declaration, or any number of xd:add elements if the new node is an element. If the xd:add has no attributes, its content may also be any complete XML fragment matching the new content.

The opId attribute is used to tie together changes that appear in a Diffgram. For example, a node that is moved from one location in the original XML document to another in the changed document will appear in an xd:remove element and an xd:add element in the Diffgram. The xd:remove and xd:add elements will have the same opId attribute value to indicate that they represent two parts of the same operation.

The following list describes the attributes of the xd:change element:


match

The relative path from the current node to the child node which has changed.


name

New name of the node. If the changed node is an element or attribute, this is the local name.


ns

New namespace URI of the node.


prefix

New namespace prefix of the node.


systemId

New system ID of the node if it is a document type declaration.


publicId

New public ID of the node if it is a document type declaration.


opId

The operation ID of the change. The opId value is used to tie changes in different parts of the Diffgram together.

The content of an xd:change element may be text if the changed node is a text node or attribute value, a CDATA section if the changed node is a CDATA section or document type declaration, a processing instruction if the changed node is a processing instruction, or a comment if the changed node is a comment. If the changed node is an element, the xd:change may have any number of xd:node, xd:add, xd:change, or xd:remove child elements representing differences for the changed node's child nodes.

The following list describes the attributes of the xd:remove element:


match

The relative path from the current node to the list of nodes which have been removed.


subtree

If the entire subtree of each of the matching nodes has been removed, the attribute's value is yes. Otherwise, the attribute's value is no and the match attribute must evaluate to a single node.


opId

The operation ID of the removal. The opId value is used to tie changes in different parts of the Diffgram together.

If the subtree attribute is set to no, the xd:remove element's content may be any number of xd:node, xd:add, xd:change, or xd:remove elements, indicating differences to the removed node's child nodes. Those child nodes become child nodes of the removed node's parent.
