
Recipe 17.7 Handling Invalid Characters in an XML String

Problem

You are creating an XML string. Before adding an element that contains text, you want to check whether that text contains any of the following invalid characters:

<
>
"
'
&

If any of these characters are encountered, you want them to be replaced with their escaped form:

&lt;
&gt;
&quot;
&apos;
&amp;

Solution

There are different ways to accomplish this, depending on which XML creation approach you are using. If you are using XmlTextWriter, the WriteCData and WriteElementString methods take care of this for you. If you are using XmlDocument and XmlElement, you can either assign the text, wrapped in a CDATA section, to the XmlElement.InnerXml property, or assign it directly to the XmlElement.InnerText property, which escapes it for you.

There are two ways to handle this with an XmlTextWriter. The WriteCData method wraps the text containing the invalid characters in a CDATA section, as shown in the creation of the InvalidChars1 element in the example that follows. The WriteElementString method automatically escapes the text for you, as shown in the creation of the InvalidChars2 element:

// set up a string with our invalid chars
string invalidChars = @"<>\&'";
XmlTextWriter writer = new XmlTextWriter(Console.Out);
writer.WriteStartElement("Root");
writer.WriteStartElement("InvalidChars1");
writer.WriteCData(invalidChars);
writer.WriteEndElement( );
writer.WriteElementString("InvalidChars2",invalidChars);
writer.WriteEndElement( );
writer.Close( );

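The output from this is shown here, indented for readability (the writer emits it on a single line unless its Formatting property is set to Formatting.Indented). Only <, >, and & have to be escaped in element content, which is why the backslash and the apostrophe appear unchanged: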

<Root>
    <InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
    <InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
</Root>

There are also two ways to handle this problem with XmlDocument and XmlElement. The first is to surround the text you are adding to the XML element with a CDATA section and assign the result to the InnerXml property of the XmlElement, like this:

// set up a string with our invalid chars
string invalidChars = @"<>\&'";
XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";

The second way is to let the XmlElement class escape the data for you by assigning the text directly to the InnerText property, like this:

// set up a string with our invalid chars
string invalidChars = @"<>\&'";
XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
invalidElement2.InnerText = invalidChars;

The whole XmlDocument is created with these XmlElements in this code:

public static void HandlingInvalidChars( )
{
    // set up a string with our invalid chars
    string invalidChars = @"<>\&'";

    XmlDocument xmlDoc = new XmlDocument( );
    // create a root node for the document
    XmlElement root = xmlDoc.CreateElement("Root");
    xmlDoc.AppendChild(root);

    // create the first invalid character node
    XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
    // wrap the invalid chars in a CDATA section and assign the
    // result to the InnerXml property, which parses the string
    // as markup rather than escaping it
    invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
    // append the element to the root node
    root.AppendChild(invalidElement1);

    // create the second invalid character node
    XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
    // add the invalid chars directly through the InnerText
    // property, which stores them as text and escapes them
    // automatically when the document is serialized
    invalidElement2.InnerText = invalidChars;
    // append the element to the root node
    root.AppendChild(invalidElement2);

    Console.WriteLine("Generated XML with Invalid Chars:\r\n{0}",xmlDoc.OuterXml);
    Console.WriteLine( );
}

The XML created by this procedure (and output to the console) looks like this, indented here for readability since OuterXml returns it on a single line:

<Root>
    <InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
    <InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
</Root>
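
Both elements carry the same text once the XML is parsed again. The following is a minimal sketch of that round trip, not part of the original recipe; the class name RoundTripSketch and the helper ReadChildText are hypothetical names introduced only for illustration. It loads the generated XML and reads each element back through InnerText:

```csharp
using System;
using System.Xml;

public static class RoundTripSketch
{
    // Hypothetical helper: parses the XML and returns the text
    // of the named child element of the document element.
    public static string ReadChildText(string xml, string childName)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.DocumentElement.SelectSingleNode(childName);
        return node.InnerText;
    }

    public static void Main()
    {
        // the XML produced by the recipe, on a single line
        string xml =
            @"<Root>" +
            @"<InvalidChars1><![CDATA[<>\&']]></InvalidChars1>" +
            @"<InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>" +
            @"</Root>";

        // both lines print the original string: <>\&'
        Console.WriteLine(ReadChildText(xml, "InvalidChars1"));
        Console.WriteLine(ReadChildText(xml, "InvalidChars2"));
    }
}
```

Whether the text was stored in a CDATA section or as escaped character data makes no difference to a consumer that reads it through InnerText.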

Discussion

One of the more interesting node types is the CDATA node. A CDATA section lets you enter its contents as literal character data rather than as escaped XML, for ease of entry. Normally these characters would need to be in their escaped form (&lt; for < and so on), but inside a CDATA section you can enter them as regular text.
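
As an alternative to concatenating the CDATA markup into a string, the CDATA node can also be created directly with XmlDocument.CreateCDataSection. The following is a minimal sketch of that approach, not the recipe's own code; the class name CDataNodeSketch and the method BuildWithCDataNode are hypothetical. It assumes the text does not contain the sequence ]]>, which cannot appear inside a single CDATA section:

```csharp
using System;
using System.Xml;

public static class CDataNodeSketch
{
    // Hypothetical helper: builds <Root><InvalidChars1>...</InvalidChars1></Root>
    // with the text stored in a CDATA node. Assumes text has no "]]>".
    public static string BuildWithCDataNode(string text)
    {
        XmlDocument xmlDoc = new XmlDocument();
        XmlElement root = xmlDoc.CreateElement("Root");
        xmlDoc.AppendChild(root);

        XmlElement element = xmlDoc.CreateElement("InvalidChars1");
        // create the CDATA node directly; no markup string is parsed
        XmlCDataSection cdata = xmlDoc.CreateCDataSection(text);
        element.AppendChild(cdata);
        root.AppendChild(element);

        return xmlDoc.OuterXml;
    }

    public static void Main()
    {
        // prints: <Root><InvalidChars1><![CDATA[<>\&']]></InvalidChars1></Root>
        Console.WriteLine(BuildWithCDataNode(@"<>\&'"));
    }
}
```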

When a CDATA section is used in conjunction with the InnerXml property of the XmlElement class, you can submit characters that would otherwise need to be escaped first. The XmlElement class also has an InnerText property that automatically escapes any markup found in the string assigned to it. This allows you to add these characters without having to worry about them.
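
The outputs in the Solution leave the apostrophe as it is, because quotes do not have to be escaped in element content. If you need all five replacements listed in the Problem, one option is SecurityElement.Escape in the System.Security namespace. The following is a minimal sketch contrasting the two behaviors; the class name EscapeSketch and both method names are hypothetical:

```csharp
using System;
using System.Security;
using System.Xml;

public static class EscapeSketch
{
    // Escapes <, >, ", ' and & to their entity forms.
    public static string EscapeAllFive(string text)
    {
        return SecurityElement.Escape(text);
    }

    // Lets XmlElement do the escaping: assign through InnerText,
    // then read the serialized form back through InnerXml.
    public static string EscapeViaInnerText(string text)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement element = doc.CreateElement("Temp");
        element.InnerText = text;
        return element.InnerXml;
    }

    public static void Main()
    {
        // prints: &lt;&gt;&quot;&apos;&amp;
        Console.WriteLine(EscapeAllFive("<>\"'&"));

        // prints: &lt;&gt;\&amp;'
        Console.WriteLine(EscapeViaInnerText(@"<>\&'"));
    }
}
```

Text escaped this way should be written as raw markup (for example, through InnerXml); assigning it to InnerText would escape the ampersands a second time.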

See Also

See the "XmlDocument Class," "XmlElement Class," and "CDATA Sections" topics in the MSDN documentation.
