Recipe 17.7 Handling Invalid Characters in an XML String

Problem

You are creating an XML string. Before adding a tag containing a text element, you want to check whether the text contains any of the following invalid characters:

    <  >  "  '  &

If any of these characters are encountered, you want them to be replaced with their escaped form:

    &lt;  &gt;  &quot;  &apos;  &amp;
Solution

There are different ways to accomplish this, depending on which XML creation approach you are using. If you are using XmlTextWriter, the WriteCData and WriteElementString methods take care of this for you. If you are using XmlDocument and XmlElement, the XmlElement.InnerXml and XmlElement.InnerText properties will handle these characters.

With XmlTextWriter there are two options. The WriteCData method wraps the text containing the invalid characters in a CDATA section, as shown in the creation of the InvalidChars1 element in the example that follows. The WriteElementString method automatically escapes the text for you, as shown in the creation of the InvalidChars2 element:

    // set up a string with our invalid chars
    string invalidChars = @"<>\&'";
    XmlTextWriter writer = new XmlTextWriter(Console.Out);
    writer.WriteStartElement("Root");
    writer.WriteStartElement("InvalidChars1");
    writer.WriteCData(invalidChars);
    writer.WriteEndElement( );
    writer.WriteElementString("InvalidChars2",invalidChars);
    writer.WriteEndElement( );
    writer.Close( );

The output from this (shown here with line breaks added for readability) is:

    <Root>
        <InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
        <InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
    </Root>

There are also two ways to handle this problem with XmlDocument and XmlElement. The first is to surround the text you are adding to the XML element with a CDATA section and assign it to the InnerXml property of the XmlElement (xmlDoc is the XmlDocument being built):

    // set up a string with our invalid chars
    string invalidChars = @"<>\&'";
    XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
    invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";

The second is to let the XmlElement class escape the data for you by assigning the text directly to the InnerText property:

    // set up a string with our invalid chars
    string invalidChars = @"<>\&'";
    XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
    invalidElement2.InnerText = invalidChars;

The whole XmlDocument is created with these XmlElements in this code:

    public static void HandlingInvalidChars( )
    {
        // set up a string with our invalid chars
        string invalidChars = @"<>\&'";
        XmlDocument xmlDoc = new XmlDocument( );
        // create a root node for the document
        XmlElement root = xmlDoc.CreateElement("Root");
        xmlDoc.AppendChild(root);

        // create the first invalid character node
        XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
        // wrap the invalid chars in a CDATA section and use the
        // InnerXml property to assign the value as it doesn't
        // escape the values, just passes in the text provided
        invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
        // append the element to the root node
        root.AppendChild(invalidElement1);

        // create the second invalid character node
        XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
        // Add the invalid chars directly using the InnerText
        // property to assign the value as it will automatically
        // escape the values
        invalidElement2.InnerText = invalidChars;
        // append the element to the root node
        root.AppendChild(invalidElement2);

        Console.WriteLine("Generated XML with Invalid Chars:\r\n{0}",xmlDoc.OuterXml);
        Console.WriteLine( );
    }

The XML created by this procedure and written to the console (again shown with line breaks added for readability) looks like this:

    <Root>
        <InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
        <InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
    </Root>

Discussion

One of the more interesting types of nodes is the CDATA node. A CDATA node allows you to represent the items in the text section as character data, not as escaped XML, for ease of entry. Normally these characters would need to be in their escaped format (&lt; for < and so on), but the CDATA section allows you to enter them as regular text. When a CDATA section is used in conjunction with the InnerXml property of the XmlElement class, you can submit characters that would normally need to be escaped first.
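One caveat with the CDATA approach is that a CDATA section cannot itself contain the sequence ]]>, so concatenating arbitrary text between "<![CDATA[" and "]]>" is only safe when the text is known not to contain it. The following is a minimal sketch of one way to guard against that, not part of the recipe itself: the CDataSketch class and its AppendAsCData helper are hypothetical names introduced here, and the sketch uses XmlDocument.CreateCDataSection in place of InnerXml string concatenation.

```csharp
using System;
using System.Xml;

public static class CDataSketch
{
    // Hypothetical helper (not from the recipe): appends text to an element
    // as one or more CDATA sections. Any "]]>" in the text is cut between
    // "]]" and ">" so that no single section contains the terminator.
    public static void AppendAsCData(XmlElement element, string text)
    {
        XmlDocument doc = element.OwnerDocument;
        string[] parts = text.Split(new[] { "]]>" }, StringSplitOptions.None);

        for (int i = 0; i < parts.Length; i++)
        {
            string chunk = parts[i];
            // give back the ">" that the split removed from the start of this piece
            if (i > 0) chunk = ">" + chunk;
            // give back the "]]" that the split removed from the end of this piece
            if (i < parts.Length - 1) chunk += "]]";
            element.AppendChild(doc.CreateCDataSection(chunk));
        }
    }

    public static void Main()
    {
        XmlDocument xmlDoc = new XmlDocument();
        XmlElement root = xmlDoc.CreateElement("Root");
        xmlDoc.AppendChild(root);

        // same test string as the recipe; it contains no "]]>", so one section results
        AppendAsCData(root, @"<>\&'");

        Console.WriteLine(xmlDoc.OuterXml);
        // prints <Root><![CDATA[<>\&']]></Root>
    }
}
```

For text such as "a]]>b", the helper produces two adjacent CDATA sections whose combined InnerText is still "a]]>b". Whether this is preferable to simply assigning InnerText depends on whether the output needs to stay readable without entity escapes.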
The XmlElement class also has an InnerText property that will automatically escape any markup found in the string assigned. This allows you to add these characters without having to worry about them.

See Also

See the "XmlDocument Class," "XmlElement Class," and "CDATA Sections" topics in the MSDN documentation.