.NET and XML

9.1 Defining Serialization

Serialization refers to the process of transforming data from an object instance in memory into a structured representation of that data in a stream. Serialization allows you to preserve the state of an application's objects, whether to simply save the data you're working with or to transmit the data to another application. By using the framework to serialize an object to a stream, you can avoid much of the tedium of hand-coding the logic of reading each field from the object and writing its data to the stream. Instead, the serialization class knows how to do this translation with minimal intervention from the programmer.

The process of reading a stream of data into a new object instance in memory is deserialization, which is the opposite of serialization. Although you would hope that no data would ever be lost in the serialization-deserialization process, the reality is that different formats support different datatypes, and they do not always map well to each other. .NET takes two different approaches to serialization, runtime serialization and XML serialization. Each has its advantages, and I'll compare them later.

Object serialization should not be confused with transaction serialization, in which database transactions are performed in sequence so that each transaction happens in complete isolation from all others.

In runtime serialization, the .NET framework uses a formatter class to create a serialized version of an object, using information available about the object from reflection. Reflection is the mechanism by which objects in memory can be interrogated at runtime for information about their fields, properties, methods, and attributes. Different formatters do the actual work of serialization, based on hints the object provides through reflection.
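As a rough illustration of the kind of information reflection makes available, the following C# sketch lists the instance fields of an object and checks for the Serializable attribute. The Order class, its members, and the DescribeFields helper are invented for this example; only the System.Reflection calls belong to the framework. It is not how any particular formatter is implemented.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for this illustration.
[Serializable]
public class Order
{
    public string ProductCode = "99HGTY";
    private int quantity = 300;
    public int Quantity { get { return quantity; } }
}

public static class ReflectionSketch
{
    // Returns "name=value" for every instance field, public or private.
    public static string[] DescribeFields(object instance)
    {
        FieldInfo[] fields = instance.GetType().GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        string[] result = new string[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            result[i] = fields[i].Name + "=" + fields[i].GetValue(instance);
        }
        return result;
    }

    public static void Main()
    {
        // An attribute such as [Serializable] is one kind of hint that
        // can be discovered at runtime.
        Console.WriteLine("Serializable: " +
            typeof(Order).IsDefined(typeof(SerializableAttribute), false));

        foreach (string line in DescribeFields(new Order()))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that the private field is visible to reflection even though ordinary code outside the class cannot read it.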

In XML serialization, the structured representation is defined via XML syntax. I introduced you to a simple form of serialization in Chapter 8, wherein the XmlSerializer class was used to control the transformation specified by an XML Schema document.
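A minimal sketch of the round trip with XmlSerializer might look like the following. The Order class and the XmlRoundTrip helper are hypothetical names chosen for this example; the XmlSerializer calls themselves are part of System.Xml.Serialization.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type. XmlSerializer needs a public class with a public
// parameterless constructor, and it serializes the public members.
public class Order
{
    public string ProductCode;
    public int Quantity;
}

public static class XmlRoundTrip
{
    // Object instance in memory -> XML text.
    public static string Serialize(Order order)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    // XML text -> new object instance in memory.
    public static Order Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringReader reader = new StringReader(xml))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Order original = new Order();
        original.ProductCode = "99HGTY";
        original.Quantity = 300;

        string xml = Serialize(original);
        Console.WriteLine(xml);

        Order copy = Deserialize(xml);
        Console.WriteLine(copy.ProductCode + " x " + copy.Quantity);
    }
}
```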

XML Schema is one way to specify the serialization format, and I'll talk about .NET's XML serialization functionality in a moment. But first, I'll introduce SOAP.

9.1.1 Introducing the SOAP Specification

SOAP is one of the underlying technologies behind Web Services. I'll talk more about Web Services in Chapter 10.

The development of SOAP began in 1998. The World Wide Web Consortium released a note on SOAP in 2000. .NET explicitly supports section 5 of the SOAP note, which is available on the Web at http://www.w3.org/TR/SOAP/.

What SOAP actually provides is a standard mechanism for packaging data for transmission between interoperating computer systems. While other remote procedure call (RPC) protocols exist, most of them were designed before the era of distributed, object-oriented programming. In keeping with its design goals of simplicity and extensibility, the SOAP specification deliberately leaves several features often found in distributed object systems out of its core:

Distributed garbage collection: Distributed garbage collection allows for objects to be removed from memory automatically when all remote references to them go out of scope.
Message batching: Also known as boxcarring or pipelining, message batching allows several messages to be grouped together for sequential transactional processing.
Objects-by-reference: In the programming concept of pass-by-reference, an instance of an object is passed to methods in such a way that changes to the instance are visible after the method exits. This concept is pretty much a requirement for distributed programming, when you're invoking an object located at a remote machine.
Activation: To instantiate a local object, you use the C# new operator. However, to instantiate an object on a remote machine, there must be something on the other end to receive your request to instantiate the object. Activation refers to the ability to instantiate a remote object.

The SOAP specification is made up of three parts:

SOAP envelope: Because SOAP is a general-purpose messaging framework, one part of the message has to describe the message. The envelope includes such information as what is included in the data, who the message is intended for, and whether the actions described are required or optional. The envelope also provides information on how to deal with errors.
Encoding rules: The encoding rules provide standard data types and structures that disparate systems can use to marshal the data in a SOAP message to their own native data types. These rules are largely based on XML Schema's data types.
RPC representation: The RPC representation rules are used to allow methods on one system to be invoked by code running on another system, including both one-way and two-way messaging.

Although the SOAP specification provides for what is fundamentally a one-way transmission, two-way messaging is possible by sending a SOAP message in response to a SOAP message.

Although SOAP is designed to provide messaging regardless of the underlying network protocols, it is typically implemented atop HTTP. A message follows a message path, along which it may reach a number of different applications. Each application that receives the message must take the following steps:

Examine the message to find all the actions intended for the current application.
If the current application can support all the mandatory actions specified in the message, take those actions. Otherwise, return a fault message.
After removing any parts of the envelope that were intended only for the current application, pass the message along to the next recipient.
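The three steps above can be sketched in C# roughly as follows. This is a simplified illustration, not a conforming SOAP processor: it treats a header entry as "intended for the current application" only when its actor attribute equals a given URI, and it ignores the specification's other addressing rules. The class name, the Process method, and the urn:example:supplier actor URI are all invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MessagePathSketch
{
    static readonly XNamespace Env = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the message to forward, or null when a mandatory header
    // entry is not understood and a fault should be returned instead.
    public static XDocument Process(XDocument message, string myActor,
                                    ICollection<XName> understood)
    {
        XElement header = message.Root.Element(Env + "Header");
        if (header == null) return message; // no header entries at all

        // Step 1: find the header entries intended for this application.
        List<XElement> mine = header.Elements()
            .Where(e => (string)e.Attribute(Env + "actor") == myActor)
            .ToList();

        // Step 2: every mandatory entry must be one we can handle.
        foreach (XElement entry in mine)
        {
            bool mandatory = (string)entry.Attribute(Env + "mustUnderstand") == "1";
            if (mandatory && !understood.Contains(entry.Name)) return null;
        }

        // Step 3: strip our own entries before passing the message along.
        foreach (XElement entry in mine) entry.Remove();
        return message;
    }

    public static void Main()
    {
        string xml =
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>" +
            "<SOAP-ENV:Header>" +
            "<ns1:terms xmlns:ns1='urn:angushardware' SOAP-ENV:mustUnderstand='1' " +
            "SOAP-ENV:actor='urn:example:supplier'>Net 30</ns1:terms>" +
            "</SOAP-ENV:Header><SOAP-ENV:Body/></SOAP-ENV:Envelope>";
        XName terms = XNamespace.Get("urn:angushardware") + "terms";

        // An application that understands ns1:terms forwards the message.
        XDocument forwarded = Process(XDocument.Parse(xml), "urn:example:supplier",
                                      new HashSet<XName> { terms });
        Console.WriteLine(forwarded == null ? "fault" : "forwarded");

        // An application that does not understand it must answer with a fault.
        XDocument rejected = Process(XDocument.Parse(xml), "urn:example:supplier",
                                     new HashSet<XName>());
        Console.WriteLine(rejected == null ? "fault" : "forwarded");
    }
}
```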

Example 9-1 shows a hypothetical SOAP message that might be used to communicate an order between Angus Hardware and a supplier. I'll use this example throughout the discussion of the SOAP specification.

Example 9-1. A SOAP message for a wholesale product order

<?xml version ="1.0" encoding="UTF-8" ?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <SOAP-ENV:Header>
    <ns1:terms xmlns:ns1="urn:angushardware"
      SOAP-ENV:mustUnderstand="1">Net 30</ns1:terms>
  </SOAP-ENV:Header>
  
  <SOAP-ENV:Body>
    <ns1:placeOrder xmlns:ns1="urn:angushardware"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <productCode xsi:type="xsd:string">99HGTY</productCode>
      <quantity xsi:type="xsd:int">300</quantity>
    </ns1:placeOrder>
  </SOAP-ENV:Body>

</SOAP-ENV:Envelope>

A SOAP message may not contain a document type declaration or processing instructions. If you're creating and responding to SOAP messages automatically through the .NET Framework, this is not an issue. However, you should be aware of this restriction when dealing with SOAP messages produced or consumed by clients and servers written in other frameworks and languages.
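If you do need to check a message that came from elsewhere, one possible approach is sketched below. The IsAcceptableSoapXml helper is a hypothetical name; it relies on XmlReader refusing document type declarations when DtdProcessing is set to Prohibit, and it scans for processing instruction nodes. As written, it also rejects input that is not well-formed XML.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SoapChecks
{
    // Returns false if the text contains a document type declaration or a
    // processing instruction, or if it is not well-formed XML.
    public static bool IsAcceptableSoapXml(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Prohibit; // DOCTYPE raises XmlException

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read())
                {
                    // The XML declaration is reported as XmlDeclaration,
                    // so it does not trip this check.
                    if (reader.NodeType == XmlNodeType.ProcessingInstruction)
                    {
                        return false;
                    }
                }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsAcceptableSoapXml("<?xml version='1.0'?><a/>"));
        Console.WriteLine(IsAcceptableSoapXml("<?xml-stylesheet href='x.css'?><a/>"));
        Console.WriteLine(IsAcceptableSoapXml("<!DOCTYPE a><a/>"));
    }
}
```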

9.1.1.1 The SOAP envelope

There are four namespaces included in the SOAP message. The first one, with the prefix SOAP-ENV, refers to the SOAP envelope:

xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"

The SOAP-ENV:Envelope element has two sub-elements, SOAP-ENV:Header and SOAP-ENV:Body. The SOAP-ENV:Header element is optional, and it provides information to the application that processes the message.

The element within the SOAP-ENV:Header element can be any element from any namespace other than SOAP-ENV, and it can have any attributes without restriction. Although it is open-ended to allow for maximum flexibility, the SOAP-ENV:Header's sub-elements do have two specific optional attributes, mustUnderstand and actor.

The mustUnderstand attribute has a Boolean value, written as 1 or 0. A value of 1 tells the receiving application that if it does not know how to process the header element, it must not process the message at all and must return a fault instead. The actor attribute, whose value is a URI, can be used when a SOAP message is sent to several applications on a message path, and indicates which application the header element is intended for:

<SOAP-ENV:Header>
  <ns1:terms xmlns:ns1="urn:angushardware"
    SOAP-ENV:mustUnderstand="1">Net 30</ns1:terms>
</SOAP-ENV:Header>

In Example 9-1, the ns1:terms element indicates the terms Angus Hardware is willing to give their vendor in purchasing more inventory. The terms are Net 30, and the application processing the order must understand the ns1:terms element in order to continue processing.

Like SOAP-ENV:Header, the SOAP-ENV:Body element can have any sub-elements. However, unlike SOAP-ENV:Header, SOAP-ENV:Body cannot have any attributes. The content of the SOAP-ENV:Body element is a sequence of actions to be processed; in Example 9-1, the action is a call to the placeOrder method:

<SOAP-ENV:Body>
  <ns1:placeOrder xmlns:ns1="urn:angushardware"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <productCode xsi:type="xsd:string">99HGTY</productCode>
    <quantity xsi:type="xsd:int">300</quantity>
  </ns1:placeOrder>
</SOAP-ENV:Body>

The ns1:placeOrder element has one attribute, SOAP-ENV:encodingStyle. This attribute indicates the encoding rules for the message, which I'll talk about shortly.
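To make the structure of the body concrete, here is a small C# sketch that pulls the method parameters out of a message shaped like Example 9-1. It assumes that the ns1 prefix on ns1:placeOrder is bound to urn:angushardware and that the parameter elements are unqualified; the BodySketch class and ReadOrder method are names made up for the illustration.

```csharp
using System;
using System.Xml.Linq;

public static class BodySketch
{
    static readonly XNamespace Env = "http://schemas.xmlsoap.org/soap/envelope/";
    static readonly XNamespace Ns1 = "urn:angushardware";

    // Extracts the two parameters of the placeOrder call from the body.
    public static void ReadOrder(string soap, out string productCode, out int quantity)
    {
        XDocument doc = XDocument.Parse(soap);
        XElement call = doc.Root.Element(Env + "Body").Element(Ns1 + "placeOrder");

        // The parameter elements carry no namespace, so plain names work.
        productCode = (string)call.Element("productCode");
        quantity = (int)call.Element("quantity");
    }

    public static void Main()
    {
        string soap =
            "<SOAP-ENV:Envelope" +
            " xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'" +
            " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'" +
            " xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
            "<SOAP-ENV:Body>" +
            "<ns1:placeOrder xmlns:ns1='urn:angushardware'" +
            " SOAP-ENV:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'>" +
            "<productCode xsi:type='xsd:string'>99HGTY</productCode>" +
            "<quantity xsi:type='xsd:int'>300</quantity>" +
            "</ns1:placeOrder>" +
            "</SOAP-ENV:Body>" +
            "</SOAP-ENV:Envelope>";

        string productCode;
        int quantity;
        ReadOrder(soap, out productCode, out quantity);
        Console.WriteLine(productCode + " x " + quantity);
    }
}
```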

A SOAP envelope can also contain a response, as shown in Example 9-2.

Example 9-2. A SOAP response message for a wholesale product order

<?xml version ="1.0" encoding="UTF-8" ?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <SOAP-ENV:Body>
    <ns1:placeOrderResponse xmlns:ns1="urn:angushardware"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <ns1:deliveryDate xsi:type="xsd:date">2002-09-04</ns1:deliveryDate>
    </ns1:placeOrderResponse>
  </SOAP-ENV:Body>

</SOAP-ENV:Envelope>

Example 9-2 shows a possible response to the message in Example 9-1. In this case, the server responds with a ns1:placeOrderResponse element, containing an ns1:deliveryDate element, which indicates an expected delivery date of September 4, 2002. However, a SOAP response can also include a SOAP-ENV:Fault element, indicating an error. Example 9-3 shows a SOAP fault message.

Example 9-3. A SOAP fault message for a wholesale product order

<?xml version ="1.0" encoding="UTF-8" ?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode xsi:type="xsd:string">
        SOAP-ENV:MustUnderstand
      </faultcode>
      <faultstring xsi:type="xsd:string">
        The server did not understand the header element ns1:terms
      </faultstring>
      <faultactor xsi:type="xsd:string">
        Smith's Sprocket Company
      </faultactor>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>

</SOAP-ENV:Envelope>

As you can see, this SOAP message is very similar in overall structure to both the previous examples, except that it includes a SOAP-ENV:Fault element.

The faultcode element contains a fault code, which indicates the type of error that occurred. The faultstring element provides a human-readable explanation of the fault; there's also an optional detail element, which could provide more information about the error. Finally, the faultactor element indicates which actor had the fault. (SOAP 1.1 defines these four sub-elements of SOAP-ENV:Fault with lowercase names and no namespace prefix.) In this case, the SOAP server at Smith's Sprocket Company has indicated that it does not understand the ns1:terms element in the request header.
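A client might detect and read a fault along the following lines. This is only a sketch: it assumes the unqualified, lowercase sub-element names that SOAP 1.1 defines (faultcode and faultstring), and the FaultSketch class and TryReadFault method are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

public static class FaultSketch
{
    static readonly XNamespace Env = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns true and fills in code and text when the body holds a fault.
    public static bool TryReadFault(string soap, out string code, out string text)
    {
        code = null;
        text = null;

        XDocument doc = XDocument.Parse(soap);
        XElement body = doc.Root.Element(Env + "Body");
        XElement fault = body == null ? null : body.Element(Env + "Fault");
        if (fault == null) return false;

        // Trim the values, since listings often indent the element content.
        code = ((string)fault.Element("faultcode") ?? "").Trim();
        text = ((string)fault.Element("faultstring") ?? "").Trim();
        return true;
    }

    public static void Main()
    {
        string soap =
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>" +
            "<SOAP-ENV:Body><SOAP-ENV:Fault>" +
            "<faultcode> SOAP-ENV:MustUnderstand </faultcode>" +
            "<faultstring>The server did not understand the header element ns1:terms</faultstring>" +
            "</SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>";

        string code, text;
        if (TryReadFault(soap, out code, out text))
        {
            Console.WriteLine(code + ": " + text);
        }
    }
}
```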

A few standard fault code values are defined in the SOAP specification, and they are listed in Table 9-1.

Table 9-1. Standard SOAP fault codes

Fault code                  Fault description
SOAP-ENV:VersionMismatch    The namespace URI of the SOAP envelope element is not the one the receiver expects, which indicates a different version of SOAP.
SOAP-ENV:MustUnderstand     The actor designated to process an element in the SOAP header with a SOAP-ENV:mustUnderstand value of 1 could not process the element.
SOAP-ENV:Client             The message was not properly formed or was missing some information, and should not be resent without correcting the errors.
SOAP-ENV:Server             The message could not be processed for some reason other than its content or structure. For example, some external process required to process the message may have failed.

The SOAP-ENV:Header and SOAP-ENV:Fault elements are optional in any SOAP message. The two SOAP messages, request and response, actually have identical structure; the only difference is in which of the two optional elements are included, and in the specific syntax of the body elements.

The overall structure of the SOAP message is extremely flexible, which makes it important for the client and server to have well-established rules for the syntax of their communications.

The XML Schema Definition for the SOAP envelope is located at http://schemas.xmlsoap.org/soap/envelope/, and it makes a fine reference to the contents of the envelope, now that you have learned about the XML Schema language in Chapter 8.

There are places where the XSD differs from the prose of the specification. However, the XSD documents these differences in xs:documentation elements.

9.1.1.2 Encoding rules

Several of the elements in the SOAP messages I just discussed include the SOAP-ENV:encodingStyle attribute:

SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"

The SOAP-ENV:encodingStyle attribute specifies the encoding system to be used. The encoding system is a mutually agreed-upon way of representing data in a SOAP message. Although the examples use the SOAP 1.1 encoding rules, you may actually use any encoding you wish, or none at all, in your SOAP envelope. To use the SOAP 1.2 encoding system, for example, you would specify http://www.w3.org/2003/05/soap-encoding. The URI, like namespace URIs, is used as a unique name rather than an actual Internet resource.

The encoding style applies to the entire scope of the element on which the attribute appears. In Examples 9-1 and 9-2, the SOAP-ENV:encodingStyle attributes appear on the ns1:placeOrder and ns1:placeOrderResponse elements, respectively.
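One way to picture this scoping rule is a lookup that walks from an element up through its ancestors and takes the nearest SOAP-ENV:encodingStyle declaration it finds. The sketch below does exactly that; the EncodingScope class and EffectiveEncodingStyle method are hypothetical names, and returning null for "no declaration in scope" is a choice made for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EncodingScope
{
    static readonly XNamespace Env = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the encodingStyle in effect for the element, or null if
    // neither the element nor any ancestor declares one.
    public static string EffectiveEncodingStyle(XElement element)
    {
        XName attributeName = Env + "encodingStyle";
        for (XElement e = element; e != null; e = e.Parent)
        {
            XAttribute declared = e.Attribute(attributeName);
            if (declared != null) return declared.Value;
        }
        return null;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>" +
            "<SOAP-ENV:Body>" +
            "<ns1:placeOrder xmlns:ns1='urn:angushardware'" +
            " SOAP-ENV:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'>" +
            "<quantity>300</quantity>" +
            "</ns1:placeOrder>" +
            "</SOAP-ENV:Body></SOAP-ENV:Envelope>");

        // quantity inherits the style declared on ns1:placeOrder.
        XElement quantity = doc.Descendants("quantity").First();
        Console.WriteLine(EffectiveEncodingStyle(quantity));

        // The Body element lies outside that scope.
        XElement body = doc.Root.Element(Env + "Body");
        Console.WriteLine(EffectiveEncodingStyle(body) ?? "(none)");
    }
}
```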

9.1.1.3 RPC representation

Although the SOAP envelope provides information about the remote method to be called and the encoding of the data being passed to the remote object, it does not inherently contain any information about the remote program or system. SOAP depends on the transport protocol to provide this. While HTTP works very well as a transport, and SOAP has specific bindings for HTTP, there is nothing in the specification that limits your choice of transport protocols.
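To show where that information lives when HTTP is the transport, the following sketch builds (but does not send) an HTTP request carrying a SOAP envelope. The endpoint URL and the SOAPAction value are invented for the example; the use of POST, a text/xml content type, and a SOAPAction header follows the SOAP 1.1 HTTP binding. Notice that the address of the remote system appears only in the HTTP request, never in the envelope.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class HttpBindingSketch
{
    // Wraps an envelope in an HTTP POST; nothing is sent over the network.
    public static HttpRequestMessage BuildRequest(string endpoint, string soapAction,
                                                  string envelope)
    {
        HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, endpoint);
        request.Content = new StringContent(envelope, Encoding.UTF8, "text/xml");
        // The SOAPAction value is conventionally sent as a quoted string.
        request.Headers.Add("SOAPAction", "\"" + soapAction + "\"");
        return request;
    }

    public static void Main()
    {
        string envelope =
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'>" +
            "<SOAP-ENV:Body/></SOAP-ENV:Envelope>";

        // Hypothetical endpoint and action, for illustration only.
        HttpRequestMessage request = BuildRequest(
            "http://supplier.example/orders", "urn:angushardware#placeOrder", envelope);

        Console.WriteLine(request.Method + " " + request.RequestUri);
        Console.WriteLine(request.Content.Headers.ContentType);
    }
}
```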

There's not enough room in this book to cover everything about SOAP. For more information, see Programming Web Services with SOAP (O'Reilly).

9.1.2 When to Use Serialization

In general, you should consider using XML serialization when your application requires data to be exchanged between possibly disparate systems, whose only commonality might be the ability to read and write XML. Although this case covers XML serialization in general, it's also important to determine which form of serialization to use.

Simple XML serialization is appropriate when you have an existing XML schema (whether a formal W3C XML Schema or simply an agreed-upon format) and wish to read the data into an object; or when you have existing objects and wish to produce a representation of their data in an XML format. These cases usually involve non-interactive data exchange; that is, data is being exchanged, but not in a Web Services context.

On the other hand, you should use SOAP serialization when you know that your data exchange partner supports it, or when you are designing a new distributed application that requires interactive data exchange.

Finally, you should consider runtime serialization when the communication is happening between two .NET applications.

9.1.3 SOAP Versus XML-RPC

Remote procedure calling (RPC) refers to the ability to invoke a method of an object that resides outside of the caller's address space, as if it were local. Although RPC is an old term, dating back to the early days of networked computers, the concept of using XML as an RPC mechanism is much newer, dating back only to the early days of Web Services.

SOAP is not the only XML RPC mechanism; in fact, another mechanism, called XML-RPC, is arguably simpler and easier to use. However, this simplicity comes at the expense of flexibility. Although XML-RPC evolved from an early version of SOAP, Microsoft has chosen not to support XML-RPC directly in .NET. However, there is nothing to stop some enterprising developer from producing an XML-RPC framework for .NET.

In fact, Charles Cook has developed just such a beast. Cook Computing offers XML-RPC.NET, at Version 0.8.1 as of this writing. XML-RPC.NET is available for download at http://www.xml-rpc.net/, licensed under the Lesser GNU Public License.
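To give a feel for the difference in complexity, the sketch below builds what the order from Example 9-1 might look like as an XML-RPC call. The element names (methodCall, methodName, params, param, value, string, int) come from the XML-RPC specification; the XmlRpcSketch class and the reuse of placeOrder as the method name are assumptions made for the comparison. The sketch does not use XML-RPC.NET.

```csharp
using System;
using System.Xml.Linq;

public static class XmlRpcSketch
{
    // Builds an XML-RPC request: no envelope, no namespaces, and
    // parameters identified by position rather than by name.
    public static XElement BuildCall(string productCode, int quantity)
    {
        return new XElement("methodCall",
            new XElement("methodName", "placeOrder"),
            new XElement("params",
                new XElement("param",
                    new XElement("value", new XElement("string", productCode))),
                new XElement("param",
                    new XElement("value", new XElement("int", quantity)))));
    }

    public static void Main()
    {
        Console.WriteLine(BuildCall("99HGTY", 300));
    }
}
```

There is no counterpart here to the SOAP header or to attributes such as mustUnderstand, which is where much of the flexibility mentioned above is given up.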

As I said earlier, .NET supports two general methods of serialization. Which one you should choose depends on your needs.
