
Charteris White Paper:

Deserializing Individual
Elements in XML Documents
Version 1.0
Thomas Manson (thomas.manson@charteris.com)
12 May 2003

© 2003 Charteris plc


CONTENTS
1. INTRODUCTION
2. BACKGROUND
3. LIMITATIONS OF MONOLITHIC DESERIALIZATION
4. DESERIALIZING INDIVIDUAL ELEMENTS IN XML DOCUMENTS
5. TEST RESULTS
6. CONCLUSIONS

20 May 2003 Deserializing Individual Elements in XML Documents Page 2


Version 1.0 Charteris White Paper:
1. INTRODUCTION
This paper discusses one of the limitations of common implementations of XML
deserialization using the .NET Framework. It then discusses a possible solution, and
highlights the differences between the standard solution and the proposed one. It is
assumed that the reader has some familiarity with XML Serialization, when it is used,
and how this is implemented within the .NET Framework.
This paper focuses on the current version of the .NET Framework at the time of
writing – version 1.1. However, the recommendations should still be valid for version
1.0.
2. BACKGROUND
XML Serialization is the process of converting an object to a form that can be easily
transported. For example, an object can be serialized and transported over HTTP.
XML Deserialization is used on the receiving system to create an object tree from
XML. There is not necessarily any correlation between the system doing the
serialization, and the system doing the deserialization – they may be on different
platforms and/or using different technologies to process the requests. The object that
is created as a result of deserialization is not the same object that was serialized; it only
has the same public properties.
Typically, when a .NET system is built to handle incoming XML, the system will have a
number of classes that conform to the XML Schema (XSD) definition of the incoming
XML. These classes can either be written by hand or generated using XSD.exe.
When incoming XML is received, the .NET Framework will create instances of the
classes, and populate their public fields and properties according to the XML received.
By default, this is a monolithic process: the XmlSerializer will read the entire stream of
XML and populate all the objects.
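As an illustration of the monolithic process described above, the following is a minimal sketch rather than code from the sample application. The OrdersType and OrderType classes, and the XML they read, are hypothetical stand-ins for classes that XSD.exe would generate from a schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-ins for classes generated from a schema.
[XmlRoot("Orders")]
public class OrdersType
{
    [XmlElement("Order")]
    public OrderType[] Items;
}

public class OrderType
{
    public string Id;
    public int Quantity;
}

public class MonolithicExample
{
    // Reads the whole document and builds the complete object tree in one call.
    public static OrdersType DeserializeAll(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(OrdersType));
        using (StringReader reader = new StringReader(xml))
        {
            return (OrdersType)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml =
            "<Orders>" +
            "<Order><Id>A1</Id><Quantity>2</Quantity></Order>" +
            "<Order><Id>B7</Id><Quantity>5</Quantity></Order>" +
            "</Orders>";

        OrdersType orders = DeserializeAll(xml);

        // Every OrderType object already exists before the first one is processed.
        Console.WriteLine("Orders in memory: {0}", orders.Items.Length);
        foreach (OrderType order in orders.Items)
        {
            Console.WriteLine("{0}: {1}", order.Id, order.Quantity);
        }
    }
}
```

The point to note is that Deserialize returns only once the whole array has been populated, so processing cannot begin until the last element has been read.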

3. LIMITATIONS OF MONOLITHIC DESERIALIZATION


Depending on the XML being received, deserializing the entire stream may not be
appropriate. If the stream is large, the resulting in-memory representation may consume
significant amounts of memory. The processing logic may also decide to stop
processing the XML after handling only a small number of the created entities. This
results in suboptimal performance, as the system has had to create all the entities only
to leave most of them unused. Also, as each object created is part of the entire object
tree, none is available for garbage collection until the entire tree has been released,
even though many of the objects have already been processed.
It is proposed in this paper that, in some cases, it would be better to deserialize the
entities only as they are required. If processing needs to stop, the rest of the XML has not
been deserialized, so there are no unnecessary objects. Once an individual item has been
deserialized and processed, it is not part of an object tree, and is thus available for
garbage collection.
4. DESERIALIZING INDIVIDUAL ELEMENTS IN XML DOCUMENTS
To demonstrate deserializing individual elements, this paper will use a fictitious example
of an Estate Agent system that exchanges agent and property details with other
systems. This system allows estate agents across the country to see properties that are
outside of their specific areas.
To facilitate the exchange of the agent and property details, a schema has been drawn
up, and it is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
targetNamespace="www.charteris.com/namespaces/propertyexchange"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="www.charteris.com/namespaces/propertyexchange"
elementFormDefault="qualified">
<xs:element name="PropertyExchange" type="PropertyExchangeType"/>
<xs:complexType name="PropertyExchangeType">
<xs:sequence>
<xs:element name="Agents" type="AgentsType"/>
<xs:element name="Properties" type="PropertiesType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="AgentsType">
<xs:sequence>
<xs:element name="Agent" type="AgentType" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PropertiesType">
<xs:sequence>
<xs:element name="Property" type="PropertyType"
minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:simpleType name="AgentID">
<xs:restriction base="xs:string">
<xs:minLength value="5"/>
<xs:maxLength value="30"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="Line1" type="xs:string"/>
<xs:element name="Line2" type="xs:string"/>
<xs:element name="Line3" type="xs:string"/>
<xs:element name="Line4" type="xs:string"/>
<xs:element name="PostalCode" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="AgentType">
<xs:sequence>
<xs:element name="AgentID" type="AgentID" minOccurs="0"/>
<xs:element name="Name" type="xs:string" minOccurs="0"/>
<xs:element name="Address" type="AddressType"
minOccurs="0"/>
</xs:sequence>
<xs:attribute name="action" type="Action" use="required"/>
</xs:complexType>
<xs:complexType name="PropertyType">
<xs:sequence>
<xs:element name="PropertyID" type="PropertyID"
minOccurs="0"/>
<xs:element name="OwningAgentID" type="AgentID"
minOccurs="0"/>
<xs:element name="Address" type="AddressType"
minOccurs="0"/>
<xs:element name="Price" type="xs:int" minOccurs="0"/>
<xs:element name="DateListed" type="xs:date" minOccurs="0"/>
<xs:element name="PropertyDetails" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element name="Type">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Detached House"/>
<xs:enumeration value="Semi Detached House"/>
<xs:enumeration value="Terraced House"/>
<xs:enumeration value="End of terrace House"/>
<xs:enumeration value="Flat"/>
<xs:enumeration value="Bungalow"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="BedRooms" type="xs:int"/>
<xs:element name="Bathrooms" type="xs:int"/>
<xs:element name="Kitchen" type="xs:int"/>
<xs:element name="ReceptionRooms" type="xs:int"/>
<xs:element name="Garage" type="xs:int"
minOccurs="0"/>
<xs:element name="OffRoadParking" type="xs:int"
minOccurs="0"/>
<xs:element name="Built" type="xs:gYear"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Viewings" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Appointment"/>
<xs:enumeration value="PhoneBefore"/>
<xs:enumeration value="Weekdays"/>
<xs:enumeration value="Weekends"/>
<xs:enumeration value="AllWeek"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="action" type="Action" use="required"/>
</xs:complexType>
<xs:simpleType name="Action">
<xs:restriction base="xs:string">
<xs:enumeration value="Add"/>
<xs:enumeration value="Update"/>
<xs:enumeration value="Delete"/>
<xs:enumeration value="Query"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="PropertyID">
<xs:restriction base="xs:string">
<xs:minLength value="5"/>
<xs:maxLength value="30"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>

Using XSD.exe to generate the required classes results in the following classes for the
PropertyExchangeType, the AgentType and the PropertyType types (the complete
sample application source code can be downloaded from
http://www.charteris.com/Publications/WhitePapers/Downloads/PropertyExchange.zip).
/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="www.charteris.com/namespaces/propertyexchange")]
[System.Xml.Serialization.XmlRootAttribute("PropertyExchange",
Namespace="www.charteris.com/namespaces/propertyexchange",
IsNullable=false)]
public class PropertyExchangeType {

/// <remarks/>
[System.Xml.Serialization.XmlArrayItemAttribute("Agent",
IsNullable=false)]
public AgentType[] Agents;

/// <remarks/>
[System.Xml.Serialization.XmlArrayItemAttribute("Property",
IsNullable=false)]
public PropertyType[] Properties;
}

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="www.charteris.com/namespaces/propertyexchange")]
public class AgentType {

/// <remarks/>
public string AgentID;

/// <remarks/>
public string Name;

/// <remarks/>
public AddressType Address;

/// <remarks/>
[System.Xml.Serialization.XmlAttributeAttribute()]
public Action action;
}

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="www.charteris.com/namespaces/propertyexchange")]
public class PropertyType {

/// <remarks/>
public string PropertyID;

/// <remarks/>
public string OwningAgentID;

/// <remarks/>
public AddressType Address;

/// <remarks/>
public int Price;

/// <remarks/>
[System.Xml.Serialization.XmlIgnoreAttribute()]
public bool PriceSpecified;

/// <remarks/>
[System.Xml.Serialization.XmlElementAttribute(DataType="date")]
public System.DateTime DateListed;

/// <remarks/>
[System.Xml.Serialization.XmlIgnoreAttribute()]
public bool DateListedSpecified;

/// <remarks/>
public PropertyTypePropertyDetails PropertyDetails;

/// <remarks/>
public PropertyTypeViewings Viewings;

/// <remarks/>
[System.Xml.Serialization.XmlIgnoreAttribute()]
public bool ViewingsSpecified;

/// <remarks/>
[System.Xml.Serialization.XmlAttributeAttribute()]
public Action action;
}

If the standard model of deserialization were used, the PropertyExchangeType would
be the root of the object tree, and it would contain arrays of AgentType and
PropertyType. Before any of the AgentTypes could be processed, all of the
PropertyTypes would have to be deserialized, and there could be an unlimited
number of them. It would be potentially more efficient if each AgentType could be
processed independently before the next one was deserialized, and then the same
applied to the PropertyTypes.
To do this, an XmlNodeDeserializer class has been added to the project. This class will
deserialize an individual element into an object, given the XmlReader that contains
the XML to read.
To deserialize an element independently of the rest of the XML document, the
XmlSerializer has to consider the element to be the root node of the document. This
could be done by modifying the code generated by XSD.exe and changing the
XmlTypeAttribute to an XmlRootAttribute. However, this means that if the
schema should change, and the code needs to be regenerated, the change would have to
be made again. It would also affect the serialization of the PropertyExchangeType for
outgoing messages.
The same effect can be achieved by overriding the current XmlTypeAttribute that is
applied to the AgentType during the process of deserialization. This is done using the
XmlAttributeOverrides class, and is shown in the code below. When the
XmlSerializer deserializes the XML, it will use the attributes in the
XmlAttributeOverrides to override the ones applied to the classes.
// As we are only deserializing a fragment, we need to add
// the XmlRootAttribute
// This is done by overriding the current attribute
XmlAttributes xmlAttribs = new XmlAttributes();
// Create the new XmlRootAttribute and set its
// name and namespace
XmlRootAttribute rootAttrib = new XmlRootAttribute(elementName);
rootAttrib.Namespace = ns;

xmlAttribs.XmlRoot = rootAttrib;

// Create the overrides object and add the attributes
XmlAttributeOverrides overrides = new XmlAttributeOverrides();
overrides.Add(objectType, xmlAttribs);

// Use the overrides to deserialize
xmlSer = new XmlSerializer(objectType, overrides);

// Now actually deserialize the object
object returnData = xmlSer.Deserialize(xmlReader);
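
The excerpt above relies on variables declared elsewhere in the class. As a self-contained illustration of the same override technique, the sketch below deserializes a single fragment from a string. The ItemType class, the Item element name and the urn:example:fragment namespace are hypothetical and are not part of the sample application.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type carrying only an XmlTypeAttribute, as XSD.exe
// emits for a schema type that is not a root element.
[XmlType(Namespace = "urn:example:fragment")]
public class ItemType
{
    public string Name;
}

public class OverrideExample
{
    public static ItemType DeserializeFragment(string xml)
    {
        // Supply the root element name and namespace at run time
        // instead of editing the generated class.
        XmlRootAttribute root = new XmlRootAttribute("Item");
        root.Namespace = "urn:example:fragment";

        XmlAttributes attributes = new XmlAttributes();
        attributes.XmlRoot = root;

        XmlAttributeOverrides overrides = new XmlAttributeOverrides();
        overrides.Add(typeof(ItemType), attributes);

        XmlSerializer serializer =
            new XmlSerializer(typeof(ItemType), overrides);
        using (StringReader reader = new StringReader(xml))
        {
            return (ItemType)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml =
            "<Item xmlns=\"urn:example:fragment\"><Name>Sample</Name></Item>";
        Console.WriteLine(DeserializeFragment(xml).Name);
    }
}
```

The generated class itself is left untouched, so regenerating it from the schema does not lose the change.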

Creating an XmlSerializer is an expensive operation, because the serializer generates
and compiles its serialization code when it is constructed. If this is done repeatedly for
each element to deserialize, it will cripple the performance of the application. To prevent
this, XmlNodeDeserializer stores each XmlSerializer it creates in a private static
Hashtable. The XmlSerializer is treated here as not thread safe for instance methods, so
each instance is stored in the Hashtable under a key made up of the type name and the
thread ID. A new instance of the XmlSerializer will be created for each thread, and will
be dedicated to that thread. In a single-threaded application, this will result in one
instance of the XmlSerializer being created for each type to deserialize. In a
multithreaded application, an instance will be created for each type for each thread.
The complete code for the class is shown below.
using System;
using System.Collections;
using System.Xml;
using System.Xml.Serialization;

namespace PropertyExchange
{
    /// <summary>
    /// Deserializes a single Agent or Property element from an XmlReader.
    /// </summary>
    internal class XmlNodeDeserializer
    {
        // XmlSerializer instances, keyed by type name and thread ID
        static Hashtable serializerCache = new Hashtable(2);
        static object serializerCacheLock = new object();

        const string ns =
            "www.charteris.com/namespaces/propertyexchange";

        internal XmlNodeDeserializer() {
        }

        internal object Deserialize(XmlReader xmlReader) {

            XmlReader localReader;
            // If xmlReader is a validatingReader, need to use
            // its reader
            if (xmlReader is XmlValidatingReader) {
                localReader = ((XmlValidatingReader)xmlReader).Reader;
            }
            else {
                localReader = xmlReader;
            }
            string elementName = string.Empty;
            Type objectType = null;

            if (localReader.NodeType == XmlNodeType.Element) {
                if (localReader.NamespaceURI == ns) {
                    switch (localReader.LocalName) {
                        case "Agent":
                            elementName = "Agent";
                            objectType = typeof(AgentType);
                            break;
                        case "Property":
                            elementName = "Property";
                            objectType = typeof(PropertyType);
                            break;
                        default:
                            throw new XmlException("Unrecognised node: "
                                + localReader.LocalName);
                    }
                }
                else {
                    throw new XmlException("Unrecognised namespace on node:"
                        + localReader.Name);
                }
            }
            else {
                throw new XmlException(
                    "xmlReader must be on the element to deserialize");
            }

            XmlSerializer xmlSer = GetXmlSerializer(elementName, objectType);

            object returnData = xmlSer.Deserialize(xmlReader);

            return returnData;
        }

        private XmlSerializer GetXmlSerializer(string elementName,
            Type objectType) {

            // Attempt to retrieve the XmlSerializer from the hashtable
            string cacheKey = objectType.FullName
                + AppDomain.GetCurrentThreadId().ToString();
            XmlSerializer xmlSer = (XmlSerializer)serializerCache[cacheKey];

            if (xmlSer == null) {
                // As we are only deserializing a fragment, we need to
                // add the XmlRootAttribute
                // This is done by overriding the current attribute
                XmlAttributes xmlAttribs = new XmlAttributes();
                // Create the new XmlRootAttribute and set its
                // name and namespace
                XmlRootAttribute rootAttrib =
                    new XmlRootAttribute(elementName);
                rootAttrib.Namespace = ns;

                xmlAttribs.XmlRoot = rootAttrib;

                // Create the overrides object and add the attributes
                XmlAttributeOverrides overrides =
                    new XmlAttributeOverrides();
                overrides.Add(objectType, xmlAttribs);

                // Use the overrides to deserialize
                xmlSer = new XmlSerializer(objectType, overrides);

                // Store the XmlSerializer in the Hashtable
                // by type and threadid. A Hashtable supports only one
                // writer at a time, so the write is made under the lock.
                lock (serializerCacheLock) {
                    serializerCache.Add(cacheKey, xmlSer);
                }
            }
            return xmlSer;
        }
    }
}

The result of this is that the Deserialize method of the XmlNodeDeserializer will
deserialize the current element in the XML, and return it. It will also have moved the
current position in the XmlReader to just past the closing tag of that element. However,
for this to work, the XmlReader has to be at the start of the required element (in this
case, either an <Agent> or <Property> tag) when the Deserialize method is called.
Below is the code that is used to set up the XmlReader, move it to the correct position,
and then call the Deserialize method. In this case an XmlValidatingReader has
been used, but if schema validation is not required, an XmlTextReader can be used.
Note that if an XmlValidatingReader is used, only those nodes that are actually
read are validated against the schema. So in this case, if the XML is invalid somewhere
within the Properties node, the Agents will still be deserialized. This may or may not be
appropriate, depending on the application requirements.
const string ns = "www.charteris.com/namespaces/propertyexchange";

// Deserialize the agents, each one in turn
XmlTextReader reader = new XmlTextReader(fileName);
XmlValidatingReader valReader = new XmlValidatingReader(reader);
valReader.Schemas.Add(ns, schemaFile);

// Move the reader to the start of the agents
valReader.ReadStartElement("PropertyExchange", ns);
valReader.ReadStartElement("Agents", ns);
// Skip any whitespace so that the reader rests on the first <Agent>
// element (or on the closing </Agents> tag if there are none)
valReader.MoveToContent();

XmlNodeDeserializer ser = new XmlNodeDeserializer();

AgentType agent = null;
while (valReader.Reader.LocalName == "Agent" &&
    valReader.Reader.NamespaceURI == ns) {

    agent = (AgentType)ser.Deserialize(valReader);

    // Simulate the processing of the agent
    Console.WriteLine("Agent Name: {0}", agent.Name);

    // Skip any whitespace before the next node
    valReader.MoveToContent();
}
The significant lines of code are the two ReadStartElement lines. These are used to
move the current position within the XmlTextReader past the opening <Agents> tag.
When the Deserialize method is called, it will start reading at the next node, which is the
<Agent> node. Each agent is deserialized, one at a time, until the current node is not an
<Agent> node. This also takes care of the situation where there are no <Agent> nodes
in the XML.
Once all the agents have been processed, the XmlTextReader is moved to the
<Properties> node, and the process is repeated for each of the <Property> nodes. If
agent nodes have been processed, the current node will be of type EndElement.
However, if no agent nodes have been processed (because there were none in the
XML), the current node will be of type Element. So before moving, a check is made on
the current node type, and if it is EndElement, the ReadEndElement method is called.
Once all the nodes have been read, the Close method on the XmlTextReader is called
to close the file.
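The move from the agents to the properties described above can be sketched as follows. This is an illustrative, self-contained version and not the sample application's code: the Agent and Property classes, the urn:example:exchange namespace and the in-memory XML are hypothetical, validation is omitted, and the serializer is built with the XmlSerializer(Type, XmlRootAttribute) constructor rather than with XmlAttributeOverrides. For brevity it also creates a serializer on each call instead of caching it as XmlNodeDeserializer does.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Simplified, hypothetical stand-ins for AgentType and PropertyType.
public class Agent { public string Name; }
public class Property { public int Price; }

public class ElementLoopExample
{
    const string Ns = "urn:example:exchange";   // assumed namespace

    // Deserializes consecutive <elementName> siblings, one at a time.
    static void ReadItems(XmlReader reader, string elementName,
                          Type type, ArrayList results)
    {
        XmlRootAttribute root = new XmlRootAttribute(elementName);
        root.Namespace = Ns;
        XmlSerializer serializer = new XmlSerializer(type, root);

        reader.MoveToContent();                  // skip whitespace
        while (reader.NodeType == XmlNodeType.Element
               && reader.LocalName == elementName
               && reader.NamespaceURI == Ns)
        {
            results.Add(serializer.Deserialize(reader));
            reader.MoveToContent();
        }
    }

    public static ArrayList ReadAll(string xml)
    {
        ArrayList results = new ArrayList();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.ReadStartElement("PropertyExchange", Ns);
            reader.ReadStartElement("Agents", Ns);
            ReadItems(reader, "Agent", typeof(Agent), results);

            // On </Agents> the node is an EndElement and must be consumed.
            // After an empty <Agents/> the reader is already on <Properties>.
            if (reader.NodeType == XmlNodeType.EndElement)
            {
                reader.ReadEndElement();
            }

            reader.ReadStartElement("Properties", Ns);
            ReadItems(reader, "Property", typeof(Property), results);
        }
        finally
        {
            reader.Close();
        }
        return results;
    }

    public static void Main()
    {
        string xml =
            "<PropertyExchange xmlns=\"urn:example:exchange\">" +
            "<Agents><Agent><Name>Ann</Name></Agent>" +
            "<Agent><Name>Bob</Name></Agent></Agents>" +
            "<Properties><Property><Price>250000</Price></Property>" +
            "</Properties></PropertyExchange>";

        foreach (object item in ReadAll(xml))
        {
            Agent agent = item as Agent;
            if (agent != null) Console.WriteLine("Agent: {0}", agent.Name);
            else Console.WriteLine("Property: {0}", ((Property)item).Price);
        }
    }
}
```

Collecting the results in a list is only for demonstration. To gain the memory benefit discussed in this paper, each item would be processed and released inside the loop.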
5. TEST RESULTS
To test the memory usage and execution times, a large XML file was used with 113
agents and 180 properties. Both the XmlValidatingReader and the XmlTextReader
readers were used. The total execution time and memory usage were recorded for 3 runs
of each of the following:
♦ Standard deserialization, where the entire XML is deserialized first, before processing
starts,
♦ Item deserialization, where each agent and property is deserialized and processed
before the next one is deserialized.
The timings were also collected for the full processing of the entire XML, and when the
processing was halted after the first agent was processed.
The results are shown in the graphs below.

[Figure: Execution Times. Bar chart of execution time in seconds (axis 0 to 1.6) for full and failed processing, comparing Standard Deserialization, Item Deserialization, Standard Deserialization with validation and Item Deserialization with validation.]

[Figure: Working Set. Bar chart of working set in bytes (axis 15,500,000 to 18,000,000) for full and failed processing, comparing the same four configurations.]

The above graphs show that deserializing each item just prior to processing is more
expensive in terms of execution time, by 32% for both an XmlTextReader and an
XmlValidatingReader. However, if processing fails early on, the execution times
for item deserialization drop significantly, but for standard deserialization, they hardly
change.
When looking at the working set required by the process, a very similar picture
emerges, but the differences are not as large. Standard deserialization requires less
memory if everything completes (by 1.4% for an XmlTextReader and 1.1% for an
XmlValidatingReader). Again, if processing fails early on, there is a memory saving
using the node deserialization (3% for an XmlTextReader and 3.2% for an
XmlValidatingReader).
The objects that were created were simple entities that were easy to serialize and
deserialize. If the complexity of the objects were to change, it is doubtful whether the
above results could be extrapolated to cover that situation without further study. Also, as
the actual memory required is small, and the tests short, it is unlikely that the garbage
collector (GC) will have run. How well the GC will be able to recover memory will be
application-dependent, but it is expected that in most cases, when an item has been
deserialized and processed, it will be available for garbage collection. If the entire XML
has been deserialized, each object is still referenced within the object tree, so none of the
objects are available for garbage collection. This would have the effect of increasing the
memory savings shown by deserializing each item as it is required.
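
The paper does not say how the execution times and working set figures were collected. One possible way to take comparable measurements is sketched below. The MeasureExample class, the Workload delegate and the SampleWork method are hypothetical, and the Stopwatch class and the WorkingSet64 property are assumptions drawn from later versions of the .NET Framework than the one used in the tests.

```csharp
using System;
using System.Diagnostics;

public class MeasureExample
{
    // Hypothetical delegate for the code under test.
    public delegate void Workload();

    // Runs the workload once. Returns the elapsed time in milliseconds
    // and reports the process working set, in bytes, after the run.
    public static long Measure(Workload work, out long workingSetBytes)
    {
        Stopwatch timer = Stopwatch.StartNew();
        work();
        timer.Stop();

        using (Process current = Process.GetCurrentProcess())
        {
            workingSetBytes = current.WorkingSet64;
        }
        return timer.ElapsedMilliseconds;
    }

    // Placeholder workload; a real test would deserialize the XML here.
    static void SampleWork()
    {
        byte[][] blocks = new byte[100][];
        for (int i = 0; i < blocks.Length; i++)
        {
            blocks[i] = new byte[1024];
        }
    }

    public static void Main()
    {
        long bytes;
        long ms = Measure(new Workload(SampleWork), out bytes);
        Console.WriteLine("Elapsed: {0} ms, working set: {1} bytes",
            ms, bytes);
    }
}
```

The working set covers the whole process, so figures taken this way are only meaningful when each configuration is run in a fresh process.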

6. CONCLUSIONS
If an application deserializes large XML documents, this can consume significant
amounts of memory. If this is done before any of the objects created are processed, it
may result in some of the objects never being used, as processing may be stopped
before they are used. As has been shown in this paper, significant gains can be made by
only deserializing objects as they are required. However, it should be remembered that
this will not always be appropriate, especially if it is expected that most of the time all
the entities will be processed.