
CSCE 547

Windows Programming

XML Support
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29208
Why XML?
XML stands for eXtensible Markup Language.
XML is not an extension of HTML: both descend from SGML. XML is designed
to express the structure of data; how the data is rendered is left to stylesheets.
Some organizations are defining standards that use XML
to express the semantics of their domains (healthcare, automotive,
security and the military).
Why XML? Because:

1. It is just text, readable by humans and by any OS (Linux, Mac OS, WinTel, etc.).
2. It has become the de facto standard adopted by everybody who
is somebody wishing to communicate data over the WWW.

This chapter discusses .NET support for XML.

CSCE 547 Fall 2002 Ch 13 - 2
Why XML?
XML encourages the separation of interface from structured data,
allowing the seamless integration of data from diverse sources, and
providing the infrastructure to create N-tier architectures.



XML Documents
XML documents can be described in terms of their logical and physical structure.

The logical structure is a function of the XML elements and attributes
contained in the document.

The physical structure is the set of storage units in which the
document actually exists. These units, called entities, could be a
stream of characters or a file (or set of files).

XML documents contain two parts, called the header and the content.
Typically, the header contains declarations or processing instructions
(commands for the XML processor).


XML Documents Can Contain
• Processing Instructions (aka PIs), delimited by <? . . . ?>
• Declarations, in the form <! aDeclaration >
• Elements
• Attributes
• Entities
• Comments

Typically, you will include declarations and/or processing instructions in the header.


Processing Instructions and Declarations

Processing instructions:

<?xml version="1.0"?>
<?xml-stylesheet href="XSL\DotNet.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSL\DotNet.wml.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>

<?xml-stylesheet type="text/xsl" href="Guitars.xsl"?>

<?xml version="1.0" encoding="UTF-16"?>

Declarations:

<!NOTATION PNG SYSTEM "program.exe">
<!ATTLIST . . . >
<!ENTITY memoText "blablabla">

A declared entity is used through reference notation, &name; (written without spaces):

<memo>&memoText;</memo>


XML Elements
XML elements are made up of a start tag, an end tag, and data in
between. The start and end tags describe the data or value of the element:

<Student> Anita Donut </Student>
<CarDriver> Anita Donut </CarDriver>
<BloodDonor> Anita Donut </BloodDonor>

Elements can be empty, e.g.,

<memo></memo>

But an empty element only makes sense when it carries attributes. The preferred
way to write one is:

<memo />

Attributes define properties for an element. XML elements can
contain one or more attributes.

XML Elements
The XML tree in Figure 13-1 was produced by the code below:

<?xml version="1.0"?>
<Guitars>
  <Guitar>
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
  <Guitar>
    <Make>Fender</Make>
    <Model>Stratocaster</Model>
    <Year></Year>
    <Color>Black</Color>
    <Neck>Maple</Neck>
  </Guitar>
</Guitars>

Using attributes:

<Guitar Year="1977">
  <Make>Gibson</Make>
  <Model>SG</Model>
  <Color>Tobacco Sunburst</Color>
  <Neck>Rosewood</Neck>
</Guitar>

<Guitar Image="MySG.jpeg">
  <Make>Gibson</Make>
  <Model>SG</Model>
  <Year>1977</Year>
  <Color>Tobacco Sunburst</Color>
  <Neck>Rosewood</Neck>
</Guitar>
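
Because the same fact (here, the year) can be stored either as a child element or as an attribute, code that reads it may have to handle both forms. The following is a minimal sketch, not taken from the chapter; the class name ElementOrAttributeSketch and the method YearOf are hypothetical names chosen for illustration.

```csharp
using System;
using System.Xml;

class ElementOrAttributeSketch
{
    // Hypothetical helper: returns the year of a single <Guitar> fragment,
    // whether Year is written as an attribute or as a child element.
    public static string YearOf (string guitarXml)
    {
        XmlDocument doc = new XmlDocument ();
        doc.LoadXml (guitarXml);
        XmlElement guitar = doc.DocumentElement;

        // Attribute form: <Guitar Year="1977">
        if (guitar.HasAttribute ("Year"))
            return guitar.GetAttribute ("Year");

        // Element form: <Guitar><Year>1977</Year></Guitar>
        XmlElement year = guitar["Year"];
        return year == null ? null : year.InnerText;
    }

    static void Main ()
    {
        Console.WriteLine (YearOf ("<Guitar Year='1977'><Make>Gibson</Make></Guitar>"));
        Console.WriteLine (YearOf ("<Guitar><Make>Gibson</Make><Year>1977</Year></Guitar>"));
    }
}
```

Both calls print 1977, which illustrates that the choice between the two forms is a matter of document design rather than of information content.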


Name Spaces
XML uses namespaces to avoid name collisions, so that, e.g.,
gibson:Color and fender:Color may refer to different elements.

<?xml version="1.0"?>
<gibson:Color>Tobacco Sunburst</gibson:Color>
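
A prefix such as gibson: only has meaning once it is bound to a namespace URI with an xmlns:prefix declaration. The sketch below shows one way this could look from .NET, using XmlNamespaceManager to query by namespace. The two URIs (urn:example:gibson and urn:example:fender) and the class name NamespaceSketch are placeholders invented for this example; they are not the ones used in the chapter.

```csharp
using System;
using System.Xml;

class NamespaceSketch
{
    // Hypothetical namespace URIs, invented for illustration only.
    const string GibsonNs = "urn:example:gibson";
    const string FenderNs = "urn:example:fender";

    // Two Color elements that differ only in the namespace they belong to.
    const string Source =
        "<Guitars xmlns:gibson='" + GibsonNs + "' xmlns:fender='" + FenderNs + "'>" +
        "<gibson:Color>Tobacco Sunburst</gibson:Color>" +
        "<fender:Color>Black</fender:Color>" +
        "</Guitars>";

    // Returns the text of the Color element that belongs to the given namespace.
    public static string ColorIn (string namespaceUri)
    {
        XmlDocument doc = new XmlDocument ();
        doc.LoadXml (Source);

        // The prefix "c" is local to the query; only the URI has to match.
        XmlNamespaceManager ns = new XmlNamespaceManager (doc.NameTable);
        ns.AddNamespace ("c", namespaceUri);

        XmlNode node = doc.SelectSingleNode ("/Guitars/c:Color", ns);
        return node == null ? null : node.InnerText;
    }

    static void Main ()
    {
        Console.WriteLine (ColorIn (GibsonNs));
        Console.WriteLine (ColorIn (FenderNs));
    }
}
```

The query uses its own prefix, which suggests that what identifies an element is the pair (namespace URI, local name) and not the prefix spelled in the document.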


Default Name Spaces
A default namespace is declared with xmlns and no prefix; unprefixed element
names then belong to it. The XML in the previous slide has the same content as this one.

<?xml version="1.0"?>
<win:Guitars          (slide callout: Default Name Space)
<gibson:Color>Tobacco Sunburst</gibson:Color>


Document Validation
“Well-formed” documents satisfy XML syntactic rules. Well-formed documents may be
validated against schema documents, which define in great detail how elements in the
document must be written.
<?xml version="1.0"?>
<xsd:schema id="Guitars" xmlns=""
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Guitars">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Guitar">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Make" type="xsd:string" />
              <xsd:element name="Model" type="xsd:string" />
              <xsd:element name="Year" type="xsd:gYear"
                  minOccurs="0" />
              <xsd:element name="Color" type="xsd:string"
                  minOccurs="0" />
              <xsd:element name="Neck" type="xsd:string"
                  minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Notes: this document is itself a schema. The names with the xsd: prefix (shown in red
on the slide) come from the XMLSchema document, http://www.w3.org/2001/XMLSchema,
which as of 2001 was the mother of all schemas.
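
Well-formedness is a weaker property than validity: a parser can check it without any schema. As a rough illustration (a sketch, not code from the chapter; the class name WellFormedSketch and the method IsWellFormed are hypothetical), loading text into an XmlDocument fails with an XmlException when the syntactic rules are broken:

```csharp
using System;
using System.Xml;

class WellFormedSketch
{
    // Returns true when the text parses as XML, i.e. it is well-formed.
    // This says nothing about validity against a schema.
    public static bool IsWellFormed (string text)
    {
        try {
            new XmlDocument ().LoadXml (text);
            return true;
        }
        catch (XmlException) {
            // Raised for syntax errors such as mismatched or unclosed tags.
            return false;
        }
    }

    static void Main ()
    {
        // Properly nested tags
        Console.WriteLine (IsWellFormed ("<Guitar><Make>Gibson</Make></Guitar>"));
        // <Make> is never closed
        Console.WriteLine (IsWellFormed ("<Guitar><Make>Gibson</Guitar>"));
    }
}
```

A document that passes this check may still violate the schema, for instance by omitting the required Make element.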
Parsing XML
There are two main APIs for XML parsers: DOM and SAX. The differences are significant.
DOM parsers load the entire document into memory as a tree, while SAX parsers read the
document sequentially and report its contents through an event-driven model.
DOM offers the advantage of random access to any node, while SAX, being event-driven,
keeps memory use low and can process documents too large to load whole.
Microsoft offers a DOM-based parser, MSXML.dll, as part of IE in Windows.
The DOM tree of Figure 13-2 can be produced by:

<?xml version="1.0"?>
<Guitars>
  <Guitar Image="MySG.jpeg">
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
  <Guitar Image="MyStrat.jpeg"
      PreviousOwner="Eric Clapton">
    <Make>Fender</Make>
  </Guitar>
</Guitars>
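
The difference between the two parsing styles can be sketched in .NET terms. XmlDocument is the DOM implementation; .NET does not ship a SAX parser, so the sketch uses XmlTextReader to stand in for the streaming style (it is a forward-only pull reader, not a SAX implementation). The class and method names are hypothetical, and the XML is an inline string so that the example does not depend on a file.

```csharp
using System;
using System.IO;
using System.Xml;

class DomVersusStreamSketch
{
    const string Source =
        "<Guitars>" +
        "<Guitar><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // DOM style: the whole tree is in memory, so any node can be
    // reached directly, here the last Guitar without visiting the first.
    public static string LastMakeViaDom ()
    {
        XmlDocument doc = new XmlDocument ();
        doc.LoadXml (Source);
        return doc.DocumentElement.LastChild["Make"].InnerText;
    }

    // Streaming style: nodes are seen once, in document order,
    // and nothing is retained except what the caller keeps.
    public static int CountGuitarsViaReader ()
    {
        int count = 0;
        XmlTextReader reader = new XmlTextReader (new StringReader (Source));
        try {
            while (reader.Read ()) {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == "Guitar")
                    count++;
            }
        }
        finally {
            reader.Close ();
        }
        return count;
    }

    static void Main ()
    {
        Console.WriteLine (LastMakeViaDom ());
        Console.WriteLine (CountGuitarsViaReader ());
    }
}
```

The DOM method jumps straight to the node it wants; the reader method must walk the document from the start but never holds more than the current node.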


This sample code reads XML using MSXML.dll.
Although the code is great fun to decipher, not everyone enjoys doing so.
The crucial code is:

// Create a COM object to host the parser in the memory of this process
hr = CoCreateInstance (CLSID_DOMDocument, NULL, CLSCTX_INPROC_SERVER,
    IID_IXMLDOMDocument, (void**) &pDoc);

// Use the parser to load the XML document from a file
hr = pDoc->load (var, &success);

// Get the elements with a given tag into pNodeList
hr = pDoc->getElementsByTagName (tag, &pNodeList);


The code below also reads the Guitars.xml file and writes to the console the values
associated with each "Guitar" element.
The entire code is:

using System;
using System.Xml;

class MyApp
{
    static void Main ()
    {
        XmlDocument doc = new XmlDocument ();
        doc.Load ("Guitars.xml");
        XmlNodeList nodes = doc.GetElementsByTagName ("Guitar");
        foreach (XmlNode node in nodes) {
            // Print the make and model of each guitar
            Console.WriteLine ("{0} {1}", node["Make"].InnerText,
                node["Model"].InnerText);
        }
    }
}


XmlDocument Class
XmlDocument is compatible with DOM Level 2. It is simple to use, even to
discover the contents of the nodes in the document:

XmlDocument doc = new XmlDocument ();
doc.Load ("Guitars.xml");
// DocumentElement refers to the root element once the document is loaded
OutputNode (doc.DocumentElement);

// XmlNode is a class that contains type, name and value information
void OutputNode (XmlNode node)
{
    Console.WriteLine ("Type={0}\tName={1}\tValue={2}",
        node.NodeType, node.Name, node.Value);

    if (node.HasChildNodes) {
        XmlNodeList children = node.ChildNodes;
        foreach (XmlNode child in children)
            OutputNode (child);
    }
}

XmlDocument, XmlNode and XmlNodeList (shown in red on the slide) are defined in
the System.Xml namespace.


Inspecting Attributes
A node may have a collection named Attributes, which may contain XmlAttribute items,
which in turn expose type, name and value.

void OutputNode (XmlNode node)
{
    Console.WriteLine ("Type={0}\tName={1}\tValue={2}",
        node.NodeType, node.Name, node.Value);

    // Attributes is null for node types that cannot have attributes
    if (node.Attributes != null) {
        foreach (XmlAttribute attr in node.Attributes)
            Console.WriteLine ("Type={0}\tName={1}\tValue={2}",
                attr.NodeType, attr.Name, attr.Value);
    }

    // HasChildNodes and ChildNodes drive the recursion
    if (node.HasChildNodes) {
        foreach (XmlNode child in node.ChildNodes)
            OutputNode (child);
    }
}


XmlTextReader is a forward-only reader which, like the ADO.NET DataReader
class, provides a fast mechanism for traversing an XML document:

XmlTextReader reader = null;
try {
    reader = new XmlTextReader ("Guitars.xml");
    reader.WhitespaceHandling = WhitespaceHandling.None;
    while (reader.Read ()) {
        if (reader.NodeType == XmlNodeType.Element &&
            reader.Name == "Guitar" &&
            reader.AttributeCount > 0) {
            // Print the Image attribute of each Guitar element
            while (reader.MoveToNextAttribute ()) {
                if (reader.Name == "Image")
                    Console.WriteLine (reader.Value);
            }
        }
    }
}
finally {
    if (reader != null)
        reader.Close ();
}


Hopefully you guessed it: XmlValidatingReader performs validation while
reading. Validation can be against DTD, XSD or XDR schemas.

using System;
using System.Xml;
using System.Xml.Schema;

class MyApp {
    static void Main (string[] args) {
        if (args.Length < 2) {
            Console.WriteLine ("Syntax: VALIDATE xmldoc schemadoc");
            return;
        }
        XmlValidatingReader reader = null;
        try {
            XmlTextReader nvr = new XmlTextReader (args[0]);
            nvr.WhitespaceHandling = WhitespaceHandling.None;
            reader = new XmlValidatingReader (nvr);
            // GetTargetNamespace and OnValidationError are helper methods
            // of the full sample; they are not shown on the slide
            reader.Schemas.Add (GetTargetNamespace (args[1]), args[1]);
            reader.ValidationEventHandler +=
                new ValidationEventHandler (OnValidationError);
            while (reader.Read ());   // reading drives the validation
        }
        catch (Exception ex) {
            // Exceptions thrown while reading (e.g. malformed XML) land here;
            // validation errors are reported to OnValidationError
            Console.WriteLine (ex.Message);
        }
        finally {
            if (reader != null)
                reader.Close ();
        }
    }
}

XmlTextWriter has methods for writing elements, attributes,
comments, etc., to an XML document.

XmlTextWriter writer = null;
try {
    writer = new XmlTextWriter
        ("Guitars.xml", System.Text.Encoding.Unicode);
    writer.Formatting = Formatting.Indented;

    writer.WriteStartDocument ();
    writer.WriteStartElement ("Guitars");
    writer.WriteStartElement ("Guitar");
    writer.WriteAttributeString ("Image", "MySG.jpeg");
    writer.WriteElementString ("Make", "Gibson");
    writer.WriteElementString ("Model", "SG");
    writer.WriteElementString ("Year", "1977");
    writer.WriteElementString ("Color", "Tobacco Sunburst");
    writer.WriteElementString ("Neck", "Rosewood");
    writer.WriteEndElement ();   // closes Guitar
    writer.WriteEndElement ();   // closes Guitars
}
finally {
    if (writer != null)
        writer.Close ();
}

The code produces:

<?xml version="1.0" encoding="utf-16"?>
<Guitars>
  <Guitar Image="MySG.jpeg">
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
</Guitars>


XPath is a query language that can be used to get elements or
attributes from an XML document, using "path expressions." Since
these expressions are a bit arcane, the World Wide Web Consortium (W3C) is working
on a SQL-like query language aimed at replacing XPath.
In the meantime, .NET offers XPath support via a class named
XPathNavigator, which contains a number of features (methods,
events, etc.) that make querying a document quite simple, as seen in
the code below:

using System;
using System.Xml.XPath;

class MyApp {
    static void Main () {
        XPathDocument doc = new XPathDocument ("Guitars.xml");
        XPathNavigator nav = doc.CreateNavigator ();
        // "/Guitars/Guitar" is the query expression
        XPathNodeIterator iterator = nav.Select ("/Guitars/Guitar");
        while (iterator.MoveNext ()) {
            XPathNodeIterator it = iterator.Current.Select ("Make");
            it.MoveNext ();
            string make = it.Current.Value;
            it = iterator.Current.Select ("Model");
            it.MoveNext ();
            string model = it.Current.Value;
            Console.WriteLine ("{0} {1}", make, model);
        }
    }
}
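
Path expressions can do more than walk down the tree: predicates in square brackets filter nodes, and @ selects attributes. The sketch below tries a few such expressions against an inline document; it is an illustration under assumed data, not code from the chapter, and the class name XPathPredicateSketch and the helper First are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class XPathPredicateSketch
{
    const string Source =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model><Year>1977</Year></Guitar>" +
        "<Guitar><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // Evaluates an XPath expression and returns the string value of the
    // first match, or null when nothing matches.
    public static string First (string expression)
    {
        XPathDocument doc = new XPathDocument (new StringReader (Source));
        XPathNavigator nav = doc.CreateNavigator ();
        XPathNodeIterator it = nav.Select (expression);
        return it.MoveNext () ? it.Current.Value : null;
    }

    static void Main ()
    {
        // Predicate on a child element's text
        Console.WriteLine (First ("/Guitars/Guitar[Make='Fender']/Model"));
        // Attribute selection
        Console.WriteLine (First ("/Guitars/Guitar/@Image"));
        // Numeric comparison, searching at any depth
        Console.WriteLine (First ("//Guitar[Year=1977]/Make"));
    }
}
```

The three calls print Stratocaster, MySG.jpeg and Gibson respectively.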


This application, shown in Figure 13-12, illustrates the power of XPath.
You can load a document and make queries dynamically (provided
that you are familiar with XPath expressions).
The crucial methods in this application are OnExecuteExpression,
where a navigator is built, and AddNodeAndChildren, where, depending
on the type of item found, nodes are added to the TreeView.


XSL Transformations
XSL is a language that can be used to transform a document from one
format into another. XSL stands for eXtensible Stylesheet
Language, and was probably the main reason XML became so popular,
as it was a crucial factor in the early success of EDI (Electronic Data
Interchange).
Organizations use XSL to convert the documents they exchange with other
organizations, e.g., just in the healthcare sector:
Humana <-> Kaiser Permanente
BlueCross BlueShield <-> HCA

XSLT is at the heart of MS BizTalk Server, a set of B2B tools that
facilitate converting all kinds of business forms (invoices, paychecks,
purchase orders, etc.) from one format to another.
Figure 13-13 illustrates this concept.


1. Copy Figure 13-16's Guitars.xml and Guitars.xsl into a directory.
2. Comment out the following statement in Guitars.xml:
<?xml-stylesheet type="text/xsl" href="Guitars.xsl"?>
3. Open Guitars.xml in IE (Figure 13-14).
4. Uncomment the statement.
5. Open Guitars.xml again in IE (Figure 13-15).

The stylesheet referenced by
<?xml-stylesheet type="text/xsl" href="Guitars.xsl"?>
contains instructions to transform the XML file into an
HTML table at the client side.


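<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <h1>My Guitars</h1>
    <hr />
    <table width="100%" border="1">
      <tr bgcolor="gainsboro">
        <td><b>Make</b></td>
        <td><b>Model</b></td>
        <td><b>Year</b></td>
        <td><b>Color</b></td>
        <td><b>Neck</b></td>
      </tr>
      <xsl:for-each select="Guitars/Guitar">
        <tr>
          <td><xsl:value-of select="Make" /></td>
          <td><xsl:value-of select="Model" /></td>
          <td><xsl:value-of select="Year" /></td>
          <td><xsl:value-of select="Color" /></td>
          <td><xsl:value-of select="Neck" /></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>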


XSLT at the server
.NET provides a class named XslTransform that can convert a document
from one format to another at the server side, using ASP.NET.
The chapter illustrates how this can be done in three files:

Quotes.aspx Quotes.xml Quotes.xsl

The result is shown in Figure 13-17.

Note that the key to getting this done is a good understanding of XSL.


XslTransform in CS
The code below shows how easy it is to work with XslTransform.
Again, as long as you know the details of XSL, transforming a document to
another format is quite easy.

using System;
using System.Xml.XPath;
using System.Xml.Xsl;

class MyApp {
    static void Main (string[] args) {
        if (args.Length < 2) {
            Console.WriteLine ("Syntax: TRANSFORM xmldoc xsldoc");
            return;
        }
        try {
            XPathDocument doc = new XPathDocument (args[0]);
            XslTransform xsl = new XslTransform ();
            xsl.Load (args[1]);                       // load the stylesheet
            xsl.Transform (doc, null, Console.Out);   // write the result to the console
        }
        catch (Exception ex) {
            Console.WriteLine (ex.Message);
        }
    }
}
