
DataSets, DataTables, and XML Mapping

This article may contain URLs that were valid when originally published but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text but disabled the links.

Exploring XML

Use .NET to Store XML Data


DataSets provide a relational mapping for XML documents

Rich Rollman

SQL Server 2000 and XML for SQL Server 2000 Web releases (SQLXML) provide three ways in
which you can store XML data. XML Bulk Load and Updategrams, two client-side technologies,
use annotated schemas to specify the mapping between the contents of an XML document
and the tables in your database. OpenXML is a server-side technology that lets you define a
relational view on an XML document. With OpenXML's relational view, you can use T-SQL code
to query the data in the XML document and store the results in your SQL Server database.

Each of these three storage technologies is designed for a particular purpose. XML Bulk Load
stores data from very large XML documents in SQL Server. Updategrams perform optimistic
updates of SQL Server data. (Optimistic updates are updates without locks, in which the system
checks to see whether another user has changed the data after it was originally read.) And
OpenXML provides familiar relational access for XML data.

Of these three technologies, OpenXML is the most flexible because it provides a programming
model (T-SQL) that you can use to write business rules or perform computational logic on the
XML data before storing it in your SQL Server database. However, because OpenXML is a
server-based technology, if you use it frequently or with large documents, it can degrade SQL
Server's performance. But if you've adopted the Microsoft .NET Framework, you can work
around these performance and scalability limitations by using the ADO.NET DataSet, which gives
you a full programming model for storing XML data in SQL Server.

Last month, in "XML Query Results in .NET" (InstantDoc ID 39160), I showed you an easy way to
generate XML query results from SQL Server by using a DataSet. By providing a relational
cache that you can use on client and middle-tier machines, the DataSet can load and
manipulate data from a variety of sources, including SQL Server, other relational databases,
and XML.

When you load a DataSet from an XML document, the DataSet must map the data that's
stored in the hierarchical XML representation into the DataSet's relational representation. For
example, if you have an XML document that contains a list of Order elements that have nested
LineItem elements as children, that document would most commonly be mapped to Orders
and LineItems DataTables in the relational representation. The mapping is similar in purpose to
the way OpenXML uses XPath queries to construct a relational view on the XML document. But
instead of using XPath specifications, DataSets have their own way of mapping data.

DataSets use XML Schema Definition (XSD) schemas to map data from an XML document into
the DataSet's relational cache. You can specify the schema in two ways. First, you can
reference an XSD schema that defines the elements, attributes, and relationships that are used
in the XML document. Alternatively, you can let the DataSet infer the schema: the DataSet
builds a schema by examining the structure and content of the XML document.

When you reference an XSD schema, the DataSet uses the elements and attributes that are
defined in the schema along with the relationships that are defined between the elements to
construct the DataTables, DataColumns, and DataRelations in the relational cache that you
use to store the mapped XML data. I refer to the structure, or schema, of the relational cache
generically as the shape of the cache. When processing the schema, the DataSet applies a set
of rules to create the tables that store the mapped XML data. These rules are similar to the
default mapping rules that Updategrams and XML Bulk Load use when their mapping schema
contains no annotations. You can summarize the DataSet's mapping rules as follows:

• Complex elements (those that contain other elements or attributes) are mapped to
tables.
• Attributes and simple-valued subelements (elements that contain only data, not other
elements or attributes) are mapped to columns.
• Data types are mapped from the XSD types to .NET types.
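
As a rough illustration of these three rules, the following C# sketch loads a small schema and prints the shape that results. The schema is hypothetical (it is not the schema from Figure 1), and the class and method names are invented for this example.

```csharp
using System;
using System.Data;
using System.IO;

static class MappingRulesSketch
{
    // Hypothetical schema: one complex element with two simple-valued
    // subelements and one attribute.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Order'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='OrderDate' type='xs:dateTime'/>
        <xs:element name='Total' type='xs:decimal'/>
      </xs:sequence>
      <xs:attribute name='OrderID' type='xs:int'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Builds the relational shape from the schema; no data is loaded.
    public static DataSet LoadShape()
    {
        var ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(Xsd));
        return ds;
    }

    static void Main()
    {
        DataSet ds = LoadShape();
        foreach (DataTable table in ds.Tables)
            foreach (DataColumn column in table.Columns)
                Console.WriteLine("{0}.{1}: {2}",
                    table.TableName, column.ColumnName, column.DataType);
    }
}
```

In this sketch the complex Order element should become a table, and the OrderDate and Total subelements and the OrderID attribute should become columns typed as System.DateTime, System.Decimal, and System.Int32.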

The ADO.NET DataSet documentation topic "Generating DataSet Relational Structure from
XML Schema (XSD)" contains full details of the mapping rules. By referencing the schema of
your choice, you can control the shape of the cache that the DataSet creates.

Inference is a quick, easy way to load an XML document into a DataSet. Tables, columns, and
relationships are created automatically by introspection, a process whereby the DataSet
examines the XML document's structure and content. Although using inference significantly
reduces your programming effort, it introduces unpredictability in your implementation
because small changes to the XML document can cause the DataSet to create different-shaped
tables. These changes in shape can cause your application to break unexpectedly. Therefore, I
recommend that you always reference a schema for production applications and limit your use
of inference to building prototypes.
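
To make the risk concrete, here is a minimal C# sketch that infers a schema from two nearly identical documents. Both documents and all names in the code are made up for illustration; the only difference between the documents is a second Note element in the second one.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

static class InferenceSketch
{
    // Hypothetical documents that differ only by a repeated <Note> element.
    public const string OneNote =
        "<Orders><Order><OrderID>1</OrderID><Note>rush</Note></Order></Orders>";
    public const string TwoNotes =
        "<Orders><Order><OrderID>1</OrderID><Note>rush</Note><Note>gift</Note></Order></Orders>";

    // Lets the DataSet infer its shape from the document, then loads the data.
    public static DataSet Infer(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml), XmlReadMode.InferSchema);
        return ds;
    }

    // Returns a one-line summary such as "Table(Column, Column); ...".
    public static string Describe(DataSet ds)
    {
        var tables = new List<string>();
        foreach (DataTable table in ds.Tables)
        {
            var columns = new List<string>();
            foreach (DataColumn column in table.Columns)
                columns.Add(column.ColumnName);
            tables.Add(table.TableName + "(" + string.Join(", ", columns) + ")");
        }
        return string.Join("; ", tables);
    }

    static void Main()
    {
        Console.WriteLine(Describe(Infer(OneNote)));
        Console.WriteLine(Describe(Infer(TwoNotes)));
    }
}
```

With the first document, Note should be inferred as a column of the Order table. With the second, the repeated element should be inferred as a separate Note table related to Order, so code that reads a Note column from Order would stop working.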

Now let's look at an example of how you can easily use a schema to build a client-side DataSet
cache that you can use to update your SQL Server database.

Mapping an XML Order

Suppose you're writing an application that accepts orders from your customers in the XML
format that the XSD schema in Figure 1 defines. The schema defines three complex types that
provide the order's customer data, order data, and line items. A top-level Customer element
defines the XML document's root. The containment hierarchy defines relationships between
the elements: An Order element contains a LineItem element, and a Customer element
contains an Order element. Figure 2 shows an instance of an XML document that matches
Figure 1's schema.

The C# code in Listing 1 uses the ReadXmlSchema method to load the schema from
Figure 1 into a DataSet called orderDS. ReadXmlSchema creates three DataTables that
correspond to the Customer, Order, and LineItem elements that the schema defines. So that
you can verify that the schema created the expected tables in the relational cache, the
printDSShape method writes the table name for each table to the console, followed by a list of
columns and the data type for each column.
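
The following C# sketch shows roughly what such a shape-printing routine could look like. It is not the article's Listing 1: the inline schema is a hypothetical stand-in for Figure 1 (the attribute names are assumptions), and PrintShape is an illustrative name.

```csharp
using System;
using System.Data;
using System.IO;

static class OrderShapeSketch
{
    // Hypothetical stand-in for the Figure 1 schema: Customer > Order > LineItem.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Customer'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Order' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='LineItem' maxOccurs='unbounded'>
                <xs:complexType>
                  <xs:attribute name='ProductID' type='xs:int'/>
                  <xs:attribute name='Quantity' type='xs:int'/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name='OrderID' type='xs:int'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name='CustomerID' type='xs:string'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static DataSet LoadOrderSchema()
    {
        var orderDS = new DataSet();
        orderDS.ReadXmlSchema(new StringReader(Xsd));
        return orderDS;
    }

    // Writes each table name, followed by its columns and their .NET types.
    public static void PrintShape(DataSet ds, TextWriter output)
    {
        foreach (DataTable table in ds.Tables)
        {
            output.WriteLine(table.TableName);
            foreach (DataColumn column in table.Columns)
                output.WriteLine("  {0} ({1})", column.ColumnName, column.DataType);
        }
    }

    static void Main()
    {
        PrintShape(LoadOrderSchema(), Console.Out);
    }
}
```

The schema is read from a string here only to keep the sketch self-contained; ReadXmlSchema also accepts a file name.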

Look closely at the column names in Figure 3. The Customer_Id and Order_Id columns
are present in the DataTables although they aren't specified in the schema. The
ReadXmlSchema method automatically adds these columns to the DataSet. The DataSet uses
the columns as foreign keys to model the relationships between a Customer element and its
Order element and between an Order element and its LineItem element. Because XML typically
uses nested relationships instead of foreign keys, the DataSet automatically generates its own
primary and foreign keys between the DataTables and stores them in these columns.
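
One way to inspect these generated columns is to walk the DataSet's Relations collection after reading the schema. The C# sketch below does that against a hypothetical schema with the same Customer, Order, LineItem nesting; the schema details and class name are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.IO;

static class GeneratedKeysSketch
{
    // Hypothetical nested schema; no key columns are declared in it.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Customer'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Order' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='LineItem' maxOccurs='unbounded'>
                <xs:complexType>
                  <xs:attribute name='Quantity' type='xs:int'/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name='OrderID' type='xs:int'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name='CustomerID' type='xs:string'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static DataSet LoadOrderSchema()
    {
        var ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(Xsd));
        return ds;
    }

    static void Main()
    {
        DataSet ds = LoadOrderSchema();
        // Each nested element pair is modeled as a DataRelation over a
        // generated key column in the parent and child tables.
        foreach (DataRelation rel in ds.Relations)
        {
            DataColumn parent = rel.ParentColumns[0];
            DataColumn child = rel.ChildColumns[0];
            Console.WriteLine("{0}: {1}.{2} -> {3}.{4} (nested: {5}, auto-increment: {6})",
                rel.RelationName,
                parent.Table.TableName, parent.ColumnName,
                child.Table.TableName, child.ColumnName,
                rel.Nested, parent.AutoIncrement);
        }
    }
}
```

For this schema the output should list two relations, Customer_Order and Order_LineItem, each joining a parent key column to a same-named column in the child table.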

Also look carefully at the data types in Figure 3: the DataSet has mapped the data types from
XML Schema data types to the corresponding .NET data types. When you load an XML
document into the DataSet, the DataSet converts each value from the XML to the
corresponding .NET type.

After loading the schema into the DataSet, all you have to do to complete the relational
mapping is load the XML data into the DataSet. Listing 1's ReadXml method opens the file
named Order.xml, which Figure 2 shows. Then, it reads the data from the file into the three
DataTables that the DataSet created when you read the schema in the previous step. Your XML
order is now accessible through the DataSet.

To demonstrate how to access the data in the DataSet, Listing 1's printDSData method
navigates through the DataTables and, for each table, displays the column names, followed by
all rows in the DataTable. Figure 3 shows the automatically generated values for the
Customer_Id and Order_Id columns that the ReadXmlSchema method added to the DataSet.
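
The load-then-print sequence can be sketched in C# as follows. This is an approximation rather than Listing 1 itself: the schema and the order document are hypothetical and are embedded as strings instead of being read from files, and XmlReadMode.IgnoreSchema is chosen here explicitly so that the data is read into the existing shape.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

static class OrderDataSketch
{
    // Hypothetical stand-ins for the Figure 1 schema and the Order.xml file.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Customer'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Order' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='LineItem' maxOccurs='unbounded'>
                <xs:complexType>
                  <xs:attribute name='ProductID' type='xs:int'/>
                  <xs:attribute name='Quantity' type='xs:int'/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name='OrderID' type='xs:int'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name='CustomerID' type='xs:string'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    const string OrderXml = @"<Customer CustomerID='ALFKI'>
  <Order OrderID='1'>
    <LineItem ProductID='11' Quantity='2'/>
    <LineItem ProductID='42' Quantity='1'/>
  </Order>
</Customer>";

    public static DataSet LoadOrder()
    {
        var orderDS = new DataSet();
        orderDS.ReadXmlSchema(new StringReader(Xsd));                     // build the shape
        orderDS.ReadXml(new StringReader(OrderXml), XmlReadMode.IgnoreSchema); // fill it
        return orderDS;
    }

    // For each table, writes the column names and then every row.
    public static void PrintData(DataSet ds, TextWriter output)
    {
        foreach (DataTable table in ds.Tables)
        {
            var names = new List<string>();
            foreach (DataColumn column in table.Columns)
                names.Add(column.ColumnName);
            output.WriteLine(table.TableName);
            output.WriteLine("  " + string.Join(", ", names));
            foreach (DataRow row in table.Rows)
                output.WriteLine("  " + string.Join(", ", row.ItemArray));
        }
    }

    static void Main()
    {
        PrintData(LoadOrder(), Console.Out);
    }
}
```

Because the generated key columns are ordinary members of each table's Columns collection, their values are printed alongside the mapped data.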

Also notice that three elements that appear in Order.xml (PO, Address, and Description) aren't
mapped into the DataTables. This data is omitted because the schema you supplied to the
DataSet didn't contain these elements, and the DataSet simply ignores any data not described
in the schema when it's creating the shape of the relational cache and loading the XML data.
This convenient feature lets your code work properly even if additional data you didn't
anticipate is included in the XML order you receive from your customer.
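
A small C# sketch of this behavior follows. The schema, the document, and the placement of the PO, Address, and Description elements within it are all assumptions made for illustration. The sketch loads the data with XmlReadMode.IgnoreSchema, the read mode that is documented to discard data that does not match the existing schema.

```csharp
using System;
using System.Data;
using System.IO;

static class IgnoredDataSketch
{
    // Hypothetical schema that does not mention PO, Address, or Description.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Customer'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Order' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='LineItem' maxOccurs='unbounded'>
                <xs:complexType>
                  <xs:attribute name='Quantity' type='xs:int'/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name='OrderID' type='xs:int'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name='CustomerID' type='xs:string'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Hypothetical order containing three elements the schema does not describe.
    const string OrderXml = @"<Customer CustomerID='ALFKI'>
  <Address>1 Main Street</Address>
  <Order OrderID='1'>
    <PO>A-100</PO>
    <LineItem Quantity='2'>
      <Description>Widget</Description>
    </LineItem>
  </Order>
</Customer>";

    public static DataSet LoadOrder()
    {
        var ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(Xsd));
        // Data that does not match the existing schema is discarded.
        ds.ReadXml(new StringReader(OrderXml), XmlReadMode.IgnoreSchema);
        return ds;
    }

    static void Main()
    {
        DataSet ds = LoadOrder();
        Console.WriteLine("Tables: {0}", ds.Tables.Count);
        Console.WriteLine("Address column present: {0}",
            ds.Tables["Customer"].Columns.Contains("Address"));
        Console.WriteLine("PO column present: {0}",
            ds.Tables["Order"].Columns.Contains("PO"));
        Console.WriteLine("Description column present: {0}",
            ds.Tables["LineItem"].Columns.Contains("Description"));
    }
}
```

If you need to detect unexpected content instead of silently dropping it, one option is to validate the document against the schema before loading it.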

Building Applications That Use the Cache

Now that you've learned how to use the DataSet to build a relational cache for XML data, you
can apply this knowledge to implement applications that execute business logic and update
SQL Server. Implementing business logic is relatively straightforward when you use the
DataSet programming model. ADO.NET gives you several alternatives for updating data in SQL
Server, including using DataAdapters, writing your own queries, and executing stored
procedures. DataSets make mapping XML data to a relational model easy; the rest is up to you.


Copyright © 2003 Penton Media, Inc. All rights reserved.
