THE ESSENTIAL GUIDE TO

XML
BY SHARON L. HOFFMAN
AUGUST 2005

XML is a key technology for sharing data between business entities because it bridges different ways of storing and referencing data. Although XML can be described as a language, the extensible nature of XML means that it's more correctly classified as a standard. Many interrelated standards (for a list, see "Essential XML Standards" on page 4) complement XML and expand its capabilities. XML is also a fundamental building block for other standards. For example, many Web-services standards, such as Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL), are based on XML. To give you a sense of how you might use XML in your own applications, let's start with a quick look at XML syntax and how XML compares with languages used for related tasks.

XML in Context

An XML document is made up of XML elements. Each element contains a starting tag, an ending tag, and (usually) data nested between the two tags. By choosing descriptive names for elements, you can make your XML documents more human-readable and therefore self-documenting. In Figure 1, each product_code line (e.g., <product_code>12345</product_code>) is a single element called product_code. If a document contains more than one element of the same type, the tags are repeated for each element, as shown for the product_code and requested_qty elements in Figure 1. For more information about XML syntax, see "Essential XML Syntax and Terminology" on page 3.

Figure 1: Sample XML document

<?xml version="1.0" encoding="UTF-8"?>
<inventory_inquiry>
  <customer_reference>bike component availability</customer_reference>
  <date_required>9/1/2005</date_required>
  <customer>
    <customer_name>Acme Company</customer_name>
    <contact_name>Sharon Hoffman</contact_name>
    <contact_email>shoffman@iseriesnetwork.com</contact_email>
  </customer>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>

Repeating the data description for every element means that XML documents are entirely self-contained; you won't need to refer to a database layout, for example. However, the overhead of repeating all the element-description information quickly becomes unwieldy. As a result, most developers prefer using data-description languages (e.g., SQL, DDS) to define databases. XML shines, though, in data-transfer applications that involve relatively small amounts of data (these are typically single transactions such as an inventory inquiry or a purchase order). Data transfer is by far the most common XML application in iSeries environments.

However, you can also use XML to add meaning to text within documents. Used in this way, XML becomes a powerful

SUPPLEMENT TO iSeries NEWS 2005

THE ESSENTIAL GUIDE TO SOFTWARE XML

tool for organizing information and improving search capabilities. To understand the benefits of an XML-encoded document, you should consider the differences between XML and HTML. Although the two languages are syntactically similar because they have the same antecedents (see "Essential XML History" on page 5 for information), they have different strengths. HTML is best used to format information for display, while the descriptive information in XML tags makes it easier to deal with document content.

For example, suppose you have a document containing a list of PC printers that includes information about the features of each printer model. If the document is stored in HTML, it's difficult to create a search that finds all printers that support color printing and duplex printing and can print at least 10 pages per minute. Conversely, if you store the same document using XML, you would probably create separate elements for each important feature (e.g., maximum_print_speed) and could easily develop an application that searches for all printers that meet your criteria. Of course, a database is ideal for such a search, but XML provides database-like search capabilities for information that is stored in documents such as user manuals or marketing brochures. As you'll see in the following section, the XML data can easily be converted into HTML for display purposes.

Because XML documents are plain text, you can write XML using any text editor (e.g., Notepad). However, as you begin working with XML, you'll quickly find that an XML-aware editor is a big time-saver. An XML editor should help you write XML by providing syntax-checking and document-generation capabilities. For example, if you begin to create a new element, some editors will automatically generate the ending tag for you.

An XML document can stand entirely on its own, without any related documents.
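The printer search described above can be sketched in a few lines. This is only an illustration, assuming a hypothetical catalog layout: apart from maximum_print_speed, which the text mentions, every element name (printers, printer, model, color_printing, duplex_printing) and every value is invented here. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical printer catalog. Only maximum_print_speed comes from the
# article; the other element names and all values are made up.
CATALOG = """<printers>
  <printer>
    <model>Alpha 100</model>
    <color_printing>yes</color_printing>
    <duplex_printing>yes</duplex_printing>
    <maximum_print_speed>12</maximum_print_speed>
  </printer>
  <printer>
    <model>Beta 200</model>
    <color_printing>yes</color_printing>
    <duplex_printing>no</duplex_printing>
    <maximum_print_speed>20</maximum_print_speed>
  </printer>
  <printer>
    <model>Gamma 300</model>
    <color_printing>yes</color_printing>
    <duplex_printing>yes</duplex_printing>
    <maximum_print_speed>8</maximum_print_speed>
  </printer>
</printers>"""


def find_printers(xml_text, min_speed=10):
    """Return the models that print in color, duplex, and at least min_speed ppm."""
    root = ET.fromstring(xml_text)
    matches = []
    for printer in root.findall("printer"):
        color = printer.findtext("color_printing") == "yes"
        duplex = printer.findtext("duplex_printing") == "yes"
        speed = int(printer.findtext("maximum_print_speed", default="0"))
        if color and duplex and speed >= min_speed:
            matches.append(printer.findtext("model"))
    return matches


print(find_printers(CATALOG))  # ['Alpha 100']
```

Because each feature lives in its own named element, the search criteria map directly onto element lookups; the same search against HTML would have to guess at meaning from formatting.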
More often, though, an XML document is part of a larger application architecture that includes components that define the structure required for a particular type of XML document, solutions that reformat XML data (e.g., create an HTML document for display using data from an XML document), and applications that process

ESSENTIAL XML SYNTAX AND TERMINOLOGY


1. XML is case sensitive.

2. Generally, white space (e.g., indents, blank lines) in an XML document is ignored.

3. You can choose any element names you like as long as they conform to a few basic rules:
   - Element names cannot contain spaces.
   - Element names must begin with a letter or an underscore.
   - After the first character, element names can contain numbers, hyphens, periods, colons, letters, and underscores. (Colons are usually avoided in element names because they have special meaning within XML.)
   - Element names cannot begin with the letters xml, regardless of case (i.e., xml, XML, xMl, and Xml are all invalid).

4. Elements can contain one or more attributes. In many cases, the XML designer may choose whether to use elements or attributes to define a particular structure. As a rule of thumb, attributes should be used for information that is not integral to the element.

5. An element cannot contain more than one attribute with the same name.

6. Both starting and ending tags are required for all elements except empty elements. Empty elements occur most often when an element is completely defined by its attributes.

7. Elements must be properly nested (i.e., once an inner element tag is opened, it must be closed before any outer tags).

   The following nesting is correct:

   <customer_name>
     <first_name>Sharon</first_name>
     <last_name>Hoffman</last_name>
   </customer_name>

   The following nesting is syntactically correct, although it doesn't make much sense:

   <customer_name>
     <first_name>Sharon
       <last_name>Hoffman</last_name>
     </first_name>
   </customer_name>

   The following nesting is syntactically incorrect:

   <customer_name>
     <first_name>Sharon
       <last_name>Hoffman
     </first_name>
     </last_name>
   </customer_name>

8. The outermost element in any XML document is referred to as the root element.

9. The root element may be preceded by a document declaration and processing instructions.

10. Built-in XML entities are used to include a character that has special meaning in XML (e.g., a greater-than sign) within XML content. You can also define additional entities as shorthand for text and structures that you use repeatedly.

11. An XML document that has correct syntax is well formed.

12. An XML document that conforms to the structure defined by its Document Type Definition (DTD) or schema is valid. It is possible for an XML document to be well formed but invalid, but the reverse is not possible.
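Several of the rules above can be checked mechanically. The following sketch, which assumes nothing beyond Python's standard xml.etree.ElementTree module, tests whether a string is well formed; the helper name is_well_formed and the sample strings are illustrative. It checks syntax only and says nothing about validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Properly nested siblings.
GOOD = ("<customer_name><first_name>Sharon</first_name>"
        "<last_name>Hoffman</last_name></customer_name>")
# Odd but legal: last_name is nested inside first_name.
ODD = ("<customer_name><first_name>Sharon<last_name>Hoffman</last_name>"
       "</first_name></customer_name>")
# Overlapping tags: first_name is closed before the inner last_name.
BAD = ("<customer_name><first_name>Sharon<last_name>Hoffman</first_name>"
       "</last_name></customer_name>")

print(is_well_formed(GOOD))              # True
print(is_well_formed(ODD))               # True
print(is_well_formed(BAD))               # False
print(is_well_formed("<Name>x</name>"))  # False: tag names are case sensitive

# A built-in entity stands in for a character with special meaning.
print(ET.fromstring("<note>5 &gt; 3</note>").text)  # 5 > 3
```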


XML documents. Understanding how these pieces work together is vital to understanding XML.

The Big Picture

An XML document is almost always associated with a second document that defines the valid structure for a particular type of document. For example, an XML document might contain a particular inventory inquiry from XYZ Company, but the structural-definition document would define the format for all inventory inquiry documents. There are two standards for these structural-definition documents: DTD is the older and simpler standard, whereas XML schema is the newer standard. DTDs and schemas serve the same purpose, but their complexity and capabilities vary significantly.

Figure 2 contains a DTD that you could use to define the XML document in Figure 1, and Figure 3 contains the schema for the same document. Both the DTD and the schema were generated using an XML editor (WebSphere Development Studio Client for iSeries, or WDSc, in this case). You'll find that creating a sample document (e.g., an inventory inquiry) and using it to generate an initial version of the DTD or schema is often the simplest way to create a structural-definition document. While you may need to clean up the generated code, it will give you a good starting point for developing the DTD or schema.

Figure 2: A DTD generated by WDSc for the XML document in Figure 1

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT contact_email (#PCDATA)>
<!ELEMENT contact_name (#PCDATA)>
<!ELEMENT customer (customer_name,contact_name,contact_email)>
<!ELEMENT customer_name (#PCDATA)>
<!ELEMENT customer_reference (#PCDATA)>
<!ELEMENT date_required (#PCDATA)>
<!ELEMENT inventory_inquiry (customer_reference,date_required,customer,requested_products)>
<!ELEMENT product_code (#PCDATA)>
<!ELEMENT requested_products ((product_code,requested_qty)+)>
<!ELEMENT requested_qty (#PCDATA)>

Whether you use a DTD or a schema, there is typically a one-to-many relationship between the DTD or schema and the XML documents. For example, you could publish a DTD or a schema (or both) specifying the format for incoming inventory inquiries and, hopefully, many of your customers would then begin to send you inventory inquiries in XML format. DTDs and schemas for external documents (versus documents that are internal to a particular company) are usually published online so that they can be shared more easily. Ideally, everybody would use the same structure for the same type of document (e.g., inventory inquiries), but that's not always the case, not even within a single industry. Fortunately, many industry groups are working on standards that should help alleviate some of the Tower-of-Babel aspects of XML. You'll find the latest information on industry-specific XML structures online at xml.org.

In addition to DTDs and schemas, other components can be associated with XML documents. For example, if you plan to display an XML document in a Web page, you'll probably want to first convert the XML document into an HTML document. Similarly, you often might need to create multiple XML documents that contain the same general information but use slightly different structures. If you need to convert lots of documents between the same two structures, it makes sense to automate the process. The simplest way to do this is via an Extensible Stylesheet Language Transformations (XSLT) document that defines how input elements should be formatted in the output (XML or HTML) document. For example, if several of your vendors accept inventory inquiries in XML, but each uses a slightly different schema, you could develop a generic XML inventory inquiry, then create the variations using XSLT. As with DTDs and schemas, your XML editor should include tools to help you create XSLT documents.

An XSLT document works in conjunction with an XSLT

ESSENTIAL XML STANDARDS


XML itself is a standard, but it also involves many related standards. Here are some of the most widely used XML standards.

XLink is a standard for defining hyperlinks in XML.

XML Namespaces make it possible to create unique element names.

XML Schemas define the rules for the specialized XML documents used to define the structure of other XML documents.

XPath addresses each part of an XML document via a hierarchical structure (e.g., first_name within customer_name within quote_request).

XQuery is a relatively new standard that provides SQL-like query capabilities for XML documents.

Extensible Stylesheet Language (XSL) formats XML documents for display. There are two components of the XSL standard: XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO).
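To make the XPath entry concrete, here is a minimal sketch of addressing first_name within customer_name within quote_request. It assumes a small hypothetical quote_request document and relies on Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; a full XPath processor offers much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the example in the XPath entry.
QUOTE = """<quote_request>
  <customer_name>
    <first_name>Sharon</first_name>
    <last_name>Hoffman</last_name>
  </customer_name>
</quote_request>"""

root = ET.fromstring(QUOTE)  # root is the quote_request element

# Path expressions are evaluated relative to the element they are called on.
first_name = root.find("customer_name/first_name")
print(first_name.text)  # Sharon

# A wildcard step selects every child of customer_name, in document order.
child_tags = [element.tag for element in root.findall("./customer_name/*")]
print(child_tags)  # ['first_name', 'last_name']
```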

ESSENTIAL XML HISTORY

The histories of individual computer languages are mostly just curiosities, but XML's history provides a glimpse into its syntax as well. XML is part of the same family of languages as HTML and is based on Standard Generalized Markup Language (SGML). SGML is a direct descendant of Generalized Markup Language, which was developed by IBM researchers in the 1960s. The concept behind markup languages is to separate document content from document structure and display. Thus in both XML and HTML, the tags contain information about data: formatting information in HTML and context information in XML.

SGML became an ISO standard in 1986. HTML, which evolved somewhat independently but incorporates many SGML concepts, is slowly being brought back into compliance with the larger SGML standard. In 1996, developers began working on a simplified version of SGML that focuses on document structure rather than document format. That project is the basis for XML, which became a World Wide Web Consortium standard in 1998.

ESSENTIAL XML RESOURCES

Charles F. Goldfarb's All the XML Books in Print
Goldfarb, one of the developers of SGML, attempted to list all the XML books in print. Although the list was last updated in early 2004, it's still a useful resource.
xmlbooks.com

The CoverPages
The XML CoverPages include XML news, background material, and technical tips.
xml.coverpages.org

DevX.com
XML FAQs, articles, discussion groups, and more.
devx.com/xml

IBM Resources
developerWorks XML site
www-106.ibm.com/developerworks/xml
iSeries XML information home page
www-1.ibm.com/servers/enable/site/xml/iseries/index.html
Two IBM white papers illustrate how to process XML documents using RPG or Cobol:
Parsing XML documents using the new V5R3 ILE COBOL syntax
www-1.ibm.com/servers/enable/site/education/abstracts/3db2_abs.html
XML Interface for RPG maps XML into DB2 UDB for iSeries
www-1.ibm.com/servers/enable/site/education/ibo/record.html?xmlface

World Wide Web Consortium XML page
w3.org/XML

XML.com
O'Reilly Media, Inc., a premier technical book publisher, maintains this XML information site.
xml.com

ESSENTIAL XML PARSER CONCEPTS


Although most XML editors include an XML parser, you'll also need an XML parser for production applications. XML parsers may be part of a Web application server, or they may be available as separate software options. There are two general standards for XML parsers: Document Object Model (DOM) and Simple API for XML (SAX).

The only functional difference between DOM parsers and SAX parsers is that DOM parsers can modify an XML document, while SAX parsers are read-only (of course, an application that uses a SAX parser can always write out a new XML document in a different format than the incoming XML document). The other differences between DOM and SAX parsers don't affect their capabilities, but they can have an impact on ease of use and, in some cases, performance.

SAX parsers are event-driven and are best suited for applications that need to choose specific elements from a larger XML document. You'll find SAX parsers more intuitive if your programming background includes languages that have event-driven capabilities (e.g., Visual Basic, Java).

DOM parsers read an entire XML document into an application where the elements can be referenced, much as an RPG program might reference fields in a record format. Therefore, DOM parsers have an advantage over SAX parsers when you need to process a high percentage of the elements in an XML document. In addition, DOM parsers generally feel more natural than SAX parsers if your programming background includes procedural languages such as RPG and Cobol.
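The contrast between the two parser styles can be illustrated with a short sketch. It assumes Python's standard xml.sax and xml.dom.minidom modules rather than the iSeries parsers discussed in this guide, and the function and class names are invented for the example. The SAX half picks out product_code values as parse events arrive; the DOM half loads the whole document and modifies it in memory.

```python
import xml.sax
from xml.dom import minidom

# Trimmed version of the requested_products section of Figure 1.
INQUIRY = """<inventory_inquiry>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>"""


class ProductCodeHandler(xml.sax.ContentHandler):
    """SAX style: react to events and keep only the product_code text."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._in_code = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "product_code":
            self._in_code = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_code:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "product_code":
            self.codes.append("".join(self._buffer))
            self._in_code = False


def product_codes_sax(xml_text):
    handler = ProductCodeHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.codes


def double_quantities_dom(xml_text):
    """DOM style: load the whole tree, change it in place, serialize it."""
    document = minidom.parseString(xml_text)
    for node in document.getElementsByTagName("requested_qty"):
        node.firstChild.data = str(int(node.firstChild.data) * 2)
    return document.documentElement.toxml()


print(product_codes_sax(INQUIRY))  # ['12345', '67892']
print(double_quantities_dom(INQUIRY))
```

The SAX handler never holds more than one element's text at a time, which suits selective reads from large documents; the DOM version keeps everything in memory, which is what makes in-place modification straightforward.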


processor, software that applies the rules defined in the XSLT document to an incoming XML document and produces an output document in HTML, XML, or text format. An XSLT processor is typically bundled into a Web application server such as WebSphere Application Server (WAS) and can be accessed by calling APIs in an application. Most XML editors also include an XSLT processor for testing purposes.

From XML to the Database and Vice-Versa

In an iSeries environment, XML projects almost invariably involve extracting data from DB2 UDB for iSeries or moving data from XML documents into the database. While it's possible to store entire XML documents in iSeries files, more often you'll need to separate the data for one or more elements from its tags and store the data itself as a field or fields within existing iSeries database records. You'll also find lots of requirements for the opposite task: creating XML documents using data from one or more database records.

The underlying software that is used to separate an XML document into data and data-description components is an XML parser. An XML parser understands the rules of XML syntax, just as the parser that is part of the RPG compiler understands RPG syntax. For more about XML parsers, see "Essential XML Parser Concepts" on page 5.

As you begin developing in XML, you might not even realize that you're using an XML parser. For example, when an XML editor validates an XML document against its associated DTD or schema, an XML parser is invoked to perform the validation. XML parsers, including those for iSeries, are typically free. The iSeries-specific XML parser support is packaged in the no-charge licensed program product, XML Toolkit for iSeries (5733-XT1).

If you're working with very low document volumes, it may be possible to assemble and disassemble XML documents using the tools built into an XML editor. However, for production processing of XML documents, you'll usually need to develop code that moves data back and forth between a particular type of XML document (e.g., an inventory inquiry) and the associated database records.

You can create an XML document using a variety of techniques. At one end of the spectrum, you could write an RPG program that creates an XML document as an iSeries database file by hand-coding the tags and their contents. Then, you could convert the database file to a stream file using the CPYTOSTMF (Copy to Stream File) CL command. Other options include using APIs to output a stream file from an RPG program, generating an XML document using the results of an SQL query, or writing a Java application that builds an XML document.

Although you can write custom code to extract data from an XML document, it's simpler to leverage the capabilities of an XML parser. For example, you might write code that invokes specific parser functions such as reading the data for a particular type of element (e.g., product_code). Java is the language of choice for working with XML because it includes extensive support for accessing parser APIs. However, you can also invoke parser APIs using RPG or Cobol, and products are available that will automate part of the process of assembling or disassembling XML documents.
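The round trip between database records and an XML document can be sketched as follows. This is a rough illustration only, under several assumptions: an in-memory SQLite database stands in for DB2 UDB for iSeries, the request_lines table and its columns are hypothetical, and the generated document is a trimmed inventory inquiry that omits customer_reference, date_required, and the contact elements shown in Figure 1. It uses Python's standard sqlite3 and xml.etree.ElementTree modules.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_inquiry(conn, customer_name):
    """Build a trimmed inventory_inquiry document from SQL query results."""
    root = ET.Element("inventory_inquiry")
    customer = ET.SubElement(root, "customer")
    ET.SubElement(customer, "customer_name").text = customer_name
    products = ET.SubElement(root, "requested_products")
    rows = conn.execute(
        "SELECT product_code, requested_qty FROM request_lines "
        "WHERE customer_name = ? ORDER BY line_no",
        (customer_name,),
    )
    for code, qty in rows:
        ET.SubElement(products, "product_code").text = code
        ET.SubElement(products, "requested_qty").text = str(qty)
    return ET.tostring(root, encoding="unicode")


def extract_lines(xml_text):
    """Use the parser to pull (product_code, requested_qty) pairs back out."""
    products = ET.fromstring(xml_text).find("requested_products")
    codes = [element.text for element in products.findall("product_code")]
    quantities = [int(element.text) for element in products.findall("requested_qty")]
    return list(zip(codes, quantities))


# Hypothetical table standing in for existing database records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request_lines ("
    "line_no INTEGER, customer_name TEXT, product_code TEXT, requested_qty INTEGER)"
)
conn.executemany(
    "INSERT INTO request_lines VALUES (?, ?, ?, ?)",
    [(1, "Acme Company", "12345", 5), (2, "Acme Company", "67892", 25)],
)

document = build_inquiry(conn, "Acme Company")
print(document)
print(extract_lines(document))  # [('12345', 5), ('67892', 25)]
```

Letting the library create the elements, instead of concatenating tag strings by hand, means special characters in the data are escaped automatically.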

Figure 3: An XML schema generated by WDSc for the XML document in Figure 1

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="contact_email" type="xsd:string"/>
  <xsd:element name="contact_name" type="xsd:string"/>
  <xsd:element name="customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="customer_name"/>
        <xsd:element ref="contact_name"/>
        <xsd:element ref="contact_email"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="customer_name" type="xsd:string"/>
  <xsd:element name="customer_reference" type="xsd:string"/>
  <xsd:element name="date_required" type="xsd:string"/>
  <xsd:element name="inventory_inquiry">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="customer_reference"/>
        <xsd:element ref="date_required"/>
        <xsd:element ref="customer"/>
        <xsd:element ref="requested_products"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="product_code" type="xsd:string"/>
  <xsd:element name="requested_products">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded" minOccurs="1">
        <xsd:element ref="product_code"/>
        <xsd:element ref="requested_qty"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="requested_qty" type="xsd:string"/>
</xsd:schema>
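To show what a structural rule buys you, the sketch below hand-codes just one rule from the figures: requested_products must contain one or more product_code/requested_qty pairs, in that order. This is not DTD or schema validation (Python's standard library does not include a validating parser, so a real application would hand that job to a validating parser or an XML editor); it is an assumption-laden stand-in that makes the difference between well formed and valid visible. The function name is invented for the example.

```python
import xml.etree.ElementTree as ET


def has_valid_product_pairs(xml_text):
    """Check one rule only: requested_products holds (product_code, requested_qty)+."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    products = root.find("requested_products")
    if products is None:
        return False
    tags = [child.tag for child in products]
    expected = ["product_code", "requested_qty"] * (len(tags) // 2)
    return len(tags) >= 2 and tags == expected


VALID = """<inventory_inquiry>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>"""

# Well formed, but the second product_code has no requested_qty.
INVALID = """<inventory_inquiry>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
  </requested_products>
</inventory_inquiry>"""

print(has_valid_product_pairs(VALID))    # True
print(has_valid_product_pairs(INVALID))  # False
```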

Explore XML
XML is a powerful tool for communicating data between applications using different databases and running on different platforms, and it is rapidly becoming the medium of choice for transaction-level data transfer. XML can also organize information within a document, thus making it easier to modify and search large amounts of text. For all its strengths, XML is still a relatively new technology with a maze of confusing, and sometimes competing, standards. To take advantage of XML, it helps to have a clearly defined goal and the flexibility to experiment with various tools and techniques. It's also useful to understand how other businesses are using XML. To explore the opportunities XML offers, visit the Web sites listed in "Essential XML Resources" on page 5.

Sharon L. Hoffman is a senior technical editor for iSeries NEWS.
