XML
by Dawid Loubser and Dr. Fritz Solms
Table of Contents
Preface: About Solms TCD
  1. Copyright
    1.1. Solms Public License (SPL)
      1.1.1. Terms
      1.1.2. Application domain
      1.1.3. Conditions
  2. Overview
    2.1. Vendor-neutral, concepts-based training with short- and long-term value
    2.2. Training methods
      2.2.1. Instructor-Led Training
  3. About the Author(s)
    3.1. Fritz Solms
    3.2. Dawid Loubser
  4. Solms TCD Guarantee
1. Introduction to XML
  1. What is XML?
  2. The History of Markup Languages
  3. Common uses of XML
    3.1. Scientific Applications
    3.2. e-Business Data Exchange
    3.3. System Configuration
    3.4. Document Definition via DocBook XML
    3.5. Graphics Definition
    3.6. IT-Related XML Vocabularies
      3.6.1. The Simple Object Access Protocol (SOAP)
      3.6.2. RSS (RDF Site Summary, or "Really Simple Syndication")
      3.6.3. XMI
      3.6.4. XForms
      3.6.5. Structured Graph Format (SGF)
  4. Main Features of XML
    4.1. The XML syntax
    4.2. XML Schemas
    4.3. Document Type Definitions (DTDs)
    4.4. The Document Object Model (DOM)
    4.5. The Simple API for XML (SAX)
    4.6. Name Spaces
    4.7. Linking
    4.8. Querying
    4.9. XML Transformations
      4.9.1. XML Styling
  5. Transmitting XML Documents
  6. Tools for XML
    6.1. Editors
    6.2. Browsers
    6.3. Parsers
2. The XML Syntax
  1. Introduction
  2. A Simple Example
  3. Characters and Names
    3.1. XML characters and Unicode
      3.1.1. The Unicode character set
    3.2. White Space
    3.3. End-Of-Line Delimiters
    3.4. XML Names
  4. Document Structure, Comments and Processing Instructions
    4.1. The Prolog
    4.2. The Body
    4.3. The Epilog
  5. XML Element Tags
    5.1. Tags must be properly nested
    5.2. Element Attributes
      5.2.1. xml:space Attribute
      5.2.2. xml:lang Attribute
    5.3. Empty Elements
  6. CDATA Sections
  7. XML Processing Instructions
  8. Exercises
3. XML Name Spaces
  1. Introduction
    1.1. How Namespaces Work
  2. Syntax
    2.1. Naming
      2.1.1. Uniform Resource Locators
      2.1.2. Uniform Resource Identifiers
    2.2. Declaring Namespaces
    2.3. Default Namespaces
  3. Examples of XML Namespaces
    3.1. Route coordinates within a web page
  4. Exercises
4. XML Schemas
  1. Introduction
    1.1. Schemas are defined in XML
    1.2. Namespace Support
    1.3. Schemas are Extensible
    1.4. Schema Specialization
    1.5. Data Types
  2. A Simple Example
    2.1. Importing the XML Schema Namespace
    2.2. Simple versus Complex Types
    2.3. Complex Types
    2.4. Specifying Multiplicities
  3. Associating documents with XML Schemas
    3.1. Associating schemas with name spaces
    3.2. Associating schemas without name spaces
  4. Simple Types
    4.1. Primitive Data Types
    4.2. List Types
  5. Regular expressions
    5.1. Matching on characters by defining a character class
      5.1.1. Matching on a range of characters
      5.1.2. Excluding certain characters
      5.1.3. Matching on any character
      5.1.4. Escaped characters
      5.1.5. POSIX character classes
    5.2. Multiplicity constraints
    5.3. Positional characters
    5.4. Optional matches and groupings
    5.5. Greedy versus non-greedy matching
    5.6. Exercises
  6. Using regular expressions to validate element values
  7. Complex Types
    7.1. Attributes
      7.1.1. Attribute Groups
    7.2. Anonymous Types
    7.3. Mixing Elements and Text
    7.4. Any Types
    7.5. Specifying Elements with Empty Content
    7.6. Any attributes
  8. Namespaces
    8.1. Specifying location of schemas not assigned to a namespace
    8.2. Assigning Vocabulary to a Namespace
      8.2.1. Using globally unique namespaces
    8.3. Qualification
      8.3.1. Unqualified Locals
      8.3.2. Qualified Locals
    8.4. Qualification Requirements at Element/Attribute Level
  9. Specialized Complex Types
    9.1. Extensive Specialization
      9.1.1. Complex Specializations of Simple Types
    9.2. Restrictive Specialization
    9.3. Substitution
    9.4. Abstract Types
  10. Simple implementation of Object Graphs
  11. An example for XML-based documentation
  12. Uniqueness Constraints
    12.1. Specifying a Single Uniqueness Constraint
    12.2. Specifying a Uniqueness Constraint for a Combination of Fields
  13. Keys and Key References
    13.1. Defining Keys
    13.2. Defining Key References
    13.3. Implementing one-to-many and many-to-many Associations
    13.4. An Example
  14. Importing Schemas into Schemas
5. The XML Linking Language
  1. Introduction
  2. Simple Links
    2.1. Simple Link Elements
    2.2. Attributes of Simple Links
      2.2.1. xlink:type
      2.2.2. xlink:href
      2.2.3. xlink:role
      2.2.4. xlink:title
      2.2.5. xlink:show
      2.2.6. xlink:actuate
  3. Current Support for XLink
  4. Extended Links
    4.1. Advantages of Being Able to Define Links in a Different Document
    4.2. An Example Extended Link
    4.3. Resources
    4.4. Local Versus Remote Resources
      4.4.1. Remote Resources (Locators)
      4.4.2. Local Resources
      4.4.3. Resource Labels
    4.5. Arcs
      4.5.1. Arcs can point from resources which do not support links
      4.5.2. Arcs as instances of association classes
      4.5.3. Arc roles
      4.5.4. Consider the example
  5. Exercise
6. XPath
  1. Introduction
  2. Context, and the Context Node
  3. Location Paths
    3.1. Axis Specifiers
      3.1.1. descendant
      3.1.2. parent
      3.1.3. ancestor
      3.1.4. following-sibling
      3.1.5. preceding-sibling
      3.1.6. following
      3.1.7. preceding
      3.1.8. attribute
      3.1.9. namespace
      3.1.10. self
      3.1.11. descendant-or-self
      3.1.12. ancestor-or-self
    3.2. Node Tests
      3.2.1. Types of node test
      3.2.2. Example Node Tests
    3.3. Abbreviated syntax for Location Paths
    3.4. Predicates
      3.4.1. Predicate Example
      3.4.2. XPath Operators for Predicates
  4. The core XPath function library
    4.1. Node set functions
    4.2. String functions
    4.3. Boolean functions
    4.4. Numeric functions
  5. XPath Tools
7. XSLT (XSL Transformation)
  1. Introduction
  2. Common uses of XSLT
  3. Why XSLT?
  4. XSLT Templates
    4.1. Template Concepts
      4.1.1. Template Matching
      4.1.2. Nodes and Trees
      4.1.3. Processing child elements
      4.1.4. Processing text
      4.1.5. Hiding Content
      4.1.6. Default Child Processing
      4.1.7. Selective and Repeated Processing
      4.1.8. Prefixes and Suffixes
      4.1.9. Tag Replacement
      4.1.10. Values of other elements / attributes
      4.1.11. Document manipulation
  5. Writing XSLT Stylesheets
    5.1. Using Stylesheets
      5.1.1. Embedded Stylesheets
      5.1.2. Referenced Stylesheets
      5.1.3. Unlinked Stylesheets
8. Styling XML/XHTML with Cascading Style Sheets
  1. Introduction to CSS
  2. Attaching CSS to documents
    2.1. Attaching CSS to arbitrary XML
    2.2. Attaching CSS to legacy (X)HTML
  3. CSS Syntax and Rules
    3.1. Comments
    3.2. CSS Selectors
  4. CSS Style Properties
    4.1. Text Properties
      4.1.1. 'font-family'
      4.1.2. 'font-size'
      4.1.3. 'font-weight'
      4.1.4. 'font-style'
      4.1.5. 'text-decoration'
      4.1.6. 'text-transform'
      4.1.7. Examples
    4.2. Colour and Background Properties
      4.2.1. Colour
    4.3. Box Properties
      4.3.1. Margins and Padding
      4.3.2. The Box Model
      4.3.3. CSS Borders
    4.4. Display Properties
      4.4.1. 'display'
  5. CSS Example: Restyling Google.com
9. XSL-FO (XSL Formatting Objects)
  1. Introduction to XSL-FO
10. Web Services
  1. Conceptual Overview of Web Services
  2. SOAP
    2.1. Introduction to SOAP
    2.2. The Benefits of SOAP
    2.3. SOAP Message Structure
      2.3.1. The SOAP Body
      2.3.2. The SOAP Header
    2.4. SOAP Faults
      2.4.1. The 'faultcode' element
      2.4.2. The 'faultstring' element
      2.4.3. The 'faultactor' element
      2.4.4. The 'detail' element
  3. WSDL (The Web Services Contract)
    3.1. The structure of a WSDL Document
    3.2. A simple example WSDL
    3.3. The 'types' section
    3.4. The 'message' section
    3.5. The 'portType' section
    3.6. The 'binding' section
    3.7. The 'service' section
  4. UDDI (Web Services Discovery)
List of Figures
1.1. XML Example
2.1. XML Document Structure
2.2. As is shown by the simple DocBook example, an XML document forms a tree structure.
2.3. The tree structure of the travel log document.
4.1. An object graph with attributes and composition, association and specialization relationships.
5.1. Possible linking for a property sale contract.
8.1. The CSS Concept
8.2. Example Text CSS Properties
8.3. The CSS Box Model
8.4. Redefining Google: Pure functionality
8.5. Redefining Google: Style sheet applied
9.1. Simple PDF produced from XSL-FO
10.1. SOAP Message Structure
10.2. Document versus RPC Messaging
10.3. The structure of a WSDL Document
List of Tables
2.1. Unicode structure
2.2. Legal XML names
2.3. Illegal XML names
4.1. Date/Time primitives defined for XML Schema.
4.2.
4.3. Multiplicity constraints supported by regular expressions
6.1.
8.1. Common selection patterns for targeting style rules at elements
8.2. Pseudo-classes and Pseudo-elements
1. Copyright
We at Solms Training, Consulting and Development (STCD) believe in the open and free sharing of knowledge for public benefit. To this end, we make all our knowledge and educational material freely available. The material may be used subject to the Solms Public License (SPL).
1.1.1. Terms
Material, below, refers to any such knowledge components, documentation (including educational and training materials) or other work. Work based on the Material means either the Material or any derivative work under copyright law: that is to say, a work containing the Material or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Licensee means the person or organization which makes use of the material.
1.1.3. Conditions
The licensee may freely copy, print and distribute this material or any part thereof, provided that the licensee

1. prominently displays (e.g. on a title page, below the header, or in the header or footer of a page or slide) the author's attribution information, which includes the author's name, affiliation and URL or e-mail address, and conspicuously and appropriately publishes on each copy the Solms Public License.

2. Any material which makes use of material subject to the Solms Public License (SPL) must itself be published under the SPL.

The knowledge is provided free of charge. There is no warranty for the knowledge unless otherwise stated in writing: the copyright holders provide the knowledge "as is" without warranty of any kind. In no event, unless required by applicable law or agreed to in writing, will the copyright holder or any party which modifies or redistributes the material, as permitted above, be liable for damages, including any general, special, incidental or consequential damages arising from using the information.
Neither the name nor any trademark of the Author may be used to endorse or promote products derived from this material without specific prior written permission.
2. Overview
The training pillar of Solms TCD focuses on vendor-neutral training which provides both short- and long-term value to the attendees. We provide training for architects, designers, developers, business analysts and project managers.
xii
the company. Besides the management role, he is particularly focused on architecture, design, software development processes and requirements management. Fritz Solms holds PhD and BSc degrees in Theoretical Physics from the University of Pretoria and a Master's degree in Physics from UNISA. After completing a short post-doc, he took up a senior lectureship in Applied Mathematics at the Rand Afrikaans University. There he founded, together with Prof W.-H. Steeb, the International School for Scientific Computing (ISSC), developing a large number of courses focused on the immediate needs of industry. These include C++, Java, Object-Oriented Analysis and Design, CORBA, Neural Networks, Fuzzy Logic, and Information Theory and Maximum Entropy Inference. The ISSC was the first institution in South Africa offering courses in Java and the Unified Modeling Language. During this period he was also responsible for presenting the OO training for the Education Division of IBM South Africa. In 1998 he joined the Quantitative Applications Division (QAD) of the Standard Corporate and Merchant Bank (SCMB). Here he was the key person developing the architecture and infrastructure for the QAD library and applications. These were based on Java and CORBA technologies, with a robust object-oriented analysis and design backbone. In 2000 he and Ellen Solms founded Solms TCD.
E-Mail: fritz@solms.co.za
Tel: 011 646 6459
Mobile: 072 128 2314
applications in the financial industry, and specialises in Application Architecture, Graphical Interfaces (web and otherwise), and usability. In addition to object-oriented and web technologies, he also takes a great interest in various graphical formats and frameworks - especially the application of XML for this purpose. After a career at the JSE Securities Exchange (formerly Johannesburg Stock Exchange) as Systems Analyst and Web Architect, he joined Solms TCD in order to explore a larger field for applying interesting technologies to solve interesting problems, and to share this passion with others through mentoring and consulting.
E-Mail: dawidl@solms.co.za
Tel: 011 646 6459
Mobile: 082 655 0876
Please feel free to discuss any complaints you may have with us. We will do our best to address them. Should you feel that your complaints are not satisfactorily addressed within our organization, you can raise them with:
Professor W.-H. Steeb from the University of Johannesburg, Tel: +27 (11) 486-4270, E-Mail: whs@rau.ac.za
Dr A. Gerber from the Computer Science Department of the University of South Africa, E-Mail: gerberaj@unisa.ac.za
At the time of writing we are in the process of obtaining ISETT SETA accreditation. Complaints can be raised directly to that institution.
1. What is XML?
XML, the Extensible Markup Language, is a tagged markup language for structured, self-describing data. As an extensible markup language, it allows for the definition of new markup tags (semantics). This means that, instead of prescribing a specific vocabulary for e.g. Web Pages or Purchase Orders, it simply defines a standard for the markup language itself - anybody can thus create their own vocabulary (or tags) to describe anything. An important realisation concerning XML is that it logically consists of two aspects:
1. A general, and very precise, set of rules (and, usually, tools) which can be used to express any information you can think of in a logical, self-describing tree structure (the XML Information Set).
2. Rules for the serialisation of this information set as human- and computer-readable text, which can be stored in a file, transmitted over a network, or used in any other way you may care to.
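To make the distinction between the information set and its serialisation concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module (our choice of language and library for illustration; XML itself does not prescribe any tool). Parsing turns text into a tree of elements, and serialising the tree produces equivalent text again.

```python
import xml.etree.ElementTree as ET

# Serialised form: plain text that could be stored in a file or sent over a network.
text = ("<travelLog><trip><from>Kensington</from>"
        "<to>Johannesburg Airport</to></trip></travelLog>")

# Parsing builds an in-memory tree of elements (the information set) from the text.
root = ET.fromstring(text)
trip = root[0]
print(root.tag, [child.tag for child in trip])  # travelLog ['from', 'to']

# Serialising the tree produces equivalent text again.
round_trip = ET.tostring(root, encoding="unicode")
print(round_trip == text)  # True
```

For a document without attributes or insignificant whitespace, as here, the round trip reproduces the original text exactly; in general only the information set, not the exact characters, is guaranteed to survive.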
XML is a public standard, developed and maintained by the World Wide Web Consortium (W3C). The W3C develops interoperable technologies (specifications, guidelines, software and tools) to increase the potential of the Web as a forum for information, commerce, communication and collective understanding. Part of XML's overwhelming adoption can be attributed to the fact that no single vendor controls the standard or the associated tools.
Introduction to XML
cessing and communication of self-describing data. A very wide spectrum of domain-specific vocabularies has been (and will be) developed. These cover everything from the exchange of scientific, legal and business information through to software setup information and service requests in distributed software systems. In this section we shall look at a few example applications of XML.
application user interfaces, and even animation (similar to the Macromedia Flash format). General support for SVG in applications is still somewhat lacking (but growing), yet SVG is already practical for many purposes.
3.6.3. XMI
XMI is a public standard used to convey the information contained in an object-oriented model created using the UML (Unified Modeling Language). This allows for the sharing of design and requirements information between several modeling toolkits, and it is even used by tools that generate source code in an implementation language (Java, C++) based on the model.
3.6.4. XForms
XForms is an extremely promising standard proposed by the W3C (and already implemented by some web browsers, and available in others using plugins). It is a pure vocabulary with which to define a user interface that creates or edits a particular piece of XML information. Though primarily intended for the web, where web servers can now receive properly structured XML from the client, instead of the archaic name-value pairs generated by today's web forms, XForms in itself represents a declarative way of creating a proper MVC (Model-View-Controller) user interface of the first order. Because it is so abstract, it can be used in any number of application or visual frameworks, and is already being used to develop interactive user interfaces drawn with SVG (Scalable Vector Graphics).
Note
Though the industry seems to be slowly moving towards XML Schema, many established standards such as XHTML are still based on DTDs.
4.7. Linking
The feature of HTML which most probably contributed more to its success than any other is its ability to link one HTML document with another (or even with non-HTML information) which is accessible through a Uniform Resource Locator (URL). Though not initially part of the XML standard, XML linking (XLink) has grown into a powerful mechanism which far exceeds the capability of simple links in web documents. Software support for this standard has, however, grown slowly - not all tools support it yet.
4.8. Querying
The W3C has also developed a query language, XQuery, which can be used to formulate queries on XML documents independently of the source of the XML document, i.e. the type of repository (e.g. relational or object database, or file).
domain. In such cases one requires tools for transforming one document type into the other. The language for specifying such transformations, XSLT (XSL Transformations), is standardised, and tools implementing it are readily available.
Easy element selection with automatic provision of end-tags (Auto-completion). Parsing and syntax verification.
Free XML editors can be obtained from IBM, Microsoft and other sources. Examples of commercial XML editors are: oXygen XML (www.oxygenxml.com), XML Pro (www.vervet.com), XML Authority (www.extensibility.com), XMetaL (www.xmetal.com), Clip! XML Editor (www.t2000-usa.com) and XMLSpy (www.altova.com).
6.2. Browsers
At the time of writing, XML support in web browsers ranges from partial to excellent. Standards-compliant browsers such as Mozilla (Firefox), Opera and those based on the KHTML engine (Safari, Konqueror) tend to provide better support than the Microsoft family of browsers, especially for recent standards such as XSL Transformations (XSLT) and XLink. Most modern browsers, however, do support the core XML standards plus simple styling (through CSS).
6.3. Parsers
There is a myriad of free parsers available. IBM, Microsoft, Sun and many others (e.g. www.jimclark.com) have XML parsers. Many of them are written in Java, which means they run on all platforms with a Java Runtime Environment. These parsers typically implement one or both of the two standard APIs: DOM or SAX.
Note
Recent versions of the Java runtime (1.4 and later) contain XML parsing and processing tools as standard - largely eliminating the need to use any third-party components.
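The difference between the two APIs can be sketched with Python's standard library, which ships both a DOM implementation (xml.dom.minidom) and a SAX parser (xml.sax); the language and the small handler class are our own illustration, not part of either standard. With DOM the whole tree is built first and then navigated; with SAX the parser streams events to a handler and keeps no tree.

```python
import xml.sax
from xml.dom import minidom

DOC = ("<travelLog><trip><from>Kensington</from></trip>"
       "<trip><from>Rosebank Mall</from></trip></travelLog>")

# DOM: the parser builds the complete tree in memory, which we then navigate.
dom = minidom.parseString(DOC)
dom_froms = [node.firstChild.data for node in dom.getElementsByTagName("from")]

# SAX: the parser calls back into a handler as it reads; no tree is retained.
class FromCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_from = False
        self.values = []

    def startElement(self, name, attrs):
        if name == "from":
            self.in_from = True
            self.values.append("")

    def endElement(self, name):
        if name == "from":
            self.in_from = False

    def characters(self, content):
        # characters() may be called several times for one run of text
        if self.in_from:
            self.values[-1] += content

handler = FromCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(dom_froms, handler.values)  # ['Kensington', 'Rosebank Mall'] ['Kensington', 'Rosebank Mall']
```

DOM is usually more convenient for small documents that are navigated back and forth; SAX needs far less memory for large documents that can be processed in a single pass.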
1. Introduction
Every XML document must conform exactly to the XML syntax specification in order to be a legal (well-formed) XML document. In this chapter we look at the XML syntax, including XML document structure, XML elements and attributes, and XML processing instructions.
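Checking well-formedness is the first thing any XML parser does. As a minimal sketch, assuming Python's xml.etree.ElementTree as the parser (the helper name is_well_formed is our own), a document is well-formed exactly when parsing succeeds; note that this checks syntax only and performs no validation against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML (syntax only, no validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<trip><from>Kensington</from></trip>"))  # True
print(is_well_formed("<trip><from>Kensington</trip>"))         # False: <from> is never closed
print(is_well_formed("<b><i>text</b></i>"))                    # False: elements overlap
```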
2. A Simple Example
Have a look at the following entirely arbitrary XML document. At first glance, an XML document seems similar to an HTML document (if you have had exposure to HTML before):
<document>
  <title>XML course</title>
  <heading>XML Course</heading>
  <paragraph>
    The course is a 3 day course covering the XML fundamentals and giving candidates ample practical exposure.
  </paragraph>
  <paragraph>
    Topics covered include XML syntax, Document Type Definitions (DTDs), the Document Object Model (DOM), the Simple API for XML (SAX) and many others.
  </paragraph>
</document>
XML allows you to define your own vocabulary for data. The example above quite obviously describes a document of some kind. XML tags enable you to use XML for exchanging self-describing data. Consider, for example, the following simple XML document for a travel log:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Solms Training & Consulting: Simple Example -->
<travelLog>
  <trip>
    <from>Kensington</from>
    <to>Johannesburg Airport</to>
    <start>15-2-2001 11:30</start>
    <end>15-2-2001 12:00</end>
    <odoMeterStart>22345</odoMeterStart>
    <odoMeterEnd>22375</odoMeterEnd>
  </trip>
  <trip>
    <from>Johannesburg Airport</from>
    <to>Rosebank Mall</to>
    <start>15-2-2001 12:35</start>
    <end>15-2-2001 13:15</end>
    <odoMeterStart>22375</odoMeterStart>
    <odoMeterEnd>22403</odoMeterEnd>
  </trip>
</travelLog>
Most people reading that file would immediately understand its content. The information is presented in a very structured manner, which makes automatic processing of the information a simple task. As in HTML, tags are used to define the markup. Every opening tag has a corresponding end tag, and the content of the element lies between the two tags. The syntax is thus simply:
<tagName> content </tagName>
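As an illustration of how simple such automatic processing can be, here is a minimal sketch in Python (our choice; the function name trip_distances is hypothetical) that reads the travel log above and computes the distance covered on each trip from the odometer readings. The odometer unit is not stated in the example, so the result is simply the difference of the two readings.

```python
import xml.etree.ElementTree as ET

TRAVEL_LOG = """<?xml version="1.0" encoding="UTF-8"?>
<travelLog>
  <trip>
    <from>Kensington</from><to>Johannesburg Airport</to>
    <start>15-2-2001 11:30</start><end>15-2-2001 12:00</end>
    <odoMeterStart>22345</odoMeterStart><odoMeterEnd>22375</odoMeterEnd>
  </trip>
  <trip>
    <from>Johannesburg Airport</from><to>Rosebank Mall</to>
    <start>15-2-2001 12:35</start><end>15-2-2001 13:15</end>
    <odoMeterStart>22375</odoMeterStart><odoMeterEnd>22403</odoMeterEnd>
  </trip>
</travelLog>"""

def trip_distances(xml_text):
    """Return (from, to, distance) for every trip, distance = odoMeterEnd - odoMeterStart."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    distances = []
    for trip in root.iter("trip"):
        start = int(trip.findtext("odoMeterStart"))
        end = int(trip.findtext("odoMeterEnd"))
        distances.append((trip.findtext("from"), trip.findtext("to"), end - start))
    return distances

for origin, destination, distance in trip_distances(TRAVEL_LOG):
    print(f"{origin} -> {destination}: {distance}")
```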
Note that the private areas can be used for your own character definitions. Of course, you then have to convey the information about your character mappings to whoever processes your document. Generally this is neither recommended nor necessary.
Unicode are not required. XML processors are required to support both UTF-8 and UTF-16, and they typically detect automatically which of the two encodings is used. Other character encodings may also be used for XML documents, but then the encoding must be declared in the prolog of the document (in the XML declaration), and you should be aware that XML processors are not required to support your choice of encoding.
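The following minimal sketch, assuming Python's ElementTree (which uses the Expat parser), parses the same document stored in two encodings. Python's utf-16 codec writes a byte-order mark, which lets the parser recognise UTF-16; for ISO-8859-1 the parser relies on the encoding named in the XML declaration. Expat happens to support ISO-8859-1, but, as noted above, other processors need not support every encoding.

```python
import xml.etree.ElementTree as ET

# The same document serialised as bytes in two different encodings.
utf16_bytes = '<?xml version="1.0" encoding="UTF-16"?><city>Zürich</city>'.encode("utf-16")
latin1_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Zürich</city>'.encode("latin-1")

print(len(utf16_bytes), len(latin1_bytes))  # different byte sequences...
for data in (utf16_bytes, latin1_bytes):
    print(ET.fromstring(data).text)          # ...but the same character content: Zürich
```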
is treated in XML as a single white space. All white space characters within the content of elements are preserved by parsers, while multiple white space characters within element tags and attribute values may be removed and replaced with a single white space character.
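A minimal sketch of this behaviour, assuming Python's ElementTree (a non-validating parser): white space inside element content is passed through unchanged, while a line break inside an attribute value is normalised to a single space. Runs of spaces in an attribute are only collapsed when a DTD declares the attribute with a tokenised type; without a DTD, as here, they are kept.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring(
    '<note title="line one\nline two" code="a  b">  keep   these\n spaces </note>'
)
print(repr(note.text))          # '  keep   these\n spaces '  (content preserved)
print(repr(note.get("title")))  # 'line one line two'          (line break became a space)
print(repr(note.get("code")))   # 'a  b'                        (no DTD, so not collapsed)
```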
2_INVOICE    (invoice)number    -invoiceNo    invoiceAmountIn$    XMLinvoice    *invoiceNo
The XML document structure is illustrated in Figure 2.1, XML Document Structure.
An XML document thus forms a tree structure, which can be depicted graphically as in Figure 2.2.
Figure 2.2. As is shown by the simple DocBook example, an XML document forms a tree structure.
Similarly, the travel log example discussed in Section 2, A Simple Example, represents a tree structure, as shown in Figure 2.3, The tree structure of the travel log document.
contains 3 elements. The outer element, <item>, contains two nested elements, <name> and <price>.
Here the attribute is independent of the content, and another company might use the same supplier list but have a different preferred supplier. In XML the attribute value must always be enclosed in single or double quotation marks (i.e. within a pair of " or '). An element can contain multiple attributes. For example,
<room width="50" height="10">
  <window>2</window>
  <door>1</door>
  <singleBed>3</singleBed>
</room>
XML currently defines two special attributes. They are xml:space and xml:lang.
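A minimal sketch of reading attributes with Python's ElementTree (our choice of tool). Attribute values always arrive as strings, whichever quote character was used, so numeric values must be converted explicitly. The reserved xml: prefix is bound to the namespace http://www.w3.org/XML/1998/namespace, and ElementTree reports xml:lang under that expanded name. The xml:lang value on the room element is our own addition for illustration.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the reserved xml: prefix

room = ET.fromstring(
    '<room width="50" height=\'10\' xml:lang="en-GB">'
    "<window>2</window><door>1</door><singleBed>3</singleBed></room>"
)

# Attribute values are strings, regardless of the quotation marks used in the document.
width, height = int(room.get("width")), int(room.get("height"))
print(width * height)                     # 500
print(room.get("{" + XML_NS + "}lang"))  # en-GB
```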
Note
This is simply a request to the XML processor, which might take note of it or might simply ignore it.
ISO 639 with ISO 3166 sub-code: In this notation the 2 letter language code is followed by an ISO 3166 country sub-code. For example, en-US or en-GB specify English for the United States of America or Great Britain, respectively.
ISO 639 with regional description: Here the ISO 639 language code is followed by a regional description, typically for a local dialect, for example en-cockney or no-bokmaal.
IETF RFC 1766 language code: These may be 3 or 8 letter codes registered with IANA. For example, I-LUX is the IANA code for Luxembourgish.
User-defined code: Should you, for example, write a fable or novel about a fantasy community, you might develop your own language, or should you want to refer to a local campus slang, you might have to define your own language code. In XML you do this via a code prepended with X-. For example, you might have an imaginary language used by mongoose in your fable. You might refer to the language as X-MONGOOSE.
Empty elements have no content. Often they are used only for the values of their attributes. The syntax is slightly different from standard HTML syntax because XML always requires an end tag. Thus, the image and e-mail tags in a certain XML vocabulary could look like this:
<img src="Flasher.gif"></img>
<email href="mailto:info@solms.co.za"></email>
XML also offers a short-hand notation where the begin and end tags are merged into one. This notation is shown in the following empty element tags:
<img src="Flasher.gif"/>
<email href="mailto:info@solms.co.za"/>
Note
Both forms of writing empty elements are interchangeable, and are considered identical.
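A quick way to convince yourself that the two forms are identical is to parse both and compare the results. This minimal sketch uses Python's ElementTree (our choice): both forms yield an element with the same name, the same attributes, no text and no children, and they serialise to the same bytes.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<img src="Flasher.gif"></img>')
short_form = ET.fromstring('<img src="Flasher.gif"/>')

def summary(element):
    """Name, attributes, text and number of children of an element."""
    return element.tag, element.attrib, element.text, len(element)

print(summary(long_form) == summary(short_form))              # True
print(ET.tostring(long_form) == ET.tostring(short_form))      # True
```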
6. CDATA Sections
If you want a block of characters taken as a literal string without any XML interpretation, you can use character data (CDATA) sections. For example, mathematical expressions may often contain the less-than ( < ) or greater-than ( > ) characters, and an XML parser would interpret a less-than character as the start of a tag. Alternatively, you may want to include some XML code as a literal string within the content of an XML element. CDATA sections start with <![CDATA[ and end with ]]>. For example, if you wrote a book about XML in XML, you would most probably want to include XML example code. This code should be interpreted as a literal string:
<?xml version="1.0" encoding="UTF-8"?>
<book>
  <chapter>
    <title>XML Syntax</title>
    <part>
      <title>An Example</title>
      <para>
        The following is a simple example of a fictitious shopping list:
        <![CDATA[
<shoppingList>
  <item>
    <name>Sugar</name>
    <quantity>1kg</quantity>
  </item>
  <item>
    <name>Potatoes</name>
    <quantity>10kg</quantity>
  </item>
</shoppingList>
        ]]>
      </para>
    </part>
  </chapter>
</book>
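A minimal sketch, assuming Python's ElementTree, of what a parser delivers for a CDATA section: the markup inside it does not become child elements but arrives as ordinary text, and characters such as < and & survive unchanged.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para>The following is a simple example: "
    "<![CDATA[<shoppingList><item><name>Sugar</name></item></shoppingList>]]>"
    "</para>"
)
print(len(para))   # 0: no child elements were created
print(para.text)   # The following is a simple example: <shoppingList>...</shoppingList>

expr = ET.fromstring("<expr><![CDATA[a < b && b > c]]></expr>")
print(expr.text)   # a < b && b > c
```

When such an element is serialised again, ElementTree writes the text with escaped characters (&lt; and &amp;) rather than as a CDATA section; both forms carry the same information.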
The purpose of XML is to be a simple descriptive language containing information independent from any form of processing (or presentation) of that information. As such, processing instructions (PIs) are not in the spirit of XML. However, for some very limited applications PIs are acceptable within XML. The XML syntax for processing instructions is:
<?targetProcessor informationForTargetProcessor?>
For example, if you have an application, say PrintXML, which can print XML documents in tree or standard text format, you could include a processing instruction which could look something like:
<?PrintXML format="tree"?>
It is, however, very questionable whether you should do that. It would most probably be a better idea to keep the XML document pure content only and to provide the processing parameters externally.
We have already encountered a construct which looks like a very common processing instruction - the XML declaration header (strictly speaking, the XML specification does not classify the declaration as a processing instruction, although it uses the same syntax). It typically looks something like the following:
<?xml version="1.0" encoding="UTF-8"?>
Another common use of processing instructions is that of assigning style sheets to XML documents. For example:
<?xml-stylesheet href="ManagerView.xsl" type="text/xsl"?>
assigns the ManagerView.xsl style sheet to the XML document.
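A processing instruction reaches the application as a target name plus a free-form data string. This minimal sketch uses Python's xml.dom.minidom (our choice; ElementTree discards PIs by default) to list the PIs that precede the document element. The XML declaration does not appear in the list, because the parser does not report it as a processing instruction.

```python
from xml.dom import minidom

doc = minidom.parseString((
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<?xml-stylesheet href="ManagerView.xsl" type="text/xsl"?>'
    "<report/>"
).encode("utf-8"))

pis = [node for node in doc.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]
for pi in pis:
    print(pi.target, "->", pi.data)
# xml-stylesheet -> href="ManagerView.xsl" type="text/xsl"
```

Note that the data of a PI is not parsed into attributes: pseudo-attributes such as href and type have to be interpreted by the target application itself.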
8. Exercises
1. Create an XML document that describes your music collection in a tree structure. For each album, it should contain name, year, label, and a track listing. The artist of each track also needs to be present. Make use of both attributes and elements, and validate your document for well-formedness using your parser of choice.
2. Choose an area with which you are familiar, e.g. relating to your work or interest. Create an XML file which describes a set of data or objects from that domain, and validate this file for well-formedness using your parser of choice.
1. Introduction
In order to be usable as a truly universal data interchange mechanism, the basic XML standard needs two important capabilities:
1. The ability to construct documents using several sets of tags, i.e. a mechanism to mix different vocabularies.
2. The ability to identify, target and process tags correctly in such a mixed document, even if they have the same name.
The XML Namespaces standard provides a very simple mechanism to address both these issues, and is certainly one of the most important base technologies in XML. It enables the componentisation of XML, and is also fundamental to mechanisms such as XML Schema, which allows us to treat XML in an object-oriented way.
    <address>192.164.12.34</address>
  </server>
  <server>
    <type>web</type>
    <name>plymouth</name>
    <address>192.164.12.34</address>
  </server>
</branch>
The problem is that we are unable to distinguish between the name element in the contact details and the one in the server information. Similarly, the address element is ambiguous. It is thus impossible for software to reliably tell us the names of all mail servers in the company, or to provide us with a list in XML of all the branch names and the names of their web (HTTP) servers.
By placing the tags in namespaces, we assign them a unique URI. We could, therefore, create a URI for contact details (company.com/contact), and a URI for server information (company.com/networking). We assign the URI to the tags using the xmlns attribute. We can now uniquely identify the tags:
<?xml version="1.0" encoding="UTF-8"?>
<branch>
  <contactDetails xmlns="company.com/contact">
    <name xmlns="company.com/contact">Rivonia Consultancy</name>
    <address xmlns="company.com/contact">
      12 Gemsbok ave, Rivonia, Sandton
    </address>
  </contactDetails>
  <server xmlns="company.com/networking">
    <type xmlns="company.com/networking">mail</type>
    <name xmlns="company.com/networking">scoot</name>
    <address xmlns="company.com/networking">192.164.12.34</address>
  </server>
  <server xmlns="company.com/networking">
    <type xmlns="company.com/networking">web</type>
    <name xmlns="company.com/networking">plymouth</name>
    <address xmlns="company.com/networking">192.164.12.34</address>
  </server>
</branch>
A namespace-aware parser will recognise the xmlns attribute, and will not treat it as merely another attribute. Instead, each tag name is now qualified by the namespace it resides in, and can be processed as such.
Literally assigning namespaces to every tag is very verbose and difficult to read - this is solved by aliasing the namespaces, for example:
<?xml version="1.0" encoding="UTF-8"?>
<branch xmlns:c="company.com/contact" xmlns:n="company.com/networking">
  <c:contactDetails>
    <c:name>Rivonia Consultancy</c:name>
    <c:address>
      12 Gemsbok ave, Rivonia, Sandton
    </c:address>
  </c:contactDetails>
  <n:server>
    <n:type>mail</n:type>
    <n:name>scoot</n:name>
    <n:address>192.164.12.34</n:address>
  </n:server>
  <n:server>
    <n:type>web</n:type>
    <n:name>plymouth</n:name>
    <n:address>192.164.12.34</n:address>
  </n:server>
</branch>
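Whichever form of declaration is used, a namespace-aware parser reports each element under its namespace URI plus local name. This minimal sketch, assuming Python's ElementTree (which writes qualified names as {URI}localName), uses that to answer the question raised earlier: which servers are mail servers? The prefixes in the search paths are our own and need not match the prefixes used in the document; only the URIs matter.

```python
import xml.etree.ElementTree as ET

BRANCH = """<branch xmlns:c="company.com/contact" xmlns:n="company.com/networking">
  <c:contactDetails><c:name>Rivonia Consultancy</c:name></c:contactDetails>
  <n:server><n:type>mail</n:type><n:name>scoot</n:name></n:server>
  <n:server><n:type>web</n:type><n:name>plymouth</n:name></n:server>
</branch>"""

branch = ET.fromstring(BRANCH)

# The two kinds of <name> element can no longer be confused.
name_tags = [element.tag for element in branch.iter() if element.tag.endswith("}name")]
print(name_tags)
# ['{company.com/contact}name', '{company.com/networking}name', '{company.com/networking}name']

ns = {"contact": "company.com/contact", "net": "company.com/networking"}
branch_name = branch.findtext("contact:contactDetails/contact:name", namespaces=ns)
mail_servers = [server.findtext("net:name", namespaces=ns)
                for server in branch.findall("net:server", ns)
                if server.findtext("net:type", namespaces=ns) == "mail"]
print(branch_name, mail_servers)  # Rivonia Consultancy ['scoot']
```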
2. Syntax
2.1. Naming
XML namespaces use Uniform Resource Identifiers (URIs) for their naming. URIs are often Uniform Resource Locators (URLs), but they can be more general than that. Let us first revisit URLs before we explain the difference between a URL and a URI:
Note
The URI mechanism is used for namespaces in order to guarantee a certain level of uniqueness. It is common to use HTTP web addresses for namespaces, but note that the parser does not resolve this address, for example to read its contents. The namespace name itself can be any URI reference whatsoever, as long as it is unique (for a particular domain). In that sense, namespaces are analogous to packages in Java - they are merely used to qualify names in order to prevent name collisions, and to establish a grouping. Namespaces can be declared at any level in the XML document, but a namespace declaration applies only to the element on which it is declared and to the elements nested within it.
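The scoping rule can be checked directly with a namespace-aware parser. In this minimal sketch (Python's ElementTree; the URI urn:example:p is a made-up placeholder, and the parser never tries to resolve it), the prefix p is declared on element a, so it may be used inside a but not on a sibling of a.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """True if a namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

in_scope = '<root><a xmlns:p="urn:example:p"><p:b/></a></root>'
out_of_scope = '<root><a xmlns:p="urn:example:p"><p:b/></a><p:c/></root>'

print(parses(in_scope))      # True
print(parses(out_of_scope))  # False: the prefix p is unbound on <p:c>
```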
<head>
  <title>How to reach us</title>
</head>
<body>
  <h1>Want to pay us a visit?</h1>
  <p>
    Our doors here at ACME Products are always open to our customers! Whether it's sound advice you want, or you just want to have a look at our line of superb products, finding us is easy. We've provided GPS coordinates for the two most common routes taken by our customers. If this does not suit your needs, please <a href="mailto:directions@acme.com">contact us</a>.
  </p>
  <h2>Coming from Pretoria</h2>
  <map:route desc="Finding ACME coming from Pretoria">
    <map:waypoint>
      <map:coord desc="M1 Highway" lon="3.232" lat="44.623"/>
    </map:waypoint>
    <map:waypoint>
      <map:coord desc="Shell garage" lon="4.512" lat="45.629"/>
    </map:waypoint>
    <map:waypoint>
      <map:coord desc="ACME Offices" lon="4.932" lat="45.999"/>
    </map:waypoint>
  </map:route>
  <h2>Coming from the West Rand</h2>
  <map:route desc="Finding ACME coming from the West Rand">
    <map:waypoint>
      <map:coord desc="Ontdekkers Road" lon="6.78" lat="44.623"/>
    </map:waypoint>
    <map:waypoint>
      <map:coord desc="Barley Road" lon="6.11" lat="44.001"/>
    </map:waypoint>
    <map:waypoint>
      <map:coord desc="ACME Offices" lon="4.932" lat="45.999"/>
    </map:waypoint>
  </map:route>
</body>
</html>
The tags from the map namespace are inserted into the XHTML document. They are, however, unaware of XHTML, and can easily be embedded into any other XML information set. Though this type of integration may seem far off, it is already being used to embed MathML (mathematical equations) and SVG (Scalable Vector Graphics) into web pages. Similarly, RDF (the Resource Description Framework) is an XML vocabulary that defines semantic relationships between entities (in order to give them a meaning that computers can understand). RDF tags can be 'sprinkled' into XHTML documents to define these relationships, even though RDF itself is unaware of XHTML, and vice versa.
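Because each vocabulary has its own namespace, software can pick out the parts it understands and skip the rest. This minimal sketch uses Python's ElementTree to extract the routes from a shortened version of the page above. The map namespace URI urn:example:map is a hypothetical placeholder, since the listing's declaration of the map prefix is not shown here; the XHTML namespace is the standard one.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MAP = "urn:example:map"  # hypothetical: stands in for the real map namespace URI

PAGE = (
    f'<html xmlns="{XHTML}" xmlns:map="{MAP}"><body>'
    "<h2>Coming from Pretoria</h2>"
    '<map:route desc="Finding ACME coming from Pretoria">'
    '<map:waypoint><map:coord desc="M1 Highway" lon="3.232" lat="44.623"/></map:waypoint>'
    '<map:waypoint><map:coord desc="ACME Offices" lon="4.932" lat="45.999"/></map:waypoint>'
    "</map:route></body></html>"
)

def routes(page_text):
    """Map each route description to its (waypoint, lat, lon) list, ignoring all XHTML."""
    page = ET.fromstring(page_text)
    result = {}
    for route in page.iter(f"{{{MAP}}}route"):
        result[route.get("desc")] = [
            (coord.get("desc"), float(coord.get("lat")), float(coord.get("lon")))
            for coord in route.iter(f"{{{MAP}}}coord")
        ]
    return result

print(routes(PAGE))
```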
4. Exercises
1. Create a document that represents DVD movies (name, genre, director, age restriction). All the tags must be within the same namespace, and this namespace merely represents movie information itself. For the namespace, use http://imdb.com/media/movies/
2. Create a document that represents an order of several DVD movies from an online retailer. The order has a shipping address and date, and each movie has a price and a quantity. The order information is in a new namespace, but the actual movies must be identical to, and use the same namespace as, the exercise above. You will thus effectively wrap the existing movies in a different context. The retailer uses the namespace http://omozon.com/orders/ for its orders.
3. Create a document that is automatically created by your DVD player that keeps track of your favourite movies. It keeps track of the number of times you've watched a movie, and the last time it was watched. Use the same namespace and structure as per the first exercise to represent the movies. The DVD player uses the namespace urn:hk:dvd21:favourites.
1. Introduction
Ever since the adoption of XML it has been recognised that, in order to be of any value as a generic data description language, XML required a mechanism for enforcing and constraining document structure. Furthermore, because XML does not specify any vocabulary as such, this constraint mechanism would be used to define a specific vocabulary - be it for web pages or purchase orders. The initial mechanism for this was, of course, DTDs (Document Type Definitions), inherited from XML's origins in the SGML language. But DTDs are severely limited in many ways, and show their age as a non-object-oriented mechanism. Schemas have several advantages over traditional DTDs, such as:
1. They are written in XML.
2. They support namespaces.
3. They are extensible.
4. They support schema specialization.
5. They support data types.
XML Schemas
DTDs have no direct provision for specialization (or inheritance), so specialization relationships have to be mapped onto composition relationships. This limits not only the way in which we define information, but more importantly the ways in which we can view and process information. Schemas do support specialization, and hence facilitate the processing of information at various levels of abstraction. This means that we can now treat XML as an object-oriented data language.
2. A Simple Example
Reconsider the travel log example which we used to introduce DTDs. A simple example XML document looked like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Solms Training, Consulting: Simple Example -->
<!DOCTYPE travelLog SYSTEM "./travelLog.dtd">
<travelLog>
  <trip>
    <from>Kensington</from>
    <to>Johannesburg Airport</to>
    <start>15-2-2001 11:30</start>
    <end>15-2-2001 12:00</end>
    <odoMeterStart>22345</odoMeterStart>
    <odoMeterEnd>Jack</odoMeterEnd>
  </trip>
  <trip>
    <from>Johannesburg Airport</from>
    <to>Rosebank Mall</to>
    <start>15-2-2001 12:35</start>
    <end>15-2-2001 13:15</end>
    <odoMeterStart>22375</odoMeterStart>
    <odoMeterEnd>22403</odoMeterEnd>
  </trip>
</travelLog>
The DTD for this XML document looked like this:
<!ELEMENT travelLog (trip+)>
<!ELEMENT trip (from, to, start, end, odoMeterStart, odoMeterEnd)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT start (#PCDATA)>
<!ELEMENT end (#PCDATA)>
<!ELEMENT odoMeterStart (#PCDATA)>
<!ELEMENT odoMeterEnd (#PCDATA)>
Now let us look at a way in which we could define a schema for travel logs:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.TaxisUnlimited.co.za/schemas"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:travel="http://www.TaxisUnlimited.co.za/schemas"
    elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:complexType name="TravelLog">
    <xs:sequence>
      <xs:element name="trip" type="travel:Trip" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Trip">
    <xs:sequence>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="start" type="xs:dateTime"/>
      <xs:element name="end" type="xs:dateTime"/>
      <xs:element name="odoMeterStart" type="xs:unsignedInt"/>
      <xs:element name="odoMeterEnd" type="xs:unsignedInt"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="travelLog" type="travel:TravelLog"/>
</xs:schema>
Because we use non-string data types we shall have to modify the XML document slightly, but we shall come back to that later. Let us first study the schema on its own terms. As mentioned in the introduction, it is of course an XML document and thus typically contains an XML header with the character set specification.
<xsd:complexType name="Trip">
  <xsd:sequence>
    <xsd:element name="from" type="xsd:string"/>
    <xsd:element name="to" type="xsd:string"/>
    <xsd:element name="start" type="xsd:dateTime"/>
    <xsd:element name="end" type="xsd:dateTime"/>
    <xsd:element name="odoMeterStart" type="xsd:unsignedInt"/>
    <xsd:element name="odoMeterEnd" type="xsd:unsignedInt"/>
  </xsd:sequence>
</xsd:complexType>
specifies that each trip must contain the six elements in the specified order.
Note
Complex types are the XML Schema equivalent of classes, as modeled using UML.
Here minOccurs and maxOccurs are standard element attributes which are used to specify lower and upper bounds on the multiplicities. If unspecified, they both acquire a default value of 1. Hence, if we do not modify the default multiplicity constraints (see, for example, the from element), the element is required to appear exactly once.
... </my:instance>
The schemaLocation attribute contains a set of whitespace-separated pairs, each pair consisting of a namespace URI followed by the URI where the schema for that namespace can be found. The location may either be an absolute (file or http) URL (recommended), or a URI relative to the XML instance itself. Here is another example, which specifies the location of two schemas:
<a:doc xmlns:a="urn:some:ns" xmlns:b="urn:other:ns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:some:ns ../schemas/some.xsd
                        urn:other:ns ../schemas/foo/other.xsd">
  ...
</a:doc>
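Because the attribute value is just a whitespace-separated list, tools split it into pairs. A minimal sketch in Python (the helper schema_locations is our own, not a standard API) that reads the attribute from the example above, using the standard XML Schema instance namespace:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(element):
    """Return {namespace: schema location} from an element's xsi:schemaLocation."""
    tokens = element.get("{" + XSI + "}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must contain namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))

doc = ET.fromstring(
    '<a:doc xmlns:a="urn:some:ns" xmlns:b="urn:other:ns" '
    f'xmlns:xsi="{XSI}" '
    'xsi:schemaLocation="urn:some:ns ../schemas/some.xsd\n'
    '                    urn:other:ns ../schemas/foo/other.xsd"/>'
)
print(schema_locations(doc))
# {'urn:some:ns': '../schemas/some.xsd', 'urn:other:ns': '../schemas/foo/other.xsd'}
```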
Note
It is relatively uncommon to have to specify the schema location for more than one namespace; usually only the location of the document's primary schema is given. This is because that schema, if it explicitly makes use of types defined in other namespaces, will already have imported the other schema(s) by URL.
Using XML documents and schemas without name spaces is not recommended.
4. Simple Types
Simple types in XML schemas may not contain any elements or any attributes. They are either one of the primitive data types like string, float or date, or a specialization of one of these primitive types.
XML Schema supports the standard primitives most object-oriented programming languages support, as well as a few special ones for date/time/time-period support and support for XML concepts themselves. Primitive data types are, from the point of view of XML Schemas, atomic - that is, indivisible. Below we summarize the primitive data types supported by XML Schemas:
Text data. string, together with derived types such as normalizedString.
Boolean. An element or attribute of type boolean can acquire the values false or true.
Integral data types. XML schemas support signed and unsigned versions of byte, short, int and long (e.g. int and unsignedInt) as well as integer, negativeInteger, positiveInteger, nonNegativeInteger and nonPositiveInteger. Here byte is used for 8 bit, short for 16 bit, int for 32 bit and long for 64 bit representations, while integer (and the four types derived from it) is not restricted to any fixed size.
Floating point data types. XML schemas support only signed floating point numbers: float and double for IEEE single-precision (32-bit) and double-precision (64-bit) numbers respectively, both of which include NaN for not a number, and decimal for arbitrary-precision decimal numbers.
Data types for specifying dates and times. The date/time support in XML Schema is very extensive. It is illustrated in Table 4.1, Date/Time primitives defined for XML Schema.
Name Data Types. XML Schema introduces 3 data types for naming. These are Name for standard names adhering to XML's naming rules, QName for qualified names using XML namespaces (for example TravelLog:trip), and NCName for names without a colon, i.e. without a namespace prefix (for example the trip of the above example).
Identities, Named Tokens, Notations and Entities. XML Schema defines standard data types for these via ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, NOTATION, ENTITY and ENTITIES.
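The bit sizes above translate directly into value ranges. This minimal sketch records the ranges of the integral types in a Python dictionary (the names RANGES and in_range are our own) and checks a value against them; it shows, for example, that 2^31 does not fit an xs:int but is a perfectly good xs:integer.

```python
# Value ranges of XML Schema integral types (None = no bound in that direction).
RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedLong": (0, 2**64 - 1),
    "integer": (None, None),
    "nonNegativeInteger": (0, None),
    "positiveInteger": (1, None),
    "nonPositiveInteger": (None, 0),
    "negativeInteger": (None, -1),
}

def in_range(type_name, value):
    """True if the integer value lies within the value space of type_name."""
    low, high = RANGES[type_name]
    return (low is None or value >= low) and (high is None or value <= high)

print(in_range("int", 2**31))         # False: one past the largest int
print(in_range("integer", 2**100))    # True: integer has no fixed size
print(in_range("unsignedInt", 22403)) # True: e.g. an odometer reading
```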
5. Regular expressions
Regular expressions are used to match text against certain patterns. They are supported by a wide range of tools, from many editors (including vi, emacs, jEdit, ...) to text processing tools like Perl, grep, sed and awk, and they are even supported by the Java class libraries (java.util.regex) shipped with JDK version 1.4 onwards.
Note
When matching on a hyphen, -, place the hyphen as first character in a character set so that it is not interpreted as range. Thus [-+*/^!&|] matches on any one of these characters while [+-*/^!&|] matches on any character between + and * or on /, ^, ! & or |.
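A quick check of the rule with Python's re module (our choice of engine; others behave similarly): with the hyphen first, all eight characters are literals, while the second pattern is refused because of the invalid range.

```python
import re

operators = re.compile(r"[-+*/^!&|]")  # hyphen first, so it is a literal '-'
print(all(operators.fullmatch(ch) for ch in "-+*/^!&|"))  # True
print(operators.fullmatch("a"))                            # None

def compiles(pattern):
    """True if the regular expression engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"[+-*/^!&|]"))  # False: '+-*' is a range from '+' down to '*'
```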
Table 4.2.
Character class   Description
[:alnum:]         Alphanumeric characters.
[:alpha:]         Alphabetic characters.
[:blank:]         Space and tab characters.
[:cntrl:]         Control characters.
[:digit:]         Numerical characters (digits).
[:graph:]         Printable (visible) characters, excluding the space character.
[:lower:]         Lower case characters.
[:print:]         Printable characters, including the space character.
[:punct:]         Punctuation characters.
[:space:]         Whitespace characters.
[:upper:]         Upper case characters.
[:xdigit:]        Hexadecimal digits.
For example, to search with grep on Linux for all lines, in any of the files in the current directory, which contain control characters, we can use grep '[[:cntrl:]]' * (the quotes stop the shell from treating the brackets as a file name pattern).
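Not every regular expression engine supports the POSIX character classes. Python's re module, for instance, does not, so the following sketch of the grep search spells out the ASCII control range (0x00 to 0x1F plus DEL, 0x7F) explicitly. The function name lines_with_control_chars is our own.

```python
import re

# Python's re has no [[:cntrl:]]; use the explicit ASCII control range instead.
CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def lines_with_control_chars(text: str) -> list[int]:
    """Return the 1-based numbers of lines containing an ASCII control character.

    Lines are split on '\n' only, as grep does, so a carriage return left
    over from a CRLF line ending counts as a control character.
    """
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if CONTROL.search(line)
    ]


sample = "plain line\nhas\ta tab\nbell\x07here\n"
print(lines_with_control_chars(sample))  # [2, 3]
```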
number of occurrences. To this end regular expressions provide multiplicity constraints which are specified directly after the element to which they apply. The multiplicity constraints supported by regular expressions are listed in Table 4.3, Multiplicity constraints supported by regular expressions.
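The common multiplicity constraints *, +, ?, {n}, {n,} and {n,m} can be exercised with Python's re module. This sketch matches each pattern against the whole string with fullmatch(); the sample strings are made up for illustration.

```python
import re

# (pattern, strings it must match in full)
CASES = [
    (r"ab*c", ["ac", "abc", "abbbc"]),   # * : zero or more b's
    (r"ab+c", ["abc", "abbbc"]),         # + : one or more b's
    (r"colou?r", ["color", "colour"]),   # ? : optional u
    (r"[0-9]{4}", ["2003"]),             # {n} : exactly four digits
    (r"[0-9]{2,}", ["19", "1930"]),      # {n,} : at least two digits
    (r"[A-Z]{1,3}", ["Z", "ZAR"]),       # {n,m} : one to three letters
]

for pattern, samples in CASES:
    for s in samples:
        assert re.fullmatch(pattern, s), (pattern, s)

print(re.fullmatch(r"ab+c", "ac"))       # None: + needs at least one b
print(re.fullmatch(r"[0-9]{4}", "203"))  # None: only three digits
```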
matches on any XML/HTML comment which is preceded by nothing but an arbitrary number of spaces. In a similar way we can look for a certain pattern at the end of a line. For example, to remove all trailing spaces we could use the following regular expression to find them: " +$". Here we added quotation marks to highlight the space in the regular expression.
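The anchors can be tried out in Python, where re.MULTILINE makes ^ and $ match at every line boundary rather than only at the start and end of the whole string. The comment pattern below is our own variant along the lines described above, not necessarily the exact expression from the original listing.

```python
import re

text = (
    "  <!-- indented comment -->\n"
    "<p>text</p> <!-- trailing comment -->\n"
    "line with trailing spaces   \n"
)

# Comments preceded by nothing but spaces: ^ anchors at each line start.
comment_lines = re.findall(r"^ *<!--.*-->", text, flags=re.MULTILINE)
print(comment_lines)  # ['  <!-- indented comment -->']

# Remove runs of trailing spaces: $ anchors at each line end.
cleaned = re.sub(r" +$", "", text, flags=re.MULTILINE)
print(cleaned.splitlines())
```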
5.6. Exercises
1. Write a regular expression which matches on any date/time which is either in the format 22/09/2003 19:30 +2:00 or in the format 19h30 on 22 Sep 2003
2.
3. 4.
Write a regular expression which can be used to validate e-mail addresses.
Write a regular expression which finds all lines which start with a hash (possibly preceded by spaces).
7. Complex Types
We have seen the definition of a complex data type in the simple example (see Section 2, A Simple Example). For example, the following listing contains the complex types Order, Address and Item:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:annotation>
    <xsd:documentation>Simple schema for purchase orders.</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="order" type="Order"/>
  <xsd:complexType name="Order">
    <xsd:sequence>
      <xsd:element name="shipTo" type="Address"/>
      <xsd:element name="billTo" type="Address"/>
      <xsd:element name="item" type="Item" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="placementDate" type="xsd:date" use="required"/>
  </xsd:complexType>
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Item">
    <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
      <xsd:element name="price" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Note
We use camel case, a widely used convention in object-oriented modeling and development: class (type) names start with an upper case letter, everything else with a lower case letter, and word boundaries are marked by a capital letter in both cases.
7.1. Attributes
Attributes are specified via the xsd:attribute element. Each attribute declaration gives the attribute a name and, typically, a simple type. For example, in the above listing we define a placementDate attribute for orders via

<xsd:complexType name="Order">
  ...
  <xsd:attribute name="placementDate" type="xsd:date" use="required"/>
</xsd:complexType>
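An application consuming such documents receives attribute values as strings and converts them itself. The following sketch reads placementDate with Python's ElementTree; the order fragment is hypothetical and omits the child elements required by the Order type, so on its own it would not validate. It also assumes a date without a time zone suffix.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical, trimmed order instance (required children omitted for brevity).
order = ET.fromstring('<order placementDate="2003-09-22"/>')

# Convert the xsd:date lexical form (no time zone here) to a Python date.
placed = date.fromisoformat(order.get("placementDate"))
print(placed.year, placed.month, placed.day)  # 2003 9 22

# An attribute that is not present simply yields None.
print(order.get("currency"))  # None
```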
7.2. Anonymous Types
When declaring elements of a particular type, we usually first declare the type and then declare elements of that type via the type attribute. This is similar to declaring a class and then creating instances of that class. In the vein of anonymous classes in Java, XML Schema allows you to define the data type within the context of an element declaration. The data type itself is not given a name and is hence called an anonymous type:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>
      Travel log schema for taxi operator.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="travelLog" type="TravelLog"/>
  <xsd:complexType name="TravelLog">
    <xsd:sequence>
      <xsd:element name="trip" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="from" type="xsd:string"/>
            <xsd:element name="to" type="xsd:string"/>
            <xsd:element name="start" type="xsd:dateTime"/>
            <xsd:element name="end" type="xsd:dateTime"/>
            <xsd:element name="odoMeterStart" type="xsd:unsignedInt"/>
            <xsd:element name="odoMeterEnd" type="xsd:unsignedInt"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

If you compare the above listing to the schema listing, travelLog.xsd, discussed in section ???, you will note that we no longer define a tripType, nor is our trip element's type specified via a type attribute. Instead we insert the actual type information into the element declaration itself. Anonymous types are used frequently when a specific data type only makes sense for a particular element and will not be used elsewhere. This has the advantage of not polluting the type namespace with names which are not generally useful.
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.SolmsTraining.co.za/Dispatch shippingConfirmation.xsd">
  <salutation>Dear <recipient>Max</recipient></salutation>
  You will be pleased to hear that we dispatched the latest version of our
  anti-hijacker knobkerrie with order no <orderID>A1112</orderID>
  on <dateShipped>2001-02-27</dateShipped>.
  The tracking number is <trackingNumber>tr0917</trackingNumber>.
  <closing>Best wishes, <sender>Tarzan</sender></closing>
</shippingConfirmation>

A schema for the above XML document has to be able to specify elements with mixed content, i.e. elements containing both character data and sub-elements. The following schema uses anonymous types with mixed content to achieve this:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://www.SolmsTraining.co.za/Dispatch"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.SolmsTraining.co.za/Dispatch"
            elementFormDefault="qualified">
  <xsd:annotation>
    <xsd:documentation>
      Standard shipping confirmation note which may be presented as e-mail, HTML, PDF, ...
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="shippingConfirmation">
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="salutation">
          <xsd:complexType mixed="true">
            <xsd:sequence>
              <xsd:element name="recipient" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="orderID" type="xsd:string"/>
        <xsd:element name="dateShipped" type="xsd:date"/>
        <xsd:element name="trackingNumber" type="xsd:string"/>
        <xsd:element name="closing">
          <xsd:complexType mixed="true">
            <xsd:sequence>
              <xsd:element name="sender" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Elements which may contain both character data and sub-elements are specified via the mixed attribute of complexType.
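To see what mixed content looks like to a program, the following sketch parses a trimmed copy of the shipping confirmation (schemaLocation and most of the prose removed) with Python's ElementTree. In that API, character data before an element's first child is held in .text and character data following a child is held in the child's .tail, so the typed sub-elements and the running text can both be recovered.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.SolmsTraining.co.za/Dispatch}"

# Trimmed copy of the shipping confirmation instance.
DOC = (
    '<shippingConfirmation xmlns="http://www.SolmsTraining.co.za/Dispatch">\n'
    "<salutation>Dear <recipient>Max</recipient></salutation>\n"
    "We dispatched order no <orderID>A1112</orderID> on "
    "<dateShipped>2001-02-27</dateShipped>.\n"
    "The tracking number is <trackingNumber>tr0917</trackingNumber>.\n"
    "<closing>Best wishes, <sender>Tarzan</sender></closing>\n"
    "</shippingConfirmation>"
)

root = ET.fromstring(DOC)

# Text before the first child element lives in .text.
salutation = root.find(NS + "salutation")
print(repr(salutation.text))                   # 'Dear '
print(salutation.find(NS + "recipient").text)  # Max

# The typed sub-elements stay directly addressable...
print(root.findtext(NS + "orderID"))           # A1112

# ...while itertext() yields all text and tails in document order.
letter = " ".join("".join(root.itertext()).split())
print(letter)
```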
The following schema defines a model for the above element type:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name="currency">
    <xs:complexType>
      <xs:attribute name="name" type="xs:string"/>
      <xs:attribute name="code" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

with an XML instance file something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="currencyEmpty.xsd"
          name="South African Rand" code="ZAR"/>
<xs:complexType name="KnowledgeComponent">
  <xs:sequence>
    <xs:element name="resource" type="Resource"/>
    <xs:element name="preRequisite" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:attribute name="ref" type="KnowledgeComponentId" use="requir
      </xs:complexType>
    </xs:element>
    <xs:element name="knowledgeComponent" type="KnowledgeComponent" minO
                maxOccurs="unbounded" />
  </xs:sequence>
  <xs:attribute name="id" type="KnowledgeComponentId" use="required"/>
8. NameSpaces
Namespaces are XML's form of packaging. You will typically want to assign your vocabularies to namespaces so that the elements of a vocabulary can be distinguished unambiguously from elements of the same name defined in other schemas. Recall that a single XML document can use vocabularies from multiple schemas.
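The following Python sketch shows how a namespace-aware parser keeps same-named elements apart. Two vocabularies, with hypothetical namespace URIs urn:example:orders and urn:example:products, both define a name element; ElementTree reports every element by its expanded name, {namespace-URI}local-name, so the two never collide.

```python
import xml.etree.ElementTree as ET

# Two vocabularies (hypothetical URIs) that both define a <name> element.
DOC = """<order xmlns="urn:example:orders" xmlns:p="urn:example:products">
  <name>Order 42</name>
  <p:product><p:name>Knobkerrie</p:name></p:product>
</order>"""

root = ET.fromstring(DOC)

# Expanded names differ in the namespace part even though the local part is equal.
names = [el.tag for el in root.iter() if el.tag.endswith("}name")]
print(names)
# ['{urn:example:orders}name', '{urn:example:products}name']

print(root.findtext("{urn:example:orders}name"))  # Order 42
print(root.findtext("{urn:example:products}product/{urn:example:products}name"))
```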
8.3. Qualification
In the previous example we placed our elements in the default namespace. If we instead bind the namespace to a prefix, the elements from that namespace are accessed through that prefix. Which elements must carry the prefix depends on the qualification settings within the schema.
  <client>
    <name>Jill</name>
    <contact>
      <telNo>011 726 4860</telNo>
      <eMailNo>Jill@Jack.org.za</eMailNo>
    </contact>
  </client>
</clients:clients>
Note
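The qualification specified via elementFormDefault and attributeFormDefault acts as a schema-wide default which applies to all locally declared elements and attributes of that schema. It can be overridden for an individual declaration via its form attribute (qualified or unqualified). Globally declared elements and attributes are always qualified.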
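The effect of the qualification setting is visible in the expanded names a parser reports. In the following sketch (using the namespace URI http://www.solms.co.za/clients purely for illustration) the first document corresponds to elementFormDefault="qualified", where local elements carry the prefix, and the second to elementFormDefault="unqualified", where only the global clients element is in the namespace and the local elements are in no namespace at all.

```python
import xml.etree.ElementTree as ET

NS = "http://www.solms.co.za/clients"  # illustrative target namespace

# elementFormDefault="qualified": local elements are in the target namespace.
qualified = ET.fromstring(
    f'<c:clients xmlns:c="{NS}"><c:client><c:name>Jill</c:name></c:client></c:clients>'
)

# elementFormDefault="unqualified": only the global element is qualified.
unqualified = ET.fromstring(
    f'<c:clients xmlns:c="{NS}"><client><name>Jill</name></client></c:clients>'
)

for doc in (qualified, unqualified):
    print([el.tag for el in doc.iter()])
```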
        </complexType>
      </element>
    </sequence>
  </complexType>
  <complexType name="Employee">
    <complexContent>
      <extension base="pass:Person">
        <sequence>
          <element name="employer" type="string"/>
          <element name="salary" type="decimal"/>
        </sequence>
        <attribute name="permanent" type="boolean" use="required"/>
      </extension>
    </complexContent>
  </complexType>
</schema>
Note
Note that we have made the W3C XML Schema namespace the default namespace (we have not given it a prefix) and bound our own schema's namespace to the pass prefix. Our Employee type therefore refers to its super (base) type as pass:Person. An example XML document which would be validated against the above schema is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<persons xmlns="http://www.SAA.com/passportControl/persons"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.SAA.com/passportControl/persons persons.xsd">
  <person>
    <name>Piet Pompies</name>
    <passport>
      <country>South Africa</country>
      <passportNumber>RSA9873288763</passportNumber>
    </passport>
  </person>
  <person>
    <name>Tandi Mkondo</name>
    <passport>
      <country>South Africa</country>
      <passportNumber>RSA53278632876</passportNumber>
    </passport>
  </person>
  <employee permanent="true">
    <name>Marie Curie</name>
    <passport>
      <country>France</country>
      <passportNumber>FR7653276532765</passportNumber>
    </passport>
    <employer>University of X</employer>
    <salary>0</salary>
  </employee>
</persons>
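The employee element above lists the inherited name and passport before the new employer and salary elements. This is because an extension's content model is the base type's content followed by the particles added in the extension. The following Python sketch computes that effective element order from a simplified, hypothetical version of the schema (Person is reduced to a flat sequence); it handles only a top-level xs:sequence and simply drops any prefix on the base attribute rather than resolving it.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Simplified, hypothetical version of the persons schema.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="Person">
    <sequence>
      <element name="name" type="string"/>
      <element name="passport" type="string"/>
    </sequence>
  </complexType>
  <complexType name="Employee">
    <complexContent>
      <extension base="Person">
        <sequence>
          <element name="employer" type="string"/>
          <element name="salary" type="decimal"/>
        </sequence>
        <attribute name="permanent" type="boolean" use="required"/>
      </extension>
    </complexContent>
  </complexType>
</schema>"""


def own_elements(parent):
    """Names of the elements in parent's direct xs:sequence, in order."""
    seq = parent.find(XS + "sequence")
    if seq is None:
        return []
    return [el.get("name") for el in seq.findall(XS + "element")]


def effective_elements(types, type_name):
    """Element order an instance of type_name must follow: base content first."""
    ctype = types[type_name]
    ext = ctype.find(XS + "complexContent/" + XS + "extension")
    if ext is None:
        return own_elements(ctype)
    base = ext.get("base").split(":")[-1]  # drop a prefix such as pass:
    return effective_elements(types, base) + own_elements(ext)


root = ET.fromstring(SCHEMA)
types = {ct.get("name"): ct for ct in root.findall(XS + "complexType")}
print(effective_elements(types, "Employee"))
# ['name', 'passport', 'employer', 'salary']
```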
To indicate that the content model of a complex type consists of a simple type only (character data, possibly with attributes, but no child elements), we use the simpleContent element:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>
      Simple schema for currencies.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:complexType name="currencyType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="code" type="currencyCode"/>
    </xsd:sequence>
    <xsd:attribute name="quoteAgainstUSD" type="xsd:boolean" default="true"/>
  </xsd:complexType>
  <xsd:simpleType name="currencyCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{3}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
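A pattern facet such as [A-Z]{3} must match the entire lexical value; XML Schema patterns are implicitly anchored. When the same check is reproduced in application code, Python's re.fullmatch mirrors this behaviour while re.search does not. This is a sketch with our own helper name, and it relies on [A-Z]{3} meaning the same in both regular expression dialects, which is not true of every XML Schema pattern.

```python
import re

# XML Schema pattern facets match the whole value, so use fullmatch().
CURRENCY_CODE = re.compile(r"[A-Z]{3}")


def is_currency_code(value: str) -> bool:
    return CURRENCY_CODE.fullmatch(value) is not None


for candidate in ["ZAR", "USD", "zar", "ZARX", "Z4R"]:
    print(candidate, is_currency_code(candidate))
# ZAR True, USD True, zar False, ZARX False, Z4R False

print(bool(CURRENCY_CODE.search("ZARX")))  # True: search() is not anchored
```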
9.3. Substitution
One of the fundamental concepts of specialization in object-oriented modeling is that of substitutability, i.e. if we define a class, say Account, and a subclass, CreditCard, then one can always substitute an instance of the specialized class (CreditCard) for an instance of the more generic class, Account. Substitutability forms the crux test for specialization. Consider the following schema where we define a credit card data type as a specialization of a more generic account data type:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.solms.co.za/clients"
           xmlns="http://www.solms.co.za/clients"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="client" type="Client"/>
  <xs:complexType name="Client">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="account" type="Account" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Account">
    <xs:sequence>
      <xs:element name="balance" type="xs:decimal"/>
      <xs:element name="accountNo" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CreditCardAccount">
    <xs:complexContent>
      <xs:extension base="Account">
        <xs:sequence>
          <xs:element name="expiryDate" type="xs:date"/>
          <xs:element name="issuer" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="type" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="creditCardAccount" type="CreditCardAccount"/>
</xs:schema>

Even though credit cards are accounts, the Client type only declares account elements of the base type Account, so a client cannot simply contain creditCardAccount elements, only vanilla accounts. However, the author of an XML instance document may request that a credit card account, which extends an account, be substituted for the latter. This is done by still inserting elements declared with the base type (account), but substituting a CreditCardAccount realization for the Account via the xsi:type attribute:

<?xml version="1.0" encoding="UTF-8"?>
<cl:client xmlns:cl="http://www.solms.co.za/clients"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.solms.co.za/clients substitute.xsd">
  <cl:name>Jack</cl:name>
  <cl:account>
    <cl:balance>5432.12</cl:balance>
    <cl:accountNo>acc101</cl:accountNo>
  </cl:account>
  <cl:account>
    <cl:balance>5432.12</cl:balance>
    <cl:accountNo>acc101</cl:accountNo>
  </cl:account>
  <cl:account type="VISA" xsi:type="cl:CreditCardAccount">
    <cl:balance>12354</cl:balance>
    <cl:accountNo>14321</cl:accountNo>
    <cl:expiryDate>2001-01-14</cl:expiryDate>
    <cl:issuer>Investec</cl:issuer>
  </cl:account>
  <cl:account xsi:type="cl:CreditCardAccount" type="VISA">
    <cl:balance>200</cl:balance>
    <cl:accountNo>76328376</cl:accountNo>
    <cl:expiryDate></cl:expiryDate>
    <cl:issuer></cl:issuer>
  </cl:account>
</cl:client>
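A program reading such a document has to look at xsi:type to learn which realization it is dealing with. The following sketch uses a trimmed copy of the instance above; the helper account_kind is hypothetical, and it simply drops the prefix of the xsi:type QName instead of resolving it against the namespace declarations as a schema-aware processor would.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
CL = "{http://www.solms.co.za/clients}"

# Trimmed copy of the substitution instance.
DOC = """<cl:client xmlns:cl="http://www.solms.co.za/clients"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cl:name>Jack</cl:name>
  <cl:account><cl:balance>5432.12</cl:balance><cl:accountNo>acc101</cl:accountNo></cl:account>
  <cl:account type="VISA" xsi:type="cl:CreditCardAccount">
    <cl:balance>12354</cl:balance><cl:accountNo>14321</cl:accountNo>
    <cl:expiryDate>2001-01-14</cl:expiryDate><cl:issuer>Investec</cl:issuer>
  </cl:account>
</cl:client>"""


def account_kind(account) -> str:
    """Runtime type of an <account>: the declared Account unless xsi:type says otherwise."""
    substituted = account.get(XSI_TYPE)
    return "Account" if substituted is None else substituted.split(":")[-1]


root = ET.fromstring(DOC)
for account in root.findall(CL + "account"):
    print(account.findtext(CL + "accountNo"), account_kind(account))
# acc101 Account
# 14321 CreditCardAccount
```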
ments. For example, in the following schema specification we specify that a client has one or more accounts, but we enforce substitution in the XML instance document by one of the concrete subtypes, CreditCardAccount or ChequeAccount:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.solms.co.za/clients"
           xmlns="http://www.solms.co.za/clients"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="client" type="Client"/>
  <xs:complexType name="Client">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="account" type="Account" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Account" abstract="true">
    <xs:sequence>
      <xs:element name="balance" type="xs:decimal"/>
      <xs:element name="accountNo" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CreditCardAccount">
    <xs:complexContent>
      <xs:extension base="Account">
        <xs:sequence>
          <xs:element name="expiryDate" type="xs:date"/>
          <xs:element name="issuer" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="type" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ChequeAccount">
    <xs:complexContent>
      <xs:extension base="Account">
        <xs:sequence>
          <xs:element name="chequeFee" type="xs:decimal"/>
          <xs:element name="overdraftLimit" type="xs:decimal"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>

Clients can no longer have instances of Account, but only instances of concrete subclasses of Account:

<?xml version="1.0" encoding="UTF-8"?>
<client xmlns="http://www.solms.co.za/clients"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.solms.co.za/clients abstractType.xsd">
  <name>Jack</name>
  <account xsi:type="ChequeAccount">
    <balance>6542</balance>
    <accountNo>2765</accountNo>
    <chequeFee>23.34</chequeFee>
    <overdraftLimit>5000</overdraftLimit>
  </account>
  <account xsi:type="CreditCardAccount" type="MASTER">
    <balance>222</balance>
    <accountNo>235324</accountNo>
    <expiryDate>2006-10-10</expiryDate>
    <issuer>FNB</issuer>
  </account>
</client>
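An abstract type corresponds to an abstract class in object-oriented languages. As an analogy only (not a mapping generated from the schema), the following Python sketch models the account types with the abc module. Python refuses to instantiate an abstract base class only if it has at least one abstract method, so the sketch adds a hypothetical kind() method; XML Schema's abstract="true" needs no such thing.

```python
from abc import ABC, abstractmethod
from datetime import date
from decimal import Decimal


class Account(ABC):
    """Counterpart of the abstract XML Schema type Account."""

    def __init__(self, balance: Decimal, account_no: str):
        self.balance = balance
        self.account_no = account_no

    @abstractmethod
    def kind(self) -> str:
        """Hypothetical method; its presence makes Account non-instantiable."""


class ChequeAccount(Account):
    def __init__(self, balance, account_no, cheque_fee, overdraft_limit):
        super().__init__(balance, account_no)
        self.cheque_fee = cheque_fee
        self.overdraft_limit = overdraft_limit

    def kind(self) -> str:
        return "ChequeAccount"


class CreditCardAccount(Account):
    def __init__(self, balance, account_no, expiry_date, issuer, card_type):
        super().__init__(balance, account_no)
        self.expiry_date = expiry_date
        self.issuer = issuer
        self.card_type = card_type  # the schema's required "type" attribute

    def kind(self) -> str:
        return "CreditCardAccount"


# Like abstract="true", the base type cannot be instantiated directly...
try:
    Account(Decimal("6542"), "2765")
except TypeError as exc:
    print("cannot instantiate:", type(exc).__name__)

# ...but concrete subtypes can stand in wherever an Account is expected.
accounts: list[Account] = [
    ChequeAccount(Decimal("6542"), "2765", Decimal("23.34"), Decimal("5000")),
    CreditCardAccount(Decimal("222"), "235324", date(2006, 10, 10), "FNB", "MASTER"),
]
print([a.kind() for a in accounts])  # ['ChequeAccount', 'CreditCardAccount']
```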
The object graph in Figure 4.1, An Object graph with attributes and composition, association and specialization relationships, depicts the structure of a parts catalog. Each part is identified by a code and has a name, possibly a description and possibly multiple assembly instructions. Furthermore, a part may be composed of further parts, its components. A part may be associated with one manufacturer and a manufacturer may be associated with multiple parts. A product is a special type of part which has everything a part has but adds a price (an amount in a currency). The XML schema for this design is shown below:

<xsd:schema targetNamespace="http://www.ManufacturingUnlimited.co.za/partsCatalog"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.ManufacturingUnlimited.co.za/partsCatalog"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="partsCatalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="part" type="Part" maxOccurs="unbounded"/>
        <xsd:element name="manufacturer" type="Manufacturer" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="Part">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string" minOccurs="0"/>
      <xsd:element name="assemblyInstruction" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="part" type="Part" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="manufacturer" type="xsd:IDREF" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="code" type="xsd:ID"/>
  </xsd:complexType>
  <xsd:complexType name="Product">
    <xsd:complexContent>
      <xsd:extension base="Part">
        <xsd:sequence>
          <xsd:element name="price">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="amount" type="xsd:decimal"/>
                <xsd:element name="currency" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:complexType name="Manufacturer">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
      <xsd:element name="part" type="xsd:IDREF" minOccurs="0" maxOccurs="unbounded"/>
An example XML file which will be validated against this schema is shown below:

<partsCatalog xmlns="http://www.ManufacturingUnlimited.co.za/partsCatalog"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.ManufacturingUnlimited.co.za/partsCatalog partsCatalog.xsd">
  <part code="PC103">
    <name>Pentium Flyer</name>
    <description>Pentium III based PC</description>
    <assemblyInstruction>
      Plug keyboard cable into green socket.
    </assemblyInstruction>
    <assemblyInstruction>
      Plug rat (mouse) cable into purple socket.
    </assemblyInstruction>
    <assemblyInstruction>
      Plug screen cable into whatever hole it fits in.
    </assemblyInstruction>
    <part code="kbd03">
      <name>keyboard</name>
    </part>
    <part code="rat12">
      <name>rat</name>
      <manufacturer>Log002</manufacturer>
    </part>
    <part code="scr001">
      <name>screen</name>
    </part>
    <part>
      <name>processingBox</name>
      <part code="MB121">
        <name>motherboard</name>
        <part code="PIII-800">
          <name>CPU</name>
        </part>
      </part>
    </part>
  </part>
  <part xsi:type="Product">
    <name>Crunch-It</name>
    <description>The number crunching monster.</description>
    <part>
      <name>Cube processor array</name>
    </part>
    <price>
      <amount>33223.98</amount>
      <currency>ZAR</currency>
    </price>
  </part>
  <manufacturer id="Log002">
    <name>Logitech</name>
    <address>13 Cyber Street, Silicon Hill</address>
    <part>rat12</part>
    <part>MB121</part>
  </manufacturer>
</partsCatalog>
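A validating parser checks that every ID value is unique within the document and that every IDREF points at an existing ID; the application then follows the references itself. The following Python sketch does this for a trimmed copy of the catalog above. ElementTree knows nothing about the schema, so the names of the ID-typed attributes (code for parts and, as the instance suggests, id for manufacturers) are passed in explicitly; build_id_index is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PC = "{http://www.ManufacturingUnlimited.co.za/partsCatalog}"

# Trimmed copy of the parts catalog instance.
DOC = """<partsCatalog xmlns="http://www.ManufacturingUnlimited.co.za/partsCatalog">
  <part code="PC103"><name>Pentium Flyer</name>
    <part code="rat12"><name>rat</name><manufacturer>Log002</manufacturer></part>
    <part><name>processingBox</name>
      <part code="MB121"><name>motherboard</name></part>
    </part>
  </part>
  <manufacturer id="Log002"><name>Logitech</name>
    <part>rat12</part><part>MB121</part>
  </manufacturer>
</partsCatalog>"""


def build_id_index(root, id_attributes=("code", "id")):
    """Map each ID value to its element; IDs must be unique across the document."""
    index = {}
    for el in root.iter():
        for attr in id_attributes:
            value = el.get(attr)
            if value is None:
                continue
            if value in index:
                raise ValueError(f"duplicate ID {value!r}")
            index[value] = el
    return index


root = ET.fromstring(DOC)
ids = build_id_index(root)

# Follow the IDREFs held in the manufacturer's <part> children.
logitech = root.find(PC + "manufacturer")
made = [ids[ref.text].findtext(PC + "name") for ref in logitech.findall(PC + "part")]
print(made)  # ['rat', 'motherboard']

# And in the other direction: the rat's <manufacturer> child is an IDREF too.
rat = ids["rat12"]
print(ids[rat.findtext(PC + "manufacturer")].findtext(PC + "name"))  # Logitech
```

A dangling reference would surface here as a KeyError, where a validating parser would report an IDREF without a matching ID.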
    schemaLocation="../../../../../../../externalResources/w3c/xml/2001/xml.xsd
  <xs:attributeGroup name="identifier">
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="role" type="xs:string"/>
  </xs:attributeGroup>
  <xs:simpleType name="Abbrev">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="Literal">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:element name="literal" type="Literal"/>
  <xs:complexType name="XRef">
    <xs:attribute name="linkend" type="xs:IDREF" use="required"/>
  </xs:complexType>
  <xs:element name="xref" type="XRef"/>
  <xs:complexType name="Link" mixed="true">
    <xs:group ref="TextGroup"/>
    <xs:attribute name="linkend" type="xs:IDREF"/>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="link" type="Link"/>
  <xs:complexType name="ULink">
    <xs:group ref="TextGroup"/>
    <xs:attribute name="url" type="xs:string"/>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:simpleType name="ProgramListing">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:element name="programlisting" type="ProgramListing"/>
  <xs:complexType name="Quote" mixed="true">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="emphasis" type="Emphasis" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="abbrev" type="Abbrev" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="xref" type="XRef" minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="quote" type="Quote"/>
  <xs:complexType name="Emphasis" mixed="true">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="quote" type="Quote" minOccurs="0"/>
      <xs:element name="abbrev" type="Abbrev" minOccurs="0"/>
      <xs:element name="xref" type="XRef" minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="emphasis" type="Emphasis"/>
  <xs:complexType name="Title" mixed="true">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="emphasis" type="Emphasis" minOccurs="0"/>
      <xs:element name="abbrev" type="Abbrev" minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="title" type="Title"/>
  <xs:complexType name="FootNote">
    <xs:sequence>
      <xs:element name="para" type="Para"/>
    </xs:sequence>
  </xs:complexType>
  <xs:group name="TextGroup">
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="quote" type="Quote"/>
        <xs:element name="blockquote" type="BlockQuote"/>
        <xs:element name="emphasis" type="Emphasis"/>
        <xs:element name="itemizedlist" type="ItemizedList"/>
        <xs:element name="orderedlist" type="OrderedList"/>
        <xs:element name="literal" type="Literal"/>
        <xs:element name="programlisting" type="ProgramListing"/>
        <xs:element name="inlinegraphic" type="InlineGraphic"/>
        <xs:element name="abbrev" type="Abbrev"/>
        <xs:element name="xref" type="XRef"/>
        <xs:element name="note" type="Note"/>
        <xs:element name="ulink" type="ULink"/>
        <xs:element name="footnote" type="FootNote"/>
      </xs:choice>
    </xs:sequence>
  </xs:group>
  <xs:complexType name="BlockQuote">
    <xs:sequence>
      <xs:element name="attribution" type="xs:string" minOccurs="0"/>
      <xs:element name="para" type="Para" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Para" mixed="true">
    <xs:group ref="TextGroup"/>
    <xs:attributeGroup ref="identifier"/>
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="para" type="Para"/>
  <xs:complexType name="ItemizedList">
    <xs:sequence>
      <xs:element name="listitem" type="ListItem" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="itemizedlist" type="ItemizedList"/>
  <xs:complexType name="OrderedList">
    <xs:sequence>
      <xs:element name="listitem" type="ListItem" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="orderedlist" type="OrderedList"/>
  <xs:complexType name="Graphic">
    <xs:attributeGroup ref="identifier"/>
    <xs:attribute name="fileref" type="xs:string"/>
    <xs:attribute name="align" type="xs:string"/>
    <xs:attribute name="scale" type="xs:string"/>
    <xs:attribute name="scalefit" type="xs:string"/>
  </xs:complexType>
  <xs:element name="graphic" type="Graphic"/>
  <xs:complexType name="InlineGraphic">
    <xs:attributeGroup ref="identifier"/>
    <xs:attribute name="fileref" type="xs:string"/>
    <xs:attribute name="align" type="xs:string"/>
    <xs:attribute name="scale" type="xs:string"/>
  </xs:complexType>
  <xs:element name="inlinegraphic" type="InlineGraphic"/>
  <xs:complexType name="Figure">
    <xs:sequence>
      <xs:element name="title" type="Title"/>
      <xs:choice>
        <xs:element name="graphic" type="Graphic"/>
        <xs:element name="programlisting" type="ProgramListing"/>
      </xs:choice>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="figure" type="Figure"/>
  <xs:complexType name="ListItem">
    <xs:choice>
      <xs:element name="para" type="Para"/>
      <xs:element name="formalpara" type="FormalPara"/>
    </xs:choice>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="listitem" type="ListItem"/>
  <xs:complexType name="FormalPara">
    <xs:sequence>
      <xs:element name="title" type="Title"/>
      <xs:element name="para" type="Para"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:complexType name="Table">
    <xs:sequence>
      <xs:element name="title" type="Title" minOccurs="0"/>
      <xs:element name="tgroup" minOccurs="1" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="thead" minOccurs="0">
              <xs:complexType>
                <xs:sequence>
                  <!--<xs:element name="colspec" type="ColSpec" minOccurs="0"/>-->
                  <xs:element name="row" type="TableRow"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element name="tbody" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="row" type="TableRow" minOccurs="0" maxOccurs="unbounded"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element name="tfoot" minOccurs="0">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="row" type="TableRow"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
          <xs:attribute name="cols" type="xs:positiveInteger" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <xs:attribute name="frame" type="Frame"/>
  </xs:complexType>
  <xs:element name="table" type="Table"/>
  <xs:complexType name="InformalTable">
    <xs:sequence>
      <xs:element name="tgroup" minOccurs="1" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="tbody" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="row" type="TableRow" minOccurs="0" maxOccurs="unbounded"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
          <xs:attribute name="cols" type="xs:positiveInteger" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <xs:attribute name="frame" type="Frame"/>
  </xs:complexType>
  <xs:element name="informaltable" type="InformalTable"/>
  <xs:complexType name="TableRow">
    <xs:sequence>
      <xs:element name="entry" type="Entry" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Entry">
    <xs:sequence>
      <xs:element name="para" type="Para" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="align" type="Alignment"/>
  </xs:complexType>
  <xs:complexType name="Note">
    <xs:sequence>
      <xs:element name="title" type="Title" minOccurs="0"/>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:group ref="ComponentGroup"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="note" type="Note"/>
  <xs:group name="ComponentGroup">
    <xs:choice>
      <xs:element name="para" type="Para" minOccurs="0"/>
      <xs:element name="figure" type="Figure" minOccurs="0"/>
      <xs:element name="table" type="Table" minOccurs="0"/>
      <xs:element name="informaltable" type="InformalTable" minOccurs="0"/>
      <xs:element name="formalpara" type="FormalPara" minOccurs="0"/>
    </xs:choice>
  </xs:group>
  <xs:complexType name="Section">
    <xs:sequence>
      <xs:element name="title" type="Title"/>
<xs:element name="subtitle" type="Title" minOccurs="0"/> <xs:sequence minOccurs="0" maxOccurs="unbounded"> <xs:choice> <xs:group ref="ComponentGroup"/> <xs:element name="note" type="Note" minOccurs="0"/> </xs:choice> </xs:sequence> <xs:sequence> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="section" type="Section" /> <xs:element name="example" type="Section"/> </xs:choice> </xs:sequence> </xs:sequence> <xs:attributeGroup ref="identifier"/> <!-- Allow xml:base, etc. --> <xs:attributeGroup ref="xml:specialAttrs"/> </xs:complexType> <xs:element name="example" type="Section"/> <xs:element name="section" type="Section"/> <xs:complexType name="Abstract"> <xs:sequence> <xs:element name="title" type="Title" minOccurs="0"/> <xs:element name="para" type="Para" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="figure" type="Figure" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="identifier"/> </xs:complexType> <xs:element name="abstract" type="Abstract"/> <xs:complexType name="Address"> <xs:sequence> <xs:element name="street" type="xs:string" minOccurs="0"/> <xs:element name="city" type="xs:string" minOccurs="0"/> <xs:element name="postcode" type="xs:string" minOccurs="0"/> <xs:element name="state" type="xs:string" minOccurs="0"/> <xs:element name="country" type="xs:string" minOccurs="0"/> <xs:element name="phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="fax" type="xs:string" minOccurs="0"/> <xs:element name="email" type="xs:string" minOccurs="0"/> <xs:element name="homepage" type="xs:string" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="identifier"/> </xs:complexType> <xs:element name="address" type="Address"/> <xs:complexType name="Affiliation"> <xs:sequence> <xs:element name="orgname" type="xs:string" minOccurs="0"/> <xs:element name="orgdiv" type="xs:string" minOccurs="0"/> <xs:element name="address" type="Address" minOccurs="0"/> </xs:sequence> <xs:attributeGroup ref="identifier"/> 
<xs:attributeGroup ref="xml:specialAttrs"/> </xs:complexType> <xs:element name="affiliation" type="Affiliation"/> <xs:complexType name="Author"> <xs:sequence> <xs:element name="honorific" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="firstname" type="xs:string" minOccurs="0"/> <xs:element name="othername" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="surname" type="xs:string" 56
XML Schemas
          minOccurs="0"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="affiliation" type="Affiliation" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="author" type="Author"/>
  <!-- TODO: Discuss Authorgroup vs just listing several authors -->
  <xs:complexType name="AuthorGroup">
    <xs:sequence>
      <xs:element name="author" type="Author" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CopyRight">
    <xs:sequence>
      <xs:element name="date" type="xs:string"/>
      <xs:element name="holder" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="copyright" type="CopyRight"/>
  <xs:complexType name="RevisionHistory">
    <xs:sequence>
      <xs:element name="revision" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="revnumber" type="xs:string"/>
            <xs:element name="date" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="revhistory" type="RevisionHistory"/>
  <xs:group name="CompilationInfo">
    <xs:sequence>
      <xs:element name="title" type="Title"/>
      <xs:element name="subtitle" type="Title" minOccurs="0"/>
      <xs:choice>
        <xs:element name="authorgroup" type="AuthorGroup"/>
        <xs:element name="author" type="Author" minOccurs="0"/>
      </xs:choice>
      <xs:element name="affiliation" type="Affiliation" minOccurs="0"/>
      <xs:element name="address" type="Address" minOccurs="0"/>
      <xs:element name="date" type="xs:string" minOccurs="0"/>
      <xs:element name="copyright" type="CopyRight" minOccurs="0"/>
      <xs:element name="revhistory" type="RevisionHistory" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
  <xs:complexType name="ArticleInfo">
    <xs:group ref="CompilationInfo"/>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="articleinfo" type="ArticleInfo"/>
  <xs:complexType name="Article">
    <xs:sequence>
      <xs:choice>
        <xs:sequence>
          <xs:element name="title" type="Title"/>
          <xs:element name="author" type="Author" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:element name="articleinfo" type="ArticleInfo"/>
      </xs:choice>
      <xs:element name="abstract" type="Abstract" minOccurs="0"/>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:choice>
          <xs:group ref="ComponentGroup"/>
          <xs:element name="note" type="Note" minOccurs="0"/>
        </xs:choice>
      </xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="section" type="Section"/>
        <xs:element name="example" type="Section"/>
      </xs:choice>
      <xs:element name="appendix" type="Appendix" minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="article" type="Article"/>
  <xs:complexType name="BookInfo">
    <xs:group ref="CompilationInfo"/>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="bookinfo" type="BookInfo"/>
  <xs:complexType name="Book">
    <xs:sequence>
      <xs:choice>
        <xs:sequence>
          <xs:element name="title" type="Title"/>
          <xs:element name="author" type="Author" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:element name="bookinfo" type="BookInfo"/>
      </xs:choice>
      <xs:element name="preface" type="Preface" minOccurs="0"/>
      <!-- Book may contain either parts or chapters in any combination -->
      <xs:sequence minOccurs="1" maxOccurs="unbounded">
        <xs:choice>
          <xs:element name="part" type="Part" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="chapter" type="Chapter" minOccurs="0" maxOccurs="unbounded"/>
        </xs:choice>
      </xs:sequence>
      <xs:element name="appendix" type="Appendix" minOccurs="0"/>
      <xs:element name="bibliography" type="Bibliography" minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="book" type="Book"/>
  <xs:group name="TitledSectionContainer">
    <xs:sequence>
      <xs:element name="title" type="Title"/>
      <xs:element name="abstract" type="Abstract" minOccurs="0"/>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="section" type="Section"/>
        <xs:element name="example" type="Section"/>
      </xs:choice>
    </xs:sequence>
  </xs:group>
  <xs:complexType name="Part">
    <xs:sequence>
      <xs:element name="title" type="Title"/>
      <xs:element name="abstract" type="Abstract" minOccurs="0"/>
      <xs:element name="chapter" type="Chapter" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="part" type="Part"/>
  <xs:complexType name="Preface">
    <xs:group ref="TitledSectionContainer"/>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="preface" type="Preface"/>
  <xs:complexType name="Chapter">
    <xs:group ref="TitledSectionContainer"/>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="chapter" type="Chapter"/>
  <xs:complexType name="Appendix">
    <xs:group ref="TitledSectionContainer"/>
    <xs:attributeGroup ref="identifier"/>
    <!-- Allow xml:base, etc. -->
    <xs:attributeGroup ref="xml:specialAttrs"/>
  </xs:complexType>
  <xs:element name="appendix" type="Appendix"/>
  <xs:group name="BiblioBook">
    <xs:sequence>
      <xs:choice>
        <xs:element name="authorgroup" type="AuthorGroup"/>
        <xs:element name="author" type="Author"/>
      </xs:choice>
      <xs:element name="title" type="Title"/>
      <xs:element name="subtitle" type="Title" minOccurs="0"/>
      <xs:element name="edition" type="xs:string" minOccurs="0"/>
      <xs:element name="productname" type="xs:string" minOccurs="0"/>
      <xs:element name="pagenums" type="xs:string" minOccurs="0"/>
      <xs:element name="publisher" type="Publisher" minOccurs="0"/>
      <xs:element name="isbn" type="xs:string" minOccurs="0"/>
      <xs:element name="pubdate" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
  <xs:complexType name="Publisher">
    <xs:sequence>
      <xs:element name="publishername" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="BiblioSet">
    <xs:sequence>
      <xs:choice minOccurs="0">
        <xs:element name="authorgroup" type="AuthorGroup"/>
        <xs:element name="author" type="Author"/>
      </xs:choice>
      <xs:element name="title" type="Title"/>
      <xs:element name="subtitle" type="Title" minOccurs="0"/>
      <xs:element name="productname" type="xs:string" minOccurs="0"/>
      <xs:element name="pagenums" type="xs:string" minOccurs="0"/>
      <xs:element name="volumenum" type="xs:string" minOccurs="0"/>
      <xs:element name="issuenum" type="xs:string" minOccurs="0"/>
      <xs:element name="date" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="relation" type="BiblioSetRelation" use="required"/>
  </xs:complexType>
  <xs:simpleType name="BiblioSetRelation">
    <xs:restriction base="xs:string">
      <xs:enumeration value="article"/>
      <xs:enumeration value="journal"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="Bibliography">
    <xs:sequence>
      <xs:element name="biblioentry" maxOccurs="unbounded">
        <xs:complexType>
          <xs:choice>
            <xs:group ref="BiblioBook"/>
            <xs:element name="biblioset" type="BiblioSet" maxOccurs="unbounded"/>
          </xs:choice>
          <xs:attributeGroup ref="identifier"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attributeGroup ref="xml:specialAttrs"/>
    <xs:attributeGroup ref="identifier"/>
  </xs:complexType>
  <xs:element name="bibliography" type="Bibliography"/>
  <xs:complexType name="CiteTitle">
    <xs:sequence>
      <xs:element name="linkend" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="role" fixed="dependency"/>
  </xs:complexType>
  <xs:simpleType name="Alignment">
    <xs:restriction base="xs:string">
      <xs:enumeration value="left"/>
      <xs:enumeration value="right"/>
      <xs:enumeration value="center"/>
      <xs:enumeration value="justify"/>
      <xs:enumeration value="char"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Frame">
    <xs:restriction base="xs:string">
      <xs:enumeration value="all"/>
      <xs:enumeration value="none"/>
      <xs:enumeration value="bottom"/>
      <xs:enumeration value="top"/>
      <xs:enumeration value="topbot"/>
      <xs:enumeration value="sides"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
In order to address these shortcomings, XML Schema introduces uniqueness or identity constraints. They offer a number of advantages over IDs:
- Uniqueness constraints can be applied to both attribute values and element content.
- The uniqueness can be scoped and does not have to apply across the entire XML document. For example, an attribute's uniqueness can be enforced across all elements of a certain type.
- The unique value may be of any XML Schema datatype and may start with a numeral.
- A uniqueness constraint can be enforced across a combination of elements and attributes, so that the individual values need not be unique, only their combination.
      <name>Screws Loose</name>
    </manufacturer>
  </manufacturers>
</partsCatalog>
Here we may want to enforce that each part must have a unique id. Similarly, we may want to enforce that each manufacturer has a unique id, while still allowing a manufacturer to have the same id as one of the parts. To specify such a uniqueness constraint we need to:
- Specify a name for the constraint. This name is used to identify the constraint itself.
- Specify the scope of the constraint. The constraint has to be specified within the element to which it applies; if the constraint should apply across the entire document, it should be placed within the root element. In our case we want part ids to be unique within the parts element and manufacturer ids to be unique within the manufacturers element, so the two constraints are placed within the corresponding element declarations.
- Define the selector. The selector identifies the elements, relative to the scope element, to which the constraint applies.
- Define the field to which the uniqueness constraint applies. Here one specifies the element or attribute whose value must be unique within the scope of the constraint.
The following schema definition enforces unique part ids within the parts element and unique manufacturer ids within the manufacturers element:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="partsCatalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="parts" type="Parts">
          <xs:unique name="uniquePartID">
            <xs:selector xpath="./part"/>
            <xs:field xpath="@id"/>
          </xs:unique>
        </xs:element>
        <xs:element name="manufacturers" type="Manufacturers">
          <xs:unique name="uniqueManufacturerID">
            <xs:selector xpath="manufacturer"/>
            <xs:field xpath="@id"/>
          </xs:unique>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Parts">
    <xs:sequence>
      <xs:element name="part" type="Part" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Manufacturers">
    <xs:sequence>
      <xs:element name="manufacturer" type="Manufacturer" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Part">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="price" type="xs:decimal"/>
      <xs:element name="manufacturer" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
  <xs:complexType name="Manufacturer">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
</xs:schema>
Note
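The selector and field use relative XPath expressions: the selector is evaluated relative to the element in which the constraint is declared, and each field relative to each element the selector matches.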
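The effect of the two scoped xs:unique constraints can be approximated outside a validator. The following Python sketch uses only the standard library's xml.etree.ElementTree; the helper duplicate_values and the sample catalog are illustrative assumptions, and values are compared as plain strings rather than as typed xs:integer values, as a schema validator would do.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative catalog (assumed data): part 121 and manufacturer 121 share
# an id, which is allowed because each constraint has its own scope.
CATALOG = """<partsCatalog>
  <parts>
    <part id="121"><name>Screw</name><price>12.40</price><manufacturer>8</manufacturer></part>
    <part id="122"><name>Bolt</name><price>4.40</price><manufacturer>7</manufacturer></part>
  </parts>
  <manufacturers>
    <manufacturer id="121"><name>Screws Loose</name></manufacturer>
  </manufacturers>
</partsCatalog>"""


def duplicate_values(scope, selector, field_attr):
    """Approximate an xs:unique with a single attribute field.

    scope      -- the element in which the constraint is declared
    selector   -- relative path of the selected elements (like xs:selector)
    field_attr -- attribute whose value must be unique (like xs:field @attr)

    Returns the values that occur more than once. Elements without the
    attribute are ignored, as xs:unique does. Values are compared as
    strings, a simplification of typed comparison.
    """
    values = [node.get(field_attr) for node in scope.findall(selector)]
    counts = Counter(v for v in values if v is not None)
    return sorted(v for v, n in counts.items() if n > 1)


root = ET.fromstring(CATALOG)
part_dups = duplicate_values(root.find("parts"), "part", "id")
manufacturer_dups = duplicate_values(root.find("manufacturers"), "manufacturer", "id")
print(part_dups, manufacturer_dups)  # [] []
```

Because each call receives its own scope element, the shared value 121 is only reported if it repeats inside the same scope.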
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="price" type="xs:decimal"/>
      <xs:element name="manufacturer" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
  <xs:complexType name="Manufacturer">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
</xs:schema>
The following document is valid because no two parts have both the same id and the same manufacturer: the two parts with id 121 have different manufacturers. If both the id and the manufacturer of two parts were equal, the document would not validate against the schema.
<?xml version="1.0" encoding="UTF-8"?>
<partsCatalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="partsCatalogMultiFieldIds.xsd">
  <parts>
    <part id="121">
      <name>Screw</name>
      <price>12.40</price>
      <manufacturer>8</manufacturer>
    </part>
    <part id="121">
      <name>Nut</name>
      <price>4.40</price>
      <manufacturer>9</manufacturer>
    </part>
    <part id="122">
      <name>Bolt</name>
      <price>4.40</price>
      <manufacturer>7</manufacturer>
    </part>
  </parts>
  <manufacturers>
    <manufacturer id="7">
      <name>Screw Loose</name>
    </manufacturer>
    <manufacturer id="8">
      <name>Screws Loose</name>
    </manufacturer>
  </manufacturers>
</partsCatalog>
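The id/manufacturer pairing rule can be illustrated with a small Python sketch using only ElementTree. It mimics an xs:unique with two xs:field children, one for @id and one for the manufacturer child element; the helper names and the simplified field syntax ('@name' for an attribute, otherwise a child element name) are assumptions of this sketch, not part of XML Schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def field_value(node, field):
    """Evaluate a simplified xs:field: '@name' selects an attribute,
    anything else is treated as the name of a child element."""
    if field.startswith("@"):
        return node.get(field[1:])
    child = node.find(field)
    return None if child is None else (child.text or "").strip()


def duplicate_combinations(scope, selector, fields):
    """Approximate an xs:unique with several xs:field children: only the
    tuple of field values must be unique, not each value on its own.
    Nodes with a missing field are skipped, as xs:unique does."""
    keys = []
    for node in scope.findall(selector):
        key = tuple(field_value(node, f) for f in fields)
        if None not in key:
            keys.append(key)
    return sorted(k for k, n in Counter(keys).items() if n > 1)


PARTS = """<parts>
  <part id="121"><name>Screw</name><manufacturer>8</manufacturer></part>
  <part id="121"><name>Nut</name><manufacturer>9</manufacturer></part>
  <part id="122"><name>Bolt</name><manufacturer>7</manufacturer></part>
</parts>"""

ok = duplicate_combinations(ET.fromstring(PARTS), "part", ["@id", "manufacturer"])
# Give the Nut the same manufacturer as the Screw: now (121, 8) repeats.
bad = duplicate_combinations(ET.fromstring(PARTS.replace(">9<", ">8<")),
                             "part", ["@id", "manufacturer"])
print(ok, bad)  # [] [('121', '8')]
```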
Furthermore, because keys are scoped within a certain subtree (element) of the XML document tree, they can provide a performance benefit over IDs and IDREFs: the entire document need not be searched to resolve a reference or to verify uniqueness.
Then the key has to identify a unique manufacturer by its id attribute:
<xsd:key name="manufacturerID">
  <xsd:selector xpath="manufacturers/manufacturer"/>
  <xsd:field xpath="@id"/>
</xsd:key>
Note that defining a key is analogous to defining a uniqueness constraint. In fact, a key always implies a uniqueness constraint; in addition, every selected element must have a value for the key field.
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    ...
    <xsd:element name="manufacturer" type="xsd:string" minOccurs="0"/>
  </xsd:sequence>
  <xsd:attribute name="code" type="xsd:string"/>
</xsd:complexType>
The reference is then the manufacturer child element of each part, whose value refers to a manufacturer id (the key):
<xsd:keyref name="partManufacturerRef" refer="manufacturerID">
  <xsd:selector xpath="parts/part"/>
  <xsd:field xpath="manufacturer"/>
</xsd:keyref>
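The combined effect of the key and keyref above can be approximated in Python with ElementTree. This is a sketch under simplifying assumptions: the catalog has no target namespace, the selector and field paths are hard-coded, and values are compared as strings; key_errors and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def key_errors(root):
    """Check a simplified key/keyref pair on a parts catalog:
    key    manufacturerID      : manufacturers/manufacturer, field @id
    keyref partManufacturerRef : parts/part, field manufacturer
    Returns a list of human-readable violations (empty if valid)."""
    errors = []
    keys = set()
    for m in root.findall("manufacturers/manufacturer"):
        mid = m.get("id")
        if mid is None:
            # Unlike xs:unique, xs:key requires the field to be present.
            errors.append("manufacturer without id")
        elif mid in keys:
            errors.append(f"duplicate manufacturer id {mid}")
        else:
            keys.add(mid)
    for p in root.findall("parts/part"):
        ref = p.findtext("manufacturer")
        # A part without a manufacturer child is not checked by the keyref.
        if ref is not None and ref.strip() not in keys:
            errors.append(f"part {p.get('code')} refers to unknown manufacturer {ref.strip()}")
    return errors


CATALOG = """<partsCatalog>
  <parts>
    <part code="0112"><name>PC-Flash</name><manufacturer>123</manufacturer></part>
    <part code="0114"><name>MickyMouse</name><manufacturer>999</manufacturer></part>
    <part code="0115"><name>Spare</name></part>
  </parts>
  <manufacturers>
    <manufacturer id="123"><name>Slap bam</name></manufacturer>
    <manufacturer id="124"><name>Slap bam</name></manufacturer>
  </manufacturers>
</partsCatalog>"""

errors = key_errors(ET.fromstring(CATALOG))
print(errors)  # ['part 0114 refers to unknown manufacturer 999']
```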
13.4. An Example
Below we show the complete listing of our example parts catalog schema:
<?xml version="1.0" encoding="UTF-8"?>
<!-- <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> -->
<xsd:schema targetNamespace="http://www.ManufacturingUnlimited.co.za/partsCatalog"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:parts="http://www.ManufacturingUnlimited.co.za/partsCatalog"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="partsCatalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="parts" type="parts:Parts"/>
        <xsd:element name="manufacturers" type="parts:Manufacturers"/>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:key name="partID">
      <xsd:selector xpath="parts:parts/parts:part"/>
      <xsd:field xpath="@code"/>
    </xsd:key>
    <xsd:key name="manufacturerID">
      <xsd:selector xpath="parts:manufacturers/parts:manufacturer"/>
      <xsd:field xpath="@id"/>
    </xsd:key>
    <!-- A field may match at most one node per selected element, so the
         selector goes down to the individual part references. -->
    <xsd:keyref name="manufacturerPartRef" refer="parts:partID">
      <xsd:selector xpath="parts:manufacturers/parts:manufacturer/parts:part"/>
      <xsd:field xpath="@ref"/>
    </xsd:keyref>
    <xsd:keyref name="partManufacturerRef" refer="parts:manufacturerID">
      <xsd:selector xpath="parts:parts/parts:part"/>
      <xsd:field xpath="parts:manufacturer"/>
    </xsd:keyref>
  </xsd:element>
  <xsd:complexType name="Parts">
    <xsd:sequence>
      <xsd:element name="part" type="parts:Part" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Products">
    <xsd:sequence>
      <xsd:element name="product" type="parts:Product" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Manufacturers">
    <xsd:sequence>
      <xsd:element name="manufacturer" type="parts:Manufacturer" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Part">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string" minOccurs="0"/>
      <xsd:element name="assemblyInstruction" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="part" type="parts:Part" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="manufacturer" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="code" type="xsd:string"/>
  </xsd:complexType>
  <xsd:complexType name="Product">
    <xsd:complexContent>
      <xsd:extension base="parts:Part">
        <xsd:sequence>
          <xsd:element name="price">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="amount" type="xsd:decimal"/>
                <xsd:element name="currency" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:complexType name="Manufacturer">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
      <xsd:element name="part" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:attribute name="ref" type="xsd:string"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
An XML document which is valid against this schema is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<!-- <partsCatalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="partsCatalog.xsd"> -->
<partsCatalog xmlns="http://www.ManufacturingUnlimited.co.za/partsCatalog"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.ManufacturingUnlimited.co.za/partsCatalog partsCatalog.xsd">
  <parts>
    <part code="0112">
      <name>PC-Flash</name>
      <description> The ultimate -- at least for this week </description>
      <assemblyInstruction> Plug keyboard in green socket </assemblyInstruction>
      <assemblyInstruction> Plug mouse or rat in purple socket </assemblyInstruction>
      <part code="0113">
        <name>GrindAlong Motherboard</name>
        <description> The board your mother wished she had. </description>
        <manufacturer>128</manufacturer>
      </part>
      <manufacturer>123</manufacturer>
    </part>
    <part xsi:type="Product" code="1111">
      <name>Deep Thought</name>
      <description> A model capable of very deep thinking. </description>
      <manufacturer>123</manufacturer>
      <price>
        <amount>231</amount>
        <currency>ZAR</currency>
      </price>
    </part>
    <part code="0114">
      <name>MickyMouse</name>
      <manufacturer>123</manufacturer>
    </part>
  </parts>
  <manufacturers>
    <manufacturer id="123">
      <name>Slap bam</name>
      <address>15 Semble-It Road, Industria</address>
      <part ref="0112"/>
      <part ref="1111"/>
      <part ref="0114"/>
    </manufacturer>
    <manufacturer id="124">
      <name>Slap bam</name>
      <address>15 Semble-It Road, Industria</address>
    </manufacturer>
  </manufacturers>
</partsCatalog>
    <xsd:documentation> Schema for Client </xsd:documentation>
  </xsd:annotation>
  <xsd:import namespace="http://www.solms.co.za/finance/account" schemaLocation="account.xsd"/>
  <xsd:complexType name="Client">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="account" type="acc:Account"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="client" type="cl:Client"/>
</xsd:schema>
An example XML document which is valid against that schema is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<client xmlns="http://www.solms.co.za/finance/client"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.solms.co.za/finance/client client.xsd">
  <name>Thabo</name>
  <account xmlns:acc="http://www.solms.co.za/finance/account">
    <acc:name>currentAccount</acc:name>
    <acc:balance>100</acc:balance>
  </account>
</client>
1. Introduction
The XML Linking Language (XLink) facilitates simple unidirectional links as well as more complex linking structures between resources or parts of resources (for example, an element within another XML document). A resource can be regarded as any addressable unit of information or service. At the time of writing, XLink was expected to become the primary linking mechanism as of XHTML (the web page markup language) version 2.0, with the vocabulary no longer containing its own linking tags. The attributes defined by the XLink standard are in the following namespace: http://www.w3.org/1999/xlink
XLink defines both extended and simple links. The latter are really a shorthand notation for a special type of uni-directional, extended link to a single external resource. We shall first look at the case of simple XLinks before discussing the more general class of extended links.
2. Simple Links
Simple links are unidirectional links between two resources (or parts of resources). Furthermore, they are outbound and hence have to be defined in the source document, which references the second resource. They are in many ways similar to HTML links. Consider the following simple XHTML link: <a href="http://www.solms.co.za/downloads/file.pdf">The File</a>
Note that XLinks can be embedded in any XML element, in any vocabulary: the linking information is carried by attributes from the XLink namespace.
2.2.1. xlink:type
The type attribute specifies whether the link is a simple or an extended link (within extended links, the values locator, resource and arc are also used).
2.2.2. xlink:href
This attribute contains the URI of the resource the link points to.
2.2.3. xlink:role
The role attribute has been included to allow for semantic classification of links. The possible values of the role are left to XML developers and have not been specified by the W3C; XLink 1.0 only requires the value to be a URI reference.
2.2.4. xlink:title
This attribute represents the title which should be displayed to users. It plays a role similar to the text of an HTML link, which browsers usually render in blue (often underlined). As browsers start supporting extended XLinks, they will most probably use the title to render them much as HTML links are rendered.
2.2.5. xlink:show
This attribute specifies how the target URI will be rendered to users. It can assume the following values:
- replace. The target content should replace the content of the source context. If the rendering is done in a browser, the window or frame containing the link is populated with the content of the target URI.
- new. The target URI should be rendered in a separate context. If the rendering is done in a browser, the content of the target URI is shown in a separate window launched by the browser.
- embed. The content of the target URI should be embedded within the context containing the link, at the link position. This is similar to the <img src="HiThere.png"> tag in HTML, where the image is embedded at the specified position within the source document.
- none. This is the default value. It specifies that the target URI should not be shown, i.e. that the link is there for semantic or design reasons and has no presentation consequences.
2.2.6. xlink:actuate
This attribute specifies what triggers the traversal of the link. Currently it can take the following values:
- onRequest. The link is traversed upon user request. If the presentation is done in a browser, the link could be traversed when the user clicks on the link title.
- onLoad. The link should be traversed upon loading the context which contains the link. You may want to use this value when the show attribute is set to embed, or even when it is set to new, although the latter may open two browser windows simultaneously.
- none. This value is used if the link should not be traversed.
Browser) as well as X-Smiles, do support simple linking. The latest Internet Explorer and Opera web browsers offer no XLink support at all; not even simple links are currently supported. Doczilla, a Mozilla-based browser from CiTEC, is the only browser with extended link support. It is currently in release candidate II (the last non-production release) and can be downloaded for Linux and Windows.
The following linked XML documents demonstrate the support for simple links in Mozilla-based browsers. They establish links from the contract document to a photo of the property and to an XML document describing the agent.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="contract.css" type="text/css" media="screen"?>
<contract id="2003-02-12-a11">
  <date>2003-01-12</date>
  <property propertyId="65412">
    <stand>
      <standNumber>1332</standNumber>
      <suburb>Greenside</suburb>
    </stand>
    <streetAddress>25 Greenhill Str, Greenside</streetAddress>
    <photo xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple"
           xlink:href="./property1.jpeg"
           xlink:role="photo"
           xlink:title="photo of property"
           xlink:show="embed"
           xlink:label="agent details"
           xlink:actuate="onRequest">photo of property</photo>
  </property>
  <agent>
    <name>Carolyn Carolus</name>
    <agentDetails xmlns:xlink="http://www.w3.org/1999/xlink"
                  xlink:type="simple"
                  xlink:href="./agent1.xml"
                  xlink:role="agent description"
                  xlink:title="agent"
                  xlink:show="new"
                  xlink:label="agent details"
                  xlink:actuate="onRequest">agent details</agentDetails>
  </agent>
</contract>
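To make the role of the namespaced attributes concrete, the following Python sketch (standard-library ElementTree) lists the simple links in a trimmed copy of the contract document. ElementTree reports namespaced attributes in "{namespace-URI}local-name" form; the helper names xl and simple_links are illustrative, and this is not how a browser actually processes XLinks.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"


def xl(local):
    """Clark-notation name of an attribute in the XLink namespace."""
    return "{%s}%s" % (XLINK, local)


def simple_links(root):
    """Return (tag, href, show, actuate) for every element, in document
    order, that carries xlink:type="simple"."""
    links = []
    for elem in root.iter():
        if elem.get(xl("type")) == "simple":
            links.append((elem.tag, elem.get(xl("href")),
                          elem.get(xl("show")), elem.get(xl("actuate"))))
    return links


# Trimmed version of the contract above, with the XLink namespace
# declared once on the root element.
CONTRACT = """<contract id="2003-02-12-a11" xmlns:xlink="http://www.w3.org/1999/xlink">
  <property propertyId="65412">
    <photo xlink:type="simple" xlink:href="./property1.jpeg"
           xlink:show="embed" xlink:actuate="onRequest">photo of property</photo>
  </property>
  <agent>
    <name>Carolyn Carolus</name>
    <agentDetails xlink:type="simple" xlink:href="./agent1.xml"
                  xlink:show="new" xlink:actuate="onRequest">agent details</agentDetails>
  </agent>
</contract>"""

links = simple_links(ET.fromstring(CONTRACT))
for tag, href, show, actuate in links:
    print(f"{tag}: {href} (show={show}, actuate={actuate})")
```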
4. Extended Links
Extended links are much more powerful than simple links. In particular, they can be used to:
- define links between more than two resources,
- define bidirectional links,
- define inbound links,
- define links external to the source document (third-party links),
- associate metadata with links.
For example, an extended link could link a property sale contract with the property, its photo, the buyer, the seller, the transfer attorney and the financing bank, as depicted in Figure 5.1, "Possible linking for a property sale contract".
  <agent>Cindy Lee</agent>
  <contract xlink:type="resource"
            xlink:role="http://www.propdealer.co.za/roles/contract"
            xlink:title="The deed of sale."
            xlink:label="contract"
            id="May2001_102">
    <date>2001/05/10</date>
    <purchasePrice>320000</purchasePrice>
  </contract>
  <client xlink:type="locator"
          xlink:href="http://www.propdealer.co.za/clientdatabase/PeterSmith.xml"
          xlink:role="http://www.propdealer.co.za/roles/buyer"
          xlink:title="The buyer of the property"
          xlink:label="buyer"/>
  <client xlink:type="locator"
          xlink:href="http://www.propdealer.co.za/clientdatabase/TandiKhumalo.xml"
          xlink:role="http://www.propdealer.co.za/roles/seller"
          xlink:title="The property seller"
          xlink:label="seller"/>
  <asset xlink:type="locator"
         xlink:href="http://www.propdealer.co.za/propertydatabase/Kensington_erf187
         xlink:role="http://www.propdealer.co.za/roles/property"
         xlink:title="The property"
         xlink:label="property"/>
  <image xlink:type="locator"
         xlink:href="http://www.propdealer.co.za/propertydatabase/Kensington_erf187
         xlink:role="http://www.propdealer.co.za/roles/photo"
         xlink:title="Photo of the property"
         xlink:label="photo"/>
  <serviceProvider xlink:type="locator"
                   xlink:href="http://www.propdealer.co.za/attorneydatabase/SamDeBeer.xml"
                   xlink:role="http://www.propdealer.co.za/roles/attorney"
                   xlink:title="The transfer attorney"
                   xlink:label="attorney"/>
  <contract xlink:type="locator"
            xlink:href="http://www.propdealer.co.za/contractdatabase/May2001_102.xml
            xlink:role="http://www.propdealer.co.za/roles/contract"
            xlink:title="The deed of sale."
            xlink:label="contract"/>
  <bank xlink:type="resource"
        xlink:role="http://www.propdealer.co.za/banks/bank"
        xlink:title="The buyer's bank."
        xlink:label="bank">
    <name>Second National Bank</name>
    <branch>Johannesburg</branch>
  </bank>
  <aLink xlink:type="arc" xlink:from="contract" xlink:to="buyer"
         xlink:show="embed" xlink:actuate="onLoad"/>
  <aLink xlink:type="arc" xlink:from="contract" xlink:to="seller"
         xlink:show="embed" xlink:actuate="onLoad"/>
  <aLink xlink:type="arc" xlink:from="contract" xlink:to="property"
         xlink:show="replace" xlink:actuate="onRequest"/>
  <aLink xlink:type="arc" xlink:from="property" xlink:to="photo"
         xlink:show="replace" xlink:actuate="onRequest"/>
  <aLink xlink:type="arc" xlink:from="photo" xlink:to="property"
         xlink:show="replace" xlink:actuate="onRequest"/>
  <aLink xlink:type="arc" xlink:from="contract" xlink:to="attorney"
         xlink:show="replace" xlink:actuate="onRequest"
         xlink:title="the deed of sale"/>
  <aLink xlink:type="arc" xlink:from="buyer" xlink:to="bank"
         xlink:show="replace" xlink:actuate="onRequest"
         xlink:title="Buyer's bank"
         xlink:arcrole="http://www.propdealer.co.za/roles/financier"/>
</propertySaleContract>
4.3. Resources
A resource is any addressable unit of information or service. The following are examples of resources:
- an XML or HTML document,
- an element within an XML document,
- any file,
- an image,
- a CORBA server,
- an application,
- query results.
Note that even though XLinks must be hosted by XML documents, they can link any type of resource. Since XLinks can be defined outside the source resource, none of the resources being linked need themselves be XML documents.
        xlink:role="http://www.propdealer.co.za/roles/buyer"
        xlink:title="The buyer of the property"
        xlink:label="Buyer"/>
specifies a remote resource. In our example we use locators for the following remote resources: buyer, seller, transferAttorney, property and photo.
For example,
<contract xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="resource"
          xlink:role="http://www.propdealer.co.za/roles/contract"
          xlink:title="The deed of sale"
          xlink:label="contract"
          id="May2001_102">
  <date>2004/05/10</date>
  <purchasePrice>820000</purchasePrice>
</contract>
specifies a local resource: an element with an id attribute and the sub-elements date and purchasePrice.
4.5. Arcs
Arcs provide the traversal paths between resources. Each arc is a one-way path from the resources carrying its xlink:from label to the resources carrying its xlink:to label, and an extended link can contain arbitrarily many arcs.
4.5.1. Arcs can point from resources which do not support links
For example, we have defined a bidirectional link between the property XML resource and the photo of that property, using one arc in each direction:
<aLink xlink:type="arc" xlink:from="property" xlink:to="photo"
       xlink:show="replace" xlink:actuate="onRequest"/>
<aLink xlink:type="arc" xlink:from="photo" xlink:to="property"
       xlink:show="replace" xlink:actuate="onRequest"/>
The second arc starts at the photo, an image which cannot itself contain any link markup.
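A minimal Python sketch of how a processor might expand arcs into traversals, assuming only explicit xlink:from and xlink:to values (omitted values, which XLink treats specially, are simply skipped here). Participants are identified by their xlink:href, or by their tag for local resources; this naming scheme and the helper names are assumptions for illustration. Because several participants may share a label, one arc can yield several traversals.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XLINK = "http://www.w3.org/1999/xlink"


def xl(local):
    """Clark-notation name of an attribute in the XLink namespace."""
    return "{%s}%s" % (XLINK, local)


def traversals(extended_link):
    """Expand the arc-type children of an extended link into
    (from_participant, to_participant) pairs."""
    by_label = defaultdict(list)
    for child in extended_link:
        kind = child.get(xl("type"))
        label = child.get(xl("label"))
        if kind in ("locator", "resource") and label:
            # Remote resources are named by href, local ones by their tag.
            name = child.get(xl("href")) if kind == "locator" else child.tag
            by_label[label].append(name)
    pairs = []
    for arc in extended_link:
        if arc.get(xl("type")) == "arc":
            for src in by_label.get(arc.get(xl("from")), []):
                for dst in by_label.get(arc.get(xl("to")), []):
                    pairs.append((src, dst))
    return pairs


# Hypothetical extended link with the two property/photo arcs from the text.
LINK = """<sale xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <asset xlink:type="locator" xlink:href="property.xml" xlink:label="property"/>
  <image xlink:type="locator" xlink:href="photo.jpeg" xlink:label="photo"/>
  <aLink xlink:type="arc" xlink:from="property" xlink:to="photo"/>
  <aLink xlink:type="arc" xlink:from="photo" xlink:to="property"/>
</sale>"""

pairs = traversals(ET.fromstring(LINK))
print(pairs)  # [('property.xml', 'photo.jpeg'), ('photo.jpeg', 'property.xml')]
```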
5. Exercise
Draw a diagram showing the associations between the role players in a payment transaction (including payer, receiver, their accounts, their banks, and the transaction).
Chapter 6. XPath
1. Introduction
XPath is a language for addressing parts of an XML document. It is used in various XML standards, specifically XSLT and XPointer. In support of its primary purpose, it also provides a set of functions for basic string, numeric and boolean manipulation. Just as a URL or a file path identifies a file in a hierarchical file system, an XPath expression targets a piece of information in the abstract, hierarchical structure of an XML document. This path-like structure is the reason for its name. An XPath expression is a non-XML, URL-like string which may, for example, be used as an attribute value in an XML vocabulary. The expression is evaluated to yield an object of one of four basic types:
- node-set: an unordered collection of nodes without duplicates; for example, the set of elements in the document which match the expression.
- boolean: true or false.
- number: a floating-point number.
- string: a sequence of Unicode characters (Unicode aims to cover the characters of all the world's writing systems).
Note
XPath, on its own, is purely a standard specification. It is of little use unless it is used within another framework, such as XSLT. Having a single, abstract specification for targeting parts of XML documents is, however, very beneficial in terms of standardised tools and developer skills (there is no need to learn multiple addressing standards).
3. Location Paths
Location paths are the most general, and also the most important, constructs in XPath. A location path starts with an optional '/' and consists of zero or more location steps separated by '/'. Location paths are somewhat similar to UNIX file system paths, and can indeed be viewed as such. The primary difference lies in the location steps: on a file system they are simple directory names, whereas XPath has a more powerful set of constructs. The main parts of a location step are: Axis Specifiers
XPath
The node test and predicate parts of a location step let you select a subset of the group of nodes that a particular axis specifier points to.
<wine grape="Cabernet Sauvignon">
  <winery>Groot Constantia</winery>
  <year>1998</year>
  <prices> <!-- << Our Context Node -->
    <list>36.99</list>
    <discounted>29.99</discounted>
    <case>345.50</case>
  </prices>
</wine>
In the example above, we navigate to the price of a case of wine in two steps. To make prices the context node (the node we are currently looking at, i.e. the "selected" node), we could use the expression
/child::wine/child::prices
which, in abbreviated form, can also be written as
/wine/prices
Then, to select the price of a case of wine, we use
child::case
This selects an element node whose content is the price of a case of wine. Besides child, the other available axes include: descendant::, parent::, ancestor::, following-sibling::, preceding-sibling::, following:: and preceding::.
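Python's standard xml.etree.ElementTree implements only a small subset of XPath 1.0 (relative paths, abbreviated child steps and a few predicate forms), but that is enough to replay the two steps above. In this sketch the parsed root element plays the role of /wine, so the paths are written relative to it.

```python
import xml.etree.ElementTree as ET

WINE = """<wine grape="Cabernet Sauvignon">
  <winery>Groot Constantia</winery>
  <year>1998</year>
  <prices>
    <list>36.99</list>
    <discounted>29.99</discounted>
    <case>345.50</case>
  </prices>
</wine>"""

wine = ET.fromstring(WINE)           # the root element, i.e. /wine
prices = wine.find("prices")         # step 1: child::prices, now the context node
case = prices.find("case")           # step 2: child::case, relative to prices
print(case.text)                     # 345.50

# The same two steps written as a single relative path from the root element.
print(wine.findtext("prices/case"))  # 345.50
print(wine.get("grape"))             # attribute::grape, i.e. @grape
```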
Note
Despite the singular form of axis specifier names like "ancestor" and "preceding-sibling", only "self" and "parent" are guaranteed to refer to at most one node. The others might be more aptly named "children", "ancestors", "preceding-siblings", and so forth, so that is how you should think of them: as ways of accessing those particular sets of nodes.
3.1.1. descendant
The descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes.
3.1.2. parent
The parent axis contains the parent of the context node, if there is one.
3.1.3. ancestor
The ancestor axis contains the ancestors of the context node; the ancestors of the context node consist of the parent of context node and the parent's parent and so on; thus, the ancestor axis will always include the root node, unless the context node is itself the root node.
3.1.4. following-sibling
The following-sibling axis contains all the following siblings of the context node, i.e. the nodes after the context node in document order that have the same parent.
3.1.5. preceding-sibling
The preceding-sibling axis contains all the preceding siblings of the context node.
3.1.6. following
The following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants.
3.1.7. preceding
The preceding axis contains all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors.
3.1.8. attribute
The attribute axis contains the attributes of the context node; the axis will be empty unless the context node is an element.
3.1.9. namespace
The namespace axis contains the namespace nodes of the context node; the axis will be empty unless the context node is an element.
3.1.10. self
The self axis contains just the context node itself.
3.1.11. descendant-or-self
The descendant-or-self axis contains the context node and all the descendants of the context node.
3.1.12. ancestor-or-self
The ancestor-or-self axis contains the context node and all the ancestors of the context node; thus, the ancestor-or-self axis will always include the root node.
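The following sketch, again using the JDK's javax.xml.xpath (a tooling assumption, not part of the text), counts what several axes select when prices from the wine example is the context node. It illustrates that ancestor:: reaches the root node, and that positions on a reverse axis such as preceding-sibling:: are counted outwards from the context node. WineAxes is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WineAxes {
    // The wine example on one line, so no whitespace-only text nodes are created.
    static final String WINE_XML =
        "<wine grape=\"Cabernet Sauvignon\"><winery>Groot Constantia</winery>"
        + "<year>1998</year><prices><list>36.99</list>"
        + "<discounted>29.99</discounted><case>345.50</case></prices></wine>";

    /** Evaluates an expression with the prices element as the context node. */
    private static Object fromPrices(String expression, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(WINE_XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node prices = (Node) xpath.evaluate("/wine/prices", doc, XPathConstants.NODE);
            return xpath.evaluate(expression, prices, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** How many nodes a location path (e.g. "ancestor::node()") selects from prices. */
    static int count(String locationPath) {
        Double n = (Double) fromPrices("count(" + locationPath + ")", XPathConstants.NUMBER);
        return n.intValue();
    }

    /** The string result of an expression evaluated from prices. */
    static String string(String expression) {
        return (String) fromPrices(expression, XPathConstants.STRING);
    }
}
```

With prices as the context node, count("ancestor::node()") gives 2 (the wine element plus the root node), count("preceding-sibling::*") gives 2, and count("following-sibling::*") gives 0. string("name(preceding-sibling::*[1])") gives year, the nearest preceding sibling. By contrast, name(preceding-sibling::*) gives winery, because name() uses the first node of the set in document order.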
A node test that is a QName (Qualified Name, including a namespace prefix, if any) is true if and only if the node is of the axis's principal node type (attribute for the attribute axis, namespace for the namespace axis, and element for all other axes) and has an expanded-name equal to the expanded-name specified by the QName. For example, child::para selects the para element children of the context node; if the context node has no para children, it selects an empty set of nodes. attribute::href selects the href attribute of the context node; if the context node has no href attribute, it selects an empty set of nodes.
Table 6.1.
Abbreviated Syntax    Full Syntax Equivalent
(nothing)             child::
@                     attribute::
//                    /descendant-or-self::node()/
.                     self::node()
..                    parent::node()
/                     (node tree root)
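One way to convince yourself that each abbreviation is purely a shorthand is to evaluate both forms and compare the selected nodes. The sketch below does that with the JDK's javax.xml.xpath. The small book document and the AbbreviatedSyntax class are hypothetical illustrations. A single parsed Document is shared so that node identity can be compared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AbbreviatedSyntax {
    // Hypothetical document used only for this comparison.
    static final String XML =
        "<book><chapter id=\"c1\"><para>One</para><para>Two</para>"
        + "<section><para>Three</para></section></chapter></book>";
    // One shared tree, so the same node is returned as the same object.
    private static final Document DOC = parse(XML);

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static NodeList select(String path) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, DOC, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int count(String path) {
        return select(path).getLength();
    }

    /** True if both paths select exactly the same nodes, in document order. */
    static boolean sameNodes(String abbreviated, String full) {
        NodeList a = select(abbreviated);
        NodeList b = select(full);
        if (a.getLength() != b.getLength()) {
            return false;
        }
        for (int i = 0; i < a.getLength(); i++) {
            if (!a.item(i).isSameNode(b.item(i))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, sameNodes("//para", "/descendant-or-self::node()/child::para") is true, and both select three para elements. count("/book/para") is 0, because the node test para matches no child of book.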
Note
In practice, the abbreviated forms of location paths are used almost exclusively. For the other axes, however, such as following-sibling::, there is no abbreviated form.
3.4. Predicates
A predicate filters a node-set that is specified by a location path. It is an expression that is tested against every node in the set, resulting in a new set containing the nodes that satisfy the expression. Predicates are appended to location paths by placing the expression in square brackets after the path, for example:

/elementA/elementB[predicateExpr]

will only include those elementB children of the root elementA element for which the expression predicateExpr holds true. A predicate expression is evaluated, and the result is converted to a boolean: if the result is a number, it is converted to true if the number is equal to the context position, and to false otherwise; if the result is not a number, it is converted as if by a call to the boolean function. Thus the location path para[3] is equivalent to para[position()=3].
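The conversion rule above has a practical consequence: a numeric predicate selects by position, while a non-numeric one (even a string such as '3') is simply tested for truth. The sketch below shows this with the JDK's javax.xml.xpath. The document and the PredicatePositions class are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PredicatePositions {
    // Hypothetical document: three para elements followed by a note.
    static final String XML =
        "<doc><para>a</para><para>b</para><para>c</para><note>n</note></doc>";

    private static Object eval(String expression, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** String value of the first selected node (or of a string/number result). */
    static String string(String expression) {
        return (String) eval(expression, XPathConstants.STRING);
    }

    /** Number of nodes a location path selects. */
    static int count(String path) {
        return ((Double) eval("count(" + path + ")", XPathConstants.NUMBER)).intValue();
    }
}
```

string("/doc/para[3]") and string("/doc/para[position()=3]") both give c. However, count("/doc/para['3']") is 3: the string '3' is not a number, so it is converted with boolean(), and a non-empty string is always true.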
Note
The functions mentioned in the explanation above (such as position()) form part of the XPath core function library: a standardised library of functions which adds significant functionality for the XPath developer.
We may now selectively describe node-sets in this document using XPath, based on any of the following criteria:

Items that have a 'count' attribute, anywhere in the tree:
//item[@count]

Items that do not have a 'count' attribute, anywhere in the tree:
//item[ not( @count ) ]

Items with a price specified in South African rand (ZAR):
//item[ price[ @currency='ZAR' ] ]

The descriptions of all items with a price less than 1000.00:
//item/description[ following-sibling::price[ text() < 1000 ] ]
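Because the item document these expressions refer to is not reproduced here, the sketch below assumes a hypothetical one with the same shape: item elements with an optional count attribute, a description, and a price carrying a currency attribute. It evaluates the four kinds of query with the JDK's javax.xml.xpath. The ItemQueries class is likewise hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ItemQueries {
    // Hypothetical item list; description precedes price inside each item.
    static final String XML =
        "<items>"
        + "<item count=\"2\"><description>Laptop</description><price currency=\"ZAR\">9999.00</price></item>"
        + "<item><description>Mouse</description><price currency=\"GBP\">12.50</price></item>"
        + "<item count=\"10\"><description>Cable</description><price currency=\"ZAR\">49.95</price></item>"
        + "</items>";

    /** Returns the text content of every node the expression selects, in document order. */
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Against this data, //item/description[following-sibling::price[text() < 1000]] selects the descriptions Mouse and Cable. Each inner predicate is evaluated with the price (or item) as its own context node, while the outer step still returns description elements.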
Note
Two important aspects of predicates are illustrated above. First, notice that predicates can be nested arbitrarily deep. Second, it is important to decide which node is being targeted, and then to write each predicate in the context of that node (i.e. at that point, it is the context node).
Strings do not have any operators of their own in XPath; they are manipulated using the core function library.
number position()
number count(node-set)
node-set id(object)
string local-name(node-set?)
string namespace-uri(node-set?)
string name(node-set?)
number string-length(string?)
string normalize-space(string?)
boolean not(object)
boolean true()
boolean false()
boolean lang(string)
number sum(node-set)
number floor(number)
number ceiling(number)
number round(number)
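A few of these core functions can be tried directly through the JDK's javax.xml.xpath, as in the sketch below. The order document and the CoreFunctions class are hypothetical. Each call evaluates one XPath expression against the document and returns the result as the requested XPath type (number, string or boolean).

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CoreFunctions {
    // Hypothetical order document used to exercise the function library.
    static final String XML =
        "<order><line qty=\"2\" price=\"10.50\"/><line qty=\"1\" price=\"4.25\"/>"
        + "<note>  Deliver   after 5pm  </note></order>";

    private static Object eval(String expression, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String expression) {
        return (Double) eval(expression, XPathConstants.NUMBER);
    }

    static String string(String expression) {
        return (String) eval(expression, XPathConstants.STRING);
    }

    static boolean bool(String expression) {
        return (Boolean) eval(expression, XPathConstants.BOOLEAN);
    }
}
```

For instance, number("sum(/order/line/@price)") returns 14.75, and string("normalize-space(/order/note)") returns "Deliver after 5pm". number("round(-2.5)") returns -2, because XPath rounds halves towards positive infinity.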
5. XPath Tools
XPath is, for the most part, a standards specification, and it is incorporated into XPointer, XSLT and XML Schema. As such, there are few tools directly related to it. Many professional XML editing tools (such as oXygen XML) can apply an arbitrary XPath expression to any XML document and indicate the results. This is an excellent aid for development and testing.
1. Introduction
XSLT is an XML transformation language. It allows one to transform an XML document into another format (often also XML) in a standardised way, by writing a style sheet (a set of transformation instructions) which an XSLT processor uses to create a new, transformed document. Together with XSL-FO (Formatting Objects), XSLT forms part of the XSL standard. Whereas XSL-FO is concerned with visual formatting, XSLT is a more generic language, concerned with syntactic formatting (the structure of data), which may of course ultimately produce a visual document. Using XSLT, one could produce any of the following from an XML document:

Another XML document, either simply restructured, or conforming to a completely different vocabulary (Schema/DTD).
Browsable HTML content: legacy HTML (4.x) is considered a dead technology (XHTML is preferred, and thus falls into the category above), but XSLT can easily output legacy HTML, automatically taking the language's quirks and considerations into account.
Arbitrary text content, such as LaTeX, plain text, Java source code, Rich Text Format (RTF), etc.
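As a minimal sketch of what "an XSLT processor uses the style sheet" looks like in practice, the Java code below uses the JAXP transformation API (javax.xml.transform) that ships with the JDK. The XsltRunner class and the books example are hypothetical. The stylesheet deliberately produces plain text rather than XML, to illustrate the last category above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltRunner {
    /** Applies an XSLT 1.0 stylesheet to a source document; both are passed as strings. */
    static String transform(String stylesheet, String source) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(source)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    // Hypothetical stylesheet: one line of plain text per book element.
    static final String TO_TEXT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"book\">"
        + "<xsl:text>- </xsl:text><xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";
}
```

Transforming <books><book>XML</book><book>XSLT</book></books> with TO_TEXT yields two lines, "- XML" and "- XSLT". The books element itself has no template and is handled by XSLT's built-in rules.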
3. Why XSLT?
With other style languages such as the historical DSSSL and the more widespread CSS (Cascading Style Sheets) in use, the question arises as to what exactly the benefits of XSLT are. Does it replace those languages? When used to generate visually-oriented output, XSLT is significantly more powerful than both DSSSL and CSS, as it can fundamentally restructure or change document content. CSS, though powerful (and pleasantly simple), can only perform visual layout of data (i.e. as it is rendered in a browser window). It is important to realise, however, that the name Extensible Stylesheet Language Transformations is misleading: XSLT is not specifically a visual language, but merely a powerful tool for transforming your documents into any other vocabulary, which might be (and traditionally always was) something like XSL-FO (or XHTML).

XSLT style sheets are written in XML, which means that standard XML processing and authoring tools may be used. Furthermore, no additional language syntax needs to be learnt. A public standard for transformation means that a wide variety of compatible tools are developed throughout the community. For example, almost all modern web browsers contain XSLT processors, all adhering to the same standard, yet each implemented independently. Lastly, by making use of XSLT, you ensure that your organisation does not lock itself into a particular vendor's products: if you do not like a particular editor or processor, simply use another one. The alternative (restructuring XML data using program code) couples you directly to a particular platform (not to mention the increased complexity).
4. XSLT Templates
The concept of the template is fundamental to XSLT. A template specifies the transformation that is to be applied to a specific part of the source document. Every stylesheet must contain at least one template if it is to be useful, and often they may contain a large number of them.
<title>Some Title</title>
<para>A source paragraph.</para>
<note>A note.</note>
<para>Another source paragraph.</para>

The value of the match attribute is a pattern, a restricted form of XPath expression. Thus, the example above indicates that the template should be applied to every element named para, wherever it occurs in the source document (not only to children of the current element).
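The sketch below runs templates like these through the JDK's JAXP XSLT processor. It uses the conventional xsl: prefix and a hypothetical doc wrapper element, and adds a para nested inside a section, to show that match="para" fires for nested paragraphs too. The note element has no template of its own, so the built-in rules copy its text unchanged. TemplateMatching is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplateMatching {
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/doc\"><body><xsl:apply-templates/></body></xsl:template>"
        + "<xsl:template match=\"title\"><h1><xsl:apply-templates/></h1></xsl:template>"
        // Matches every para element, at whatever depth it occurs.
        + "<xsl:template match=\"para\"><p><xsl:apply-templates/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source: the fragment above wrapped in <doc>, plus a nested para.
    static final String SOURCE =
        "<doc><title>Some Title</title><para>A source paragraph.</para>"
        + "<note>A note.</note><para>Another source paragraph.</para>"
        + "<section><para>A nested paragraph.</para></section></doc>";

    static String transform() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The output wraps each paragraph, including the nested one, in a p element. The note's text appears without any wrapper.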
This distinction is necessary because an element that may contain text is often also able to contain other elements (so-called 'mixed content'). When an embedded element is placed in the middle of a text string, for example:

<para>This is some <emphasis>nice</emphasis> paragraph text.</para>

the para element now contains three child nodes: a text node, an emphasis element node, and another text node.
Note
This instruction is added even to elements that do not contain other (child) elements, as it is needed to process the text content of those elements.
Note
The text() keyword is an XPath node test that matches text nodes. The value-of element is explained later on.
by anybody. </classified>

The following template will hide all classified text:

<template match="classified">
  <!-- Do nothing (hide text) -->
</template>
Note
The select attribute only specifies which nodes are eligible for template processing: a suitable template still has to match (through its match pattern) in order to be applied (if none of your templates matches, XSLT's built-in template rules are used). The apply-templates element can occur as many times as you like, enabling you to control exactly which elements are processed, as well as the order in which they are output:

<template match="employee">
  <apply-templates select="contact-details"/>
  <apply-templates select="financial-details"/>
  <apply-templates select="birthday"/>
</template>

This is the first basic mechanism that enables the re-ordering of XML content.
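The re-ordering effect can be checked with a short sketch using the JDK's JAXP XSLT processor. The employee source below is hypothetical. It deliberately lists its children in a different order from the apply-templates instructions, and includes a notes element that is never selected. Each matched element simply outputs its own name, so the output order is easy to see. ApplyTemplatesOrder is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ApplyTemplatesOrder {
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"employee\">"
        // The order of these instructions, not the source order, decides the output order.
        + "<xsl:apply-templates select=\"contact-details\"/>"
        + "<xsl:apply-templates select=\"financial-details\"/>"
        + "<xsl:apply-templates select=\"birthday\"/>"
        + "</xsl:template>"
        + "<xsl:template match=\"contact-details | financial-details | birthday\">"
        + "<xsl:value-of select=\"name()\"/><xsl:text>; </xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source in a different order; notes is never selected.
    static final String SOURCE =
        "<employee><birthday>1970-01-01</birthday><notes>Private</notes>"
        + "<financial-details>Salary band C</financial-details>"
        + "<contact-details>555-5555</contact-details></employee>";

    static String transform() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The result is "contact-details; financial-details; birthday; ". The notes element contributes nothing, because no apply-templates instruction selects it.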
The apply-templates element is the barrier between the part of the template that is processed before the invocation of child templates (and their output), and the part that is processed afterwards.

<template match="beer">
  <!-- BEFORE CONTENT OF BEER -->
  <apply-templates/>
  <!-- AFTER CONTENT OF BEER -->
</template>

Placing text at these positions effectively specifies the prefix and suffix to appear around the content, e.g.

<template match="note">
  NOTE: <apply-templates/>
</template>

'Template text' such as this may be enclosed in the text element. Usually this element has no effect on the text itself, but it controls exactly what white space is output (only the white space inside the text element is copied), and it offers further control through additional attributes.

<template match="note">
  <text>NOTE: </text><apply-templates/>
</template>
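The white-space difference between literal template text and the text element can be observed with the JDK's JAXP XSLT processor. In the sketch below, two hypothetical stylesheets (with the usual xsl: prefix) differ only in whether "NOTE: " is wrapped in xsl:text. The names NotePrefix, LITERAL and WITH_TEXT are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NotePrefix {
    private static final String HEAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
        + "  <xsl:output method=\"text\"/>\n";

    // Literal text: the newline and indentation before "NOTE:" belong to the same
    // text node as "NOTE: ", so they are copied to the output as well.
    static final String LITERAL = HEAD
        + "  <xsl:template match=\"note\">\n    NOTE: <xsl:apply-templates/>\n  </xsl:template>\n"
        + "</xsl:stylesheet>";

    // xsl:text: the surrounding text nodes are whitespace-only and are stripped
    // from the stylesheet, so only "NOTE: " is output.
    static final String WITH_TEXT = HEAD
        + "  <xsl:template match=\"note\">\n    <xsl:text>NOTE: </xsl:text><xsl:apply-templates/>\n  </xsl:template>\n"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc><note>Mind the gap.</note></doc>";

    static String transform(String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

transform(WITH_TEXT) yields exactly "NOTE: Mind the gap.". transform(LITERAL) yields the same text preceded by a newline and four spaces taken from the stylesheet's indentation.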
<template match="book/title">
  <element name="h1"><apply-templates/></element>
</template>

This approach avoids the namespace issues, but is more complex (and less readable) in almost all circumstances. The element instruction is useful when developing DTD-based documents, but Schema-based documents, combined with namespaces, are usually the superior approach.
Note
At all times, remember that an XSL stylesheet must be a well-formed XML document, including any elements that are inserted. Elements may not overlap, and every opening tag requires a closing tag (unless the element uses the empty-element form, such as <br/>).
sequencing capabilities can be used to great effect to fundamentally alter document structures. In the following example, an employee record is transformed into a different vocabulary (with the same fundamental information):

<?xml version="1.0" encoding="UTF-8"?>
<employee>
  <name>Jane</name>
  <surname>Soranson</surname>
  <phone-no>+27 (11) 555-5555</phone-no>
  <empno>236-8134</empno>
</employee>

Transformed with the following stylesheet:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="employee">
    <emp>
      <xsl:attribute name="number">
        <xsl:value-of select="empno"/>
      </xsl:attribute>
      <phone>
        <xsl:value-of select="phone-no"/>
      </phone>
      <fullname>
        <xsl:value-of select="name"/>
        <xsl:text xml:space="preserve"> </xsl:text>
        <xsl:value-of select="surname"/>
      </fullname>
    </emp>
  </xsl:template>
</xsl:stylesheet>

The output:

<?xml version="1.0"?>
<emp number="236-8134">
  <phone>+27 (11) 555-5555</phone>
  <fullname>Jane Soranson</fullname>
</emp>
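To reproduce this transformation, the stylesheet and source above can be fed to the XSLT processor bundled with the JDK (javax.xml.transform), as sketched below. Only the Java wrapper and the EmployeeTransform name are assumptions. The stylesheet content is the one shown, written without its XML declaration. The processor serialises without indentation, so the checks look for fragments rather than the exact layout shown above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EmployeeTransform {
    // The stylesheet from the example, unchanged apart from layout.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:template match=\"employee\">"
        + "<emp>"
        + "<xsl:attribute name=\"number\"><xsl:value-of select=\"empno\"/></xsl:attribute>"
        + "<phone><xsl:value-of select=\"phone-no\"/></phone>"
        + "<fullname><xsl:value-of select=\"name\"/>"
        + "<xsl:text xml:space=\"preserve\"> </xsl:text>"
        + "<xsl:value-of select=\"surname\"/></fullname>"
        + "</emp>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String SOURCE =
        "<employee>\n  <name>Jane</name>\n  <surname>Soranson</surname>\n"
        + "  <phone-no>+27 (11) 555-5555</phone-no>\n  <empno>236-8134</empno>\n</employee>";

    static String transform() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The whitespace between the source elements does not leak into the result. The employee template never applies templates to those text nodes; it only reads values with value-of.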
Strengths: Only a single file needs to be distributed, which is useful in environments where distributing more than one file is not feasible.
Weaknesses: Because embedded XSLT code cannot be reused, the embedded stylesheet may have to be replicated across a large number of documents. This makes maintenance very difficult.
1. Introduction to CSS
CSS is a simple language defined by the W3C to apply visual formatting properties to the elements of an XML or HTML document. Whereas earlier versions of HTML embedded formatting tags (such as font, b (bold) and i (italic)) together with meaningful tags such as lists, tables, quotes and paragraphs, CSS takes an external approach to styling documents. This technique has several benefits:

Documents remain simple, pure containers of information. They are both smaller and simpler to parse.
A specific set of information is not directly tied to its visual presentation. A different style sheet can simply be attached, and information can be rendered differently for different output media, e.g. for screen versus printer.
The CSS technique of writing selectors, which are matched against the elements of the document to decide where each style rule applies, is far more flexible and powerful than any one-to-one direct application of style in a document.
Large style sheets can be assembled modularly from smaller style sheets with localised responsibility. This allows the author to take a component-based approach to style creation.
Note
CSS is a relatively straightforward and compact style language, designed only to specify the display characteristics of a document. The fundamental document structure cannot be altered with CSS; for such purposes one has to look at XSL Transformations (XSLT). CSS is, however, very popular, as it is ideally suited to the web, where it is widely deployed to style (X)HTML documents. Modern browser support is quite good (with the exception of the Microsoft Internet Explorer browsers, which support a notably smaller subset of CSS than most other browsers).
In addition to attaching external style sheets, style information can be embedded directly into the document if required. This is done within the style element in the document head:

<style type="text/css">
  h1 {
    color: green;
    background-color: #99FF99;
    font-size: 24pt;
  }
</style>

One would generally only embed style information in the document if it is not feasible (or practical) to reference an external file, for example in an automatically generated, e-mailed report. In most cases, however, it is beneficial to separate the style information from the XHTML document.
Note
It is advisable to attach style sheets using this legacy mechanism only if the target browser does not support the generic (XML-style) linking mechanism.
The CSS format:

selector {
  style-property: value;
  style-property: value;
  ...
}

anotherSelector {
  style-property: value;
  style-property: value;
  ...
}
3.1. Comments
Comments can be inserted anywhere in CSS code, and they follow the standard Java / C block comment syntax:

/* This is a comment, ignored by the parser */
Table 8.1. Common selection patterns for targeting style rules at elements
Selection Pattern   The Selection
*                   Matches on all elements.
e                   Matches on all elements of type e.
e1, e2              Matches any element of type e1 or e2.
e1 + e2             Matches on any element e2 which immediately follows a sibling element e1.
e1 e2               Matches on any element e2 which is a descendant of an element e1.
e1 > e2             Matches on any element e2 which is a direct child element of e1.
e[a]                Matches on any element e which has an attribute with name a.
e[a="val"]          Matches on any element e which has an attribute with name a which has value val.
e[a~="val"]         Matches on any element e which has an attribute with name a whose value is a space-separated list, one of whose values is equal to val.
.className          Matches on any element whose class attribute contains the value className.
e.className         Matches on an element of type e whose class attribute contains the value className.
#idValue            Matches on any element with an id attribute which has value idValue.
e#idValue           Matches on element e with an id attribute which has value idValue.
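The difference between [a="val"] and [a~="val"] is easy to blur, so here is a small Java sketch of the two matching rules as plain string tests. This is an illustrative model of the rules, not a CSS engine. The class and method names are hypothetical, and Java's notion of white space is used as an approximation of CSS white space. .className is treated as shorthand for [class~="className"], as the Selectors specification defines it for HTML.

```java
import java.util.Arrays;

final class AttributeSelectors {
    /** e[a="val"]: the whole attribute value must equal val. */
    static boolean exactMatch(String attributeValue, String val) {
        return attributeValue != null && attributeValue.equals(val);
    }

    /**
     * e[a~="val"]: the value is a whitespace-separated list of words, one of which
     * must be exactly val. A val that is empty or contains white space never matches.
     */
    static boolean wordMatch(String attributeValue, String val) {
        if (attributeValue == null || val.isEmpty()
                || val.chars().anyMatch(Character::isWhitespace)) {
            return false;
        }
        return Arrays.stream(attributeValue.trim().split("\\s+")).anyMatch(val::equals);
    }

    /** .className behaves like [class~="className"]. */
    static boolean classMatch(String classAttribute, String className) {
        return wordMatch(classAttribute, className);
    }
}
```

For an element with class="note warning", classMatch(..., "warning") is true, whereas exactMatch(..., "warning") is false because the whole value is "note warning".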
In addition to the standard selectors, there is a range of pseudo-classes and pseudo-elements which allow for further dynamic styling possibilities, based on information not explicitly defined in the document at authoring time. Many of these selectors are not yet widely supported:
Selector          The Selection
:first-child      Matches on an element which is the first child element of its parent element. For example, div > p:first-child matches on all paragraphs which are the first child element within a div element.
:link             Matches on hyperlinks (e.g. a elements) that have not yet been visited by the user.
:visited          Matches on hyperlinks that have already been visited by the user.
:hover            Matches on elements while the user designates them with a pointing device, without activating them; for example, while the mouse hovers over a hyperlink.
:active           Applies while an element is being activated by the user; for example, between the times the user presses the mouse button and releases it.
:focus            Applies while an element has the focus (accepts keyboard events or other forms of text input).
:lang             Matches an element based on its language. For example, to match on all French paragraphs (as determined by the xml:lang attribute), one could write the selector p:lang(fr).
:first-line       Applies special styles to the first formatted line of a block-level element, e.g. a paragraph. For instance, div.intro:first-line matches on the first line of any div with class intro.
:first-letter     Applies special styles to the first letter of a block-level element, e.g. a paragraph.
:before, :after   Can be used to insert generated content before or after an element's content.
The W3C publishes the CSS specification; version 2.1 can be found at: http://www.w3.org/TR/CSS21/

The following sections explain the commonly used style properties:
4.1.1. 'font-family'
This is the font itself, such as 'Times New Roman', 'Arial' or 'Verdana'. The font you specify must be on the user's computer, so there is little point in using obscure fonts. There are a select few 'safe' fonts (the most commonly used are arial, verdana and times new roman), but you can specify more than one font, separated by commas. The purpose of this is that if the user does not have the first font you specified, the renderer (e.g. browser) will go through the list until it finds one it does have. This is useful because different computers sometimes have different fonts installed. So font-family: arial, helvetica, for example, is used so that similar fonts are used on PC (which usually has arial, but not helvetica) and Apple Mac (which does not usually have arial, and so helvetica, which it does normally have, will be used).
Note
If the name of a font is more than one word, it should be put in quotation marks, such as font-family: "Times New Roman".
4.1.2. 'font-size'
The size of the font. Specified in any of the standard CSS units of measure.
4.1.3. 'font-weight'
This states whether the text is bold or not. In practice this usually only works as font-weight: bold or font-weight: normal. In theory it can also be bolder, lighter, 100, 200, 300, 400, 500, 600, 700, 800 or 900, but many renderers do not yet support this precise level of control.
4.1.4. 'font-style'
This states whether the text is italic or not. It can be font-style: italic, font-style: normal or font-style: oblique.
4.1.5. 'text-decoration'
This states whether the text is decorated with a line. The possible values include:

text-decoration: overline, which places a line above the text.
text-decoration: line-through (strike-through), which puts a line through the text.
text-decoration: underline, which should generally only be used for links, because users expect underlined text to be a link.

This property is most often used on links themselves, for example to remove their underline with text-decoration: none.
4.1.6. 'text-transform'
This changes the case of the text:

text-transform: capitalize turns the first letter of every word into uppercase.
text-transform: uppercase turns everything into uppercase.
text-transform: lowercase turns everything into lowercase.
4.1.7. Examples
Figure 8.2. Example Text CSS Properties
body { } h1 { } h2 { } a { } strong { }
font-size: 2em;
font-size: 1.5em;
text-decoration: none;
4.2.1. Colour
Colour specifications can take the form of a name, an 'rgb' (red/green/blue) value or a 'hex' code. The following are different ways to specify the colour red:

color: red;
color: rgb(255,0,0);
color: rgb(100%,0%,0%);
color: #ff0000;
color: #f00;
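To show that these five notations really denote the same colour, the sketch below converts each notation into a single 0xRRGGBB integer. It is a hypothetical Java helper, not part of any browser or library API. It only knows the 16 colour keywords listed in this section and the rgb() and hex forms shown above. It expands the three-digit #rgb shorthand by doubling each digit, and scales percentages to the 0 to 255 range.

```java
import java.util.Locale;
import java.util.Map;

final class CssColours {
    // The 16 predefined colour keywords, as 0xRRGGBB values.
    private static final Map<String, Integer> NAMED = Map.ofEntries(
        Map.entry("aqua", 0x00FFFF), Map.entry("black", 0x000000),
        Map.entry("blue", 0x0000FF), Map.entry("fuchsia", 0xFF00FF),
        Map.entry("gray", 0x808080), Map.entry("green", 0x008000),
        Map.entry("lime", 0x00FF00), Map.entry("maroon", 0x800000),
        Map.entry("navy", 0x000080), Map.entry("olive", 0x808000),
        Map.entry("purple", 0x800080), Map.entry("red", 0xFF0000),
        Map.entry("silver", 0xC0C0C0), Map.entry("teal", 0x008080),
        Map.entry("white", 0xFFFFFF), Map.entry("yellow", 0xFFFF00));

    /** Converts a colour keyword, #rgb, #rrggbb or rgb(...) value to 0xRRGGBB. */
    static int toRgb(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.startsWith("#")) {
            String hex = v.substring(1);
            if (hex.length() == 3) {
                // #rgb is shorthand for #rrggbb: each digit is doubled (#f00 -> #ff0000).
                StringBuilder doubled = new StringBuilder();
                for (char ch : hex.toCharArray()) {
                    doubled.append(ch).append(ch);
                }
                hex = doubled.toString();
            }
            if (hex.length() != 6) {
                throw new IllegalArgumentException("Bad hex colour: " + value);
            }
            return Integer.parseInt(hex, 16);
        }
        if (v.startsWith("rgb(") && v.endsWith(")")) {
            String[] parts = v.substring(4, v.length() - 1).split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad rgb() colour: " + value);
            }
            int rgb = 0;
            for (String part : parts) {
                String p = part.trim();
                int channel = p.endsWith("%")
                    ? (int) Math.round(Double.parseDouble(p.substring(0, p.length() - 1)) * 255 / 100)
                    : Integer.parseInt(p);
                channel = Math.max(0, Math.min(255, channel)); // clip out-of-range values
                rgb = (rgb << 8) | channel;
            }
            return rgb;
        }
        Integer named = NAMED.get(v);
        if (named == null) {
            throw new IllegalArgumentException("Unknown colour: " + value);
        }
        return named;
    }
}
```

All five notations for red in the list above convert to 0xFF0000. The shorthand #abc expands to 0xAABBCC.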
CSS 2 defines 16 predefined colour names: aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, purple, red, silver, teal, white, and yellow (CSS 2.1 adds a seventeenth, orange). transparent is also a valid value for background and border colours.
You will see that this leaves a one-character-wide space around the secondary header, and that the header itself is enlarged by its three-character-wide padding. The four sides of an element can also be set individually: margin-top, margin-right, margin-bottom, margin-left, padding-top, padding-right, padding-bottom and padding-left are the self-explanatory properties you can use for this.
Note
A useful mechanism to centre-align a block within another block is to give it an explicit width and set its left and right margins to auto.
Note
You don't have to use all three surrounding 'boxes', but it is helpful to remember that the box model can be applied to any element in the document.
border-style: dashed;
border-width: 3px;
border-left-width: 10px;
border-right-width: 10px;
border-color: red;
4.4.1. 'display'
The most fundamental types of display are inline, block and none. inline does just what it says: elements that are displayed inline follow the flow of a line. In XHTML, strong, anchor (a) and emphasis (em) elements are traditionally displayed inline. block puts a line break before and after the element. In XHTML, header (h1 to h6) and paragraph (p) elements are examples of elements that are traditionally displayed as blocks. none means that an element is not displayed at all, and that the display space it would have occupied is free for use by other elements.
tabindex="1" accesskey="e" title="Enter keyword(s)" />
    </div>
    <div id="buttons">
      <input name="btnG" id="btnG" type="submit" value="Google Search"
             tabindex="2" accesskey="s" title="Start your search" />
      <input name="btnI" id="btnI" type="submit" value="I'm Feeling Lucky"
             tabindex="3" accesskey="l" title="Automatically opens the best match" />
    </div>
  </form>
  <div class="options">
    <a href="http://www.google.com/advanced_search"> Advanced Search</a>
    <a href="http://www.google.com/preferences" accesskey="p"> Preferences</a>
    <a href="http://www.google.com/language_tools" accesskey="l"> Language Tools</a>
  </div>
  <div class="services">
    <a href="http://www.google.com/ads/"> Advertising Programs</a>
    <a href="http://www.google.com/services/"> Business Solutions</a>
    <a href="http://www.google.com/about.html"> About Google</a>
  </div>
  <div class="copyright">
    © 2004 Google - Search 4 trillion web pages
  </div>
</body>
</html>

And in your browser, it may look like:
By changing one line of code in the XHTML (adding the reference to the stylesheet), we can give the page an entirely different look and feel:
Note:
This style deliberately mimics the actual style of Google. It is also possible to entirely restructure the page layout. The source code of the style sheet is presented below:

/* Cascading Style Sheet for the simulated Google front page
   Author: Solms TCD
   Date: May 2004 */

body {
  text-align: center;
  background-color: white;
  font-size: 14px;
  font-family: sans-serif;
  margin: auto;
  width: 550px;
}

/* Google Heading */
h1 {
  font-family: fantasy;
  font-weight: normal;
  font-size: 4em;
  margin: 0;
  color: green;
}

h1:first-letter {
  font-family: cursive;
  font-style: italic;
  font-size: 1.5em;
}

/* Search Form Elements */
form#searchForm {
  margin: auto;
  text-align: center;
  background-color: #BBFFBB;
  border: 1px dotted green;
  width: 75%;
  padding: 5px;
  float: left;
  margin-bottom: 5px;
}

input[type="text"] {
  padding: 3px;
  margin: 3px;
  border: solid green 2px;
  width: 95%;
}

input[type="text"]:focus, input[type="text"]:hover {
  border-color: red;
}

input[type="submit"] {
  background-color: green;
  border: 1px solid white;
  color: white;
  padding: 2px;
  margin: 3px;
  font-size: 0.75em;
}

div#buttons {
  text-align: right;
  padding-right: 5px;
}

/* Links and Menus */
div.sections {
  border-color: green;
  border-style: solid;
  border-width: 0 0 1px 0;
  padding: 5px;
  margin-bottom: 5px;
}

div.options {
  float: right;
  text-align: left;
  width: 20%;
  font-size: 0.8em;
}

div.options a {
  text-decoration: none;
  display: block;
  margin: 0.5em 0 0.5em 0;
}

div.services {
  clear: both;
  border-color: green;
  border-style: solid;
  border-width: 1px 0 0 0;
  padding: 5px;
}

a {
  color: #FF6600;
}

a:hover {
  color: red;
  text-decoration: none;
}

div.sections a {
  font-weight: bold;
  margin: 0 1em 0 1em;
}

div.sections em[title="New!"] a:after {
  content: "*";
}

div.copyright {
  margin: 10px;
  font-size: 0.8em;
  color: gray;
}
1. Introduction to XSL-FO
XSL Formatting Objects (XSL-FO) is itself an XML-based markup language that lets you specify in great detail the pagination, layout, and styling information that will be applied to your content. The XSL-FO markup is quite complex. It is also verbose; virtually the only practical way to produce an XSL-FO file is to generate it from a source document using XSLT. Finally, once you have this XSL-FO file, you need some way to render it to an output medium. The XSL-FO vocabulary is a large one, and it was designed to describe exactly how the information is presented; in the process, however, the meaning of the data (i.e. what it represents) is lost. Consider the following XML data:

<?xml version="1.0" encoding="UTF-8"?>
<temperatureData>
  <measurementLocation>
    <country>South Africa</country>
    <stateOrProvince>Gauteng</stateOrProvince>
    <city>Nelspruit</city>
  </measurementLocation>
  <measurements>
    <measurement date="01-Jan-2003 12:09">
      <c>34</c>
    </measurement>
    <measurement date="02-Jan-2003 11:51">
      <c>32</c>
    </measurement>
    <measurement date="03-Jan-2003 12:03">
      <c>36</c>
    </measurement>
    <measurement date="04-Jan-2003 12:14">
      <c>31</c>
    </measurement>
    <measurement date="05-Jan-2003 11:27">
      <c>29</c>
    </measurement>
  </measurements>
</temperatureData>

A specific XML vocabulary (for example, the temperature data above) only conveys the information in its "pure" state, with no regard to presentation or distribution. Though easy for humans to understand (due to the verbosity of XML), it is primarily processed and understood by machines. XSL-FO, on the other hand, is an XML vocabulary which could, for instance, specify precisely how the above data should be represented on a glossy print-out intended for human consumption. It is concerned entirely with the presentation of data, and would no longer contain any tags that could describe to a machine that this is indeed temperature data.
Suppose, for example, that we want to render the temperature data to a PDF file which represents the data in a simple table on a 12 cm by 12 cm CD cover:
The XSL-FO data involved is quite verbose, and one can see clearly that the data is no longer described by the vocabulary; only its appearance is:

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master margin-bottom="0.5cm" margin-top="0.5cm"
        margin-right="0.5cm" margin-left="1cm"
        page-width="12cm" page-height="12cm" master-name="myPage">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence initial-page-number="1" master-reference="myPage">
    <fo:flow flow-name="xsl-region-body">
      <fo:block text-align="left" font-size="14pt" font-family="Helvetica"
          space-after="2.5em" border-style="solid"
          border-width="0pt 0pt 1pt 0pt" border-color="grey">
        Temperature readings for:
        <fo:inline font-weight="bold">Nelspruit (Gauteng)</fo:inline>
      </fo:block>
      <fo:table>
        <fo:table-column column-width="3cm"/>
        <fo:table-column column-width="6cm"/>
        <fo:table-body>
          <fo:table-row>
            <fo:table-cell text-align="center" font-weight="bold"
                border-style="solid" border-width="1pt">
              <fo:block>Temp (C)</fo:block>
            </fo:table-cell>
            <fo:table-cell text-align="center" font-weight="bold"
                border-style="solid" border-width="1pt">
              <fo:block>Date</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell border-style="solid" border-width="1pt" font-weight="bold">
              <fo:block>34</fo:block>
            </fo:table-cell>
            <fo:table-cell border-style="solid" border-width="1pt">
              <fo:block>01-Jan-2003 12:09</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell border-style="solid" border-width="1pt" font-weight="bold">
              <fo:block>32</fo:block>
            </fo:table-cell>
            <fo:table-cell border-style="solid" border-width="1pt">
              <fo:block>02-Jan-2003 11:51</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell border-style="solid" border-width="1pt" font-weight="bold" color="red">
              <fo:block>36</fo:block>
            </fo:table-cell>
            <fo:table-cell border-style="solid" border-width="1pt">
              <fo:block>03-Jan-2003 12:03</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell border-style="solid" border-width="1pt" font-weight="bold">
              <fo:block>31</fo:block>
            </fo:table-cell>
            <fo:table-cell border-style="solid" border-width="1pt">
              <fo:block>04-Jan-2003 12:14</fo:block>
            </fo:table-cell>
          </fo:table-row>
          <fo:table-row>
            <fo:table-cell border-style="solid" border-width="1pt" font-weight="bold">
              <fo:block>29</fo:block>
            </fo:table-cell>
            <fo:table-cell border-style="solid" border-width="1pt">
              <fo:block>05-Jan-2003 11:27</fo:block>
            </fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

It is apparent why one would almost never want to hand-code XSL-FO data; it is best transformed from some other XML vocabulary using XSLT. The example above is quite simplistic. The real power of XSL-FO becomes apparent when one considers its advanced manipulation of page sequences and page masters, its international capabilities (e.g. Japanese text), its handling of page numbers, and the like. These features justify the complexity; if they are not required, it is often better to use a simpler styling technology such as CSS.
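As an illustration of generating XSL-FO rather than hand-coding it, the sketch below uses the JDK's JAXP XSLT processor to build just the data rows of the table above from the temperature readings. It assumes the root element is spelled temperatureData, and it leaves out the page masters and header row. The stylesheet, the TemperatureToFo class and the "highest reading in red" rule are hypothetical reconstructions of how the example output could be produced. The highest reading is found with not(../measurement/c > c): no sibling reading is greater.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemperatureToFo {
    // Hypothetical stylesheet: emits only the fo:table-body with one row per reading.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/temperatureData\">"
        + "<fo:table-body><xsl:apply-templates select=\"measurements/measurement\"/></fo:table-body>"
        + "</xsl:template>"
        + "<xsl:template match=\"measurement\">"
        + "<fo:table-row>"
        + "<fo:table-cell border-style=\"solid\" border-width=\"1pt\" font-weight=\"bold\">"
        // Highlight the highest reading: no other reading in the set is greater than it.
        + "<xsl:if test=\"not(../measurement/c &gt; c)\">"
        + "<xsl:attribute name=\"color\">red</xsl:attribute>"
        + "</xsl:if>"
        + "<fo:block><xsl:value-of select=\"c\"/></fo:block>"
        + "</fo:table-cell>"
        + "<fo:table-cell border-style=\"solid\" border-width=\"1pt\">"
        + "<fo:block><xsl:value-of select=\"@date\"/></fo:block>"
        + "</fo:table-cell>"
        + "</fo:table-row>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // The five readings from the temperature example (location details omitted).
    static final String SAMPLE =
        "<temperatureData><measurements>"
        + "<measurement date=\"01-Jan-2003 12:09\"><c>34</c></measurement>"
        + "<measurement date=\"02-Jan-2003 11:51\"><c>32</c></measurement>"
        + "<measurement date=\"03-Jan-2003 12:03\"><c>36</c></measurement>"
        + "<measurement date=\"04-Jan-2003 12:14\"><c>31</c></measurement>"
        + "<measurement date=\"05-Jan-2003 11:27\"><c>29</c></measurement>"
        + "</measurements></temperatureData>";

    static String toFo(String source) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(source)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The result contains five fo:table-row elements, and only the cell holding 36 carries color="red". Adding a reading to the source needs no change to the stylesheet, which is the practical argument for generating XSL-FO.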
Technically, Web Services comprise the following:

XML Messaging (SOAP). The use of XML to represent a universally understood message. Though other approaches exist (such as the older XML-RPC, and REST), SOAP (adhering to the WS-I Basic Profile) is the preferred standard, enabling strongly typed, object-oriented message exchange (based on W3C XML Schema).
Service Transport. XML messages can be transported over a variety of protocols, such as HTTP, SMTP (e-mail), FTP, etc. HTTP is by far the most common.
Service Description (WSDL). A formal description of the service, which provides complete information (including messages, exchange patterns, data types (links to XML Schemas), and endpoint addresses) for a machine to invoke the service. To this end, WSDL descriptors can be used to generate all the data and user interface components needed to invoke a SOAP service. It is, in effect, the web services contract.
Service Discovery (UDDI). A platform-independent, XML-based registry for businesses worldwide to list themselves on the Internet. UDDI is an open industry initiative (sponsored by OASIS) enabling businesses to discover each other and define how they interact over the Internet. UDDI is, at this stage, the least successful of the web services standards, as it assumes a business model and level of interoperability which has not yet materialised. In addition, the ebXML standard (also managed by OASIS) encompasses a lot of functionality which overlaps with UDDI, and even with WSDL.

Because Web Services is an application of XML, it is completely extensible. To this end, several industry standards are emerging (typically as extensions or components of the SOAP messaging standard):

WS-Security. Enables authentication of actors and confidentiality of the messages sent. This is accomplished by applying existing standards (such as X.509 digital certificates) to SOAP messages.
WS-Reliability. A specification that fulfils reliable messaging requirements, critical to some applications of Web Services.
WS-Management. Provides a universal language that all types of devices can use to share data about themselves, so that they can be managed more easily. This would enable better management and auditing of devices on, say, a corporate network.
Note
These standards go above and beyond the standardisation which is possible with XML Schema, which already provides the mechanism needed to develop standardised data vocabularies. These industry standards apply specifically to the exchange of messages, which typically has behavioural implications for the nodes that process the messages. One of the most important aspects of working with web services is the awareness that it, in itself, defines no new technological elements. It is simply a standardised usage of existing networking and XML technology, in order to achieve that elusive goal: interoperability. Furthermore, it in turn becomes a lower-level component of architectures such as the ESB (Enterprise Service Bus).
2. SOAP
2.1. Introduction to SOAP
SOAP was originally an acronym for Simple Object Access Protocol, but the W3C has dropped this expansion (it is now just a name). SOAP is the de-facto standard for Web Services in general, and is specifically designed for Application-to-Application (A2A) communication, which makes it the ideal technology for Business-to-Business (B2B) and Enterprise Application Integration (EAI) tasks. To be truly effective, an integration protocol must be platform-independent, flexible, and based on standard, ubiquitous technology.
Platform independence, in this case, refers to the development model (e.g. object-oriented or procedural),
Unlike some earlier integration technologies (CORBA, EDI, Microsoft DCOM, Java RMI), SOAP meets all these requirements. It enjoys widespread use, and is endorsed by most enterprise software vendors and major standards organisations (W3C, WS-I, OASIS, etc.). SOAP consists of a (simple) XML markup language, and a set of rules which dictate its use.
For example, here is the most basic (empty) SOAP message, which specifies both a header and a body:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <!-- Header elements here -->
  </soap:Header>
  <soap:Body>
    <!-- Body element here -->
  </soap:Body>
</soap:Envelope>

Or, as a more concrete example, a course enrollment (with no headers):

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:tr="http://solms.co.za/example/training"
    xmlns:p="http://solms.co.za/example/people">
  <soap:Body>
    <tr:enrollment>
      <tr:event>
        <tr:course>
          <tr:name>Programming in Java</tr:name>
          <tr:id>672</tr:id>
        </tr:course>
        <tr:date>2005-01-31</tr:date>
      </tr:event>
      <tr:candidate>
        <p:name>Koos</p:name>
        <p:surName>Kombuis</p:surName>
        <p:email>koos@kombuis.co.za</p:email>
        <p:phone>555-5555</p:phone>
      </tr:candidate>
      <tr:billingAddress> ... </tr:billingAddress>
    </tr:enrollment>
  </soap:Body>
</soap:Envelope>

A SOAP message adheres to the SOAP 1.1 XML Schema (which can be obtained from http://schemas.xmlsoap.org/soap/envelope/), which requires that the SOAP elements and attributes be namespace-qualified (elements either through a prefix or a default namespace declaration, attributes through a prefix). A SOAP message (Envelope) must contain a single Body element (containing the actual message), which may be preceded by a single, optional Header element (containing optional header elements).
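The structural rule above (exactly one Body, optionally preceded by one Header, both in the SOAP 1.1 envelope namespace) is easy to check mechanically. The following is a minimal Python sketch using only the standard library's ElementTree; the function name check_envelope is hypothetical, and it checks only the Header/Body layout, not full validation against the envelope schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENVELOPE = f"{{{SOAP_ENV}}}Envelope"
HEADER = f"{{{SOAP_ENV}}}Header"
BODY = f"{{{SOAP_ENV}}}Body"


def check_envelope(xml_text):
    """Return a list of Header/Body layout problems (empty if the layout is valid)."""
    root = ET.fromstring(xml_text)
    if root.tag != ENVELOPE:
        return ["root element is not a SOAP 1.1 Envelope"]
    # ElementTree drops comments by default, so only element children remain.
    names = [child.tag for child in root]
    problems = []
    if names.count(BODY) != 1:
        problems.append("exactly one Body element is required")
    if names.count(HEADER) > 1:
        problems.append("at most one Header element is allowed")
    if HEADER in names and names.index(HEADER) != 0:
        problems.append("Header must be the first child of Envelope")
    expected_body_position = 1 if HEADER in names else 0
    if BODY in names and names.index(BODY) != expected_body_position:
        problems.append("Body must immediately follow the optional Header")
    return problems


EMPTY_MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header><!-- Header elements here --></soap:Header>
  <soap:Body><!-- Body element here --></soap:Body>
</soap:Envelope>"""

print(check_envelope(EMPTY_MESSAGE))  # []
```

Returning a list of problems, rather than raising on the first one, lets a receiving node report every layout error in a single fault.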
While both of these modes are technically permissible, RPC messaging is available mostly for historical reasons, and does not provide any of the flexibility or elegance implied by the Document mode. Furthermore, there are historically two ways of representing values in a SOAP message, depending on the encoding style used: a SOAP-specific type system (SOAP Encoding), which caused great interoperability issues; and literal usage of the XML Schema type system.
SOAP Encoding is explicitly prohibited by the WS-I Basic Profile, and should be avoided at all costs. The term Literal, on the other hand, means that the XML document fragment can be validated against its XML Schema.
The drawback of this method is a much less flexible service interface: every time a new variation of a request (method) is required, a whole new service interface needs to be published. Furthermore, conceptually, the full information about the invocation is not contained within the data itself, as the method name plays an important role. The SOAP Body is structured according to a fixed convention to represent these items (e.g. parameters). Contrast this with the Document messaging style, which is based on the concept of polymorphism on message arguments, a feature which is still lacking from most object-oriented programming languages. Conceptually, one still has a contract that specifies which messages a component is willing to accept (in the case of SOAP, this is contained in the WSDL), but these messages can be freely extended (to add additional information or state) by clients without the server's knowledge. If a receiver does not understand the extended message, it may continue to process the message at the level of abstraction that it does understand. If the component that receives the messages is developed in an environment that is aware of polymorphism on message arguments (that is, one that automatically routes each message to the module or method that processes it at the most specific (concrete) level of abstraction), there are significant benefits in terms of simplicity and flexibility. In effect, the biggest difference between RPC and Document style is that Document style makes no assumptions as to the structure of the request message: it is simply a self-contained request message, which describes its own structure by means of its associated XML Schema.
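Python has no polymorphism on XML messages as such, but the standard library's functools.singledispatch is a rough analogue of the dispatch described above: the handler registered for the most specific class of the argument is chosen, falling back to a base-class handler otherwise. In this sketch the Person and Employee classes loosely mirror the schema types of the namespace example later in this section, make_receiver is a hypothetical name, and the mapping from incoming XML to these classes (for example via xsi:type) is assumed to have happened already.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass
class Employee(Person):  # a specialisation, like the Employee schema type
    emp_no: int


def make_receiver(knows_employees):
    """Build a message handler; only some receivers know about Employee."""

    @singledispatch
    def receive(message):
        raise TypeError(f"unsupported message: {type(message).__name__}")

    @receive.register(Person)
    def _(message):
        # Handles any Person, including specialisations it has never heard of.
        return f"person: {message.first_name} {message.last_name}"

    if knows_employees:
        @receive.register(Employee)
        def _(message):
            # When a more specific handler exists, it wins.
            return f"employee #{message.emp_no}: {message.first_name} {message.last_name}"

    return receive


old_receiver = make_receiver(knows_employees=False)
new_receiver = make_receiver(knows_employees=True)
message = Employee("John", "Deere", 1337)
print(old_receiver(message))  # person: John Deere
print(new_receiver(message))  # employee #1337: John Deere
```

The older receiver keeps working when clients start sending the extended Employee message; it simply processes it at the Person level of abstraction.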
Note
A SOAP message itself does not explicitly contain (or advertise) its encoding or messaging style. Though this can, to some extent, be inferred from a message, it is the web services contract (WSDL) that explicitly contains this information.
Furthermore, the Schema Instance namespace, used to add attributes that indicate the type of an element (if the type extends an expected base type), is http://www.w3.org/2001/XMLSchema-instance
2.3.1.2.1. Namespace Examples
Consider the following scenario: a Web Service accepts a message to add a Person to a repository of some kind (say, a Human Resources system). Our schema defines an Employee type (a special type of person, with added state). The schema containing the core types does not know about the web service messages, and a separate schema is created to define an AddPersonRequest message.
This is the schema defining our primary types. It is in no way linked to SOAP, and may be re-used throughout the organisation (or the world):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:people="http://example.org/hr/people"
    targetNamespace="http://example.org/hr/people"
    elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
      <xs:element name="birthDate" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Employee">
    <xs:complexContent>
      <xs:extension base="people:Person">
        <xs:sequence>
          <xs:element name="joinDate" type="xs:date"/>
          <xs:element name="empNo" type="xs:positiveInteger"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="person" type="people:Person"/>
</xs:schema>

This is the schema which defines the message types for our SOAP service. It imports the schema above, and merely defines types for the messages (which usually correspond to the use cases of our system):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:people="http://example.org/hr/people"
    xmlns:service="http://example.org/hr/webservice"
    targetNamespace="http://example.org/hr/webservice"
    elementFormDefault="qualified" attributeFormDefault="unqualified">
  <!-- Import the people (core) types -->
  <xs:import namespace="http://example.org/hr/people" schemaLocation="people.xsd"/>
  <!-- Define a request for the 'add person' use-case -->
  <xs:complexType name="AddPersonRequest">
    <xs:sequence>
      <xs:element name="person" type="people:Person"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="addPersonRequest" type="service:AddPersonRequest"/>
</xs:schema>

The following SOAP message, containing the request (to add an Employee):

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <service:addPersonRequest xmlns:service="http://example.org/hr/webservice"
        xmlns:people="http://example.org/hr/people">
      <service:person xsi:type="people:Employee">
        <people:firstName>John</people:firstName>
        <people:lastName>Deere</people:lastName>
        <people:birthDate>1971-06-07</people:birthDate>
        <people:joinDate>2004-01-01</people:joinDate>
        <people:empNo>1337</people:empNo>
      </service:person>
    </service:addPersonRequest>
  </soap:Body>
</soap:Envelope>

is, from an XML and SOAP point of view, equivalent to the following:

<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:service="http://example.org/hr/webservice"
    xmlns:people="http://example.org/hr/people">
  <Body>
    <addPersonRequest xmlns="http://example.org/hr/webservice">
      <person xsi:type="people:Employee">
        <people:firstName>John</people:firstName>
        <people:lastName>Deere</people:lastName>
        <people:birthDate>1971-06-07</people:birthDate>
        <people:joinDate>2004-01-01</people:joinDate>
        <people:empNo>1337</people:empNo>
      </person>
    </addPersonRequest>
  </Body>
</Envelope>

Note that, as long as namespace prefixes are visible to the elements (and attribute values, such as xsi:type="people:Employee") using them, by nesting those elements inside an element which contains the declaration, it doesn't matter where we declare the namespaces. Also, from a service contract point of view, the service may or may not care that we are in fact sending an employee. The only guarantee is that it accepts Person instances, though we may choose to send it any specialised type.
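The equivalence claim can be checked by comparing the two messages by expanded names (namespace URI plus local name) rather than by prefixes. A minimal Python sketch with the standard library's ElementTree, using copies of the two messages above without their XML declarations; normalise and equivalent are hypothetical helper names, and the comparison deliberately ignores whitespace between elements.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <service:addPersonRequest xmlns:service="http://example.org/hr/webservice"
        xmlns:people="http://example.org/hr/people">
      <service:person xsi:type="people:Employee">
        <people:firstName>John</people:firstName>
        <people:lastName>Deere</people:lastName>
        <people:birthDate>1971-06-07</people:birthDate>
        <people:joinDate>2004-01-01</people:joinDate>
        <people:empNo>1337</people:empNo>
      </service:person>
    </service:addPersonRequest>
  </soap:Body>
</soap:Envelope>"""

DEFAULTED = """<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:service="http://example.org/hr/webservice"
    xmlns:people="http://example.org/hr/people">
  <Body>
    <addPersonRequest xmlns="http://example.org/hr/webservice">
      <person xsi:type="people:Employee">
        <people:firstName>John</people:firstName>
        <people:lastName>Deere</people:lastName>
        <people:birthDate>1971-06-07</people:birthDate>
        <people:joinDate>2004-01-01</people:joinDate>
        <people:empNo>1337</people:empNo>
      </person>
    </addPersonRequest>
  </Body>
</Envelope>"""


def normalise(element):
    """Reduce an element tree to expanded names, attributes, text and children.

    ElementTree reports names as {namespace-uri}local-name, so prefixes and the
    placement of xmlns declarations disappear. Caveat: QName *values* such as
    xsi:type="people:Employee" are still compared as plain strings.
    """
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        (element.text or "").strip(),
        tuple(normalise(child) for child in element),
    )


def equivalent(first, second):
    return normalise(ET.fromstring(first)) == normalise(ET.fromstring(second))


print(equivalent(PREFIXED, DEFAULTED))  # True
```

A production comparison would also resolve the prefix inside xsi:type against the in-scope declarations; here both messages happen to use the same people prefix.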
Like the stamps and adhesives (and possibly even the address) placed on the outside of a traditional mail envelope, SOAP Headers provide information about the transit of the message. This information is available to (and possibly modifiable by) all nodes that participate in a SOAP request: the sender, the ultimate receiver, and any intermediaries (routing, security, logging, etc.) in between. The applications along the message path (sender, ultimate receiver, intermediaries) are also known as SOAP Nodes. The rules are straightforward: intermediaries should not alter the contents of the Body, but are allowed to add to, or remove from, the Header.
Like the Body, the nature of the (required) Header elements for a particular message invocation may be specified in the web service contract (WSDL), although any number of unspecified headers may be present as well. Like the body, the structure of headers is specified by a schema. Consider the following example:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:dl="http://example.org/systems/distributed-logging">
  <soap:Header>
    <dl:logRequest logger="http://162.34.52.199/log" id="02-34-97"/>
  </soap:Header>
  <soap:Body>
    <!-- Body content here -->
  </soap:Body>
</soap:Envelope>

A client uses a well-known distributed logging system, and to that end inserts a header requesting that each node that processes the message send a log message (containing the unique message identifier) to the indicated logging server. This way, the client can track the progress of its message. Nodes that do not understand the header may simply ignore it. Furthermore, a node at a company network boundary may remove the header from the incoming message if it so wishes, in order to prevent internal nodes from trying to respond. You can see that Header blocks do not contain information critical to the Body, but they are ideal for the following areas (and are actively being applied in some of them):
security. The header can include authentication information, or even a digital certificate.
metadata. For example, a header to indicate a claim that the message conforms to the WS-I Basic Profile.
logging/tracking. A header may request that each node add information about itself to the header, and the header may then be sent back to the client, which then has a full trace of the nodes along the message path.
unique identification. Assigning each message a unique message ID.
business process steps, e.g. authorisation. Assuming a business process (workflow) is composed of a message travelling through many nodes (processing steps), a header could, for example, request nodes to gain human authorisation before passing a message on.
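A boundary node that strips the logging header, as described above, only has to remove Header children in one namespace while leaving the Body untouched. A minimal Python sketch, assuming the distributed-logging namespace from the example; strip_headers is a hypothetical name, and a real intermediary would also handle the actor and mustUnderstand attributes.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
DIST_LOG = "http://example.org/systems/distributed-logging"

# Keep the familiar "soap" prefix when the message is serialised again.
ET.register_namespace("soap", SOAP_ENV)


def strip_headers(xml_text, namespace):
    """Remove every header block in `namespace`; the Body is never touched."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    if header is not None:
        for block in list(header):  # iterate over a copy while removing
            if block.tag.startswith(f"{{{namespace}}}"):
                header.remove(block)
    return ET.tostring(root, encoding="unicode")


MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:dl="http://example.org/systems/distributed-logging">
  <soap:Header>
    <dl:logRequest logger="http://162.34.52.199/log" id="02-34-97"/>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>"""

cleaned = strip_headers(MESSAGE, DIST_LOG)
print("logRequest" in cleaned)  # False
```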
2.3.2.1.1. The 'actor' attribute
The actor attribute is defined in the standard SOAP namespace, and is used to identify a function to be performed by a particular node.
Note
Just as a person can perform one or more roles, e.g. both parent and employer, a node can play one or more roles. For example, the same node can perform both logging and security services. To that end, the actor attribute really indicates a role. The designers of SOAP recognised this mistake, and in the SOAP 1.2 specification this attribute has been renamed role (retaining the existing meaning in terms of header processing).
The actor attribute uses a URI to identify the role that a node must play in order to act on a given header block. For example, let's presume an established role is that of a message age validator, i.e. a node that must check that a message has been in transit for no longer than a certain amount of time, and must otherwise reject the message. This role might be called http://realtime.org/soap/agevalidator. Let's presume a certain real-time messaging application sent a message that must be delivered within one second, or otherwise be sent back to the client. The message might look like:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:tm="http://example.org/soap/message-expiry"
    xmlns:msg="http://example.org/messaging/sms">
  <soap:Header>
    <tm:maxAge from="2005-03-15T17:32:01+02:00" maxAge="1"
        soap:actor="http://realtime.org/soap/agevalidator"/>
  </soap:Header>
  <soap:Body>
    <msg:urgentMessage> ... </msg:urgentMessage>
  </soap:Body>
</soap:Envelope>

Any node that plays the age validation role (and understands the header block) may then act upon the header (sending a fault to the client if the message has expired). Nodes which do not play an indicated role should never act upon or alter header blocks which do not target them. In addition to custom URIs, SOAP defines two built-in actors (roles), next and ultimate receiver:
next. Identified by the URI http://schemas.xmlsoap.org/soap/actor/next, this actor implies that the next node, regardless of role, should act on the header.
ultimate receiver. This role is implied if no actor attribute is present, and it indicates that only the ultimate receiver (web service) should act on the header.
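The targeting rules above reduce to a small decision function. This is a minimal Python sketch of the SOAP 1.1 actor logic as described in this section; is_targeted and its parameters are hypothetical names, and the set of roles a node plays is assumed to be known to the node itself.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR = f"{{{SOAP_ENV}}}actor"
NEXT = "http://schemas.xmlsoap.org/soap/actor/next"


def is_targeted(block, node_roles, is_ultimate_receiver):
    """Decide whether a node should act on a header block (SOAP 1.1 actor rules)."""
    actor = block.get(ACTOR)
    if actor is None:
        # No actor attribute: the block is meant for the ultimate receiver only.
        return is_ultimate_receiver
    # The built-in 'next' actor targets whichever node receives the message next.
    return actor == NEXT or actor in node_roles


HEADER = """<soap:Header xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:tm="http://example.org/soap/message-expiry">
  <tm:maxAge from="2005-03-15T17:32:01+02:00" maxAge="1"
      soap:actor="http://realtime.org/soap/agevalidator"/>
</soap:Header>"""

AGE_VALIDATOR = "http://realtime.org/soap/agevalidator"
block = ET.fromstring(HEADER)[0]
print(is_targeted(block, {AGE_VALIDATOR}, is_ultimate_receiver=False))  # True
print(is_targeted(block, set(), is_ultimate_receiver=True))  # False
```

Note that even the ultimate receiver ignores the maxAge block unless it also plays the age validator role.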
When a node processes a header block, it must remove it from the SOAP message. It may also add new header blocks, or even add the block it has removed, though this must then be a conscious processing step, i.e. with the specific intention that another node should also process that header.
2.3.2.1.2. The 'mustUnderstand' attribute
Header blocks may indicate whether processing is mandatory by adding the mustUnderstand attribute (also in the standard SOAP namespace), similar to how the actor attribute is added. This attribute may contain the value "0" (for false) or "1" (for true). The "Understand" in mustUnderstand means that a node playing the indicated role must understand the structure of the header block, and know how to process it. If a node doesn't understand a mandatory header block, it must generate a SOAP Fault and discard the message. It must not forward the message to the next node in the message path.
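The mustUnderstand rule can be sketched as a check that runs before any other processing. This minimal Python sketch assumes every block in the Header already targets this node (per the actor rules) and that the node knows the expanded names of the header blocks it can process; not_understood and the auth namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"


def not_understood(header, understood):
    """Return the names of mandatory header blocks this node cannot process.

    Assumes every block in `header` already targets this node.
    """
    return [
        block.tag
        for block in header
        if block.get(MUST_UNDERSTAND, "0") == "1" and block.tag not in understood
    ]


HEADER = """<soap:Header xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:auth="http://example.org/auth">
  <auth:token soap:mustUnderstand="1">abc123</auth:token>
  <auth:hint soap:mustUnderstand="0">optional</auth:hint>
</soap:Header>"""

header = ET.fromstring(HEADER)
missing = not_understood(header, understood=set())
if missing:
    # A real node would now return a MustUnderstand fault and not forward the message.
    print("MustUnderstand fault for", missing)
```

The optional hint block is silently skipped, while the mandatory token block alone is enough to stop the message.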
      </cars:invalidRegistration>
    </detail>
  </soap:Fault>
</soap:Body>
</soap:Envelope>

Note that the Fault element is part of the SOAP namespace, just as the SOAP Envelope and Body elements are. However, the children of the Fault element weren't qualified with the soap prefix: according to the WS-I Basic Profile, the children of the Fault element must be unqualified. In other words, they are not prefixed with (or placed in) the SOAP 1.1 namespace. Note as well that the Fault element is forbidden from containing any immediate child elements other than faultcode, faultstring, faultactor, and detail.
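The layout rules for a fault (a qualified Fault element with unqualified faultcode, faultstring and optional faultactor children) can be illustrated by building one. A minimal Python sketch with the standard library's ElementTree; build_fault is a hypothetical helper, the node URI is a placeholder, and detail is omitted for brevity.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def build_fault(code, reason, actor=None):
    """Build a SOAP 1.1 Fault envelope whose Fault children are unqualified."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    fault = ET.SubElement(body, f"{{{SOAP_ENV}}}Fault")
    # faultcode holds a QName; ElementTree declares the "soap" prefix on Envelope.
    ET.SubElement(fault, "faultcode").text = f"soap:{code}"
    ET.SubElement(fault, "faultstring").text = reason
    if actor is not None:
        # Intermediaries (not the ultimate receiver) identify themselves here.
        ET.SubElement(fault, "faultactor").text = actor
    return ET.tostring(envelope, encoding="unicode")


xml_text = build_fault(
    "MustUnderstand",
    "Mandatory header block not understood",
    actor="http://example.org/nodes/authentication",  # hypothetical node URI
)
print(xml_text)
```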
Note
It is also possible (though not recommended) to use non-standard fault codes that are prescribed by other organisations and belong to a separate namespace (such fault codes must use a namespace prefix bound in the document).
authentication node, did not recognize a mandatory (mustUnderstand="1") header block, so it generated a MustUnderstand fault. In this case, the authentication node must identify itself using the faultactor element.
Note
For programming languages that support exceptions (such as Java), the detail element is typically used to represent an exception object, as serialised by an XML-to-Java mapper. This provides a very convenient and natural mechanism to transfer thrown exceptions, without having to work at the SOAP level.
The widely used abstraction, together with the highly verbose and detailed depiction of service elements, makes WSDL documents sufficient to be the cornerstone of any web service development effort. Almost an entire client for a particular service can be generated automatically (by an IDE or toolset) using just the WSDL. The disadvantage is the very steep learning curve, combined with massive redundancy in terms of features which are not used, as they introduce too much variation (and hence stifle integration). Often, the best way to write a WSDL is by example.
http://schemas.xmlsoap.org/wsdl/
Like an XML Schema, a WSDL document targets a particular namespace (also using the targetNamespace attribute), which is used to uniquely identify the service elements. Because of the extremely loose coupling between logical domains, a WSDL document contains five high-level sections:
1. types. Contains or references all the structural types to which the XML SOAP messages conform. Typically based on XML Schema.
2. messages. A listing of all the messages to be exchanged anywhere in the service, mapping these to Schema types.
3. portType. The abstract contract for the service. Indicates the operations (use cases) the service provides, and what messages are to be exchanged in order to make use of them.
4. binding. Specifies how the abstract portType (and all its operations) is realised in a particular transport layer and messaging style.
5. service. Indicates to clients where they can connect to start interacting with a particular binding of a particular portType, by means of a URL (in the case of HTTP).
The following WSDL describes the contract for our service, and is sufficient to generate both the framework (skeleton) of our server implementation and most of the client. The document is followed by a brief guide to the five sections.

<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
    xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ws="http://www.example.com/webservice"
    targetNamespace="http://www.example.com/webservice">
  <!-- Define Types -->
  <wsdl:types>
    <xs:schema targetNamespace="http://www.example.com/webservice"
        elementFormDefault="qualified" attributeFormDefault="unqualified">
      <!-- Search Request -->
      <xs:complexType name="InventorySearchRequest">
        <xs:sequence>
          <xs:element name="keyWords" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <xs:element name="inventorySearchRequest" type="ws:InventorySearchRequest"/>
      <!-- Search Results -->
      <xs:complexType name="InventorySearchResponse">
        <xs:sequence>
          <xs:element name="item" type="ws:InventoryItem"/>
        </xs:sequence>
      </xs:complexType>
      <xs:element name="inventorySearchResponse" type="ws:InventorySearchResponse"/>
      <!-- A single (abstract) inventory item -->
      <xs:complexType name="InventoryItem">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="price" type="xs:decimal"/>
          <xs:element name="noInStock" type="xs:int"/>
        </xs:sequence>
      </xs:complexType>
      <!-- Fault to indicate inventory server failure -->
      <xs:complexType name="ServerFailure">
        <xs:sequence>
          <xs:element name="reason" type="xs:string" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>
      <xs:element name="serverFailure" type="ws:ServerFailure"/>
    </xs:schema>
  </wsdl:types>
  <!-- Abstract Messages Definition (and mapping to type system, e.g. XML Schema) -->
  <wsdl:message name="InventorySearchRequest">
    <wsdl:part name="message" element="ws:inventorySearchRequest"/>
  </wsdl:message>
  <wsdl:message name="InventorySearchResponse">
    <wsdl:part name="message" element="ws:inventorySearchResponse"/>
  </wsdl:message>
  <wsdl:message name="SearchFault">
    <wsdl:part name="message" element="ws:serverFailure"/>
  </wsdl:message>
  <!-- Abstract Service Interface Definition -->
  <wsdl:portType name="InventoryServicePort">
    <!-- Specify the use cases, and the messages used -->
    <wsdl:operation name="searchInventory">
      <wsdl:input message="ws:InventorySearchRequest"/>
      <wsdl:output message="ws:InventorySearchResponse"/>
      <wsdl:fault name="fault" message="ws:SearchFault"/>
    </wsdl:operation>
  </wsdl:portType>
  <!-- Specify how our abstract service should be bound to the HTTP transport layer -->
  <wsdl:binding name="InventoryServiceHTTPBinding" type="ws:InventoryServicePort">
    <!-- Specify document-style messaging -->
    <soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document"/>
    <!-- Specify messaging style for each use case -->
    <wsdl:operation name="searchInventory">
      <soap:operation soapAction="" style="document"/>
      <wsdl:input>
        <soap:body use="literal"/>
      </wsdl:input>
      <wsdl:output>
        <soap:body use="literal"/>
      </wsdl:output>
      <wsdl:fault name="fault">
        <soap:fault name="fault" use="literal"/>
      </wsdl:fault>
    </wsdl:operation>
  </wsdl:binding>
  <!-- Specify where the bound service can be physically found -->
  <wsdl:service name="ExampleInventoryService">
    <wsdl:port name="InventoryServicePort" binding="ws:InventoryServiceHTTPBinding">
      <soap:address location="http://localhost/inventory/service"/>
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>
1. types. In this simple example, we define the schema types in-line, without splitting them between core (business) and service (message definition) domains. Note that we define elements for the message types, as the WSDL will contain references to these.
2. messages. We define three simple messages: a search request, a search response, and an error. These are without context, and may be re-used by several operations.
3. portType. An abstract, yet conceptually complete, definition of our service, the one use case it supports (searchInventory), and the messages that are transferred (including faults).
4. binding. A Document/Literal SOAP binding to HTTP (the scheme which one should almost exclusively use).
5. service. Indicates the web address where an HTTP binding of the inventory service can be found. Conversely, a web site on this server most likely provided a link to this WSDL in the first place.
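Because a WSDL is ordinary XML, the abstract operations and concrete endpoint can be read with any namespace-aware parser, which is essentially what client generators start from. A minimal Python sketch over a trimmed copy of the inventory WSDL above (only portType and service kept); summarise and message_of are hypothetical names, and message QNames are returned as written, with their prefixes unresolved.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:ws="http://www.example.com/webservice"
    targetNamespace="http://www.example.com/webservice">
  <wsdl:portType name="InventoryServicePort">
    <wsdl:operation name="searchInventory">
      <wsdl:input message="ws:InventorySearchRequest"/>
      <wsdl:output message="ws:InventorySearchResponse"/>
      <wsdl:fault name="fault" message="ws:SearchFault"/>
    </wsdl:operation>
  </wsdl:portType>
  <wsdl:service name="ExampleInventoryService">
    <wsdl:port name="InventoryServicePort" binding="ws:InventoryServiceHTTPBinding">
      <soap:address location="http://localhost/inventory/service"/>
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>"""


def message_of(operation, direction):
    """Return the message QName of an input/output child, or None if absent."""
    element = operation.find(f"wsdl:{direction}", NS)
    return None if element is None else element.get("message")


def summarise(wsdl_text):
    """List (input, output) messages per operation, plus all SOAP addresses."""
    root = ET.fromstring(wsdl_text)
    operations = {
        op.get("name"): (message_of(op, "input"), message_of(op, "output"))
        for op in root.iterfind("wsdl:portType/wsdl:operation", NS)
    }
    addresses = [
        address.get("location")
        for address in root.iterfind("wsdl:service/wsdl:port/soap:address", NS)
    ]
    return operations, addresses


operations, addresses = summarise(WSDL)
print(operations)
print(addresses)
```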
The syntax for defining a message is as follows. The message-typing attributes (which may vary depending on the type system used) are element or type.

<definitions .... >
  <message name="nmtoken"> *
    <part name="nmtoken" element="qname"? type="qname"?/> *
  </message>
</definitions>
The message name attribute provides a unique name among all messages defined within the enclosing WSDL document. The part name attribute provides a unique name among all the parts of the enclosing message (this is usually not relevant, since Document/Literal SOAP messages have only one part).
<wsdl:definitions .... >
  <wsdl:portType name="nmtoken">
    <wsdl:operation name="nmtoken" .... /> *
  </wsdl:portType>
</wsdl:definitions>
The port type name attribute provides a unique name among all port types defined within the enclosing WSDL document. An operation is named via the name attribute. WSDL has four transmission primitives (message exchange patterns) that an endpoint (web service) can support:
One-way. The endpoint receives a message.
Request-response. The endpoint receives a message, and sends a correlated message.
Solicit-response. The endpoint sends a message, and receives a correlated message.
Notification. The endpoint sends a message.
These are specified by altering the order and/or presence of the input, output and fault child elements.
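Since only the order and presence of input and output decide the pattern, the classification is a simple lookup. A minimal Python sketch, assuming the caller passes the local names of an operation's child elements in document order; exchange_pattern is a hypothetical name.

```python
PATTERNS = {
    ("input",): "one-way",
    ("input", "output"): "request-response",
    ("output", "input"): "solicit-response",
    ("output",): "notification",
}


def exchange_pattern(children):
    """Classify a WSDL 1.1 operation from the local names of its child elements.

    Only the order and presence of input/output matter; fault elements are ignored.
    """
    key = tuple(name for name in children if name in ("input", "output"))
    if key not in PATTERNS:
        raise ValueError(f"not a valid WSDL 1.1 operation: {list(children)}")
    return PATTERNS[key]


print(exchange_pattern(["input", "output", "fault"]))  # request-response
print(exchange_pattern(["output"]))  # notification
```

The searchInventory operation in the inventory WSDL (input, output, fault) is therefore a request-response operation.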
The name attribute provides a unique name among all bindings defined within the enclosing WSDL document. A binding references the portType that it binds using the type attribute. Bindings rely heavily on extensibility elements to provide binding-specific information, e.g. for SOAP/HTTP.
<wsdl:definitions .... >
  <wsdl:service name="nmtoken"> *
    <wsdl:port .... /> *
  </wsdl:service>
</wsdl:definitions>

The name attribute provides a unique name among all services defined within the enclosing WSDL document. Ports within a service have the following relationship:
None of the ports communicate with each other (e.g. the output of one port is not the input of another).
If a service has several ports that share a port type, but employ different bindings or addresses, the ports are alternatives. Each port provides semantically equivalent behaviour (within the transport and message format limitations imposed by each binding). This allows a consumer of a WSDL document to choose particular port(s) to communicate with based on some criteria (protocol, distance, etc.).
By examining its ports, we can determine a service's port types. This allows a consumer of a WSDL document to decide whether it wishes to communicate with a particular service based on whether or not it supports several port types. This is useful if there is some implied relationship between the operations of the port types, such that the entire set of port types must be present in order to accomplish a particular task.
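Finding alternative ports means following each port's binding to the port type it implements and grouping ports that share one. A minimal Python sketch over a hypothetical WSDL skeleton: the second binding and both port names (PlainPort, SecurePort) are invented for illustration, binding contents and port addresses are omitted, and every QName is assumed to refer to the document's own targetNamespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical skeleton: one port type, two bindings, one service with two ports.
WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:ws="http://www.example.com/webservice"
    targetNamespace="http://www.example.com/webservice">
  <wsdl:portType name="InventoryServicePort"/>
  <wsdl:binding name="InventoryServiceHTTPBinding" type="ws:InventoryServicePort"/>
  <wsdl:binding name="InventoryServiceSecureBinding" type="ws:InventoryServicePort"/>
  <wsdl:service name="ExampleInventoryService">
    <wsdl:port name="PlainPort" binding="ws:InventoryServiceHTTPBinding"/>
    <wsdl:port name="SecurePort" binding="ws:InventoryServiceSecureBinding"/>
  </wsdl:service>
</wsdl:definitions>"""


def local_name(qname):
    """Drop the prefix of a QName attribute value, e.g. 'ws:Foo' -> 'Foo'."""
    return qname.split(":", 1)[-1]


def alternatives(wsdl_text):
    """Group the ports of each service by the port type their binding implements."""
    root = ET.fromstring(wsdl_text)
    # Assumes every QName refers to this document's own targetNamespace.
    port_type_of_binding = {
        binding.get("name"): local_name(binding.get("type"))
        for binding in root.iterfind("wsdl:binding", NS)
    }
    groups = defaultdict(list)
    for service in root.iterfind("wsdl:service", NS):
        for port in service.iterfind("wsdl:port", NS):
            port_type = port_type_of_binding[local_name(port.get("binding"))]
            groups[(service.get("name"), port_type)].append(port.get("name"))
    return dict(groups)


print(alternatives(WSDL))
```

Any group with more than one port lists interchangeable endpoints, among which a consumer may choose by protocol, distance, or similar criteria.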
UDDI is one of the core Web Services standards. It is designed to be interrogated using SOAP messages, and to provide access to WSDL documents describing the protocol bindings and message formats required to interact with the web services listed in its directory. UDDI was written in 1999/2000, at a time when its authors had a vision of a world in which consumers of Web Services would be linked up with providers through a dynamic brokerage system. In this vision, anyone needing a service such as credit card authentication would go to their broker and select one supporting the desired SOAP service interface and meeting other criteria. In such a world, the broker would be critical for everyone. For the consumer, trusted brokers would only return reliable/trusted services, while for a service producer, getting a good placement in the brokerage would be critical to being found by consumers. This vision has not come to pass. Instead, service providers tend to write custom service endpoints with custom WSDL descriptions. Consumers then hard-code the URLs of these SOAP endpoints, working only with specific systems. Fault tolerance is only achieved through DNS and router tricks, not through dynamic selection of available service endpoints through the brokerage.
The most common place that UDDI is currently employed is inside a company (a controlled environment), where it is used to dynamically bind client systems to implementations. Much of the search metadata permitted in UDDI is not used for this relatively simple role. UDDI can probably be cited as an example of over-prescriptive design, in which a naming system was built around an entirely unproven business model: the service broker.
Note
Since there is a distinct overlap between the role of UDDI and some parts of the (much larger and more ambitious) ebXML specification, and both are now under the management of the OASIS organisation, there are bound to be interesting developments as far as UDDI is concerned.