
IBM Global Business Services

Welcome!

XML Day 1

Copyright IBM Corporation 2009


XML Introduction Day 1


Day 1: Objectives

After completing this course, you should be able to:
Define what XML is
Identify document type definitions and validity
Describe attribute declarations in DTDs
Explain entities and external DTD subsets
Describe how to embed non-XML data
Describe XML namespaces and parsers
Identify the SAX parser
Explain XML schemas


Housekeeping
Breaks
Washrooms
Transportation / parking
No pagers or cell phones
Participation
Parking lot issues
Questions


Course map
Module 1: Document type definitions and validity
Module 2: Attribute declarations in DTDs
Module 3: Entities and external DTD subsets
Module 4: Embedding non-XML data
Module 5: XML namespaces and parsers
Module 6: XML schema
Module 7: XPath
Module 8: XSL


Document type definition (DTD)


Data sent along with a DTD, and conforming to it, is known as valid XML.
In this case, a validating XML parser can check incoming data against the rules defined in the DTD to make sure the data is structured correctly.

Data sent without a DTD is known as well-formed XML, provided it follows the XML syntax rules.
Here an XML document instance, such as hierarchically structured weather data, implicitly describes itself.

With both valid and well-formed XML, the encoded data is self-describing because descriptive tags are intermixed with the data. DTDs help ensure that different people and programs can read each other's files. The DTD defines exactly what is and is not allowed to appear inside a document.
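The difference between well-formed and valid can be sketched in Python. This is a minimal sketch, assuming Python's standard library parser, which checks well-formedness only and does not validate against a DTD; the weather markup and the helper name is_well_formed are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical weather data, used only for illustration.
weather = "<weather><city>Hyderabad</city><temp unit='C'>31</temp></weather>"
broken = "<weather><city>Hyderabad</weather>"  # <city> is never closed

print(is_well_formed(weather))
print(is_well_formed(broken))
```

Checking validity additionally requires a validating parser and a DTD, which this sketch does not attempt.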


DTD for our simple XML


Within the document type declaration, an internal DTD consists of a left square bracket character ([) followed by a series of markup declarations, followed by a right square bracket character (]).
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE Simple [
<!ELEMENT Simple ANY>
]>
<Simple> This is the simplest XML document I have ever seen </Simple>


DTD declarations
Element type declarations
Attribute-list declarations
Entity declarations
Notation declarations
Processing instructions
Comments
Parameter entity references


Element type
<!ELEMENT Name Contentspec>

<!ELEMENT Title (#PCDATA)>
Title is permitted to have only character data.

<!ELEMENT General ANY>
General may contain any declared element as well as character data.

<!ELEMENT Image EMPTY>
The element must be empty; it cannot contain anything.

<!ELEMENT Book (Title, Author, Publisher)>
The Book element must contain these three elements, in this order.

<!ELEMENT Prerequisite (BE | ME | MS)>
The Prerequisite element must contain exactly one of the listed elements.

<!ELEMENT Candidate (Qualification+, XMLExposure?, OtherSkills*)>
Qualification occurs one or more times, XMLExposure is optional, and OtherSkills occurs zero or more times.

Example
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE Collection [
<!ELEMENT Collection (CD)+>
<!ELEMENT CD (#PCDATA)>
]>
<Collection>
<CD>Devotional Songs by Pankaj</CD>
<CD>Kajal by Pankaj</CD>
<CD>Classical Songs by Pankaj</CD>
</Collection>

Hello XML with DTD


<?xml version="1.0" standalone="yes"?>
<!DOCTYPE GREETING [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING> Hello XML! </GREETING>


Validating against a DTD


A valid document must meet the constraints specified by the DTD. Furthermore, its root element must be the one specified in the document type declaration.
Valid document:
<GREETING> Good Morning </GREETING>


Validating against a DTD (continued)


Documents that are not valid:
<GREETING> <GREETING> Some Text </GREETING> </GREETING>

<GREETING> <sometag> some text </sometag> <someEmptyTag/> </GREETING>

Both fail because GREETING is declared as (#PCDATA) and therefore may not contain child elements.


Listing the elements


The first step in creating a DTD for a particular document is to understand the structure of the information that will be encoded using the elements defined in the DTD.
<?xml version="1.0" standalone="yes" ?>
<Root>
<Element1>
<Element11> </Element11>
</Element1>
</Root>


Element declarations
Each tag used in a valid XML document must be declared with an element declaration in the DTD. This specifies the name and possible contents of an element. This list of contents is also called the content specification.
* may occur any number of times (zero or more children)
? may or may not occur (zero or one child)
+ must occur at least once (one or more children)
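One way to build intuition for content specifications is to read them as regular expressions over the names of an element's children. The following is a rough sketch under that assumption, not how a real validating parser is implemented; the helper names content_model_to_regex and children_match are hypothetical, and #PCDATA, ANY, EMPTY and mixed content are deliberately not handled.

```python
import re

def content_model_to_regex(model):
    """Translate a simple element-only content model into a compiled regex."""
    # Wrap every element name in a group that also consumes a ";" terminator.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + m.group(0) + ";)", model)
    # Commas (sequence) and whitespace carry no meaning in the regex form;
    # |, ?, * and + already mean the same thing in both notations.
    pattern = re.sub(r"[\s,]", "", pattern)
    return re.compile(pattern)

def children_match(model, children):
    """Check an ordered list of child element names against a content model."""
    sequence = "".join(name + ";" for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

candidate = "(Qualification+, XMLExposure?, OtherSkills*)"
print(children_match(candidate, ["Qualification", "Qualification", "OtherSkills"]))
print(children_match(candidate, ["XMLExposure"]))  # Qualification is missing
```

The same helper accepts the sequence and choice models from the earlier slides, such as (Title, Author, Publisher) and (BE | ME | MS).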


Element declarations (continued)


<!ELEMENT SEASON ANY>
All element type declarations begin with <!ELEMENT. They include the name of the element being declared followed by the content specification. The ANY keyword says that all declared elements as well as parsed character data can be children of the SEASON element.

<!ELEMENT YEAR (#PCDATA)>
This declaration says that a YEAR may contain only parsed character data, i.e., text that is not markup. It may not contain children of its own.


CDATA sections
May contain text, reserved characters and whitespace
Reserved characters need not be replaced by entity references
Content is not interpreted as markup by the XML parser
Commonly used for scripting code (e.g., JavaScript)
Begin with <![CDATA[
Terminate with ]]>


CDATA section: Example


<?xml version = "1.0"?>
<!-- Fig. 5.7 : cdata.xml -->
<!-- CDATA section containing C++ code -->

<book title = "C++ How to Program" edition = "3">

<sample>
// C++ comment
if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
cerr &lt;&lt; this-&gt;displayError();
</sample>

<sample>
<![CDATA[
// C++ comment
if ( this->getX() < 5 && value[ 0 ] != 3 )
cerr << this->displayError();
]]>
</sample>

C++ How to Program by Deitel &amp; Deitel
</book>

Entity references are required if the text is not in a CDATA section.
The XML parser does not process the contents of a CDATA section.
Note the simplicity offered by the CDATA section.
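To see that the two forms carry the same character data, the comparison can be sketched with Python's standard library parser. The shortened sample strings are assumptions made for illustration; they are not the full listing.

```python
import xml.etree.ElementTree as ET

# The same C++ condition, once escaped with entity references
# and once wrapped in a CDATA section.
escaped = "<sample>if ( x &lt; 5 &amp;&amp; y != 3 )</sample>"
cdata = "<sample><![CDATA[if ( x < 5 && y != 3 )]]></sample>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

print(escaped_text)
print(escaped_text == cdata_text)
```

After parsing, an application cannot tell which form the author used; CDATA only changes how the text is written in the file.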

Using a CDATA section


Sharing common DTDs among documents


The real power of XML comes from common DTDs that can be shared among many documents written by different people. If the DTD is not directly included in the document but is linked in from an external source, changes made to the DTD automatically propagate to all documents using that DTD. On the other hand, backward compatibility is not guaranteed when a DTD is modified.

<!DOCTYPE root_element_name SYSTEM "DTD_URL">

Example:
<!DOCTYPE SEASON SYSTEM "http://ibm/xml/dtds/sample.dtd">


Public DTDs
The SYSTEM keyword is intended for private DTDs used by a single author or group. DTDs designed for writers outside the creating organization use the PUBLIC keyword instead of the SYSTEM keyword.
<!DOCTYPE root_element_name PUBLIC "DTD_name" "DTD_URL">

Example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML //EN">
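A parser exposes the identifiers from the document type declaration to the application. The sketch below, assuming Python's xml.dom.minidom, reads them back without fetching the external DTD; the helper name doctype_ids is hypothetical, and the PUBLIC example reuses the HTML 4.01 identifiers that appear later in this deck.

```python
from xml.dom import minidom

system_doc = '<!DOCTYPE SEASON SYSTEM "http://ibm/xml/dtds/sample.dtd"><SEASON/>'
public_doc = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
              '"http://www.w3.org/TR/html4/strict.dtd"><HTML/>')

def doctype_ids(xml_text):
    """Return (root name, public identifier, system identifier)."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.publicId, doctype.systemId

print(doctype_ids(system_doc))
print(doctype_ids(public_doc))
```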


Module 2: Attribute declarations in DTDs


What is an Attribute?
Attributes are intended for extra information associated with an element (like an ID number) that is used only by programs that read and write the file, not for the content of the element that is read and written by humans. An attribute contains information about the content of the element, rather than the content itself.
Example:
<GREETING LANGUAGE="English">
Hello XML!
<MOVIE SOURCE="WavingHand.mov" />
</GREETING>
An attribute is a name = value pair.


Examples
<RECTANGLE WIDTH="30" HEIGHT="45" />
<SCRIPT LANGUAGE="javascript" ENCODING="8859_1">
.
</SCRIPT>
Note: End tags cannot possess attributes.
<SCRIPT>

</SCRIPT LANGUAGE="javascript" ENCODING="8859_1">
The syntax above is illegal.


Declaring Attributes in DTDs


In a valid XML document you must also explicitly declare all attributes that you intend to use with the document's elements. You do this with a type of DTD markup known as an attribute-list declaration. This declaration does the following:
Defines the names of the attributes associated with that element
Specifies the data type of each attribute
Specifies for each attribute whether that attribute is required


Attribute list Declaration


An attribute-list declaration has the following form:
<!ATTLIST Element_name Attribute_name Type Default_value>
Element_name is the name of the element associated with this attribute.
Attribute_name is the name of the attribute.
Type is the kind of attribute.
Default_value is the value the attribute takes on if no value is specified for the attribute.


Attribute types
Type        Meaning
CDATA       Character data: text that is not markup
Enumerated  A list of possible values from which exactly one will be chosen
ID          A unique name not shared by any other ID type attribute in the document
IDREF       The value of an ID type attribute of an element in the document
IDREFS      Multiple IDs of elements, separated by whitespace
ENTITY      The name of an entity declared in the DTD
ENTITIES    The names of multiple entities declared in the DTD, separated by whitespace

Specifying default values for Attributes


Instead of specifying an explicit default value, an attribute declaration can require that a value be provided, allow the value to be omitted completely, or force the attribute always to use the default value. These requirements are specified with three keywords:
#REQUIRED
#IMPLIED
#FIXED


#REQUIRED
Instead of providing default values for the attributes, use #REQUIRED when a value must always be supplied, for example to force anyone posting a document on the intranet to identify themselves.
Example:
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #REQUIRED>


#IMPLIED
Sometimes you may not have a good option for a default value, but you do not want to require the author of the document to include a value, either. For example, some of the people posting documents to your intranet are offsite freelancers who have email addresses but lack phone extensions. Therefore, you do not want to require them to include an extension attribute in their <AUTHOR/> tags.
<AUTHOR NAME="Harish" EMAIL="harish.modadugu@in.ibm.com"/>
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>


#FIXED
Used to provide a default value for the attribute without allowing the author to change it. For example:
<AUTHOR NAME="Harish" COMPANY="IBM" EMAIL="hmodadug@in.ibm.com" EXTENSION="57536" />
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "IBM">


Some examples
<!ATTLIST Film Class CDATA >
Simplest outline of an attribute definition: the attribute contains character data. (A complete declaration must also end with a default declaration, as in the forms below.)

<!ATTLIST Film Year CDATA #REQUIRED>
You must specify an attribute value.

<!ATTLIST Film Color CDATA #IMPLIED>
You can either include or omit the attribute; no default value is supplied.

<!ATTLIST Film Language CDATA #FIXED "Hindi">
You can either include or omit the attribute. If you omit it, the processor uses the fixed default value. If you include it, you must specify exactly that value.
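The effect of these default declarations can be observed with a non-validating parser, which still applies defaults from the internal DTD subset. This is a minimal sketch assuming Python's xml.parsers.expat; the helper name reported_attributes is hypothetical. Because the parser does not validate, it would not complain if the #REQUIRED Year attribute were missing.

```python
from xml.parsers import expat

DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Film [
  <!ELEMENT Film (#PCDATA)>
  <!ATTLIST Film Year     CDATA #REQUIRED
                 Color    CDATA #IMPLIED
                 Language CDATA #FIXED "Hindi">
]>
<Film Year="1994">Hum Aapke Hain Kaun</Film>"""

def reported_attributes(xml_text):
    """Return {element name: attributes} as reported by the expat parser."""
    seen = {}
    parser = expat.ParserCreate()

    def start_element(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return seen

film = reported_attributes(DOCUMENT)["Film"]
print(film)
```

Year comes from the document, Language is supplied from the #FIXED declaration, and the omitted #IMPLIED attribute Color is simply absent.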


Example
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE VideoLibrary [
<!ELEMENT VideoLibrary (Film, Class, (Hero | Director | Heroine)+)>
<!ATTLIST Film Color CDATA #IMPLIED
               Language CDATA #FIXED "Hindi"
               Year CDATA #REQUIRED>
<!ELEMENT Film (#PCDATA)>
<!ELEMENT Class (#PCDATA)>
<!ELEMENT Hero (#PCDATA)>
<!ELEMENT Heroine (#PCDATA)>
<!ELEMENT Director (#PCDATA)>
]>
<VideoLibrary>
<Film Year = "1994"> Hum Aapke Hain Kaun </Film>
<Class>Love Story </Class>
<Heroine>Madhuri Dixit</Heroine>
</VideoLibrary>

Predefined Attributes
XML has two predefined Attributes. They are identified by a name that begins with xml:.
xml:space describes how whitespace is treated in the element.
xml:lang describes the language in which the element is written.


Module 3: Entities and external DTD subsets


What is an Entity?
The storage units that contain particular parts of an XML document are called entities. An entity may consist of a file, a database record, or any other item that contains data. The primary purpose of an entity is to hold content: well-formed XML, other forms of text, or binary data. A CSS style sheet is not an entity. Every XML document has at least one entity: the document entity itself.


Kinds of Entities

There are two kinds of entities.

Internal Entity
They are defined completely within the document entity. Since the document itself is one such entity, all XML documents have at least one internal entity.

External Entity
They draw their content from another source located via a URL. In HTML, an IMG element represents an external entity while the document itself contained between the <HTML> and </HTML> tags is an internal entity.


Internal general Entities


An <!ENTITY> declaration in the DTD defines the abbreviation and the text the abbreviation stands for. For example, instead of typing the same footer at the bottom of each page, we can define that text as a footer entity in the DTD and then type &footer; at the bottom of each page. If you later decide to change the footer block, you only need to make the change once in the DTD instead of on every page that shares the footer. General entity references begin with an ampersand (&) and end with a semicolon (;), with the entity's name between these two characters. For instance, &lt; is a general entity reference for the less-than sign (<). The name of the entity is lt.
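The footer idea described above can be sketched as follows, assuming Python's xml.etree.ElementTree, which expands internal general entities declared in the internal DTD subset. The PAGE, BODY and FOOTER element names and the footer text are hypothetical.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE PAGE [
  <!ENTITY footer "Copyright IBM Corporation 2009">
]>
<PAGE><BODY>1 &lt; 2</BODY><FOOTER>&footer;</FOOTER></PAGE>"""

page = ET.fromstring(DOCUMENT)
body_text = page.find("BODY").text      # predefined entity &lt; becomes <
footer_text = page.find("FOOTER").text  # &footer; becomes its replacement text

print(body_text)
print(footer_text)
```

Changing the replacement text in the single <!ENTITY> declaration changes every place where &footer; is used.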


Example
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ENTITY ELTP "ENTRY LEVEL TRAINING PROGRAM">
<!ELEMENT DOCUMENT (TITLE, COURSE)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COURSE (COURSE_CODE, DATE)>
<!ELEMENT COURSE_CODE (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
]>
<DOCUMENT>
<TITLE> &ELTP; </TITLE>
<COURSE>
<COURSE_CODE> HYD_010 </COURSE_CODE>
<DATE> JULY 23, 2007 </DATE>
</COURSE>
</DOCUMENT>

External general Entities


External entities are data outside the main file containing the root element / document entity. With XML, we can use an external general entity reference to embed one document in another.
<!ENTITY name SYSTEM "URI">


Example
An XML signature file (signature.xml)
<?xml version="1.0"?>
<SIGNATURE>
<NAME> HARISH </NAME>
<EMPNO> 034518 </EMPNO>
</SIGNATURE>

External general entity reference
<?xml version="1.0" standalone="no"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (TITLE, SIGNATURE)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SIGNATURE (NAME, EMPNO)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT EMPNO (#PCDATA)>
<!ENTITY SIG SYSTEM "signature.xml">
]>
<DOCUMENT>
<TITLE> ELTP </TITLE>
&SIG;
</DOCUMENT>

Parameter Entities
General entities become part of the document, not the DTD. They can be used in the DTD but only in places where they become part of the document body. Parameter entity references differ from general entity references in the following ways:
Parameter entity references begin with a percent sign (%) rather than an ampersand (&).
Parameter entity references can only appear in the DTD, not the document content.


Syntax
<!ENTITY % name "replacement text">
Example:
<!ENTITY % IBM "International Business Machines">
<!ENTITY ACRON "IBM stands for %IBM;">

Note: Parameter entities must be declared before they are referenced.
References inside markup declarations, as in this example, work only in the external DTD subset.


External parameter Entities


External parameter entities enable us to build large DTDs from smaller ones, although cycles are prohibited: DTD 1 may not refer to DTD 2 if DTD 2 refers to DTD 1. Such nested DTDs can become large and complex. Breaking a DTD into smaller, more manageable chunks makes the DTD easier to analyze. Both the document and its DTD become much easier to understand when split into separate files.


Example
Sign.dtd
<!ELEMENT EMP (EMPNO, NAME)>
<!ELEMENT EMPNO (#PCDATA)>
<!ELEMENT NAME (#PCDATA)>

Emp.xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE EMP SYSTEM "Sign.dtd">
<EMP>
<EMPNO> 12121 </EMPNO>
<NAME> Akash </NAME>
</EMP>


Entities and DTDs in well-formed documents


Internal Entities:
The primary advantage of using a DTD in XML documents that are well-formed but not valid is that we may use internal general entity references other than the five predefined references &gt;, &lt;, &quot;, &apos; and &amp;.

We simply declare the entities we want as normal; then use them in our documents.


DTD yielding a well-formed yet invalid document


<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ENTITY XML "eXtensible Markup Language">
]>
<DOCUMENT>
<TITLE> &XML; </TITLE>
<EMP>
<EMPNO> 112233 </EMPNO>
<NAME> Abhishek </NAME>
</EMP>
</DOCUMENT>


External Entities
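Sign.dtd
<?xml version="1.0"?>
<EMP>
<EMPNO> 112233 </EMPNO>
<NAME> HARISH </NAME>
</EMP>

A file that uses Sign.dtd
<?xml version="1.0" standalone="no"?>
<!DOCTYPE DOCUMENT [
<!ENTITY EMPS SYSTEM "Sign.dtd">
]>
<DOCUMENT>
<TITLE> XML </TITLE>
&EMPS;
</DOCUMENT>

Because EMPS is referenced in the document content as &EMPS;, it is declared here as a general entity (without the % sign).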


Module 4: Embedding non-XML data


Notations
The first problem that we encounter when working with non-XML data in an XML document is identifying the format of the data and telling the XML application how to read and display it. For example, it would be inappropriate to try to draw an MP3 sound file on the screen. Furthermore, no application understands all possible file formats.

Ideally, we want documents to tell the application the format of the external entity, so that we do not have to rely on the application recognizing the file type by a magic number or a potentially unreliable file name extension.


Notations

A notation declaration names a data format and identifies it either with a system identifier (a path, using SYSTEM) or with a public identifier (a string, using PUBLIC). An attribute of type NOTATION must take one of the declared notation names as its value.


Notations

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT IMAGES (IMAGE+) >
<!ELEMENT IMAGE (#PCDATA) >
<!NOTATION iPATH SYSTEM "C:\windows\a.bmp" >
<!ATTLIST IMAGE SRC NOTATION (iPATH) #REQUIRED>


Using notations
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE IMAGES SYSTEM "C:\N.dtd">
<IMAGES>
<IMAGE SRC="iPATH">abc</IMAGE>
</IMAGES>

Note: Because an XML processor cannot parse bmp files, we need to use an external program for displaying or editing them. When the parser encounters a use of the notation name, it simply passes the path declared in the notation on to the application.


Conditional sections
Include declarations
Keyword INCLUDE

Exclude declarations
Keyword IGNORE

Often used with entities
Parameter entities
Preceded by percent character (%)
Creates entities specific to DTD
Can be used only inside DTD in which they are declared


Entities and strings


Entities accept and reject represent the strings INCLUDE and IGNORE, respectively.
<!-- Fig.: conditional.dtd -->
<!-- DTD for conditional section example -->

<!ENTITY % reject "IGNORE">
<!ENTITY % accept "INCLUDE">

<![ %accept; [
<!ELEMENT message ( approved, signature )>
]]>

<![ %reject; [
<!ELEMENT message ( approved, reason, signature )>
]]>

<!ELEMENT approved EMPTY>
<!ATTLIST approved flag ( true | false ) "false">

<!ELEMENT reason ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>

The first conditional section includes its message element declaration; the second excludes its message element declaration.


Example conditional section (continued)


<?xml version = "1.0" standalone = "no"?>

<!-- Fig.: conditional.xml -->
<!-- Using conditional sections -->

<!DOCTYPE message SYSTEM "conditional.dtd">

<message>
<approved flag = "true"/>
<signature>Chairman</signature>
</message>

XML document that conforms to conditional.dtd.


Processing instructions
A processing instruction is a string of text between <? and ?> marks. The only required syntax for the text inside the processing instruction is that it must begin with an XML name followed by white space followed by data.
Note: Processing instructions may be placed almost anywhere in an XML document except inside a tag or a CDATA section.


Processing instructions (continued)


Special instructions to the XML consumer application
Example:
<?xml version="version" [standalone="DTDflag"] ?>

version
A string in the form n.n specifying the XML level of the file. Use the value 1.0.

DTDflag
Optional. A value of "yes" or "no" indicating whether the document is self-contained ("yes") or depends on an external Document Type Definition (DTD) ("no"). Script component XML files do not include such a reference, so the value for this attribute is always "yes".
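How an application receives a processing instruction can be sketched with Python's xml.dom.minidom, which exposes each one as a node with a target (the leading XML name) and data (everything after the white space). The xml-stylesheet instruction and the file name style.css are assumptions chosen for illustration. Note that minidom does not surface the <?xml ... ?> declaration itself as a node.

```python
from xml.dom import minidom

DOCUMENT = ('<?xml version="1.0" standalone="yes"?>'
            '<?xml-stylesheet type="text/css" href="style.css"?>'
            '<GREETING>Hello XML!</GREETING>')

doc = minidom.parseString(DOCUMENT)
pi = doc.firstChild  # first node after the XML declaration

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
print(is_pi)
print(pi.target)
print(pi.data)
```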


Module 5: XML namespaces and parsers


Conflicting issues
Namespaces ensure that element names do not conflict, and clarify who defined which term. Namespaces do not give instructions on how to process the elements. Readers still need to know what the elements mean and decide how to process them.

Namespaces simply keep the names straight.


XML Namespaces
Naming collisions
Two different elements have the same name
<subject>Math</subject>
<subject>Thrombosis</subject>

Namespaces
Differentiate elements that have the same name
<school:subject>Math</school:subject>
<medical:subject>Thrombosis</medical:subject>
school and medical are namespace prefixes
Prepended to element and attribute names
Tied to a uniform resource identifier (URI)
Series of characters for differentiating names

XML Namespaces (continued)

Creating namespaces
Use the xmlns keyword
xmlns:text = "urn:deitel:textInfo"
xmlns:image = "urn:deitel:imageInfo"
Creates two namespace prefixes, text and image
urn:deitel:textInfo is the URI for prefix text
urn:deitel:imageInfo is the URI for prefix image

Default namespaces
Child elements of this namespace do not need a prefix
xmlns = "urn:deitel:textInfo"
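The way a parser ties prefixes to URIs can be sketched with Python's xml.etree.ElementTree, which reports each name as {URI}localname. The directory and file element names and the filename attribute values are hypothetical; the two URIs are the ones declared above.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<text:directory xmlns:text="urn:deitel:textInfo"
                xmlns:image="urn:deitel:imageInfo">
  <text:file filename="book.xml"/>
  <image:file filename="funny.jpg"/>
</text:directory>"""

root = ET.fromstring(DOCUMENT)
child_tags = [child.tag for child in root]

# Both children are named "file", but their URIs keep them apart.
print(root.tag)
print(child_tags)
```

The prefix itself is only shorthand: two documents that bind different prefixes to the same URI refer to the same names.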


Introduction to XML parsers


XML tags are custom-defined thus enabling the generation of domain specific markup languages for diverse fields such as vector graphics, mathematics, music and technical documentation. A parser is a piece of software that makes sure the XML document is valid or at least well-formed. We use an XML parser to dissect XML documents and gain access to the data in them.

XML parsers are software packages that come as part of an application or as part of our own programs.
There are two types of XML parsers:
DOM parser
SAX parser
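Unlike a DOM parser, a SAX parser does not build a tree; it reports events (such as the start of an element) to handler methods as it reads the document. The following is a minimal sketch assuming Python's xml.sax module; the ElementCounter class is hypothetical and the CD titles are shortened from the earlier Collection example.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every start tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1

DOCUMENT = (b"<Collection><CD>Devotional Songs</CD>"
            b"<CD>Kajal</CD><CD>Classical Songs</CD></Collection>")

counter = ElementCounter()
xml.sax.parseString(DOCUMENT, counter)
print(counter.counts)
```

Because nothing is kept in memory beyond what the handler stores, this style suits large documents that are read once from start to end.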


XML Document Object Model (DOM)


W3C standard recommendation
Build tree structure in memory for XML documents
DOM-based parsers parse these structures
Exist in several languages (Java, C, C++, Python, Perl, etc.)


DOM (continued)

DOM tree
Each node represents an element, attribute, etc.

<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
<body>Hi, Tim!</body>
</message>

Node created for element message
Element message has child node for body element
Element body has child node for text "Hi, Tim!"
Attributes from and to also have nodes in tree
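The node structure described above can be inspected directly. This is a small sketch assuming Python's xml.dom.minidom as the DOM implementation; the document is written on one line so that no whitespace-only text nodes appear between message and body.

```python
from xml.dom import minidom

DOCUMENT = ('<?xml version="1.0"?>'
            '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>')

doc = minidom.parseString(DOCUMENT)
message = doc.documentElement   # node for element message
body = message.firstChild       # child node for element body
text = body.firstChild          # child node for the text

print(message.nodeName, body.nodeName, text.nodeValue)
print(message.getAttribute("from"), message.getAttribute("to"))
print(message.attributes.length)
```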


DOM implementations

DOM-based parsers
Microsoft's msxml
Sun Microsystems' JAXP


Some DOM-based parsers


Parser      Description
JAXP        Sun Microsystems' Java API for XML Parsing (JAXP) is available at no charge from java.sun.com/xml.
XML4J       IBM's XML Parser for Java (XML4J) is available at no charge from www.alphaworks.ibm.com/tech/xml4j.
Xerces      Apache's Xerces Java Parser is available at no charge from xml.apache.org/xerces.
msxml       Microsoft's XML parser (msxml) version 2.0 is built into Internet Explorer 5.5. Version 3.0 is also available at no charge from msdn.microsoft.com/xml.
4DOM        4DOM is a parser for the Python programming language and is available at no charge from fourthought.com/4Suite/4DOM.
XML::DOM    XML::DOM is a Perl module for manipulating XML documents using Perl. For additional information, visit www4.ibm.com/software/developer/library/xml-perl2.


DOM and JavaScript

We use JavaScript and the MSXML parser
XML document marks up an article
Use the DOM API to display the document's element names/values


Example
<?xml version = "1.0"?>

<!-- Fig.: article.xml -->
<!-- Article formatted with XML -->

<article>

<title>Simple XML</title>

<date>December 6, 2000</date>

<author>
<fname>Tem</fname>
<lname>Nieto</lname>
</author>

<summary>XML is pretty easy.</summary>

<content>Once you have mastered HTML, XML is easily
learned. You must remember that XML is not for
displaying information but for managing information.
</content>

</article>

Article marked up with XML tags.

Example - DOM
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>

<!-- Fig.: DOMExample.html -->
<!-- DOM with JavaScript -->

<head>
<title>A DOM Example</title>
</head>

<body>

<!-- Element script allows for including scripting code -->
<script type = "text/javascript" language = "JavaScript">

// Instantiate Microsoft XML DOM object
var xmlDocument = new ActiveXObject( "Microsoft.XMLDOM" );

// Load article.xml into memory; msxml parses article.xml
// and stores it as a tree structure
xmlDocument.load( "article.xml" );

// get the root element (article)
var element = xmlDocument.documentElement;

// place the root element's name in element strong and write it to the browser
document.writeln(
   "<p>Here is the root node of the document:" );
document.writeln( "<strong>" + element.nodeName
   + "</strong>" );

document.writeln(
   "<br>The following are its child elements:" );
document.writeln( "</p><ul>" );

// traverse all child nodes of root element
for ( var i = 0; i < element.childNodes.length; i++ ) {
   var curNode = element.childNodes.item( i );

   // print node name of each child element
   document.writeln( "<li><strong>" + curNode.nodeName
      + "</strong></li>" );
}

document.writeln( "</ul>" );

// get the first child node of root element (title)
var currentNode = element.firstChild;

document.writeln( "<p>The first child of root node is:" );
document.writeln( "<strong>" + currentNode.nodeName
   + "</strong>" );
document.writeln( "<br>whose next sibling is:" );

// get the next sibling of first child (date); siblings are nodes at
// the same level (e.g., title, date, author, summary and content)
var nextSib = currentNode.nextSibling;

document.writeln( "<strong>" + nextSib.nodeName
   + "</strong>." );
document.writeln( "<br>Value of <strong>" + nextSib.nodeName
   + "</strong> element is:" );

// get first child of date (December 6, 2000)
var value = nextSib.firstChild;

// print the text value of the sibling
document.writeln( "<em>" + value.nodeValue + "</em>" );
document.writeln( "<br>Parent node of " );
document.writeln( "<strong>" + nextSib.nodeName
   + "</strong> is:" );

// get parent of date (article)
document.writeln( "<strong>" + nextSib.parentNode.nodeName
   + "</strong>.</p>" );

</script>

</body>
</html>



Traversing article.xml with JavaScript
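For readers without Internet Explorer and MSXML, the same traversal can be sketched with Python's standard xml.dom.minidom module. This is only an illustration under stated assumptions: the inline document is a hypothetical stand-in for article.xml (the element names follow the slide callouts, and every text value except the date is invented), and it is written without whitespace between elements so that firstChild and nextSibling return elements rather than whitespace text nodes.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for article.xml (assumption: only the element names
# and the date value come from the slides; the other text is invented).
ARTICLE = (
    "<article>"
    "<title>Simple XML</title>"
    "<date>December 6, 2000</date>"
    "<author>Jane Doe</author>"
    "<summary>XML is easy.</summary>"
    "<content>Once you know HTML, XML is easy to learn.</content>"
    "</article>"
)

doc = parseString(ARTICLE)

# Root element of the document (article)
root = doc.documentElement

# Names of all child nodes of the root element
child_names = [node.nodeName for node in root.childNodes]

# First child of the root (title) and its next sibling (date)
first = root.firstChild
next_sib = first.nextSibling

# The text node inside date holds the value; its parent is article
value = next_sib.firstChild.nodeValue
parent_name = next_sib.parentNode.nodeName

print("root:", root.nodeName)
print("children:", child_names)
print("first child:", first.nodeName, "| next sibling:", next_sib.nodeName)
print("value:", value, "| parent:", parent_name)
```

The property names (documentElement, childNodes, firstChild, nextSibling, parentNode, nodeName, nodeValue) are the same W3C DOM names used in the JavaScript listing.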


Course map
Module 1: Document type definitions and validity Module 2: Attribute declarations in DTDs Module 3: Entities and external DTD subsets

Module 4: Embedding non XML data


Module 5: XML namespaces and parsers Module 6: XML schema Module 7: XPath Module 8: XSL


XML schema
An XML schema defines the structure of an XML document.
It defines the list of elements and attributes that can be used in an XML document.
It also specifies the order in which these elements appear in the XML document and their datatypes.

The XML Schema Definition (XSD) language was developed by the W3C; Microsoft was one of the contributing members.

It became a W3C Recommendation in May 2001 and is a standard way to define valid XML documents.

Advantages of XML schemas over DTDs


Both DTDs and XML schemas are used to define the structure of an XML document. Compared with a DTD, an XML schema offers the following advantages:
An XSD is itself written in XML syntax, so there is no separate syntax to learn.
It gives more control over the type of the data.
It enables the user to create their own data types.
It allows the user to specify restrictions on data.

XML schema - Datatypes


Primitive: string, decimal, float, boolean
Derived: integer, long, positiveInteger
Atomic: datatypes that cannot be broken down into smaller units. These can be primitive or derived.
List: derived datatypes that contain a set of values of an atomic datatype.
- Example: a pointlist value of 5 25 75 (the items of an XSD list are separated by whitespace)

XML schema Custom defined datatypes


Simple data type
A data type that contains only a value, with no child elements or attributes

Complex data type
A data type that can contain child elements, attributes and mixed content

XML schema custom data types


The elements PRODUCTNAME, DESCRIPTION, PRICE and QUANTITY are simple type elements, which do not contain any child elements or attributes. The elements only contain textual value. The elements PRODUCTDATA and PRODUCT are complex type elements that contain child elements, attributes and mixed content.


Declaring a simple type element


Syntax
<xs:element name="element-name" type="data type" minOccurs="nonNegativeInteger" maxOccurs="nonNegativeInteger"/>

Example:
<xs:element name="PRODUCTNAME" type="xs:string"/>
<xs:element name="PRICE" type="xs:positiveInteger"/>

Declaring a simple type based on existing simple datatype


<xs:simpleType name="phoneno">
  <xs:restriction base="xs:string">
    <xs:length value="12"/>
    <xs:pattern value="\d{3}-\d{3}-\d{4}"/>
  </xs:restriction>
</xs:simpleType>

It defines a simple datatype called phoneno. The string value must be 12 characters long (10 digits plus two hyphens) and must match the pattern ddd-ddd-dddd.
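The effect of the pattern facet can be approximated outside a schema validator. The sketch below uses Python's re module rather than an XSD processor; the function name is_valid_phoneno is made up for illustration. XSD patterns must match the whole value, so fullmatch() is the closest equivalent, and any value that matches ddd-ddd-dddd is necessarily 12 characters long.

```python
import re

# XSD patterns are implicitly anchored at both ends, hence fullmatch() below.
PHONENO_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_phoneno(value: str) -> bool:
    """Return True if value has the form ddd-ddd-dddd (hypothetical helper)."""
    return PHONENO_PATTERN.fullmatch(value) is not None


for candidate in ["555-123-4567", "5551234567", "555-123-45678"]:
    print(candidate, is_valid_phoneno(candidate))
```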


Various options of string data type


length
minLength
maxLength
pattern
enumeration

Creating user-defined simple datatype

num is the user-defined simple datatype

<xs:simpleType name="num">
  <xs:restriction base="xs:positiveInteger">
    <xs:maxInclusive value="400"/>
    <xs:minInclusive value="10"/>
  </xs:restriction>
</xs:simpleType>
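As a rough illustration of what the minInclusive and maxInclusive facets enforce, the following Python sketch accepts only whole numbers from 10 to 400. It is a simplified stand-in for a schema validator (for example, it ignores the optional leading plus sign that xs:positiveInteger allows), and the name is_valid_num is invented for this example.

```python
def is_valid_num(text: str) -> bool:
    """Emulate the num type: an integer between 10 and 400 inclusive."""
    text = text.strip()
    # Accept plain ASCII digits only; this rejects signs, decimals and letters.
    if not text.isascii() or not text.isdigit():
        return False
    return 10 <= int(text) <= 400


for candidate in ["10", "400", "9", "401", "12.5"]:
    print(candidate, is_valid_num(candidate))
```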


Associating an element with a simple data type


<xs:element name="EMPNAME" type="xs:string"/>
<xs:element name="EMPPHONE" type="phoneno"/>

Declaring a complex type element


Syntax
<xs:complexType name="data type name">
  .
  .
</xs:complexType>

Example:
<!-- referenced by the name prddata -->
<xs:complexType name="prddata">
  <xs:sequence>
    <xs:element name="PRODUCTNAME" type="xs:string"/>
    <xs:element name="DESCRIPTION" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

Namespace used in XML-schema


A namespace is identified by a URI string, such as http://www.microsoft.com or http://www.w3.org/2001/XMLSchema.

Example:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

It specifies that we have decided to use the prefix xs to identify the elements that are defined in the XML Schema namespace.

Declaring an attribute in an XML schema


Syntax:
<xs:attribute name="attribute-name" ref="attribute-name" type="datatype-name" use="value" default="value" fixed="value"/>

name: specifies the name of a user-defined attribute
ref: references a user-defined attribute declared elsewhere (used instead of name and type)
type: specifies the datatype of the attribute value
- Example: type="xs:string" or type="myphone"

Declaring an attribute in an XML schema


use: specifies the way in which an attribute can be used in an XML document

optional (the default)
<xs:attribute name="baseprice" type="xs:integer" use="optional"/>

required
<xs:attribute name="baseprice" type="xs:integer" use="required"/>

In the W3C XML Schema Recommendation, default and fixed values are given by their own attributes rather than by use:

default
<xs:attribute name="baseprice" type="xs:integer" default="25"/>

fixed
<xs:attribute name="baseprice" type="xs:integer" fixed="600"/>

Global Attributes
Global attributes are attributes that are declared outside all element declarations.
They facilitate reusability of attributes.
We need to use the <xs:schema> element as the parent element.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="NAME" type="xs:string"/>
</xs:schema>

Global attributes cannot include the use attribute.

Global Attributes (continued)


<xs:element name="Book" type="booktype"/>
<xs:complexType name="booktype">
  .
  .
  <xs:attribute ref="NAME"/>
</xs:complexType>

<xs:element name="Author" type="Authortype"/>
<xs:complexType name="Authortype">
  .
  .
  <xs:attribute ref="NAME"/>
</xs:complexType>

Mechanism to restrict the values


<xs:attribute name="PRODID" type="pID" use="required"/>

<xs:simpleType name="pID">
  <xs:restriction base="xs:string">
    <xs:pattern value="[P][1]\d{3}"/>
  </xs:restriction>
</xs:simpleType>
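The pattern [P][1]\d{3} accepts a capital P, the digit 1, and exactly three more digits. A minimal Python sketch of the same check follows; it uses the re module instead of a schema validator, and is_valid_pid is a hypothetical helper name.

```python
import re

# Same regular expression as the pattern facet; fullmatch() mirrors the
# whole-value matching that XSD applies to patterns.
PID_PATTERN = re.compile(r"[P][1]\d{3}")


def is_valid_pid(value: str) -> bool:
    """Return True for product IDs such as P1001 (hypothetical helper)."""
    return PID_PATTERN.fullmatch(value) is not None


for candidate in ["P1001", "P2001", "p1001", "P10011"]:
    print(candidate, is_valid_pid(candidate))
```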


Grouping elements and attributes


Grouping allows us to combine related elements and attributes into groups. This feature enables us to perform the following tasks:
Create a reusable group of elements and attributes
Select a single element from a group
Specify the sequence of elements

Grouping elements and attributes


The following elements can be used to group user-defined elements and attributes:
sequence
group
choice
all
attributeGroup

Grouping elements
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:group name="empname">
    <xs:sequence>
      <xs:element name="FIRSTNAME" type="xs:string"/>
      <xs:element name="LASTNAME" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="EMPLOYEE" type="emptype"/>
  <xs:complexType name="emptype">
    <xs:sequence>
      <xs:group ref="empname"/>
      <xs:element name="ADDRESS" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

Course map
Module 1: Document type definitions and validity Module 2: Attribute declarations in DTDs Module 3: Entities and external DTD subsets

Module 4: Embedding non XML data


Module 5: XML namespaces and parsers Module 6: XML schema Module 7: XPath Module 8: XSL


Introduction

XML Path Language (XPath)
Syntax for locating information in an XML document
- e.g., attribute values

String-based language of expressions
- Not a structural language like XML

Used by other XML technologies
- XSLT
- XPointer

Nodes in an XML document


XPath views an XML document as a tree structure with nodes.
Each node represents part of the XML document.

The seven types of nodes are:
- Root
- Element
- Attribute
- Text
- Comment
- Processing instruction
- Namespace

Attributes and namespaces are not children of their parent node, but they describe their parent node.

Example for nodes


<?xml version = "1.0"?>

<!-- Fig.: simple.xml -->
<!-- Simple XML document -->

<book title = "C++ How to Program" edition = "3">

   <sample>
      <![CDATA[

         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>

   C++ How to Program by Deitel &amp; Deitel
</book>

Root node: the document as a whole
Comment nodes: the two comments before the book element
Element nodes: book and sample
Attribute nodes: title and edition
Text nodes: the CDATA content of sample and the text that follows it inside book

XPath tree for simple.xml

Root
   Comment: Fig.: simple.xml
   Comment: Simple XML document
   Element: book
      Attribute: title = C++ How to Program
      Attribute: edition = 3
      Element: sample
         Text: // C++ comment if ( this->getX() < 5 && value[ 0 ] != 3 ) cerr << this->displayError();
      Text: C++ How to Program by Deitel & Deitel

Example - Nodes
<?xml version = "1.0"?>

<!-- Fig.: simple2.xml -->
<!-- Processing instructions and namespaces -->

<html xmlns = "http://www.w3.org/TR/REC-html40">

   <head>
      <title>Processing Instruction and Namespace Nodes</title>
   </head>

   <?deitelprocessor example = "fig11_03.xml"?>

   <body>

      <deitel:book deitel:edition = "1"
         xmlns:deitel = "http://www.deitel.com/xmlhtp1">
         <deitel:title>XML How to Program</deitel:title>
      </deitel:book>

   </body>

</html>

Root node: the document as a whole
Comment nodes: the two comments before the html element
Namespace nodes: the xmlns and xmlns:deitel declarations
Processing instruction node: deitelprocessor
Element nodes: html, head, title, body, deitel:book and deitel:title
Attribute nodes: deitel:edition
Text nodes: the content of title and of deitel:title

Tree diagram of an XML document with a processing-instruction node

Root
   Comment: Fig.: simple2.xml
   Comment: Processing instructions and namespaces
   Element: html
      Namespace: http://www.w3.org/TR/REC-html40
      Element: head
         Element: title
            Text: Processing Instruction and Namespace Nodes

Tree diagram of an XML document with a processing-instruction node (continued)

   Element: html (continued)
      Processing instruction: deitelprocessor example = "fig11_03.xml"
      Element: body
         Element: book
            Attribute: edition = 1
            Namespace: http://www.deitel.com/xmlhtp1
            Element: title
               Text: XML How to Program

XPath node types


Node type: root
   Description: Represents the root of an XML document. This node exists only at the top of the tree and may contain element, comment or processing-instruction children.
   string-value: Determined by concatenating the string-values of all text-node descendants in document order.
   expanded-name: None.

Node type: element
   Description: Represents an XML element and may contain element, text, comment or processing-instruction children.
   string-value: Determined by concatenating the string-values of all text-node descendants in document order.
   expanded-name: The element tag, including the namespace prefix (if applicable).

Node type: attribute
   Description: Represents an attribute of an element.
   string-value: The normalized value of the attribute.
   expanded-name: The name of the attribute, including the namespace prefix (if applicable).

XPath node types (continued)


Node type: text
   Description: Represents the character data content of an element.
   string-value: The character data contained in the text node.
   expanded-name: None.

Node type: comment
   Description: Represents an XML comment.
   string-value: The content of the comment (not including <!-- and -->).
   expanded-name: None.

Node type: processing instruction
   Description: Represents an XML processing instruction.
   string-value: The part of the processing instruction that follows the target and any whitespace.
   expanded-name: The target of the processing instruction.

Node type: namespace
   Description: Represents an XML namespace.
   string-value: The URI of the namespace.
   expanded-name: The namespace prefix.

XPath example
XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>

XPath example
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>9.90</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>

XPath expressions
To select the ROOT element catalog:
/catalog

To select all the cd elements of the catalog element:


/catalog/cd

To select all the price elements of all the cd elements of the catalog element:
/catalog/cd/price

Note: If the path starts with a slash (/), it represents an absolute path to an element.

To select all the cd elements that have a price element with a value larger than 10.80:
/catalog/cd[price>10.80]
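These expressions can be tried with Python's standard xml.etree.ElementTree module, sketched below. Two caveats apply: ElementTree implements only a subset of XPath, and its paths are written relative to the root element, so /catalog/cd becomes "cd" when evaluated from the catalog element. The numeric predicate is not supported, so it is emulated with an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

# /catalog : fromstring() returns the root element itself
root = ET.fromstring(CATALOG)

# /catalog/cd : all cd children of catalog
cds = root.findall("cd")

# /catalog/cd/price : all price children of those cd elements
prices = [price.text for price in root.findall("cd/price")]

# /catalog/cd[price>10.80] : emulated in Python, because ElementTree
# does not support numeric comparisons inside predicates
expensive = [cd.findtext("title") for cd in cds
             if float(cd.findtext("price")) > 10.80]

print(root.tag, len(cds), prices, expensive)
```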


Locating nodes
XML documents can be represented as a tree view of nodes.
XPath uses a pattern expression to identify nodes in an XML document.
An XPath pattern is a slash-separated list of child element names that describes a path through the XML document. The pattern "selects" elements that match the path.

The following XPath expression selects all the price elements of all the cd elements of the catalog element:
/catalog/cd/price

If the path starts with a slash ( / ), it represents an absolute path to an element.

Locating nodes (continued)


If the path starts with two slashes ( // ), then all elements in the document that fulfill the criteria will be selected (even if they are at different levels in the XML tree).

To select all the cd elements in the document:
//cd
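The difference between a child step and // only shows up when matching elements sit at different depths. The sketch below uses a hypothetical variant of the catalog in which one cd is nested inside an invented boxset element. It relies on Python's ElementTree, where ".//cd" evaluated from the root element plays the role of //cd.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the catalog: boxset is an invented wrapper element
# used only to place one cd at a deeper level.
DOC = """
<catalog>
  <cd><title>Empire Burlesque</title></cd>
  <boxset>
    <cd><title>Greatest Hits</title></cd>
  </boxset>
</catalog>
"""

root = ET.fromstring(DOC)

# /catalog/cd : only cd elements that are direct children of catalog
direct = [cd.findtext("title") for cd in root.findall("cd")]

# //cd : cd elements at any depth below the root
anywhere = [cd.findtext("title") for cd in root.findall(".//cd")]

print(direct)
print(anywhere)
```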


Selecting unknown elements


Wildcards ( * ) can be used to select unknown XML elements.

To select all the child elements of all the cd elements of the catalog element:
/catalog/cd/*

To select all the price elements that are grandchild elements of the catalog element:
/catalog/*/price

Selecting unknown elements (continued)


The following XPath expression selects all price elements that have exactly two ancestor elements (here catalog and cd):
/*/*/price

The following XPath expression selects all elements in the document:


//*
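The wildcard expressions map onto Python's ElementTree as sketched below, under the usual caveat that ElementTree paths are relative to the root element (so the leading /catalog or /* step is implied by starting at the root). The element iterator root.iter() stands in for //*.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)

# /catalog/cd/* : every child element of every cd
child_tags = [el.tag for el in root.findall("cd/*")]

# /catalog/*/price and /*/*/price : price elements that are grandchildren of the root
grandchild_prices = [el.text for el in root.findall("*/price")]

# //* : every element in the document, including the root element
all_elements = list(root.iter())

print(child_tags)
print(grandchild_prices)
print(len(all_elements))
```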


Selecting branches
Square brackets in an XPath expression hold a predicate that narrows the selection further.

To select the first cd child element of the catalog element:
/catalog/cd[1]

To select the last cd child element of the catalog element (Note: There is no function named first()):
/catalog/cd[last()]

Selecting branches (continued)


To select all the cd elements of the catalog element that have a price element:
/catalog/cd[price]

To select all the cd elements of the catalog element that have a price element with a value of 10.90:
/catalog/cd[price=10.90]

To select all the price elements of all the cd elements of the catalog element that have a price element with a value > 10.90:
/catalog/cd[price>10.90]/price
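Most of these predicates are available in Python's ElementTree, which makes them easy to experiment with. In the sketch below, note two differences from full XPath: ElementTree compares the price text as a string in cd[price='10.90'], and it has no greater-than predicate, so that case is filtered in plain Python.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)

first_cd = root.findall("cd[1]")            # /catalog/cd[1]
last_cd = root.findall("cd[last()]")        # /catalog/cd[last()]
with_price = root.findall("cd[price]")      # /catalog/cd[price]

# /catalog/cd[price=10.90] : ElementTree matches the text "10.90" literally
exact = [cd.findtext("title") for cd in root.findall("cd[price='10.90']")]

# /catalog/cd[price>10.90]/price : emulated with a Python filter
above = [cd.findtext("price") for cd in root.findall("cd")
         if float(cd.findtext("price")) > 10.90]

print(first_cd[0].findtext("title"), last_cd[0].findtext("title"))
print(len(with_price), exact, above)
```

With the sample catalog, no price is strictly greater than 10.90, so the last selection is empty.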


Selecting Attributes
In XPath, all attributes are specified by the @ prefix.

To select all attributes named country:
//@country

To select all cd elements which have an attribute named country:
//cd[@country]

Selecting Attributes (continued)


To select all cd elements which have any attribute:
//cd[@*]

To select all cd elements which have an attribute named country with a value of 'UK':
//cd[@country='UK']
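The attribute selections can be approximated in Python's ElementTree as follows. ElementTree supports [@country] and [@country='UK'] as predicates but cannot return attribute nodes or evaluate [@*], so those two cases are emulated by inspecting each element's attrib dictionary.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)

# //@country : emulated by collecting the attribute value from every element that has it
countries = [el.get("country") for el in root.iter() if "country" in el.attrib]

# //cd[@country] : cd elements that carry a country attribute
with_country = root.findall(".//cd[@country]")

# //cd[@*] : emulated, keeps cd elements whose attribute dictionary is not empty
with_any_attribute = [cd for cd in root.iter("cd") if cd.attrib]

# //cd[@country='UK'] : cd elements whose country attribute equals UK
uk_titles = [cd.findtext("title") for cd in root.findall(".//cd[@country='UK']")]

print(countries, len(with_country), len(with_any_attribute), uk_titles)
```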


Course map
Module 1: Document type definitions and validity Module 2: Attribute declarations in DTDs Module 3: Entities and external DTD subsets

Module 4: Embedding non XML data


Module 5: XML namespaces and parsers Module 6: XML schema Module 7: XPath Module 8: XSL


Need for XSL


XSL stands for EXtensible Stylesheet Language.
The World Wide Web Consortium (W3C) started to develop XSL because there was a need for an XML-based stylesheet language.
CSS = HTML style sheets
XSL = XML style sheets

What is XSLT?
XSLT stands for XSL Transformations
XSLT is the most important part of XSL
XSLT transforms an XML document into another XML document
XSLT uses XPath to navigate in XML documents
XSLT is a W3C Recommendation

XSL - More than a style sheet language

XSL consists of three parts:
XSLT - a language for transforming XML documents
XPath - a language for navigating in XML documents
XSL-FO - a language for formatting XML documents

XSL


Presenting XML
There are two style sheet languages available for use with XML in Internet Explorer:
Cascading Style Sheets (CSS)
Extensible Stylesheet Language (XSL)

An important point to consider in choosing a style sheet language for a particular document is whether the structure of the XML document is suitable for display. With CSS, the structure of the XML content must be virtually identical to the structure of the presentation. Since one of the goals of XML is a complete separation of content from display, many XML documents are difficult to display as you might wish using CSS.

Difference between CSS and XSL?


XML does not use predefined tags (we can use any tag names we like), so the meaning of these tags is not known to a browser. A <table> element could mean an HTML table, a piece of furniture, or something else, and a browser does not know how to display it.

XSL describes how the XML document should be displayed!

Books.XML
<?xml version="1.0"?>
<!--DOCTYPE books SYSTEM "books.dtd"-->
<?xml-stylesheet href="books.css" type="text/css"?>
<books>
  <book>
    <title>Professional Active Server Pages 3.0</title>
    <authors>
      <author>Richard Anderson</author>
      <author>Chris Blexrud</author>
      <author>Andrea Chiarelli</author>
      <author>Dan Denault</author>
    </authors>
    <price>us="$59.99"</price>
  </book>
</books>

Books.CSS
authors
{
  display:block;
  font-family:Arial,Helvetica;
  font-style:italic;
  font-size:10pt;
  color:#990099;
  font-weight:bold;
  margin-bottom:.4em;
}

price
{
  display:block;
  border:2px solid black;
  padding:1em;
  background-color:#888833;
  color:#FFFFDD;
  text-align:left;
}

XSL advantages
More sophisticated layout using HTML tables
Data items appearing more than once in the style sheet
Access to information stored in attribute values
Reordering of items
Dynamic display behaviors not easily possible through CSS or modifying of the source

XSL elements
There are various XSL elements that can be used for applying styles to the XML document. Here are a few of them:
xsl:for-each
xsl:value-of
xsl:if
xsl:sort
xsl:choose

Interview.XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Interview.xsl"?>
<Interview xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <Candidate>
    <Name>Mahesh</Name>
    <Project>Procter &amp; Gamble</Project>
    <Score dt:dt="number">88</Score>
  </Candidate>
  <Candidate>
    <Name>Vishnu</Name>
    <Project>Banking</Project>
    <Score dt:dt="number">99</Score>
  </Candidate>
  <Candidate>
    <Name>Sridhar</Name>
    <Project>Telecom</Project>
    <Score dt:dt="number">100</Score>
  </Candidate>
</Interview>

Interview.XSL
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <TABLE BORDER="2">
          <TR>
            <TD>Name</TD>
            <TD>Project</TD>
            <TD>Score</TD>
          </TR>

Interview.XSL (continued)
          <xsl:for-each select="Interview/Candidate">
            <TR>
              <TD><xsl:value-of select="Name"/></TD>
              <TD><xsl:value-of select="Project"/></TD>
              <TD><xsl:value-of select="Score"/></TD>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
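To make the effect of the template concrete, the following plain-Python sketch imitates what the style sheet does: it loops over each Candidate (the xsl:for-each step) and copies the text of Name, Project and Score into table cells (the xsl:value-of step). It is not an XSLT processor, only an emulation with the standard library, and the inline document is a simplified copy of Interview.xml without the dt namespace.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified copy of Interview.xml (assumption: the dt:dt attributes are omitted)
INTERVIEW = """
<Interview>
  <Candidate><Name>Mahesh</Name><Project>Procter &amp; Gamble</Project><Score>88</Score></Candidate>
  <Candidate><Name>Vishnu</Name><Project>Banking</Project><Score>99</Score></Candidate>
  <Candidate><Name>Sridhar</Name><Project>Telecom</Project><Score>100</Score></Candidate>
</Interview>
"""

root = ET.fromstring(INTERVIEW)

# Header row, as written literally in the template
rows = ["<TR><TD>Name</TD><TD>Project</TD><TD>Score</TD></TR>"]

# Emulates xsl:for-each select="Interview/Candidate"
for candidate in root.findall("Candidate"):
    # Emulates xsl:value-of for each of the three child elements
    cells = "".join(
        "<TD>" + escape(candidate.findtext(tag)) + "</TD>"
        for tag in ("Name", "Project", "Score")
    )
    rows.append("<TR>" + cells + "</TR>")

table = '<TABLE BORDER="2">' + "".join(rows) + "</TABLE>"
print(table)
```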

xsl:for-each
<xsl:for-each order-by="sort-criteria-list" select="pattern">

order-by
Sort criteria in a semicolon-separated list. When the first sort results in two equal items, the second sort criterion is checked, and so on. The first non-white-space character in each sort criterion indicates whether the sort is ascending (optional +) or descending (-). The sort criterion is expressed as an XSL pattern, relative to the pattern described in the select attribute.
Note: order-by belongs to the older http://www.w3.org/TR/WD-xsl dialect used in these examples; XSLT 1.0 sorts with a nested <xsl:sort> element instead.

select
XSL pattern query evaluated against the current context to determine the set of nodes to iterate over. The default value "node()" indicates selection of all children of the current node.

Xsl:for-each (continued)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <TABLE>
          <xsl:for-each select="customers/customer" order-by="name; -address/state">
            <TR>
              <TD><xsl:value-of select="name" /></TD>
              <TD><xsl:value-of select="address" /></TD>
              <TD><xsl:value-of select="phone" /></TD>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
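The ordering requested by order-by="name; -address/state" (name ascending, then state descending for equal names) can be illustrated in plain Python. The customers document below is hypothetical, with invented names and states, and the two-pass sort relies on Python's sort being stable.

```python
import xml.etree.ElementTree as ET

# Hypothetical customers document; names and states are invented for illustration.
CUSTOMERS = """
<customers>
  <customer><name>Smith</name><address><state>CA</state></address></customer>
  <customer><name>Adams</name><address><state>NY</state></address></customer>
  <customer><name>Smith</name><address><state>WA</state></address></customer>
</customers>
"""

root = ET.fromstring(CUSTOMERS)
customers = root.findall("customer")

# Python's sort is stable, so sort by the secondary key first
# (state, descending) and then by the primary key (name, ascending).
customers.sort(key=lambda c: c.findtext("address/state"), reverse=True)
customers.sort(key=lambda c: c.findtext("name"))

ordered = [(c.findtext("name"), c.findtext("address/state")) for c in customers]
print(ordered)
```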


xsl:value-of
Inserts the value of the selected node as text.
<xsl:value-of select="pattern"/>

select
XSL pattern to be matched against the current context. The default value is ".", which inserts the value of the current node.

Accessing the Attribute (example)


Go to Interview.XML and introduce an attribute called Skill on the Candidate element.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <TABLE BORDER="2">
          <TR>
            <TD>Name</TD>
            <TD>Project</TD>
            <TD>Score</TD>
          </TR>

Accessing the Attribute example (continued)


          <xsl:for-each select="Interview/Candidate[@Skill='COM']">
            <TR>
              <TD><xsl:value-of select="Name"/></TD>
              <TD><xsl:value-of select="Project"/></TD>
              <TD><xsl:value-of select="Score"/></TD>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

Accessing Attribute of child Node


Now introduce an attribute called DOB on the Name element and try this code.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <TABLE BORDER="2">
          <TR>
            <TD>Name</TD>
            <TD>Date of Birth</TD>
            <TD>Project</TD>
            <TD>Score</TD>
          </TR>

Accessing Attribute of child node (continued)


          <xsl:for-each select="Interview/Candidate[@Skill='COM']">
            <TR>
              <TD><xsl:value-of select="Name"/></TD>
              <TD><xsl:value-of select="Name/@DOB"/></TD>
              <TD><xsl:value-of select="Project"/></TD>
              <TD><xsl:value-of select="Score"/></TD>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

XSLT = XSL transformations


XSLT is the most important part of XSL. XSLT is used to transform an XML document into another XML document, or another type of document that is recognized by a browser, like HTML and XHTML. Normally XSLT does this by transforming each XML element into an (X)HTML element. With XSLT we can add/remove elements and attributes to or from the output file. We can also rearrange and sort elements, perform tests and make decisions about which elements to hide and display, and a lot more. A common way to describe the transformation process is to say that XSLT transforms an XML source-tree into an XML result-tree.


XSLT uses XPath


XSLT uses XPath to find information in an XML document.

XPath is used to navigate through elements and attributes in XML documents.


In the transformation process, XSLT uses XPath to define parts of the source document that should match one or more predefined templates. When a match is found, XSLT will transform the matching part of the source document into the result document.


Browsers supporting XML and XSLT


Mozilla Firefox
As of version 1.0.2, Firefox has support for XML and XSLT (and CSS).

Mozilla
Mozilla includes Expat for XML parsing and has support to display XML + CSS. Mozilla also has some support for Namespaces. Mozilla is available with an XSLT implementation.

Netscape
As of version 8, Netscape uses the Mozilla engine, and therefore it has the same XML / XSLT support as Mozilla.

Opera
As of version 9, Opera has support for XML and XSLT (and CSS). Version 8 supports only XML + CSS.

Internet Explorer
As of version 6, Internet Explorer supports XML, Namespaces, CSS, XSLT, and XPath. Version 5 is NOT compatible with the official W3C XSL Recommendation.

Style sheet declaration


The root element that declares the document to be an XSL style sheet is <xsl:stylesheet> or <xsl:transform>.

Note: <xsl:stylesheet> and <xsl:transform> are completely synonymous and either can be used!

The correct way to declare an XSL style sheet according to the W3C XSLT Recommendation is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

or:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Overview of XML transformations : Tree


Every well-formed XML document is a tree. A tree is a data structure composed of connected nodes beginning with a top node called the root. The root is connected to its child nodes, each of which is connected to zero or more children of its own, and so forth. Nodes that have no children of their own are called leaves.

The most useful property of a tree is that each node and its children also form a tree. Thus, a tree is a hierarchical structure of trees in which each tree is built out of smaller trees.
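Because every node together with its children is again a tree, tree algorithms are naturally recursive. The Python sketch below counts nodes and collects leaves by calling the same function on each subtree. It uses xml.etree.ElementTree, which exposes element nodes only (text, comments and attributes are not separate nodes there), and the tiny document and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET


def count_nodes(element):
    """Count the elements in the subtree rooted at element (itself included)."""
    return 1 + sum(count_nodes(child) for child in element)


def leaves(element):
    """Return the tags of all elements in the subtree that have no element children."""
    if len(element) == 0:
        return [element.tag]
    result = []
    for child in element:
        # Each child is the root of a smaller tree, so recurse into it.
        result.extend(leaves(child))
    return result


# Invented example document: a is the root, d, e and c are leaves.
root = ET.fromstring("<a><b><d/><e/></b><c/></a>")

print(count_nodes(root))      # whole tree
print(count_nodes(root[0]))   # subtree rooted at b
print(leaves(root))
```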


Overview of XML transformations:


View PeriodicTable.xml

The PERIODIC_TABLE element contains two child nodes, both ATOM elements. Each ATOM element has an attribute node for its STATE attribute, and a variety of child element nodes.

Each child element contains a node for its contents, as well as nodes for any attributes, comments and processing instructions it possesses.

Notice in particular that many nodes are something other than elements. There are nodes for text, attributes, comments, namespaces and processing instructions.

XSLT transformation
The input must be an XML document.
XSLT can work with HTML and SGML documents only if they also satisfy XML's well-formedness rules.
XSLT is not a general-purpose regular expression language for transforming arbitrary data.
The XSL transformation language contains operators for selecting nodes from the tree, reordering the nodes, and outputting nodes.
Most of the time the output of an XSLT transformation is also an XML document.
Many XSLT processors also support output as HTML and/or raw text, although the standard does not require them to do so.

XSLT transformation (continued)


For the purposes of XSLT, elements, attributes, namespaces, processing instructions, comments and text are counted as nodes. Furthermore, the root of the document must be distinguished from the root element. Thus, XSLT processors model an XML document as a tree that contains seven kinds of nodes:
- The root
- Elements
- Text
- Attributes
- Namespaces
- Processing instructions
- Comments

XSLT transformation (continued)


The root PERIODIC_TABLE element contains ATOM child elements. Each ATOM element contains several child elements providing the atomic number, atomic weight, symbol, boiling point, and so forth. A UNITS attribute specifies the units for those elements that have units.


Style sheet declaration (continued)


To get access to the XSLT elements, attributes and features, we must declare the XSLT namespace at the top of the document. The declaration xmlns:xsl="http://www.w3.org/1999/XSL/Transform" points to the official W3C XSLT namespace. If we use this namespace, we must also include the attribute version="1.0".

Creating a raw XML document


We want to transform the following XML document ("cdcatalog.xml") into XHTML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  .
</catalog>

Create an XSL style sheet


Then you create an XSL Style Sheet ("cdcatalog.xsl") with a transformation template:
<?xml version="1.0" encoding="ISO-8859-1"?> <xsl:stylesheet version="1.0 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html>

<body>
<h2>My CD Collection</h2> <table border="1"> <tr bgcolor="#9acd32"> <th align="left">Title</th> <th align="left">Artist</th></tr>

Create an XSL style sheet (continued)


          <xsl:for-each select="catalog/cd">
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="artist"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>


Link the XSL style sheet to the XML document

Add the XSL style sheet reference to your XML document ("cdcatalog.xml"):
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>

View cdcatalog.xml View cdcatalog_with_xsl.xml
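
A short sketch of where the reference goes, assuming the single-CD catalog shown earlier. The processing instruction belongs in the prolog, after the XML declaration and before the root element.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- The style sheet reference sits in the prolog, before the root element. -->
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
</catalog>
```

A relative href such as cdcatalog.xsl is resolved against the location of the XML document, so in this sketch both files are assumed to sit in the same directory.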


XSL templates
An XSL style sheet consists of one or more sets of rules called templates. Each template contains the rules to apply when a specified node is matched.


The <xsl:template> element


The <xsl:template> element is used to build templates. The match attribute is used to associate a template with an XML element. The match attribute can also be used to define a template for the entire XML document. The value of the match attribute is an XPath pattern (for example, match="/" matches the root of the whole document).


An example
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>My CD Collection</h2>
        <table border="1">
          <tr bgcolor="#9acd32">
            <th align="left">Title</th>
            <th align="left">Artist</th>
          </tr>
          <tr>
            <td>.</td>
            <td>.</td>
          </tr>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>


Explanation
Since an XSL style sheet is itself an XML document, it begins with the XML declaration: <?xml version="1.0"?>.
The next element, <xsl:stylesheet>, defines that this document is an XSLT style sheet document (along with the version number and XSLT namespace attributes).
The <xsl:template> element defines a template. The match="/" attribute associates the template with the root of the XML source document.
The content inside the <xsl:template> element defines some HTML to write to the output.
The last two lines define the end of the template and the end of the style sheet.

The result of the transformation above is an HTML page with the heading "My CD Collection" and a bordered table that has Title and Artist header cells and one row whose two cells each contain a period.


Where does the XML transformation happen?


There are three primary ways to transform XML documents into other formats, such as HTML, with an XSLT style sheet:
The XML document and associated style sheet are both served to the client (Web browser), which then transforms the document as specified by the style sheet and presents it to the user.
The server applies an XSLT style sheet to an XML document to transform it to some other format (generally HTML) and sends the transformed document to the client (Web browser).
A third program transforms the original XML document into some other format (often HTML) before the document is placed on the server. Both server and client only deal with the transformed document.


The <xsl:value-of>
The <xsl:value-of> element is used to extract the value of a selected node. It can be used to extract the value of an XML element and add it to the output stream of the transformation.
View cdcatalog_valueof.xsl
View cdcatalog_valueof.xml
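
A minimal sketch of <xsl:value-of>, assuming the cdcatalog.xml document shown earlier as input. The paragraph layout is an illustrative choice, not the content of the linked example files.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <!-- In XSLT 1.0, value-of outputs the string value of the first
             selected node in document order, even if several nodes match. -->
        <p>Title: <xsl:value-of select="catalog/cd/title"/></p>
        <p>Artist: <xsl:value-of select="catalog/cd/artist"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because only the first matching node is used, a catalog with many cd elements still produces a single title and artist here. Listing every CD needs a loop or template rules.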


The <xsl:for-each>
The <xsl:for-each> element allows you to do looping in XSLT. The element can be used to select every XML element of a specified node set:

<xsl:for-each select="catalog/cd">
  <tr>
    <td><xsl:value-of select="title"/></td>
    <td><xsl:value-of select="artist"/></td>
  </tr>
</xsl:for-each>

Note: The value of the select attribute is an XPath expression. An XPath expression works like navigating a file system, where a forward slash (/) selects subdirectories.

Filtering the output


We can also filter the output from the XML file by adding a criterion to the select attribute in the <xsl:for-each> element.

<xsl:for-each select="catalog/cd[artist='Bob Dylan']">

Legal filter operators are:
=     (equal)
!=    (not equal)
&lt;  (less than)
&gt;  (greater than)


Filtering the output


Take a look at the adjusted XSL style sheet:

<xsl:for-each select="catalog/cd[artist='Bob Dylan']">
  <tr>
    <td><xsl:value-of select="title"/></td>
    <td><xsl:value-of select="artist"/></td>
  </tr>
</xsl:for-each>


The <xsl:sort>
To sort the output, simply add an <xsl:sort> element inside the <xsl:for-each> element in the XSL file:

<xsl:for-each select="catalog/cd">
  <xsl:sort select="artist"/>
  <tr>
    <td><xsl:value-of select="title"/></td>
    <td><xsl:value-of select="artist"/></td>
  </tr>
</xsl:for-each>

Note: The select attribute indicates what XML element to sort on.
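
The sort can be refined with the standard order and data-type attributes of <xsl:sort>. The following is a sketch only, assuming the cdcatalog.xml structure shown earlier; the choice of price as the sort key is an illustration, not something the slides prescribe.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <table border="1">
      <xsl:for-each select="catalog/cd">
        <!-- data-type="number" compares prices as numbers, not as text -->
        <xsl:sort select="price" data-type="number" order="descending"/>
        <!-- a second sort key breaks ties between equal prices -->
        <xsl:sort select="title"/>
        <tr>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="price"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Without data-type="number" the default text comparison would place 9.90 after 10.90, which is rarely what a price list wants.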


The <xsl:if> element


To put a conditional if test against the content of the XML file, add an <xsl:if> element to the XSL document.

Syntax

<xsl:if test="expression">
  ... some output if the expression is true ...
</xsl:if>


Where to Put the <xsl:if> element


<xsl:for-each select="catalog/cd">
  <xsl:if test="price &gt; 10">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="artist"/></td>
    </tr>
  </xsl:if>
</xsl:for-each>

Note: The value of the required test attribute contains the expression to be evaluated. The code above outputs the title and artist elements only for the CDs that have a price higher than 10.

The <xsl:choose> element


The <xsl:choose> element is used in conjunction with <xsl:when> and <xsl:otherwise> to express multiple conditional tests.

Syntax

<xsl:choose>
  <xsl:when test="expression">
    ... some output ...
  </xsl:when>
  <xsl:otherwise>
    ... some output ...
  </xsl:otherwise>
</xsl:choose>


Where to put the choose condition

To insert a multiple conditional test against the XML file, add the <xsl:choose>, <xsl:when>, and <xsl:otherwise> elements to the XSL file:
<xsl:for-each select="catalog/cd">
  <tr>
    <td><xsl:value-of select="title"/></td>
    <xsl:choose>
      <xsl:when test="price &gt; 10">
        <td bgcolor="#ff00ff"><xsl:value-of select="artist"/></td>
      </xsl:when>
      <xsl:otherwise>
        <td><xsl:value-of select="artist"/></td>
      </xsl:otherwise>
    </xsl:choose>
  </tr>
</xsl:for-each>



The <xsl:choose> with <xsl:when> element


<xsl:for-each select="catalog/cd">
  <tr>
    <td><xsl:value-of select="title"/></td>
    <xsl:choose>
      <xsl:when test="price &gt; 10">
        <td bgcolor="#ff00ff"><xsl:value-of select="artist"/></td>
      </xsl:when>
      <xsl:when test="price &gt; 9">
        <td bgcolor="#cccccc"><xsl:value-of select="artist"/></td>
      </xsl:when>
      <xsl:otherwise>
        <td><xsl:value-of select="artist"/></td>
      </xsl:otherwise>
    </xsl:choose>
  </tr>
</xsl:for-each>

Processing the child elements : <xsl:apply-templates>


The <xsl:apply-templates> element applies a template to the current element or to the current element's child nodes. If we add a select attribute to the <xsl:apply-templates> element it will process only the child element that matches the value of the attribute. We can use the select attribute to specify the order in which the child nodes are processed.
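
One way the description above might look in practice, assuming the cdcatalog.xml structure shown earlier. The output markup is an illustrative choice and may differ from the linked example files.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <h2>My CD Collection</h2>
        <!-- No select attribute: process all child nodes of the current node. -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="cd">
    <p>
      <!-- select limits processing to the named children, in this order. -->
      <xsl:apply-templates select="title"/>
      <xsl:apply-templates select="artist"/>
    </p>
  </xsl:template>

  <xsl:template match="title">
    Title: <xsl:value-of select="."/><br/>
  </xsl:template>

  <xsl:template match="artist">
    Artist: <xsl:value-of select="."/><br/>
  </xsl:template>

</xsl:stylesheet>
```

There is no rule for catalog in this sketch. The built-in template rule for elements simply applies templates to its children, which is how processing reaches each cd.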


Processing the child elements : <xsl:apply-templates>


View cdcatalog_applytemplate.xsl View cdcatalog_applytemplate.xml


Wild cards
Sometimes you want a single template to apply to more than one element. You can indicate that a template matches all elements by using the asterisk wildcard (*) in place of an element name in the match attribute. For example, this template says that all elements should be wrapped in a P element:

<xsl:template match="*">
  <P>
    <xsl:value-of select="."/>
  </P>
</xsl:template>

Of course, this is probably more than you want. We'd like to use the template rules already defined for PERIODIC_TABLE and ATOM elements as well as the root node, and only use this rule for the other elements.
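
In XSLT 1.0 this combination works without extra effort, because a template whose pattern is a plain element name has a higher default priority than one whose pattern is the * wildcard. The sketch below illustrates that; the bodies of the PERIODIC_TABLE and ATOM rules are placeholders, not the rules defined earlier in the course.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Specific rules: a name pattern such as ATOM has default priority 0. -->
  <xsl:template match="PERIODIC_TABLE">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

  <xsl:template match="ATOM">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <!-- Fallback rule: the pattern * has default priority -0.5, so it is
       chosen only for elements that no more specific rule matches. -->
  <xsl:template match="*">
    <P><xsl:value-of select="."/></P>
  </xsl:template>

</xsl:stylesheet>
```

If the defaults ever need overriding, the priority attribute of <xsl:template> sets the value explicitly.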

Matching by ID
We may want to apply a particular style to a particular single element without changing all other elements of that type. The simplest way to do that in XSLT is to attach a style to the element's ID-type attribute. This is done with the id() selector, which contains the ID value in single quotes. For example, this rule makes the element with the ID e47 bold:

<xsl:template match="id('e47')">
  <b><xsl:value-of select="."/></b>
</xsl:template>


Matching attributes with @


The @ sign matches against attributes and selects nodes according to attribute names. Simply prefix the name of the attribute that you want to select with the @ sign. For example, this template rule matches UNITS attributes, and wraps them in an I element.

<xsl:template match="@UNITS">
  <I><xsl:value-of select="."/></I>
</xsl:template>
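
A rule that matches attributes fires only when attributes are explicitly selected, because attributes are not child nodes and a plain <xsl:apply-templates/> skips them. The sketch below assumes the units are carried on an element named BOILING_POINT; that element name is an assumption for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="BOILING_POINT">
    <p>
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <!-- Attributes must be selected explicitly with the @ sign. -->
      <xsl:apply-templates select="@UNITS"/>
    </p>
  </xsl:template>

  <xsl:template match="@UNITS">
    <I><xsl:value-of select="."/></I>
  </xsl:template>

</xsl:stylesheet>
```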


Expression types
Every expression evaluates to a single value. There are five types of expressions in XSLT:
Node sets
Booleans
Numbers
Strings
Result tree fragments


Node sets
A node set is an unordered group of nodes from the input document. The axes return a node set containing the nodes they match. Which nodes are in the node set depends on the context node, the node test, and the axis. For example, when the context node is the PERIODIC_TABLE element, the XPath expression

select="child::ATOM"

returns a node set that contains both ATOM elements in that document, and

select="child::ATOM/child::NAME"

returns a node set containing the two element nodes <NAME>Hydrogen</NAME> and <NAME>Helium</NAME>.
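
For reference, a two-atom input document of the kind these expressions assume might look as follows. This is a hypothetical sketch: the element names follow the slides, but the exact layout and the set of child elements are assumptions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical two-atom input; the layout is assumed for illustration. -->
<PERIODIC_TABLE>
  <!-- child::ATOM (context node: PERIODIC_TABLE) selects both ATOM elements. -->
  <ATOM>
    <!-- child::ATOM/child::NAME selects this NAME element ... -->
    <NAME>Hydrogen</NAME>
    <SYMBOL>H</SYMBOL>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
  </ATOM>
  <ATOM>
    <!-- ... and this one. -->
    <NAME>Helium</NAME>
    <SYMBOL>He</SYMBOL>
    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>
    <ATOMIC_WEIGHT>4.0026</ATOMIC_WEIGHT>
  </ATOM>
</PERIODIC_TABLE>
```

Since child is the default axis, the abbreviated forms ATOM and ATOM/NAME select the same node sets.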


Context node
The context node is a member of the context node list. The context node list is that group of elements that all match the same rule at the same time, generally as a result of one xsl:apply-templates or xsl:for-each call.


Functions that operate on or return node sets


Function: position()
Return type: number
Returns: The position of the context node in the context node list; the first node in the list has position 1.

Function: last()
Return type: number
Returns: The number of nodes in the context node list; this is the same as the position of the last node in the list.

Function: count(node-set)
Return type: number
Returns: The number of nodes in node-set.

Function: id("id1 id2 id3")
Return type: node set
Returns: A node set containing all the elements anywhere in the same document that have an ID named in the white-space-separated argument list; the empty set if no element has the specified ID.

Function: key(string name, object value)
Return type: node set
Returns: A node set containing all nodes in this document that have a key with the specified value. Keys are set with the top-level xsl:key element.

Function: document(string URI, node-set base)
Return type: node set
Returns: A node set in the document referred to by the URI; the nodes are chosen from the named anchor or XPointer used by the URI. If there is no named anchor or XPointer, then the root node of the named document is the node set. Relative URIs are resolved against the base URI of the first node in the second argument. If the second argument is omitted, then relative URIs are relative to the URI of the style sheet (not the source document!).


Functions that operate on or return node sets


Function: local-name(node set)
Return type: String
Returns: The local name (everything after the namespace prefix) of the first node in the node set argument; can be used without any arguments to get the local name of the context node.

Function: namespace-uri(node set)
Return type: String
Returns: The URI of the namespace of the first node in the node set; can be used without any arguments to get the URI of the namespace of the context node; returns an empty string if the node is not in a namespace.

Function: name(node set)
Return type: String
Returns: The qualified name (both prefix and local part) of the first node in the node set argument; can be used without an argument to get the qualified name of the context node.

Function: generate-id(node set)
Return type: String
Returns: A unique identifier for the first node in the argument node set; can be used without any argument to generate an ID for the context node.


Position(): Example
The position() function can be used to determine an element's position within a node set. The example prefixes each atom's name with its position in the document using
<xsl:value-of select="position()"/>.

View periodictable_position.xsl
View periodictable_position.xml
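
A minimal sketch of the idea, assuming a PERIODIC_TABLE root whose ATOM children each have a NAME child. It is not the content of the linked example files.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="PERIODIC_TABLE">
    <html>
      <body>
        <xsl:for-each select="ATOM">
          <!-- position() counts within the nodes selected by for-each, from 1 -->
          <p><xsl:value-of select="position()"/>. <xsl:value-of select="NAME"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

For an input whose ATOM elements are named Hydrogen and Helium, in that order, the two paragraphs read "1. Hydrogen" and "2. Helium".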


Booleans
A Boolean has one of two values: true or false. XSLT allows any kind of data to be transformed into a Boolean. This is often done implicitly when a string or a number or a node set is used where a Boolean is expected, as in the test attribute of an xsl:if element. These conversions can also be performed by the boolean() function, which converts an argument of any type to a Boolean according to these rules:
A number is false if it's zero or NaN (a special symbol meaning Not a Number, used for the result of undefined operations such as 0 div 0); true otherwise.
An empty node set is false. All other node sets are true.
An empty result tree fragment is false. All other result tree fragments are true.
A zero-length string is false. All other strings are true.
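
The implicit conversions can be seen in the test attribute of xsl:if. The sketch below assumes ATOM elements with NAME and optional BOILING_POINT children and a UNITS attribute on BOILING_POINT; those names are assumptions for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="ATOM">
    <p>
      <xsl:value-of select="NAME"/>
      <!-- Node set to Boolean: true only if a BOILING_POINT child exists. -->
      <xsl:if test="BOILING_POINT">
        <xsl:text> boils at </xsl:text>
        <xsl:value-of select="BOILING_POINT"/>
      </xsl:if>
      <!-- String to Boolean: true only if the UNITS value is not empty. -->
      <xsl:if test="string(BOILING_POINT/@UNITS)">
        <xsl:text> </xsl:text>
        <xsl:value-of select="BOILING_POINT/@UNITS"/>
      </xsl:if>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The first test is true even when the element is present but empty, while the second is false for a missing or empty attribute. The two rules answer slightly different questions.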


Booleans (continued)
Booleans are also produced as the result of expressions involving these operators:
=   equal to
!=  not equal to
<   less than (really &lt;)
>   greater than
<=  less than or equal to (really &lt;=)
>=  greater than or equal to

Note: The < sign is illegal in attribute values. Consequently, it must be replaced by &lt; even when used as the less-than operator.


Booleans (continued)
child::ATOM selects all the ATOM children of the context node. child::ATOM[position()=1] selects only the first ATOM child of the context node. [position()=1] is a predicate on the node test ATOM that returns a Boolean result: true if the position of the ATOM is equal to one; false otherwise.

Each node test can have any number of predicates. However, more than one is unusual.


Example of boolean operators


For example, this template rule applies to the first ATOM element in the periodic table, but not to subsequent ones, by testing whether or not the position of the element equals 1.

<xsl:template match="PERIODIC_TABLE/ATOM[position()=1]">
  <xsl:value-of select="."/>
</xsl:template>

This template rule applies to all ATOM elements that are not the first child element of the PERIODIC_TABLE by testing whether the position is greater than 1:

<xsl:template match="PERIODIC_TABLE/ATOM[position()>1]">
  <xsl:value-of select="."/>
</xsl:template>


Example of boolean operators (continued)


The and operator is true only if both operands are true. This template rule matches ATOMIC_NUMBER elements that are both the first and the last ATOMIC_NUMBER child of their parent, that is, the only one:

<xsl:template match="ATOMIC_NUMBER[position()=1 and position()=last()]">
  <xsl:value-of select="."/>
</xsl:template>

If the first condition is false, then the complete and expression is guaranteed to be false. Consequently, the second condition won't be checked.

The or operator is true if either operand is true. This template matches both the first and last ATOM elements in their parent by matching when the position is 1 or when the position is equal to the number of elements in the set:

<xsl:template match="ATOM[position()=1 or position()=last()]">
  <xsl:value-of select="."/>
</xsl:template>

Example of boolean operators (continued)


The not() function reverses the result of an operation. For example, this template rule matches all ATOM elements that are not the first child of their parents:

<xsl:template match="ATOM[not(position()=1)]">
  <xsl:value-of select="."/>
</xsl:template>

The same template rule could be written using the not-equal operator != instead:

<xsl:template match="ATOM[position()!=1]">
  <xsl:value-of select="."/>
</xsl:template>


Number functions
XPath numbers are 64-bit IEEE 754 floating-point doubles. Even numbers like 42 or -7000 that look like integers are stored as doubles.

Non-numeric values such as strings and Booleans are converted to numbers automatically as necessary, or at user request through the number() function, using these rules:
Booleans are 1 if true; 0 if false.
A string is trimmed of leading and trailing white space, then converted to a number in the fashion you would expect; for example, the string "12" is converted to the number 12. If the string cannot be interpreted as a number, then it is converted to the special symbol NaN, which stands for Not a Number.
Node sets and result tree fragments are converted to strings; the string is then converted to a number.
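
The conversion rules above can be tried with a small style sheet that does not depend on the input document. This is a sketch; the results, a, b, c, and d element names are arbitrary output markup chosen for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <results>
      <!-- Leading and trailing white space is ignored: outputs 13 -->
      <a><xsl:value-of select="number(' 12 ') + 1"/></a>
      <!-- Boolean true becomes 1 -->
      <b><xsl:value-of select="number(true())"/></b>
      <!-- A string that is not a number becomes NaN -->
      <c><xsl:value-of select="number('twelve')"/></c>
      <!-- NaN is never equal to itself: outputs false -->
      <d><xsl:value-of select="number('twelve') = number('twelve')"/></d>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

The last line hints at a common idiom: number(x) = number(x) is true exactly when x converts to a real number.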

Number functions (continued)


For example: This template only outputs the non-naturally occurring transuranium elements; that is, those elements with atomic numbers greater than 92 (the atomic number of uranium).

The node set produced by ATOMIC_NUMBER is implicitly converted to the string value of the current ATOMIC_NUMBER node. This string is then converted into a number.

<xsl:template match="/PERIODIC_TABLE">
  <HTML>
    <HEAD>
      <TITLE>The Transuranium Elements</TITLE>
    </HEAD>
    <BODY>
      <xsl:apply-templates select="ATOM[ATOMIC_NUMBER>92]"/>
    </BODY>
  </HTML>
</xsl:template>


Number functions (continued)


XPath provides the standard four arithmetic operators, plus mod for the remainder:
+ for addition
- for subtraction
* for multiplication
div for division (the more common / is already used for other purposes in XPath)
mod for the remainder of a division


Number functions: Example


For example, this rule selects those elements whose atomic weight is more than twice their atomic number:

<xsl:template match="/PERIODIC_TABLE">
  <HTML>
    <BODY>
      <H1>High Atomic Weight to Atomic Number Ratios</H1>
      <xsl:apply-templates select="ATOM[ATOMIC_WEIGHT > 2 * ATOMIC_NUMBER]"/>
    </BODY>
  </HTML>
</xsl:template>


Number functions example (continued)


This template actually prints the ratio of atomic weight to atomic number:

<xsl:template match="ATOM">
  <p>
    <xsl:value-of select="NAME"/>
    <xsl:value-of select="ATOMIC_WEIGHT div ATOMIC_NUMBER"/>
  </p>
</xsl:template>


Number functions example (continued)


XPath includes four functions that operate on numbers:
floor() returns the greatest integer less than or equal to the number
ceiling() returns the smallest integer greater than or equal to the number
round() rounds the number to the nearest integer
sum() takes a node set and returns the sum of the numeric values of its nodes
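
A short sketch of the four functions. The first four results do not depend on the input document; the sum() line assumes the input contains ATOMIC_NUMBER elements, and the output element names are arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <results>
      <!-- outputs 2 -->
      <a><xsl:value-of select="floor(2.7)"/></a>
      <!-- outputs 3 -->
      <b><xsl:value-of select="ceiling(2.1)"/></b>
      <!-- outputs 3 -->
      <c><xsl:value-of select="round(2.5)"/></c>
      <!-- A value halfway between two integers rounds toward positive
           infinity, so this gives -2 rather than -3. -->
      <d><xsl:value-of select="round(-2.5)"/></d>
      <!-- sum() adds the numeric values of all nodes in the node set. -->
      <e><xsl:value-of select="sum(//ATOMIC_NUMBER)"/></e>
    </results>
  </xsl:template>
</xsl:stylesheet>
```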


Number functions example (continued)


For example: This template rule estimates the number of neutrons in an atom by subtracting the atomic number (the number of protons) from the atomic weight (the weighted average over the natural distribution of isotopes of the number of neutrons plus the number of protons) and rounding to the nearest integer:

<xsl:template match="ATOM">
  <p>
    <xsl:value-of select="NAME"/>
    <xsl:value-of select="round(ATOMIC_WEIGHT - ATOMIC_NUMBER)"/>
  </p>
</xsl:template>


Number functions example (continued)


This rule calculates the average atomic weight of all the atoms in the table by adding all the atomic weights, and then dividing by the number of atoms:

<xsl:template match="/PERIODIC_TABLE">
  <HTML>
    <BODY>
      <H1>Average Atomic Weight</H1>
      <xsl:value-of select="sum(descendant::ATOMIC_WEIGHT) div count(descendant::ATOMIC_WEIGHT)"/>
    </BODY>
  </HTML>
</xsl:template>


String functions
A string is a sequence of Unicode characters. Other data types can be converted to strings using the string() function according to these rules:
Node sets are converted to strings by using the value of the first node in the set as calculated by the xsl:value-of element.
A number is converted to a European-style number string like -12 or 3.1415292.
Boolean false is converted to the English word false.
Boolean true is converted to the English word true.


String functions
Function: starts-with(main_string, prefix_string)
Return type: Boolean
Returns: True if main_string starts with prefix_string; false otherwise

Function: contains(containing_string, contained_string)
Return type: Boolean
Returns: True if the contained_string is part of the containing_string; false otherwise

Function: substring(string, offset, length)
Return type: String
Returns: length characters from the specified offset in string; or all characters from the offset to the end of the string if length is omitted; length and offset are rounded to the nearest integer if necessary; the first character in the string is at offset 1

Function: substring-before(string, marker-string)
Return type: String
Returns: The part of the string from the first character up to (but not including) the first occurrence of marker-string

Function: substring-after(string, marker-string)
Return type: String
Returns: The part of the string from the end of the first occurrence of marker-string to the end of string

Function: string-length(string)
Return type: Number
Returns: The number of characters in string

Function: normalize-space(string)
Return type: String
Returns: The string after leading and trailing white space is stripped and runs of white space are replaced with a single space; if the argument is omitted the string value of the context node is normalized


String functions (continued)


Function: translate(string, replaced_text, replacement_text)
Return type: String
Returns: string with occurrences of characters in replaced_text replaced by the corresponding characters from replacement_text

Function: concat(string1, string2, . . . )
Return type: String
Returns: The concatenation of as many strings as are passed as arguments, in the order they were passed

Function: format-number(number, format-string, locale-string)
Return type: String
Returns: The string form of number formatted according to the specified format-string as if by Java 1.1's java.text.DecimalFormat class (see http://java.sun.com/products/jdk/1.1/docs/api/java.text.DecimalFormat.html); the locale-string is an optional argument that provides the name of the xsl:decimal-format element used to interpret the format-string
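
The string functions can be tried with literal arguments, independent of any input document. This is a sketch; the sample strings and the output element names are arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <results>
      <!-- outputs true -->
      <a><xsl:value-of select="starts-with('Hydrogen', 'Hy')"/></a>
      <!-- outputs true -->
      <b><xsl:value-of select="contains('Helium', 'liu')"/></b>
      <!-- offsets start at 1: outputs Hydro -->
      <c><xsl:value-of select="substring('Hydrogen', 1, 5)"/></c>
      <!-- outputs 1985 -->
      <d><xsl:value-of select="substring-before('1985/03/01', '/')"/></d>
      <!-- outputs 03/01 -->
      <e><xsl:value-of select="substring-after('1985/03/01', '/')"/></e>
      <!-- outputs 6 -->
      <f><xsl:value-of select="string-length('Helium')"/></f>
      <!-- outputs Bob Dylan with single spaces -->
      <g><xsl:value-of select="normalize-space('  Bob   Dylan ')"/></g>
      <!-- character-by-character mapping: outputs HELIUM -->
      <h><xsl:value-of select="translate('helium', 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/></h>
      <!-- outputs He: Helium -->
      <i><xsl:value-of select="concat('He', ': ', 'Helium')"/></i>
      <!-- outputs 1,234.50 with the default decimal format -->
      <j><xsl:value-of select="format-number(1234.5, '#,##0.00')"/></j>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

The translate() call shows the usual XSLT 1.0 workaround for upper-casing, since XPath 1.0 has no dedicated case-conversion function.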


Numbering using xsl:number


Counting nodes with xsl:number

The xsl:number element inserts a formatted integer into the output document. The value of the integer is given by the value attribute. This attribute contains an XPath expression whose result is converted to a number, rounded to the nearest integer, and then formatted according to the value of the format attribute.
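
A brief sketch of the rounding and formatting steps, using literal values so that it does not depend on the input document. The output element names are arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <results>
      <!-- rounded to the nearest integer: outputs 8 -->
      <a><xsl:number value="7.6"/></a>
      <!-- lowercase roman numerals: outputs iv -->
      <b><xsl:number value="4" format="i"/></b>
      <!-- uppercase roman numerals: outputs IV -->
      <c><xsl:number value="4" format="I"/></c>
      <!-- alphabetic numbering: outputs D -->
      <d><xsl:number value="4" format="A"/></d>
      <!-- zero-padded to three digits: outputs 004 -->
      <e><xsl:number value="4" format="001"/></e>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

When the format attribute is omitted, as in the first line, plain decimal numbering is used.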


An XSLT style sheet that counts atoms


<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PERIODIC_TABLE">
    <html>
      <head><title>The Elements</title></head>
      <body>
        <table>
          <tr><xsl:apply-templates select="ATOM"/></tr>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="ATOM">
    <td><xsl:number value="ATOMIC_NUMBER"/></td>
    <td><xsl:value-of select="NAME"/></td>
  </xsl:template>

</xsl:stylesheet>

Default numbers
If you use the value attribute to calculate the number, that's all you need. However, if the value attribute is omitted, then the position of the current node in the source tree is used as the number.


Default numbers (continued)


For example: Produce a table of atoms that have boiling points less than or equal to the boiling point of nitrogen.
View periodictable_number.xsl
View periodictable_number.xml


xsl:number
We can change what xsl:number counts using these three attributes:
level
count
from


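xsl:number - count
The count attribute contains a pattern that specifies which nodes are counted. In this rule, count="*" numbers each child element of an ATOM among all of its element siblings, whatever their names:

<xsl:template match="ATOM/*">
  <td>
    <xsl:number count="*"/>
  </td>
  <td>
    <xsl:value-of select="."/>
  </td>
</xsl:template>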


xsl:number- level
By default, with no value attribute, xsl:number counts siblings of the source node with the same type. For instance, if the ATOMIC_NUMBER elements were numbered instead of ATOM elements, none would have a number higher than 1 because an ATOM never has more than one ATOMIC_NUMBER child. Although the document contains more than one ATOMIC_NUMBER element, these are not siblings. Setting the level attribute of xsl:number to any counts all of the elements of the same kind as the current node in the document. This includes not just the ones in the current node list, but all nodes of the same type. Even if you select only the atomic numbers of the gases, for example, the solids and liquids would still count, even if they weren't output. Consider these rules:


xsl:number- level (continued)


<xsl:template match="ATOM">
  <tr><xsl:apply-templates select="NAME"/></tr>
</xsl:template>

<xsl:template match="NAME">
  <td><xsl:number level="any"/></td>
  <td><xsl:value-of select="."/></td>
</xsl:template>


The from attribute


The from attribute contains a pattern that specifies which element the counting begins with in the input tree. However, the counting still begins from 1, not 2 or 10 or some other number. The from attribute only changes which element is considered to be the first element. With level="any", only nodes after the nearest preceding node that matches the pattern are counted; with level="single" or level="multiple", the pattern limits which ancestors are searched.


Questions


Testing your understanding

1. Which of the following is used to describe the XML document?
a. Document Type Definition
b. Data Type Definition
c. Data Type Document
d. Document Type Decision

2. XML's goal is to replace HTML.
a. False
b. True


Testing your understanding

3. The wild card character used to describe one or more in a DTD is _______.
a. #
b. +
c. *
d. ?

4. The storage unit that contains particular parts of an XML document is called _____.
a. ELEMENT
b. ENTITY
c. ATTRIBUTE
d. NOTATION



Testing your understanding


5. Processing Instructions are indicated as ________.
a. <!xml >
b. <?xml ?>
c. <!xml . !>
d. <?xml ... >

6. XML Elements must always be in lower case.
a. True
b. False


Testing your understanding


7. DTDs are more widely used than Schemas.
a. False
b. True

8. An external DTD has no <!DOCTYPE> declaration within it.
a. False
b. True


Testing your understanding (continued)


9. The syntax for XML commenting is __________.
a. /* This is comment in XML */
b. // This is comment in XML
c. <-- This is comment in XML -->
d. <!-- This is comment in XML -->

10. Attribute values of XML elements must always be in single quotes.
a. True
b. False


Summary

At the completion of this course, we see that you are now able to:
Put in your own words an introduction to XML
Define what XML is
Identify the document type definitions and validity
Describe attribute declarations in DTDs
Explain entities and external DTD subsets
Define embedding non-XML data
Describe XML namespaces and parsers
Explain XML schemas


THANK YOU
