Beruflich Dokumente
Kultur Dokumente
http://www.lectnote.blogspot.com
What is XML?
XML XML XML XML XML XML stands for EXtensible Markup Language is a markup language much like HTML was designed to carry data, not to display data tags are not predefined. You must define your own tags is designed to be self-descriptive is a W3C Recommendation
XML was designed to transport and store data. HTML was designed to display data.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
XML is Everywhere
We have been participating in XML development since its creation. It has been amazing to see how quickly the XML standard has developed, and how quickly a large number of software vendors have adopted the standard.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
XML is now as important for the Web as HTML was to the foundation of the Web. XML is everywhere. It is the most common tool for data transmissions between all sorts of applications, and is becoming more and more popular in the area of storing and describing information.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible applications.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The future might give us word processors, spreadsheet applications and databases that can read each other's data in a pure text format, without any conversion utilities in between. We can only pray that all the software vendors will agree. XML documents form a tree structure that starts at "the root" and branches to "the leaves".
<?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO-8859-1 = Latin-1/West European character set). The next line describes the root element of the document (like saying: "this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
</note>
You can assume, from this example, that the XML document contains a note to Tove from Jani. Don't you agree that XML is pretty self-descriptive?
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
XML documents must contain a root element. This element is "the parent" of all other elements. The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree. All elements can have sub elements (child elements):
Example:
<bookstore> <book category="COOKING"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="CHILDREN">
6
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> <book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book> </bookstore>
The root element in the example is <bookstore>. All <book> elements in the document are contained within <bookstore>. The <book> element has 4 children: <title>,< author>, <year>, <price>.
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not a part of the XML document itself, and it has no closing tag.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
XML elements are defined using XML tags. XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>. Opening and closing tags must be written with the same case:
Note: "Opening and closing tags" are often referred to as "Start and end tags". Use whatever you prefer. It is exactly the same thing.
In the example above, "Properly nested" simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The error in the first document is that the date attribute in the note element is not quoted.
Entity References
Some characters have a special meaning in XML. If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element. This will generate an XML error:
To avoid this error, replace the "<" character with an entity reference:
There are 5 predefined entity references in XML: < > < > less than greater than
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is a good habit to replace it.
Comments in XML
The syntax for writing comments in XML is similar to that of HTML. <!-- This is a comment -->
10
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
An element can contain other elements, simple text or a mixture of both. Elements can also have attributes.
<bookstore> <book category="CHILDREN"> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> <book category="WEB"> <title>Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book> </bookstore>
In the example above, <bookstore> and <book> have element contents, because they contain other elements. <author> has text content because it contains text. In the example above only <book> has an attribute (category="CHILDREN").
11
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Avoid ":" characters. Colons are reserved to be used for something called namespaces (more later). XML documents often have a corresponding database. A good practice is to use the naming rules of your database for the elements in the XML documents. Non-English letters like are perfectly legal in XML, but watch out for problems if your software vendor doesn't support them.
<note> <date>2008-01-10</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
Should the application break or crash? No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output. One of the beauties of XML, is that it can often be extended without breaking applications.
12
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
XML Attributes
XML elements can have attributes in the start tag, just like HTML. Attributes provide additional information about elements.
XML Attributes
From HTML you will remember this: <img src="computer.gif">. The "src" attribute provides additional information about the <img> element. In HTML (and in XML) attributes provide additional information about elements:
Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:
<file type="gif">computer.gif</file>
<person sex="female">
or like this:
<person sex='female'>
If the attribute value itself contains double quotes you can use single quotes, like in this example:
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
In the first example sex is an attribute. In the last, sex is an element. Both examples provide the same information. There are no rules about when to use attributes and when to use elements. Attributes are handy in HTML. In XML my advice is to avoid them. Use elements instead.
My Favorite Way
The following three XML documents contain exactly the same information: A date attribute is used in the first example:
<note date="10/01/2008"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
14
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<note> <date>10/01/2008</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
<note> <date> <day>10</day> <month>01</month> <year>2008</year> </date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
Attributes are difficult to read and maintain. Use elements for data. Use attributes for information that is not relevant to the data. Don't end up like this:
<note day="10" month="01" year="2008" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note>
15
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<messages> <note id="501"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> <note id="502"> <to>Jani</to> <from>Tove</from> <heading>Re: Reminder</heading> <body>I will not</body> </note> </messages>
The ID above is just an identifier, to identify the different notes. It is not a part of the note itself. What I'm trying to say here is that metadata (data about data) should be stored as attributes, and that data itself should be stored as elements.
XML Validation
XML with correct syntax is "Well Formed" XML. XML validated against a DTD is "Valid" XML.
16
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
elements must have a closing tag tags are case sensitive elements must be properly nested attribute values must be quoted
<?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE note SYSTEM "Note.dtd"> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
The DOCTYPE declaration in the example above, is a reference to an external DTD file. The content of the file is shown in the paragraph below.
XML DTD
The purpose of a DTD is to define the structure of an XML document. It defines the structure with a list of legal elements:
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
If you want to study DTD, you will find our DTD tutorial on our homepage.
XML Schema
W3C supports an XML-based alternative to DTD, called XML Schema:
<xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>
If you want to study XML Schema, you will find our Schema tutorial on our homepage.
XML Validator
Use our XML validator to syntax-check your XML.
18
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The W3C XML specification states that a program should stop processing an XML document if it finds an error. The reason is that XML software should be small, fast, and compatible. HTML browsers will display documents with errors (like missing end tags). HTML browsers are big and incompatible because they have a lot of unnecessary code to deal with (and display) HTML errors. With XML, errors are not allowed.
Note: This only checks if your XML is "Well formed". If you want to validate your XML against a DTD, see the last paragraph on this page.
Validate
Note: If you get an "Access denied" error, it's because your browser security does not allow file access across domains.
19
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The file "note_error.xml" demonstrates your browsers error handling. If you want see an error free message, substitute the "note_error.xml" with "cd_catalog.xml".
Note: Only Internet Explorer will actually check your XML against the DTD. Firefox, Mozilla, Netscape, and Opera will not.
20
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Look at this XML file: note.xml The XML document will be displayed with color-coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure. To view the raw XML source (without the + and - signs), select "View Page Source" or "View Source" from the browser menu. Note: In Chrome, Opera, and Safari, only the element text will be displayed. To view the raw XML, you must right click the page and select "View Source"
21
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Without any information about how to display the data, most browsers will just display the XML document as it is. In the next chapters, we will take a look at different solutions to the display problem, using CSS, XSLT and JavaScript.
<?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/css" href="cd_catalog.css"?> <CATALOG> <CD> <TITLE>Empire Burlesque</TITLE> <ARTIST>Bob Dylan</ARTIST> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia</COMPANY> <PRICE>10.90</PRICE> <YEAR>1985</YEAR> </CD> <CD> <TITLE>Hide your heart</TITLE> <ARTIST>Bonnie Tyler</ARTIST> <COUNTRY>UK</COUNTRY> <COMPANY>CBS Records</COMPANY> <PRICE>9.90</PRICE> <YEAR>1988</YEAR> </CD>
22
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
. . . </CATALOG>
<?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/xsl" href="simple.xsl"?> <breakfast_menu> <food> <name>Belgian Waffles</name> <price>$5.95</price> <description>two of our famous Belgian Waffles</description> <calories>650</calories> </food> </breakfast_menu>
If you want to learn more about XSLT, find our XSLT tutorial on our homepage.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
In the example above, the XSLT transformation is done by the browser, when the browser reads the XML file. Different browsers may produce different result when transforming XML with XSLT. To reduce this problem the XSLT transformation can be done on the server. View the result. Note that the result of the output is exactly the same, either the transformation is done by the web server or by the web browser.
XML Parser
Most browsers have a build-in XML parser to read and manipulate XML. The parser converts XML into a JavaScript accessible object.
Parsing XML
The XML DOM contains methods (functions) to traverse XML trees, access, insert, and delete nodes. However, before an XML document can be accessed and manipulated, it must be loaded into an XML DOM object. The parser reads XML into memory and converts it into an XML DOM object that can be accessed with JavaScript.
Code explained: Create an XMLHTTP object Open the XMLHTTP object Send an XML HTTP request to the server
24
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
try //Internet Explorer { xmlDoc=new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async="false"; xmlDoc.loadXML(txt); return xmlDoc; } catch(e) { parser=new DOMParser(); xmlDoc=parser.parseFromString(txt,"text/xml"); return xmlDoc; }
Note: Internet Explorer uses the loadXML() method to parse an XML string, while other browsers uses the DOMParser object.
XML DOM
25
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The DOM (Document Object Model) defines a standard way for accessing and manipulating documents.
You can learn more about the XML DOM in our XML DOM tutorial.
You can learn more about the HTML DOM in our HTML DOM tutorial.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The following code loads an XML document ("note.xml") into the XML parser:
Example
<html> <head> <script type="text/javascript"> function loadXMLDoc(dname) { var xmlDoc; if (window.XMLHttpRequest) { xmlDoc=new window.XMLHttpRequest(); xmlDoc.open("GET",dname,false); xmlDoc.send(""); return xmlDoc.responseXML; } // IE 5 and IE 6 else if (ActiveXObject("Microsoft.XMLDOM")) { xmlDoc=new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async=false; xmlDoc.load(dname); return xmlDoc; } alert("Error loading document"); return null; } </script> </head> <body> <h1>W3Schools Internal Note</h1> <p><b>To:</b> <span id="to"></span><br /> <b>From:</b> <span id="from"></span><br /> <b>Message:</b> <span id="message"></span> <script type="text/javascript"> xmlDoc=loadXMLDoc("note.xml"); document.getElementById("to").innerHTML=xmlDoc.getElementsByTagN ame("to")[0].childNodes[0].nodeValue; document.getElementById("from").innerHTML=xmlDoc.getElementsByTa gName("from")[0].childNodes[0].nodeValue; document.getElementById("message").innerHTML=xmlDoc.getElementsB
27
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Important Note
To extract the text "Jani" from the XML, the syntax is:
getElementsByTagName("from")[0].childNodes[0].nodeValue
In the XML example there is only one <from> tag, but you still have to specify the array index [0], because the XML parser method getElementsByTagName() returns an array of all <from> nodes.
Example
<html> <head> <script type="text/javascript"> function loadXMLString() { text="<note>"; text=text+"<to>Tove</to>"; text=text+"<from>Jani</from>"; text=text+"<heading>Reminder</heading>"; text=text+"<body>Don't forget me this weekend!</body>"; text=text+"</note>"; try //Internet Explorer { xmlDoc=new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async="false"; xmlDoc.loadXML(text);
28
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
} catch(e) { try // Firefox, Mozilla, Opera, etc. { parser=new DOMParser(); xmlDoc=parser.parseFromString(text,"text/xml"); } catch(e) { alert(e.message); return; } } document.getElementById("to").innerHTML=xmlDoc.getElementsByTagN ame("to")[0].childNodes[0].nodeValue; document.getElementById("from").innerHTML=xmlDoc.getElementsByTa gName("from")[0].childNodes[0].nodeValue; document.getElementById("message").innerHTML=xmlDoc.getElementsB yTagName("body")[0].childNodes[0].nodeValue; } </script> </head> <body onload="loadXMLString()"> <h1>W3Schools Internal Note</h1> <p><b>To:</b> <span id="to"></span><br /> <b>From:</b> <span id="from"></span><br /> <b>Message:</b> <span id="message"></span> </p> </body> </html>
Note: Internet Explorer uses the loadXML() method to parse an XML string, while other browsers use the DOMParser object.
XML to HTML
29
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Example
<html> <body> <script type="text/javascript"> var xmlDoc; if (window.XMLHttpRequest) { xmlDoc=new window.XMLHttpRequest(); xmlDoc.open("GET","cd_catalog.xml",false); xmlDoc.send(""); xmlDoc=xmlDoc.responseXML; } // IE 5 and IE 6 else if (ActiveXObject("Microsoft.XMLDOM")) { xmlDoc=new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async=false; xmlDoc.load("cd_catalog.xml"); } document.write("<table border='1'>"); var x=xmlDoc.getElementsByTagName("CD"); for (i=0;i<x.length;i++) { document.write("<tr><td>"); document.write(x[i].getElementsByTagName("ARTIST")[0].childNodes [0].nodeValue); document.write("</td><td>"); document.write(x[i].getElementsByTagName("TITLE")[0].childNodes[ 0].nodeValue); document.write("</td></tr>");
30
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Example explained
We the We We For We check the browser, and load the XML using the correct parser (explained in previous chapter) create an HTML table with <table border="1"> use getElementsByTagName() to get all XML CD nodes each CD node, we display data from ARTIST and TITLE as table data. end the table with </table>
For more information about using JavaScript and the XML DOM, visit our XML DOM tutorial.
The XMLHttpRequest object is supported in all modern browsers. Example: XML HTTP communication with a server while typing input
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Creating an XMLHttpRequest object is done with one single line of JavaScript. In all modern browsers (including IE7):
Example
<script type="text/javascript"> var xmlhttp; function loadXMLDoc(url) { xmlhttp=null; if (window.XMLHttpRequest) {// code for all new browsers xmlhttp=new XMLHttpRequest(); } else if (window.ActiveXObject) {// code for IE5 and IE6 xmlhttp=new ActiveXObject("Microsoft.XMLHTTP"); } if (xmlhttp!=null) { xmlhttp.onreadystatechange=state_Change; xmlhttp.open("GET",url,true); xmlhttp.send(null); } else { alert("Your browser does not support XMLHTTP."); } } function state_Change() { if (xmlhttp.readyState==4) {// 4 = "loaded"
32
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
if (xmlhttp.status==200) {// 200 = OK // ...our code here... } else { alert("Problem retrieving XML data"); } } } </script>
Note: onreadystatechange is an event handler. The value (state_Change) is the name of a function which is triggered when the state of the XMLHttpRequest object changes. States run from 0 (uninitialized) to 4 (complete). Only when the state = 4, we can execute our code.
More Examples
Load a textfile into a div element with XML HTTP Make a HEAD request with XML HTTP
33
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Make a specified HEAD request with XML HTTP Display an XML file as an HTML table
XML Application
This chapter demonstrates a small XML application built with HTML and JavaScript.
<?xml version="1.0" encoding="ISO-8859-1"?> <CATALOG> <CD> <TITLE>Empire Burlesque</TITLE> <ARTIST>Bob Dylan</ARTIST> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia</COMPANY> <PRICE>10.90</PRICE> <YEAR>1985</YEAR> </CD> . . .
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
To load the XML document (cd_catalog.xml), we use the same code as we used in the XML Parser chapter:
if (window.XMLHttpRequest) { xmlDoc=new window.XMLHttpRequest(); xmlDoc.open("GET","cd_catalog.xml",false); xmlDoc.send(""); xmlDoc=xmlDoc.responseXML; } // IE 5 and IE 6 else if (ActiveXObject("Microsoft.XMLDOM")) { xmlDoc=new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async=false; xmlDoc.load("cd_catalog.xml"); }
After the execution of this code, xmlDoc is an XML DOM object, accessible by JavaScript.
Example
document.write("<table border='1'>"); var x=xmlDoc.getElementsByTagName("CD"); for (i=0;i<x.length;i++) { document.write("<tr><td>"); document.write(x[i].getElementsByTagName("ARTIST")[0].childNodes [0].nodeValue); document.write("</td><td>"); document.write(x[i].getElementsByTagName("TITLE")[0].childNodes[ 0].nodeValue); document.write("</td></tr>"); } document.write("</table>");
35
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
For each CD element in the XML document, a table row is created. Each table row contains two table data with ARTIST and TITLE from the current CD element.
Example
var x=xmlDoc.getElementsByTagName("CD"); i=0; function display() { artist=(x[i].getElementsByTagName("ARTIST")[0].childNodes[0].nod eValue); title=(x[i].getElementsByTagName("TITLE")[0].childNodes[0].nodeV alue); year=(x[i].getElementsByTagName("YEAR")[0].childNodes[0].nodeVal ue); txt="Artist: " + artist + "<br />Title: " + title + "<br />Year: "+ year; document.getElementById("show").innerHTML=txt; }
The body of the HTML document contains an onload event attribute that calls the display() function when the page is loaded. It also contains a <div id='show'> element to receive the XML data.
36
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
In the example above, you will only see data from the first CD element in the XML document. To navigate to the next CD element, you have to add some more code.
Example
function next() { if (i<x.length-1) { i++; display(); } } function previous() { if (i>0) { i--; display(); } }
The next() function displays the next CD, unless you are on the last CD element. The previous() function displays the previous CD, unless you are at the first CD element. The next() and previous() functions are called by clicking next/previous buttons:
<input type="button" onclick="previous()" value="previous" /> <input type="button" onclick="next()" value="next" />
37
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Look at this XML file: note.xml The XML document will be displayed with color-coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure. To view the raw XML source (without the + and - signs), select "View Page Source" or "View Source" from the browser menu.
38
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Note: In Chrome, Opera, and Safari, only the element text will be displayed. To view the raw XML, you must right click the page and select "View Source"
XML CDATA
All text in an XML document will be parsed by the parser. But text inside a CDATA section will be ignored by the parser.
39
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last):
<name><first>Bill</first><last>Gates</last></name>
Parsed Character Data (PCDATA) is a term used about text data that will be parsed by the XML parser.
<script> <![CDATA[
40
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
function matchwo(a,b) { if (a < b && a < 0) then { return 1; } else { return 0; } } ]]> </script>
In the example above, everything inside the CDATA section is ignored by the parser. Notes on CDATA sections: A CDATA section cannot contain the string "]]>". Nested CDATA sections are not allowed. The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
DTD Tutorial
The purpose of a DTD (Document Type Definition) is to define the legal building blocks of an XML document. A DTD defines the document structure with a list of legal elements and attributes. Start learning DTD now!
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
BYLINE (#PCDATA)> LEAD (#PCDATA)> BODY (#PCDATA)> NOTES (#PCDATA)> ARTICLE ARTICLE ARTICLE ARTICLE AUTHOR CDATA #REQUIRED> EDITOR CDATA #IMPLIED> DATE CDATA #IMPLIED> EDITION CDATA #IMPLIED>
Introduction to DTD
A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.
<?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)>
42
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<!ELEMENT body (#PCDATA)> ]> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend</body> </note>
Open the XML file above in your browser (select "view source" or "view page source" to view the DTD) The DTD above is interpreted like this:
!DOCTYPE note defines that the root element of this document is note !ELEMENT note defines that the note element contains four elements: "to,from,heading,body" !ELEMENT to defines the to element to be of type "#PCDATA" !ELEMENT from defines the from element to be of type "#PCDATA" !ELEMENT heading defines the heading element to be of type "#PCDATA" !ELEMENT body defines the body element to be of type "#PCDATA"
<?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading>
43
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Elements Attributes 44
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Elements
Elements are the main building blocks of both XML and HTML documents. Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img". Examples:
Attributes
Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
Entities
45
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Some characters have a special meaning in XML, like the less than sign (<) that defines the start of an XML tag. Most of you know the HTML entity: " ". This "no-breaking-space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
Entity References < > & " ' Character < > & " '
PCDATA
PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup. Tags inside the text will be treated as markup and entities will be expanded. However, parsed character data should not contain any &, <, or > characters; these need to be represented by the & < and > entities, respectively.
CDATA
46
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
CDATA means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
DTD - Elements
In a DTD, elements are declared with an ELEMENT declaration.
Declaring Elements
In a DTD, XML elements are declared with an element declaration with the following syntax:
Empty Elements
Empty elements are declared with the category keyword EMPTY:
<!ELEMENT element-name EMPTY> Example: <!ELEMENT br EMPTY> XML example: <br />
47
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
48
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
DTD - Attributes
In a DTD, attributes are declared with an ATTLIST declaration.
Declaring Attributes
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value> DTD example: <!ATTLIST payment type CDATA "check"> XML example: <payment type="check" />
The attribute-type can be one of the following:
Type CDATA (en1|en2|..) ID Description The value is character data The value must be one from an enumerated list The value is a unique id 51
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
The value is the id of another element The value is a list of other ids The value is a valid XML name The value is a list of valid XML names The value is an entity The value is a list of entities The value is a name of a notation The value is a predefined xml value
52
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
In the example above, the "square" element is defined to be an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.
#REQUIRED
Syntax
DTD: <!ATTLIST person number CDATA #REQUIRED> Valid XML: <person number="5677" /> Invalid XML: <person />
Use the #REQUIRED keyword if you don't have an option for a default value, but still want to force the attribute to be present.
#IMPLIED
Syntax
53
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
#FIXED
Syntax
DTD: <!ATTLIST sender company CDATA #FIXED "Microsoft"> Valid XML: <sender company="Microsoft" /> Invalid XML: <sender company="W3Schools" />
Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, the XML parser will return an error.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
default-value>
Example
DTD: <!ATTLIST payment type (check|cash) "cash"> XML example: <payment type="check" /> or <payment type="cash" />
Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
</person>
In the first example sex is an attribute. In the last, sex is a child element. Both examples provide the same information. There are no rules about when to use attributes, and when to use child elements. My experience is that attributes are handy in HTML, but in XML you should try to avoid them. Use child elements if the information feels like data.
My Favorite Way
I like to store data in child elements. The following three XML documents contain exactly the same information: A date attribute is used in the first example:
<note date="12/11/2002"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
A date element is used in the second example:
<note> <date>12/11/2002</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
An expanded date element is used in the third: (THIS IS MY FAVORITE):
56
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<note> <date> <day>12</day> <month>11</month> <year>2002</year> </date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
attributes cannot contain multiple values (child elements can) attributes are not easily expandable (for future changes) attributes cannot describe structures (child elements can) attributes are more difficult to manipulate by program code attribute values are not easy to test against a DTD
If you use attributes as containers for data, you end up with documents that are difficult to read and maintain. Try to use elements to describe data. Use attributes only to provide information that is not relevant to the data. Don't end up like this (this is not how XML should be used):
<note day="12" month="11" year="2002" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note>
57
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<messages> <note id="p501"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> <note id="p502"> <to>Jani</to> <from>Tove</from> <heading>Re: Reminder</heading> <body>I will not!</body> </note> </messages>
The ID in these examples is just a counter, or a unique identifier, to identify the different notes in the XML file, and not a part of the note data. What I am trying to say here is that metadata (data about data) should be stored as attributes, and that data itself should be stored as elements.
58
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
DTD - Entities
Entities are variables used to define shortcuts to standard text or special characters.
Entity references are references to entities Entities can be declared internal or external
DTD Example: <!ENTITY writer "Donald Duck."> <!ENTITY copyright "Copyright W3Schools."> XML example: <author>&writer;©right;</author>
Note: An entity has three parts: an ampersand (&), an entity name, and a semicolon (;).
59
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
Example
DTD Example: <!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd"> <!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd"> XML example: <author>&writer;©right;</author>
DTD Validation
With Internet Explorer 5+ you can validate your XML against a DTD.
Example
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async="false"; xmlDoc.validateOnParse="true"; xmlDoc.load("note_dtd_error.xml"); document.write("<br />Error Code: "); document.write(xmlDoc.parseError.errorCode);
60
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
document.write("<br />Error Reason: "); document.write(xmlDoc.parseError.reason); document.write("<br />Error Line: "); document.write(xmlDoc.parseError.line);
Example
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async="false"; xmlDoc.validateOnParse="false"; xmlDoc.load("note_dtd_error.xml"); document.write("<br />Error Code: "); document.write(xmlDoc.parseError.errorCode); document.write("<br />Error Reason: "); document.write(xmlDoc.parseError.reason); document.write("<br />Error Line: "); document.write(xmlDoc.parseError.line);
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
To help you check your xml files, you can syntax-check any XML file here.
TV Schedule DTD
By David Moisan. Copied from http://www.davidmoisan.org/
<!DOCTYPE TVSCHEDULE [ <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST ]> TVSCHEDULE (CHANNEL+)> CHANNEL (BANNER,DAY+)> BANNER (#PCDATA)> DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)> HOLIDAY (#PCDATA)> DATE (#PCDATA)> PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)> TIME (#PCDATA)> TITLE (#PCDATA)> DESCRIPTION (#PCDATA)> TVSCHEDULE NAME CDATA #REQUIRED> CHANNEL CHAN CDATA #REQUIRED> PROGRAMSLOT VTR CDATA #IMPLIED> TITLE RATING CDATA #IMPLIED> TITLE LANGUAGE CDATA #IMPLIED>
<!DOCTYPE NEWSPAPER [ <!ELEMENT NEWSPAPER (ARTICLE+)> <!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)> <!ELEMENT HEADLINE (#PCDATA)>
62
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
BYLINE (#PCDATA)> LEAD (#PCDATA)> BODY (#PCDATA)> NOTES (#PCDATA)> ARTICLE ARTICLE ARTICLE ARTICLE AUTHOR CDATA #REQUIRED> EDITOR CDATA #IMPLIED> DATE CDATA #IMPLIED> EDITION CDATA #IMPLIED>
<!ENTITY NEWSPAPER "Vervet Logic Times"> <!ENTITY PUBLISHER "Vervet Logic Press"> <!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press"> ]>
<!DOCTYPE CATALOG [ <!ENTITY AUTHOR "John Doe"> <!ENTITY COMPANY "JD Power Tools, Inc."> <!ENTITY EMAIL "jd@jd-tools.com"> <!ELEMENT CATALOG (PRODUCT+)> <!ELEMENT PRODUCT (SPECIFICATIONS+,OPTIONS?,PRICE+,NOTES?)> <!ATTLIST PRODUCT NAME CDATA #IMPLIED CATEGORY (HandTool|Table|Shop-Professional) "HandTool" PARTNUM CDATA #IMPLIED PLANT (Pittsburgh|Milwaukee|Chicago) "Chicago" INVENTORY (InStock|Backordered|Discontinued) "InStock"> <!ELEMENT SPECIFICATIONS (#PCDATA)> <!ATTLIST SPECIFICATIONS WEIGHT CDATA #IMPLIED POWER CDATA #IMPLIED>
63
WEB TECHNOLOGIES
http://www.lectnote.blogspot.com
<!ELEMENT OPTIONS (#PCDATA)> <!ATTLIST OPTIONS FINISH (Metal|Polished|Matte) "Matte" ADAPTER (Included|Optional|NotApplicable) "Included" CASE (HardShell|Soft|NotApplicable) "HardShell"> <!ELEMENT PRICE (#PCDATA)> <!ATTLIST PRICE MSRP CDATA #IMPLIED WHOLESALE CDATA #IMPLIED STREET CDATA #IMPLIED SHIPPING CDATA #IMPLIED> <!ELEMENT NOTES (#PCDATA)> ]>
64