
Introduction to XML

Updated by
Dr Suthikshn Kumar
Suthikshn.kumar@pes.edu
Contents
 Introduction
 Syntax of XML
 XML Document Structure
 NameSpaces
 XML Schemas
 Displaying Raw XML documents
 Displaying XML Documents with CSS
 XSLT Style Sheets
 XML Processors
 Web Services
 Summary
Intro to XML
 The Extensible Markup Language (XML) is a general-purpose
markup language.
 It is classified as an extensible language because it allows its users to
define their own tags.
 Its primary purpose is to facilitate the sharing of structured data across
different information systems, particularly via the Internet.
 It is used both to encode documents and serialize data.
 In the latter context, it is comparable with other text-based serialization
languages such as JSON and YAML.
 It started as a simplified subset of the Standard Generalized Markup
Language (SGML), and is designed to be relatively human-legible.
 By adding semantic constraints, application languages can be
implemented in XML. These include XHTML, RSS, MathML, GraphML,
Scalable Vector Graphics, MusicXML, and thousands of others.
 Moreover, XML is sometimes used as the specification language for
such application languages.
 XML is recommended by the World Wide Web Consortium. It is a fee-
free open standard. The W3C recommendation specifies both the
lexical grammar, and the requirements for parsing.
Introduction
 What is XML?
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your
own tags
• XML uses a Document Type Definition (DTD) or an
XML Schema to describe the data
• XML with a DTD or XML Schema is designed to be self-
descriptive
• XML is a W3C Recommendation
 XML is a W3C Recommendation
• The Extensible Markup Language (XML) became a W3C
Recommendation on 10 February 1998.
The Main Difference Between
XML and HTML
 XML was designed to carry data.
• XML is not a replacement for HTML.
XML and HTML were designed with different goals:
• XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
• HTML is about displaying information, while XML is about describing information.
 XML Does not DO Anything
 XML was not designed to DO anything.
• Maybe it is a little hard to understand, but XML does not DO anything. XML was
created to structure, store and to send information.
• The following example is a note to Tove from Jani, stored as XML:
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
• The note has a header and a message body. It also has sender and receiver
information. But still, this XML document does not DO anything. It is just pure
information wrapped in XML tags. Someone must write a piece of software to
send, receive or display it.
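As a minimal sketch of such "a piece of software", the following Python snippet parses the note and prints it as a plain-text message. Python and its standard xml.etree.ElementTree module are assumptions made for illustration; the function name render_note is hypothetical and not part of the original slides.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def render_note(xml_text):
    """Turn the note document into a plain-text message."""
    note = ET.fromstring(xml_text)
    return (
        f"To: {note.findtext('to')}\n"
        f"From: {note.findtext('from')}\n"
        f"Subject: {note.findtext('heading')}\n"
        f"\n{note.findtext('body')}"
    )


print(render_note(NOTE_XML))
```

The XML itself stays passive; all behaviour (here, formatting for display) lives in the program that reads it.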
 XML is Free and Extensible
 XML tags are not predefined. You must "invent" your own tags.
• The tags used to mark up HTML documents and the structure of HTML
documents are predefined. The author of HTML documents can only use tags
that are defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows the author to define his own tags and his own document structure.
• The tags in the example above (like <to> and <from>) are not defined in any
XML standard. These tags are "invented" by the author of the XML document.
XML is a Complement to
HTML
 XML is not a replacement for HTML.
• It is important to understand that XML is not a replacement for HTML.
In future Web development it is most likely that XML will be used to
describe the data, while HTML will be used to format and display the
same data.
• My best description of XML is this: XML is a cross-platform, software
and hardware independent tool for transmitting information.
 XML in Future Web Development
 XML is going to be everywhere.
• We have been participating in XML development since its creation. It
has been amazing to see how quickly the XML standard has been
developed and how quickly a large number of software vendors have
adopted the standard.
• We strongly believe that XML will be as important to the future of the
Web as HTML has been to the foundation of the Web and that XML will
be the most common tool for all data manipulation and data transmission.
XML can Separate Data from
HTML
 With XML, your data is stored outside your HTML.
• When HTML is used to display data, the data is stored inside your HTML. With
XML, data can be stored in separate XML files. This way you can concentrate on
using HTML for data layout and display, and be sure that changes in the
underlying data will not require any changes to your HTML.
• XML data can also be stored inside HTML pages as "Data Islands". You can still
concentrate on using HTML only for formatting and displaying the data.
 XML is Used to Exchange Data
 With XML, data can be exchanged between incompatible systems.
• In the real world, computer systems and databases contain data in incompatible
formats. One of the most time-consuming challenges for developers has been to
exchange data between such systems over the Internet.
• Converting the data to XML can greatly reduce this complexity and create data
that can be read by many different types of applications.
 XML and B2B
 With XML, financial information can be exchanged over the Internet.
• Expect to see a lot about XML and B2B (Business To Business) in the near
future.
• XML is going to be the main language for exchanging financial information
between businesses over the Internet. A lot of interesting B2B applications are
under development.
XML Can be Used to Share
Data
 With XML, plain text files can be used to share data.
• Since XML data is stored in plain text format, XML provides a software- and hardware-
independent way of sharing data.
• This makes it much easier to create data that different applications can work with. It also
makes it easier to expand or upgrade a system to new operating systems, servers,
applications, and new browsers.
 XML Can be Used to Store Data
 With XML, plain text files can be used to store data.
• XML can also be used to store data in files or in databases. Applications can be written to
store and retrieve information from the store, and generic applications can be used to display
the data.
 XML Can Make your Data More Useful
 With XML, your data is available to more users.
• Since XML is independent of hardware, software and application, you can make your data
available to other than only standard HTML browsers.
• Other clients and applications can access your XML files as data sources, like they are
accessing databases. Your data can be made available to all kinds of "reading machines"
(agents), and it is easier to make your data available for blind people, or people with other
disabilities.
 XML Can be Used to Create New Languages
 XML is the mother of WAP and WML.
• The Wireless Markup Language (WML), used to markup Internet applications for handheld
devices like mobile phones, is written in XML.
 If Developers Have Sense
 If they DO have sense, all future applications will exchange their data in XML
XML Syntax
 As long as only well-formedness is required, XML is a generic framework for
storing any amount of text or any data whose structure can be represented as a
tree.
 The only indispensable syntactical requirement is that the document has exactly
one root element (alternatively called the document element).
 This means that the text must be enclosed between a root opening tag and a
corresponding closing tag. The following is a well-formed XML document:
 <book>This is a book.... </book>
 The root element can be preceded by an optional XML declaration. This
declaration states what version of XML is in use (normally 1.0); it may also contain
information about character encoding and external dependencies.
 <?xml version="1.0" encoding="UTF-8"?>
 The specification requires that processors of XML support the pan-Unicode
character encodings UTF-8 and UTF-16 (UTF-32 is not mandatory). The use of
more limited encodings, such as those based on ISO/IEC 8859, is
acknowledged and is widely used and supported.
 Comments can be placed anywhere in the tree, including in the text if the
content of the element is text or #PCDATA:
 <!-- This is a comment. -->
 In any meaningful application, additional markup is used to structure the
contents of the XML document. The text enclosed by the root tags may contain
an arbitrary number of XML elements. The basic syntax for one element is:
 <name attribute="value">content</name>
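The pieces above (declaration, encoding, comment, one element with an attribute) can be tried out with a short Python sketch. Python and xml.etree.ElementTree are assumptions for illustration, and the sample text and the lang attribute are made up. The document is passed as bytes so that the parser has to honour the encoding named in the declaration.

```python
import xml.etree.ElementTree as ET

# Build the document as ISO-8859-1 bytes; the declaration tells the parser
# how to decode them.
DOC = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<!-- This is a comment. -->\n'
    '<book lang="fr">Un caf\u00e9 \u00e0 Paris</book>'
).encode("iso-8859-1")

root = ET.fromstring(DOC)

print(root.tag)     # name of the root element
print(root.attrib)  # attributes as a dictionary
print(root.text)    # character content, decoded to a Python string
```

The default parser silently skips the comment, so only the root element, its attribute and its text remain in the tree.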
Example: Recipe for making
bread
<?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Bake in the oven at 350° for 30 minutes.</step>
  </instructions>
</recipe>
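To show how an application might read the attributes and child elements of the recipe, here is a small Python sketch using xml.etree.ElementTree (an assumed tool, not named in the slides). The recipe is abridged to two steps, and shopping_list is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RECIPE_XML = """<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
  </instructions>
</recipe>"""

recipe = ET.fromstring(RECIPE_XML)


def shopping_list(recipe_el):
    """Return one 'amount unit name' string per <ingredient> element."""
    lines = []
    for ing in recipe_el.findall("ingredient"):
        state = ing.get("state")  # optional attribute, None when absent
        name = ing.text if state is None else f"{ing.text} ({state})"
        lines.append(f"{ing.get('amount')} {ing.get('unit')} {name}")
    return lines


for line in shopping_list(recipe):
    print(line)
print(len(recipe.findall("instructions/step")), "steps")
```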
An Example XML Document

 XML documents use a self-describing and simple syntax.


 <?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
 The first line in the document, the XML declaration, defines the XML version and the
character encoding used in the document. In this case the document conforms to the 1.0
specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
 The next line describes the root element of the document (as if it were saying: "this document
is a note"):
 <note>
 The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
 <to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
 And finally the last line defines the end of the root element:
 </note>
 Can you detect from this example that the XML document contains a note to Tove
from Jani? Don't you agree that XML is pretty self-descriptive?
 All XML Elements Must Have a Closing Tag
 With XML, it is illegal to omit the closing tag.
 In HTML some elements do not have to have a closing tag. The following code is legal in
HTML:
 <p>This is a paragraph
<p>This is another paragraph
 In XML all elements must have a closing tag, like this:
 <p>This is a paragraph</p>
<p>This is another paragraph</p>
 An element with no content may instead use the empty-element form, for example <br/>.
 Note: You might have noticed from the previous example that the XML declaration did not
have a closing tag. This is not an error. The declaration is not an XML element, so
it should not have a closing tag.
Problems
1. Create an XML document for a library to store its book
list. The details of each book, such as title, author,
publisher, year of publication, category and reference
number, should be captured by the XML document.
2. Create XML documents for a travel agency. The file
flights.xml captures the data of
available flights, with details such as flight number, airline, number
of seats, destination, etc. The file hotels.xml captures the
data regarding the hotels: categories, rooms, tariff,
address, etc.
3. An XML file is being designed to capture the data in a
white pages telephone directory. Design the XML to
capture data such as name, phone number, street, address,
city, etc.
4. An XML file is being used for storing student
information in a university. The information about each
student, i.e. name, USN, semester, branch, college name,
date of birth, contact phone and contact address, is to be
stored in the XML. Design the XML.
XML Tags are Case Sensitive

 Unlike HTML, XML tags are case sensitive.


• With XML, the tag <Letter> is different from the tag <letter>.
• Opening and closing tags must therefore be written with the same case:
• <Message>This is incorrect</message>
• <message>This is correct</message>
XML Elements Must be Properly Nested
 Improper nesting of tags makes no sense to XML.
• In HTML some elements can be improperly nested within each other like this:
• <b><i>This text is bold and italic</b></i>
• In XML all elements must be properly nested within each other like this:
• <b><i>This text is bold and italic</i></b>
XML Documents Must Have a Root Element
 All XML documents must contain a single tag pair to define a root element.
• All other elements must be within this root element.
• All elements can have sub elements (child elements). Sub elements must be
correctly nested within their parent element:
• <root> <child> <subchild>.....</subchild> </child> </root>
XML Attribute Values Must
be Quoted
 With XML, it is illegal to omit quotation marks around attribute values.
 XML elements can have attributes in name/value pairs just like in HTML. In
XML the attribute value must always be quoted. Study the two XML documents
below. The first one is incorrect, the second is correct:
 <?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/2002>
  <to>Tove</to>
  <from>Jani</from>
</note>
 <?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
</note>
 The error in the first document is that the date attribute in the note element is
not quoted. This is correct: date="12/11/2002". This is incorrect:
date=12/11/2002.
 With XML, White Space is Preserved
 With XML, the white space in your document is not truncated.
 This is unlike HTML. With HTML, a sentence like this:
 Hello           my name is Tove,
 will be displayed like this:
 Hello my name is Tove,
 because HTML reduces multiple, consecutive white space characters to a
single white space.
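The difference can be observed with a short Python sketch (xml.etree.ElementTree is an assumed choice of parser). The XML parser hands the text over unchanged; the collapsing step is only an imitation of what an HTML renderer does, written here with split and join.

```python
import xml.etree.ElementTree as ET

# Six consecutive spaces between "Hello" and "my".
SENTENCE = "Hello" + " " * 6 + "my name is Tove,"

p = ET.fromstring(f"<p>{SENTENCE}</p>")

preserved = p.text                    # what the XML parser delivers
collapsed = " ".join(p.text.split())  # imitation of HTML-style collapsing

print(repr(preserved))
print(repr(collapsed))
```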
Comments in XML

 The syntax for writing comments in XML is similar to that of HTML.
 <!-- This is a comment -->
 There is Nothing Special About XML
 There is nothing special about XML. It is just plain text with
the addition of some XML tags enclosed in angle brackets.
 Software that can handle plain text can also handle XML. In
a simple text editor, the XML tags will be visible and will not
be handled specially.
 In an XML-aware application however, the XML tags can be
handled specially. The tags may or may not be visible, or
have a functional meaning, depending on the nature of the
application.
XML Elements are Extensible
 XML documents can be extended to carry more
information.
• Look at the following XML NOTE example:
<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>
• Let's imagine that we created
an application that extracted the <to>, <from>, and <body>
elements from the XML document to produce this output:
MESSAGE
To: Tove
From: Jani

Don't forget me this weekend!
• Imagine that the author of the
XML document added some extra information to it:
<note>
  <date>2002-08-01</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
• Should the application break or crash?
• No. The application should still be able to find the <to>, <from>,
and <body> elements in the XML document and produce the
same output.
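A minimal sketch of such an application, assuming Python and xml.etree.ElementTree (neither is named in the slides; message is a hypothetical function name). Because it looks elements up by name rather than by position, the extra <date> and <heading> elements do not disturb it.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    "<note><to>Tove</to><from>Jani</from>"
    "<body>Don't forget me this weekend!</body></note>"
)
EXTENDED = (
    "<note><date>2002-08-01</date><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)


def message(xml_text):
    """Extract <to>, <from> and <body> by name and format them."""
    note = ET.fromstring(xml_text)
    return (
        "MESSAGE\n"
        f"To: {note.findtext('to')}\n"
        f"From: {note.findtext('from')}\n"
        f"\n{note.findtext('body')}"
    )


print(message(EXTENDED))
print(message(ORIGINAL) == message(EXTENDED))
```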

XML Elements have
Relationships
 Elements are related as parents and children.
 To understand XML terminology, you have to know how relationships between XML elements
are named, and how element content is described.
 Imagine that this is a description of a book:
• My First XML
• Introduction to XML
   - What is HTML
   - What is XML
• XML Syntax
   - Elements must have a closing tag
   - Elements must be properly nested
 Imagine that this XML document describes the book:
 <book>
 <title>My First XML</title>
 <prod id="33-657" media="paper"></prod>
 <chapter>Introduction to XML
 <para>What is HTML</para>
 <para>What is XML</para>
 </chapter>
 <chapter>XML Syntax
 <para>Elements must have a closing tag</para>
 <para>Elements must be properly nested</para>
 </chapter>
 </book>
 Book is the root element. Title, prod, and chapter are child elements of book. Book is the
parent element of title, prod, and chapter. Title, prod, and chapter are siblings (or sister
elements) because they have the same parent.
Elements have Content
 Elements can have different content types.
• An XML element is everything from (including) the element's
start tag to (including) the element's end tag.
• An element can have element content, mixed content, simple
content, or empty content. An element can also have
attributes.
• In the example above, book has element content, because it
contains other elements. Chapter has mixed content because
it contains both text and other elements. Para has simple
content (or text content) because it contains only text. Prod
has empty content, because it carries no information.
• In the example above only the prod element has attributes.
The attribute named id has the value "33-657". The attribute
named media has the value "paper".
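The four content types can be told apart mechanically. The sketch below assumes Python's xml.etree.ElementTree and a hypothetical helper content_type; it also assumes that whitespace-only text between child elements does not count as content, which is a simplification chosen for this illustration.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def content_type(el):
    """Classify an element as element, mixed, simple or empty content."""
    has_children = len(el) > 0
    # Text may sit before the first child (el.text) or after a child (tail).
    chunks = [el.text] + [child.tail for child in el]
    has_text = any(chunk and chunk.strip() for chunk in chunks)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK_XML)
for el in book.iter():
    print(el.tag, content_type(el), el.attrib)
```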
 Element Naming
 XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names must not start with a number or punctuation character
• Names must not start with the letters xml (or XML, or Xml, etc)
• Names cannot contain spaces
XML Attributes
 XML elements can have attributes.
• From HTML you will remember this: <IMG
SRC="computer.gif">. The SRC attribute provides additional
information about the IMG element.
• In HTML (and in XML) attributes provide additional information
about elements:
• <img src="computer.gif">
• <a href="demo.asp">
• Attributes often provide information that is not a part of the data. In the example
below, the file type is irrelevant to the data, but important to the
software that wants to manipulate the element:
• <file type="gif">computer.gif</file>
• If the attribute value itself contains double quotes, enclose it in
single quotes (or write each inner quote as &quot;), like in this example:
• <gangster name='George "Shotgun" Ziegler'>
• Note: If the attribute value itself contains single quotes, enclose it in
double quotes (or write each inner quote as &apos;), like in this example:
• <gangster name="George 'Shotgun' Ziegler">
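The quoting style is only a matter of syntax; the parsed attribute value is the same either way. A short Python check with xml.etree.ElementTree (an assumed tool), using self-closing gangster elements so that each snippet is a complete document:

```python
import xml.etree.ElementTree as ET

# Outer single quotes allow literal double quotes inside the value.
single = ET.fromstring("""<gangster name='George "Shotgun" Ziegler'/>""")

# Outer double quotes require the inner ones to be written as &quot;.
escaped = ET.fromstring(
    """<gangster name="George &quot;Shotgun&quot; Ziegler"/>"""
)

print(single.get("name"))
print(single.get("name") == escaped.get("name"))
```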
Use of Elements vs. Attributes
Well Formed XML Documents
 A "Well Formed" XML document has correct XML syntax.
• A "Well Formed" XML document is a document that conforms to
the XML syntax rules that were described in the previous
chapters:
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must always be quoted
 <?xml version="1.0" encoding="ISO-8859-1"?>
 <note>
 <to>Tove</to>
 <from>Jani</from>
 <heading>Reminder</heading>
 <body>Don't forget me this weekend!</body>
 </note>
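Each of the rules listed above can be checked by handing a sample to an XML parser. The sketch below assumes Python's xml.etree.ElementTree; is_well_formed and the sample strings are illustrative names and data, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


SAMPLES = {
    "well formed": "<note><to>Tove</to></note>",
    "two root elements": "<to>Tove</to><from>Jani</from>",
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<Message>text</message>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note date=12/11/2002></note>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; every other one breaks exactly one of the rules.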
XML Document Structure
 An XML document often uses two auxiliary files:
• DTD: a file that specifies its tag set and structural and
syntactic rules
• Style sheet: a file that describes how the content of the document
is to be printed or displayed.
 An XML document consists of one or more entities, which
are logically related collections of information, ranging
from a single character to a book chapter.
 Reasons for breaking a document into entities:
• Defining a large document as a number of
smaller parts makes it easier to manage.
• If the same data appears in more than one place in the
document, defining it as an entity allows any number of
references to a single copy of the data.
• Many documents include images. Because image data is binary, it must be
stored as a separate entity, called a binary entity.
XML Document Structure
 When an XML processor encounters the name of a nonbinary
entity in a document, it replaces the name with the value it
references.
 Binary entities can be handled only by applications that deal
with the document, such as browsers.
 Entity names can have any length. They must begin with a
letter, an underscore or a colon.
 A reference to an entity is its name with a prepended
ampersand and an appended semicolon. If apple_image is
the name of an entity, &apple_image; is a reference to it.
 When several predefined entities must appear near each other
in an XML document, their references clutter the content and make it
difficult to read.
 In such cases a character data (CDATA) section can be used.
 The content of a character data section is not parsed by the XML
parser, so any tags or entity references in it are treated as literal text.
 <![CDATA[ content ]]>
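To see that a CDATA section is simply a more readable way of writing the same character data, the following Python sketch (xml.etree.ElementTree assumed; the example element and sentence are made up) parses the same sentence written once with predefined entity references and once as CDATA.

```python
import xml.etree.ElementTree as ET

# Version 1: every special character is written as an entity reference.
escaped = ET.fromstring(
    "<example>The tag &lt;b&gt; marks bold text &amp; "
    "is closed by &lt;/b&gt;.</example>"
)

# Version 2: the same text inside a CDATA section, with no escaping.
cdata = ET.fromstring(
    "<example><![CDATA[The tag <b> marks bold text & "
    "is closed by </b>.]]></example>"
)

print(cdata.text)
print(escaped.text == cdata.text)
print(len(cdata))  # number of child elements created from the "<b>" text
```

The <b> inside the CDATA section never becomes an element, so the parsed tree has no children.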
Document Type Definitions.
 A DTD is a set of structural rules called declarations, which specify
a set of elements that can appear in
the document as well as how and where these elements may
appear.
 The use of a DTD is analogous to the use of an external style sheet for
XHTML.
 DTDs are used when the same tag set definition is used by a
collection of documents, perhaps by a collection of users, and the
collection must have a consistent and uniform structure.
 The purpose of DTD is to define a standard form for a
collection of XML documents.
 This form is specified as the tag and attributes sets, as well as
the rules that define how they can appear in a document.
 All documents in the collection can be tested against the DTD
to determine whether they conform to the rules it describes.
 A DTD can be:
• Internal, if it is embedded in the XML document
• External, if it is stored as a separate file (preferable, because one file
can then serve a whole collection of documents)
DTD syntax
 A DTD is a sequence of declarations enclosed
in the block of a DOCTYPE markup
declaration.
 Each declaration within this block has the form
of a markup declaration:
 <!keyword …>
 Four possible key words can be used:
• ELEMENT used to define tags
• ATTLIST used to define tag attributes
• ENTITY used to define entities
• NOTATION used to define data type notation
Elements
 The declarations of a DTD have a form that is related to that of the
rules of context free grammars, also known as Backus-Naur Form
(BNF)
 Each element declaration in a DTD specifies the structure of one
category of elements.
 The declaration provides the name of the element whose structure is
being defined along with the specification of the structure of that
element.
 <!ELEMENT element_name ( list of names of child elements )>
 Example: <!ELEMENT memo ( from, to, date, re, body)>
 This declaration describes a tree structure: memo is the parent of
from, to, date, re and body.
 Child element specification modifiers are borrowed from regular
expressions (+ : one or more occurrences, * : zero or more
occurrences, ? : zero or one occurrence).
 Ex: <!ELEMENT person (parent+, age, spouse?, sibling*)>
 The leaf nodes of a DTD specify the data types of the content of their
parent nodes.
• #PCDATA for parsable character data
• EMPTY when the element has no content
• ANY when the element may contain any content
 The form of the leaf element is as follows:
• <!ELEMENT element-name (#PCDATA)>
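Since the modifiers come from regular expressions, the person declaration above can be imitated with an ordinary regular expression over the sequence of child element names. This is only an illustration of the idea in Python (re and xml.etree.ElementTree assumed), not a DTD validator; PERSON_MODEL and children_match are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT person (parent+, age, spouse?, sibling*)>
# Each child name is followed by a comma so that names cannot run together.
PERSON_MODEL = re.compile(r"(parent,)+age,(spouse,)?(sibling,)*")


def children_match(element, model=PERSON_MODEL):
    """Check the order and count of an element's children against a model."""
    sequence = "".join(child.tag + "," for child in element)
    return model.fullmatch(sequence) is not None


valid = ET.fromstring(
    "<person><parent/><parent/><age/><sibling/></person>"
)
no_parent = ET.fromstring("<person><age/></person>")
two_spouses = ET.fromstring(
    "<person><parent/><age/><spouse/><spouse/></person>"
)
wrong_order = ET.fromstring("<person><age/><parent/></person>")

for el in (valid, no_parent, two_spouses, wrong_order):
    print([child.tag for child in el], children_match(el))
```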
Attributes
 The attributes of an element are declared separately from the element
declarations in a DTD
 An attribute declaration must include the name of the element to which
the attribute belongs, the attribute's name and its type.
 The general form of an attribute declaration is as follows:
 <!ATTLIST element_name attribute_name attribute_type
[default_value]>
 If more than one attribute is declared for a given element, the
declarations can be combined.
 The most common type for attributes is CDATA, for any string of characters.
 The default value of an attribute can be: an actual value, #FIXED (the value
that follows cannot be changed), #REQUIRED (every instance of the element must specify a
value), or #IMPLIED (the value may or may not be specified in an
element).
• <!ATTLIST airplane places CDATA "4">
• <!ATTLIST airplane engine_type CDATA #REQUIRED>
• <!ATTLIST airplane price CDATA #IMPLIED>
• <!ATTLIST airplane manufacturer CDATA #FIXED "cessna">
 The following element is valid for this DTD:
• <airplane places = "10" engine_type = "jet"></airplane>
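The effect of the four kinds of default can be sketched in Python. The standard xml.etree.ElementTree module does not validate against a DTD, so the declarations are modelled by hand in a dictionary; AIRPLANE_ATTLIST and effective_attributes are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written model of the four ATTLIST declarations for airplane:
# attribute name -> (kind of default, value)
AIRPLANE_ATTLIST = {
    "places": ("default", "4"),
    "engine_type": ("#REQUIRED", None),
    "price": ("#IMPLIED", None),
    "manufacturer": ("#FIXED", "cessna"),
}


def effective_attributes(element, attlist=AIRPLANE_ATTLIST):
    """Return the attributes after applying declared defaults.

    Raises ValueError when a declaration is violated.
    """
    attrs = dict(element.attrib)
    for name in attrs:
        if name not in attlist:
            raise ValueError(f"undeclared attribute: {name}")
    for name, (kind, value) in attlist.items():
        if kind == "#REQUIRED" and name not in attrs:
            raise ValueError(f"missing required attribute: {name}")
        if kind == "#FIXED" and attrs.get(name, value) != value:
            raise ValueError(f"{name} is fixed to {value!r}")
        if kind in ("default", "#FIXED"):
            attrs.setdefault(name, value)  # fill in when not written
    return attrs


plane = ET.fromstring('<airplane places="10" engine_type="jet"></airplane>')
print(effective_attributes(plane))
```

The price attribute stays absent because #IMPLIED supplies no default, while manufacturer is filled in with its fixed value.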
Entities
 Entities can be defined so that they can be referenced
anywhere in the content of an XML document, in which
case they are called general entities.
 The form of an entity declaration:
 <!ENTITY [%] entity_name "entity_value">
 % specifies that the entity is a parameter entity (an entity
that can be referenced only in markup declarations)
 Ex: <!ENTITY jfk "John Fitzgerald Kennedy">
 Any XML document that uses a DTD that includes this
declaration can specify the complete name with just the
reference &jfk;
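Entity replacement can be observed with a short Python sketch, assuming xml.etree.ElementTree. To keep the example self-contained the declaration is placed in an internal DTD subset rather than in a separate file, and the president element is made up for the illustration.

```python
import xml.etree.ElementTree as ET

WITH_DTD = """<!DOCTYPE president [
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<president>&jfk;</president>"""

WITHOUT_DTD = "<president>&jfk;</president>"

# With the declaration present, the reference is replaced by its value.
president = ET.fromstring(WITH_DTD)
print(president.text)

# Without it, the reference is undefined and the document is rejected.
try:
    ET.fromstring(WITHOUT_DTD)
    undefined_rejected = False
except ET.ParseError:
    undefined_rejected = True
print(undefined_rejected)
```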
Sample planes.dtd
 Consider a booklet of ads for used aircraft.
<?xml version="1.0" encoding="utf-8"?>
<!-- planes.dtd -->
<!ELEMENT planes_for_sale (ad+)>
<!ELEMENT ad (year, make, model, color, description, price?, seller,
              location)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT make (#PCDATA)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT seller (#PCDATA)>
<!ELEMENT location (city, state)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>

<!ATTLIST seller phone CDATA #REQUIRED>
<!ATTLIST seller email CDATA #IMPLIED>

<!ENTITY c "Cessna">
<!ENTITY p "Piper">
<!ENTITY b "Beechcraft">
Planes.xml
<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xml -->
<!DOCTYPE planes_for_sale SYSTEM "planes.dtd">
<planes_for_sale>
  <ad>
    <year> 1977 </year>
    <make> &c; </make>
    <model> skyhawk </model>
    <color> Light Blue and White </color>
    <description> 685 hours, full IFR… </description>
    <price> 23,495 </price>
    <seller phone = "555-222-3333"> skyway Aircraft </seller>
    <location>
      <city> Rapid city </city>
      <state> South Dakota </state>
    </location>
  </ad>
</planes_for_sale>
DTD
 Document Type Definition (DTD), defined slightly differently within
the XML and SGML (the language XML was derived from)
specifications, is one of several SGML and XML schema languages,
and is also the term used to describe a document or portion thereof
that is authored in the DTD language.
 A DTD is primarily used for the expression of a schema via a set of
declarations that conform to a particular markup syntax and that
describe a class, or type, of SGML or XML documents, in terms of
constraints on the structure of those documents.
 As an expression of a schema, a DTD specifies, in effect, the syntax of
an "application" of SGML or XML, such as the derivative language
HTML or XHTML. This syntax is usually a less general form of the
syntax of SGML or XML.
 In a DTD, the structure of a class of documents is described via
element and attribute-list declarations.
 Element declarations name the allowable set of elements within the
document, and specify whether and how declared elements and runs
of character data may be contained within each element.
 Attribute-list declarations name the allowable set of attributes for each
declared element, including the type of each attribute value, if not an
explicit set of valid value(s).
Associating DTDs with
documents
 A DTD is associated with an XML document via a Document Type Declaration,
which is a tag that appears near the start of the XML document. The declaration
establishes that the document is an instance of the type defined by the
referenced DTD.
 The declarations in a DTD are divided into an internal subset and an external
subset. The declarations in the internal subset are embedded in the Document
Type Declaration in the document itself. The declarations in the external subset
are located in a separate text file. The external subset may be referenced via a
public identifier and/or a system identifier. Programs for reading documents may
not be required to read the external subset.
 Examples
 Here is an example of a Document Type Declaration containing both public and
system identifiers:
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 Here is an example of a Document Type Declaration that encapsulates an
internal subset consisting of a single entity declaration:
 <!DOCTYPE foo [ <!ENTITY greeting "hello"> ]>
An XML DTD example
 The following XML file makes use of, and is meant to conform to, a DTD
that is identifiable by the relative URI reference "example.dtd":
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
<person>
<name>Fred Bloggs</name>
<birthdate>27/11/2008</birthdate>
<gender>Male</gender>
</person>
</people_list>
DTD problems:
 Create a DTD for a catalog of cars where each car has the child
elements make, model, year, color, engine, number_of_doors,
transmission_type and accessories. The engine element has the child
elements number_of_cylinders and fuel_system (carbureted or fuel
injected). The accessories element has the attributes radio,
air_conditioning, power_windows, power_steering and power_brakes,
each of which is required and has the possible values yes and no. Entities
must be declared for the names of popular car makes.
 Create an XML document with at least three instances of the car element
defined in the DTD above. Process this document using the DTD and
produce a display of the raw XML document.
 Design an XML document to store information about patients in a
hospital. Information about patients must include name (in three parts),
social security number, age, room number, primary insurance company
(including member id number, group number, phone number, and
address), secondary insurance company (in the same sub parts as for
the primary insurance company), known medical problems, and known
drug allergies. Both attributes and nested tags must be included. Make up
sample data for at least four patients.
 Write a DTD for the document described above, with the following
restrictions: the name, social security number, age, room number, and
primary insurance company are required. All the other elements are
optional, as are middle names.
Valid XML Documents

 A "Valid" XML document also conforms to a DTD.


 A "Valid" XML document is a "Well Formed" XML document, which
also conforms to the rules of a Document Type Definition (DTD):
 <?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
XML DTD
 A DTD defines the legal elements of an XML document.
 The purpose of a DTD is to define the legal building blocks of an XML
document. It defines the document structure with a list of legal
elements.
 XML Schema
 XML Schema is an XML based alternative to DTD.
 W3C supports an alternative to DTD called XML Schema.
XML Errors will Stop you
 Errors in XML documents will stop your XML program.
• The W3C XML specification states that a program must not continue normal
processing of an XML document once it finds a well-formedness error. The reason is that XML software
should be easy to write, and that all XML documents should be compatible.
• With HTML it was possible to create documents with lots of errors (like when you
forget an end tag). One of the main reasons that HTML browsers are so big and
incompatible, is that they have their own ways to figure out what a document
should look like when they encounter an HTML error.
• With XML this should not be possible.
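The stop-on-error behaviour is easy to observe. In the Python sketch below (xml.etree.ElementTree assumed; parse_or_report is a hypothetical helper), the parser does not try to repair the missing </from> tag. It raises an error that carries the position of the problem, and no tree is produced.

```python
import xml.etree.ElementTree as ET

# The <from> element is never closed.
BROKEN = "<note>\n  <to>Tove</to>\n  <from>Jani\n</note>"


def parse_or_report(xml_text):
    """Return (tree, None) on success or (None, error report) on failure."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return None, f"line {line}, column {column}: {err}"


tree, report = parse_or_report(BROKEN)
print(tree)
print(report)
```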
Namespaces
 It is often convenient to construct XML documents that
include tag sets that are defined for and used by other
documents.
 When a tag set is available and appropriate, it is better to
use it than to reinvent a new collection of element types.
 An XML Namespace is a collection of element names
used in XML documents.
 The name of a namespace usually has the form of a
Uniform Resource Identifier (URI).
 A namespace for the elements of the hierarchy rooted at
a particular element is declared as the value of the
attribute xmlns.
 <element_name xmlns[:prefix] = "URI">
 Ex: <birds xmlns:bd =
"http://www.abundon.org/names/species">
 Within the birds element, including all of its children
elements, the names from the namespace must be
prefixed with bd, as :
 <bd:lark>
Example for NameSpace
usage
<states
xmlns = "http://www.states-info.org/states"
xmlns:cap = "http://www.states-info.org/state-capitals">
<state>
<name> South Dakota </name>
<population> 754844 </population>
<capital>
<cap:name> Pierre </cap:name>
<cap:population> 12429 </cap:population>
</capital>
</state>
<!-- more states -->
</states>
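 How a parser sees these names can be sketched with Python's standard xml.etree.ElementTree module. The prefixes st and cap in the NS dictionary are choices made for this sketch; only the namespace URIs have to match the document.

```python
import xml.etree.ElementTree as ET

STATES_XML = """
<states xmlns="http://www.states-info.org/states"
        xmlns:cap="http://www.states-info.org/state-capitals">
  <state>
    <name>South Dakota</name>
    <population>754844</population>
    <capital>
      <cap:name>Pierre</cap:name>
      <cap:population>12429</cap:population>
    </capital>
  </state>
</states>
"""

# Prefixes used in the queries below. They are local to this script and
# need not match the prefixes used inside the document.
NS = {
    "st": "http://www.states-info.org/states",
    "cap": "http://www.states-info.org/state-capitals",
}

root = ET.fromstring(STATES_XML)
state = root.find("st:state", NS)

# Two different <name> elements, told apart by their namespaces.
state_name = state.find("st:name", NS).text
capital_name = state.find("st:capital/cap:name", NS).text

print(root.tag)  # the parser expands the name to {namespace-URI}states
print(state_name, "-", capital_name)
```

 The two name elements do not clash because each is identified by its namespace URI together with its local name.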
XML Schemas
 DTDs have several disadvantages :
• DTDs are written in a syntax unrelated to XML, so they cannot
be analyzed with an XML processor.
• It can be confusing to deal with two different syntactic forms,
one to define a document and one to define its structure.
• DTDs do not allow restrictions on the form of data that can be
content of a particular tag.
 XML Schema by W3C is one of the alternatives for DTD.
 XML Schema is an XML document, so it can be parsed with an
XML parser.
 It also provides far more control over data types than do DTDs.
 Data in a specific element can be required to be of any one of
44 different data types.
 The user can even define new types with constraints on
existing data types.
 To promote the transition from DTDs to XML schemas, XML
schema was designed to allow any DTD to be automatically
converted to an equivalent XML schema.
Schema Fundamentals
 Schemas can conveniently be related to the idea of a
class and an object in an object-oriented programming
language.
 A schema is similar to a class definition; an XML
document that conforms to the structure defined in the
schema is similar to an object of the schema’s class.
 XML documents that conform to a specific schema are
considered instances of that schema.
 Schemas have two primary purposes:
• A schema specifies the structure of its instance
documents, including which elements and attributes may
appear as well as where and how often they may
appear.
• A schema specifies the data type of every element and
attribute of its instance XML documents (this is where
schemas outshine DTDs).
Example Schema
<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xsd - a simple schema for planes.xml -->
<xsd:schema
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://cs.uccs.edu/planeSchema"
xmlns = "http://cs.uccs.edu/planeSchema"
elementFormDefault = "qualified">
<xsd:element name = "planes">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "make"
type = "xsd:string"
minOccurs = "1"
maxOccurs = "unbounded" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Planes.xml
<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xml -->
<planes
xmlns = "http://cs.uccs.edu/planeSchema"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://cs.uccs.edu/planeSchema http://cs.uccs.edu/planeSchema/planes.xsd">
<make> Cessna </make>
<make> Piper </make>
<make> Beechcraft </make>
</planes>
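 Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses Python's standard xml.etree.ElementTree module on a condensed copy of the planes schema and instance. It reads the occurrence limits declared for make and compares them with the instance by hand. This is an illustration only, not a schema validator: the Python standard library does not include one, and the variable names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
PLANE_NS = "http://cs.uccs.edu/planeSchema"

SCHEMA = f"""
<xsd:schema xmlns:xsd="{XSD_NS}"
            targetNamespace="{PLANE_NS}"
            xmlns="{PLANE_NS}"
            elementFormDefault="qualified">
  <xsd:element name="planes">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="make" type="xsd:string"
                     minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

INSTANCE = f"""
<planes xmlns="{PLANE_NS}">
  <make>Cessna</make>
  <make>Piper</make>
  <make>Beechcraft</make>
</planes>
"""

# Parse the schema like any other XML document and find the declaration
# of the <make> element, wherever it is nested.
schema = ET.fromstring(SCHEMA)
make_decl = schema.find(f".//{{{XSD_NS}}}element[@name='make']")
min_occurs = int(make_decl.get("minOccurs", "1"))
max_occurs = make_decl.get("maxOccurs", "1")

# Count the <make> children in the instance and compare by hand.
instance = ET.fromstring(INSTANCE)
count = len(instance.findall(f"{{{PLANE_NS}}}make"))
ok = count >= min_occurs and (
    max_occurs == "unbounded" or count <= int(max_occurs)
)

print(make_decl.get("type"), min_occurs, max_occurs)
print(count, ok)
```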
Validating Instances of
Schemas
 Validation determines whether a given XML
instance document conforms to its
schema.
 XSV is an XML Schema Validator from the
University of Edinburgh.
 XSV can be used online.
 If the schema is not in the correct
format, the validator will report that it
could not find the specified schema.
Browser Support
 Mozilla Firefox
• As of version 1.0.2, Firefox has support for XML and XSLT (and CSS).
 Mozilla
• Mozilla includes Expat for XML parsing and has support to display XML
+ CSS. Mozilla also has some support for Namespaces.
• Mozilla is available with an XSLT implementation.
 Netscape
• As of version 8, Netscape uses the Mozilla engine, and therefore it has
the same XML / XSLT support as Mozilla.
 Opera
• As of version 9, Opera has support for XML and XSLT (and CSS).
Version 8 supports only XML + CSS.
 Internet Explorer
• As of version 6, Internet Explorer supports XML, Namespaces, CSS,
XSLT, and XPath.
 Note: Internet Explorer 5 also has XML support, but the XSL part is
NOT compatible with the official W3C XSL Recommendation!
Viewing XML Files
 In Firefox and Internet Explorer:
 Open the XML file (typically by clicking on a link) - The XML document will be displayed with
color-coded root and child elements. A plus (+) or minus sign (-) to the left of the elements
can be clicked to expand or collapse the element structure. To view the raw XML source
(without the + and - signs), select "View Page Source" or "View Source" from the browser
menu.
 In Netscape 6:
 Open the XML file, then right-click in XML file and select "View Page Source". The XML
document will then be displayed with color-coded root and child elements.
 In Opera 7 and 8:
 In Opera 7: Open the XML file, then right-click in XML file and select "Frame" / "View Source".
The XML document will be displayed as plain text. In Opera 8: Open the XML file, then right-
click in XML file and select "Source". The XML document will be displayed as plain text.
 Note: Do not expect XML files to be formatted like HTML documents!
 Viewing an Invalid XML File
 If an erroneous XML file is opened, the browser will report the error.
 XML documents do not carry information about how to display the data.
 Since XML tags are "invented" by the author of the XML document, browsers do not know if a
tag like <table> describes an HTML table or a dining table.
 Without any information about how to display the data, most browsers will just display the
XML document as it is.
Displaying XML documents
with CSS
 A CSS file that holds style information for the elements of an XML
document can be developed.
 The other way is to use the XSLT style sheet technology.
 XSLT provides far more power over the appearance of the
document's display.
 XSLT is not supported by all browsers.
 The form of a CSS style sheet for an XML document is simple.
 It is just a list of element names, each followed by a brace-
delimited set of the element's CSS attributes.
 planes.css
/* planes.css */
ad { display: block; margin-top: 15px; color: blue; }
year, make, model { color: red; font-size: 16pt; }
 Using it in an XML document:
<?xml-stylesheet type = "text/css" href = "planes.css" ?>
Displaying your XML Files
with CSS?
 It is possible to use CSS to format an XML document.
 Below is an example of how to use a CSS style sheet to format an XML document:
 Below is a fraction of the XML file. The second line, <?xml-stylesheet type="text/css"
href="cd_catalog.css"?>, links the XML file to the CSS file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
. . . .
</CATALOG>
 Note: Formatting XML with CSS is NOT the future of how to style XML documents. XML
documents should be styled by using the W3C's XSL standard!
Displaying XML with XSL
 XSL is the preferred style sheet language of XML.
 XSL (the eXtensible Stylesheet Language) is far more sophisticated
than CSS. One way to use XSL is to transform XML into HTML before
it is displayed by the browser.
 Below is a fraction of the XML file. The second line, <?xml-stylesheet
type="text/xsl" href="simple.xsl"?>, links the XML file to the XSL
file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description> two of our famous Belgian Waffles </description>
<calories>650</calories>
</food>
</breakfast_menu>
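 The idea of turning XML into HTML before display can be sketched without an XSLT processor. The example below is not XSLT: it is a hand-written transformation using Python's standard xml.etree.ElementTree module, shown only to illustrate the XML-to-HTML step that a style sheet such as simple.xsl would describe declaratively. The function name menu_to_html and the table layout are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

MENU_XML = """
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles</description>
    <calories>650</calories>
  </food>
</breakfast_menu>
"""


def menu_to_html(xml_text):
    """Turn each <food> element into one row of an HTML table."""
    menu = ET.fromstring(xml_text)
    rows = []
    for food in menu.findall("food"):
        # escape() keeps characters such as < and & safe inside HTML.
        name = escape(food.findtext("name", default="").strip())
        price = escape(food.findtext("price", default="").strip())
        rows.append(f"<tr><td>{name}</td><td>{price}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


html_table = menu_to_html(MENU_XML)
print(html_table)
```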
XML Data Embedded in
HTML
 An XML data island is XML data embedded into an HTML page.
 Here is an example XML document ("note.xml"):
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
 Then, in an HTML document, you can embed the XML file above with the <xml> tag. The id
attribute of the <xml> tag defines an ID for the data island, and the src attribute points to the
XML file to embed:
<html>
<body>
<xml id="note" src="note.xml"></xml>
</body>
</html>
 However, the embedded XML data is, up to this point, not visible for the user.
 The next step is to format and display the data in the data island by binding it to HTML
elements.
 Bind Data Island to HTML Elements
 In the next example, we will embed an XML file called "cd_catalog.xml" into an HTML file.
 The HTML file looks like this:
<html>
<body>
<xml id="cdcat" src="cd_catalog.xml"></xml>
<table border="1" datasrc="#cdcat">
<tr>
<td><span datafld="ARTIST"></span></td>
<td><span datafld="TITLE"></span></td>
</tr>
</table>
</body>
</html>
 Example explained:
 The datasrc attribute of the <table> tag binds the HTML table element to the XML data island.
The datasrc attribute refers to the id attribute of the data island.
 <td> tags cannot be bound to data, so <span> tags are used instead. The <span> tag allows the
datafld attribute to refer to the XML element to be displayed. In this case, it is
datafld="ARTIST" for the <ARTIST> element and datafld="TITLE" for the <TITLE> element in
the XML file. As the XML is read, an additional table row is created for each <CD> element.
Example: XML News
 XMLNews is a specification for exchanging news and other
information.
 Using such a standard makes it easier for both news producers and
news consumers to produce, receive, and archive any kind of news
information across different hardware, software, and programming
languages.
 An example XMLNews document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<nitf>
<head>
<title>Colombia Earthquake</title>
</head>
<body>
<headline>
<hl1>143 Dead in Colombia Earthquake</hl1>
</headline>
<byline>
<bytag>By Jared Kotler, Associated Press Writer</bytag>
</byline>
<dateline>
<location>Bogota, Colombia</location>
<date>Monday January 25 1999 7:28 ET</date>
</dateline>
</body>
</nitf>
Parsing XML Documents
 To manipulate an XML document, you need an XML parser.
 The parser loads the document into your computer's memory.
 Once the document is loaded, its data can be manipulated using the DOM.
 The DOM treats the XML document as a tree.
 Microsoft's XML Parser
 Microsoft's XML parser is a COM component that comes with Internet Explorer 5 and higher.
 Once you have installed Internet Explorer, the parser is available to scripts.
 Microsoft's XML parser supports all the necessary functions to traverse the node tree, access
the nodes and their attribute values, insert and delete nodes, and convert the node tree back
to XML.
 To create an instance of Microsoft's XML parser, use the following code:
 JavaScript:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
 VBScript:
set xmlDoc = CreateObject("Microsoft.XMLDOM")
 ASP:
set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")
 The following code fragment loads an existing XML document ("note.xml") into
Microsoft's XML parser:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");
 The first line of the script above creates an instance of the XML
parser. The second line turns off asynchronous loading, so that the script does not
continue before the document is fully loaded. The third line tells the
parser to load an XML document called "note.xml".
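 The same DOM operations (traverse the node tree, access nodes, insert and delete nodes, convert the tree back to XML) are available outside Internet Explorer. The following is a minimal sketch using Python's standard xml.dom.minidom module instead of Microsoft's parser. The priority element is a made-up addition for illustration and is not part of the note example.

```python
from xml.dom.minidom import parseString

NOTE_XML = (
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)

doc = parseString(NOTE_XML)  # load the whole document into memory
note = doc.documentElement   # root node of the tree

# Traverse: collect the element children of <note> in document order.
child_names = [
    n.tagName for n in note.childNodes if n.nodeType == n.ELEMENT_NODE
]

# Access: read the text inside <to>.
recipient = note.getElementsByTagName("to")[0].firstChild.data

# Insert: append a new <priority> element (hypothetical, for illustration).
priority = doc.createElement("priority")
priority.appendChild(doc.createTextNode("high"))
note.appendChild(priority)

# Delete: remove the <heading> element.
heading = note.getElementsByTagName("heading")[0]
note.removeChild(heading)

# Convert the modified node tree back to XML text.
result = note.toxml()

print(child_names)
print(recipient)
print(result)
```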
XML DTD Example
 A very simple XML DTD to describe a list of persons is given below:
<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>
Taking this line by line, it says:
 people_list is a valid element name, and an instance of such an element
contains any number of person elements. The * denotes there can be 0 or more
person elements within the people_list element.
 person is a valid element name, and an instance of such an element contains
one element named name, followed by one named birthdate (optional), then
gender (also optional) and socialsecuritynumber (also optional). The ? indicates
that an element is optional. The reference to the name element has no ?,
so a person element must contain a name element.
 name is a valid element name, and an instance of such an element contains
parsed character data (#PCDATA).
 birthdate is a valid element name, and an instance of such an element contains
character data.
 gender is a valid element name, and an instance of such an element contains
character data.
 socialsecuritynumber is a valid element name, and an instance of such an
element contains character data.
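 The content models above behave like regular expressions over the sequence of child element names. The sketch below makes that reading concrete with Python's standard re and xml.etree.ElementTree modules. It is a hand-written check for this one DTD, not a DTD validator: the names CONTENT_MODELS, PCDATA_ONLY and conforms are made up for illustration, and stray text between child elements is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, rewritten as regular expressions over the
# space-terminated sequence of child element names.
CONTENT_MODELS = {
    "people_list": re.compile(r"(person )*"),
    "person": re.compile(
        r"name (birthdate )?(gender )?(socialsecuritynumber )?"
    ),
}
# Elements declared as (#PCDATA): text only, no child elements.
PCDATA_ONLY = {"name", "birthdate", "gender", "socialsecuritynumber"}


def conforms(element):
    """Check an element and all its descendants against the content models."""
    children = "".join(child.tag + " " for child in element)
    if element.tag in PCDATA_ONLY:
        return children == ""
    model = CONTENT_MODELS.get(element.tag)
    if model is None or model.fullmatch(children) is None:
        return False
    return all(conforms(child) for child in element)


valid = ET.fromstring(
    "<people_list><person><name>Fred</name>"
    "<gender>male</gender></person></people_list>"
)
# Same elements, but gender comes before name, which the DTD does not allow.
invalid = ET.fromstring(
    "<people_list><person><gender>male</gender>"
    "<name>Fred</name></person></people_list>"
)

print(conforms(valid))
print(conforms(invalid))
```

 The * and ? of the DTD map directly onto the * and ? of the regular expressions, and the comma maps onto plain sequencing.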