Sie sind auf Seite 1von 50

XML

Basics
Topics
• Basic XML document structure and
components
• Well-formed XML
• DTD
• DTD components
• Creating DTD and linking with XML
document
• XML parsers
• Valid XML document
XML document structure
<?xml version=“1.0”?> Processing instruction

<?noisemaker noise=“sound.wav”?>
<note> Root element
Attribute
<to style=“bold”>Harry Potter </to>
<from>Ron</from> Element
<heading>Reminder</heading>
<horizontal_line/> Empty Element
<body>Please, get your magic
wand.</body>
<!–letter format--> Comment
&quot; Entity Reference
</note>
Special markup characters
•< • Use &lt; for <
•> • Use &gt; for >
•& • Use &amp; for &
•‘ • Use &apos; for ‘
•“ • Use &quot for “
Redefining XML
•An XML document is an information
unit that can be viewed in two ways:
• as a linear sequence of characters
that contain character data or
markup or entity references
• or as an abstract data structure that
is a tree of nodes.
Well-formed Constraints
• All XML elements must nest correctly.
• XML tags are case sensitive. The case of the
start tag and and its corresponding end tag
must match.
• All XML elements must be properly nested
• All XML documents must have one and only
one Root element
• All the elements (other than the root)must
have one and one parent.
Well-formed Constraints
• Attribute values must always be quoted
• Empty tags must end with a ‘/’.

• An XML document that confirms to the


above rules is called a “Well formed”
XML document
Well-formed XML defined
Well-formed XML data conforms to the
XML syntax specification, and includes no
references to external resources (unless
a DTD is provided). It is comprised of
elements that form a hierarchical tree,
with a single root node (the document
element).
Example: XHTML
<html>
<head>
<title> Well-formed document:
XHTML document
</title>
</head>
<body>
<p>This is a well-formed HTML
document
<br/>and valid XHTML document.
</p>
<hr/>
</body></html>
Tool to check well-formed
XHTML document
• HTML Tidy utility by W3C can be used to test
for well-formed XHTML.
• It can also be used to convert HTML
document to XHTML document.
• Open source software.
• Command:
• tidy –xml –e filename.xhtml
Activity
• Test the ill-formed XHTML document using
tidy and note down the errors.
More about the XML names
• XML names are names given for elements
and attributes
• All XML names must begin with a letter or ‘_’
or :.
• Letter could be any alphabets in english or
any language supported by UNICODE.
• Only restriction is that it cannot be ‘XML’ or
‘xml’ or mix of case in the string ‘xml’.
Patient Xml_Tag
DOCTOR Legal -Name Illegal
Doctor:Patient 12Street
Element
• Basic building block of the XML document
• May have
•Character data •Entity references
•Attributes •Comments
•Character references •PIs
•CDATA section
• The root element is also called the Document
Element.
Attributes
• Attributes are used to attach the
information about the element.
• Attribute is a name-value pair
• Attribute values can be any text, entity
reference or character reference.
• Attribute values cannot contain special
characters.
• Only one instance of attribute name is
allowed.
Character references
• Characters that cannot be typed into a
document straight away but must be displayed,
can be represented as character references.
• Example: copy right symbol: ©, ®
• <special> &#169; Scicom
Infrastructure Pvt (India)
Ltd</special>
• Used for representing a single character.
• It is comprised of a decimal or hexa-decimal
number between ‘&#’ and ‘;’
Entity references
• 5 built in entity references &lt; &gt;
etc.
• Apart from these 5 entities, number of other
entity references are also defined like &copy;
&nbsp; etc.
CDATA section
• Character data that you don’t want to be
parsed can be kept in CDATA section.

<code>
<![CDATA[
if(a>b && a<10) doThis();
]]>
</code>
Comment and PI
• Comment can be given between <!-- and -->
• Example: <!-- this is a comment -->
• Processing instruction is used to pass some
hints/files to the application along with the xml
document.
• PI is given between two ‘?’
• Example:
• <? xml-stylesheet
href=“mystyle.css” type=“text/css”
?>
DTD
• Document Type Definition
• The DTD defines the structure of the xml
document and how content is nested.
• An XML document is valid only when it is
well-formed and confirms to the DTD (or XML
schema) defined for it.
• DTD defines the grammar rules for forming
an XML document.
XML Parser
• XML parsers/processors are which check if the
XML document is well-formed parser) or valid
• Non-validating parser: ensure that the XML
document is well-formed.
• Validating parser: ensure that the XML
document is
• Well-formed
• Valid
• Resolves external resources
xml parser

return valid/invalid
xml doc document

xml doc
xml application
XML Parser available
• Apache • Oracle
• Xerces-C(C++) • XML Parser for
• Xerces-J(Java) Java
• XML parser for C
• IBM and C++
• IBM 4C(C++)
• IBM4J (Java) • Sun
• JAXP and JAXB
• Microsoft API
• MSXML
• IE
Ways of DTD with XML
• Internal
• Including DTD in the same file as XML file
• External
• Creating another file for DTD and linking it
with XML file

• If both are provided, then if there are similar


declarations, internal DTD takes preference.
Linking external DTD with XML
• 2 ways to associate DTD with XML
1.<!DOCTYPE root SYSTEM location>
• SYSTEM is used to explicitly specify the
location of the DTD.
• Example: <!DOCTYPE note SYSTEM
“file:///c:/x.dtd” >

2.<!DOCTYPE root PUBLIC identifier


location>
• PUBLIC is used if the DTD is a
standard and is shared by many
organizations
Linking external DTD with XML
• The identifier is a name mapped to the
actual location of DTD from where
everybody shares the dtd.
• If the dtd is not accessible or available then
like SYSTEM command dtd is obtained
from location.
• Example
<!DOCTYPE web-app PUBLIC "-//Sun
Microsystems, Inc.//DTD Web
Application 2.3//EN“
"http://java.sun.com/dtd/web-
app_2_3.dtd">
Basic DTD declarations
• <!ELEMENT ...>
• <!ATTLIST ...>
• <!ENTITY ...>
• <!NOTATION ...>
ELEMENT Declaration
1. Text Only:
• Specifies that this element can contain
content that is text
• <!ELEMENT name (#PCDATA)>

• Example: <!ELEMENT fname


(#PCDATA)>
• VALID XML : <fname>ravi</fname>
• INVALID XML:
<fname><nickname>ravi</nickname
></fname>
ELEMENT Declaration
1. Element Only:
• Specifies that this element can contain
elements as specified by the tag
• <!ELEMENT name (child1) >

• Example: <!ELEMENT custname


(fname)>
• VALID XML :
<custname><fname>ravi</fname>
</custname>
• INVALID XML:
<custname>ravinath</custname>
Order of elements
• Sequence list:
• ‘,’ separated list
• The child elements must appear in the
specified order

• Example: <!ELEMENT custname


(fname,lname)>
• VALID XML :
<custname><fname>ravi</fname>
<lname>nath</lname></custname>
• INVALID XML:
<custname><lname>ravi</lname>
<fname>nath</fname></custname>
Order of elements
• Choice list:
• ‘|’ separated list
• The child elements can appear any order

• Example:
<!ELEMENT custname (fname | lname)>
• VALID XML :
<custname><lname>ravi</lname>
<fname>nath</fname></custname>

• INVALID XML:
<custname><lname>ravi</lname>
</custname>
Element Declaration
1. Mixed content:
• Specifies that this element can contain mixture of
elements and text as specified
• <!ELEMENT name Or |
#PCDATA,child1,child2)>

• Example: <!ELEMENT address #PCDATA,


city|state)>
• Valid XML: <address>Mr. Mohan Lal
172 Veera Apts., MG Road
<city>Cochin</city>
<state>Kerala</state>
</address>
Element Declaration
1. Anything:
• Specifies that this element can contain
any well-formed xml data
• <!ELEMENT name ANY>
• Example: <!ELEMENT address ANY>
• Valid XML:
• <address>
Mr. Mohan Lal
172 Veera Apts., MG Road
<city>Cochin</city>
<state>Kerala</state>
</address>
• <address/>
Element Declaration
1. Empty:
• Specifies that the element will be empty.
That is it cannot contain anything
• <!ELEMENT name EMPTY>
• Example: <!ELEMENT leftmargin
EMPTY>
• Valid XML:
• <leftmargin/>
• <leftmargin></leftmargin>
• Invalid XML:
• <leftmargin>xxx</leftmargin>
Element Declaration
• Cardinality
• none: the absence of cardinality indicates
one and only one
• *  0 or more
• + 1 or more
• ? 0 or 1

• Example:<!ELEMENT letter
(to+,subject?,body,from)>
<?xml version="1.0" ?>
<!DOCTYPE priceList [
Example:
<!ELEMENT priceList (coffee)+> Embedded
<!ELEMENT coffee (name, price) > DTD with
<!ELEMENT name (#PCDATA)>
XML
<!ELEMENT price (#PCDATA)> ]>
<priceList>
<coffee>
<name>Mocha Java</name>
<price>11.95</price>
</coffee>
</priceList>
Validating using XMLSpy
• We will use XML Spy 2.5 trial version for
DTD-Validation.
• XML Spy is Microsoft XML Parser.
Activity
• Open a valid XML file with XMLSpy.
• Try adding an invalid element.
Attribute Declaration
• <!ATTLIST elementName
attrName type attDefault value >
• elementName ,attrName are compulsory
• type type specifies the type of the value the
attribute can hold
• attDefault specifies whether an attributes
presence is required or not. Also says how the
parser must handle the attributes absence.
• Value specifies the default value
Attribute Types
• CDATA: text data (string)
• ID: valid xml name which is unique for each
identifier for each instance of the current element.
• IDREF: a reference to the ID type
• IDREFS: List of IDREFs separated by comma
• NMTOKEN: text data that can contain the
characters limited to letters, digits, underscores,
colons, periods and dashes
• NMTOKENS: a comma-separated list of NMTOKEN
items
Attribute Types
• ENTITY: name of the predefined entity.
• ENTITIES: a list of ENTITY names
separated by white space chars
• NOTATION: used to map a reference to a
notation-type declaration section that is
declared else where in the DTD.
• Enumerated value/attribute choice list: a list
of values that attribute value can have
Attributes Default
• #REQUIRED: the attribute is required.
• #IMPLIED: the attribute is optional.
• #FIXED with default value: the
attribute must always have the default value
• Default value: the attribute is optional. If it is
present, then it should be of the attribute type.
If it is not present then parser will take the
default value.
Example1
<?xml version="1.0" ?>
<!DOCTYPE bookreview [
<!ELEMENT bookreview (review)+>
<!ELEMENT review (#PCDATA)>
<!ATTLIST review
title CDATA #REQUIRED
ISBN ID #REQUIRED
Similar IDREF #IMPLIED
libno NMTOKEN #IMPLIED
authors NMTOKENS #REQUIRED
hardbound (YES|NO) "NO"
source CDATA #FIXED "BOOK"
> ]>
<bookreview>
<review title="The 7 habits of highly
effective people" ISBN="o-7434-
0885-3" libno="a1" authors="Stephen
R.Covey">
Powerful lessons in personal change
</review>
<review title="Failing Forward"
ISBN="o-81-7809-077-5" libno="a1"
Similar="o-7434-0885-3" authors="John
C. Maxwell" hardbound="YES">
Turning mistakes into stepping stones
for success
</review>
</bookreview>
Notation
• Notation is used to map a reference to a
notation-type declaration section that is
declared else where in the DTD.
• The notation is more important for the application
outside the parser.
• Example:
<!NOTATION gif SYSTEM
‘http://mysite.com/GIF_viewer.exe’>
<!ATTLIST PIC viewer NOTATION (gif)
#IMPLIED>
<img src=“pic.gif” type=“gif”>
Entities
• Entity/Entities are replaceable contents which
helps reduced re-type or reassign of the same
content
• All the entities except predefined entities need
to be declared.

• Entities can be classified in two ways:


• Depending on where it is used
• Depending on how it is parsed
Classification 1
• Two types of Entities:
• Parameter Entity: Entity reference within used
within the DTD.
• General Entity: Entity reference within used
within the XML document.

• It is an error to put a parameter reference in


the xml document. But it is not an error to put
an entity reference in DTD in defining the
value of another entity. But the reference will
not be ‘resolved’ until it is used in the
document.
Entity declaration
• DTD
<!ENTITY GenEntity “xml doc
entity”>
<!ENTITY ParEntity % “dtd doc
entity”>
%ParEntity;
• IN XML:
&GenEntity;
Another classification
• Parsed Entities: well-formed content which is
parsed
• Unparsed Entities: non-XML data

• Unparsed entities depend on the notation


declaration to identify them so that the
application processing the XML document
knows what kind of entity is being used and
what to do with it.
Example: Parsed Entity
In DTD
<!ENTITY rights “All rights
reserved”>
<!ENTITY book “Designed by WW.
&rights; ”>
In XML:
&book;
Example: Unparsed Entity
In DTD:
<!NOTATION gif SYSTEM
‘http://mysite.com/GIF_viewer.exe’>
<!ENTITY picref SYSTEM
‘http:/mysite/img.gif’ NDATA gif>
<!ENTITY poster EMPTY>
<!ATTLIST poster src ENTITY #implied>
In XML:
<poster src=“picref”/>

Das könnte Ihnen auch gefallen