
Introduction to XML

by Nikita Bais

Table Of Contents
Markup Languages
What is XML?
The Difference Between XML and HTML
How Can XML be Used?
XML Structure
XML Syntax
Valid vs Well Formed XML
Document Type Definition (DTD)

Markup Languages

Markup: the term refers to the tagging of electronic documents.

Markup modifies the look and formatting of a document (e.g. bold and italic fonts, font sizes, text indents).
Markup sets up the structure of a document and defines its semantic meaning.
Examples of document formats that use markup: HTML, RTF, SGML, XML.

Markup Languages

Classification Of Markup Languages


Specific Markup Language
Generates code that is specific to a particular application. Examples: HTML, RTF

Generalized Markup Language
Describes only the structure of a document, not its formatting; the syntax is strictly enforced. Examples: SGML, XML

XML Basics

What is XML?
XML stands for Extensible Markup Language.
XML was designed to carry data, not to display data.
XML tags are not predefined. Users can define their own tags.
XML is designed to be self-descriptive.
XML is a W3C Recommendation.
XML documents can be validated using a DTD.

XML Basics

Difference Between XML and HTML


XML is not a replacement for HTML; XML is a complement to HTML.
XML and HTML were designed with different goals:
XML was designed to transport and store data, with a focus on what the data is.
HTML was designed to display data, with a focus on how the data looks.

HTML is about displaying information, while XML is about carrying information.

XML Basics

How can XML be used?


XML separates data from HTML.
XML simplifies data sharing.
XML simplifies data transport.
XML is used to create new Internet languages.

XML Structure

An XML document has both a logical and a physical structure.

Logical Structure: indicates how the document is built, as opposed to what the document contains.

Physical Structure: the content used in the document.

XML Logical Structure

Prolog

The prolog is the first structural element in an XML document; it is optional.

The prolog consists of two basic components:

The XML declaration (all in lower case): <?xml version="1.0"?>
The Document Type Declaration: <!DOCTYPE root-element SYSTEM "filename">

XML Logical Structure

Document Element
Follows the prolog.
The heart of the XML document, where the actual content resides.

XML Physical Structure


The physical structure of an XML document is composed of all the content used in the document.
The data is stored in the form of entities.

Example: predefined entities in XML

Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
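The effect of the predefined entities can be observed with any XML processor. The sketch below uses Python's standard library, which is a choice made for this illustration and is not part of the original slides. It shows the references being replaced on parsing, and the reverse step when text is written out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the processor replaces each predefined entity reference
# with the character it stands for.
element = ET.fromstring("<note>&lt; &gt; &amp; &quot; &apos;</note>")
decoded = element.text

# Writing: escape() turns the characters that are unsafe in element
# content (&, < and >) back into entity references.
encoded = escape("1 < 2 & 3 > 2")

print(decoded)
print(encoded)
```

Quotes only need escaping inside attribute values, which is why `escape()` leaves them alone unless asked otherwise.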

XML Physical Structure


What is an entity?
Entities are storage units.
Each entity is identified by a unique name.
Entities are declared in the DTD and can be used anywhere in the XML document.
The processor retrieves the contents of an entity when it is referenced in the XML document.

XML Physical Structure

Entity declaration
Syntax: <!ENTITY entity-name "entity-value">
Example:
DTD: <!ENTITY writer "Donald Duck.">
XML document: <author>&writer;</author>

(An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).)
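As a minimal sketch of how a processor resolves such a reference, the snippet below places the `writer` declaration in an internal DTD subset and parses the document with Python's `xml.etree.ElementTree`. The choice of Python is an assumption of this example; the slides do not prescribe a processor.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and referenced
# in the document element as &writer;
document = """<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
]>
<author>&writer;</author>"""

author = ET.fromstring(document)

# The reference has been replaced by the entity's value.
print(author.text)
```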

XML Physical Structure


Internal and External Entities

Internal entities
Require no separate storage.
Their contents are provided in the declaration itself.

Syntax: <!ENTITY entity-name "entity-value">
Example: <!ENTITY writer "Donald Duck.">

XML Physical Structure

External Entities

Require separate storage.
The declaration refers to a storage unit by using a SYSTEM or PUBLIC identifier.

Syntax
<!ENTITY entity-name SYSTEM "URI/URL">

Example
<!ENTITY MyImage SYSTEM "http://www.images.com/sunset.gif" NDATA GIF>

XML Physical Structure


In addition to a SYSTEM identifier, an entity can include a PUBLIC identifier.
The PUBLIC identifier provides an alternative way to retrieve the content of an entity.
The PUBLIC identifier is useful when working with an entity that is publicly available.

Example: <!ENTITY MyImage PUBLIC "-//Images//Text Standard Images//EN" "http://www.images.com/sunset.gif" NDATA GIF>

XML Physical Structure

Parsed Entity
An entity made up of parsable text (any text data).
The XML processor extracts the content of the entity.
The content of the entity appears at the location of the entity reference in the XML document.

Example: <!ENTITY writer "Donald Duck.">
The entity declaration for writer, which contains "Donald Duck."

<author>&writer;</author>
The reference to the writer entity is replaced with "Donald Duck."

XML Physical Structure

Unparsed Entity
An entity that cannot be parsed by the XML processor.
Its content might or might not be text; if it is text, it is not parsable XML text.
It is sometimes referred to as a binary entity, because its content is often a binary file (e.g. an image).
It requires a notation, which identifies the format, or type, of the resource that the entity refers to.

XML Physical Structure

Example
Entity declaration: <!ENTITY MyImage SYSTEM "sunset.gif" NDATA GIF>
Notation declaration: <!NOTATION GIF SYSTEM "//Utils/Gifview.exe">
(This specifies that the XML processor should use Gifview.exe to process entities of type GIF.)

XML Syntax

Opening and Closing tags


XML requires that a closing tag be used for every element.
Example:
<EMAIL>
  <TO>Ashish</TO>
  ...
</EMAIL>

XML Syntax

The EMPTY-ELEMENT tag


A shortcut for an empty element (an element containing no data).
Example:
If the CC element does not contain data, it can be written as <CC></CC> or <CC/>.
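That the two spellings are interchangeable can be checked with a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`, chosen here for illustration only; the `describe` helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<CC></CC>")
short_form = ET.fromstring("<CC/>")


def describe(element):
    """Summarise an element as (tag name, text content, number of children)."""
    return (element.tag, element.text, len(element))


# Both spellings yield an element with no text and no children.
print(describe(long_form))
print(describe(short_form))
```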

XML Syntax

Attributes
Attributes provide a method of associating values with an element.
XML elements can have attributes in name/value pairs, just like in HTML. The value must always be quoted.

Example:
<EMAIL DATE="14/02/2011"> </EMAIL>
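For illustration, here is how a program might read the DATE attribute back out of the element. Python's `xml.etree.ElementTree` is an assumption of this sketch, not something the slides specify, and the `PRIORITY` attribute is a hypothetical name used only to show the lookup of a missing attribute.

```python
import xml.etree.ElementTree as ET

email = ET.fromstring('<EMAIL DATE="14/02/2011"></EMAIL>')

# Attributes are exposed as name/value pairs.
date = email.get("DATE")
all_attributes = dict(email.attrib)

# Asking for an attribute that was never written returns None.
priority = email.get("PRIORITY")

print(date, all_attributes, priority)
```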

Valid Vs Well Formed XML

Valid XML
XML that has been validated against a DTD is "valid" XML.
It obeys all the validity constraints identified in the XML specification.
Example: Validity Constraint: Required Attribute
If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type named in the attribute-list declaration.

Valid Vs Well Formed XML


<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
DTD: <!ATTLIST person number CDATA #REQUIRED>
Valid XML: <person number="5677" />
Invalid XML: <person />
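The required-attribute constraint can be imitated by hand to make the rule concrete. Python's standard library parser does not validate against a DTD, so the sketch below is not a DTD validator: the `REQUIRED_ATTRIBUTES` table and the `missing_required` function are hypothetical stand-ins for the `<!ATTLIST person number CDATA #REQUIRED>` declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for:
#   <!ATTLIST person number CDATA #REQUIRED>
REQUIRED_ATTRIBUTES = {"person": ["number"]}


def missing_required(xml_text):
    """Return (tag, attribute) pairs for every required attribute that is absent."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        for name in REQUIRED_ATTRIBUTES.get(element.tag, []):
            if name not in element.attrib:
                problems.append((element.tag, name))
    return problems


print(missing_required('<person number="5677" />'))
print(missing_required("<person />"))
```

A real validating parser would read the rule from the DTD itself instead of a hard-coded table.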

Valid Vs Well Formed XML

Well formed XML


An XML document with correct XML syntax.
XML syntax rules:
XML documents must have a root element.
XML elements must have a closing tag.
XML tags are case sensitive.
XML elements must be properly nested.
XML attribute values must be quoted.
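Each of these rules can be tried out by feeding a deliberately broken document to a parser. The following is a minimal sketch with Python's `xml.etree.ElementTree`; the `is_well_formed` helper and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "well formed": "<EMAIL><TO>Ashish</TO></EMAIL>",
    "two root elements": "<TO>Ashish</TO><CC>Rahul</CC>",
    "missing closing tag": "<EMAIL><TO>Ashish</EMAIL>",
    "case mismatch": "<EMAIL><TO>Ashish</to></EMAIL>",
    "improper nesting": "<EMAIL><TO><CC>Rahul</TO></CC></EMAIL>",
    "unquoted attribute": "<EMAIL DATE=14/02/2011></EMAIL>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample is accepted; every other sample breaks exactly one of the rules listed above.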

Valid Vs Well Formed XML


Well Formed XML Example

<?xml version="1.0" ?>
<EMAIL>
  <TO>Ashish</TO>
  <CC>Rahul</CC>
  <SUBJECT>Meeting Reminder</SUBJECT>
  <BODY>Group Meeting at 4.00 PM</BODY>
</EMAIL>

Valid Vs Well Formed XML

Benefits of well-formedness
For the client, it saves the time of downloading the DTD when the server has already validated the XML document against it.
Where validation is not required, the focus is on the structure of the document alone.
(Note: a valid document = a well-formed document that also satisfies all validity constraints)
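The gap between the two notions shows up when a non-validating parser meets a document that breaks its own DTD. The sketch below assumes Python's `xml.etree.ElementTree`, which checks well-formedness but does not validate; the shortened EMAIL DTD is invented for this example.

```python
import xml.etree.ElementTree as ET

# The DTD demands a TO child followed by a BODY child,
# but the document element omits TO.
document = """<!DOCTYPE EMAIL [
  <!ELEMENT EMAIL (TO, BODY)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<EMAIL><BODY>Group Meeting at 4.00 PM</BODY></EMAIL>"""

# No error is raised: the syntax is correct, so the document is well formed.
root = ET.fromstring(document)
children = [child.tag for child in root]

# The document is nevertheless not valid, because TO is missing.
print(children)
```

A validating parser would reject the same document, which is the extra guarantee that validity adds on top of well-formedness.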

Document Type Definition (DTD)

Document Classes
The design of XML relates to object-oriented programming (OOP): it makes conceptual use of inheritance and polymorphism.
Example: base class Book

Book
  NumberOfChapters
  CoverLetter

DTD CONTD

Inheritance (Book and its subclasses)

Book
  NumberOfChapters
  CoverLetter

CookBook
  NumberOfChapters (Value 10)
  CoverLetter (Value Red)
  Recipe

TextBook
  NumberOfChapters (Value 21)
  CoverLetter (Value Blue)
  Recipe

DTD CONTD

Polymorphism

Book
  ArtBook
    NumberOfChapters
    CoverLetter (Value Blue, Pattern pt)

Class ArtBook overloads the CoverLetter property of base class Book: it accepts color patterns in addition to color values.

DTD CONTD

DTD
Acts as a rule book that allows an author to create new documents of the same type and with the same characteristics as a base document.
Defines the building blocks of an XML document.
Defines the document structure with a list of elements and attributes.

DTD CONTD

Example: a DTD created for the medical community.

Documents created with the DTD can contain Patient Name, Medical History, Medications and so on.
This information can be easily read by any medical institution that supports an XML-based document system.

DTD CONTD

DTD structure

Internal DTD (subset)
A DTD that is declared inside the XML document:
<!DOCTYPE root-element [element-declarations]>

External DTD (subset)
A DTD that is declared in an external file, which is then included in the XML document:
<!DOCTYPE root-element SYSTEM "filename">

(Note: if the document contains both types of DTD, the internal subset takes precedence over the external subset.)

Internal DTD

In this example, the EMAIL DTD is created in the XML document itself.

<?xml version="1.0"?>
<!DOCTYPE EMAIL [
  <!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT FROM (#PCDATA)>
  <!ELEMENT CC (#PCDATA)>
  <!ELEMENT SUBJECT (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<EMAIL>
  <TO>Ashish@msn.com</TO>
  <FROM>Rahul@msn.com</FROM>
  <CC>Bill@msn.com</CC>
  <SUBJECT>My First DTD</SUBJECT>
  <BODY>Hello World</BODY>
</EMAIL>

CONTD.
Interpretation of DTD

!DOCTYPE EMAIL defines that the root element of this document is EMAIL.
!ELEMENT EMAIL defines that the EMAIL element contains five elements: "TO, FROM, CC, SUBJECT, BODY".
!ELEMENT TO defines the TO element to be of type "#PCDATA".
!ELEMENT FROM defines the FROM element to be of type "#PCDATA".
!ELEMENT CC defines the CC element to be of type "#PCDATA".
!ELEMENT SUBJECT defines the SUBJECT element to be of type "#PCDATA".
!ELEMENT BODY defines the BODY element to be of type "#PCDATA".

External DTD

In the following example, the file email.dtd is created separately and referenced in the XML document as "email.dtd".

<?xml version="1.0"?>
<!DOCTYPE EMAIL SYSTEM "email.dtd">
<EMAIL>
  <TO>Ashish@msn.com</TO>
  <FROM>Rahul@msn.com</FROM>
  <CC>Bill@msn.com</CC>
  <SUBJECT>My First DTD</SUBJECT>
  <BODY>Hello World</BODY>
</EMAIL>

Here the file "email.dtd" contains the EMAIL DTD.

DTD CONTD

The Building Blocks of XML Documents


From a DTD point of view, all XML documents (and HTML documents) are made up of the following building blocks:
Elements
Attributes
Entities
PCDATA
CDATA

DTD CONTD

Element Declarations
Syntax: <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>

Empty Elements
Empty elements are declared with the category keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example: <!ELEMENT br EMPTY>
XML example: <br />

DTD CONTD

Elements with Parsed Character Data
Elements with only parsed character data are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
Example: <!ELEMENT FROM (#PCDATA)>

DTD CONTD

Elements with any Contents
Elements declared with the category keyword ANY can contain any combination of parsable data:
<!ELEMENT element-name ANY>
Example: <!ELEMENT EMAIL ANY>

DTD CONTD

Elements with Children (sequences)
Elements with one or more children are declared with the names of the child elements inside parentheses:
<!ELEMENT element-name (child1)> or <!ELEMENT element-name (child1,child2,...)>
Example: <!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
(Note: when children are declared in a sequence separated by commas, the children must appear in the same sequence in the document.)
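The ordering rule in the note can be made concrete with a hand-written check. This is only a sketch: Python's standard parser does not enforce a DTD, so `EXPECTED_SEQUENCE` and `follows_sequence` are hypothetical stand-ins for the `(TO, FROM, CC, SUBJECT, BODY)` content model.

```python
import xml.etree.ElementTree as ET

# Stand-in for: <!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
EXPECTED_SEQUENCE = ["TO", "FROM", "CC", "SUBJECT", "BODY"]


def follows_sequence(xml_text):
    """Return True if the root's children appear exactly in the declared order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == EXPECTED_SEQUENCE


in_order = "<EMAIL><TO/><FROM/><CC/><SUBJECT/><BODY/></EMAIL>"
swapped = "<EMAIL><FROM/><TO/><CC/><SUBJECT/><BODY/></EMAIL>"

print(follows_sequence(in_order))
print(follows_sequence(swapped))
```

Both samples are well formed; only the first respects the declared sequence.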

DTD CONTD

Declaring Only One Occurrence of an Element
<!ELEMENT element-name (child-name)>
Example: <!ELEMENT EMAIL (BODY)>
The example above declares that the child element "BODY" must occur once, and only once, inside the "EMAIL" element.

DTD CONTD

Declaring Minimum One Occurrence of an Element
<!ELEMENT element-name (child-name+)>
Example: <!ELEMENT EMAIL (BODY+)>
The + sign in the example above declares that the child element "BODY" must occur one or more times inside the "EMAIL" element.

DTD CONTD

Declaring Zero or More Occurrences of an Element
<!ELEMENT element-name (child-name*)>
Example: <!ELEMENT EMAIL (BODY*)>
The * sign in the example above declares that the child element "BODY" can occur zero or more times inside the "EMAIL" element.

DTD CONTD

Declaring Zero or One Occurrences of an Element
<!ELEMENT element-name (child-name?)>
Example: <!ELEMENT EMAIL (BODY?)>
The ? sign in the example above declares that the child element "BODY" can occur zero or one time inside the "EMAIL" element.
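The occurrence indicators +, * and ? behave much like the quantifiers of the same name in regular expressions. The sketch below leans on that similarity to test the four BODY content models by hand. The regex translation and the `matches` helper are assumptions of this illustration, not how a validating parser is actually implemented.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written regex equivalents of the four content models.
# Each child name is followed by a comma so that names cannot run together.
CONTENT_MODELS = {
    "(BODY)": r"BODY,",
    "(BODY+)": r"(BODY,)+",
    "(BODY*)": r"(BODY,)*",
    "(BODY?)": r"(BODY,)?",
}


def matches(model, xml_text):
    """Return True if the root's children satisfy the given content model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in root)
    return re.fullmatch(CONTENT_MODELS[model], child_names) is not None


none = "<EMAIL/>"
one = "<EMAIL><BODY/></EMAIL>"
two = "<EMAIL><BODY/><BODY/></EMAIL>"

for model in CONTENT_MODELS:
    print(model, [matches(model, doc) for doc in (none, one, two)])
```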

DTD CONTD

Declaring either/or Content
Example: <!ELEMENT EMAIL (TO,FROM,CC,SUBJECT,(MESSAGE|BODY))>
The example above declares that the "EMAIL" element must contain a "TO" element, a "FROM" element, a "CC" element, a "SUBJECT" element, and either a "MESSAGE" or a "BODY" element.
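The either/or rule can likewise be imitated with a regular expression over the child names. This is a sketch only: `PATTERN` and `is_valid_email` are hypothetical names, and a real validator would work from the DTD instead of a hand-written pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written equivalent of (TO,FROM,CC,SUBJECT,(MESSAGE|BODY)).
PATTERN = re.compile(r"TO,FROM,CC,SUBJECT,(MESSAGE|BODY),")


def is_valid_email(xml_text):
    """Return True if the root's children match the either/or content model."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + "," for child in root)
    return PATTERN.fullmatch(names) is not None


head = "<EMAIL><TO/><FROM/><CC/><SUBJECT/>"
with_body = head + "<BODY/></EMAIL>"
with_message = head + "<MESSAGE/></EMAIL>"
with_both = head + "<MESSAGE/><BODY/></EMAIL>"
with_neither = head + "</EMAIL>"

for doc in (with_body, with_message, with_both, with_neither):
    print(is_valid_email(doc))
```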

DTD CONTD

Declaring Mixed Content
Example: <!ELEMENT EMAIL (#PCDATA|TO|FROM|CC|SUBJECT|BODY)*>
The example above declares that the "EMAIL" element can contain zero or more occurrences of parsed character data, "TO", "FROM", "CC", "SUBJECT" or "BODY" elements.
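Mixed content means that character data and child elements are interleaved inside one parent. As an illustration, the sketch below walks such an element with Python's `xml.etree.ElementTree`, which stores the text before the first child in `.text` and the text after each child in that child's `.tail`. The sample sentence is invented for this example.

```python
import xml.etree.ElementTree as ET

email = ET.fromstring(
    "<EMAIL>Hi <TO>Ashish</TO>, see <BODY>the notes</BODY> soon</EMAIL>"
)

# Collect the character data and child elements in document order.
pieces = [email.text]
for child in email:
    pieces.append((child.tag, child.text))
    pieces.append(child.tail)

print(pieces)
```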

DTD CONTD

Declaring Attributes
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>

DTD example: <!ATTLIST person number CDATA "0000">

XML example: <person number="5677" />
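A default value declared this way is supplied by the processor whenever the attribute is left out. The sketch below assumes the expat-based parser behind Python's `xml.etree.ElementTree`, which applies defaults declared in the internal DTD subset; other processors, and DTDs held in external files, may behave differently.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring "0000" as the default for the number attribute.
DTD = '<!DOCTYPE person [ <!ATTLIST person number CDATA "0000"> ]>'

defaulted = ET.fromstring(DTD + "<person />")
explicit = ET.fromstring(DTD + '<person number="5677" />')

# The omitted attribute receives the declared default;
# an explicitly written value is kept as is.
print(defaulted.get("number"))
print(explicit.get("number"))
```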

THANK YOU!!!!!!!
