
Introduction to SGML - Features of XML - XML as a subset of SGML - XML vs HTML - Views of an XML document - Syntax of XML - XML document structure - Namespaces - XML Schemas - Simple XML documents - Different forms of markup that can occur in XML documents - Document Type declarations - Creating XML DTDs

I INTRODUCTION TO SGML

Markup is the traditional way of annotating a document. A markup language determines the structure and meaning of textual elements. It consists of codes and tags that are added to the text to change the look or meaning of the text or document.

There are two types of markup languages.

a) Specific Markup Language

It is used to generate code that is specific to a particular application. Examples are:

HTML - its purpose is to format documents for the web.

RTF - used for Rich Text Format documents (MS Word supports RTF).

b) Generalized Markup Language

It was created to solve some of the problems of porting documents from one platform and operating-system configuration to another. GML was introduced by Dr. C. F. Goldfarb at IBM in the 1960s.

GML was later developed into a standard adopted by the International Organization for Standardization (ISO) in 1986 as ISO 8879. This is how SGML (Standard Generalized Markup Language) originated.

SGML Structure

An SGML application consists of two parts: the SGML declaration and the SGML DTD (Document Type Definition).

SGML Declaration - The declaration part identifies the characters to be used in a document. It also provides a way to identify the objects that will be used throughout the SGML document. These objects are called entities.

SGML DTD - In the Document Type Definition we list the element types we wish to use in the document and indicate the structural order in which they can occur.

SGML Features

1. SGML stands for Standard Generalized Markup Language.
2. It is a system for defining markup languages.
3. SGML is a metalanguage: it is used to create other languages.
4. SGML is extensible: it allows the author to define a particular structure by defining the parts that fit that structure.
5. SGML is a system for organizing and tagging the elements of a document.
6. SGML specifies the rules for tagging elements.
7. It is widely used to manage large documents that are subject to frequent revision and need to be printed in different formats.
8. Authors can mark up their documents with structural, presentational and semantic information along with the content.
9. SGML is intended to be completely independent of any application.
10. Closing tags can be omitted where the DTD allows it, and nothing in an SGML document indicates how the data should look.
11. HTML is an application of SGML, because HTML was defined using the SGML standard.
12. SGML provides a way to identify the characters to be used in the document and the objects (entities) that will be used throughout a document.

XML FEATURES

1. XML stands for Extensible Markup Language.

2. It is designed to describe data and focuses on what the data is.

3. XML is a smaller language than SGML (i.e., a subset of SGML).

4. It is used to format and transfer data in an easy and convenient way.

5. It is a markup language like HTML.

6. XML can work with HTML for data display and presentation.

7. It is a standard language used to structure and describe data so that different applications can understand it.

8. XML documents are called self-describing documents.

9. XML tags are not predefined; you must define your own tags.

10. XML is free and extensible. It is a complement to HTML.

11. XML includes a specification for a style sheet language called eXtensible Stylesheet Language (XSL).

12. XML includes a specification for a hyperlinking scheme, described as a separate language called eXtensible Link Language (XLL).

13. Every XML document consists of data and markup. You can literally tag up your data with your own tags.

14. XML can be used as a data interchange format. Since the XML text format is standards-based, data can be converted to XML and then easily read by another system or application.
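As a minimal sketch of items 8, 13 and 14 (self-describing data that another application can read), the Python snippet below uses only the standard-library module xml.etree.ElementTree to parse a small mail document and pick out values by tag name. The document text and the read_mail helper are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# A self-describing document: the author-defined tags name the data they hold.
MAIL_XML = """<mail>
  <to>Rani</to>
  <from>Ravi</from>
  <heading>Reminder</heading>
  <body>About our parents' wedding anniversary</body>
</mail>"""


def read_mail(xml_text: str) -> dict[str, str]:
    """Parse the document and return each child element's text keyed by its tag."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    message = read_mail(MAIL_XML)
    # The receiving program only needs to know the tag names, not any layout.
    print(message["from"], "->", message["to"], ":", message["heading"])
```

The receiving application never needs a fixed record layout; knowing the tag names is enough, which is what makes XML convenient for data interchange.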

XML as a Subset of SGML

SGML is a very powerful, very general, standard markup language, but with that power comes increased complexity.

XML is a subset of SGML intended to make SGML light enough for use on the web.

Because XML is a proper subset of SGML, XML documents are also conforming SGML documents. However, not every SGML document is a valid XML document.

Figure: Relationship of XML to SGML (XML drawn as a subset inside SGML)

SGML is intended to be completely independent of any application.

The complexity of implementing SGML's power limits its use to large companies that need all of that power. Hence XML arrived: a simplified SGML that retains most of SGML's inherent power in a simple, tidy, easy-to-use and easy-to-implement form.

Since XML is optimized for use on the World Wide Web, it is designed to offer some benefits that are not found in SGML.

XML is a smaller language than SGML because its designers removed the parts of the SGML specification that were not needed for web delivery.

COMPARISON OF HTML AND XML

HTML | XML
HTML is HyperText Markup Language. | XML is eXtensible Markup Language.
It is used to display information and to format the document. | It is designed to describe data and to focus on what the data is.
HTML is not extensible: users cannot modify the structure or format by adding their own tags. | It is extensible: it allows the author to define a particular structure.
HTML tags are predefined. | Tags are not predefined.
Some closing tags are optional. | Closing tags are compulsory.
HTML is not case sensitive. | XML is case sensitive.
HTML authors do not define their own Document Type Definition (DTD). | XML can use a DTD to describe the data elements used in the document.
Documents are displayed directly and easily by any web browser. | XML needs a style sheet such as XSL to display a document in a web browser.
Cascading Style Sheets (CSS), a style sheet standard for HTML, can be embedded within HTML code. | In XML, presentation and content are kept separate, i.e., the XSL page acts independently.
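The "closing tags are compulsory" and "case sensitive" rows can be checked with any XML parser. A minimal Python sketch using the standard-library xml.etree.ElementTree (is_well_formed is a hypothetical helper name) shows that markup an HTML browser would tolerate is rejected by an XML parser as not well-formed:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<to>Rani</to>"))                    # properly closed
print(is_well_formed("<list><item>one<item>two</list>"))  # unclosed <item> tags
print(is_well_formed("<To>Rani</to>"))                    # <To> and </to> differ in case
print(is_well_formed("<br>"), is_well_formed("<br/>"))    # an empty element needs a closing form
```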

CREATING XML DOCUMENT TYPE DEFINITION (DTD)

A Document Type Declaration is a statement embedded in an XML document whose purpose is to announce the existence and location of a Document Type Definition (DTD).
A Document Type Definition is a set of rules that defines the structure of an XML document explicitly and completely.
It tells you which tags you can use in a document, in what order they should appear, which tags can appear inside others, which tags have attributes, and so on.
Every Document Type Declaration starts with the keyword <!DOCTYPE followed by the name of the root element.
There are two types of DTD: a) internal DTD and b) external DTD.

INTERNAL DTD

If the DTD is internal, the syntax is

<!DOCTYPE root-element [ element-and-attribute-declarations ]>

An internal DTD is also known as the internal subset. This is a sample XML document with an internal DTD:

<?xml version="1.0"?>
<!DOCTYPE mail [
  <!ELEMENT mail (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<mail>
  <to>Rani</to>
  <from>Ravi</from>
  <heading>Reminder</heading>
  <body>About our parents' wedding anniversary</body>
</mail>

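The DTD is interpreted as follows:

1) <!DOCTYPE mail - mail is the root element of the document.

2) <!ELEMENT mail (to,from,heading,body)> - the root element mail must contain exactly four child elements, in the order to, from, heading, body.

3) <!ELEMENT to (#PCDATA)> - the child element to is of type PCDATA (and likewise from, heading and body).

4) PCDATA - parsed character data: the element contains only text.

5) # - the # prefix marks PCDATA as a reserved keyword rather than an element name.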
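Python's standard-library expat binding (xml.parsers.expat) reads an internal subset and reports each declaration through callbacks. Expat is a non-validating parser, so it does not enforce these rules; the sketch below (inspect_internal_dtd is a hypothetical helper) only collects the DOCTYPE name and the declared element names so they can be compared with the actual root element.

```python
from xml.parsers import expat

MAIL_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE mail [
  <!ELEMENT mail (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<mail><to>Rani</to><from>Ravi</from><heading>Reminder</heading><body>Anniversary</body></mail>"""


def inspect_internal_dtd(xml_text):
    """Return (DOCTYPE name, declared element names, actual root element name)."""
    found = {"doctype": None, "root": None}
    declared = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = name

    def on_element_decl(name, model):
        declared.append(name)  # 'model' holds the parsed content specification

    def on_start(name, attrs):
        if found["root"] is None:  # the first start tag is the root element
            found["root"] = name

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found["doctype"], declared, found["root"]


doctype, declared, root = inspect_internal_dtd(MAIL_WITH_DTD)
print(doctype, declared, root)
# A validating parser would also require the DOCTYPE name to match the root.
print("root matches DOCTYPE:", doctype == root)
```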

EXTERNAL DTD

If the DTD is external, the Document Type Declaration must identify it with either the SYSTEM or the PUBLIC keyword.

PUBLIC: the DTD is identified by a public identifier (a widely known name) followed by a URL, so it can be shared and used by anyone.

SYSTEM: the DTD is identified only by a URI, typically a file on the local disk such as mail.dtd, which may not be available to other applications. The external subset, if present, is referenced right after the root element name in the DOCTYPE declaration, as illustrated here:

//mail.xml

<?xml version="1.0"?>
<!DOCTYPE mail SYSTEM "mail.dtd">
<mail>
  <to>Rani</to>
  <from>Ravi</from>
  <heading>Reminder</heading>
  <body>About our parents' wedding anniversary</body>
</mail>

//mail.dtd

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT mail (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

The DTD can be placed entirely in the external subset, entirely in the internal subset, or split across both.
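To see what the DOCTYPE of an external DTD actually carries, the sketch below uses the standard-library expat binding to report the root name, the system identifier and the public identifier. doctype_ids is a hypothetical helper, and the PUBLIC identifier and URL in the usage are invented for illustration only.

```python
from typing import Optional, Tuple
from xml.parsers import expat


def doctype_ids(xml_text: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (root element name, system identifier, public identifier)."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id, public_id))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # expat does not download external resources itself, so mail.dtd is never opened here.
    parser.Parse(xml_text, True)
    return found[0]


print(doctype_ids('<!DOCTYPE mail SYSTEM "mail.dtd"><mail/>'))
# Hypothetical public identifier and URL, for illustration only:
print(doctype_ids('<!DOCTYPE mail PUBLIC "-//Example//DTD Mail 1.0//EN" '
                  '"http://www.example.com/dtd/mail.dtd"><mail/>'))
```

A SYSTEM declaration yields only a system identifier; a PUBLIC declaration yields both, and an application may use the public name to find a shared copy of the DTD.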

Another example: an external DTD combined with an internal subset

//apples.dtd

<!ELEMENT apples (#PCDATA)>

//apples.xml

<!DOCTYPE apples SYSTEM "apples.dtd" [
  <!ATTLIST apples color CDATA #REQUIRED>
]>
<apples color="green">12</apples>

Internal DTD only

//apples.xml

<!DOCTYPE apples [
  <!ELEMENT apples (#PCDATA)>
]>
<apples>12</apples>

ELEMENT TYPE DECLARATION

Every element in a valid XML document must have an element type declared in the DTD.

To validate an XML document, a validating parser needs to know three things about each element:
1) what the element type is named;
2) what elements of that type can contain (the content model);
3) which attributes are associated with elements of that type.

The element type name and its content model are declared together in what is known as an element type declaration.

An element type declaration starts with the string <!ELEMENT, followed by the element name and the content specification.

Every element has certain allowed content. There are four general types of content specification:
1) EMPTY - the element must have no content.

2) ANY - the element may contain any mix of character data and declared elements, in any order.

3) Mixed - the element may contain character data, or a mix of character data and child elements.

4) Element content - the element may contain only child elements, in the order and number given by the content model.

Element type declarations and their interpretation:

<!ELEMENT stock EMPTY>
An element of type stock has no content.
Ex: <stock/>

<!ELEMENT contact (name,address,phone)>
An element of type contact contains three child elements, name, address and phone, exactly in that order.
Ex: <contact>
      <name>aaa</name>
      <address>SJCET,pala</address>
      <phone>239301</phone>
    </contact>

<!ELEMENT contact (name,address?,phone)>
An element of type contact contains a name element, followed by an optional address element, followed by a phone element. The address element can occur once or not at all.
Ex: <contact>
      <name>aaa</name>
      <phone>239301</phone>
    </contact>

<!ELEMENT fruit (apple|orange)>
An element of type fruit contains either a single apple element or a single orange element: a selection from a list of elements, only one allowed.
Ex: <fruit><apple>---</apple></fruit>

<!ELEMENT fruit (apple|orange)+>
An element of type fruit contains one or more child elements, each of which is either an apple element or an orange element.
Ex1: <fruit>
       <apple>---</apple>
       <apple>---</apple>
       <orange>--</orange>
     </fruit>
Ex2: <fruit><orange>--</orange></fruit>

<!ELEMENT fruit (apple|orange)*>
An element of type fruit contains zero or more child elements, each of which is either an apple element or an orange element.
Ex1: <fruit><apple>---</apple><orange>---</orange></fruit>
Ex2: <fruit></fruit>

<!ELEMENT para (#PCDATA|list)*>
An element of type para contains a mixture of character data and list elements, in any order.
Ex1: <para>Here is my list <list>---</list></para>
Ex2: <para>aaa,bbb,ccc</para>
Ex3: <para><list>---</list></para>

<!ELEMENT invoice (from,to,item+)>
An invoice element consists of a from element, followed by a to element, followed by one or more item elements.

<!ELEMENT invoice (from,to*,item+)>
An invoice element consists of a from element, followed by zero or more to elements, followed by one or more item elements.

<!ELEMENT invoice ANY>
An invoice element may contain any combination of declared elements and character data, in any order.
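The occurrence indicators in the table map directly onto regular-expression operators: "," is concatenation, "|" is alternation, and "?", "*" and "+" keep their regex meaning. The sketch below relies on that correspondence to check the direct children of an element against an element-content model. It is a simplification of what a validating parser does: the function names are hypothetical, they take only the parenthesized content specification (without <!ELEMENT name), and EMPTY, ANY and mixed (#PCDATA) content are not handled.

```python
import re
import xml.etree.ElementTree as ET

# One token is either an element name or a punctuation mark of the content model.
_TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[(),|?*+]")


def model_to_regex(model):
    """Translate an element-content model such as "(name,address?,phone)" into a
    regex over child names, each name followed by a comma."""
    parts = []
    for token in _TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")  # non-capturing group
        elif token == ",":
            continue  # a sequence is plain concatenation
        elif token in (")", "|", "?", "*", "+"):
            parts.append(token)  # choice and occurrence indicators carry over unchanged
        else:
            parts.append("(?:" + re.escape(token) + ",)")  # one child element
    return re.compile("".join(parts))


def children_match(model, child_names):
    """True if the list of child element names satisfies the content model."""
    return model_to_regex(model).fullmatch("".join(n + "," for n in child_names)) is not None


def element_matches(model, xml_text):
    """Parse an element and check its direct children against the model."""
    return children_match(model, [child.tag for child in ET.fromstring(xml_text)])


print(children_match("(name,address?,phone)", ["name", "phone"]))
print(children_match("(apple|orange)+", []))
print(element_matches("(from,to*,item+)", "<invoice><from/><item/><item/></invoice>"))
```

Terminating every name with a comma keeps names from running together, so a model naming to never accidentally matches a child called top.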

ATTRIBUTE LIST DECLARATION

Attributes must be declared in the DTD so that a validating XML parser can check that they have been used properly in an XML document.

An attribute list declaration has four aspects:

1) the element type to which it belongs;
2) what the attribute is named;
3) what type of data the attribute value can contain;
4) what default value each attribute has.

Attribute declarations start with the string <!ATTLIST.

Example:
<!ELEMENT person (#PCDATA)>
<!ATTLIST person email CDATA #REQUIRED>

You can declare many attributes in a single attribute list declaration:

<!ATTLIST person email CDATA #REQUIRED
                 phone CDATA #REQUIRED
                 fax CDATA #REQUIRED>

You can also have multiple attribute list declarations for a single element:
<!ATTLIST person email CDATA #REQUIRED>
<!ATTLIST person phone CDATA #REQUIRED>
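The standard-library expat binding reports attribute declarations one callback per attribute, which makes it easy to see that several ATTLIST declarations for one element simply add up. A minimal sketch (declared_attributes is a hypothetical helper, and the person document is illustrative) collecting the name, type and required flag of each declared attribute:

```python
from xml.parsers import expat

PERSON_DOC = """<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person email CDATA #REQUIRED
                   phone CDATA #REQUIRED>
  <!ATTLIST person fax CDATA #IMPLIED>
]>
<person email="aaa@example.com" phone="239301">aaa</person>"""


def declared_attributes(xml_text):
    """Map each element name to its declared (attribute, type, required) triples."""
    attributes = {}

    def on_attlist(elname, attname, att_type, default, required):
        # Called once per attribute, whether the attributes come from one
        # <!ATTLIST> declaration or from several declarations for the same element.
        attributes.setdefault(elname, []).append((attname, att_type, bool(required)))

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return attributes


for element, attrs in declared_attributes(PERSON_DOC).items():
    print(element, attrs)
```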

Each attribute in a declaration has three parts: a name, a type and a default value. The table below shows partial attribute list declarations (---- stands for the parts that are left out).

Partial Attribute List Declaration | Interpretation
<!ATTLIST product name ----> | An element of type product has an attribute called name, e.g. <product name="----">
<!ATTLIST product name ---- color ----> | An element of type product has attributes called name and color, e.g. <product name="--" color="---">
<!attlist product name ----> | Error: the keyword ATTLIST must always be in uppercase.

ATTRIBUTE TYPES

String attribute type

CDATA - CDATA stands for character data, that is, text that does not form markup.

Tokenized attribute types

ID - a unique identifier for the element that carries it. A given ID value must not appear more than once in an XML document. An element type may have only one ID attribute, and an ID attribute can only have an #IMPLIED or #REQUIRED default. The first character of an ID value must be a letter, '_' or ':'.

IDREF - used to establish connections between elements. The value of an IDREF attribute must match an ID value declared elsewhere in the document. The first character of an IDREF value must be a letter, '_' or ':'.

IDREFS - allows multiple IDREF values separated by whitespace.

ENTITY - the value must be the name of an unparsed entity declared in the DTD, typically referring to non-XML data (such as an image) found at an external location. The first character of an ENTITY value must be a letter, '_' or ':'.

ENTITIES - allows multiple ENTITY names separated by whitespace.

NMTOKEN - a name token. The first character of an NMTOKEN value may be a letter, digit, '.', '-', '_' or ':'.

NMTOKENS - allows multiple NMTOKEN values separated by whitespace.

Enumerated attribute types

NOTATION - useful when text needs to be interpreted in a particular way, for example by another application. The first character of a NOTATION name must be a letter, '_' or ':'.

Enumerated - lets you choose between a fixed list of values, e.g. (red|green). The first character of an enumerated value may be a letter, digit, '.', '-', '_' or ':'.
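A small sketch of the first-character rules above, using simplified ASCII-only regular expressions (an assumption: the XML 1.0 Name and Nmtoken productions also admit many non-ASCII characters). ID, IDREF, ENTITY and NOTATION values follow the Name rule; NMTOKEN and enumerated values follow the looser Nmtoken rule; the plural types are whitespace-separated lists. The helper names are hypothetical.

```python
import re

# Simplified ASCII-only forms of the XML 1.0 Name and Nmtoken productions
# (assumption: the real productions also admit many non-ASCII characters).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")  # ID, IDREF, ENTITY, NOTATION
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")         # NMTOKEN, enumerated values


def is_name(value):
    return NAME.fullmatch(value) is not None


def is_nmtoken(value):
    return NMTOKEN.fullmatch(value) is not None


def is_token_list(value, check):
    """IDREFS, ENTITIES and NMTOKENS: one or more whitespace-separated tokens."""
    tokens = value.split()
    return bool(tokens) and all(check(token) for token in tokens)


print(is_name("a9216735"), is_name("9216735"))  # an ID may not start with a digit
print(is_nmtoken("9216735"))                     # an NMTOKEN may
print(is_token_list("a1 a2 a3", is_name))
```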

CDATA Example:

<?xml version="1.0"?>
<!DOCTYPE image [
<!ELEMENT image EMPTY>
<!ATTLIST image height CDATA #REQUIRED>
<!ATTLIST image width CDATA #REQUIRED>
]>
<image height="32" width="32"/>

ID Example:

<?xml version="1.0"?>
<!DOCTYPE student_name [
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no ID #REQUIRED>
]>
<student_name student_no="a9216735">Jo Smith</student_name>

IDREF Example:

<?xml version="1.0" standalone="yes"?>


<!DOCTYPE lab_group [
<!ELEMENT lab_group (student_name)*>
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no ID #REQUIRED>
<!ATTLIST student_name tutor_1 IDREF #IMPLIED>
<!ATTLIST student_name tutor_2 IDREF #IMPLIED>
]>
<lab_group>
<student_name student_no="a8904885">Alex Foo</student_name>
<student_name student_no="a9011133">Sarah Bar</student_name>
<student_name student_no="a9216735"
tutor_1="a9011133" tutor_2="a8904885">Jo Smith</student_name>
</lab_group>
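For the document above, a validating parser checks two things: every ID value is unique, and every IDREF points at an existing ID. ElementTree does not read attribute types from the DTD, so in this minimal sketch (check_id_refs is a hypothetical helper) the caller names the ID and IDREF attributes explicitly:

```python
import xml.etree.ElementTree as ET

LAB_GROUP = """<lab_group>
  <student_name student_no="a8904885">Alex Foo</student_name>
  <student_name student_no="a9011133">Sarah Bar</student_name>
  <student_name student_no="a9216735"
      tutor_1="a9011133" tutor_2="a8904885">Jo Smith</student_name>
</lab_group>"""


def check_id_refs(xml_text, id_attr, idref_attrs):
    """Return a list of ID/IDREF problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()
    for elem in root.iter():  # first pass: collect IDs and detect duplicates
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                problems.append(f"duplicate ID {value!r}")
            seen.add(value)
    for elem in root.iter():  # second pass: every IDREF must name a known ID
        for attr in idref_attrs:
            ref = elem.get(attr)
            if ref is not None and ref not in seen:
                problems.append(f"{attr}={ref!r} matches no ID")
    return problems


print(check_id_refs(LAB_GROUP, "student_no", ("tutor_1", "tutor_2")))
```

Two passes are used because an IDREF may legitimately point forward to an ID that appears later in the document.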

ENTITY Example:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE experiment_a [
<!ELEMENT experiment_a (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results image ENTITY #REQUIRED>
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY a SYSTEM
"http://www.university.com/results/experimenta/a.gif" NDATA gif>
]>
<experiment_a>
<results image="a"/>
</experiment_a>

ENTITIES Example:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE experiment_a [
<!ELEMENT experiment_a (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results images ENTITIES #REQUIRED>
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY a1 SYSTEM
"http://www.university.com/results/experimenta/a1.gif" NDATA gif>
<!ENTITY a2 SYSTEM
"http://www.university.com/results/experimenta/a2.gif" NDATA gif>
<!ENTITY a3 SYSTEM
"http://www.university.com/results/experimenta/a3.gif" NDATA gif>
]>
<experiment_a>
<results images="a1 a2 a3"/>
</experiment_a>
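An ENTITY or ENTITIES attribute is only valid if each name refers to an unparsed entity, i.e. one declared with NDATA and a notation. The sketch below (resolve_entities_attr is a hypothetical helper) uses the standard-library expat EntityDeclHandler to collect unparsed entities and then resolves the names listed in an ENTITIES attribute; the gif notation declaration is an assumption added so that the entities are unparsed. Expat never fetches unparsed entities, so the URLs are only reported, not downloaded.

```python
from xml.parsers import expat

EXPERIMENT = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE experiment_a [
  <!ELEMENT experiment_a (results)*>
  <!ELEMENT results EMPTY>
  <!ATTLIST results images ENTITIES #REQUIRED>
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY a1 SYSTEM "http://www.university.com/results/experimenta/a1.gif" NDATA gif>
  <!ENTITY a2 SYSTEM "http://www.university.com/results/experimenta/a2.gif" NDATA gif>
]>
<experiment_a>
  <results images="a1 a2"/>
</experiment_a>"""


def resolve_entities_attr(xml_text, attr_name):
    """Map each name listed in an ENTITIES attribute to the system identifier of
    the unparsed entity it names; raise KeyError if a name is not declared."""
    unparsed = {}
    names = []

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:  # only NDATA (unparsed) entities qualify
            unparsed[name] = system_id

    def on_start(tag, attrs):
        if attr_name in attrs:
            names.extend(attrs[attr_name].split())  # whitespace-separated list

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return {name: unparsed[name] for name in names}


print(resolve_entities_attr(EXPERIMENT, "images"))
```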

NMTOKEN Example:

<?xml version="1.0"?>
<!DOCTYPE student_name [
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no NMTOKEN #REQUIRED>
]>
<student_name student_no="9216735">Jo Smith</student_name>

ATTRIBUTE DEFAULTS

An attribute list declaration includes information about whether or not a value must be supplied for the attribute and, if not, what the XML processor should do.

There are three different variations:

1) Required ---> a value must be specified.

2) Implied ---> the XML processor tells the application that no value was supplied. The application can decide what best to do.

3) Fixed ---> a value is supplied in the declaration. No value need be supplied in the document, and the XML processor will pass the specified fixed value to the application. If a value is supplied in the document, it must exactly match the fixed value.

#REQUIRED - The attribute must have an explicitly specified value on every occurrence of the element in the document.

<!ATTLIST product name CDATA #REQUIRED>

An element of type product has an attribute called name whose value can be any string of characters (a literal < or & must be escaped as &lt; or &amp;). The value must be supplied whenever the element is used in the document: <product name="Acmepc">

In this example the type attribute of the fruit element is declared to be required:

<!DOCTYPE fruit [
  <!ELEMENT fruit EMPTY>
  <!ATTLIST fruit type CDATA #REQUIRED>
]>
<fruit type="apple"/>

A validating XML parser would therefore reject the following document:

<!DOCTYPE fruit [
  <!ELEMENT fruit EMPTY>
  <!ATTLIST fruit type CDATA #REQUIRED>
]>
<fruit/>

#IMPLIED - The attribute can be left unspecified if desired. The XML processor passes the fact that the attribute was unspecified on to the XML application, which can then choose what best to do.

Valid document:

<!DOCTYPE fruit [
  <!ELEMENT fruit EMPTY>
  <!ATTLIST fruit type CDATA #IMPLIED>
]>
<fruit/>

<!ATTLIST product color (red|green) #IMPLIED>

An element of type product has an attribute called color, whose value must be either the string red or the string green. If no value is supplied, it is left to the XML application to decide what to do.

<product color="red"></product> and <product></product> are both valid.

#FIXED - An attribute declaration may specify that an attribute has a fixed value. In this case the attribute is not required, but if it occurs it must have the specified value.

<!ATTLIST product name CDATA #FIXED "Acmepc">

An element of type product has an attribute called name with the fixed value Acmepc. Any other value is an error; <product name="Acmepc"> is valid.
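The three default keywords can be summarised as a small decision procedure. The Python sketch below is a hypothetical model of what a validating processor does with #REQUIRED, #IMPLIED and #FIXED; the dictionary format for declarations and the apply_defaults name are assumptions made for illustration, not how any real parser stores them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory form of an ATTLIST: attribute name -> (keyword, value),
# where keyword is "#REQUIRED", "#IMPLIED" or "#FIXED" and value is the fixed
# value (or None for the other two keywords).
Declarations = Dict[str, Tuple[str, Optional[str]]]


def apply_defaults(decls: Declarations, attrs: Dict[str, str]) -> Dict[str, str]:
    """Check one element's attributes against its declarations and return the
    attributes an application would see; raise ValueError on a violation."""
    result = dict(attrs)
    for name, (keyword, value) in decls.items():
        if keyword == "#REQUIRED" and name not in attrs:
            raise ValueError(f"missing required attribute {name!r}")
        if keyword == "#FIXED":
            if name in attrs and attrs[name] != value:
                raise ValueError(f"{name!r} must be {value!r}, not {attrs[name]!r}")
            result[name] = value  # supplied even when the document omits it
        # "#IMPLIED": nothing is added; the application decides what to do.
    return result


product = {"name": ("#FIXED", "Acmepc")}
print(apply_defaults(product, {}))
fruit = {"type": ("#REQUIRED", None)}
print(apply_defaults(fruit, {"type": "apple"}))
```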
