
Components of an XML Document
Definition: Description
Elements: What XML elements are and the requirements for working with them in XML documents.
Prolog: Outlines the order and contents of the initial prolog, or XML document header, in an XML document.
XML Declaration: Explains what the XML declaration is and its required placement if included in XML documents.
Processing Instructions: What processing instructions are in XML documents and their most frequent use, as a means of linking to an XML style sheet in the prolog of an XML document.
DOCTYPE Declaration: What the DOCTYPE declaration is and how it is used to reference an external or internal Document Type Definition (DTD) for XML documents that include it.
XML Comments: Explains how comments can be made in XML markup as a means of annotating and as a mechanism for including unparsed content in the XML document.
Textual Content: Outlines the rules for use and inclusion of textual content (also known as character data) in XML documents.
Character and Entity References: Describes XML character entities for escaping special or reserved characters that are used to delineate markup and node boundaries within the XML document.
CDATA Sections: Describes the use of the XML-specific CDATA (character data) sections for fully escaping text contents (including formatting or white space contents) in XML documents.
Attributes: What XML attributes are and the requirements for working with them in XML elements.
White Space: The rules and options for how white space can be handled when parsing XML documents.
Elements
► Element Names
▪ Element names are case-sensitive and
must start with a letter or underscore.
► Start Tags, End Tags, and Empty Tags
▪ <elementName att1Name="att1Value" att2Name="att2Value".../>
▪ <giggle></giggle> or <giggle/>
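The two rules above can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree module as the parser, which is a choice made for this illustration and not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# Both spellings of an empty element parse to the same result.
long_form = ET.fromstring("<giggle></giggle>")
short_form = ET.fromstring("<giggle/>")
same_empty = (
    long_form.tag == short_form.tag
    and long_form.text is None
    and short_form.text is None
)

# Names are case-sensitive, so an end tag that differs only in case
# does not match its start tag and the document is rejected.
try:
    ET.fromstring("<Giggle></giggle>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(same_empty, mismatch_rejected)
```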
Prolog
► The prolog refers to the information
that appears before the start tag of
the document or root element. It
includes information that applies to
the document as a whole, such as
character encoding, document
structure, and style sheets.
► <?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl"
href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<!--catalog last updated 2000-11-01-->

► <?xml-stylesheet type="text/xsl"
href="show_book.xsl"?>

► <!--catalog last updated 2000-11-01-->
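As a rough illustration of what a parser sees in the prolog, the sketch below loads the sample prolog with Python's standard xml.dom.minidom (an assumption of this example) and lists the node types that precede the root element. The empty catalog root element is added here only so the document is complete.

```python
from xml.dom import Node, minidom

PROLOG_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>\n'
    '<!DOCTYPE catalog SYSTEM "catalog.dtd">\n'
    '<!--catalog last updated 2000-11-01-->\n'
    '<catalog/>'  # root element added for this sketch
)

# minidom is non-validating here: it records the DOCTYPE but does not fetch catalog.dtd.
doc = minidom.parseString(PROLOG_DOC.encode("utf-8"))

# The XML declaration is not a node; the other prolog items are.
kinds = [node.nodeType for node in doc.childNodes]
print(kinds)
```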


XML Declaration
► The version number, <?xml version="1.0"?>.
► The encoding declaration, <?xml version="1.0" encoding="UTF-8"?>.
► An XML declaration can also contain a
standalone declaration, for example,
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
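The effect of the encoding declaration can be sketched as follows, assuming Python's standard xml.etree.ElementTree as the parser. The name element and its content are hypothetical sample data; the byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.

```python
import xml.etree.ElementTree as ET

# With the declaration, the parser decodes the byte 0xE9 as ISO-8859-1.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'
name = ET.fromstring(latin1_doc).text

# Without a declaration the parser assumes UTF-8 and rejects the same byte.
try:
    ET.fromstring(b"<name>caf\xe9</name>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(name, undeclared_ok)
```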
Processing Instructions
► Processing instructions can be used to pass
information to applications in a way that
escapes most XML rules. Processing
instructions do not have to follow much
internal syntax, can include markup
characters without escaping them, and can
appear anywhere in the document outside
of other markup. They can appear in the
prolog, including the document type
definition (DTD), in textual content, or after
the document. Their appearance is not
noted by schema or DTD processors.
► The following is an xml-stylesheet processing instruction
identifying a style sheet built using a cascading style sheet.

<?xml-stylesheet href="/style.css" type="text/css" title="default stylesheet"?>

► The following is an xml-stylesheet processing instruction
identifying a style sheet built using Extensible Stylesheet
Language (XSL).

<?xml-stylesheet href="/style.xsl" type="text/xsl" title="default stylesheet"?>
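To show how little structure a parser imposes on a processing instruction, here is a minimal sketch using Python's standard xml.dom.minidom (an assumption of this example). The parser reports only a target and an opaque data string; interpreting href and type is left to the application. The catalog root element is added only to make the document complete.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="/style.css" type="text/css" '
    'title="default stylesheet"?><catalog/>'
)

pi = doc.firstChild
is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE

# The target is the name after "<?"; everything else is one uninterpreted string.
target = pi.target
data = pi.data

print(is_pi, target, data)
```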
DOCTYPE Declaration
► A DOCTYPE declaration can contain:
▪ The name of the document or root
element. This is required if the DOCTYPE
declaration is used.
▪ System and public identifiers for the DTD
that can be used to validate the document
structure. If a public identifier is used, a
system identifier must also be present.
▪ An internal subset of DTD declarations.
The internal subset appears between
square brackets ([ ]).
► <!DOCTYPE rootElement PUBLIC "PublicIdentifier" "URIreference" [declarations]>

► The PublicIdentifier provides a separate
identifier that some XML parsers can use to
reference the DTD in place of the
URIreference. This is useful if the parser is
used on a system without a network
connection or where that connection would
slow down processing significantly.
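The parts of a DOCTYPE declaration can be read back from a parsed document. The sketch below assumes Python's standard xml.dom.minidom; the public identifier "-//Example//DTD Catalog//EN" is a made-up value for illustration, and catalog.dtd is never fetched because minidom does not validate.

```python
from xml.dom import minidom

doc = minidom.parseString(
    # Hypothetical public identifier, followed by the required system identifier.
    '<!DOCTYPE catalog PUBLIC "-//Example//DTD Catalog//EN" "catalog.dtd">'
    '<catalog/>'
)

doctype = doc.doctype
root_name = doctype.name       # name of the document (root) element
public_id = doctype.publicId   # the PublicIdentifier
system_id = doctype.systemId   # the URIreference

print(root_name, public_id, system_id)
```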
XML Comments
► Content that is not intended for the XML
parser, such as notes about document
structure or editing, can be included in a
comment. Comments begin with a <!-- and
end with a -->

► <!--catalog last updated 2000-11-01-->

► <!--<test pattern="SECAM" /><test pattern="NTSC" /> -->
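A brief sketch of commenting out markup, assuming Python's standard xml.etree.ElementTree: elements inside a comment are not parsed as elements. The tests wrapper element and the PAL entry are hypothetical additions so there is something left outside the comment.

```python
import xml.etree.ElementTree as ET

doc = (
    "<tests>"
    '<!--<test pattern="SECAM" /><test pattern="NTSC" /> -->'
    '<test pattern="PAL" />'  # hypothetical element outside the comment
    "</tests>"
)

root = ET.fromstring(doc)

# Only the element outside the comment is reported.
patterns = [t.get("pattern") for t in root.iter("test")]
print(patterns)
```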
Textual Content
► Because XML supports the Unicode
character set, XML documents can contain a
wide range of characters, including letters,
digits, punctuation, and symbols. Most
control characters are not allowed, and
Unicode compatibility characters are
discouraged. Because XML relies on <, >,
and & to delimit markup, represent these
characters in text using character and
entity references or CDATA sections.
Character and Entity References
Character and entity references are used when:
► Characters cannot be entered directly into a
document because they would be
interpreted as markup.
► Characters cannot be entered directly into a
document because of input device
limitations.
► Characters cannot be transported reliably
through a processor limited to one-byte
characters.
► A character string or document fragment
appears repeatedly and can be
abbreviated with an entity.
Entity   Reference   Character
lt       &lt;        < (less than)
gt       &gt;        > (greater than)
amp      &amp;       & (ampersand)
apos     &apos;      ' (apostrophe or single quote)
quot     &quot;      " (double quote)

► To write Me&You, for example, use Me&amp;You.
► For a<b, use a&lt;b.
► For b>c, use b&gt;c.
► &apos; is not recognized in HTML; a numeric character
reference (&#....) must be used when transforming to HTML.
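The escaping rules can be tried out with Python's standard library (an assumption of this sketch): xml.sax.saxutils.escape produces the references, and a parser turns them back into the original characters. The t element is a hypothetical wrapper, and &#39; is the decimal numeric character reference for the apostrophe.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

samples = ["Me&You", "a<b", "b>c"]

# escape() replaces &, < and > with their predefined entity references.
escaped = [escape(s) for s in samples]

# Parsing the escaped text recovers the original strings.
round_trip = [ET.fromstring(f"<t>{e}</t>").text for e in escaped]

# The named entity and the numeric reference both yield an apostrophe.
apostrophes = ET.fromstring("<t>&apos;&#39;</t>").text

print(escaped, round_trip, apostrophes)
```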
CDATA Sections
► <![CDATA[An in-depth look at creating applications
with XML, using <, >,]]>

► <![CDATA[if (c<10)]]>
Note
Content within CDATA sections must be within the
range of characters permitted for XML content;
control characters and compatibility characters
cannot be escaped this way. In addition, the
sequence ]]> cannot appear within a CDATA section
because this sequence signals the end of the
section. This means that CDATA sections cannot be
nested. The sequence also appears in some scripts.
Within scripts, it is usually possible to
substitute ] ]> for ]]>.
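A small sketch, assuming Python's standard xml.etree.ElementTree, showing that a CDATA section and the equivalent entity-escaped text give the application the same character data. The script wrapper element is a hypothetical name for this example.

```python
import xml.etree.ElementTree as ET

# The < inside the CDATA section is taken literally.
cdata_form = ET.fromstring("<script><![CDATA[if (c<10)]]></script>").text

# The same content written with an entity reference instead.
escaped_form = ET.fromstring("<script>if (c&lt;10)</script>").text

print(cdata_form, escaped_form)
```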
Attributes
► Attributes allow you to add information
about an element using name-value
pairs. Attributes are often used to
define properties of elements that are
not considered the content of the
element, though in some cases (for
example, the HTML img element) the
content of the element is determined
by attribute values.
► <elementName att1Name="att1Value" att2Name="att2Value".../>

► <myElement question="They asked &quot;Why?&quot;" />

► <myElement contraction="isn't" question='They asked "Why?"' />
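The two quoting styles shown above yield the same attribute value once parsed. The sketch below assumes Python's standard xml.etree.ElementTree for parsing and xml.sax.saxutils.quoteattr for writing, which picks a quote character that avoids escaping where it can.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

escaped = ET.fromstring('<myElement question="They asked &quot;Why?&quot;" />')
single = ET.fromstring(
    """<myElement contraction="isn't" question='They asked "Why?"' />"""
)

# Both forms deliver the same value to the application.
q1 = escaped.get("question")
q2 = single.get("question")
contraction = single.get("contraction")

# quoteattr() returns the value already wrapped in suitable quotes.
quoted_question = quoteattr(q1)
quoted_contraction = quoteattr(contraction)

print(q1, q2, contraction, quoted_question, quoted_contraction)
```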
White Space
► White Space and the XML Declaration
▪ According to the current XML 1.0
standard, white space is not allowed
before the XML declaration.
<?xml version="1.0"?>
<BOOK>
<BOOKNAME>XML</BOOKNAME>
</BOOK>
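The placement rule can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree: the same document parses when the declaration is the very first thing in the input and fails when a single space precedes it.

```python
import xml.etree.ElementTree as ET

BOOK = '<?xml version="1.0"?>\n<BOOK>\n<BOOKNAME>XML</BOOKNAME>\n</BOOK>'

# Declaration at the very start: accepted.
book_name = ET.fromstring(BOOK).find("BOOKNAME").text

# One leading space before the declaration: rejected.
try:
    ET.fromstring(" " + BOOK)
    leading_space_ok = True
except ET.ParseError:
    leading_space_ok = False

print(book_name, leading_space_ok)
```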
► White Space in Element Content
▪ XML parsers are required to report all white space that
appears in element content within a document. For this
reason, the following three documents are different to an
XML parser:

► <document>
<data>1</data>
<data>2</data>
<data>3</data>
</document>

► <document><data>1</data><data>2</data><data>3</data></document>

► <document><data>1</data> <data>2</data> <data>3</data></document>
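To make the difference visible, the following sketch parses three such documents with Python's standard xml.etree.ElementTree (an assumption of this example) and collects the white space reported between the child elements. The helper name whitespace_between is hypothetical.

```python
import xml.etree.ElementTree as ET

multi_line = ET.fromstring(
    "<document>\n<data>1</data>\n<data>2</data>\n<data>3</data>\n</document>"
)
compact = ET.fromstring(
    "<document><data>1</data><data>2</data><data>3</data></document>"
)
spaced = ET.fromstring(
    "<document><data>1</data> <data>2</data> <data>3</data></document>"
)


def whitespace_between(root):
    """Return the text before the first child, then the text after each child."""
    return [root.text] + [child.tail for child in root]


ws_multi = whitespace_between(multi_line)
ws_compact = whitespace_between(compact)
ws_spaced = whitespace_between(spaced)

print(ws_multi, ws_compact, ws_spaced)
```

ElementTree reports missing text as None, so the three lists differ even though the data elements are identical.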


► White Space in Attributes

▪ <whiteSpaceLoss note1="this is a note."
note2="this
is
a
note.">

▪ An XML parser reports both attribute values as
this is a note., converting the line breaks to
single spaces.
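Attribute-value normalization can be observed directly. This sketch assumes Python's standard xml.etree.ElementTree and closes the element as an empty tag so the fragment is a complete document.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    '<whiteSpaceLoss note1="this is a note."\n'
    'note2="this\nis\na\nnote."/>'
)

# Line breaks inside note2 are replaced by single spaces during parsing.
note1 = elem.get("note1")
note2 = elem.get("note2")

print(note1 == note2)
```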
► End of Line Handling
▪ XML processors treat the character
sequence Carriage Return-Line Feed
(CRLF) like single CR or LF characters. All
are reported as a single LF character.
Applications can save documents using
the appropriate line-ending convention.
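End-of-line normalization can be sketched the same way, assuming Python's standard xml.etree.ElementTree. The t element and its content are hypothetical sample data that mix all three line-ending conventions.

```python
import xml.etree.ElementTree as ET

# CRLF, a lone CR, and a lone LF inside element content.
mixed = "<t>one\r\ntwo\rthree\nfour</t>"

text = ET.fromstring(mixed).text

# Every line ending is reported to the application as a single LF.
lines = text.split("\n")
print(lines)
```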
