
XML (Extensible Markup Language)

Extensible Markup Language (XML) is used to describe data. The XML standard is a
flexible way to create information formats and to share structured data electronically
over the public Internet as well as over corporate networks.
XML, a formal recommendation from the World Wide Web Consortium (W3C), is
similar to Hypertext Markup Language (HTML). Both XML and HTML use markup
symbols to describe page or file contents. HTML describes Web page content
(mainly text and graphic images) only in terms of how it is to be displayed and
interacted with. XML data, by contrast, is known as self-describing or self-defining: the
structure of the data is embedded with the data, so when the data arrives there is no
need to pre-build a structure to store it; the structure is understood dynamically from
the XML itself. XML is a simpler and easier-to-use subset of the Standard Generalized
Markup Language (SGML), the standard for defining document structure.
The basic building block of an XML document is an element, defined by tags. An
element has a beginning and an ending tag. All elements in an XML document are
contained in an outermost element known as the root element. XML can also
support nested elements, or elements within elements. This ability allows XML to
support hierarchical structures. Element names describe the content of the element,
and the structure describes the relationship between the elements. An XML document is
considered to be "well formed" (that is, able to be read and understood by an
XML parser) if its format complies with the XML specification, if it is properly marked up,
and if elements are properly nested.

For example:
<?xml version="1.0" standalone="yes"?>
<conversation>
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get off!</response>
</conversation>
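
The well-formedness rule described above can be tried out with a short Python sketch. It assumes the standard-library xml.etree.ElementTree parser as the XML processor; the helper name is_well_formed and the deliberately broken variant BAD are illustrative and not part of the original text.

```python
import xml.etree.ElementTree as ET

GOOD = """<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get off!</response>
</conversation>"""

# Same document, but the end tag's case no longer matches the start tag.
BAD = GOOD.replace("</conversation>", "</Conversation>")


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

Because XML names are case sensitive, the parser rejects the second document: its root element is never closed by a tag with a matching name.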

XML Features
• Excellent for handling data with a complex structure or atypical data.
• Data described using markup language.
• Text data description.
• Human- and computer-friendly format.
• Handles data in a tree structure having one, and only one, root element.
• Excellent for long-term data storage and data reusability.

XML Components
The most basic components of an XML document are elements,
attributes, and comments.

XML Elements - Elements are the building blocks of an XML document. Elements can act
as containers that hold text, other elements, attributes, media objects, or all of these.
Each XML document contains one or more elements, the scope of which is delimited
either by start and end tags or, for empty elements, by an empty-element tag.
Syntax
Following is the syntax to write an XML element –
<element-name attribute1 attribute2>
....content
</element-name>
Where,
• element-name is the name of the element. The name, including its case, must match
in the start and end tags.
• attribute1, attribute2 are attributes of the element, separated by white space.
An attribute defines a property of the element. It associates a name with a value, which
is a string of characters.
An attribute is written as −
name = "value"
name is followed by an = sign and a string value inside double(" ") or single(' ') quotes.

Empty Element
An empty element (an element with no content) has the following syntax −
<name attribute1 attribute2.../>

Following is an example of an XML document using various XML elements –

<?xml version = "1.0"?>


<contact-info>
<address category = "residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
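
To make the nesting of elements concrete, here is a minimal Python sketch that walks a document like the one above and lists each element with its attributes and text. It assumes the standard-library xml.etree.ElementTree module; the function name describe is a hypothetical helper chosen for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<contact-info>
  <address category="residence">
    <name>Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </address>
</contact-info>"""


def describe(element, depth=0):
    """Return one line per element: indented tag, attributes, and text."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
for line in describe(root):
    print(line)
```

The indentation in the output mirrors the hierarchy: contact-info contains address, which in turn contains name, company, and phone.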

XML Elements Rules


The following rules apply to XML elements −
• An element name can contain any alphanumeric characters. The only
punctuation marks allowed in names are the hyphen (-), underscore (_), and
period (.).
• Names are case sensitive. For example, Address, address, and ADDRESS are
different names.
• The names in the start and end tags of an element must be identical.
• An element, which is a container, can contain text or elements, as seen in the
above example.

XML Attributes - Attributes are part of XML elements. An element can have multiple
attributes, each with a unique name. Attributes give more information about XML elements.

Syntax
An XML attribute has the following syntax −
<element-name attribute1 attribute2>
....content.
</element-name>
where attribute1 and attribute2 have the following form −
name = "value"
Value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are
unique attribute labels. Attributes are used to add a unique label to an element, place
the label in a category, add a Boolean flag, or otherwise associate it with some string of
data. Following example demonstrates the use of attributes −
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>

<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
Attributes are used to distinguish among elements of the same name when you do not
want to create a new element for every situation. Hence, an attribute can add
a little more detail to differentiate two or more similar elements.
In the above example, we have categorized the plants by including the attribute category
and assigning a different value to each of the elements. Hence, we have two categories
of plants, one flowers and the other shrubs. Thus, we have two plants elements with
different attribute values.
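
As an illustration of how a program can tell same-named elements apart by attribute, the following Python sketch reads the category attribute from a garden document like the one above. It assumes the standard-library xml.etree.ElementTree module and leaves out the DTD, since this parser does not validate.

```python
import xml.etree.ElementTree as ET

DOC = """<garden>
  <plants category="flowers" />
  <plants category="shrubs"></plants>
</garden>"""

root = ET.fromstring(DOC)

# Collect the category attribute of every <plants> element, in document order.
categories = [plant.get("category") for plant in root.findall("plants")]
print(categories)

# Select only the elements whose attribute has a given value.
shrubs = root.findall("plants[@category='shrubs']")
print(len(shrubs))
```

Both elements share the name plants; only the attribute value distinguishes them.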
XML Document Structure
The XML Recommendation states that an XML document has both logical and physical
structure. Physically, it is composed of storage units called entities, each of which may
refer to other entities, similar to the way that #include works in the C language.
Logically, an XML document consists of declarations, elements, comments, character
references, and processing instructions, collectively known as the markup.
An XML document consists of three parts, in the order given:
1. An XML declaration (which is technically optional, but recommended in most
normal cases)
2. A document type declaration that refers to a DTD (which is optional, but required if
you want validation)
3. A body or document instance (which is required)
Collectively, the XML declaration and the document type declaration are called the XML
prolog.

XML Declaration
The XML declaration is a piece of markup (which may span multiple lines of a file) that
identifies this as an XML document. The declaration also indicates whether the
document can be validated by referring to an external Document Type Definition (DTD).
The minimal XML declaration is:
<?xml version="1.0" ?>
XML is case-sensitive (more about this in the next subsection), so it's important that you
use lowercase for xml and version. The quotes around the value of the version attribute
are required, as are the ? characters. At the time of this writing, "1.0" is the only
acceptable value for the version attribute, but this is certain to change when a
subsequent version of the XML specification appears.
NOTE
Do not include a space before the string xml or between the question mark and the
angle brackets. The strings <?xml and ?> must appear exactly as indicated. The space
before the ?> is optional. No blank lines or space may precede the XML declaration;
adding white space here can produce strange error messages.
In most cases, this XML declaration is present. If so, it must be the very first line of the
document and must not have leading white space. This declaration is technically
optional; cases where it may be omitted include when combining XML storage units to
create a larger, composite document.
The formal definition of an XML declaration, according to the XML 1.0
specification, is as follows:
XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
This Extended Backus-Naur Form (EBNF) notation, characteristic of many W3C
specifications, means that an XML declaration consists of the literal sequence '<?xml',
followed by the required version information, followed by optional encoding and
standalone declarations, followed by an optional amount of white space, and
terminating with the literal sequence '?>'. In this notation, a question mark not contained
in quotes means that the term that precedes it is optional.
The following declaration means that there is an external DTD on which this document
depends. See the next subsection for the DTD that this negative standalone value
implies.
<?xml version="1.0" standalone="no" ?>
On the other hand, if your XML document has no associated DTD, the correct XML
declaration is:
<?xml version="1.0" standalone="yes" ?>
The XML 1.0 Recommendation states: "If there are external markup declarations but
there is no standalone document declaration, the value 'no' is assumed."
The optional encoding part of the declaration tells the XML processor (parser) how to
interpret the bytes based on a particular character set. The default encoding is UTF-8,
which is one of seven character-encoding schemes used by the Unicode standard, also
used as the default for Java. In UTF-8, one byte is used to represent each ASCII
character and two to four bytes are used for every other character. UTF-8 is
an efficient form of Unicode for ASCII-based documents. In fact, UTF-8 is a superset of
ASCII.

<?xml version="1.0" encoding="UTF-8" ?>

For Asian languages, however, an encoding of UTF-16 is often more appropriate because
most of their characters take two bytes in UTF-16 but three bytes in UTF-8. It is also
possible to specify an ISO character encoding, such as in the following example, which
refers to ASCII plus Greek characters. Note, however, that some XML processors may not
handle ISO character sets correctly since the specification requires only that they
handle UTF-8 and UTF-16.

<?xml version="1.0" encoding="ISO-8859-7" ?>
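
The size trade-off between encodings can be checked directly. The sketch below is plain Python with no XML library involved; the three sample characters and the helper name encoded_sizes are illustrative choices, not taken from the original text.

```python
# Sample characters: ASCII, Greek, and a CJK ideograph (chosen for illustration).
samples = {
    "Latin A": "A",
    "Greek alpha": "\u03b1",
    "CJK ideograph": "\u65e5",
}


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count without a byte-order mark)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for label, ch in samples.items():
    print(label, encoded_sizes(ch))

# ISO-8859-7 stores Greek letters in a single byte each.
print(len("\u03b1".encode("iso-8859-7")))
```

The ASCII letter is smallest in UTF-8, while the ideograph needs fewer bytes in UTF-16, which is the reason for preferring UTF-16 for text that consists mostly of such characters.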

Both the encoding and standalone information may be supplied. When both are present,
encoding must come before standalone:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

Is the next example valid?

<?xml version="1.0" encoding='UTF-8' standalone='no'?>

Yes, it is. The order of the three parts is fixed by the XML 1.0 grammar (version, then
encoding, then standalone), and this example follows it. Single and double quotes can be
used interchangeably, provided they are of matching kind around any particular attribute
value. (Although there is no good reason in this example to use double quotes for
version and single quotes for the others, you may need to do so if the attribute value
already contains the kind of quotes you prefer.) Finally, the lack of a blank space
between 'no' and ?> is not a problem.
Neither of the following XML declarations is valid.

<?XML VERSION="1.0" STANDALONE="no"?>

<?xml version="1.0" standalone="No"?>

The first is invalid because these particular attribute names must be lowercase, as must
"xml". The problem with the second declaration is that the value of the standalone
attribute must be literally "yes" or "no", not "No". (Do I dare call this a "no No"?)
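
These declaration rules can be checked against a real parser. The following Python sketch assumes the standard-library xml.etree.ElementTree module (which is backed by the expat parser); the helper name parses, the labels in cases, and the one-element body <root/> are hypothetical and exist only to make each declaration a complete document.

```python
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the parser accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


BODY = b"<root/>"  # minimal document instance appended to each declaration

cases = {
    "mixed quotes": b"<?xml version=\"1.0\" encoding='UTF-8' standalone='no'?>",
    "uppercase names": b'<?XML VERSION="1.0" STANDALONE="no"?>',
    "standalone No": b'<?xml version="1.0" standalone="No"?>',
}

results = {label: parses(decl + BODY) for label, decl in cases.items()}
for label, ok in results.items():
    print(label, ok)

# White space before the declaration is also rejected.
print(parses(b" " + cases["mixed quotes"] + BODY))
```

Only the first declaration is accepted; the other two fail for the reasons given above, and the last line shows the effect of leading white space.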

XML Comments -
Comments can be used to include related links, information, and terms. They are visible
only to someone reading the source; an XML processor does not treat them as part of the
document's data. Comments may appear almost anywhere in XML code (the exceptions are
listed in the rules below).
Syntax
An XML comment has the following syntax −
<!--Your comment-->
A comment starts with <!-- and ends with -->. You can add textual notes as comments
between these delimiters. You must not nest one comment inside another.
Example
The following example demonstrates the use of comments in an XML document −
<?xml version = "1.0" encoding = "UTF-8" ?>
<!--Students' grades are uploaded monthly-->
<class_list>
<student>
<name>Tanmay</name>
<grade>A</grade>
</student>
</class_list>
Any text between <!-- and --> characters is considered as a comment.

XML Comments Rules


The following rules apply to XML comments −
• Comments cannot appear before the XML declaration.
• Comments may appear anywhere else in a document, outside other markup.
• Comments must not appear within attribute values.
• Comments cannot be nested inside other comments.
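
A short Python sketch shows how a parser treats comments under these rules. It assumes the standard-library xml.etree.ElementTree module, whose default parser discards comments; the helper name parses and the two malformed snippets are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<class_list>
  <!-- Students' grades are uploaded monthly -->
  <student>
    <name>Tanmay</name>
    <grade>A</grade>
  </student>
</class_list>"""

root = ET.fromstring(DOC)

# The comment is not reported as a child: only <student> remains.
child_tags = [child.tag for child in root]
print(child_tags)


def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A comment nested inside another comment is rejected.
print(parses("<a><!-- outer <!-- inner --> --></a>"))
# So is a comment placed inside an attribute value.
print(parses('<a b="<!-- note -->"/>'))
```

The comment is available to a human reading the file but does not show up in the parsed data.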

Document Type Definitions

Schemas
A schema is a set of rules that defines the
structure of elements and attributes and the types of their content and values in an XML
document. Analogy: A schema specifies a collection of XML documents in the same
way a BNF definition specifies the syntactically correct programs in a programming
language. A schema defines what elements occur in a document, the order in which
they appear, and how they are nested; it also tells what attributes belong to which
elements and describes their types to some extent.

Advantages of Schemas
• Define the characteristics and syntax of a set of documents.
• Independent groups can have a common format for interchanging XML documents.
• Software applications that process the XML documents know what to expect if the
documents adhere to a formal schema.
• XML documents can be validated to verify that they conform to a given schema.
• Validation can be used as a debugging tool, directing the designer to items in a
document that violate the schema.
• A schema can act as documentation for users defining or reading some set of XML
documents.
• A schema can increase the reliability, consistency, and accuracy of exchanged
documents.

An Internal DTD Declaration


If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE>
definition:
Example
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

An External DTD Declaration


If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a
reference to the DTD file:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
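
Python's standard library does not validate against a DTD, so the sketch below is only a hand-written stand-in for what a validating parser would check for the content model <!ELEMENT note (to,from,heading,body)>. The function name matches_note_model and the constant EXPECTED_CHILDREN are hypothetical; a real project would use a validating parser instead.

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT note (to,from,heading,body)>.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(text):
    """Check the root name, the child sequence, and that children hold only text."""
    root = ET.fromstring(text)
    if root.tag != "note":
        return False
    children = list(root)
    return (
        [child.tag for child in children] == EXPECTED_CHILDREN
        and all(len(child) == 0 for child in children)  # (#PCDATA): no sub-elements
    )


VALID = ("<note><to>Tove</to><from>Jani</from>"
         "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")
# Well-formed, but the children are in the wrong order for the DTD.
REORDERED = ("<note><from>Jani</from><to>Tove</to>"
             "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")

print(matches_note_model(VALID))
print(matches_note_model(REORDERED))
```

Both documents are well-formed; only the first matches the declared structure, which is the distinction between well-formedness and validity.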

Validation
Validation is the process of checking an XML document against a set of rules. An XML
document is said to be valid if its contents match the element and attribute
declarations in the associated document type definition (DTD), and if the document
complies with the constraints expressed in it. An XML parser checks a document at two
levels. They are −
• Well-formed XML document
• Valid XML document
Well-formed XML Document
An XML document is said to be well-formed if it adheres to the following rules −
• It may use the five predefined character entities without declaring them:
amp (&), apos (single quote), gt (>), lt (<), and quot (double quote). Any other
entity must be declared (for example, in a DTD) before it is used.
• It must follow the ordering of the tags, i.e., the inner tag must be closed before
the outer tag is closed.
• Each of its opening tags must have a closing tag, or it must be a self-ending
tag (<title>....</title> or <title/>).
• An attribute name may appear only once in a start tag, and its value must be
quoted.
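
The entity and attribute rules can be demonstrated with a few lines of Python. This sketch assumes the standard-library xml.etree.ElementTree parser and the escape helper from xml.sax.saxutils; the sample string raw and the helper name parses are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry <"quoted">'

# escape() handles &, < and > itself; the extra mapping adds the two quote entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The parser turns the predefined entities back into the original characters.
round_trip = ET.fromstring(f"<t>{escaped}</t>").text
print(round_trip == raw)


def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<t>&nbsp;</t>"))      # undeclared entity
print(parses('<t a="1" a="2"/>'))   # attribute name repeated in one start tag
print(parses("<t a=1/>"))           # attribute value not quoted
```

The last three documents are all rejected, each for breaking one of the rules listed above.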

Example
Following is an example of a well-formed XML document −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>

<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The above example is said to be well-formed as −
• It declares the type of the document. Here, the DOCTYPE names address as the
document (root) element type.
• It includes a root element named address.
• Each of the child elements name, company, and phone is enclosed in its
own self-explanatory tags.
• The order of the tags is maintained.

XML elements must follow these naming rules:


• Element names are case-sensitive.
• Element names must start with a letter or underscore.
• Element names cannot start with the letters xml (or XML, or Xml, etc)
• Element names can contain letters, digits, hyphens, underscores, and periods.
• Element names cannot contain spaces.
The following are some valid XML names:
<Movie_Catalog>
<movie-100>
<movie.catalog>
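
The naming rules listed above translate into a small check. The Python sketch below is a simplification that covers ASCII names only (the XML specification also allows many non-ASCII letters, and colons for namespaces); the function name is_valid_name and the pattern NAME_PATTERN are hypothetical.

```python
import re

# First character: letter or underscore; the rest: letters, digits, '.', '_', '-'.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Apply the simplified, ASCII-only element naming rules from the list above."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml (in any case) are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Movie_Catalog", "movie-100", "movie.catalog",
                  "100-movie", "movie catalog", "XmlData"]:
    print(candidate, is_valid_name(candidate))
```

The first three candidates pass; the others fail because they start with a digit, contain a space, or begin with the reserved letters xml.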
Document Object Model (DOM) - The DOM is the foundation for working with XML in
programs. XML documents have a hierarchy of informational units called nodes; the DOM
is a way of describing those nodes and the relationships between them.
A DOM document is a collection of nodes or pieces of information organized in a
hierarchy. This hierarchy allows a developer to navigate through the tree looking for
specific information. Because it is based on a hierarchy of information, the DOM is said
to be tree based.
The XML DOM also provides an API that allows a developer to
add, edit, move, or remove nodes in the tree at any point in order to create an
application.

<!DOCTYPE html>
<html>
<body>
<h1>TutorialsPoint DOM example </h1>
<div>
<b>Name:</b> <span id = "name"></span><br>
<b>Company:</b> <span id = "company"></span><br>
<b>Phone:</b> <span id = "phone"></span>
</div>
<script>
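// Create the request object (ActiveXObject is the fallback for IE6 and IE5).
var xmlhttp;
if (window.XMLHttpRequest)
{// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp = new XMLHttpRequest();
}
else
{// code for IE6, IE5
xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
}
// Synchronous request (third argument false): send() blocks until the file has loaded.
xmlhttp.open("GET","/xml/address.xml",false);
xmlhttp.send();
var xmlDoc = xmlhttp.responseXML;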

document.getElementById("name").innerHTML=
xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
document.getElementById("company").innerHTML=
xmlDoc.getElementsByTagName("company")[0].childNodes[0].nodeValue;
document.getElementById("phone").innerHTML=
xmlDoc.getElementsByTagName("phone")[0].childNodes[0].nodeValue;
</script>
</body>
</html>

Contents of address.xml are as follows –

<?xml version = "1.0"?>


<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
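Now let us keep these two files, sample.htm and address.xml, in the same
directory /xml and open sample.htm in any browser. The page should show the
name, company, and phone values read from address.xml (Tanmay Patil, TutorialsPoint,
and (011) 123-4567) next to the Name, Company, and Phone labels.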
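
The same DOM calls exist outside the browser. As a minimal sketch, the Python standard-library module xml.dom.minidom exposes getElementsByTagName, childNodes, and nodeValue with the same meaning as in the script above. The helper name first_text and the added email element (with its example.com address) are hypothetical and are included only to show a node being added to the tree.

```python
from xml.dom import minidom

ADDRESS_XML = """<?xml version="1.0"?>
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>"""

doc = minidom.parseString(ADDRESS_XML)


def first_text(tag):
    """Text of the first element with the given tag name, as in the browser script."""
    return doc.getElementsByTagName(tag)[0].childNodes[0].nodeValue


for tag in ("name", "company", "phone"):
    print(tag, first_text(tag))

# The DOM API also allows changing the tree: add a hypothetical <email> node.
email = doc.createElement("email")
email.appendChild(doc.createTextNode("tanmay@example.com"))
doc.documentElement.appendChild(email)
print(first_text("email"))
```

Each value is reached by navigating from an element node to its first child, which is the text node holding the content.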

XSL
Before learning XSLT, we should first understand XSL, which stands for
Extensible Stylesheet Language. XSL is to XML what CSS is to HTML.
Need for XSL
In an HTML document, tags such as table, div, and span are predefined; the
browser knows how to style them and displays them using CSS styles. In
an XML document, however, tags are not predefined. To make it possible to understand
and style an XML document, the World Wide Web Consortium (W3C) developed XSL, which
acts as an XML-based stylesheet language. An XSL document specifies how a browser should
render an XML document.
Following are the main parts of XSL −
• XSLT − used to transform an XML document into various other types of document.
• XPath − used to navigate an XML document.
• XSL-FO − used to format an XML document.

XSLT
XSLT, Extensible Stylesheet Language Transformations, provides the ability to
transform XML data from one format to another automatically.
How XSLT Works
An XSLT stylesheet defines the transformation rules to be applied to the
target XML document. The stylesheet itself is written in XML. The XSLT processor
takes the XSLT stylesheet, applies the transformation rules to the target XML
document, and generates a formatted document in XML, HTML, or
text format. This formatted document is then used by the XSLT formatter to generate the
actual output that is displayed to the end user.

Advantages
Here are the advantages of using XSLT −
• Independent of programming. Transformations are written in a separate XSL file,
which is itself an XML document.
• Output can be altered simply by modifying the transformations in the XSL file. There
is no need to change any code, so Web designers can edit the stylesheet and
see the change in the output quickly.

XSLT Example
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
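
To show what the for-each template produces, the following Python sketch imitates it with an ordinary loop. This is not an XSLT processor (Python's standard library does not include one); it only mirrors the logic of the stylesheet above using xml.etree.ElementTree. The catalog data and the function name render_rows are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input document with the catalog/cd structure the stylesheet expects.
CATALOG = """<catalog>
  <cd><title>Example Album</title><artist>Example Artist</artist></cd>
  <cd><title>Second Album</title><artist>Another Artist</artist></cd>
</catalog>"""


def render_rows(catalog_xml):
    """Build one HTML table row per cd element, like xsl:for-each over catalog/cd."""
    root = ET.fromstring(catalog_xml)  # root is the <catalog> element
    rows = []
    for cd in root.findall("cd"):
        # Equivalent of <xsl:value-of select="title"/> and select="artist".
        title = escape(cd.findtext("title", default=""))
        artist = escape(cd.findtext("artist", default=""))
        rows.append(f"<tr><td>{title}</td><td>{artist}</td></tr>")
    return rows


for row in render_rows(CATALOG):
    print(row)
```

An XSLT processor performs the same selection and substitution declaratively, driven by the template rather than by hand-written code.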
