551103 Advanced DBMS UNIT IV Notes by Prof. T. MEYYAPPAN, DCS, ALU

UNIT - IV
XML DATABASES: XML Hierarchical data model, XML Documents, DTD, XML Schema, XML
Querying, XHTML, Illustrative Experiments.

Hierarchical Database Model

A hierarchical database is a DBMS that represents data in a tree-like form. The
relationship between records is one-to-many: one parent node can have many child nodes.
A hierarchical database model is a data model in which data is stored as records that are
linked in a tree-like structure through parent and child levels. Each record has only one
parent. The first record of the data model is the root record.

Another example of a hierarchical structure:

Every book consists of a title, a preface (optional), a sequence of chapters, a
sequence of appendices (optional) and an index (optional). Every chapter consists of a title
and a nonempty sequence of sections. Every section consists of a title and a nonempty
sequence of paragraphs.

History of hierarchical databases


The hierarchical format was introduced by IBM in the 1960s for mainframe systems.
Mainframe computers still use hierarchical databases. IBM IMS is one of the most popular
of these databases. IMS uses blocks of data known as segments. Each segment can contain
several pieces of data, which are called fields. Each segment can be loaded and read into
computer memory from the database.

Advantages of hierarchical databases


Hierarchical databases are useful when you need to represent data in a tree-like
hierarchy. A typical example of a hierarchical data model is the navigation file or sitemap
of a website. A company organization chart is another example of a hierarchical structure.

The key advantages of hierarchical databases are:


• Traversing a tree structure is simple and fast because of its one-to-many
relationship format. Several major programming languages provide
functionality to read tree-structured databases.
• Easy to understand because of its one-to-many relationships.

The key disadvantages of hierarchical databases are:

• Its format of one-to-many relationships is rigid: a child cannot have
more than one parent.
• Multiple nodes with the same parent add redundant data.
• Moving a record from one level to another can be challenging.

Further disadvantages:

• When a user needs to store a record in a child table that is currently
unrelated to any record in a parent table, the record cannot be stored directly;
the user must first record an additional entry in the parent table.

• This type of database cannot support complex relationships, and redundancy
is also a problem: it can produce inaccurate information when the same data
is recorded inconsistently at various sites.

Examples of Hierarchical Databases


The most popular hierarchical databases are IBM Information Management System
(IMS) and RDM Mobile. The Windows Registry is another real-world use case of a
hierarchical database system.

XML

The name XML stands for eXtensible Markup Language. Markup is the metadata
that describes the data.
XML is a metalanguage that allows users to define languages with user-defined tags. An XML
schema is the definition of such a tailored language.

• An XML document is a document created using XML features.
• XML documents are meant to be readable by both humans and machines.
• Relations are very tightly structured, while XML documents have a loose
structure.
• XML documents are meant to be easy for application programs to process. Producers
and consumers of data have to agree on how to interpret the markup.
• XML documents are used for Electronic Data Interchange (EDI) between
organizations.

Example:
<?xml version="1.0"?>
<greeting kind="succinct">Hello, World</greeting>

The first line is the XML declaration.


The second line is an XML element consisting of a start tag (<greeting ...>) that carries an
attribute and its value (kind="succinct"), character data (Hello, World), and an end
tag (</greeting>).

XML Document Structure

• An XML document is a basic unit of XML information composed of elements
and other markup in an orderly package.
• An XML document can contain a wide variety of data. The root or document
node represents the entire document.
• The leaf nodes represent character data and various items such as XML
comments, white space and so on. The complete hierarchy is called the
information set.
• An application program can retrieve, insert, delete and change nodes using an API
called DOM (Document Object Model).

<?xml version="1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
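
The four DOM operations mentioned above (retrieve, insert, delete, change) can be sketched with Python's standard xml.dom.minidom module, applied to the contact-info document. This is only an illustration: the email element and the replacement company name are made up for the example and are not part of these notes.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<contact-info>"
    "<name>Tanmay Patil</name>"
    "<company>TutorialsPoint</company>"
    "<phone>(011) 123-4567</phone>"
    "</contact-info>"
)
root = doc.documentElement

# Retrieve: read the character data inside <name>.
name = root.getElementsByTagName("name")[0].firstChild.data

# Insert: append a new <email> element (hypothetical field and value).
email = doc.createElement("email")
email.appendChild(doc.createTextNode("tanmay@example.com"))
root.appendChild(email)

# Change: overwrite the text node inside <company> (made-up value).
root.getElementsByTagName("company")[0].firstChild.data = "Example Corp"

# Delete: remove the <phone> element from its parent.
phone = root.getElementsByTagName("phone")[0]
root.removeChild(phone)

child_names = [node.tagName for node in root.childNodes]
print(name, child_names)
```

Every operation goes through a node object of the parsed tree, not through the text of the document, which is the point of the DOM API.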

Document Prolog Section

The document prolog comes at the top of the document, before the root element. This
section contains:

• XML declaration
• Document type declaration

Document Elements Section

Document elements are the building blocks of XML. They divide the document into
a hierarchy of sections, each serving a specific purpose. You can separate a document into
multiple sections so that they can be rendered differently, or used by a search engine.
Elements can behave as containers that hold text, other elements, attributes, media
objects, or all of these.

Each XML document contains one or more elements, the scope of each being either
delimited by start and end tags or, for empty elements, by an empty-element tag.

The syntax for writing an XML element is as follows:

<element-name attribute1 attribute2>
....content
</element-name>

XML Database

Our PARTS relation (table) can be represented in XML as shown below:


<?xml version="1.0"?>
<!-- This is an XML representation of the PARTS relation -->
<PartsRelation>
<PartTuple>
<PART_NUM> P1 </PART_NUM>
<PART_NAME> Nut </PART_NAME>
<COLOR> Red </COLOR>
<WEIGHT> 12.0 </WEIGHT>
<CITY> BOMBAY </CITY>
</PartTuple>
<PartTuple>
<PART_NUM> P2 </PART_NUM>
<PART_NAME> Bolt </PART_NAME>
<COLOR> Green </COLOR>
<WEIGHT> 17.0 </WEIGHT>
<CITY> Chennai </CITY>
</PartTuple>
</PartsRelation>
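
The mapping from relation to document is mechanical: one PartTuple element per row, one child element per column. A minimal Python sketch of that mapping, using the standard xml.etree.ElementTree module, might look as follows. The function name relation_to_xml and the COLUMNS tuple are illustrative names, not part of the notes.

```python
import xml.etree.ElementTree as ET

# Column names and rows of the PARTS relation as used in the listing above.
COLUMNS = ("PART_NUM", "PART_NAME", "COLOR", "WEIGHT", "CITY")
rows = [
    ("P1", "Nut", "Red", "12.0", "BOMBAY"),
    ("P2", "Bolt", "Green", "17.0", "Chennai"),
]

def relation_to_xml(rows):
    """Build a PartsRelation element with one PartTuple child per row."""
    root = ET.Element("PartsRelation")
    for row in rows:
        tuple_element = ET.SubElement(root, "PartTuple")
        for column, value in zip(COLUMNS, row):
            ET.SubElement(tuple_element, column).text = value
    return root

root = relation_to_xml(rows)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```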

XML Data Definition


A data definition specifies the legal building blocks of an XML document. It is used to define
the document structure with a list of legal elements and attributes. Such information can be
specified by means of either

a) a Document Type Definition (DTD), constructed using the DTD definition language,
or
b) an XML Schema, constructed using a language called XML Schema.

Document Type Definition(DTD)

A DTD defines the legal building blocks of an XML document. Its main purpose is to define
the structure of an XML document by means of a list of legal elements and attributes.
It provides:

• The basic rules for defining and using markup languages based on XML - how
to write markup, character data, comments etc.
• The rules for how a program that processes XML documents (an XML
parser) is to be designed.
• The rules for defining a DTD.

The revised version of our PARTS relation is given below:

<?xml version="1.0"?>
<!-- This is an XML representation of the PARTS relation -->
<!DOCTYPE …>
<PartsRelation>
<NOTE>Revised Version</NOTE>
<PartTuple CITY="Bombay">
<PNUM> P1 </PNUM>
<PART_NAME> Nut </PART_NAME>
<WEIGHT> 12.0 </WEIGHT>
<NOTE>Part Color is Red by default</NOTE>
</PartTuple>
<PartTuple COLOR="Green" CITY="Chennai">
<PNUM> P2 </PNUM>
<PART_NAME> Bolt </PART_NAME>
<WEIGHT> 17.0 </WEIGHT>
</PartTuple>
</PartsRelation>

The DTD for the above XML document is shown below:

<!ELEMENT PartsRelation (NOTE?, PartTuple*)>
<!ELEMENT NOTE (#PCDATA)>
<!ELEMENT PartTuple (PNUM, PART_NAME, WEIGHT, NOTE?)>
<!ATTLIST PartTuple
CITY (Bombay | Bangalore | Chennai) #REQUIRED
COLOR (Red | Green | Blue) "Red">
<!ELEMENT PNUM (#PCDATA)>
<!ELEMENT PART_NAME (#PCDATA)>
<!ELEMENT WEIGHT (#PCDATA)>

• PCDATA stands for Parsed Character Data.
• A document that conforms to this DTD has exactly one root element, called
PartsRelation.
• The root element contains zero or more PartTuple elements (indicated by *),
optionally preceded by a NOTE element (indicated by ?).
• Every PartTuple element contains exactly one PNUM element, one PART_NAME
element and one WEIGHT element (in that order), optionally followed by a NOTE
element.
• Every PartTuple must have a CITY attribute and may have a COLOR attribute. The CITY
attribute must be Bombay, Bangalore or Chennai. The COLOR attribute must be
Red, Green or Blue, and defaults to Red.
• A DTD can be either internal or external.
• An internal DTD is included directly in the document it describes.
• An external DTD is stored in an external file.
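
A DTD content model is essentially a regular expression over child element names. The Python sketch below makes that concrete by translating the two element-content models by hand into regular expressions and checking an element's children against them. It is an illustration under simplifying assumptions, not a DTD validator: attributes and #PCDATA content are ignored, and the names CONTENT_MODELS and children_conform are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DTD content models into regular expressions over
# space-terminated child element names (illustrative, not a DTD parser).
CONTENT_MODELS = {
    "PartsRelation": r"(NOTE )?(PartTuple )*",
    "PartTuple": r"PNUM PART_NAME WEIGHT (NOTE )?",
}

def children_conform(element):
    """Check one element's sequence of child names against its content model."""
    model = CONTENT_MODELS[element.tag]
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, sequence) is not None

good = ET.fromstring(
    "<PartTuple CITY='Bombay'><PNUM>P1</PNUM><PART_NAME>Nut</PART_NAME>"
    "<WEIGHT>12.0</WEIGHT></PartTuple>"
)
# Same children, but PNUM and PART_NAME are in the wrong order.
bad = ET.fromstring(
    "<PartTuple CITY='Bombay'><PART_NAME>Nut</PART_NAME><PNUM>P1</PNUM>"
    "<WEIGHT>12.0</WEIGHT></PartTuple>"
)
print(children_conform(good), children_conform(bad))  # True False
```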

In the internal case, the DTD precedes the root element. It must be enclosed in the
delimiters [ ]:

<!DOCTYPE document-type-name
[ …..
]>

In the external case, the delimiters do not appear; instead, a reference to the external
file appears, as shown below:

<!DOCTYPE PartsRelation SYSTEM "file:/c:/parts.dtd">

Type ID and IDREF

DTDs support only a few kinds of integrity constraints. Besides listing the legal values of
an attribute, DTDs support certain uniqueness and referential constraints. For example,

<!ATTLIST PartTuple PNUM ID #REQUIRED>
<!ATTLIST SupplierTuple SNUM ID #REQUIRED>
<!ATTLIST ShipmentTuple PNUM IDREF #REQUIRED>

Attributes of type ID behave like primary keys. Attributes of type IDREF behave like foreign keys.
PNUM is a required attribute because it is the primary key of the PARTS relation.
A PNUM value in the Shipment relation must match an entry in the PARTS relation (referential integrity).
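
The two checks that ID and IDREF imply (uniqueness of ID values, and every IDREF resolving to some ID) can be sketched in a few lines of Python. The sample document is made up for the illustration. Because ID and IDREF are scoped to a single document, the parts and shipments are placed under one hypothetical Database root. The function check_ids is likewise an invented name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: P9 is deliberately a dangling reference.
doc = ET.fromstring("""
<Database>
  <PartTuple PNUM="P1"/>
  <PartTuple PNUM="P2"/>
  <ShipmentTuple PNUM="P1"/>
  <ShipmentTuple PNUM="P9"/>
</Database>
""")

def check_ids(root):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    ids, duplicates = set(), []
    for part in root.iter("PartTuple"):
        pnum = part.get("PNUM")
        if pnum in ids:
            duplicates.append(pnum)  # violates uniqueness (primary key)
        ids.add(pnum)
    dangling = [
        shipment.get("PNUM")
        for shipment in root.iter("ShipmentTuple")
        if shipment.get("PNUM") not in ids  # violates referential integrity
    ]
    return duplicates, dangling

print(check_ids(doc))  # ([], ['P9'])
```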

Limitations of DTDs

DTD support for integrity constraints is very weak. The other disadvantages are as follows:
• They do not use XML syntax, so they cannot be processed as XML documents by a regular
XML parser. For example, !ELEMENT and #PCDATA are DTD keywords, not legal XML element
or attribute names.
• They provide no data type support (everything is just a character
string).

XML Schema

XML Schema is commonly known as XML Schema Definition (XSD). An XML schema is
the definition of the syntax rules that a conforming XML document is required to obey.

• It is used to describe and validate the structure and the content of XML data.
• An XML schema defines the elements, attributes and data types.
• It is similar to a database schema that describes the data in a database.
• XML schema provides more extensive constraints than a DTD can.
• XML schema provides a set of built-in primitive types (string, boolean,
decimal) and derived data types (integer, positiveInteger, negativeInteger and
so on).
• Data types can be simple or complex.

Difference between XML and XSD


An XSD defines the elements and structures that may appear in a document; the XML document
itself does not. Validating a document against its XSD helps ensure that the data is
interpreted properly, which XML alone cannot guarantee.

XML SCHEMA FOR PARTS RELATION

<?xml version="1.0"?>
<!-- XML Schema for Parts Relation documents -->
<!DOCTYPE xsd:schema SYSTEM "http://www.w3.org/2001/XMLSchema.dtd">
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="NOTE" type="xsd:string"/>
<xsd:element name="PartsRelation">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="NOTE" minOccurs="0"/>
<xsd:element name="PartTuple" type="PartTupleType" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

<xsd:complexType name="PartTupleType">
<xsd:sequence>
<xsd:element name="PNUM" type="PartNum"/>
<xsd:element name="PART_NAME" type="xsd:string"/>
<xsd:element name="WEIGHT">
<xsd:simpleType>
<xsd:restriction base="xsd:decimal">
<xsd:totalDigits value="5"/>
<xsd:fractionDigits value="1" fixed="true"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element ref="NOTE" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="CITY" type="City"/>
<xsd:attribute name="COLOR" type="Color" default="Red"/>
</xsd:complexType>

<xsd:simpleType name="PartNum">
<xsd:restriction base="xsd:string">
<xsd:pattern value="P[0-9]{1,3}"/>
</xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="Color">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Red"/>
<xsd:enumeration value="Green"/>
<xsd:enumeration value="Blue"/>
</xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="City">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Bombay"/>
<xsd:enumeration value="Bangalore"/>
<xsd:enumeration value="Chennai"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>

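-- xsd stands for XML Schema Definition.
-- Legal values for the WEIGHT element have at most five digits in total, at most one of
them after the decimal point, for example 0.1, 0.2, …, 9999.9. The intended minimum value
is 0.1; enforcing it would require an additional xsd:minInclusive facet.
-- PartNum values such as P1 and P2 follow the pattern P followed by 1 to 3 digits.
-- Color and City values are drawn from enumerated sets of values.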
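
To see what these facets accept and reject, the simple types can be mimicked with ordinary Python checks. This is a rough sketch, not an XSD validator: the function names are invented for the example, and only the pattern, enumeration, totalDigits and fractionDigits facets from the schema above are modelled.

```python
import re
from decimal import Decimal

PART_NUM = re.compile(r"P[0-9]{1,3}")            # the PartNum pattern facet
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")  # plain decimal notation only
COLORS = {"Red", "Green", "Blue"}                 # the Color enumeration
CITIES = {"Bombay", "Bangalore", "Chennai"}       # the City enumeration

def valid_part_num(text):
    """True if the whole string is P followed by one to three digits."""
    return PART_NUM.fullmatch(text) is not None

def valid_weight(text, total_digits=5, fraction_digits=1):
    """Mimic the totalDigits and fractionDigits facets on a decimal value."""
    text = text.strip()
    if not DECIMAL.fullmatch(text):
        return False
    # normalize() drops trailing zeros, so 12.0 counts as the value 12.
    _, digits, exponent = Decimal(text).normalize().as_tuple()
    fraction = max(0, -exponent)
    total = max(len(digits) + max(0, exponent), fraction)
    return total <= total_digits and fraction <= fraction_digits

print(valid_part_num("P1"), valid_part_num("P1234"))      # True False
print(valid_weight("9999.9"), valid_weight("9999.99"))    # True False
```

Note that nothing here rejects 0 or negative weights, which matches the schema as written: without a minimum facet, those values pass.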

XML Data Manipulation


XML data can be manipulated with the XML query languages XQuery and XPath.

XQuery is derived from an earlier language, Quilt. Quilt was influenced by SQL, OQL,
XQL, XML-QL and Lorel.
• XQuery is read-only.
• Updates must be done with DOM or through proprietary
(vendor-specific) facilities.
• XML documents are essentially character strings that are meant to be
readable by humans.
• XQuery does not operate on XML documents directly. The XML documents are first
transformed into an abstract (parsed) form. The abstract form is known as an
instance of the XQuery Data Model.
• The result of evaluating a query is known as an InfoSet.

XPath

XQuery relies heavily on XPath's path expressions. An XPath path expression,
starting from some given source node or nodes, navigates along a specified path or paths to
find the desired target node or nodes. XPath is a navigational language that uses an
addressing mechanism.

Examples:
/PartsRelation/PartTuple returns the sequence of nodes corresponding to the PartTuple
elements. The / in a path plays a role similar to the . (dot) qualifier in relational calculus.
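
The path expression above can be tried out with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The sketch below is an illustration on a cut-down PARTS document. Since the parsed root object already is the PartsRelation element, the absolute path is written as the relative path ./PartTuple.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<PartsRelation>"
    "<PartTuple><PNUM>P1</PNUM><PART_NAME>Nut</PART_NAME></PartTuple>"
    "<PartTuple><PNUM>P2</PNUM><PART_NAME>Bolt</PART_NAME></PartTuple>"
    "</PartsRelation>"
)

# /PartsRelation/PartTuple, expressed relative to the root element.
part_tuples = root.findall("./PartTuple")
names = [t.findtext("PART_NAME") for t in part_tuples]

# A predicate in square brackets narrows the node sequence by a child's value.
bolt = root.find("./PartTuple[PNUM='P2']/PART_NAME").text

print(names, bolt)  # ['Nut', 'Bolt'] Bolt
```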

XQuery Example

To get the supplier name, part name, and shipment quantity for every shipment of
parts from suppliers, the XQuery is formulated as shown below:

<Result>
{
for $spx in doc("ShipmentsRelation.xml")//ShipmentTuple
for $sx in doc("SuppliersRelation.xml")//SupplierTuple[SNUM = $spx/SNUM]
for $px in doc("PartsRelation.xml")//PartTuple[PNUM = $spx/PNUM]
order by $sx/SUPPLIER_NAME, $px/PART_NAME
return
<ResultTuple>
{ $sx/SUPPLIER_NAME, $px/PART_NAME, $spx/QUANTITY }
</ResultTuple>
}
</Result>

The // operator selects descendants: starting from the root node (the SHIPMENTS relation),
//ShipmentTuple returns the ShipmentTuple elements, which are children of that root, that is,
the tuples of the relation. $spx, $sx and $px are range variables over the
shipments, suppliers and parts relations.

The result of the query is converted into XML form for target consumers.

The above XQuery is equivalent to the following relational calculus expression:
{SX.SUPPLIER_NAME, PX.PART_NAME, SPX.QUANTITY} WHERE SX.SNUM = SPX.SNUM AND PX.PNUM = SPX.PNUM
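
To make the semantics of the nested for clauses concrete, the same join can be written out as explicit loops in Python. This is a sketch of what the query computes, not of how an XQuery processor evaluates it. The three small documents are made-up sample data; only the element names are taken from the query.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; element names follow the query above.
shipments = ET.fromstring(
    "<ShipmentsRelation>"
    "<ShipmentTuple><SNUM>S1</SNUM><PNUM>P2</PNUM><QUANTITY>300</QUANTITY></ShipmentTuple>"
    "<ShipmentTuple><SNUM>S1</SNUM><PNUM>P1</PNUM><QUANTITY>200</QUANTITY></ShipmentTuple>"
    "</ShipmentsRelation>")
suppliers = ET.fromstring(
    "<SuppliersRelation>"
    "<SupplierTuple><SNUM>S1</SNUM><SUPPLIER_NAME>Smith</SUPPLIER_NAME></SupplierTuple>"
    "</SuppliersRelation>")
parts = ET.fromstring(
    "<PartsRelation>"
    "<PartTuple><PNUM>P1</PNUM><PART_NAME>Nut</PART_NAME></PartTuple>"
    "<PartTuple><PNUM>P2</PNUM><PART_NAME>Bolt</PART_NAME></PartTuple>"
    "</PartsRelation>")

# One loop per range variable; the predicates become the join conditions.
rows = []
for spx in shipments.iter("ShipmentTuple"):
    for sx in suppliers.iter("SupplierTuple"):
        if sx.findtext("SNUM") != spx.findtext("SNUM"):
            continue
        for px in parts.iter("PartTuple"):
            if px.findtext("PNUM") == spx.findtext("PNUM"):
                rows.append((sx.findtext("SUPPLIER_NAME"),
                             px.findtext("PART_NAME"),
                             spx.findtext("QUANTITY")))

# order by supplier name, then part name
rows.sort(key=lambda row: (row[0], row[1]))

# return clause: wrap every result row in a ResultTuple element.
result = ET.Element("Result")
for supplier_name, part_name, quantity in rows:
    result_tuple = ET.SubElement(result, "ResultTuple")
    ET.SubElement(result_tuple, "SUPPLIER_NAME").text = supplier_name
    ET.SubElement(result_tuple, "PART_NAME").text = part_name
    ET.SubElement(result_tuple, "QUANTITY").text = quantity

print(ET.tostring(result, encoding="unicode"))
```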

XML and DATABASES


An XML database is used to store large amounts of information in XML format. The
data stored in the database can be queried using XQuery, serialized, and exported into a
desired format. Conversely, data can be retrieved from a relational database as the result of
some query and converted into XML form, so that it can be transmitted to some consumer.

The following are the three ways of storing an XML document in a database:

1. Store the entire document as the value of some attribute (Documents as Attribute
Values).
2. Shred the document into values of various attributes in various tuples of various
relations (Shred and Publish).
3. Store the XML document as such in a native XML database instead of in relations (XML
Databases).

Documents as Attribute Values


1. A new data type, XMLDOC, is defined.
2. Values of type XMLDOC are XML documents.
3. Specific attributes of relations can be defined to be of type XMLDOC.
4. XML documents are five to ten times the size of the raw data they represent. They can
be stored in a compressed format to reduce storage size.
5. Tuples containing XMLDOC values can be inserted and deleted using the conventional
INSERT and DELETE operators. The UPDATE operator replaces the entire XMLDOC value;
piece-wise updates are not possible.
6. Like other data types, XMLDOC has a set of associated operators. These operators are
similar to the ones found in XQuery.
7. Operators that check XML schema and DTD type compatibility should also be
provided.

This method is suitable when:
• The documents are operated on in their entirety.
• They are rarely updated.
• Searching is done on a small set of attributes.
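
The consequence of point 5 (no piece-wise update) can be illustrated with Python's standard sqlite3 module. SQLite has no XML data type, so in this sketch a TEXT column stands in for XMLDOC; the table and column names are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Assumption: a TEXT column plays the role of the XMLDOC type.
conn.execute("CREATE TABLE part_docs (part_id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO part_docs VALUES (1, ?)",
    ("<PartTuple><PNUM>P1</PNUM><WEIGHT>12.0</WEIGHT></PartTuple>",),
)

# Changing one element means fetching, parsing, editing and then
# writing back the whole document as a single attribute value.
(text,) = conn.execute("SELECT doc FROM part_docs WHERE part_id = 1").fetchone()
doc = ET.fromstring(text)
doc.find("WEIGHT").text = "12.5"
conn.execute(
    "UPDATE part_docs SET doc = ? WHERE part_id = 1",
    (ET.tostring(doc, encoding="unicode"),),
)

(stored,) = conn.execute("SELECT doc FROM part_docs WHERE part_id = 1").fetchone()
print(stored)
```

The database sees the document only as an opaque value, which is why this approach suits documents that are read whole and rarely changed.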

Shred and Publish

This approach does not involve any new data type. Instead, XML documents are shredded into
pieces and stored as values of various attributes of various relations in various places in the
database. In this case, the database does not contain XML documents as such. XML documents
can be formed by combining certain values in certain ways.

Application programs can create an XML document from regular database data using
the results of a query. This operation is called publishing. It provides the ability to have
XML views of non-XML data. The Shred and Publish approach is sometimes referred to as XML
collection.

This method is preferred when:

• The data already exists in a relational database and must interact with data in
XML documents.
• Operations are frequently performed on individual elements or attributes.
• Updates are frequent and update performance is important.
• Application programs use existing relational interfaces.
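
A round trip through both steps can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. The table layout and names below are assumptions made for the illustration; a real mapping would depend on the document's schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<PartsRelation>"
    "<PartTuple><PNUM>P1</PNUM><PART_NAME>Nut</PART_NAME><WEIGHT>12.0</WEIGHT></PartTuple>"
    "<PartTuple><PNUM>P2</PNUM><PART_NAME>Bolt</PART_NAME><WEIGHT>17.0</WEIGHT></PartTuple>"
    "</PartsRelation>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (pnum TEXT PRIMARY KEY, part_name TEXT, weight REAL)")

# Shred: each PartTuple element becomes one ordinary row.
for part in source.iter("PartTuple"):
    conn.execute(
        "INSERT INTO parts VALUES (?, ?, ?)",
        (part.findtext("PNUM"), part.findtext("PART_NAME"),
         float(part.findtext("WEIGHT"))),
    )

# An element-level update is now a plain SQL UPDATE on one column.
conn.execute("UPDATE parts SET weight = 12.5 WHERE pnum = 'P1'")

# Publish: build an XML view from the result of a query.
published = ET.Element("PartsRelation")
for pnum, part_name, weight in conn.execute(
        "SELECT pnum, part_name, weight FROM parts ORDER BY pnum"):
    part = ET.SubElement(published, "PartTuple")
    ET.SubElement(part, "PNUM").text = pnum
    ET.SubElement(part, "PART_NAME").text = part_name
    ET.SubElement(part, "WEIGHT").text = str(weight)

xml_text = ET.tostring(published, encoding="unicode")
print(xml_text)
```

Here the database never stores an XML document; the document exists only before shredding and after publishing.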

XML Database
There are three different types of XML databases:
1. Native XML Database (NXD)
2. XML Enabled Database (XEDB)
3. Hybrid XML Database (HXD)

Native XML Database

A native XML database defines a (logical) model for an XML document and stores and retrieves
documents according to that model. Examples of such models are the XPath data model and
the XML Infoset.

a) The model must include elements, attributes, PCDATA, and document order.
b) It must have an XML document as its fundamental unit of (logical) storage, just as a
relational database has a row in a table as its fundamental unit of (logical) storage.
c) It is not required to have any particular underlying physical storage model. It can be
built on a relational, hierarchical, or object-oriented database, or use a proprietary
storage format such as indexed, compressed files.

XML Enabled Database (XEDB)

An XML-enabled database is a database that has an added XML mapping layer, provided either
by the database vendor or by a third party. This mapping layer manages the storage and
retrieval of XML data. Data that is mapped into the database is mapped into
application-specific formats, and the original XML metadata and structure may be lost.
Data manipulation may occur via either XML-specific technologies (e.g. XPath, XSLT, DOM
or SAX) or other database technologies (e.g. SQL). The fundamental unit of storage in an
XML-enabled database is implementation dependent.

Hybrid XML Databases (HXD)

A hybrid XML database is a database that can be treated as either a native XML database
or an XML-enabled database, depending on the requirements of the application.
