UNIT : III

PREPARED BY ARUN PRATAP SINGH, M.TECH 2nd SEMESTER
DISTRIBUTED DATABASES - INTRODUCTION :
o A distributed database (DDB) is a collection of multiple, logically interrelated
databases distributed over a computer network.

o A distributed database management system (DDBMS) is the software that manages
the DDB and provides an access mechanism that makes this distribution
transparent to the users.

A distributed database is a database in which storage devices are not all attached to a common
processing unit such as the CPU, controlled by a distributed database management system (together
sometimes called a distributed database system). It may be stored in multiple computers, located in
the same physical location; or may be dispersed over a network of interconnected computers. Unlike
parallel systems, in which the processors are tightly coupled and constitute a single database system,
a distributed database system consists of loosely-coupled sites that share no physical components.
System administrators can distribute collections of data (e.g. in a database) across multiple physical
locations. A distributed database can reside on network servers on the Internet, on
corporate intranets or extranets, or on other company networks. Because they store data across
multiple computers, distributed databases can improve performance at end-user worksites by allowing
transactions to be processed on many machines, instead of being limited to one [2].

Two processes ensure that the distributed databases remain up-to-date and
current: replication and duplication.


1. Replication involves using specialized software that looks for changes in the distributed
database. Once the changes have been identified, the replication process makes all the
databases look the same. Depending on the size and number of the distributed databases,
replication can be complex and can consume considerable time and computing resources.
2. Duplication, on the other hand, is less complex. It basically identifies one database as
a master and then duplicates that database. The duplication process is normally done at a set
time after hours. This is to ensure that each distributed location has the same data. In the
duplication process, users may change only the master database. This ensures that local data
will not be overwritten.

A database user accesses the distributed database through:
Local applications - applications which do not require data from other sites.
Global applications - applications which do require data from other sites.
A homogeneous distributed database has identical software and hardware running all
database instances, and may appear through a single interface as if it were a single
database. A heterogeneous distributed database may have different hardware, operating
systems, database management systems, and even data models for different databases.

A DDBMS is mainly classified into two types:
Homogeneous Distributed database management systems
Heterogeneous Distributed database management systems

Homogeneous DDBMS :-

In a homogeneous distributed database all sites have identical software, are aware
of each other, and agree to cooperate in processing user requests.
The homogeneous system is much easier to design and manage.
The operating system used at each location must be the same or compatible.
The database application (or DBMS) used at each location must be the same or compatible.

In a homogeneous distributed database all sites have identical software and are aware of each other
and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms
of the right to change schema or software. A homogeneous DDBMS appears to the user as a single
system. The homogeneous system is much easier to design and manage. The following conditions
must be satisfied for a homogeneous database:
The operating system used at each location must be the same or compatible.
The data structures used at each location must be the same or compatible.
The database application (or DBMS) used at each location must be the same or compatible.

Heterogeneous DDBMS :-

In a heterogeneous distributed database different sites may use different schema and
software.
In heterogeneous systems, different nodes may have different hardware and software, and
the data structures at various nodes or locations may also be incompatible.
Different computers and operating systems, database applications or data models may
be used at each of the locations.

In a heterogeneous distributed database, different sites may use different schemas and software.
Differences in schemas are a major problem for query processing and transaction processing. Sites may
not be aware of each other and may provide only limited facilities for cooperation in transaction
processing. In heterogeneous systems, different nodes may have different hardware and software, and the
data structures at various nodes or locations may also be incompatible. Different computers, operating
systems, database applications or data models may be used at each of the locations. For example,
one location may have the latest relational database management technology, while another location
may store data using conventional files or an old version of a database management system. Similarly, one
location may run the Windows NT operating system, while another may run UNIX. Heterogeneous
systems are usually used when individual sites use their own hardware and software. In a
heterogeneous system, translations are required to allow communication between the different sites (or
DBMSs). In this system, the users must be able to make requests in a database language at their local
sites; usually the SQL database language is used for this purpose. If only the hardware is different, then
the translation is straightforward: computer codes and word lengths are changed. The
heterogeneous system is often not technically or economically feasible. In this system, a user at one
location may be able to read but not update the data at another location.
Advantages :
Increased reliability and availability
Easier expansion
Reliable transactions, due to replication of the database
Hardware, operating-system, network, fragmentation, DBMS, replication and location
independence
Economics: it may cost less to create a network of smaller computers with the power of a
single large computer
Disadvantages :
Additional software is required
Operating system should support distributed environment

Concurrency control poses a major issue; it can be addressed by locking and timestamping.
Distributed access to data is more complex.
Analysis of distributed data is more difficult.

DISTRIBUTED DATABASE ARCHITECTURE :
A distributed database system allows applications to access data from local and remote databases. In
a homogenous distributed database system, each database is an Oracle Database. In
a heterogeneous distributed database system, at least one of the databases is not an Oracle Database.
Distributed databases use a client/server architecture to process information requests.



Homogenous Distributed Database Systems :-
A homogenous distributed database system is a network of two or more Oracle Databases that reside on
one or more machines. Figure 29-1 illustrates a distributed system that connects three databases: hq, mfg,
and sales. An application can simultaneously access or modify the data in several databases in a single
distributed environment. For example, a single query from a Manufacturing client on local database mfg
can retrieve joined data from the products table on the local database and the dept table on the remote hq
database.
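As an illustration, here is a minimal JDBC sketch of such a cross-database query. It assumes an Oracle database link named hq from the mfg database to the remote hq database; the connection URL, credentials, and column names are placeholders, not taken from the original figure.

    import java.sql.*;

    public class DistributedQueryDemo {
        public static void main(String[] args) throws SQLException {
            // Connect to the local mfg database (URL and credentials are placeholders).
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//mfg-host:1521/mfg", "scott", "tiger");
                 Statement stmt = conn.createStatement();
                 // dept@hq reaches the remote hq database through a database link;
                 // the distribution stays transparent to the application.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT p.prodname, d.dname " +
                     "FROM products p JOIN dept@hq d ON p.deptno = d.deptno")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " | " + rs.getString(2));
                }
            }
        }
    }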



Heterogeneous Distributed Database Systems :-

In a heterogeneous distributed database system, at least one of the databases is a non-Oracle Database
system. To the application, the heterogeneous distributed database system appears as a single, local,
Oracle Database. The local Oracle Database server hides the distribution and heterogeneity of the data.

The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous
Services in conjunction with an agent. If you access the non-Oracle Database data store using an Oracle
Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase
database in an Oracle Database distributed system, then you need to obtain a Sybase-specific transparent
gateway so that the Oracle Database in the system can communicate with it.

Client/Server Database Architecture :-

A database server is the Oracle software managing a database, and a client is an application that requests
information from a server. Each computer in a network is a node that can host one or more databases.
Each node in a distributed database system can act as a client, a server, or both, depending on the situation.

In Figure 29-2, the host for the hq database is acting as a database server when a statement is issued
against its local data (for example, the second statement in each transaction issues a statement against
the local dept table), but is acting as a client when it issues a statement against remote data (for example,
the first statement in each transaction is issued against the remote table emp in the sales database).




DISTRIBUTED DATABASE SYSTEM DESIGN :
In a distributed system, data are physically distributed among several sites but it provides a
view of single logical database to its users. Each node of a distributed database system may
follow the three-tier architecture like the centralized database management system (DBMS).
Thus, the design of a distributed database system involves the design of a global conceptual
schema, in addition to the local schemas, which conform to the three-tier architecture of the
DBMS in each site. The design of computer network across the sites of a distributed system
adds extra complexity to the design issue. The crucial design issue involves the distribution
of data among the sites of the distributed system. Therefore, the design and implementation
of the distributed database system is a very complicated task and it involves three important
factors as listed in the following.
Fragmentation : A global relation may be divided into several non-overlapping
subrelations called fragments, which are then distributed among sites.
Allocation : Allocation involves the issue of allocating fragments among the sites of a
distributed system. Each fragment should be stored at the site that gives the optimal distribution.
Replication : The distributed database system may maintain several copies of a
fragment at different sites.

Design Strategies:-



In the top-down design process, the database design starts from the global schema design and
proceeds by designing the fragmentation of the database, and then by allocating the fragments to
the different sites, creating the physical images. The process is completed by performing the
physical design of the data allocated to each site. The global schema design involves both the
design of the global conceptual schema and of the global external schemas (view design). In the
global conceptual schema design step, the user needs to specify the data entities and to determine
the applications that will run on the database, as well as statistical information about these
applications. At this stage, the design of local conceptual schemas is considered. The objective
of this step is to design local conceptual schemas by distributing the entities over the sites of the
distributed system. Rather than distributing relations, it is quite common to partition relations into
subrelations, which are then distributed to different sites. Thus, in a top-down approach, the
distributed database design involves two phases, namely, fragmentation and allocation.


The fragmentation phase is the process of clustering into fragments the information that is
accessed simultaneously by different applications, whereas the allocation phase is the process
of distributing the generated fragments among the sites of a distributed database system. In the
top-down design process, the last step is the physical database design, which maps the local
conceptual schemas onto the physical storage devices available at the corresponding sites. The
top-down design process is best suited for those distributed systems that are developed from scratch.

In the bottom-up design process, the issue of integration of several existing local schemas into a
global conceptual schema is considered to develop a distributed system. When several existing
databases are aggregated to develop a distributed system, the bottom-up design process is
followed. This process is based on the integration of several existing schemas into a single global
schema. It is also possible to aggregate several existing heterogeneous systems for constructing
a distributed database system using the bottom-up approach. Thus, the bottom-up design process
requires the following steps:

The selection of a common database model for describing the global schema of the
database
The translation of each local schema into the common data model
The integration of the local schemas into a common global schema.
Any one of the above design strategies is followed to develop a distributed database system.





DISTRIBUTED QUERY PROCESSING :
Query processing basics: query processing may be centralized (all data resides at a single
site) or distributed. The retrieval of data from different sites in a network is known as
distributed query processing.


Step 1 Query Decomposition :- the query on global relations is checked and rewritten into a
correct and efficient algebraic query, through
o Normalization
o Analysis
o Simplification
o Restructuring

Step 2 Data Localization :- the algebraic query on global relations is mapped into a query on
the fragments, using the fragmentation (and replication) schemes.

Step 3 Global Query Optimization :- an execution strategy is selected that minimizes a cost
function, which in a distributed system includes the cost of communication between sites.

Step 4 Local Optimization :- each site optimizes the subquery it executes, as in a centralized DBMS.


PRIMARY HORIZONTAL FRAGMENTATION (PHF) :-
Primary horizontal fragmentation partitions a global relation into subsets of its tuples (rows),
where each fragment is defined by a selection predicate on the relation's own attributes.
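As a small illustration of the definition above, the sketch below routes tuples of a hypothetical EMP(empno, ename, deptno) relation to two fragments by a predicate on deptno; the relation, predicate, and site numbers are invented for the example.

    // A minimal sketch of primary horizontal fragmentation. The fragment predicates
    // must be complete and disjoint, so that EMP is reconstructable as the union of
    // its fragments.
    class Emp {
        int empno;
        String ename;
        int deptno;
    }

    class PhfRouter {
        // EMP1 = select(deptno <= 10)(EMP), stored at site 1
        // EMP2 = select(deptno > 10)(EMP), stored at site 2
        static int siteOf(Emp e) {
            return (e.deptno <= 10) ? 1 : 2;
        }
    }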

VERTICAL FRAGMENTATION (VF) :-
Vertical fragmentation partitions a global relation into subsets of its attributes (columns);
each fragment retains the key of the relation so that the original relation can be
reconstructed by joining the fragments.

DERIVED HORIZONTAL FRAGMENTATION (DHF) :-
Derived horizontal fragmentation partitions a relation according to the horizontal
fragmentation of a related (owner) relation, typically defined by a semijoin on the link
between the two relations.

CONCURRENCY CONTROL IN DISTRIBUTED DATABASE :
Concurrency Control: In distributed database systems, database is typically used by many
users. These systems usually allow multiple transactions to run concurrently i.e. at the same time.
Concurrency control is the activity of coordinating concurrent accesses to a database in a
multiuser database management system (DBMS). Concurrency control permits users to access
a database in a multi-programmed fashion while preserving the illusion that each user is executing
alone on a dedicated system. The main technical difficulty in attaining this goal is to prevent
database updates performed by one user from interfering with database retrievals and updates
performed by another. When the transactions are updating data concurrently, it may lead to
several problems with the consistency of the data.

Distributed Concurrency Control Algorithms:
In this section, we consider some of the distributed concurrency control algorithms and summarize
the salient aspects of four of them. Before discussing the algorithms, we first need to get an
idea about distributed transactions. Distributed Transaction: A distributed
transaction is a transaction that runs in multiple processes, usually on several machines. Each
process works for the transaction. Distributed transaction processing systems are designed to
facilitate transactions that span heterogeneous, transaction-aware resource managers in a
distributed environment. The execution of a distributed transaction requires coordination between
a global transaction management system and all the local resource managers of all the involved
systems. The resource manager and transaction processing monitor are the two primary elements
of any distributed transactional system. Distributed transactions, like local transactions, must
observe the ACID properties. However, maintenance of these properties is very complicated for
distributed transactions because a failure can occur in any process. If such a failure occurs, each
process must undo any work that has already been done on behalf of the transaction. A distributed
transaction processing system maintains the ACID properties in distributed transactions by using
two features:
Recoverable processes: such processes log their actions and therefore can restore
earlier states if a failure occurs.
A commit protocol: a commit protocol enables the processes to coordinate the committing
or aborting of a transaction. The most common commit protocol is the two-phase commit protocol.
Distributed Two-Phase Locking (2PL):

In order to ensure serializability of concurrently executed transactions, different methods of
concurrency control have been elaborated. One of these methods is the locking method, which
exists in different forms. The two-phase locking protocol is one of the basic concurrency control
protocols in distributed database systems. The main approach of this protocol is "read any,
write all". Transactions set read locks on items that they read, and they convert their read locks
to write locks on items that need to be updated. To read an item, it suffices to set a read lock on
any copy of the item, so the local copy is locked; to update an item, write locks are required on
all copies. Write locks are obtained as the transaction executes, with the transaction blocking on
a write request until all of the copies of the item to be updated have been successfully locked.
All locks are held until the transaction has successfully committed or aborted [2]. The 2PL
protocol oversees locks by determining when transactions can acquire and release locks. The
2PL protocol forces each transaction to make its lock and unlock requests in two steps:




The transaction first enters the growing phase, in which it makes requests for the required locks,
and then enters the shrinking phase, in which it releases all its locks and cannot make any more
lock requests. Transactions in the 2PL protocol must obtain all needed locks before entering the
unlock phase. While the 2PL protocol guarantees serializability, it does not ensure that deadlocks
do not happen, so deadlock is a possibility in this algorithm. Local deadlocks are checked for any
time a transaction blocks, and are resolved when necessary by restarting the transaction with the
most recent initial startup time among those involved in the deadlock cycle. Global deadlock
detection is handled by a "Snoop" process, which periodically requests waits-for information from
all sites and then checks for and resolves any global deadlocks.
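The sketch below illustrates the growing and shrinking phases on a single site, using a hypothetical in-memory lock table (the names TwoPhaseLocking, lockTable, and runTransaction are invented). In the distributed "read any, write all" variant described above, the write locks would additionally be obtained at every copy of each updated item; deadlock detection is omitted, and the read and write sets are assumed disjoint.

    import java.util.*;
    import java.util.concurrent.locks.*;

    // A minimal single-site sketch of (strict) two-phase locking.
    class TwoPhaseLocking {
        private final Map<String, ReadWriteLock> lockTable = new HashMap<>();

        private synchronized ReadWriteLock lockFor(String item) {
            return lockTable.computeIfAbsent(item, k -> new ReentrantReadWriteLock());
        }

        void runTransaction(List<String> readSet, List<String> writeSet, Runnable body) {
            List<Lock> held = new ArrayList<>();
            try {
                // Growing phase: acquire every needed lock; none is released yet.
                for (String item : readSet) {
                    Lock l = lockFor(item).readLock();
                    l.lock();
                    held.add(l);
                }
                for (String item : writeSet) {
                    Lock l = lockFor(item).writeLock();
                    l.lock();
                    held.add(l);
                }
                body.run(); // all reads and writes happen while the locks are held
            } finally {
                // Shrinking phase: release all locks at commit/abort time; after
                // this point the transaction may not request any further locks.
                for (Lock l : held) {
                    l.unlock();
                }
            }
        }
    }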

Wound-Wait (WW):
The second algorithm is the distributed wound-wait locking algorithm. It follows the same
approach as the 2PL protocol, but differs from 2PL in its handling of the deadlock problem:
rather than maintaining waits-for information and then checking for local and global deadlocks,
deadlocks are prevented via the use of timestamps in this algorithm. Each transaction is
numbered according to its initial startup time, and younger transactions are prevented from
making older ones wait. If an older transaction requests a lock, and if the request would lead to
the older transaction waiting for a younger transaction, the younger transaction is "wounded":
it is restarted unless it is already in the second phase of its commit protocol. Younger
transactions can wait for older transactions, so the possibility of deadlocks is eliminated [2].

t(T1) > t(T2): if the requesting transaction T1 is younger than the transaction T2 that holds the
lock on the requested data item, then T1 has to wait.
t(T1) < t(T2): if the requesting transaction T1 is older than the lock-holding transaction T2, then
the younger transaction T2 is wounded, i.e. T2 is aborted (rolled back) and restarted so that the
older transaction T1 can obtain the lock.
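A minimal sketch of this decision rule, assuming each transaction carries its startup timestamp and that a smaller timestamp means an older transaction:

    // The wound-wait rule on a lock conflict.
    class WoundWait {
        enum Action { WAIT, WOUND_HOLDER }

        static Action onLockConflict(long requesterTs, long holderTs) {
            if (requesterTs > holderTs) {
                // Requester is younger: it simply waits for the older holder.
                return Action.WAIT;
            }
            // Requester is older: the younger holder is wounded (aborted and
            // restarted), unless it is already in the second phase of its commit.
            return Action.WOUND_HOLDER;
        }
    }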

Basic Timestamp Ordering (BTO):
A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically,
timestamp values are assigned in the order in which the transactions are submitted to the system,
so a timestamp can be thought of as the transaction start time. The third algorithm is the basic
timestamp ordering algorithm. The idea of this scheme is to order the transactions based on their
timestamps. A schedule in which the transactions participate is then serializable, and the
equivalent serial schedule has the transactions in order of their timestamp values. This is called
timestamp ordering (TO). Like wound-wait, it employs transaction startup timestamps, but it uses
them differently. BTO associates timestamps with all recently accessed data items and requires
that conflicting data accesses by transactions be performed in timestamp order, instead of using
a locking approach. Transactions that attempt to perform out-of-order accesses are restarted.
When a read request is received for an item, it is permitted if the timestamp of the requester
exceeds the item's write timestamp. When a write request is received, it is permitted if the
requester's timestamp exceeds the read timestamp of the item; in the event that the timestamp
of the requester is less than the write timestamp of the item, the update is simply ignored [2].
For replicated data, the "read any, write all" approach is used, so a read request may be sent to
any copy while a write request must be sent to all copies. Integration of the algorithm with
two-phase commit is accomplished as follows: writers keep their updates in a private workspace
until commit time.
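A minimal single-item sketch of these rules, with hypothetical readTS and writeTS fields holding the largest timestamps of transactions that have read and written the item; the final branch is the "ignore the obsolete update" case described above (often called the Thomas write rule).

    // Basic timestamp ordering for a single data item.
    class Item {
        long readTS, writeTS;
        Object value;

        synchronized Object read(long ts) {
            if (ts < writeTS) throw new RestartTransaction(); // out-of-order read: restart requester
            readTS = Math.max(readTS, ts);
            return value;
        }

        synchronized void write(long ts, Object v) {
            if (ts < readTS) throw new RestartTransaction();  // a younger transaction already read the item
            if (ts < writeTS) return;                         // obsolete write: simply ignored
            writeTS = ts;
            value = v;
        }
    }

    class RestartTransaction extends RuntimeException {}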
Distributed Optimistic (OPT):

The fourth algorithm is the distributed, timestamp-based, optimistic concurrency control algorithm,
which operates by exchanging certification information during the commit protocol. For each data
item, a read timestamp and a write timestamp are maintained. Transactions may read and update
data items freely, storing any updates into a local workspace until commit time. For each read,
the transaction must remember the version identifier (i.e., write timestamp) associated with the
item when it was read. Then, when all of the transaction's cohorts have completed their work and
have reported back to the master, the transaction is assigned a globally unique timestamp. This
timestamp is sent to each cohort in the "prepare to commit" message, and it is used to locally
certify all of its reads and writes as follows [2]:
A read request is certified if:
(i) the version that was read is still the current version of the item, and
(ii) no write with a newer timestamp has already been locally certified.
A write request is certified if:
(i) no later reads have been certified and subsequently committed, and
(ii) no later reads have been locally certified already [2].
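These tests can be transliterated into code as below, assuming hypothetical per-item bookkeeping: versionRead is the write timestamp remembered at read time, currentVersion is the item's present version, and the remaining arguments are the largest timestamps of locally certified writes and of certified reads.

    // A transliteration of the certification tests; ts is the committing
    // transaction's globally unique timestamp.
    class OptimisticCertifier {
        static boolean certifyRead(long versionRead, long currentVersion,
                                   long newestCertifiedWriteTs, long ts) {
            return versionRead == currentVersion    // (i) the version read is still current
                && newestCertifiedWriteTs <= ts;    // (ii) no newer write locally certified
        }

        static boolean certifyWrite(long newestCommittedReadTs,
                                    long newestCertifiedReadTs, long ts) {
            return newestCommittedReadTs <= ts      // (i) no later read certified and committed
                && newestCertifiedReadTs <= ts;     // (ii) no later read locally certified
        }
    }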
Concurrency control is the activity of coordinating concurrent accesses to a database in
a multi-user database management system (DBMS). Without it, several problems can arise:
1. The lost update problem.
2. The temporary update problem.
3. The incorrect summary problem.

As an example, consider an on-line airline reservation system. Suppose two customers, Customer
A and Customer B, simultaneously try to reserve a seat on the same flight. In the absence of
concurrency control, these two activities could interfere, as illustrated in Figure 1. Let seat No. 18
be the first available seat. Both transactions could read the reservation information at
approximately the same time, each reserve seat No. 18 (one for Customer A and one for
Customer B), and store the result back into the database. The net effect is incorrect: although
two customers reserved a seat, the database reflects only one activity; the other reservation is
lost by the system.
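The race can be reproduced with a sketch like the following, in which the Flight class and seat bookkeeping are invented: if two interleaved transactions both execute the read before either write, both customers are given seat 18 and one update is lost.

    // The lost update problem: two unsynchronized reservations interfere.
    class Flight {
        int nextFreeSeat = 18;

        int reserveUnsafe() {
            int seat = nextFreeSeat;  // both transactions read 18 here
            // ... the other transaction runs in this gap and also reads 18 ...
            nextFreeSeat = seat + 1;  // both write 19; the first update is lost
            return seat;              // both customers receive seat No. 18
        }
    }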



RECOVERY CONTROL IN DISTRIBUTED DATABASES :
As with local recovery, distributed database recovery aims to maintain the atomicity and durability
of distributed transactions. A database must guarantee that all statements in a transaction,
distributed or non-distributed, either commit or roll back as a unit. The effects of an ongoing
transaction should be invisible to all other transactions at all sites. This transparency should be
true for transactions that include any type of operations, including queries, updates or remote
procedure calls. In a distributed database environment also the database management system
must coordinate transaction control with these characteristics over a communication network and
maintain data consistency, even if network or system failure occurs.
In a DDBMS, a given transaction is submitted at some site, but it can access data at other sites
as well. When a transaction is submitted at some site, the transaction manager at that site
breaks it up into a collection of one or more sub-transactions that execute at different sites. The
transaction manager then submits these sub-transactions to the transaction managers at the
other sites and coordinates their activities. To ensure the atomicity of the global transaction, the
DDBMS must ensure that sub-transactions of the global transaction either all commit or all abort.
Recovery control in distributed databases is based on the two-phase commit protocol. The
two-phase commit protocol is the transaction protocol by which all nodes and databases agree
with each other to commit a transaction. This protocol is required in an environment where a
single transaction can interact with multiple independent resource managers, as in the case of
distributed databases. It also supports data integrity by ensuring that the modifications made by
a transaction are either committed by all the databases involved in the distributed system or
rolled back by all the databases.

The two-phase commit protocol works in two phases. The first phase is called the prepare phase,
during which the updates are recorded in a transaction log file, and each resource, through its
resource manager, indicates whether it is ready to make the changes. Resources can vote either
to commit or to roll back. The second phase is called the commit phase, and its actions depend
on the votes of the resources. If all resources vote to commit, then all the resources participating
in the transaction are updated, whereas if one or more of the resources vote to roll back, then all
the resources are rolled back to their previous state.
Consider an example in which an interaction between a coordinator at a local site and a
participant at a remote site takes place and a transaction has requested the commit operation. In
the first phase, the coordinator instructs the participants to get ready and sends them the "get
ready" message. Each participant makes an entry in its log and sends the "ok" message as an
acknowledgement to the coordinator. The coordinator then writes an entry in its log, takes a final
decision, and sends it to the participants.

Prepare Phase
Coordinator receives a commit request
Coordinator instructs all resource managers to get ready to go either way on the
transaction. Each resource manager writes all updates from that transaction to its
own physical log
Coordinator receives replies from all resource managers. If all are ok, it writes
commit to its own log; if not then it writes rollback to its log
Commit Phase
Coordinator then informs each resource manager of its decision and broadcasts a
message to either commit or rollback (abort). If the message is commit, then each
resource manager transfers the update from its log to its database
A failure during the commit phase puts a transaction in limbo. This has to be
tested for and handled with timeouts or polling.
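A minimal coordinator sketch of these two phases, assuming a hypothetical Participant interface; a real implementation would also force-write log records at each step and handle the in-limbo case with timeouts or polling, as noted above.

    import java.util.*;

    // The vote interface each resource manager is assumed to expose.
    interface Participant {
        boolean prepare();   // vote: true = ready to commit ("ok")
        void commit();
        void rollback();
    }

    class Coordinator {
        boolean run(List<Participant> participants) {
            // Phase 1 (prepare): collect a vote from every resource manager.
            boolean allOk = true;
            for (Participant p : participants) {
                if (!p.prepare()) {
                    allOk = false;
                    break;
                }
            }
            // The coordinator would write its commit/rollback decision to its log here.
            // Phase 2 (commit): broadcast the decision to all participants.
            for (Participant p : participants) {
                if (allOk) p.commit(); else p.rollback();
            }
            return allOk;
        }
    }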

WEB DATABASES :
The World Wide Web (WWW), popularly known as "the Web", was originally developed in
Switzerland at CERN (Note 1) in early 1990 as a large-scale hypermedia information service
system for biological scientists to share information (Note 2). Today this technology allows
universal access to this shared information to anyone having access to the Internet, and the Web
contains hundreds of millions of Web pages within the reach of millions of users.
In Web technology, a basic client-server architecture underlies all activities. Information is stored
on computers designated as Web servers in publicly accessible shared files encoded using
HyperText Markup Language (HTML). A number of tools enable users to create Web pages
formatted with HTML tags, freely mixed with multimedia content, from graphics to audio and
even to video. A page has many interspersed hyperlinks, literally links that enable a user to
"browse" or move from one page to another across the Internet. This ability has given a
tremendous power to end users in searching and navigating related information, often across
different continents.
Information on the Web is organized according to a Uniform Resource Locator (URL), something
similar to an address that provides the complete pathname of a file. The pathname consists of a
string of machine and directory names separated by slashes and ends in a filename. For example,
the table of contents of this book is currently at the following URL:
http://cseng.aw.com/book/0,,0805317554,00.html
A URL typically begins with the name of the HyperText Transfer Protocol (HTTP), the protocol
used for communication between a Web browser, a program that presents documents to the
user, and the Web server. Web browsers interpret and present HTML documents to users.
Popular Web browsers include the Internet Explorer of Microsoft and the Netscape Navigator. A
collection of HTML documents and other files accessible via a URL on a Web server is called a
Web site. In the above URL, "cseng.aw.com" may be called the Web site of Addison Wesley
Publishing.
Providing Access to Databases on the World Wide Web
Today's technology has been moving rapidly from static to dynamic Web pages, where content
may be in a constant state of flux. The Web server uses a standard interface called the Common
Gateway Interface (CGI) to act as the middleware, the additional software layer between the
user interface front-end and the DBMS back-end that facilitates access to heterogeneous
databases. The CGI middleware executes external programs or scripts to obtain the dynamic
information, and it returns the information to the server in HTML, which is given back to the
browser.
As the Web undergoes its latest transformations, it has become necessary to allow users access
not only to file systems but to databases and DBMSs to support query processing, report
generation, and so forth. The existing approaches may be divided into two categories:
1. Access using CGI scripts: The database server can be made to interact with the Web server
via CGI. Figure 27.01 shows a schematic for the database access architecture on the Web
using CGI scripts, which are written in languages like PERL, Tcl, or C. The main
disadvantage of this approach is that for each user request, the Web server must start a
new CGI process: each process makes a new connection with the DBMS and the Web
server must wait until the results are delivered to it. No efficiency is achieved by any
grouping of multiple users' requests; moreover, the developer must keep the scripts in the
CGI-bin subdirectories only, which opens them to a possible breach of security. The fact that
CGI has no language associated with it but requires database developers to learn PERL
or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are
scattered everywhere.



2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to
allow access to relational databases through the execution of SQL statements. It is a way
of connecting with databases, without any additional processes for each client request.
Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base
Connectivity, as many believe. JDBC has the capabilities to connect to a database, send
SQL statements to a database, and retrieve the results of a query, using the Java classes
Connection, Statement, and ResultSet respectively. With Java's claimed platform
independence, an application may run on any Java-capable browser, which loads the Java
code from the server and runs it on the client's browser. The Java code is DBMS
transparent; the JDBC drivers for individual DBMSs on the server end carry out the task of
interacting with that DBMS. If the JDBC driver is on the client, the application runs on the
client and its requests are communicated to the DBMS directly by the driver. For standard
SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC
is the prospect of executing Java through virtual machines, with their inherent inefficiency.
The JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting
to the RDBMSs. A minimal usage sketch is shown below.
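A minimal usage sketch of the three classes named above; the driver URL, credentials, and table are placeholders.

    import java.sql.*;

    public class JdbcDemo {
        public static void main(String[] args) throws SQLException {
            // Connection: establishes the session with the database.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password");
                 // Statement: sends SQL statements to the database.
                 Statement st = con.createStatement();
                 // ResultSet: retrieves the results of the query.
                 ResultSet rs = st.executeQuery("SELECT id, name FROM customers")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }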
Besides CGI, other Web server vendors are launching their own middleware products for
providing multiple database connectivity. These include Internet Server API (ISAPI) from
Microsoft and Netscape API (NSAPI) from Netscape. In the next section we describe the Web
access option provided by Informix. Other DBMS vendors already have, or will have similar
provisions to support database access on the Web.

THE WEB INTEGRATION OPTION OF INFORMIX :
Informix has addressed the limitations of CGI and the incompatibilities of CGI, NSAPI, and ISAPI
by creating the Web Integration Option (WIO). WIO eliminates the need for scripts. Developers
use tools to create intelligent HTML pages called Application Pages (or App Pages) directly within
the database. They execute SQL statements dynamically, format the results inside HTML, and
return the resulting Web page to the end users. The schematic architecture is shown in Figure
27.02. WIO uses the Web Driver, a lightweight CGI process that is invoked when a URL request
is received by the Web server. A unique session identifier is generated for each request but the
WIO application is persistent and does not terminate after each request.

When the WIO application receives a request from the Web driver, it connects to the database
and executes Web Explode, a function that executes queries within Web pages and formats
results as a Web page that goes back to the browser via the Web driver.
Informix HTML tag extensions allow Web authors to create applications that can dynamically
construct Web page templates from the Informix Dynamic Server and present them to the end
users. WIO also lets users create their own customized tags to perform specialized tasks. Thus,
without resorting to any programming or script development, powerful applications can be
designed. Another feature of WIO helps transaction-oriented applications by providing an
application programming interface (API) that offers a collection of basic services, such as
connection and session management, that can be incorporated into Web applications.

WIO supports applications developed in C, C++, and Java. This flexibility lets developers port
existing applications to the Web or develop new applications in these languages. The WIO is
integrated with Web server software and utilizes the native security mechanism of the Informix
Dynamic Server. The open architecture of WIO allows the use of various Web browsers and
servers.

THE ORACLE WEBSERVER :
ORACLE supports Web access to databases using the components shown in Figure 27.03. The
client requests files that are called "static" or "dynamic" files from the Web server. Static files have
a fixed content, whereas dynamic files may have content that includes results of queries to the
database. There is an HTTP demon (a process that runs continuously) called Web Listener
running on the server that listens for the requests originating in the clients. A static file (document)
is retrieved from the file system of the server and displayed on the Web browser at the client. A
request for a dynamic page is passed by the listener to a Web Request Broker (WRB), which is a
multi-threaded dispatcher that works with cartridges. Cartridges are software modules
(mentioned earlier in Section 13.2.6) that perform specific functions on specific types of data; they
can communicate among themselves. Currently cartridges are provided for PL/SQL, Java, and
Live HTML; customized cartridges may be provided as well.




OPEN PROBLEMS WITH WEB DATABASES :
The Web is an important factor in planning for enterprise-wide computing environments, both for
providing external access to the enterprise's systems and information for customers and suppliers
and for marketing and advertising purposes. At the same time, due to security requirements,
employees of some organizations are restricted to operate within intranets, subnetworks that
cannot be accessed freely from the outside world. Among the prominent applications of the
intranet and the WWW are databases to support electronic storefronts, parts and product
catalogs, directories and schedules, newsstands, and bookstores. Electronic commerce, the
purchasing of products and services electronically on the Internet, is likely to become a major
application supported by such databases.
The future challenges of managing databases on the Web will be many, among them the
following:
Web technology needs to be integrated with object technology. Currently, the Web can
be viewed as a distributed object system, with HTML pages functioning as objects
identified by the URL.
HTML functionality is too simple to support complex application requirements. As we saw,
the Web Integration Option of Informix adds further tags to HTML. In general, additional
facilities will be needed to (1) make Web clients function as application front ends,
integrating data from multiple heterogeneous databases; (2) make Web clients present
different views of the same data to different users; and (3) make Web clients "intelligent"
by providing additional data mining functionality (see Section 26.2).
Web page content can be made more dynamic by adding more "behavior" to it as an object
(see Chapter 11 for a discussion of object modeling). In this respect (1) client and server
objects (HTML pages) can be made to interact; (2) Web pages can be treated as
collections of programmable objects; and (3) client-side code can access these objects
and manipulate them dynamically.
The support for a large number of clients coupled with reasonable response times for queries
against very large (several tens of gigabytes in size) databases will be major challenges
for Web databases. They will have to be addressed both by Web servers and by the
underlying DBMSs.

Efforts are underway to address the limitations of the current data structuring technology,
particularly by the World Wide Web Consortium (W3C). The W3C is designing a Web Object
Model. W3C is also proposing an Extensible Markup Language (XML) for structured document
interchange on the Web. XML defines a subset of SGML (the Standard Generalized Markup
Language), allowing customization of markup languages with application-specific tags. XML is
rapidly gaining ground due to its extensibility in defining new tags. W3C's Document Object
Model (DOM) defines an object-oriented API for HTML or XML documents presented by a Web
client. W3C is also defining metadata modeling standards for describing Internet resources.

MULTIMEDIA DATABASES :
A multimedia system is a computer-controlled integration of media information objects
of different types (text, images, audio, video, ...). The integration refers to:
Data modeling
Storage
Presentation
Time synchronization
A premise is that the media must be digitally represented, or at least digitally controllable.

In the years ahead multimedia information systems are expected to dominate our daily lives. Our
houses will be wired for bandwidth to handle interactive multimedia applications. Our high-
definition TV/computer workstations will have access to a large number of databases, including
digital libraries that will distribute vast amounts of multisource multimedia content.
The Nature of Multimedia Data and Applications
Nature of Multimedia Data
In Section 23.3 we discussed the advanced modeling issues related to multimedia data. We also
examined the processing of multiple types of data in Chapter 13 in the context of object relational
DBMSs (ORDBMSs). DBMSs have been constantly adding to the types of data they support.
Today the following types of multimedia data are available in current systems:

Text: May be formatted or unformatted. For ease of parsing structured documents,
standards like SGML and variations such as HTML are being used.
Graphics: Examples include drawings and illustrations that are encoded using some
descriptive standards (e.g., CGM, PICT, PostScript).
Images: Includes drawings, photographs, and so forth, encoded in standard formats such
as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG. These
images are not subdivided into components. Hence querying them by content (e.g., find
all images containing circles) is nontrivial.
Animations: Temporal sequences of image or graphic data.
Video: A set of temporally sequenced photographic data for presentation at specified
rates, for example, 30 frames per second.
Structured audio: A sequence of audio components comprising note, tone, duration, and
so forth.
Audio: Sample data generated from aural recordings in a string of bits in digitized form.
Analog recordings are typically converted into digital form before storage.
Composite or mixed multimedia data: A combination of multimedia data types such as
audio and video which may be physically mixed to yield a new storage format or logically
mixed while retaining original types and formats. Composite data also contains
additional control information describing how the information should be rendered.
Nature of Multimedia Applications
Multimedia data may be stored, delivered, and utilized in many different ways. Applications may
be categorized based on their data management characteristics as follows:
Repository applications: A large amount of multimedia data as well as metadata is stored
for retrieval purposes. A central repository containing multimedia data may be
maintained by a DBMS and may be organized into a hierarchy of storage levels: local
disks, tertiary disks and tapes, optical disks, and so on. Examples include repositories of
satellite images, engineering drawings and designs, space photographs, and radiology
scanned pictures.

Presentation applications: A large number of applications involve delivery of multimedia
data subject to temporal constraints. Audio and video data are delivered this way; in
these applications optimal viewing or listening conditions require the DBMS to deliver
data at certain rates offering "quality of service" above a certain threshold. Data is
consumed as it is delivered, unlike in repository applications, where it may be processed
later (e.g., multimedia electronic mail). Simple multimedia viewing of video data, for
example, requires a system to simulate VCR-like functionality. Complex and interactive
multimedia presentations involve orchestration directions to control the retrieval order of
components in a series or in parallel. Interactive environments must support capabilities
such as real-time editing analysis or annotating of video and audio data.
Collaborative work using multimedia information: This is a new category of applications in
which engineers may execute a complex design task by merging drawings, fitting
subjects to design constraints, and generating new documentation, change notifications,
and so forth. Intelligent healthcare networks as well as telemedicine will involve doctors
collaborating among themselves, analyzing multimedia patient data and information in
real time as it is generated.

All of these application areas present major challenges for the design of multimedia database
systems.
DATA MANAGEMENT ISSUES :
Multimedia applications dealing with thousands of images, documents, audio and video
segments, and free text data depend critically on appropriate modeling of the structure and
content of data and then designing appropriate database schemas for storing and retrieving
multimedia information. Multimedia information systems are very complex and embrace a large
set of issues, including the following:
Modeling: This area has the potential for applying database versus information retrieval
techniques to the problem. There are problems of dealing with complex objects (see
Chapter 11) made up of a wide range of types of data: numeric, text, graphic (computer-
generated image), animated graphic image, audio stream, and video sequence.
Documents constitute a specialized area and deserve special consideration.
Design: The conceptual, logical, and physical design of multimedia databases has not been
addressed fully, and it remains an area of active research. The design process can be
based on the general methodology described in Chapter 16, but the performance and
tuning issues at each level are far more complex.
Storage: Storage of multimedia data on standard disklike devices presents problems of
representation, compression, mapping to device hierarchies, archiving, and buffering
during the input/output operation. Adhering to standards such as JPEG or MPEG is one
way most vendors of multimedia products are likely to deal with this issue. In DBMSs, a
"BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved.
Standardized software will be required to deal with synchronization and
compression/decompression, and will be coupled with indexing problems, which are still
in the research domain.
Queries and retrieval: The "database" way of retrieving information is based on query
languages and internal index structures. The "information retrieval" way relies strictly on
keywords or predefined index terms. For images, video data, and audio data, this opens
up many issues, among them efficient query formulation, query execution, and

optimization. The standard optimization techniques we discussed in Chapter 18 need to
be modified to work with multimedia data types.
Performance: For multimedia applications involving only documents and text, performance
constraints are subjectively determined by the user. For applications involving video
playback or audio-video synchronization, physical limitations dominate. For instance,
video must be delivered at a steady rate of 60 frames per second. Techniques for query
optimization may compute expected response time before evaluating the query. The use
of parallel processing of data may alleviate some problems, but such efforts are currently
subject to further experimentation.

Such issues have given rise to a variety of open research problems. We look at a few
representative problems now.
MULTIMEDIA DATABASE APPLICATIONS :
Large-scale applications of multimedia databases can be expected to encompass a large number
of disciplines and enhance existing capabilities. Some important applications are the following:
Documents and records management: A large number of industries and businesses keep
very detailed records and a variety of documents. The data may include engineering
design and manufacturing data, medical records of patients, publishing material, and
insurance claim records.
Knowledge dissemination: The multimedia mode, a very effective means of knowledge
dissemination, will encompass a phenomenal growth in electronic books, catalogs,
manuals, encyclopedias and repositories of information on many topics.
Education and training: Teaching materials for different audiences, from kindergarten
students to equipment operators to professionals, can be designed from multimedia
sources. Digital libraries are expected to have a major influence on the way future students
and researchers as well as other users will access vast repositories of educational
material. (See Section 27.6 on digital libraries.)
Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to
using multimedia information in these applications, from effective sales presentations to
virtual tours of cities and art galleries. The film industry has already shown the power of
special effects in creating animations and synthetically designed animals and aliens.
The use of predesigned stored objects in multimedia databases will expand the range of
these applications.
Real-time control and monitoring: Coupled with active database technology, multimedia
presentation of information can be a very effective means for monitoring and controlling
complex tasks such as manufacturing operations, nuclear power plants, patients in
intensive care units, and transportation systems.

MOBILE DATABASES :
Recent advances in wireless technology have led to mobile computing, a new dimension in data
communication and processing. The mobile computing environment will provide database
applications with useful aspects of wireless technology. The mobile computing platform allows
users to establish communication with other users and to manage their work while they are
mobile. This feature is especially useful to geographically dispersed organizations. Typical
examples might include traffic police, taxi dispatchers, and weather reporting services, as well as
financial market reporting and information brokering applications. However, there are a number
of hardware as well as software problems that must be resolved before the capabilities of mobile
computing can be fully utilized. Some of the software problems, which may involve data
management, transaction management, and database recovery, have their origin in distributed
database systems. In mobile computing, however, these problems become more difficult to solve,
mainly because of the narrow bandwidth of the wireless communication channels, the relatively
short active life of the power supply (battery) of mobile units, and the changing locations of
required information (sometimes in cache, sometimes in the air, sometimes at the server). In
addition, mobile computing has its own unique architectural challenges.

The general architecture of a mobile platform is illustrated in Figure 27.04. It is a distributed
architecture where a number of computers, generally referred to as Fixed Hosts (FH) and Base
Stations (BS), are interconnected through a high-speed wired network. Fixed hosts are general
purpose computers that are not equipped to manage mobile units but can be configured to do so.
Base stations are equipped with wireless interfaces and can communicate with mobile units to
support data access.
Mobile Units (MU) (or hosts) and base stations communicate through wireless channels having
bandwidths significantly lower than those of a wired network. A downlink channel is used for
sending data from a BS to an MU and an uplink channel is used for sending data from an MU to
its BS. Recent products for portable wireless have an upper limit of 1 Mbps (megabits per second)
for infrared communication, 2 Mbps for radio communication, and 9.14 Kbps (kilobits per second)
for cellular telephony. By comparison, Ethernet provides 10 Mbps; fast Ethernet and FDDI provide
100 Mbps; and ATM (asynchronous transfer mode) provides 155 Mbps.
Mobile units are battery-powered portable computers that move freely in a geographic mobility
domain, an area that is restricted by the limited bandwidth of wireless communication channels.
To manage the mobility of units, the entire geographic mobility domain is divided into smaller
domains called cells. The mobile discipline requires that the movement of mobile units be
unrestricted within the geographic mobility domain (intercell movement), while information access
contiguity during movement guarantees that the movement of a mobile unit across cell
boundaries will have no effect on the data retrieval process.




Types of Data in Mobile Applications
Applications that run on mobile hosts have different data requirements. Users either engage in
personal communications or office activities, or they simply receive updates on frequently
changing information. Mobile applications can be categorized in two ways: (1) vertical applications
and (2) horizontal applications (Note 3). In vertical applications users access data within a
specific cell, and access is denied to users outside of that cell. For example, users can obtain
information on the location of doctors or emergency centers within a cell or parking availability
data at an airport cell. In horizontal applications, users cooperate on accomplishing a task, and
they can handle data distributed throughout the system. The horizontal application market is
massive; two types of applications most mentioned are mail-enabled applications and information
services to mobile users.
Data may be classified into three categories:
1. Private data: A single user owns this data and manages it. No other user may access it.
2. Public data: This data can be used by anyone who can read it. Only one source updates it.
Examples include weather bulletins or stock prices.
3. Shared data: This data is accessed both in read and write modes by groups of users.
Examples include inventory data for products in a company.
Public data is primarily managed by vertical applications, while shared data is used by horizontal
applications, possibly with some replication. Copies of shared data may be stored both in base
and mobile stations. This presents a variety of difficult problems in transaction management
consistency as well as integrity and scalability of the architecture.



SPATIAL DATABASE :
Spatial databases provide concepts for databases that keep track of objects in a
multi-dimensional space. For example, cartographic databases that store maps include
two-dimensional spatial descriptions of their objects, from countries and states to rivers, cities,
roads, seas, and so on. These databases are used in many applications, such as environmental,
emergency, and battle management. Other databases, such as meteorological databases for
weather information, are three-dimensional, since temperatures and other meteorological
information are related to three-dimensional spatial points. In general, a spatial database stores
objects that have spatial characteristics that describe them. The spatial relationships among the
objects are important, and they are often needed when querying the database. Although a spatial
database can in general refer to an n-dimensional space for any n, we will limit our discussion to
two dimensions as an illustration.
The main extensions that are needed for spatial databases are models that can interpret spatial
characteristics. In addition, special indexing and storage structures are often needed to improve
performance. Let us first discuss some of the model extensions for two-dimensional spatial
databases. The basic extensions needed are to include two-dimensional geometric concepts,
such as points, lines and line segments, circles, polygons, and arcs, in order to specify the spatial
characteristics of objects. In addition, spatial operations are needed to operate on the objects'
spatial characteristics, for example, to compute the distance between two objects, as well as
spatial Boolean conditions, for example, to check whether two objects spatially overlap. To
illustrate, consider a database that is used for emergency management applications. A description
of the spatial positions of many types of objects would be needed. Some of these objects generally
have static spatial characteristics, such as streets and highways, water pumps (for fire control),
police stations, fire stations, and hospitals. Other objects have dynamic spatial characteristics that
change over time, such as police vehicles, ambulances, or fire trucks.
The following categories illustrate three typical types of spatial queries:
Range query: Finds the objects of a particular type that are within a given spatial area or
within a particular distance from a given location. (For example, find all hospitals within
the Dallas city area, or find all ambulances within five miles of an accident location.)
Nearest neighbor query: Finds an object of a particular type that is closest to a given
location. (For example, find the police car that is closest to a particular location.)
Spatial joins or overlays: Typically joins the objects of two types based on some spatial
condition, such as the objects intersecting or overlapping spatially or being within a
certain distance of one another. (For example, find all cities that fall on a major highway
or find all homes that are within two miles of a lake.)
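To make the semantics of the first two query types concrete, here is a minimal, self-contained
Java sketch that evaluates a range query and a nearest neighbor query by linear scan over an
in-memory list of points. The Site record and the sample data are hypothetical illustrations;
a real spatial DBMS would answer these queries through a spatial index rather than a scan.

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    public class SpatialQueries {
        // Hypothetical spatial object: a named point in two-dimensional space.
        record Site(String name, double x, double y) {}

        static double distance(Site s, double x, double y) {
            return Math.hypot(s.x() - x, s.y() - y);
        }

        // Range query: all sites within 'radius' of the location (x, y).
        static List<Site> rangeQuery(List<Site> sites, double x, double y, double radius) {
            return sites.stream()
                        .filter(s -> distance(s, x, y) <= radius)
                        .collect(Collectors.toList());
        }

        // Nearest neighbor query: the single site closest to (x, y).
        static Site nearestNeighbor(List<Site> sites, double x, double y) {
            return sites.stream()
                        .min(Comparator.comparingDouble(s -> distance(s, x, y)))
                        .orElseThrow();
        }

        public static void main(String[] args) {
            List<Site> hospitals = List.of(
                    new Site("General", 2.0, 3.0),
                    new Site("Mercy", 8.0, 1.0),
                    new Site("St. Luke", 4.0, 7.0));
            System.out.println(rangeQuery(hospitals, 3.0, 4.0, 5.0));  // hospitals within 5 units
            System.out.println(nearestNeighbor(hospitals, 8.0, 2.0));  // closest hospital
        }
    }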
For these and other types of spatial queries to be answered efficiently, special techniques for
spatial indexing are needed. One of the best known techniques is the use of R-trees and their
variations. R-trees group together objects that are in close spatial proximity on the same
leaf nodes of a tree-structured index. Since a leaf node can point to only a certain number of
objects, algorithms for dividing the space into rectangular subspaces that include the objects are
needed. Typical criteria for dividing the space include minimizing the rectangle areas, since this
would lead to a quicker narrowing of the search space. Problems such as having objects with
overlapping spatial areas are handled in different ways by the many different variations of R-trees.
The internal nodes of R-trees are associated with rectangles whose area covers all the rectangles
in their subtrees. Hence, R-trees can easily answer queries such as finding all objects in a given
area, by limiting the tree search to those subtrees whose rectangles intersect with the area given
in the query.
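The search side of this idea is straightforward to sketch. The following hypothetical Java
fragment (not the API of any particular R-tree library) shows how a query rectangle is pushed
down the tree: a subtree is visited only if its bounding rectangle intersects the query rectangle,
which is exactly how the search space narrows.

    import java.util.ArrayList;
    import java.util.List;

    public class RTreeSearch {
        // Axis-aligned minimum bounding rectangle (MBR).
        record Rect(double xMin, double yMin, double xMax, double yMax) {
            boolean intersects(Rect o) {
                return xMin <= o.xMax && o.xMin <= xMax
                    && yMin <= o.yMax && o.yMin <= yMax;
            }
        }

        // Internal nodes carry the MBR covering everything beneath them;
        // leaf entries hold the identifier of a single object.
        static class Node {
            final Rect mbr;
            final List<Node> children = new ArrayList<>(); // empty for leaf entries
            final String objectId;                         // non-null only for leaf entries
            Node(Rect mbr, String objectId) { this.mbr = mbr; this.objectId = objectId; }
        }

        // Collect all objects whose rectangle intersects the query rectangle.
        static void search(Node node, Rect query, List<String> result) {
            if (!node.mbr.intersects(query)) return;       // prune the whole subtree
            if (node.objectId != null) { result.add(node.objectId); return; }
            for (Node child : node.children) search(child, query, result);
        }

        public static void main(String[] args) {
            Node leafA = new Node(new Rect(0, 0, 2, 2), "hospital-A");
            Node leafB = new Node(new Rect(5, 5, 7, 7), "hospital-B");
            Node root = new Node(new Rect(0, 0, 7, 7), null);
            root.children.add(leafA);
            root.children.add(leafB);
            List<String> hits = new ArrayList<>();
            search(root, new Rect(1, 1, 3, 3), hits);
            System.out.println(hits); // prints [hospital-A]
        }
    }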
Other spatial storage structures include quadtrees and their variations. Quadtrees generally
divide each space or subspace into equally sized areas, and proceed with the sub-divisions of
each subspace to identify the positions of various objects. Many newer spatial access structures
have been proposed, and this remains an active research area.
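For contrast with R-trees, here is a minimal hypothetical Java sketch of the quadtree idea: each
square region is split into four equal quadrants once it holds more than a fixed number of points,
and points are then routed to the quadrant that contains them.

    import java.util.ArrayList;
    import java.util.List;

    public class QuadTree {
        static final int CAPACITY = 4;               // max points before a node splits
        final double x, y, size;                     // square region [x, x+size] x [y, y+size]
        final List<double[]> points = new ArrayList<>();
        QuadTree[] quadrants;                        // null until the node splits

        QuadTree(double x, double y, double size) { this.x = x; this.y = y; this.size = size; }

        void insert(double px, double py) {
            if (quadrants != null) { child(px, py).insert(px, py); return; }
            points.add(new double[] {px, py});
            if (points.size() > CAPACITY) split();
        }

        // Divide this square into four equally sized quadrants and redistribute.
        private void split() {
            double h = size / 2;
            quadrants = new QuadTree[] {
                new QuadTree(x, y, h),     new QuadTree(x + h, y, h),
                new QuadTree(x, y + h, h), new QuadTree(x + h, y + h, h) };
            for (double[] p : points) child(p[0], p[1]).insert(p[0], p[1]);
            points.clear();
        }

        // Select the quadrant that contains the point (px, py).
        private QuadTree child(double px, double py) {
            double h = size / 2;
            int i = (px < x + h ? 0 : 1) + (py < y + h ? 0 : 2);
            return quadrants[i];
        }

        public static void main(String[] args) {
            QuadTree root = new QuadTree(0, 0, 100);
            for (int i = 0; i < 10; i++) root.insert(Math.random() * 100, Math.random() * 100);
            System.out.println("root has split: " + (root.quadrants != null));
        }
    }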

CLUSTERING BASED DISASTER PROOF DATABASES :
If downtime is not an option, and the Web never closes for business, how do you keep
your company's doors open 24/7? The answer lies in high-availability (HA) systems that
approach 100 percent uptime.
The principles of high availability define a level of backup and recovery. Until recently, high
availability simply meant hardware or software recovery via RAID (Redundant Array of
Independent Disks). RAID addressed the need for fault tolerance in data but didn't solve the
problem of a complete DBMS failure.


For even more uptime, database administrators are turning to clustering as the best way to
achieve high availability. Recent moves by Oracle, with its Real Application Clusters, and
Microsoft, with MSCS (Microsoft Cluster Service), have made multinode clusters for HA in
production environments mainstream.
In a high-availability setup, a cluster functions by associating servers that have the ability to share
a disk group. Each node has a fail-over node within its cluster: if a failure occurs in Node 1,
Node 2 picks up the slack by assuming the resources and the unique logic and transaction
functions of the failed DBMS.
Clustering can have the added benefit of not being bound by node colocation. Fiber-optic
connections, which can be cabled for miles between the nodes in a cluster, ensure continued
operation even in the face of a complete meltdown of your primary system.
When a hot-standby model is in place, downtime may be less than a minute. This is especially
important if your service-level agreement requires 99.9 percent uptime or better: 99.9 percent
availability translates to at most about 8.76 hours of downtime per year (0.001 × 8,760 hours).
Clustering technologies are pricey, however. The enterprise software and hardware must be
uniform and compatible with the clustering technology to work properly. There's also the
associated overhead in the design and maintenance of redundant systems.
One cost-effective solution is log shipping, in which a database can synchronize physically distinct
databases by sending transaction logs from one server to another. In the event of a failure, the
logs can be used to restore the database state up to the point of failure. Other methods include
snapshot databases and replication technologies such as Sybase's Replication Server, which has
been around for decades.
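A minimal sketch of the log shipping idea, with hypothetical Primary and Standby classes rather
than any vendor's actual feature: the primary appends committed transactions to its log and
periodically ships the new records to the standby, which replays them so that, after a failure,
it is current up to the last shipped record.

    import java.util.ArrayList;
    import java.util.List;

    public class LogShippingSketch {
        // Hypothetical standby: replays shipped log records to stay nearly current.
        static class Standby {
            final List<String> appliedLog = new ArrayList<>();
            void apply(List<String> records) { appliedLog.addAll(records); }
        }

        // Hypothetical primary: appends to a local log and ships new records in batches.
        static class Primary {
            final List<String> log = new ArrayList<>();
            int shippedUpTo = 0; // index of the first record not yet shipped

            void commit(String txnRecord) { log.add(txnRecord); }

            void shipTo(Standby standby) {          // would run on a schedule in practice
                standby.apply(log.subList(shippedUpTo, log.size()));
                shippedUpTo = log.size();
            }
        }

        public static void main(String[] args) {
            Primary primary = new Primary();
            Standby standby = new Standby();
            primary.commit("T1: debit account 42");
            primary.commit("T2: credit account 7");
            primary.shipTo(standby);                // standby is now current through T2
            primary.commit("T3: insert order 99");  // lost if the primary fails before the next ship
            System.out.println(standby.appliedLog);
        }
    }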

High-availability add-ons to databases are useful but should be understood in the context of a
complete HA methodology. This requires a concerted effort toward standardization on each of
your mission-critical infrastructures. Fault-tolerant application design with hands-off exception
handling, self-healing and redundant networks, and a stable operating system are all prerequisites
for high availability.
When you adhere to these standards, database-specific HA technologies can lead your enterprise
down the path to minimal downtime.

SOME QUESTIONS
Q. 1 How can a distributed database be recovered in case of failure?
Ans :


In a distributed setting, the server must log a write operation not only to the local log file but
also to one or more remote logs. The issue is closely related to replication, the main design
choice being to adopt either a synchronous or an asynchronous protocol.
Synchronous protocol.
The server acknowledges the Client only when all the remote nodes have sent a
confirmation of the successful completion of their write() operation. In practice, the Client
waits until the slowest of the writers sends its acknowledgment. This may severely hinder
the efficiency of updates, but the obvious advantage is that all the replicas are consistent.

Asynchronous protocol.
The Client application waits only until one of the copies (the fastest) has been effectively
written. Clearly, this puts data consistency at risk, as a subsequent read operation may
access an older version that does not yet reflect the update.
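The difference between the two protocols comes down to how many acknowledgments the server
collects before answering the client. Below is a minimal Java sketch of that choice; the remote
write() calls are simulated with CompletableFuture tasks rather than real network I/O, and the
node names are placeholders.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public class RemoteLogWrite {
        // Simulated remote log write; a real system would perform network I/O here.
        static CompletableFuture<Void> writeTo(String node, String record) {
            return CompletableFuture.runAsync(
                    () -> System.out.println(node + " logged: " + record));
        }

        // Synchronous protocol: acknowledge the client only after ALL replicas confirm.
        static void synchronousWrite(List<String> nodes, String record) {
            CompletableFuture.allOf(
                    nodes.stream().map(n -> writeTo(n, record))
                         .toArray(CompletableFuture[]::new))
                    .join(); // blocks until the slowest replica acknowledges
        }

        // Asynchronous protocol: acknowledge as soon as ONE replica confirms.
        static void asynchronousWrite(List<String> nodes, String record) {
            CompletableFuture.anyOf(
                    nodes.stream().map(n -> writeTo(n, record))
                         .toArray(CompletableFuture[]::new))
                    .join(); // returns after the fastest replica; the others finish later
        }

        public static void main(String[] args) {
            List<String> replicas = List.of("node1", "node2", "node3");
            synchronousWrite(replicas, "write(x = 10)");  // consistent, but slower
            asynchronousWrite(replicas, "write(y = 20)"); // fast, but risks stale reads
        }
    }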

Q. 2 What is a multimedia database? Explain the methods of mining multimedia databases.
Ans : Multimedia database : Explained above.


The methods of mining multimedia databases :







Q. 3 Write short notes on any four of the following :
(1) Web database
(2) Mobile databases

Ans : Explained above.

Q. 4 Design issues of distributed databases.
Ans :




Q. 5 What is a commit protocol and why is it required in a distributed database?
Describe and compare two-phase and three-phase commit. What is blocking, and
how does the three-phase protocol prevent it? Explain distributed transactions.

Ans :

Distributed transaction :



Commit Protocol :
A commit protocol is required in a distributed database because sites and communication links
can fail independently: without coordination, some sites might commit a transaction while others
abort it. The commit protocol ensures that a transaction either commits at every participating
site or aborts at every site, that is, it provides atomicity across sites.
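As a hedged illustration of the idea (the Participant interface below is hypothetical, not the
API of any particular DBMS), the heart of two-phase commit is a voting phase followed by a
decision phase:

    import java.util.List;

    public class TwoPhaseCommit {
        // Hypothetical participant site in a distributed transaction.
        interface Participant {
            boolean prepare();  // phase 1: vote; true means "ready to commit"
            void commit();      // phase 2: make the transaction durable
            void abort();       // phase 2: undo the transaction
        }

        // Coordinator logic: global commit only if every site votes yes.
        static boolean run(List<Participant> sites) {
            boolean allReady = true;
            for (Participant p : sites) {                 // phase 1: collect votes
                if (!p.prepare()) { allReady = false; break; }
            }
            // A real coordinator would force-write its decision to a log here.
            if (allReady) {
                for (Participant p : sites) p.commit();   // phase 2: global commit
            } else {
                for (Participant p : sites) p.abort();    // phase 2: global abort
            }
            return allReady;
        }
    }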


BLOCKING PROBLEM :
In two-phase commit, a participant that has voted to commit and then loses contact with the
coordinator can neither commit nor abort on its own: it must hold its locks and wait until the
coordinator recovers. This waiting is called blocking. Three-phase commit prevents it by adding
a pre-commit phase between the vote and the final commit, which guarantees that no site commits
while another site may still abort; the surviving sites can therefore elect a new coordinator
and terminate the transaction safely without waiting for the failed one.





Q. 6 What are web databases? How are databases accessed through the web?

Ans : Web databases : Explained above.

Providing Access to Databases on the World Wide Web
Today's technology has been moving rapidly from static to dynamic Web pages, where content
may be in a constant state of flux. The Web server uses a standard interface called the Common
Gateway Interface (CGI) to act as the middleware: the additional software layer between the
user-interface front-end and the DBMS back-end that facilitates access to heterogeneous
databases.
information, and it returns the information to the server in HTML, which is given back to the
browser.
As the Web undergoes its latest transformations, it has become necessary to allow users access
not only to file systems but to databases and DBMSs to support query processing, report
generation, and so forth. The existing approaches may be divided into two categories:
1. Access using CGI scripts: The database server can be made to interact with the Web server
via CGI scripts, which are written in languages like PERL, Tcl, or C. The main
disadvantage of this approach is that for each user request, the Web server must start a
new CGI process: each process makes a new connection with the DBMS, and the Web
server must wait until the results are delivered to it. No efficiency is gained by
grouping multiple users' requests; moreover, the developer must keep the scripts in the
CGI-bin subdirectories only, which opens a possible breach of security. The fact that

CGI has no language associated with it but requires database developers to learn PERL
or Tcl is also a drawback. Manageability of scripts is another problem if the scripts are
scattered everywhere.


2. Access using JDBC: JDBC is a set of Java classes developed by Sun Microsystems to
allow access to relational databases through the execution of SQL statements. It is a way
of connecting with databases, without any additional processes for each client request.
Note that JDBC is a name trademarked by Sun; it does not stand for Java Data Base
connectivity, as many believe. JDBC provides the capability to connect to a database, send
SQL statements, and retrieve the results of a query using the Java classes Connection,
Statement, and ResultSet, respectively. With Java's claimed platform
independence, an application may run on any Java-capable browser, which loads the Java
code from the server and runs it on the client's browser. The Java code is DBMS
transparent; the JDBC drivers for individual DBMSs on the server end carry the task of
interacting with that DBMS. If the JDBC driver is on the client, the application runs on the
client and its requests are communicated to the DBMS directly by the driver. For standard
SQL requests, many RDBMSs can be accessed this way. The drawback of using JDBC
is the prospect of executing Java through virtual machines, with their inherent inefficiency. The
JDBC bridge to Open Database Connectivity (ODBC) remains another way of getting to
the RDBMSs.
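A minimal sketch of this pattern using the standard JDBC classes named above; the connection
URL, credentials, and employee table are placeholders, and the JDBC driver for the target DBMS
must be available on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JdbcExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection URL and credentials; PostgreSQL is used
            // only as an example of a DBMS-specific driver.
            String url = "jdbc:postgresql://dbhost:5432/company";
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT emp_name, salary FROM employee")) {
                while (rs.next()) {
                    // Retrieve columns from the current row of the result set.
                    System.out.println(rs.getString("emp_name")
                            + " earns " + rs.getDouble("salary"));
                }
            }
        }
    }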

Besides CGI, other Web server vendors are launching their own middleware products for
providing multiple database connectivity. These include Internet Server API (ISAPI) from
Microsoft and Netscape API (NSAPI) from Netscape. Informix offers its own Web access option,
and other DBMS vendors already have, or will have, similar provisions to support database
access on the Web.

Q. 7 Compare the relative merits of centralized and hierarchical deadlock
detection in a distributed DBMS.

Ans :

A centralized deadlock detection scheme is a reasonable choice if the concurrency control
algorithm is also centralized.
It works well when access patterns are distributed across sites, since a deadlock involving any
combination of sites can be detected immediately at the central site. However, this benefit comes
at the expense of communication between the central site and every other site.

A hierarchical deadlock detection scheme relieves a single site of the entire burden of deadlock
detection by involving more sites in the task.
When access patterns are localized, perhaps by geographic area, deadlocks are likely to occur
among groups of sites that communicate frequently. The hierarchical approach checks for
deadlocks where they are most likely to happen and splits the detection effort across the
hierarchy, resulting in greater efficiency.
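Under either scheme, the detector's core task is the same: assemble a wait-for graph from the
local graphs reported by the sites and test it for a cycle. Here is a minimal hypothetical Java
sketch of that test using depth-first search; the transaction names and reporting sites are
illustrative only.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class WaitForGraph {
        // Edge T1 -> T2 means transaction T1 waits for a lock held by T2.
        private final Map<String, List<String>> waitsFor = new HashMap<>();

        void addEdge(String waiter, String holder) {
            waitsFor.computeIfAbsent(waiter, k -> new ArrayList<>()).add(holder);
        }

        // A cycle in the global wait-for graph means a deadlock.
        boolean hasDeadlock() {
            Set<String> done = new HashSet<>();
            for (String t : waitsFor.keySet()) {
                if (dfs(t, new HashSet<>(), done)) return true;
            }
            return false;
        }

        private boolean dfs(String t, Set<String> onPath, Set<String> done) {
            if (onPath.contains(t)) return true;   // back edge: cycle found
            if (done.contains(t)) return false;    // already explored, no cycle through t
            onPath.add(t);
            for (String next : waitsFor.getOrDefault(t, List.of())) {
                if (dfs(next, onPath, done)) return true;
            }
            onPath.remove(t);
            done.add(t);
            return false;
        }

        public static void main(String[] args) {
            WaitForGraph g = new WaitForGraph();
            g.addEdge("T1", "T2");   // T1 waits for T2 (e.g., reported by site A)
            g.addEdge("T2", "T1");   // T2 waits for T1 (e.g., reported by site B)
            System.out.println(g.hasDeadlock());   // true: T1 and T2 are deadlocked
        }
    }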
