
DISTRIBUTED DATABASES

Distributed Database

In a distributed database system, the database is stored on several computers.
The computers in a distributed system communicate with one another through various communication media, such as high-speed networks or telephone lines.
The computers in a distributed system are also referred to as sites or nodes.
The system consists of a single logical database that is split into a number of fragments.
Each fragment is stored on one or more computers under the control of a separate DBMS.
A logically interrelated collection of shared data that is physically distributed over a computer network is called a distributed database.

Example:
A bank has branches all over India, and its head office is in Delhi.
Assume the bank maintains local data at each local branch and a copy of the data of all branches at Delhi.
The data is thus distributed all over India.
This eases query processing both for the local customers of a branch and for global customers.

[Figure: Bank using distributed processing. Sites shown: Mumbai, Delhi, Chennai, Bangalore, and Agra. One site is the head office and the other four are local branches.]

A distributed database system consists of a collection of sites connected together via some kind of communications network, in which:
each site is a database system site in its own right;
the sites agree to work together, so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user's own site.

Distributed DBMS

A software system that permits the management of the distributed database and makes the distribution transparent to users.

Characteristics of DDBMS

Collection of logically-related shared data.


Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global
application.

Advantages of DDBMS

1. Data sharing
If a number of different sites are connected to each other, then a user at one site may be able to access data that is available at another site.
For example, in the distributed banking system, it is possible for a user in one branch to access data in another branch.

2. Local autonomy
The primary advantage to accomplishing data sharing by means of data distribution is that each site is able to retain a degree of control over data stored locally.

In a centralized system, the database administrator of the central site controls the database.
In a distributed system, there is a global database administrator responsible for the entire system.
A part of these responsibilities is delegated to the local database administrator for each site.
Each local administrator may have a different degree of autonomy, which is often a major advantage of distributed databases.
3. Reliability and availability
If one site fails in a distributed system, the remaining sites may be able to continue operating.
In particular, if data are replicated at several sites, a transaction needing a particular data item may find it at any of those sites. Thus, the failure of a site does not necessarily imply the shutdown of the system.
The failure of one site must be detected by the system, and appropriate action may be needed to recover from the failure. The system must no longer use the services of the failed site. Finally, when the failed site recovers or is repaired, mechanisms must be available to integrate it smoothly back into the system.
4. Faster data access
Users can issue commands from any location to access data, and doing so does not affect the working of the database.
Its advantage is that if a user wants to

5. Modular growth
New nodes (computers) can be added to the network at any time without affecting the operation of the other sites.

6. Speedup of query processing
If a query involves data at several sites, it may be possible to split the query into subqueries that can be executed in parallel by several sites.
Such parallel computation allows for faster processing of a user's query.
In those cases in which data is replicated, queries may be directed by the system to the least heavily loaded sites.
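
As a minimal sketch of the query speedup idea, the Python snippet below splits a global query (the total balance over all branches) into one subquery per site and runs the subqueries in parallel. The `SITES` dictionary, the function names, and the use of threads in place of real network sites are assumptions made for illustration; the balances are those of the account relation used later in these slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site storage: each site holds the accounts of one branch.
SITES = {
    "Hillside": [("A-305", 500), ("A-226", 336), ("A-155", 62)],
    "Valleyview": [("A-177", 205), ("A-402", 10000),
                   ("A-408", 1123), ("A-639", 750)],
}


def local_total(site):
    """Subquery executed at one site: sum of the balances stored there."""
    return sum(balance for _, balance in SITES[site])


def global_total():
    """Global query: run one subquery per site in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        partial_sums = list(pool.map(local_total, SITES))
    return sum(partial_sums)


print(global_total())
```

Each site only scans its own fragment, and only the partial sums travel back to the site that issued the query.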

Disadvantages of DDBMSs

Complexity: A distributed database is more complicated to set up and maintain than a central database.
Managing and controlling a DDBMS is complex.
Security: Security is harder to enforce because data is held at many different sites.
Distributed databases provide more flexible access, which increases the chance of security violations, since the database can be accessed from every site within the network.

Lack of standards: There are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
Database design more complex: Besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.
Cost: Increased complexity and a more extensive infrastructure mean extra costs.
Lack of experience: Distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.

Local and global transactions

A local transaction accesses data only at the single site at which the transaction was initiated.
A global transaction either accesses data at a site different from the one at which the transaction was initiated or accesses data at several different sites.
The ACID properties of a local transaction can be ensured in the same way as for a transaction in a centralized system. Ensuring the ACID properties of a global transaction is more complex, because several sites participate in its execution.
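
The distinction between the two kinds of transaction can be sketched in a few lines of Python. The function name and the representation of a transaction as an originating site plus a list of accessed sites are assumptions made for illustration only.

```python
def classify_transaction(origin_site, sites_accessed):
    """Return "local" if only the originating site is accessed, else "global"."""
    if set(sites_accessed) <= {origin_site}:
        return "local"
    return "global"


# A transaction started in Mumbai that touches only Mumbai data is local.
print(classify_transaction("Mumbai", ["Mumbai"]))
# A transaction started in Mumbai that also reads data held at Delhi is global.
print(classify_transaction("Mumbai", ["Mumbai", "Delhi"]))
```

A transaction that accesses a single site other than its origin is also classified as global, matching the definition above.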

Types of DDBMS

Homogeneous DDBMS
Heterogeneous DDBMS

Homogeneous Distributed Database Systems

All sites have identical software and schema.
The sites are aware of each other and agree to cooperate in processing user requests.
Goal: provide a view of a single database, hiding the details of distribution. The system appears to the user as a single system.

Homogeneous Database

Identical DBMSs

All data is managed by the distributed DBMS (no exclusively local data).
All access is through one global schema.
The global schema is the union of all the local schemas.

Heterogeneous Distributed Database Systems

Data is distributed across all the nodes.
Different sites may use different software and schemas.
Different DBMSs may be used at each node.
Local access is done using the local DBMS and schema.
Remote access is done using the global schema.
Goal: integrate existing databases to provide useful functionality.


Typical Heterogeneous Environment

Non-identical DBMSs

Source: adapted from Bell and Grimson, 1992.


Distributed Database Design

The design of a DDBMS introduces three issues:
How to partition the database into fragments.
Which fragments to replicate.
Where to locate those fragments and replicas.
Fragmentation and replication deal with the first two issues; allocation deals with the third.

Fragmentation

A relation may be divided into a number of subrelations, which are then distributed.

Allocation

Each fragment is stored at the site with "optimal" distribution.

Replication

A copy of a fragment may be maintained at several sites.

Data Fragmentation

If relation r is fragmented, r is divided into a number of fragments r1, r2, ..., rn, which contain sufficient information to reconstruct relation r.
Three rules must be followed:
Completeness: If a relation R is decomposed into fragments R1, R2, ..., Rn, each data item in R must appear in at least one fragment.
Reconstruction: It must be possible to define a relational operation that will reconstruct R from the fragments.
Disjointness: A data item must appear in only one fragment. The exception is the primary key in vertical fragmentation, which is repeated in every fragment.
For horizontal fragmentation, a data item is a tuple.
For vertical fragmentation, a data item is an attribute.
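
The three rules can be checked mechanically for a horizontal fragmentation, where a data item is a tuple. The sketch below is one possible way to do this in Python; the function name, the returned dictionary, and the three-tuple sample relation are illustrative assumptions, and union is taken as the reconstruction operation.

```python
def check_horizontal_fragments(relation, fragments):
    """Check the three fragmentation rules, treating each tuple as a data item."""
    original = set(relation)
    union = set().union(*fragments)
    return {
        # Every tuple of the relation appears in at least one fragment.
        "completeness": original <= union,
        # The union of the fragments gives back exactly the relation.
        "reconstruction": union == original,
        # No tuple appears in more than one fragment.
        "disjointness": sum(len(set(f)) for f in fragments) == len(union),
    }


account = [
    ("A-305", "Hillside", 500),
    ("A-226", "Hillside", 336),
    ("A-177", "Valleyview", 205),
]
by_branch = [
    [t for t in account if t[1] == "Hillside"],
    [t for t in account if t[1] == "Valleyview"],
]
overlapping = [account[:2], account[1:]]

print(check_horizontal_fragments(account, by_branch))
print(check_horizontal_fragments(account, overlapping))
```

Fragmenting by branch satisfies all three rules, while the overlapping fragments are complete and reconstructible but not disjoint.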

Types of fragmentation

There are three types of fragmentation:
Horizontal
Vertical
Mixed

The other possibility is no fragmentation:
If a relation is small and not updated frequently, it may be better not to fragment it.

Horizontal fragmentation

Each tuple of r is assigned to one or more fragments.
Example: relation account with the following schema:
Account = (account_number, branch_name, balance)
The account relation can be divided into several different fragments, each of which consists of the tuples of accounts belonging to a particular branch. If the banking system has only two branches, Hillside and Valleyview, then there are two different fragments.

We reconstruct the relation r by taking the union of all fragments; that is,
r = r1 ∪ r2 ∪ ... ∪ rn

Horizontal Fragmentation of the account Relation

account_number  branch_name  balance
A-305           Hillside     500
A-226           Hillside     336
A-155           Hillside     62

account1 = σ branch_name="Hillside" (account)

account_number  branch_name  balance
A-177           Valleyview   205
A-402           Valleyview   10000
A-408           Valleyview   1123
A-639           Valleyview   750

account2 = σ branch_name="Valleyview" (account)
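
The two fragments can be reproduced with a selection on branch_name, and the original relation recovered by a union. The following Python sketch assumes the account relation is held as a list of tuples; the `select` helper and variable names are illustrative, not part of any DBMS API.

```python
# account relation as (account_number, branch_name, balance) tuples.
ACCOUNT = [
    ("A-305", "Hillside", 500),
    ("A-226", "Hillside", 336),
    ("A-155", "Hillside", 62),
    ("A-177", "Valleyview", 205),
    ("A-402", "Valleyview", 10000),
    ("A-408", "Valleyview", 1123),
    ("A-639", "Valleyview", 750),
]


def select(relation, branch_name):
    """Selection: keep the tuples whose branch_name matches."""
    return [t for t in relation if t[1] == branch_name]


account1 = select(ACCOUNT, "Hillside")
account2 = select(ACCOUNT, "Valleyview")

# Reconstruction: the union of the fragments is the original relation.
reconstructed = set(account1) | set(account2)
print(reconstructed == set(ACCOUNT))
```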

PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000

Vertical fragmentation

The schema for relation r is split into several smaller schemas.
All schemas must contain a common candidate key (or superkey) to ensure the lossless-join property.
A special attribute, the tuple-id attribute, may be added to each schema to serve as a candidate key.

We can reconstruct the relation by taking the natural join of all fragments:
r = r1 ⋈ r2 ⋈ r3 ⋈ ... ⋈ rn

Vertical Fragmentation of the employee_info Relation

branch_name  customer_name  tuple_id
Hillside     Lowman         1
Hillside     Camp           2
Valleyview   Camp           3
Valleyview   Kahn           4
Hillside     Kahn           5
Valleyview   Kahn           6
Valleyview   Green          7

deposit1 = Π branch_name, customer_name, tuple_id (employee_info)

account_number  balance  tuple_id
A-305           500      1
A-226           336      2
A-177           205      3
A-402           10000    4
A-155           62       5
A-408           1123     6
A-639           750      7

deposit2 = Π account_number, balance, tuple_id (employee_info)
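
The projection and the join on tuple_id can be sketched in Python as follows. The tuple layout, the `project` and `join_on_tuple_id` helpers, and the convention that tuple_id is the last field of every row are assumptions made for this illustration.

```python
# employee_info as (branch_name, customer_name, account_number, balance, tuple_id).
EMPLOYEE_INFO = [
    ("Hillside", "Lowman", "A-305", 500, 1),
    ("Hillside", "Camp", "A-226", 336, 2),
    ("Valleyview", "Camp", "A-177", 205, 3),
    ("Valleyview", "Kahn", "A-402", 10000, 4),
    ("Hillside", "Kahn", "A-155", 62, 5),
    ("Valleyview", "Kahn", "A-408", 1123, 6),
    ("Valleyview", "Green", "A-639", 750, 7),
]


def project(relation, indexes):
    """Projection: keep only the listed column positions of every row."""
    return [tuple(row[i] for i in indexes) for row in relation]


def join_on_tuple_id(left, right):
    """Natural join of two fragments whose last field is the shared tuple_id."""
    right_by_id = {row[-1]: row for row in right}
    return [l[:-1] + right_by_id[l[-1]] for l in left if l[-1] in right_by_id]


deposit1 = project(EMPLOYEE_INFO, (0, 1, 4))  # branch_name, customer_name, tuple_id
deposit2 = project(EMPLOYEE_INFO, (2, 3, 4))  # account_number, balance, tuple_id

print(join_on_tuple_id(deposit1, deposit2) == EMPLOYEE_INFO)
```

Because tuple_id is unique in both fragments, every row of deposit1 matches exactly one row of deposit2, so the join loses and invents nothing.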

Horizontal and Vertical Fragmentation

Mixed Fragmentation

A combination of horizontal and vertical strategies.
Also called hybrid or nested fragmentation.
It consists of a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is then horizontally fragmented.
Mixed fragmentation is defined using the select and project operations of relational algebra.
The original relation can be obtained by join and union operations.
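
A small Python sketch of mixed fragmentation follows. It fragments a relation horizontally by branch and then splits only the Hillside fragment vertically; that particular choice, the helper names, and the four sample rows are assumptions for illustration.

```python
ROWS = [
    {"tuple_id": 1, "branch_name": "Hillside", "customer_name": "Lowman",
     "account_number": "A-305", "balance": 500},
    {"tuple_id": 2, "branch_name": "Hillside", "customer_name": "Camp",
     "account_number": "A-226", "balance": 336},
    {"tuple_id": 3, "branch_name": "Valleyview", "customer_name": "Camp",
     "account_number": "A-177", "balance": 205},
    {"tuple_id": 4, "branch_name": "Valleyview", "customer_name": "Kahn",
     "account_number": "A-402", "balance": 10000},
]


def select(rows, branch):
    """Horizontal step: keep the rows of one branch."""
    return [r for r in rows if r["branch_name"] == branch]


def project(rows, attrs):
    """Vertical step: keep only the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


def join(left, right):
    """Join two vertical fragments on tuple_id."""
    by_id = {r["tuple_id"]: r for r in right}
    return [{**l, **by_id[l["tuple_id"]]} for l in left]


hillside = select(ROWS, "Hillside")
valleyview = select(ROWS, "Valleyview")
hillside_names = project(hillside, ["tuple_id", "branch_name", "customer_name"])
hillside_money = project(hillside, ["tuple_id", "account_number", "balance"])

# Reconstruction: join the vertical fragments, then union with the rest.
reconstructed = join(hillside_names, hillside_money) + valleyview
print(sorted(reconstructed, key=lambda r: r["tuple_id"]) == ROWS)
```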

Advantages of Fragmentation

Horizontal:
Allows parallel processing on the fragments of a relation.
Allows a relation to be split so that tuples are located where they are most frequently accessed.

Vertical:
Allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed.
The tuple-id attribute allows efficient joining of vertical fragments.

Disadvantages:
Performance: queries that need data from several fragments may be slower, because the fragments must be recombined first.
Integrity: integrity constraints are more difficult to enforce when the data they cover is spread over several fragments.

Data Replication

The system maintains multiple copies of data, stored at different sites, for faster retrieval and fault tolerance.
There are two types of replication:
Full replication
Partial replication

Full replication

Full replication of a relation is the case where the relation is stored at all sites.
Fully redundant databases are those in which every site contains a copy of the entire database.
Full replication can be impractical because of the storage and update overhead.

Partial replication

Only some important, frequently used fragments are replicated.
Most DDBMSs are able to handle a partially replicated database well.

Unreplicated database

Each database fragment is stored at a single site.
There are no duplicate database fragments.

Advantages of Replication

Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site containing a replica of r.
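
The availability benefit can be illustrated with a short Python sketch that picks the first reachable site holding a replica of r. The function name, the site names, and the up/down status table are hypothetical; a real system would detect site failures itself.

```python
def choose_replica(replica_sites, site_is_up):
    """Return the first site that holds a replica and is currently up, else None."""
    for site in replica_sites:
        if site_is_up.get(site, False):
            return site
    return None


replicas_of_r = ["Delhi", "Mumbai"]         # sites holding a copy of r
status = {"Delhi": False, "Mumbai": True}   # Delhi has failed

# r stays available because a second replica exists.
print(choose_replica(replicas_of_r, status))
```

With a single copy at Delhi the same call would return None, which is the unavailability that replication avoids.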

Disadvantages of Replication

Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
One solution: choose one copy as the primary copy and apply concurrency control operations on the primary copy.
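
A much simplified sketch of the primary copy idea is given below. It assumes a single lock kept at the primary site, through which every update must pass, and it propagates the new value to all replicas before releasing the lock. The class name, the in-memory `copies` dictionary, and the use of threads to stand in for concurrent transactions are assumptions; network failures and recovery are ignored.

```python
import threading


class PrimaryCopyItem:
    """One replicated data item whose updates are serialized at a primary site."""

    def __init__(self, sites, primary, value):
        if primary not in sites:
            raise ValueError("the primary site must hold a replica")
        self.primary = primary
        self.copies = {site: value for site in sites}
        self._primary_lock = threading.Lock()  # lock held at the primary copy

    def update(self, fn):
        """Apply fn to the primary copy under its lock, then refresh all replicas."""
        with self._primary_lock:
            new_value = fn(self.copies[self.primary])
            for site in self.copies:
                self.copies[site] = new_value
            return new_value


balance = PrimaryCopyItem(["Delhi", "Mumbai", "Chennai"], primary="Delhi", value=500)
balance.update(lambda b: b + 100)
print(balance.copies)
```

Because every update takes the same lock, two concurrent updates cannot be applied to distinct replicas in different orders.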

Data allocation

Four alternative strategies regarding the placement of data:
Centralized
Partitioned (or fragmented)
Complete replication
Selective replication
Data allocation algorithms consider a variety of factors, such as performance, reliability, availability, storage cost, and communication cost.

Centralized data allocation

The entire database is stored at one site, with users distributed across the network.

Partitioned data allocation

The database is partitioned into disjoint fragments, with each fragment assigned to one site.

Complete replication

Consists of maintaining a complete copy of the database at each site.

Selective replication

A combination of partitioning, replication, and centralization.
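
The four strategies can be compared by looking at the fragment-to-site mapping each one produces. The Python sketch below is illustrative only: the `allocate` function, the round-robin placement used for partitioning, and the `replicated` parameter naming the fragments to copy everywhere are assumptions, not a real allocation algorithm (which would weigh the cost factors listed earlier).

```python
def allocate(fragments, sites, strategy, replicated=()):
    """Return a mapping fragment -> list of sites for the given strategy."""
    if strategy == "centralized":
        # Everything at one site.
        return {f: [sites[0]] for f in fragments}
    if strategy == "partitioned":
        # Each fragment at exactly one site (round-robin is an assumption).
        return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}
    if strategy == "complete":
        # Every fragment at every site.
        return {f: list(sites) for f in fragments}
    if strategy == "selective":
        # Chosen fragments everywhere, the rest at a single site each.
        return {
            f: list(sites) if f in replicated else [sites[i % len(sites)]]
            for i, f in enumerate(fragments)
        }
    raise ValueError(f"unknown strategy: {strategy}")


fragments = ["account1", "account2"]
sites = ["Delhi", "Mumbai", "Chennai"]
for strategy in ("centralized", "partitioned", "complete"):
    print(strategy, allocate(fragments, sites, strategy))
print("selective", allocate(fragments, sites, "selective", replicated=["account1"]))
```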

Reference Architecture for DDBMS

Due to diversity, there is no universally accepted architecture equivalent to the ANSI/SPARC three-level architecture.
A reference architecture consists of:
A set of global external schemas.
A global conceptual schema (GCS).
A fragmentation schema and an allocation schema.
A set of schemas for each local DBMS, conforming to the three-level ANSI/SPARC architecture.

Some levels may be missing, depending on the levels of transparency supported.

Global conceptual schema

The global conceptual schema is a logical description of the whole database as if it were not distributed.
In a DDBMS, the GCS is the union of all local conceptual schemas.

Fragmentation and allocation schemas

The fragmentation schema is a description of how the data is to be logically partitioned.
The allocation schema is a description of where the data is to be located.

Local schemas

Each local DBMS has its own set of schemas.
The local mapping schema maps fragments in the allocation schema into external objects in the local database.

Components of a DDBMS

Local DBMS (LDBMS) component: has its own local system catalog that stores information about the data held at that site.
Data communications (DC) component: the software that enables all sites to communicate with each other.
Global system catalog (GSC): holds information specific to the distributed nature of the system, such as the fragmentation and allocation schemas.
Distributed DBMS (DDBMS) component: the controlling unit of the entire system.
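
To make the role of the global system catalog concrete, the following Python sketch stores a fragmentation schema and an allocation schema for the account relation and uses them to decide which sites a query must visit. The catalog layout, the site names, and the `sites_for_query` function are hypothetical and only illustrate the kind of lookup the DDBMS component might perform.

```python
# Hypothetical global system catalog for one relation.
GLOBAL_SYSTEM_CATALOG = {
    "account": {
        # Fragmentation schema: the predicate that defines each fragment.
        "fragmentation": {
            "account1": {"branch_name": "Hillside"},
            "account2": {"branch_name": "Valleyview"},
        },
        # Allocation schema: the sites at which each fragment is stored.
        "allocation": {
            "account1": ["site_hillside"],
            "account2": ["site_valleyview"],
        },
    }
}


def sites_for_query(relation, branch_name=None):
    """Return the sites holding fragments relevant to a query on branch_name."""
    entry = GLOBAL_SYSTEM_CATALOG[relation]
    sites = set()
    for fragment, predicate in entry["fragmentation"].items():
        if branch_name is None or predicate["branch_name"] == branch_name:
            sites.update(entry["allocation"][fragment])
    return sorted(sites)


print(sites_for_query("account", "Hillside"))
print(sites_for_query("account"))
```

A query restricted to one branch is routed to a single site, while a query over the whole relation must visit every site that stores a fragment.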
