Sie sind auf Seite 1von 57

DISTRIBUTED

DBMS

SUSHIL
KULKARNI
 DDBMS Concepts
 Applications
 Characteristics, Properties of DDBMS
 Distributed Processing
 Advantages & Disadvantages DDBMS
 Types & Functions of DDBMS
 Main Issues of DDBMS
 Component Architecture for DDBMS
 Data Allocation & Fragmentation
 Transparencies
CONCEPTS
CONCEPTS
• So far, we assume a centralized database
 Data are stored in one location (e.g. a single
hard disk)
 A centralized database management system to
handle transaction
 To handle multiple requests, a client-server
system is used
- Client send requests for data to server
- Server handle query, transaction management etc.

SUSHIL KULKARNI
CONCEPTS
• This is not the only possibility
• In many cases, it may be advantageous
for data to be distributed
– Branches of a bank
– Different part of the government storing
different kind of data about a person
– Different organizations sharing part of their
data
• Thus, distributed databases

SUSHIL KULKARNI
CONCEPTS
• Data spread over multiple machines (also referred
to as sites or nodes.
• Network interconnects the machines
• Data shared by users on multiple machines

SUSHIL KULKARNI
CONCEPTS

Distributed database

Logical interrelated collection of shared data,


along with description of data, physically
distributed over a computer network.

SUSHIL KULKARNI
CONCEPTS

Distributed DBMS

The software system that permits the


management of the distributed database and
makes the distribution transparent to users

SUSHIL KULKARNI
CONCEPTS

Applications

• User access distributed database via


applications

SUSHIL KULKARNI
CONCEPTS

TWO types of Applications

• Local application : Application that do not


required data from other sites.

• Global application : Application that required


data from other sites.
SUSHIL KULKARNI
TYPES OF DDBMS
• In a homogeneous distributed database:

– All sites have identical software.


– Are aware of each other and agree to
cooperate in processing user requests.
– Each site surrenders part of its autonomy
in terms of right to change schemas or
software.
– Appears to user as a single system.

SUSHIL KULKARNI
TYPES OF DDBMS
• In a heterogeneous distributed database:

– Different sites may use different schemas and


software.
• Difference in schema is a major problem for
query processing.
• Difference in software is a major problem for
transaction processing.
– Sites may not be aware of each other and may
provide only limited facilities for cooperation in

transaction processing.
SUSHIL KULKARNI
TYPE: HOMOGENEOUS DBMS

Identical DBMSs

SUSHIL KULKARNI
TYPE: HETROGENEOUS DBMS

Non-identical DBMSs

SUSHIL KULKARNI
OBJECTIVES : DISTRIBUTED
ARCHITECTURE
• Location Transparency
– User does not have to know the location of the data.
– Data requests automatically forwarded to appropriate
sites

• Local Autonomy
– Local site can operate with its database when
network connections fail
– Each site controls its own data, security,
logging, recovery

SUSHIL KULKARNI
SIGNIFICANT TRADE -OFF

 Synchronous • Asynchronous
Distributed Database Distributed Database
• All copies of the same • Some data inconsistency
data are always is tolerated
identical • Data update propagation
is delayed
• Data updates are
immediately applied to • Lower data integrity
all copies throughout • Less overhead  faster
network response time
• Good for data integrity
• High overhead  slow
response times
NOTE: all this assumes replicated data (to be discussed later)
Advantages & Disadvantages

 Advantages  Disadvantages
• Increased reliability • Software cost &
& availability complexity
• Local control • Processing overhead
• Modular growth • Data integrity
• Lower • Slow response
communication
costs
• Faster response
DISTRIBUTED PROCESSING

A centralized database that can be


accessed over a computer network.

SUSHIL KULKARNI
DISTRIBUTED PROCESSING
T T T T T T

COM 1 COM 2

Communication

Network DB

T T T

SUSHIL KULKARNI COM 3


FUNCTIONS OF DDBMS
Functions of a centralized DBMS plus:

extended communication to allow the transfer of


queries and data among sites

extended system catalog to store data distribution


details

distributed query processing , including query


optimization

SUSHIL KULKARNI
FUNCTIONS OF DDBMS

extended concurrency control to maintain


consistency of replicated data.

extended recovery services to take account


of failures of individual sites and common
links

SUSHIL KULKARNI
TWO MAIN ISSUES IN DDBMS

Making query from one site to the same or


remote site.

Logical database is partitioned in to


different
data streams and located at different sites.

SUSHIL KULKARNI
COMPONENT ARCHITECTURE FOR
DDBMS
• Local DBMS

• Data Communication Component

• Global System Catalog

• Distributed DBMS component

SUSHIL KULKARNI
DATA
ALLOCATION
DATA ALLOCATION
• Centralized

• Fragmented

• Complete replication

• Selective replication

SUSHIL KULKARNI
Distributed Data Storage
• Assume relational data model.
• Replication:
– System maintains multiple copies of data, stored in
different sites, for faster retrieval and fault tolerance.
• Fragmentation:
– Relation is partitioned into several fragments stored in
distinct sites
• Replication and fragmentation can be combined:
– Relation is partitioned into several fragments: System
maintains several identical replicas of each such
fragment.
Data Replication

• A relation or fragment of a relation is


replicated if it is stored redundantly in
two or more sites.
• Full replication of a relation is the case
where the relation is stored at all sites.
• Fully redundant databases are those in
which every site contains a copy of the
entire database.

SUSHIL KULKARNI
Data Replication (Cont.)
Data Replication
Data Replication
• Advantages of Replication:
– Availability: failure of site containing relation r
does not result in unavailability of r is replicas
exist.
– Parallelism: queries on r may be processed by
several nodes in parallel.
– Reduced data transfer: relation r is available
locally at each site containing a replica of r.

SUSHIL KULKARNI
Data Replication (Cont.)
Data Replication
Data Replication
• Disadvantages of Replication
– Increased cost of updates: each replica of
relation r must be updated.
– Increased complexity of concurrency control:
concurrent updates to distinct replicas may
lead to inconsistent data unless special
concurrency control mechanisms are
implemented.
• One solution: choose one copy as primary copy
and apply concurrency control operations on
primary copy.
Data Fragmentation
• Division of relation r into fragments r1, r2, …, rn which
contain sufficient information to reconstruct relation r.
• Horizontal fragmentation: each tuple of r is assigned to
one or more fragments.
• Vertical fragmentation: the schema for relation r is split
into several smaller schemas.
– All schemas must contain a common candidate key (or
superkey) to ensure lossless join property.
– A special attribute, the tuple-id attribute may be added to each
schema to serve as a candidate key.
• Example : relation account with following schema.
• Account-schema = (branch-name, account-number,
balance).
HORIZONTAL FRAGMENTATION

Original relation A1 A2 ………. An


T1

A1 A2 ………. An T2
T1 T3
1
T2 .
1
T3 .T60
1
.
2 Site 1
.T60
2
T61
A1 A2 ………. An
3
. T61
3
. .
3
Tn .
Tn

Site 2
-Fragments contain subsets of complete tuples (all attributes at all
sites)
How to reconstruct R= Rs1 Rs2 ……. Rsn
VERTICAL FRAGMENTATION

A1 A2 A3 A4

Original t1 How to Reconstruct:
Relation (R)
t2 R=Rs1 Rs2 Rsn

TID –Tuple ID
Hidden Attribute to
ensure account tn
and simple join
reconstruction
A1 A2 TID TIDA3 A4 RS2
t1 1 1 t1
RS1 2 2
t2 t2
RS1.TID=RS2.TID
n n
tn tn Join condition

SITE1 SITE2
VERTICAL FRAGMENTATION

A1 A2 A3 A4

Original t1 How to Reconstruct:
Relation (R)
t2 R=Rs1 Rs2 Rsn

TID –Tuple ID
Hidden Attribute to
ensure account tn
and simple join
reconstruction
A1 A2 TID TIDA3 A4 RS2
t1 1 1 t1
RS1 2 2
t2 t2
RS1.TID=RS2.TID
n n
tn tn Join condition

SITE1 SITE2
MIXED FRAGMENTATION

Rs1 A1 A2 A3 A4 A5
Rs3
u
R s
a
A1 A2 A3 A4 A5

Rs2
A1 A2 A3 E
A4 A5 u
(Salary (Benefit r
Attributes) Attributes)
o
p
Rs4 e
MIXED FRAGMENTATION
  A1 A2 A3 A4
Original t1 How to Reconstruct:
Relation (R)
t2 R=Rs1 Rs2 Rsn

TID –Tuple ID
Hidden Attribute to
ensure account tn
and simple join
reconstruction
A1 A2 TID TIDA3 A4 RS2
t1 1 1 t1
RS1 2 2
t2 t2
RS1.TID=RS2.TID
n n
tn tn Join condition

SITE1 SITE2
Horizontal Fragmentation of account
Relation
branch­name account­number balance

Hillside A­305 500


Hillside A­226 336
Hillside A­155 62

account1=σ branch­name=“Hillside”(account)

branch­name account­number balance

Valleyview A­177 205


Valleyview A­402 10000
Valleyview A­408 1123
Valleyview A­639 750

account2=σ branch­name=“Valleyview”(account)
SUSHIL KULKARNI
Vertical Fragmentation of employee-info
Relation
branch­name customer­name tuple­id

Hillside Lowman 1
Hillside Camp 2
Valleyview Camp 3
Valleyview Kahn 4
Hillside Kahn 5
Valleyview Kahn 6
Valleyview Green 7
deposit1=Π branch­name, customer­name, tuple­id(employee­info)
account number balance tuple­id

A­305 500 1
A­226 336 2
A­177 205 3
A­402 10000 4
A­155 62 5
A­408 1123 6
A­639 750 7
deposit2=Π account­number, balance, tuple­id(employee­info)
Advantages of Fragmentation
• Horizontal:
– allows parallel processing on fragments of a relation
– allows a relation to be split so that tuples are located where
they are most frequently accessed
• Vertical:
– allows tuples to be split so that each part of the tuple is stored
where it is most frequently accessed
– tuple-id attribute allows efficient joining of vertical fragments
– allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
– Fragments may be successively fragmented to an arbitrary
depth.

SUSHIL KULKARNI
REPLICATION and FRAGMENTATION

Partition of Attributes/tuples need not be disjoint


A1 A2 A3 A4 A5

A1 A2 A3 A4 A2 A3 A4 A5

Overlap
(replication of attributes)
TRANSPARENCIES
TRANSPARENCIES IN DDBMS
• Transparencies hide implementation
details from the user
• Example in Centralized databases : Data
independence
• Main types of transparencies in DDBMS:
o Distributed Transparency
o Transaction Transparency

SUSHIL KULKARNI
DISTRIBUTED TRANSPARENCY
Allows the user to see the database as a
single, logical entity.

If this transparency is exhibited then the


user does not need to know that
1. The data are partitioned.
2. Data can be replicated at several
sites.
3. Data location.
SUSHIL KULKARNI
EXAMPLE
Staff (staffNo, position, sex, dob, salary,
fName, lName, branchNo)

Vertical fragmentation:
S = Π staffNo, position, sex , dob, salary (Staff)
1

S = Π staffNo, fName, lName , dbranchNo (Staff)


2

SUSHIL KULKARNI
EXAMPLE
Fragment S 2 according to branch number.
Assume that there are only three branches.
Horizontal fragmentation:
S = σ (Staff)
21 branchNo = ' B 003 '

S =σ (Staff)
22 branchNo = ' B 005 '

S =σ (Staff)
23 branchNo = ' B007 '
SUSHIL KULKARNI
EXAMPLE
Assume that :

S 1 and S 2 are at site 5,


S 21 at site 3
S 22 at site 5
S 23 at site 7

SUSHIL KULKARNI
FRAGMENTATION TRANSPARENCY

If it is provided then the user does not need


to know the data is fragmented.
Example:

SELECT fName, lName

FROM Staff
WHERE position = ‘ Manager ’
SUSHIL KULKARNI
LOCATION TRANSPARENCY

If it is provided then the user must know


how the data has been fragmented but still
does not have know the location of the data.

SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:
SELECT fName, lName
FROM S21
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’)
SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:

UNION

SELECT fName, lName


FROM S23
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’ )

SUSHIL KULKARNI
LOCAL MAPPING TRANSPARENCY

If it is provided then the user must know


how the data has been fragmented as well
as the location of the data.

SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:
SELECT fName, lName
FROM S21 AT SITE 3
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 5
where position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22 AT SITE 5
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
where position = ‘ Manager ’)
SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:

UNION

SELECT fName, lName


FROM S23 AT SITE 7
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
where position = ‘ Manager ’ )

SUSHIL KULKARNI
TRANSACTION TRANSPARENCY

It maintains distributed database’s integrity


and consistency.

SUSHIL KULKARNI
QUERY PROCESSING IN DDMS
Issues 1:
Horizontal
Parallel Processing across Fragments fragmentations

=Emp1 U Emp2
Π LName(σ salary>40,000(Employee))
2 Fragments
⇒ Π LName( σ salary>40,000(Emp1)) U Π LName( σ salary>40,000(Emp2))
Site 1 Site 2

Execution in Parallel on fragments


and union results together
QUERY PROCESSING IN DDMS

Site1 Site2 Site3 Joins- symmetric and


associative

(A B) C

Parallel Processing
(σ xx(A)) (B C)
A (B C)
QUERY PROCESSING IN DDMS
Join Strategies
R=Π Fnames, Cnames, Dnames (Employee Department)
Site 3 Site 1 Mg rssn Site 2
100 records, 2000 bytes 10,000 records, to ssn 100 records, 3000
1,000,000 bytes bytes

Strategies:
1,003,000
1)Ship both relations to the result site and join there bytes
transfered
1,002,000
2)Ship employee to 2, join at 2, results to 3 bytes
transfered
3)Ship Department to 1, join at 1, results to 3 5,000 bytes
transfered
⇒ minimize total communication cost of data transfer
THANKS !

Das könnte Ihnen auch gefallen