Beruflich Dokumente
Kultur Dokumente
■ Introduction
■ Background
■ Distributed DBMS Architecture
❏ Distributed Database Design
➠ Fragmentation
➠ Data Location
❏ Semantic Data Control
❏ Distributed Query Processing
❏ Distributed Transaction Management
❏ Parallel Database Systems
❏ Distributed Object DBMS
❏ Database Interoperability
❏ Current Issues
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 5. 1
Design Problem
Level of sharing
■ Top-down
➠ mostly in designing systems from scratch
■ Bottom-up
➠ when the databases already exist at a number of
sites
Objectives
User Input
Conceptual View Integration View Design
Design
Access
GCS Information ES’s
Distribution
Design User Input
LCS’s
Physical
Design
LIS’s
❷ How to fragment?
❺ How to allocate?
❻ Information requirements?
PROJ1 PROJ2
PROJ1 PROJ2
tuples relations
or
attributes
CONCURRENCY
Moderate Difficult Easy
CONTROL
Possible Possible
REALITY Realistic
application application
■ Four categories:
➠ Database information
➠ Application information
➠ Communication network information
➠ Computer system information
TITLE, SAL
L1
EMP PROJ
L2 L3
ASG
Preliminaries :
➠ Pr should be complete
➠ Pr should be minimal
■ Example :
➠ Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.
➠ Find the budgets of projects at each location. (1)
➠ Find projects with budgets less than $200000. (2)
which is complete.
PAY1 PAY2
TITLE SAL TITLE SAL
Mech. Eng. 27000 Elect. Eng. 40000
Programmer 24000 Syst. Anal. 34000
PROJ1 PROJ2
Database
P1 Instrumentation 150000 Montreal P2 135000 New York
Develop.
PROJ4 PROJ6
■ Reconstruction
➠ If relation R is fragmented into FR = {R1,R2,…,Rr}
R = ∪∀R ∈FR Ri
i
■ Disjointness
➠ Minterm predicates that form the basis of fragmentation
should be mutually exclusive.
TITLE, SAL
L1
EMP PROJ
L2 L3
ASG
EMP1 EMP2
ENO ENAME TITLE ENO ENAME TITLE
■ Non-overlapping fragments
➠ splitting
access
query access = access frequency of a query ∗
all sites execution
is maximized.
where
n
bond(Ax,Ay) = ςz ==1
aff(Az,Ax)aff(Az,Ay)
Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
A1 A3 A2
45 45 0
0 5 80
45 53 5
0 3 75
A 1 A 3 A2 A4
A 1 45 45 0 0
A 3 45 53 5 3
A2 0 5 80 75
A4 0 3 75 78
TA
Ai
Ai+1
...
BA
Am
CTQ∗=CBQ−=COQ2
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 5. 52
VF – Algorithm
Two problems :
❶ Cluster forming in the middle of the CA matrix
➠ Shift a row up and a column left and apply the algorithm
to find the “best” partitioning point
➠ Do this for all possible shifts
➠ Cost O(m2)
❷ More than two clusters
➠ m-way partitioning
➠ try 1, 2, …, m–1 split points along diagonal and try to find
the best point for each of these
➠ Cost O(2m)
■ Reconstruction
➠ Reconstruction can be achieved by
R = K Ri ∀Ri ∈FR
■ Disjointness
➠ TID's are not considered to be overlapping since they are maintained
by the system
➠ Duplicated keys are not considered to be overlapping
R
●
HF HF
R1 R2
● ●
VF VF VF VF VF
● ● ● ● ●
R11 R12 R21 R22 R23
Decision Variable
acknowledgment cost
all sites all fragments
➠ Retrieval Cost
■ Heuristics based on
➠ single commodity warehouse location (for FAP)
➠ knapsack problem
➠ branch and bound techniques
➠ network flow