Chapter 5 Distributed Database Design

- Design of a distributed computer system involves making decisions on the
placement of data and programs across the sites of a computer network

- This course concentrates on distribution of data

Alternative Design Strategies


- Top-Down Design Process (Refer text page 104)
- Bottom-Up Design Process (Refer text page 106)

Reasons for Fragmentation

- A relation is not a suitable unit of distribution because application views are
usually subsets of relations. Subsets of relations are therefore more suitable as
units of distribution.

- If a whole relation is the unit of distribution, it is either not replicated
(causing a high volume of remote data accesses) or replicated at all or some sites
(unnecessary replication causes update and storage problems)

- Increases concurrency and system throughput (a query can be divided into
subqueries that operate on fragments and execute in parallel)

Disadvantages of fragmentation

- Performance degradation, when applications prevent the decomposition of the
relation into mutually exclusive fragments and application views are defined on
more than one fragment (data must then be retrieved from several fragments and
combined)

- Difficulty in semantic data control (integrity checking), because fragmentation
allocates attributes to different sites.

Fragmentation

PROJ
PNO  PNAME  BUDGET   LOC
P1   x      150 000  Montreal
P2   y      135 000  New York
P3   z      250 000  New York

Horizontal

PROJ1
PNO  PNAME  BUDGET   LOC
P1   x      150 000  Montreal
P2   y      135 000  New York

PROJ2
PNO  PNAME  BUDGET   LOC
P3   z      250 000  New York

Vertical

PROJ1
PNO  BUDGET
P1   150 000
P2   135 000
P3   250 000

PROJ2
PNO  PNAME  LOC
P1   x      Montreal
P2   y      New York
P3   z      New York

Note: Primary key (PNO) is included in both fragments
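
The two kinds of fragmentation shown above correspond to selection (horizontal) and projection (vertical). The following is a minimal Python sketch of that idea, not a definitive implementation. The helper names `select` and `project` are illustrative, and the predicate BUDGET < 200 000 is an assumption: the notes do not state which predicate was used, this is simply one choice that reproduces the horizontal fragments in the example.

```python
# PROJ relation from the example above, one dict per tuple.
PROJ = [
    {"PNO": "P1", "PNAME": "x", "BUDGET": 150_000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "y", "BUDGET": 135_000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "z", "BUDGET": 250_000, "LOC": "New York"},
]


def select(relation, predicate):
    """Horizontal step: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Vertical step: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


# ASSUMPTION: the predicate is not given in the notes; BUDGET < 200000
# happens to reproduce the horizontal fragments PROJ1 and PROJ2 above.
h1 = select(PROJ, lambda t: t["BUDGET"] < 200_000)
h2 = select(PROJ, lambda t: t["BUDGET"] >= 200_000)

# Vertical fragments; the primary key PNO is kept in both.
v1 = project(PROJ, ["PNO", "BUDGET"])
v2 = project(PROJ, ["PNO", "PNAME", "LOC"])
```

Keeping PNO in both vertical fragments is what later allows the relation to be rebuilt by joining the fragments on the key.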


Degree of Fragmentation
Ranges from not fragmenting at all to fragmenting down to individual tuples (horizontal) or individual attributes (vertical)

Correctness Rules of Fragmentation


- To ensure the database does not undergo semantic change during fragmentation

Completeness
If a relation R is decomposed into fragments R1, R2,…Rn, each data item that can be
found in R can also be found in one or more Ri. For horizontal fragmentation, item =
tuple and for vertical fragmentation, item = attribute

Reconstruction
If a relation R is decomposed into fragments R1, R2, …, Rn, it should be possible to
define a relational operator Δ such that

R = Δ Ri, taken over all fragments Ri (1 ≤ i ≤ n)

Disjointness
If a relation R is horizontally decomposed into fragments R1, R2, …, Rn and a data item
d is in Rj, then d is not in any other fragment Rk (k ≠ j).
For vertical fragmentation, the primary key is repeated in all fragments, so
disjointness is defined only on the non-primary-key attributes.
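
The three rules can be checked mechanically for a horizontal fragmentation. The sketch below assumes relations are modelled as Python sets of tuples and that the reconstruction operator is union; the function names are illustrative only.

```python
# PROJ and its horizontal fragments from the example, as sets of tuples.
R = {
    ("P1", "x", 150_000, "Montreal"),
    ("P2", "y", 135_000, "New York"),
    ("P3", "z", 250_000, "New York"),
}
R1 = {("P1", "x", 150_000, "Montreal"), ("P2", "y", 135_000, "New York")}
R2 = {("P3", "z", 250_000, "New York")}


def reconstruct(fragments):
    """Reconstruction for horizontal fragments: the union of all fragments."""
    return set().union(*fragments)


def is_complete(relation, fragments):
    """Completeness: every tuple of the relation is in some fragment."""
    return relation <= reconstruct(fragments)


def is_disjoint(fragments):
    """Disjointness: no tuple occurs in more than one fragment."""
    return sum(len(f) for f in fragments) == len(reconstruct(fragments))
```

For example, `is_disjoint([R1, R1 | R2])` is false because P1 and P2 would then be stored in two fragments.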

Allocation Alternatives

- Nonreplicated
  - Only one copy of any fragment on the network

- Replicated
  - Fully replicated: the whole database is stored at every site
  - Partially replicated: copies of a fragment are stored at some of the sites

Horizontal Fragmentation

Information Requirements

1) Database Information
- Concerns the global conceptual schema
- How relations are connected to one another (ER Diagram)

2) Application Information

Qualitative
- Determine the most important predicates used in user queries

- Simple predicates, e.g. SAL > 20 000, TITLE = "Programmer"

- Minterm predicates: conjunctions of simple predicates, each taken in its natural
or negated form, e.g. SAL > 20 000 ∧ TITLE = "Programmer"

Quantitative
Min term selectivity
- Number of tuples accessed by a query specified according to a given minterm
predicate

Access frequency
- Access frequency of a query in a given period
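
To make the qualitative and quantitative information concrete, here is a small sketch that enumerates all minterm predicates over the two simple predicates above and counts the tuples each one selects. The EMP tuples are hypothetical and the names `minterms` and `selectivity` are illustrative; selectivity is taken here simply as a tuple count.

```python
from itertools import product

# Hypothetical EMP tuples, invented for illustration only.
EMP = [
    {"ENO": "E1", "SAL": 25_000, "TITLE": "Programmer"},
    {"ENO": "E2", "SAL": 18_000, "TITLE": "Programmer"},
    {"ENO": "E3", "SAL": 40_000, "TITLE": "Analyst"},
    {"ENO": "E4", "SAL": 30_000, "TITLE": "Programmer"},
]

# The two simple predicates used as examples in the notes.
simple_predicates = {
    "SAL > 20000": lambda t: t["SAL"] > 20_000,
    'TITLE = "Programmer"': lambda t: t["TITLE"] == "Programmer",
}


def minterms(predicates):
    """Yield (label, test) for each conjunction of the simple predicates,
    every predicate taken either as-is or negated (2**n minterms)."""
    names = list(predicates)
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(
            n if s else f"NOT({n})" for n, s in zip(names, signs)
        )

        def test(t, signs=signs):
            return all(predicates[n](t) == s for n, s in zip(names, signs))

        yield label, test


def selectivity(relation, test):
    """Number of tuples that satisfy the given minterm predicate."""
    return sum(1 for t in relation if test(t))


counts = {label: selectivity(EMP, test)
          for label, test in minterms(simple_predicates)}
```

Because every tuple satisfies exactly one minterm, the counts add up to the size of the relation.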

Primary Horizontal Fragmentation
- Defined by a selection operation on the owner relations of a database schema

Ri = σ_Fi(R), 1 ≤ i ≤ w

where Fi is the selection formula (a minterm predicate) that defines fragment Ri

1) Determine a set of simple predicates, Pr, that is complete and minimal

A set of simple predicates Pr is said to be:

Complete
If and only if there is an equal probability of access by every application to
any tuple belonging to any minterm fragment defined according to Pr

Minimal
If all the predicates of the set Pr are relevant, i.e. each one splits a fragment
into pieces that at least one application accesses differently

2) Derive the set of minterm predicates from the predicates in Pr. These minterm
predicates determine the fragments used as candidates in the allocation step.

3) Eliminate meaningless minterm fragments, i.e. those whose minterm predicates are contradictory.
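
The steps above can be sketched by grouping tuples according to which simple predicates they satisfy: each distinct combination of truth values is one minterm. This is only an illustration on the PROJ data from these notes, with two assumed simple predicates on LOC. Minterms that no tuple satisfies never produce a fragment, which is a data-driven stand-in for step 3 (the elimination is normally argued from implications between predicates, not from the current data).

```python
PROJ = [
    {"PNO": "P1", "PNAME": "x", "BUDGET": 150_000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "y", "BUDGET": 135_000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "z", "BUDGET": 250_000, "LOC": "New York"},
]

# ASSUMPTION: illustrative simple predicates p1 and p2 on LOC.
Pr = [
    lambda t: t["LOC"] == "Montreal",   # p1
    lambda t: t["LOC"] == "New York",   # p2
]


def primary_horizontal_fragments(relation, predicates):
    """Group tuples by the truth value of every simple predicate.

    Each distinct truth vector corresponds to one minterm predicate, so
    every tuple lands in exactly one fragment.
    """
    fragments = {}
    for t in relation:
        signature = tuple(p(t) for p in predicates)
        fragments.setdefault(signature, []).append(t)
    return fragments


frags = primary_horizontal_fragments(PROJ, Pr)
```

Here the minterm p1 ∧ p2 (a project located in both cities) is contradictory, so no fragment is created for it.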


Derived Horizontal Fragmentation

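Defined on a member relation according to a selection operation specified on its owner
relation

Ri = R ⋉ Si, 1 ≤ i ≤ w, where Si = σ_Fi(S)

Here R is the member relation, S is the owner relation, ⋉ is the semijoin and w is
the number of fragments.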

Refer example 5.12
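
A derived horizontal fragment is a semijoin of the member relation with one fragment of the owner. The sketch below uses hypothetical relations PAY (owner) and EMP (member) joined on TITLE; the data, the salary threshold and the helper name `semijoin` are all assumptions made for illustration.

```python
# Hypothetical owner relation PAY and member relation EMP.
PAY = [
    {"TITLE": "Programmer", "SAL": 24_000},
    {"TITLE": "Analyst", "SAL": 34_000},
    {"TITLE": "Engineer", "SAL": 40_000},
]
EMP = [
    {"ENO": "E1", "TITLE": "Engineer"},
    {"ENO": "E2", "TITLE": "Analyst"},
    {"ENO": "E3", "TITLE": "Programmer"},
    {"ENO": "E4", "TITLE": "Programmer"},
]


def semijoin(member, owner_fragment, attr):
    """Member tuples whose join attribute value occurs in the owner fragment."""
    keys = {s[attr] for s in owner_fragment}
    return [t for t in member if t[attr] in keys]


# Primary horizontal fragmentation of the owner (assumed threshold).
PAY1 = [s for s in PAY if s["SAL"] <= 30_000]
PAY2 = [s for s in PAY if s["SAL"] > 30_000]

# Derived horizontal fragmentation of the member.
EMP1 = semijoin(EMP, PAY1, "TITLE")
EMP2 = semijoin(EMP, PAY2, "TITLE")
```

Each EMP fragment ends up with exactly the employees whose title belongs to the matching PAY fragment, so a join between EMP and PAY can be evaluated fragment by fragment.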

When there is more than one possible derived horizontal fragmentation, the
candidate fragmentation is chosen according to 2 criteria:

Refer figure 5.7

1) The fragmentation used in more applications

- Try to facilitate the accesses of heavy users to improve system performance

2) The fragmentation with better join characteristics

- Query execution is faster when the join is performed on smaller relations
- System throughput improves when the query can be executed in parallel

Checking for the correctness rules of fragmentation

Completeness
- Primary horizontal fragmentation
Fragmentation is complete if the selection predicates are complete

- Derived horizontal fragmentation
Let R be the member relation,
S be the owner relation,
A be the join attribute.
Then for each tuple t of R, there should be a tuple t’ of S such that
t[A] = t’[A] (referential integrity)

Reconstruction
- Reconstruction of a global relation from its fragments is performed by the union
operator for primary and derived horizontal fragmentation

Disjointness
- Primary horizontal fragmentation
Disjointness is guaranteed if the minterm predicates are mutually exclusive

- Derived horizontal fragmentation
Disjointness is guaranteed if the join graph is simple
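
For a derived fragmentation, completeness and disjointness can be tested directly on the data. This is a hedged sketch with hypothetical PAY and EMP relations; the function names are illustrative, and the disjointness test checks the observable consequence (no member tuple matches more than one owner fragment) rather than the join graph itself.

```python
def derived_is_complete(member, owner, attr):
    """Every member tuple has an owner tuple with the same join value."""
    owner_keys = {s[attr] for s in owner}
    return all(t[attr] in owner_keys for t in member)


def derived_is_disjoint(member, owner_fragments, attr):
    """No member tuple matches tuples in more than one owner fragment."""
    for t in member:
        hits = sum(
            1 for frag in owner_fragments
            if any(s[attr] == t[attr] for s in frag)
        )
        if hits > 1:
            return False
    return True


# Hypothetical data for illustration only.
PAY1 = [{"TITLE": "Programmer", "SAL": 24_000}]
PAY2 = [{"TITLE": "Analyst", "SAL": 34_000}]
EMP = [{"ENO": "E1", "TITLE": "Analyst"}, {"ENO": "E2", "TITLE": "Programmer"}]
DANGLING = EMP + [{"ENO": "E9", "TITLE": "Technician"}]  # no matching PAY tuple
```

The tuple E9 has no owner, so it would be lost from every derived fragment, which is exactly the situation the completeness rule excludes.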

Vertical Fragmentation
Objective
- Partition a relation into smaller relations so that many of the user applications will
run on only one fragment

- Minimize the execution time of user applications that run on the fragments: user
queries deal with smaller relations, which causes fewer page accesses

There are 2 heuristic approaches to vertical fragmentation:

1) Grouping
- Start by assigning each attribute to its own fragment, and at each step join some of
the fragments until some criterion is satisfied

- Results in overlapping fragments

2) Splitting
- Start with the whole relation and decide on beneficial partitionings based on the
access behavior of applications to the attributes

- Results in non-overlapping fragments

Information Requirements of Vertical Fragmentation
- Vertical partitioning places in one fragment those attributes usually accessed
together

- Attribute usage value:

use(qi, Aj) = 1 if attribute Aj is referenced by query qi
            = 0 otherwise

Refer to example 5.15


Note: Attribute usage matrix
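
An attribute usage matrix is just the use(qi, Aj) values laid out with one row per query. The sketch below builds one for PROJ, with A1..A4 = PNO, PNAME, BUDGET, LOC. The four queries and the attribute sets they reference are assumptions chosen for illustration; they are not taken from these notes.

```python
ATTRIBUTES = ["PNO", "PNAME", "BUDGET", "LOC"]  # A1, A2, A3, A4

# ASSUMPTION: attributes referenced by four illustrative queries.
QUERIES = {
    "q1": {"PNO", "BUDGET"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def usage_matrix(queries, attributes):
    """use(qi, Aj) = 1 if query qi references attribute Aj, else 0."""
    return {
        q: [1 if a in refs else 0 for a in attributes]
        for q, refs in queries.items()
    }


use = usage_matrix(QUERIES, ATTRIBUTES)
```

Each row can be read as the footprint of one query on the relation.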

- Attribute usage values are not sufficient for attribute splitting and fragmentation as
they do not represent the weight of application frequencies. Therefore, we need to
form Attribute Affinity

Refer to example 5.16


Note: Attribute Affinity Matrix
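
The affinity of two attributes is the total access frequency of the queries that reference both of them. The sketch below is a minimal illustration: the usage rows and the per-query access frequencies (summed over all sites) are assumptions, chosen so that the result coincides with the affinity values used in the clustering example in these notes.

```python
# ASSUMPTION: usage rows for q1..q4 over A1..A4 (PNO, PNAME, BUDGET, LOC).
USE = [
    [1, 0, 1, 0],  # q1
    [0, 1, 1, 0],  # q2
    [0, 1, 0, 1],  # q3
    [0, 0, 1, 1],  # q4
]
# ASSUMPTION: total access frequency of each query over all sites.
FREQ = [45, 5, 75, 3]


def affinity_matrix(use, freq):
    """aff(Ai, Aj) = sum of the frequencies of queries using both Ai and Aj."""
    n = len(use[0])
    aff = [[0] * n for _ in range(n)]
    for row, f in zip(use, freq):
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aff[i][j] += f
    return aff


AA = affinity_matrix(USE, FREQ)
```

The matrix is symmetric, and its diagonal entry for Ai is the total frequency of all queries that touch Ai.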

Clustering Algorithm
- The bond energy algorithm is used to group the attributes based on attribute affinity
values
- The bond energy algorithm takes as input the attribute affinity matrix and permutes its
rows and columns to generate a Clustered Affinity (CA) matrix in 3 steps

1) Initialization
Place the first two columns of the attribute affinity matrix into the CA matrix:

    A1  A2
A1  45   0
A2   0  80
A3  45   5
A4   0  75

2) Iteration
Place column A3. The contribution of placing Ak between Ai and Aj is
cont(Ai, Ak, Aj) = 2bond(Ai, Ak) + 2bond(Ak, Aj) - 2bond(Ai, Aj)

cont(A1, A2, A3) = 2bond(A1, A2) + 2bond(A2, A3) - 2bond(A1, A3)
= 2*225 + 2*890 - 2*4410 = -6590

cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) - 2bond(A1, A2)
= 2*4410 + 2*890 - 2*225 = 10150

cont(A3, A1, A2) = 2bond(A3, A1) + 2bond(A1, A2) - 2bond(A3, A2)
= 2*4410 + 2*225 - 2*890 = 7490

Since the ordering (1-3-2) has the largest contribution, A3 is placed between A1 and A2:

    A1  A3  A2
A1  45  45   0
A2   0   5  80
A3  45  53   5
A4   0   3  75
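
The bond and contribution values used in the iteration can be recomputed from the affinity matrix. The sketch below assumes the usual definition of bond as the sum, over all rows, of the product of the two columns' affinity values; the full 4 x 4 matrix is written out, with the A4 column taken from the later steps of these notes.

```python
# Attribute affinity matrix, rows and columns in the order A1, A2, A3, A4.
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
A1, A2, A3, A4 = range(4)  # 0-based column indices


def bond(aa, x, y):
    """bond(Ax, Ay) = sum over all rows z of aff(Az, Ax) * aff(Az, Ay)."""
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    """Net contribution of placing Ak between Ai and Aj."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


contributions = {
    "1-2-3": cont(AA, A1, A2, A3),
    "1-3-2": cont(AA, A1, A3, A2),
    "3-1-2": cont(AA, A3, A1, A2),
}
```

Running it gives bond(A1, A2) = 225, bond(A2, A3) = 890 and bond(A1, A3) = 4410, and the ordering 1-3-2 has the largest contribution.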

Continue with column A4

cont(A3, A2, A4) = 2bond(A3, A2) + 2bond(A2, A4) - 2bond(A3, A4)
= 2*890 + 2*11865 - 2*768 = 23974

cont(A3, A4, A2) = 2bond(A3, A4) + 2bond(A4, A2) - 2bond(A3, A2)
= 2*768 + 2*11865 - 2*890 = 23486

cont(A4, A3, A2) = 2bond(A4, A3) + 2bond(A3, A2) - 2bond(A4, A2)
= 2*768 + 2*890 - 2*11865 = -20414

Since the ordering (3-2-4) has the largest contribution, A4 is placed after A2:

    A1  A3  A2  A4
A1  45  45   0   0
A2   0   5  80  75
A3  45  53   5   3
A4   0   3  75  78

3) Row ordering
The rows are arranged in the same order as the columns:

    A1  A3  A2  A4
A1  45  45   0   0
A3  45  53   5   3
A2   0   5  80  75
A4   0   3  75  78
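
The three steps can be put together in a short sketch. This version follows the usual boundary convention in which a missing neighbour at either end contributes a bond of 0, so it tries every insertion position including the two ends. That evaluates the end positions slightly differently from the three orderings enumerated in the worked example, but it arrives at the same column order here. Function names are illustrative.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """Bond of two columns; a missing neighbour (None) contributes 0."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    """Net contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea_order(aa):
    """Column order produced by initialization plus iteration."""
    order = [0, 1]                       # 1) initialization: first two columns
    for k in range(2, len(aa)):          # 2) iteration: place remaining columns
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, k, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, k)
    return order


def clustered(aa, order):
    """3) row ordering: permute rows and columns with the same order."""
    return [[aa[r][c] for c in order] for r in order]


order = bea_order(AA)
CA = clustered(AA, order)
```

With 0-based indices, the resulting order [0, 2, 1, 3] corresponds to A1, A3, A2, A4.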

- Based on the Clustered Affinity Matrix, we have 2 fragments

- When the partition algorithm is applied to the CA matrix obtained from relation
PROJ, the result is the definition of fragments FPROJ = {PROJ1, PROJ2}, where
PROJ1 = {A1, A3} and PROJ2 = {A1, A2, A4}

Thus
PROJ1= {PNO, BUDGET}
PROJ2= {PNO, PNAME, LOC}
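
One common way to choose the split point along the diagonal of the CA matrix is to maximize z = CTQ * CBQ - COQ^2, where CTQ, CBQ and COQ are the total access frequencies of queries that use only the top group, only the bottom group, or both. The sketch below is an illustration under assumptions: the queries and their frequencies are hypothetical, and the notes themselves do not spell out this objective.

```python
ORDER = ["A1", "A3", "A2", "A4"]  # column order of the CA matrix

# ASSUMPTION: attributes used by each query and its total access frequency.
QUERIES = {
    "q1": ({"A1", "A3"}, 45),
    "q2": ({"A2", "A3"}, 5),
    "q3": ({"A2", "A4"}, 75),
    "q4": ({"A3", "A4"}, 3),
}


def split_quality(order, queries, point):
    """z = CTQ * CBQ - COQ**2 for a split after the first `point` columns."""
    top, bottom = set(order[:point]), set(order[point:])
    ctq = sum(f for attrs, f in queries.values() if attrs <= top)
    cbq = sum(f for attrs, f in queries.values() if attrs <= bottom)
    coq = sum(
        f for attrs, f in queries.values()
        if not attrs <= top and not attrs <= bottom
    )
    return ctq * cbq - coq ** 2


scores = {p: split_quality(ORDER, QUERIES, p) for p in range(1, len(ORDER))}
best = max(scores, key=scores.get)
groups = (ORDER[:best], ORDER[best:])
```

Under these assumptions the best split is after the second column, giving the groups {A1, A3} and {A2, A4}; the key A1 (PNO) is then added to the second group so that the fragments can be joined back together.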

Hybrid / Mixed / Nested Fragmentation


- Sometimes a simple horizontal or vertical fragmentation of a database is not
sufficient to satisfy the requirements of user applications

- We may have a vertical fragmentation followed by a horizontal fragmentation, or
vice versa

Refer to figure 5.19

- To reconstruct the original global relation in the case of hybrid fragmentation, start at
the leaves of the fragmentation tree and move upward, performing joins and unions

Refer to figure 5.20
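
Bottom-up reconstruction can be sketched as follows. The example assumes PROJ was first split vertically on PNO and that the BUDGET fragment was then split horizontally at 200 000; that particular fragmentation tree and the helper names `union` and `join` are assumptions made for illustration.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "x", "BUDGET": 150_000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "y", "BUDGET": 135_000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "z", "BUDGET": 250_000, "LOC": "New York"},
]

# Leaves of the assumed fragmentation tree.
V1_LOW = [{"PNO": "P1", "BUDGET": 150_000}, {"PNO": "P2", "BUDGET": 135_000}]
V1_HIGH = [{"PNO": "P3", "BUDGET": 250_000}]
V2 = [{"PNO": t["PNO"], "PNAME": t["PNAME"], "LOC": t["LOC"]} for t in PROJ]


def union(*fragments):
    """Undo a horizontal split: merge fragments, dropping duplicates."""
    out = []
    for fragment in fragments:
        for t in fragment:
            if t not in out:
                out.append(t)
    return out


def join(left, right, key):
    """Undo a vertical split: natural join of two fragments on the key."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Start at the leaves (union), then move up to the root (join).
V1 = union(V1_LOW, V1_HIGH)
rebuilt = join(V1, V2, "PNO")
```

The order of operations mirrors the tree: the horizontal split was applied last, so it is undone first.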