
Data Fragmentation

Correctness Rules for Data Fragmentation


• Completeness
If a relation instance R is decomposed into fragments R1, R2, …, Rn, each
data item in R must appear in at least one of the fragments Ri.
• Reconstruction
If a relation instance R is decomposed into fragments R1, R2, …, Rn, it
must be possible to define a relational operation that reconstructs
the relation R from the fragments R1, R2, …, Rn.
• Disjointness
If a relation instance R is decomposed into fragments R1, R2, …, Rn and a data
item is found in the fragment Ri, then it must not appear in any other
fragment. (In vertical fragmentation, the primary key attributes must be
repeated in every fragment to allow reconstruction, so disjointness is
defined only on the non-primary-key attributes.)
Different Types of Fragmentation

• Horizontal fragmentation
• Vertical Fragmentation
Horizontal fragmentation

• Horizontal fragmentation partitions a relation along its tuples.


• A horizontal fragment is produced by specifying a predicate that
performs a restriction on the tuples of a relation.

• In this fragmentation, the predicate is defined by using the
selection operation of the relational algebra.

• For a given relation R, a horizontal fragment is defined as

σp(R)

where p is a predicate based on one or more attributes of the
relation R.
Project-id  Project-name  Project-type  Project-leader-id  Branch-no  Amount
P01         Inventory     Inside        E001               B10        $1000000
P02         Sales         Inside        E001               B20        $300000
P03         R&D           Abroad        E004               B70        $8000000
P04         Educational   Inside        E003               B20        $400000
P05         Health        Abroad        E005               B60        $7000000

Table: Project
• P1: σ project-type = “Inside” (Project)

Project-id  Project-name  Project-type  Project-leader-id  Branch-no  Amount
P01         Inventory     Inside        E001               B10        $1000000
P02         Sales         Inside        E001               B20        $300000
P04         Educational   Inside        E003               B20        $400000


• P2: σ project-type = “Abroad” (Project)

Project-id  Project-name  Project-type  Project-leader-id  Branch-no  Amount
P03         R&D           Abroad        E004               B70        $8000000
P05         Health        Abroad        E005               B60        $7000000
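The two fragments above, and the three correctness rules, can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the helper name `select` are assumptions made for the example, and tuples are modelled as plain dictionaries.

```python
# Project relation from the table above; each tuple is a dict.
# Key names (project_id, type, ...) are illustrative assumptions.
PROJECT = [
    {"project_id": "P01", "name": "Inventory", "type": "Inside",
     "leader_id": "E001", "branch_no": "B10", "amount": 1000000},
    {"project_id": "P02", "name": "Sales", "type": "Inside",
     "leader_id": "E001", "branch_no": "B20", "amount": 300000},
    {"project_id": "P03", "name": "R&D", "type": "Abroad",
     "leader_id": "E004", "branch_no": "B70", "amount": 8000000},
    {"project_id": "P04", "name": "Educational", "type": "Inside",
     "leader_id": "E003", "branch_no": "B20", "amount": 400000},
    {"project_id": "P05", "name": "Health", "type": "Abroad",
     "leader_id": "E005", "branch_no": "B60", "amount": 7000000},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Horizontal fragments P1 and P2.
p1 = select(PROJECT, lambda t: t["type"] == "Inside")
p2 = select(PROJECT, lambda t: t["type"] == "Abroad")

# Completeness: every tuple of PROJECT appears in some fragment.
complete = all(t in p1 or t in p2 for t in PROJECT)

# Disjointness: no tuple appears in both fragments.
disjoint = not any(t in p2 for t in p1)

# Reconstruction: the union of the fragments gives back PROJECT.
rebuilt = sorted(p1 + p2, key=lambda t: t["project_id"])
reconstructs = rebuilt == PROJECT

print([t["project_id"] for t in p1])
print([t["project_id"] for t in p2])
print(complete, disjoint, reconstructs)
```

For horizontal fragments the reconstruction operator is the union, which is why the check simply concatenates the fragments and restores the original order.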


Vertical Fragmentation

• Vertical fragmentation partitions a relation along its attributes.
• A vertical fragmentation is defined by using the projection
operation of relational algebra.
• For a given relation R, a vertical fragment is defined as

∏a1, a2, …, an (R)

where a1, a2, …, an are attributes of the relation R.


Two approaches have been identified for attribute partitioning in
vertical fragmentation.
• Grouping
Grouping starts by assigning each attribute to its own fragment and,
at each step, joins some of the fragments until some criterion is
satisfied.
• Splitting
Splitting starts with a relation and decides on beneficial
partitionings based on the access behavior of applications to the
attributes.
One way to do this is to create a matrix that shows the number of
accesses that refer to each attribute pair.
For example, a transaction that accesses attributes a1, a2, a3 and
a4 of relation R can be represented by the following matrix (only the
upper triangle is shown, since the matrix is symmetric):

      a1  a2  a3  a4
a1         1   0   1
a2             0   1
a3                 0
a4

Pairs with high affinity should appear in the same vertical fragment and
pairs with low affinity may appear in different fragments.
PNO  PNAME             BUDGET  LOC
P1   Instrumentation   150000  Montreal
P2   Database develop  135000  New York
P3   CAD/CAM           250000  New York
P4   Maintenance       310000  Paris

Table: Project
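As a small illustration of projection-based fragments on this table, the sketch below splits the relation into two vertical fragments and rebuilds it with a join on the key. The particular split ({PNO, BUDGET} and {PNO, PNAME, LOC}) is an assumption chosen for the example, as are the helper names `project` and `join_on_key`.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database develop", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]


def project(relation, attrs):
    """Relational projection onto attrs (duplicates removed, order kept)."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def join_on_key(r1, r2, key):
    """Join two fragments on a shared key attribute."""
    index = {t[key]: t for t in r2}
    return [{**t, **index[t[key]]} for t in r1 if t[key] in index]


# Assumed split: the key PNO is repeated in both fragments.
ATTRS_1 = ["PNO", "BUDGET"]
ATTRS_2 = ["PNO", "PNAME", "LOC"]
proj1 = project(PROJ, ATTRS_1)
proj2 = project(PROJ, ATTRS_2)

# Disjointness holds on non-key attributes only: the overlap is the key.
shared = set(ATTRS_1) & set(ATTRS_2)

# Reconstruction: join the fragments on the primary key.
rebuilt = join_on_key(proj1, proj2, "PNO")

print(shared)
print(rebuilt == PROJ)
```

Repeating PNO in both fragments is what makes the join possible, which is why disjointness for vertical fragments is checked only on the remaining attributes.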
VF – Information Requirements
• Application Information
  – Attribute affinities
    • A measure that indicates how closely related the attributes are
    • This is obtained from more primitive usage data
  – Attribute usage values
    • Given a set of queries Q = {q1, q2, …, qq} that will run on the
      relation R[A1, A2, …, An],

      use(qi, Aj) = 1 if attribute Aj is referenced by query qi
                    0 otherwise

      use(qi, •) can be defined accordingly


VF – Definition of use(qi,Aj)
Consider the following 4 queries for relation PROJ

q1: SELECT BUDGET
    FROM PROJ
    WHERE PNO=Value

q2: SELECT PNAME, BUDGET
    FROM PROJ

q3: SELECT PNAME
    FROM PROJ
    WHERE LOC=Value

q4: SELECT SUM(BUDGET)
    FROM PROJ
    WHERE LOC=Value

Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC

     A1  A2  A3  A4
q1    1   0   1   0
q2    0   1   1   0
q3    0   1   0   1
q4    0   0   1   1
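The usage matrix above can be derived mechanically once the attributes referenced by each query are listed. The sketch below does this; the attribute sets were read off the four queries by hand, and the names `QUERY_ATTRS` and `use_matrix` are assumptions for the example.

```python
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]  # A1..A4

# Attributes referenced by each query (SELECT list plus WHERE clause).
QUERY_ATTRS = {
    "q1": {"PNO", "BUDGET"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def use_matrix(query_attrs, attrs):
    """use(qi, Aj) = 1 if Aj is referenced by qi, 0 otherwise."""
    return {
        q: [1 if a in refs else 0 for a in attrs]
        for q, refs in query_attrs.items()
    }


USE = use_matrix(QUERY_ATTRS, ATTRS)
for q, row in USE.items():
    print(q, row)
```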
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/13
VF – Affinity Measure aff(Ai,Aj)
The attribute affinity measure between two attributes Ai and Aj of a relation
R[A1, A2, …, An] with respect to the set of applications Q = {q1, q2, …, qq} is
defined as follows:

aff(Ai, Aj) = Σ (query access),
              summed over all queries that access both Ai and Aj

query access = Σ (access frequency of a query × accesses per execution),
               summed over all sites



VF – Calculation of aff(Ai, Aj)

Assume each query in the previous example accesses the attributes once
during each execution.
Also assume the following access frequencies at sites S1, S2 and S3:

     S1  S2  S3
q1   15  20  10
q2    5   0   0
q3   25  25  25
q4    3   0   0

Then, since q1 is the only query that accesses both A1 and A3,

aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45

and the attribute affinity matrix AA is

     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78
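The calculation above can be reproduced with a short sketch that sums, for every attribute pair, the total access count of each query that uses both attributes. The function name `affinity_matrix` and the parameter `accesses_per_execution` are assumptions for illustration; the default of 1 reflects the stated assumption that each query accesses the attributes once per execution.

```python
# use(qi, Aj) for q1..q4 over A1..A4, as in the usage matrix.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# Access frequencies of q1..q4 at sites S1, S2, S3.
FREQ = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]


def affinity_matrix(use, freq, accesses_per_execution=1):
    """Build AA: aff(Ai, Aj) sums the accesses of queries using both."""
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for q, row in enumerate(use):
        # Total accesses of query q over all sites.
        query_access = sum(freq[q]) * accesses_per_execution
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += query_access
    return aa


AA = affinity_matrix(USE, FREQ)
for row in AA:
    print(row)
```

The diagonal entries follow from the same rule: aff(Ai, Ai) is the total access count of all queries that reference Ai.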



VF – Clustering Algorithm
• Take the attribute affinity matrix AA and reorganize the attribute orders
to form clusters where the attributes in each cluster demonstrate high
affinity to one another.
• Bond Energy Algorithm (BEA) has been used for clustering of entities.
BEA finds an ordering of entities (in our case attributes) such that the
global affinity measure is maximized.

AM = Σi Σj (affinity of Ai and Aj with their neighbors)



Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA which is a perturbation of AA
1. Initialization: Place and fix one of the columns of AA in CA.
2. Iteration: Place the remaining n-i columns in the remaining i+1 positions
in the CA matrix. For each column, choose the placement that makes the
most contribution to the global affinity measure.
3. Row order: Order the rows according to the column ordering.



• The bonding relationship of two rows is defined as follows

bond(A2, A3) = (c1RowA2 * c1RowA3) + (c2RowA2 * c2RowA3) +
               (c3RowA2 * c3RowA3)

where ckRowAx is the entry of row Ax in column k of the matrix below.

     A2  A3  A4
A2   80   5  75
A3    5  53   3
A4   75   3  78
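As a worked instance of the bond definition, the sketch below multiplies the rows of the 3×3 matrix above entry by entry and sums the products. The names `SUB` and `bond` are assumptions for the example, and only the three columns shown on this slide are used.

```python
# Rows of the 3x3 matrix above, keyed by attribute name.
SUB = {
    "A2": [80, 5, 75],
    "A3": [5, 53, 3],
    "A4": [75, 3, 78],
}


def bond(rows, x, y):
    """Sum of column-wise products of rows x and y."""
    return sum(a * b for a, b in zip(rows[x], rows[y]))


# 80*5 + 5*53 + 75*3
print(bond(SUB, "A2", "A3"))
# 80*75 + 5*3 + 75*78
print(bond(SUB, "A2", "A4"))
```

On these values the bond between A2 and A4 is far larger than the bond between A2 and A3, which is consistent with A2 and A4 having the highest pairwise affinity in the matrix.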



The contribution of placing column Ak between columns Ai and Aj is

cont(Ai, Ak, Aj) = bond(Ai, Ak) + bond(Ak, Aj) – bond(Ai, Aj)
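Putting the pieces together, the following is a minimal sketch of the Bond Energy Algorithm on the full 4×4 AA matrix computed earlier, using cont(Ai, Ak, Aj) = bond(Ai, Ak) + bond(Ak, Aj) – bond(Ai, Aj) with Ak placed between Ai and Aj. Several details are assumptions made for the example rather than something the slides fix: A1 is the column placed first, the remaining columns are considered in index order, a missing neighbor at either boundary contributes a bond of 0, and ties are broken in favor of the rightmost position.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """Bond of columns x and y; a missing neighbor (None) gives 0."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return bond(aa, i, k) + bond(aa, k, j) - bond(aa, i, j)


def bea_order(aa):
    """Return a column ordering produced by greedy best placement."""
    order = [0]  # Initialization: fix the first column (assumed: A1).
    for k in range(1, len(aa)):
        best_pos, best_cont = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            c = cont(aa, left, k, right)
            # ">=" breaks ties toward the rightmost position (assumed).
            if best_cont is None or c >= best_cont:
                best_pos, best_cont = pos, c
        order.insert(best_pos, k)
    return order


order = bea_order(AA)
# Row order follows the column order to give the clustered matrix CA.
CA = [[AA[r][c] for c in order] for r in order]

print(["A%d" % (i + 1) for i in order])
for row in CA:
    print(row)
```

Under these assumptions the ordering comes out as A1, A3, A2, A4, so the large values gather in two blocks along the diagonal of CA: one for A1 and A3, the other for A2 and A4.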
