Fragmentation Newb

Data Fragmentation
Correctness Rules for Data Fragmentation

• Completeness
If a relation instance R is decomposed into fragments R1, R2, …, Rn, each
data item in R must appear in at least one of the fragments Ri.
• Reconstruction
If a relation instance R is decomposed into fragments R1, R2, …, Rn, it
must be possible to define a relational operation that reconstructs
the relation R from the fragments R1, R2, …, Rn (typically union for
horizontal fragments and join for vertical fragments).
• Disjointness
If a relation instance R is decomposed into fragments R1, R2, …, Rn, and a data
item is found in fragment Ri, then it must not appear in any other
fragment. (In vertical fragmentation the primary key attributes must be
repeated in every fragment to allow reconstruction, so disjointness is
defined only on the non-primary-key attributes.)
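The three rules can be checked mechanically for horizontal fragments. The following is a minimal sketch, assuming a relation instance is modelled as a Python set of tuples and reconstruction is done by union; the function name check_horizontal_fragments and the toy data are illustrative only.

```python
def check_horizontal_fragments(relation, fragments):
    """Check the three correctness rules for horizontal fragments.

    relation  -- set of tuples (the original relation instance R)
    fragments -- list of sets of tuples (R1, ..., Rn)
    """
    union = set().union(*fragments)
    return {
        # Completeness: every tuple of R appears in at least one fragment.
        "complete": relation <= union,
        # Reconstruction: the union of the fragments gives back exactly R.
        "reconstructs": union == relation,
        # Disjointness: no tuple is stored in more than one fragment.
        "disjoint": sum(len(f) for f in fragments) == len(union),
    }


# Toy relation (assumed data, not taken from the slides).
R = {(1, "a"), (2, "b"), (3, "c")}

good = check_horizontal_fragments(R, [{(1, "a")}, {(2, "b"), (3, "c")}])
overlapping = check_horizontal_fragments(R, [{(1, "a"), (2, "b")}, {(2, "b"), (3, "c")}])
lossy = check_horizontal_fragments(R, [{(1, "a")}, {(3, "c")}])
```

Here good satisfies all three rules, overlapping violates only disjointness, and lossy violates completeness (and therefore reconstruction).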
Different Types of Fragmentation
• Horizontal fragmentation
• Vertical Fragmentation
Horizontal fragmentation
• Horizontal fragmentation partitions a relation along its tuples.
• A horizontal fragment is produced by specifying a predicate that
performs a restriction on the tuples of a relation.
• The predicate is applied with the selection operation of the
relational algebra.
• For a given relation R, a horizontal fragment is defined as
\( \sigma_p(R) \)
where p is a predicate based on one or more attributes of the
relation R.
Project-id  Project-name  Project-type  Project-leader-id  Branch-no  Amount
P01         Inventory     Inside        E001               B10        $1000000
P02         Sales         Inside        E001               B20        $300000
P03         R&D           Abroad        E004               B70        $8000000
P04         Educational   Inside        E003               B20        $400000
P05         Health        Abroad        E005               B60        $7000000
Table: Project
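• P1: \( \sigma_{\text{project-type} = \text{"Inside"}}(\text{Project}) \)

Project-id  Project-name  Project-type  Project-leader-id  Branch-no  Amount
P01         Inventory     Inside        E001               B10        $1000000
P02         Sales         Inside        E001               B20        $300000
P04         Educational   Inside        E003               B20        $400000

• P2: \( \sigma_{\text{project-type} = \text{"Abroad"}}(\text{Project}) \)

Project-id  Project-name  Project-type  Project-leader-id  Branch-no  Amount
P03         R&D           Abroad        E004               B70        $8000000
P05         Health        Abroad        E005               B60        $7000000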
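The two fragments P1 and P2 can be reproduced with a small selection function. This is a sketch under the assumption that each tuple of Project is a Python tuple and amounts are stored as plain integers; select is a hypothetical helper, not part of any library.

```python
PROJECT = [
    # (project_id, project_name, project_type, project_leader_id, branch_no, amount)
    ("P01", "Inventory", "Inside", "E001", "B10", 1000000),
    ("P02", "Sales", "Inside", "E001", "B20", 300000),
    ("P03", "R&D", "Abroad", "E004", "B70", 8000000),
    ("P04", "Educational", "Inside", "E003", "B20", 400000),
    ("P05", "Health", "Abroad", "E005", "B60", 7000000),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Index 2 holds project_type in the tuple layout above.
P1 = select(PROJECT, lambda row: row[2] == "Inside")
P2 = select(PROJECT, lambda row: row[2] == "Abroad")

# Horizontal fragments are reconstructed by taking their union.
reconstructed = sorted(P1 + P2)
```

Because the two predicates are mutually exclusive and together cover every value of project-type in the table, sorting the concatenation of P1 and P2 gives back the original five tuples.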

Vertical Fragmentation
• Vertical fragmentation partitions a relation along its attributes.
• A vertical fragment is defined by using the projection
operation of relational algebra.
• For a given relation R, a vertical fragment is defined as
\( \Pi_{a_1, a_2, \ldots, a_n}(R) \)
where a1, a2, …, an are attributes of the relation R.
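A vertical fragment and its reconstruction can be sketched as projection followed by a join on the primary key. The example below assumes rows are Python dicts, uses an abridged copy of the Project table, and picks an arbitrary split into two fragments purely for illustration; project and join_on_key are hypothetical helper names.

```python
# Abridged copy of the Project table (only four of its attributes are kept here).
PROJECT = [
    {"project_id": "P01", "project_name": "Inventory", "project_type": "Inside", "amount": 1000000},
    {"project_id": "P03", "project_name": "R&D", "project_type": "Abroad", "amount": 8000000},
    {"project_id": "P05", "project_name": "Health", "project_type": "Abroad", "amount": 7000000},
]


def project(relation, attributes):
    """Relational projection: keep the named attributes and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join_on_key(left, right, key):
    """Rebuild rows by matching two fragments on the shared primary key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# The primary key project_id is repeated in both fragments.
V1 = project(PROJECT, ["project_id", "project_name", "project_type"])
V2 = project(PROJECT, ["project_id", "amount"])
reconstructed = join_on_key(V1, V2, "project_id")
```

Repeating project_id in both fragments is what makes the join possible, which is why disjointness is only required of the non-key attributes.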

Two approaches have been identified for attribute partitioning in
vertical fragmentation.
• Grouping
Grouping starts by assigning each attribute to its own fragment and, at
each step, joins some of the fragments until some criterion is satisfied.
• Splitting
Splitting starts with the whole relation and decides on a beneficial
partitioning based on the access behavior of applications to the
attributes.
One way to do this is to create a matrix that shows the number of
accesses that refer to each attribute pair.
For example, a transaction that accesses attributes a1, a2, a3 and
a4 of relation R can be represented by the following matrix
(each pair is listed once, above the diagonal):
     a1  a2  a3  a4
a1        1   0   1
a2            0   1
a3                0
a4
Pairs with high affinity should appear in the same vertical fragment and
pairs with low affinity may appear in different fragments.
Vertical fragmentation
PNO  PNAME             BUDGET  LOC
P1   Instrumentation   150000  Montreal
P2   Database develop  135000  New York
P3   CAD/CAM           250000  New York
P4   Maintenance       310000  Paris
Table: Project
VF – Information Requirements
• Application Information
– Attribute affinities
• a measure that indicates how closely related the attributes are
• this is obtained from more primitive usage data
– Attribute usage values
• Given a set of queries Q = {q1, q2, …, qq} that will run on the
relation R[A1, A2, …, An],
\[ use(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases} \]
use(qi,•) can be defined accordingly

VF – Definition of use(qi,Aj)
Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET
    FROM PROJ
    WHERE PNO=Value
q2: SELECT PNAME, BUDGET
    FROM PROJ
q3: SELECT PNAME
    FROM PROJ
    WHERE LOC=Value
q4: SELECT SUM(BUDGET)
    FROM PROJ
    WHERE LOC=Value
Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC
     A1  A2  A3  A4
q1    1   0   1   0
q2    0   1   1   0
q3    0   1   0   1
q4    0   0   1   1
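The attribute usage matrix can be derived directly from the attributes each query references. In this sketch the attribute sets are written out by hand from the SELECT lists and WHERE clauses of q1 to q4 (no SQL parsing is attempted), and QUERY_ATTRIBUTES and USE are illustrative names.

```python
ATTRIBUTES = ["PNO", "PNAME", "BUDGET", "LOC"]  # A1, A2, A3, A4

# Attributes referenced by each query (SELECT list plus WHERE clause).
QUERY_ATTRIBUTES = {
    "q1": {"BUDGET", "PNO"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def use(query, attribute):
    """use(qi, Aj): 1 if the attribute is referenced by the query, else 0."""
    return 1 if attribute in QUERY_ATTRIBUTES[query] else 0


# One row per query, one column per attribute, in the order A1..A4.
USE = {q: [use(q, a) for a in ATTRIBUTES] for q in QUERY_ATTRIBUTES}
```

Each row of USE corresponds to one row of the usage matrix for q1 to q4.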
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/13
VF – Affinity Measure aff(Ai,Aj)
The attribute affinity measure between two attributes Ai and Aj of a relation
R[A1, A2, …, An] with respect to the set of applications Q = {q1, q2, …, qq} is
defined as follows:
\[ aff(A_i, A_j) = \sum_{\text{all queries that access } A_i \text{ and } A_j} (\text{query access}) \]
\[ \text{query access} = \sum_{\text{all sites}} \text{access frequency of a query} \times \frac{\text{access}}{\text{execution}} \]

VF – Calculation of aff(Ai, Aj)
Assume each query in the previous example accesses the attributes once
during each execution.
Also assume the access frequencies
     S1  S2  S3
q1   15  20  10
q2    5   0   0
q3   25  25  25
q4    3   0   0
Then
aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45
and the attribute affinity matrix AA is
     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78
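The affinity matrix can be recomputed from the usage matrix and the access frequencies. This is a minimal sketch assuming one access per execution, so a query's contribution is simply the sum of its frequencies over the three sites; affinity_matrix is an illustrative name.

```python
# Rows are q1..q4, columns are A1..A4 (attribute usage values).
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# Rows are q1..q4, columns are sites S1..S3 (access frequencies).
FREQ = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]


def affinity_matrix(use, freq):
    """aff(Ai, Aj): total access of all queries that reference both Ai and Aj."""
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for q, row in enumerate(use):
        query_access = sum(freq[q])  # assumes one access per execution
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += query_access
    return aa


AA = affinity_matrix(USE, FREQ)
```

For example, only q1 references both A1 and A3, so AA[0][2] is 15 + 20 + 10 = 45.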

VF – Clustering Algorithm
• Take the attribute affinity matrix AA and reorganize the attribute orders
to form clusters where the attributes in each cluster demonstrate high
affinity to one another.
• Bond Energy Algorithm (BEA) has been used for clustering of entities.
BEA finds an ordering of entities (in our case attributes) such that the
global affinity measure is maximized.
\[ AM = \sum_i \sum_j (\text{affinity of } A_i \text{ and } A_j \text{ with their neighbors}) \]

Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA, which is a permutation of the
rows and columns of AA
• Initialization: Place and fix one of the columns of AA in CA.
• Iteration: Place the remaining n-i columns in the remaining i+1 positions
in the CA matrix. For each column, choose the placement that makes the
most contribution to the global affinity measure.
• Row order: Order the rows according to the column ordering.

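• The bond between two attributes is defined as follows
bond(A2,A3) = (c1RowA2*c1RowA3) + (c2RowA2*c2RowA3) + (c3RowA2*c3RowA3)
where ckRowAx is the entry in column k of row Ax of the matrix below, so
bond(A2,A3) = 80*5 + 5*53 + 75*3 = 890
     A2  A3  A4
A2   80   5  75
A3    5  53   3
A4   75   3  78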
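The bond computation is a sum of element-wise products and can be checked with a few lines of code. The sketch below uses the 3 x 3 matrix for A2, A3 and A4 shown above; bond and SUB are illustrative names, and because the matrix is symmetric it does not matter whether rows or columns are multiplied.

```python
# Affinity values for A2, A3, A4 (rows and columns in that order).
SUB = [
    [80, 5, 75],
    [5, 53, 3],
    [75, 3, 78],
]


def bond(matrix, x, y):
    """Sum over all columns of (entry of row x) * (entry of row y)."""
    return sum(a * b for a, b in zip(matrix[x], matrix[y]))


bond_a2_a3 = bond(SUB, 0, 1)  # 80*5 + 5*53 + 75*3
bond_a2_a4 = bond(SUB, 0, 2)  # 80*75 + 5*3 + 75*78
bond_a3_a4 = bond(SUB, 1, 2)  # 5*75 + 53*3 + 3*78
```

The much larger bond between A2 and A4 reflects that their rows are dominated by the same large entries, which is what the clustering step exploits.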

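The contribution of placing column Ak between columns Ai and Aj is
cont(Ai, Ak, Aj) = bond(Ai, Ak) + bond(Ak, Aj) - bond(Ai, Aj)
(Özsu and Valduriez write each term with a factor of 2; the constant
factor does not change which placement is best.)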
A2 A3 A4
A2 80 5 75
A3 5 53 3
A4 75 3 78
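Putting bond and cont together gives a compact sketch of the column-placement loop. It assumes the full 4 x 4 affinity matrix from the earlier calculation, fixes column A1 first, treats a missing neighbour at either end as contributing a bond of 0, and breaks ties in favour of the leftmost position; these choices and the function names are assumptions of this sketch rather than part of the slides.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
NAMES = ["A1", "A2", "A3", "A4"]


def bond(aa, x, y):
    """Bond of columns x and y; a missing neighbour (None) contributes 0."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, new, right):
    """Contribution of placing column `new` between `left` and `right`."""
    return bond(aa, left, new) + bond(aa, new, right) - bond(aa, left, right)


def bond_energy_order(aa):
    """Greedy column ordering: insert each column where cont is largest."""
    order = [0]  # initialization: fix the first column
    for k in range(1, len(aa)):
        candidates = []
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            candidates.append((cont(aa, left, k, right), pos))
        # max() returns the first maximum, so ties go to the leftmost position.
        best_pos = max(candidates, key=lambda c: c[0])[1]
        order.insert(best_pos, k)
    return order


order = bond_energy_order(AA)
ordered_names = [NAMES[i] for i in order]
# Row order follows the column order.
CA = [[AA[i][j] for j in order] for i in order]
```

On this input the sketch yields the ordering A4, A2, A3, A1, which places A2 next to A4 and A3 next to A1, the pairs with the largest bonds.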
