Vertical Fragmentation

Vertical fragmentation has been studied within the centralized context, both as a design methodology and for physical clustering.
It is more difficult than horizontal fragmentation, because more alternatives exist. There are two approaches:
grouping: attributes to fragments (this yields overlapping fragments)
splitting: relation to fragments (this yields non-overlapping fragments)
We do not consider the replicated key attributes to be overlapping. The advantage of replicating them is that functional dependencies are easier to enforce (for integrity checking, etc.).
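A rough sketch of why there are "more alternatives": if the key is replicated in every fragment and each of the m non-key attributes goes into exactly one fragment, every vertical fragmentation corresponds to a partition of those m attributes, and the number of set partitions is the Bell number B(m). The Python below is an illustration (not part of the original method) that computes Bell numbers with the Bell triangle.

```python
def bell_numbers(m: int) -> list[int]:
    """Return [B(0), B(1), ..., B(m)] using the Bell triangle."""
    bells = [1]  # B(0) = 1
    row = [1]    # first row of the Bell triangle
    for _ in range(m):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row k is B(k)
    return bells


print(bell_numbers(10)[10])
```

B(10) is already 115975, so enumerating every fragmentation quickly becomes impractical; this is what motivates the affinity-based heuristics that follow.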
VF Information Requirements
Application Information
Attribute affinities: a measure that indicates how closely related the attributes are. This is obtained from more primitive usage data.
Attribute usage values: given a set of queries Q = {q1, q2, …, qq} that will run on the relation R[A1, A2, …, An],
$$use(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$
use(qi, •), the vector of usage values of qi over all attributes, can be defined accordingly.
VF Definition of use(qi,Aj)
Consider the following 4 queries for relation PROJ:
q1: SELECT BUDGET FROM PROJ WHERE PNO=Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC.
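A small Python sketch of the usage values for this example. The attribute set of each query was read off the four SQL statements by hand (attributes in the WHERE clause count as referenced); the names `REFERENCED` and `use_matrix` are illustrative.

```python
# Attributes of PROJ, in the order A1..A4 used in the text.
ATTRIBUTES = ["PNO", "PNAME", "BUDGET", "LOC"]

# Attributes referenced by each query (SELECT list plus WHERE clause).
REFERENCED = {
    "q1": {"BUDGET", "PNO"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def use_matrix(referenced, attributes):
    """Map each query q to use(q, •): 1 where q references the attribute, else 0."""
    return {
        q: [1 if a in attrs else 0 for a in attributes]
        for q, attrs in referenced.items()
    }


USE = use_matrix(REFERENCED, ATTRIBUTES)
for q, row in USE.items():
    print(q, row)
```

Each printed row is the vector use(qi, •); for example q1 uses A1 and A3 only.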
VF Affinity Measure aff(Ai,Aj)

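The attribute affinity measure between two attributes Ai and Aj of a relation R[A1, A2, …, An] with respect to the set of applications Q = {q1, q2, …, qq} is defined as follows:
$$aff(A_i, A_j) = \sum_{k \,:\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \;\; \sum_{\forall S_l} ref_l(q_k)\, acc_l(q_k)$$
where:
ref_l(q_k) is the number of accesses to attributes (Ai, Aj) for each execution of application q_k at site S_l
acc_l(q_k) is the access frequency of application q_k at site S_l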
VF Calculation of aff(Ai, Aj)

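Assume each query in the previous example accesses the attributes once during each execution, i.e. ref_l(q_k) = 1.
Also assume the access frequencies acc_l(q_k) of each query at each site; for q1 they are 15, 20 and 10 at sites S1, S2 and S3. Since q1 is the only query that uses both A1 and A3,
$aff(A_1, A_3) = 15 \cdot 1 + 20 \cdot 1 + 10 \cdot 1 = 45$
and the attribute affinity matrix AA is:
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78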
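The calculation can be sketched in Python as follows, assuming three sites S1 to S3 and ref_l(q_k) = 1 as stated. Only q1's frequencies (15, 20, 10) are given in the text; the per-site frequencies for q2, q3 and q4 below are assumed values, chosen so that their totals (5, 75 and 3) agree with the bond values used in the BEA example.

```python
# 0-based attribute indices: A1=PNO -> 0, A2=PNAME -> 1, A3=BUDGET -> 2, A4=LOC -> 3.
USES = {"q1": {0, 2}, "q2": {1, 2}, "q3": {1, 3}, "q4": {2, 3}}

# acc_l(q_k) at sites S1, S2, S3. The q1 row is from the text;
# the q2..q4 rows are ASSUMED values whose totals are 5, 75 and 3.
ACC = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}

REF = 1  # ref_l(q_k): each query accesses the attributes once per execution


def aff(i, j):
    """Sum ref * acc over all sites for every query that uses both A(i+1) and A(j+1)."""
    return sum(
        REF * freq
        for q, attrs in USES.items()
        if i in attrs and j in attrs
        for freq in ACC[q]
    )


AA = [[aff(i, j) for j in range(4)] for i in range(4)]
for row in AA:
    print(row)
```

Only the per-query totals matter for aff, so any split of q2 to q4's frequencies across sites with the same totals yields the same matrix.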
VF Clustering Algorithm
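Take the attribute affinity matrix AA and reorganize the attribute order to form clusters in which the attributes demonstrate high affinity to one another. The Bond Energy Algorithm (BEA) has been used for clustering of entities. BEA finds an ordering of entities (in our case, attributes) such that the global affinity measure
$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j) \left[ aff(A_i, A_{j-1}) + aff(A_i, A_{j+1}) \right]$$
is maximized, where j indexes column positions in the chosen ordering and aff(A_i, A_0) = aff(A_i, A_{n+1}) = 0.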
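A direct Python transcription of AM for a given column ordering, using the AA matrix of the running example entered by hand. Positions outside the ordering are treated as affinity 0, matching the boundary condition; the function name `global_affinity` is illustrative.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def global_affinity(aa, order):
    """AM for the column ordering `order` (a list of attribute indices)."""
    n = len(order)

    def neighbour(i, pos):
        # Positions 0 and n+1 of the formula lie outside the matrix: affinity 0.
        return aa[i][order[pos]] if 0 <= pos < n else 0

    return sum(
        aa[i][order[pos]] * (neighbour(i, pos - 1) + neighbour(i, pos + 1))
        for i in range(len(aa))
        for pos in range(n)
    )


print(global_affinity(AA, [0, 1, 2, 3]))  # original order A1 A2 A3 A4
print(global_affinity(AA, [0, 2, 1, 3]))  # reordered A1 A3 A2 A4
```

Putting A1 next to A3 and A2 next to A4 raises AM from 3766 to 34330, which is the kind of reordering BEA searches for.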
Bond Energy Algorithm

Input: the AA matrix.
Output: the clustered affinity matrix CA, which is a permutation of AA.
Initialization: place and fix one of the columns of AA in CA.
Iteration: place the remaining n-i columns in the remaining i+1 positions in the CA matrix. For each column, choose the placement that makes the largest contribution to the global affinity measure.
Row order: order the rows according to the column ordering.

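Best placement? Define the contribution of placing attribute A_k between A_i and A_j:
$$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$$
where
$$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$$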
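Both definitions translate directly into Python. This sketch uses the hand-entered AA matrix of the running example and represents an empty boundary position (A0, or the slot after the last placed column) as `None`, an assumption of the sketch; it reproduces the three contributions computed in the BEA example.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay); empty positions bond to 0."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


# A1, A2, A3 are indices 0, 1, 2; None stands for the empty positions A0 and A4.
print(cont(AA, None, 2, 0))  # ordering (0-3-1)
print(cont(AA, 0, 2, 1))     # ordering (1-3-2)
print(cont(AA, 1, 2, None))  # ordering (2-3-4)
```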
BEA Example
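Consider the attribute affinity matrix AA of the PROJ example and the corresponding CA matrix in which A1 and A2 have been placed. Place A3. Here A0 and A4 stand for the empty positions to the left of A1 and to the right of A2, so every bond involving them is 0.
Ordering (0-3-1): $cont(A_0, A_3, A_1) = 2\,bond(A_0, A_3) + 2\,bond(A_3, A_1) - 2\,bond(A_0, A_1) = 2 \cdot 0 + 2 \cdot 4410 - 2 \cdot 0 = 8820$
Ordering (1-3-2): $cont(A_1, A_3, A_2) = 2\,bond(A_1, A_3) + 2\,bond(A_3, A_2) - 2\,bond(A_1, A_2) = 2 \cdot 4410 + 2 \cdot 890 - 2 \cdot 225 = 10150$
Ordering (2-3-4): $cont(A_2, A_3, A_4) = 2\,bond(A_2, A_3) + 2\,bond(A_3, A_4) - 2\,bond(A_2, A_4) = 2 \cdot 890 + 0 - 0 = 1780$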
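Ordering (1-3-2) makes the largest contribution (10150), so A3 is placed between A1 and A2, and the CA matrix has the form:
      A1  A3  A2
A1    45  45   0
A2     0   5  80
A3    45  53   5
A4     0   3  75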
When A4 is placed (at the right end, after A2), the final form of the CA matrix (after row reorganization) is:
      A1  A3  A2  A4
A1    45  45   0   0
A3    45  53   5   3
A2     0   5  80  75
A4     0   3  75  78
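Putting the steps together, a compact Python sketch of BEA on the running example. Following the example, it fixes the first two columns at the start (the slide only requires one), places the remaining columns in index order, breaks ties in favour of the leftmost position, and assumes at least two attributes; `bea` and the `None` boundary marker are illustrative choices.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
NAMES = ["A1", "A2", "A3", "A4"]


def bond(aa, x, y):
    """bond(Ax, Ay); None marks an empty boundary position, whose bond is 0."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea(aa):
    """Return (column order, clustered matrix CA); assumes at least two attributes."""
    n = len(aa)
    order = [0, 1]  # initialization: fix the first two columns, as in the example
    for k in range(2, n):  # iteration: place each remaining column
        best_pos, best_cont = 0, None
        for pos in range(len(order) + 1):  # try all len(order) + 1 positions
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            c = cont(aa, left, k, right)
            if best_cont is None or c > best_cont:  # ties keep the leftmost position
                best_pos, best_cont = pos, c
        order.insert(best_pos, k)
    # Row order: arrange the rows in the same order as the columns.
    ca = [[aa[r][c] for c in order] for r in order]
    return order, ca


order, CA = bea(AA)
print([NAMES[i] for i in order])
for row in CA:
    print(row)
```

For A4 the right-end position wins narrowly (2·bond(A2, A4) = 23730 against 23486 for placing it between A3 and A2), which is why the final order is A1, A3, A2, A4.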
VF Algorithm
How can a set of clustered attributes {A1, A2, …, An} be divided into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that there are no (or minimal) applications that access both (or more than one) of the sets?
For a split point i on the diagonal of CA, let TA = {A1, …, Ai} be the top set of attributes and BA = {Ai+1, …, An} the bottom set. Define:
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
CTQ = total number of accesses to attributes by applications that access only TA
CBQ = total number of accesses to attributes by applications that access only BA
COQ = total number of accesses to attributes by applications that access both TA and BA
Then find the point along the diagonal that maximizes
$$z = CTQ \cdot CBQ - COQ^2$$
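A Python sketch of the two-way split-point search, applied to the clustered order A1, A3, A2, A4. Each query's access count is taken as its frequency summed over all sites with one access per execution; the totals 45, 5, 75 and 3 are those implied by the affinity values of the example. All names are illustrative.

```python
# Clustered attribute order produced by BEA in the example.
ORDER = ["A1", "A3", "A2", "A4"]

# For each query: (attributes it uses, total accesses = sum over sites of ref * acc).
QUERIES = {
    "q1": ({"A1", "A3"}, 45),
    "q2": ({"A2", "A3"}, 5),
    "q3": ({"A2", "A4"}, 75),
    "q4": ({"A3", "A4"}, 3),
}


def split_score(order, queries, i):
    """z = CTQ * CBQ - COQ**2 for TA = order[:i] and BA = order[i:]."""
    ta, ba = set(order[:i]), set(order[i:])
    ctq = cbq = coq = 0
    for attrs, accesses in queries.values():
        if attrs <= ta:      # query in TQ: touches only TA
            ctq += accesses
        elif attrs <= ba:    # query in BQ: touches only BA
            cbq += accesses
        else:                # query in OQ: touches both
            coq += accesses
    return ctq * cbq - coq ** 2


def best_split(order, queries):
    """Try every split point along the diagonal; return (i, z) with maximal z."""
    return max(
        ((i, split_score(order, queries, i)) for i in range(1, len(order))),
        key=lambda pair: pair[1],
    )


i, z = best_split(ORDER, QUERIES)
print(ORDER[:i], ORDER[i:], z)
```

For this data the best point separates {A1, A3} = {PNO, BUDGET} from {A2, A4} = {PNAME, LOC} with z = 3311; the key PNO would then be replicated in both fragments.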
Two problems:
Cluster forming in the middle of the CA matrix
Shift a row up and a column left and apply the algorithm to find the best partitioning point. Do this for all possible shifts. Cost: $O(m^2)$.
More than two clusters
m-way partitioning: try 1, 2, …, m-1 split points along the diagonal and try to find the best point for each of these. Cost: $O(2^m)$.
VF Correctness
A relation R, defined over attribute set A and key K, generates the vertical partitioning F_R = {R1, R2, …, Rr}.
Completeness: the following should be true for A:
$$A = \bigcup_{R_i \in F_R} A_{R_i}$$
Reconstruction: reconstruction can be achieved by
$$R = \bowtie_K R_i, \quad \forall R_i \in F_R$$
Disjointness:
TIDs are not considered to be overlapping since they are maintained by the system.
Duplicated keys are not considered to be overlapping.
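The three rules can be checked mechanically. The Python below is a sketch on hypothetical PROJ tuples with an illustrative partitioning into {PNO, BUDGET} and {PNO, PNAME, LOC}; it models relations as lists of dicts and the join on K as a dictionary lookup, which is an assumption of the sketch rather than how a DBMS evaluates it.

```python
# Hypothetical PROJ tuples; PNO is the key K.
PROJ = [
    {"PNO": "P1", "PNAME": "Alpha", "BUDGET": 100, "LOC": "X"},
    {"PNO": "P2", "PNAME": "Beta", "BUDGET": 200, "LOC": "Y"},
]
A = {"PNO", "PNAME", "BUDGET", "LOC"}
K = "PNO"

# Illustrative vertical partitioning F_R: the key is replicated in every fragment.
F = [{"PNO", "BUDGET"}, {"PNO", "PNAME", "LOC"}]


def project(rows, attrs):
    """Vertical fragment: keep only the attributes in attrs."""
    return [{a: row[a] for a in attrs} for row in rows]


def join_on_key(left, right, key):
    """Join two fragments on the replicated key (dictionary-lookup join)."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


fragments = [project(PROJ, attrs) for attrs in F]

# Completeness: the fragments' attribute sets cover A.
assert set().union(*F) == A

# Reconstruction: joining all fragments on K gives back R.
rebuilt = fragments[0]
for fragment in fragments[1:]:
    rebuilt = join_on_key(rebuilt, fragment, K)
assert sorted(rebuilt, key=lambda r: r[K]) == sorted(PROJ, key=lambda r: r[K])

# Disjointness: apart from the replicated key, no attribute is in two fragments.
non_key = [attrs - {K} for attrs in F]
assert all(a.isdisjoint(b) for idx, a in enumerate(non_key) for b in non_key[idx + 1:])
print("complete, reconstructible, disjoint")
```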
Hybrid Fragmentation
