
Vertical Fragmentation

Has been studied within the centralized context: design methodology, physical clustering.

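More difficult than horizontal fragmentation, because more alternatives exist: the attributes of a relation can be partitioned in as many ways as the Bell number of the attribute count. Two approaches:
grouping: attributes to fragments
splitting: relation to fragments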

Vertical Fragmentation
Overlapping fragments: grouping
Non-overlapping fragments: splitting

We do not consider the replicated key attributes to be overlapping. Advantage: it is easier to enforce functional dependencies (for integrity checking, etc.).

VF Information Requirements
Application information
Attribute affinities: a measure that indicates how closely related the attributes are. This is obtained from more primitive usage data.

Attribute usage values
Given a set of queries $Q = \{q_1, q_2, \ldots, q_q\}$ that will run on the relation $R[A_1, A_2, \ldots, A_n]$,

$$\mathrm{use}(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$

$\mathrm{use}(q_i, \bullet)$ can be defined accordingly.

VF Definition of use(qi,Aj)
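Consider the following 4 queries for relation PROJ:
q1: SELECT BUDGET FROM PROJ WHERE PNO=Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC. Then the use values are:

$$\begin{array}{c|cccc} & A_1 & A_2 & A_3 & A_4 \\ \hline q_1 & 1 & 0 & 1 & 0 \\ q_2 & 0 & 1 & 1 & 0 \\ q_3 & 0 & 1 & 0 & 1 \\ q_4 & 0 & 0 & 1 & 1 \end{array}$$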
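A minimal Python sketch of how the use(q_i, A_j) values can be tabulated for these four queries. The attribute sets are read off the SQL by hand (no SQL parsing is attempted), and the names `ATTRS`, `QUERY_ATTRS`, and `usage_matrix` are hypothetical.

```python
# Attribute order A1..A4 as defined above.
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]

# Attributes referenced by each query (SELECT list plus WHERE clause),
# transcribed by hand from the SQL; hypothetical helper data.
QUERY_ATTRS = {
    "q1": {"BUDGET", "PNO"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def usage_matrix(query_attrs, attrs):
    """Return rows use[i][j] = 1 if query i references attribute j, else 0."""
    return [
        [1 if a in referenced else 0 for a in attrs]
        for referenced in query_attrs.values()
    ]


if __name__ == "__main__":
    # Dicts keep insertion order, so rows come out as q1, q2, q3, q4.
    for name, row in zip(QUERY_ATTRS, usage_matrix(QUERY_ATTRS, ATTRS)):
        print(name, row)
```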

VF Affinity Measure aff(Ai,Aj)


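The attribute affinity measure between two attributes $A_i$ and $A_j$ of a relation $R[A_1, A_2, \ldots, A_n]$ with respect to the set of applications $Q = \{q_1, q_2, \ldots, q_q\}$ is defined as follows:

$$\mathrm{aff}(A_i, A_j) = \sum_{k \,\mid\, \mathrm{use}(q_k, A_i) = 1 \,\wedge\, \mathrm{use}(q_k, A_j) = 1} \; \sum_{\forall S_l} \mathrm{ref}_l(q_k)\,\mathrm{acc}_l(q_k)$$

where:
$\mathrm{ref}_l(q_k)$ = number of accesses to attributes $(A_i, A_j)$ for each execution of application $q_k$ at site $S_l$
$\mathrm{acc}_l(q_k)$ = application access frequency of $q_k$ at site $S_l$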

VF Calculation of aff(Ai, Aj)


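Assume each query in the previous example accesses the attributes once during each execution, i.e. $\mathrm{ref}_l(q_k) = 1$.
Also assume the following access frequencies $\mathrm{acc}_l(q_k)$ at sites $S_1, S_2, S_3$:

$$\begin{array}{c|ccc} & S_1 & S_2 & S_3 \\ \hline q_1 & 15 & 20 & 10 \\ q_2 & 5 & 0 & 0 \\ q_3 & 25 & 25 & 25 \\ q_4 & 3 & 0 & 0 \end{array}$$

Then $\mathrm{aff}(A_1, A_3) = 15 \cdot 1 + 20 \cdot 1 + 10 \cdot 1 = 45$, since $q_1$ is the only query that accesses both $A_1$ and $A_3$, and the attribute affinity matrix AA is

$$AA = \begin{array}{c|cccc} & A_1 & A_2 & A_3 & A_4 \\ \hline A_1 & 45 & 0 & 45 & 0 \\ A_2 & 0 & 80 & 5 & 75 \\ A_3 & 45 & 5 & 53 & 3 \\ A_4 & 0 & 75 & 3 & 78 \end{array}$$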
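The affinity computation can be sketched in Python as follows, assuming $\mathrm{ref}_l(q_k) = 1$ as stated above. Only the q1 frequencies (15, 20, 10) are confirmed by the aff(A1, A3) calculation in the text; the rows for q2 to q4 are assumed values from the standard version of this example. `affinity_matrix` is a hypothetical helper name.

```python
# USE[k][i] = 1 if query q_{k+1} references attribute A_{i+1} (PROJ example).
USE = [
    [1, 0, 1, 0],  # q1: PNO, BUDGET
    [0, 1, 1, 0],  # q2: PNAME, BUDGET
    [0, 1, 0, 1],  # q3: PNAME, LOC
    [0, 0, 1, 1],  # q4: BUDGET, LOC
]

# ACC[k][l] = access frequency of q_{k+1} at site S_{l+1}.
# Assumption: only the q1 row is confirmed by the text.
ACC = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]

# Assumption from the text: each query accesses its attributes once per execution.
REF = 1


def affinity_matrix(use, acc, ref=REF):
    """aff(A_i, A_j): over queries using both A_i and A_j, sum ref * acc over all sites."""
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for k, row in enumerate(use):
        weight = sum(ref * a for a in acc[k])  # accesses of q_k summed over sites
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += weight
    return aa
```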

VF Clustering Algorithm
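Take the attribute affinity matrix AA and reorganize the attribute order to form clusters in which the attributes of each cluster show high affinity to one another. The Bond Energy Algorithm (BEA) has been used for clustering of entities. BEA finds an ordering of entities (in our case, attributes) such that the global affinity measure

$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{aff}(A_i, A_j)\,\left[\mathrm{aff}(A_i, A_{j-1}) + \mathrm{aff}(A_i, A_{j+1}) + \mathrm{aff}(A_{i-1}, A_j) + \mathrm{aff}(A_{i+1}, A_j)\right]$$

is maximized, where $\mathrm{aff}(A_0, A_j) = \mathrm{aff}(A_{n+1}, A_j) = \mathrm{aff}(A_i, A_0) = \mathrm{aff}(A_i, A_{n+1}) = 0$.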
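A small Python sketch of the global affinity measure, evaluated for a given ordering of the attributes (0-based indices into AA, applied to both rows and columns). It assumes, as is usual for BEA, that neighbours outside the matrix contribute 0; `global_affinity` is a hypothetical name.

```python
def global_affinity(aa, order):
    """AM for the affinity matrix with rows and columns permuted by `order`."""
    n = len(order)
    # m[i][j] = affinity of the i-th and j-th attributes in the chosen order.
    m = [[aa[order[i]][order[j]] for j in range(n)] for i in range(n)]

    def at(i, j):
        # Out-of-range neighbours (A_0 or A_{n+1}) contribute 0.
        return m[i][j] if 0 <= i < n and 0 <= j < n else 0

    total = 0
    for i in range(n):
        for j in range(n):
            neighbours = at(i, j - 1) + at(i, j + 1) + at(i - 1, j) + at(i + 1, j)
            total += m[i][j] * neighbours
    return total
```

Comparing AM for two orderings of the same matrix shows which one places high-affinity attributes next to each other.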

Bond Energy Algorithm


Input: the AA matrix
Output: the clustered affinity matrix CA, which is a perturbation of AA
Initialization: place and fix one of the columns of AA in CA.
Iteration: place the remaining n − i columns in the remaining i + 1 positions in the CA matrix, where i is the number of columns already placed. For each column, choose the placement that makes the most contribution to the global affinity measure.
Row order: order the rows according to the column ordering.

Bond Energy Algorithm


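Best placement? Define the contribution of placing $A_k$ between $A_i$ and $A_j$:

$$\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$$

where

$$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$$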
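A Python sketch of the contribution computation, using $\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$ with $\mathrm{bond}(A_x, A_y) = \sum_z \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$. Attributes are 0-based column indices into AA, and `None` is a hypothetical marker for an empty boundary position (A_0 on the left, or a not-yet-filled position on the right), whose bond is taken as 0.

```python
def bond(aa, x, y):
    """bond(A_x, A_y) = sum over z of aff(A_z, A_x) * aff(A_z, A_y); 0 if either is empty."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    """Net contribution to AM of placing column A_k between columns A_i and A_j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)
```

With the example AA matrix, `cont(aa, 0, 2, 1)` evaluates placing A3 between A1 and A2.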

BEA Example
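Consider the AA matrix of the previous example and the corresponding CA matrix in which A1 and A2 have been placed. Place A3 ($A_0$ denotes the empty position left of $A_1$, with $\mathrm{bond}(A_0, \bullet) = 0$):

Ordering (0-3-1): $\mathrm{cont}(A_0, A_3, A_1) = 2\,\mathrm{bond}(A_0, A_3) + 2\,\mathrm{bond}(A_3, A_1) - 2\,\mathrm{bond}(A_0, A_1) = 2 \cdot 0 + 2 \cdot 4410 - 2 \cdot 0 = 8820$
Ordering (1-3-2): $\mathrm{cont}(A_1, A_3, A_2) = 2\,\mathrm{bond}(A_1, A_3) + 2\,\mathrm{bond}(A_3, A_2) - 2\,\mathrm{bond}(A_1, A_2) = 2 \cdot 4410 + 2 \cdot 890 - 2 \cdot 225 = 10150$
Ordering (2-3-4): $\mathrm{cont}(A_2, A_3, A_4) = 2\,\mathrm{bond}(A_2, A_3) + 2\,\mathrm{bond}(A_3, A_4) - 2\,\mathrm{bond}(A_2, A_4) = 2 \cdot 890 + 2 \cdot 0 - 2 \cdot 0 = 1780$, since $A_4$ has not been placed yet and its bonds count as 0.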

BEA Example
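Ordering (1-3-2) gives the largest contribution, so A3 is placed between A1 and A2. Therefore, the CA matrix has the form

$$\begin{array}{c|ccc} & A_1 & A_3 & A_2 \\ \hline A_1 & 45 & 45 & 0 \\ A_2 & 0 & 5 & 80 \\ A_3 & 45 & 53 & 5 \\ A_4 & 0 & 3 & 75 \end{array}$$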

BEA Example
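When A4 is placed, the final form of the CA matrix (after row organization) is

$$CA = \begin{array}{c|cccc} & A_1 & A_3 & A_2 & A_4 \\ \hline A_1 & 45 & 45 & 0 & 0 \\ A_3 & 45 & 53 & 5 & 3 \\ A_2 & 0 & 5 & 80 & 75 \\ A_4 & 0 & 3 & 75 & 78 \end{array}$$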
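The whole placement loop can be sketched in Python as a greedy insertion. Following the worked example, it seeds CA with the first two columns (A1, A2) rather than a single column, and breaks ties in favour of the leftmost position; both choices, and the names `bea_order` and `clustered_matrix`, are assumptions of this sketch.

```python
def bond(aa, x, y):
    # bond(A_x, A_y); None marks an empty boundary position, whose bond is 0.
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, k, right):
    # Contribution of inserting column k between columns `left` and `right`.
    return 2 * bond(aa, left, k) + 2 * bond(aa, k, right) - 2 * bond(aa, left, right)


def bea_order(aa):
    """Greedy BEA column order, as 0-based attribute indices."""
    n = len(aa)
    order = list(range(min(n, 2)))  # seed with the first two columns, as in the example
    for k in range(2, n):
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):  # i + 1 candidate positions
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, k, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, k)
    return order


def clustered_matrix(aa, order):
    """CA: reorder the columns by `order`, then the rows the same way."""
    return [[aa[r][c] for c in order] for r in order]
```

On the example AA matrix, `bea_order` returns `[0, 2, 1, 3]`, i.e. the order A1, A3, A2, A4.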

VF Algorithm
How can a set of clustered attributes $\{A_1, A_2, \ldots, A_n\}$ be divided into two (or more) sets $\{A_1, A_2, \ldots, A_i\}$ and $\{A_{i+1}, \ldots, A_n\}$ such that no (or a minimal number of) applications access both (or more than one) of the sets?

VF Algorithm
Let TA be the set of attributes above a candidate split point on the diagonal of CA and BA the set of attributes below it. Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications that access only TA
CBQ = total number of accesses to attributes by applications that access only BA
COQ = total number of accesses to attributes by applications that access both TA and BA
Then find the point along the diagonal that maximizes

$$z = CTQ \cdot CBQ - COQ^2$$
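A Python sketch of the two-way split-point search. Each query is given as the set of (0-based) attributes it accesses together with its total access count $\sum_l \mathrm{ref}_l(q_k)\,\mathrm{acc}_l(q_k)$; the function name and input format are hypothetical, and the sketch does not handle clusters in the middle of CA.

```python
def best_split(order, query_attrs, query_weights):
    """Return (split, z) maximising z = CTQ * CBQ - COQ**2.

    order:         clustered attribute order (e.g. a BEA result)
    query_attrs:   one set of attribute indices per query
    query_weights: total accesses per query (sum over sites of ref * acc)
    """
    best = None
    for split in range(1, len(order)):
        ta, ba = set(order[:split]), set(order[split:])
        ctq = cbq = coq = 0
        for attrs, w in zip(query_attrs, query_weights):
            if attrs <= ta:
                ctq += w  # query touches only the top block TA
            elif attrs <= ba:
                cbq += w  # query touches only the bottom block BA
            else:
                coq += w  # query needs attributes from both blocks
        z = ctq * cbq - coq ** 2
        if best is None or z > best[1]:
            best = (split, z)
    return best
```

With the order A1, A3, A2, A4 and per-query weights 45, 5, 75, 3 (from the example's access frequencies, assuming ref = 1), the best split falls after A3 with z = 45 · 75 − 8² = 3311, giving the fragments {PNO, BUDGET} and {PNO, PNAME, LOC} once the key PNO is replicated.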

VF Algorithm
Two problems:
Cluster forming in the middle of the CA matrix: shift a row up and a column left and apply the algorithm to find the best partitioning point. Do this for all possible shifts. Cost: $O(m^2)$.
More than two clusters: m-way partitioning. Try 1, 2, …, m − 1 split points along the diagonal and try to find the best point for each of these. Cost: $O(2^m)$.
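For the first problem, a Python sketch that tries every cyclic shift of the clustered order before searching for the split point, which gives the $O(m^2)$ behaviour (m shifts times m − 1 split points). It scores each split with $z = CTQ \cdot CBQ - COQ^2$; `split_score` and `best_split_with_shifts` are hypothetical names, and m-way partitioning is not covered.

```python
def split_score(top, query_attrs, query_weights):
    """z = CTQ * CBQ - COQ**2 when `top` is TA and all other attributes form BA."""
    ta = set(top)
    ctq = cbq = coq = 0
    for attrs, w in zip(query_attrs, query_weights):
        if attrs <= ta:
            ctq += w  # only TA
        elif attrs.isdisjoint(ta):
            cbq += w  # only BA
        else:
            coq += w  # both
    return ctq * cbq - coq ** 2


def best_split_with_shifts(order, query_attrs, query_weights):
    """Return (shift, split, z) with the largest z over all cyclic shifts."""
    best = None
    for shift in range(len(order)):
        # Shifting a row up and a column left moves the first attribute to the end.
        rotated = order[shift:] + order[:shift]
        for split in range(1, len(order)):
            z = split_score(rotated[:split], query_attrs, query_weights)
            if best is None or z > best[2]:
                best = (shift, split, z)
    return best
```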

VF Correctness
A relation R, defined over attribute set A and key K, generates the vertical partitioning $F_R = \{R_1, R_2, \ldots, R_r\}$.
Completeness: the following should be true for A: $A = \bigcup_i A_{R_i}$
Reconstruction: reconstruction can be achieved by $R = \bowtie_K R_i,\ \forall R_i \in F_R$
Disjointness: TIDs are not considered to be overlapping since they are maintained by the system. Duplicated keys are not considered to be overlapping.
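A small Python sketch of the three correctness conditions, using hypothetical sample rows for PROJ and the fragments {PNO, BUDGET} and {PNO, PNAME, LOC} as an example. Relations are lists of dicts and every fragment keeps the key; reconstruction merges rows on the key value, which coincides with the join on K here because every fragment holds every key value. All helper names are hypothetical.

```python
from itertools import combinations


def project(rows, attrs):
    """Vertical fragment: keep only `attrs` (which should include the key)."""
    return [{a: r[a] for a in attrs} for r in rows]


def is_complete(all_attrs, fragment_attrs):
    """Completeness: the union of the fragments' attributes equals A."""
    return set().union(*map(set, fragment_attrs)) == set(all_attrs)


def is_disjoint(fragment_attrs, key):
    """Disjointness: fragments share no attribute other than the replicated key."""
    return all((set(a) & set(b)) <= {key} for a, b in combinations(fragment_attrs, 2))


def reconstruct(fragments, key):
    """Rebuild R by matching rows on the key value (the join on K here)."""
    merged = {}
    for frag in fragments:
        for row in frag:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical sample rows for PROJ(PNO, PNAME, BUDGET, LOC).
PROJ = [
    {"PNO": "P1", "PNAME": "Alpha", "BUDGET": 100, "LOC": "L1"},
    {"PNO": "P2", "PNAME": "Beta", "BUDGET": 200, "LOC": "L2"},
]
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]
FRAG_ATTRS = [["PNO", "BUDGET"], ["PNO", "PNAME", "LOC"]]
FRAGMENTS = [project(PROJ, attrs) for attrs in FRAG_ATTRS]
```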

Hybrid Fragmentation
