Vertical Fragmentation
Has been studied within the centralized context:
  design methodology
  physical clustering
More difficult than horizontal fragmentation, because more alternatives exist.
Two approaches:
  grouping: attributes to fragments
  splitting: relation to fragments
Vertical Fragmentation
Overlapping fragments: grouping
Non-overlapping fragments: splitting
VF Information Requirements
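Application information
Attribute affinities
  a measure that indicates how closely related the attributes are
  obtained from more primitive usage data
Attribute usage values
  for a query qi and an attribute Aj of the relation,
$$use(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$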
VF Definition of use(qi,Aj)
Consider the following 4 queries for relation PROJ:
q1: SELECT BUDGET FROM PROJ WHERE PNO=Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC.
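The attribute usage values for these four queries can be tabulated mechanically. The sketch below assumes use(qi, Aj) is 1 when query qi references attribute Aj (in its SELECT list or WHERE clause) and 0 otherwise. The attribute sets are transcribed by hand from the queries, and the names `QUERY_ATTRS` and `use_matrix` are illustrative.

```python
# Attributes of PROJ in the order A1..A4.
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]

# Attributes referenced by each query (SELECT list plus WHERE clause).
# Transcribed by hand from q1..q4; nothing is parsed from SQL here.
QUERY_ATTRS = {
    "q1": {"BUDGET", "PNO"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def use_matrix(query_attrs, attrs):
    """Return 0/1 rows: row i is query qi, column j is attribute Aj."""
    return [[1 if a in refs else 0 for a in attrs]
            for refs in query_attrs.values()]


USE = use_matrix(QUERY_ATTRS, ATTRS)
for name, row in zip(QUERY_ATTRS, USE):
    print(name, row)
```

Each row has exactly two 1s because every query here touches two attributes of PROJ.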
VF Clustering Algorithm
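Take the attribute affinity matrix AA and reorder its attributes to form clusters in which the attributes demonstrate high affinity to one another.
The Bond Energy Algorithm (BEA) has been used for clustering of entities. BEA finds an ordering of entities (in our case attributes) such that the global affinity measure
$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j)\,[\,aff(A_i, A_{j-1}) + aff(A_i, A_{j+1}) + aff(A_{i-1}, A_j) + aff(A_{i+1}, A_j)\,]$$
is maximized, where
$$aff(A_0, A_j) = aff(A_i, A_0) = aff(A_{n+1}, A_j) = aff(A_i, A_{n+1}) = 0$$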
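To make the measure concrete, the sketch below evaluates it for two attribute orderings. It assumes the usual four-neighbour form of the global affinity measure: each matrix entry is multiplied by the sum of its left, right, upper, and lower neighbours, and entries outside the matrix count as 0. The matrix `AA` is an assumed example (its values are not given in this section), and `global_affinity` is an illustrative name.

```python
# Assumed symmetric attribute affinity matrix for A1..A4 (illustrative values).
AA = [
    [45,  0, 45,  0],
    [ 0, 80,  5, 75],
    [45,  5, 53,  3],
    [ 0, 75,  3, 78],
]


def global_affinity(aa, order):
    """Global affinity measure of `aa` with rows and columns permuted by `order`."""
    n = len(order)

    def aff(i, j):
        # Positions outside the matrix contribute zero affinity.
        if i < 0 or j < 0 or i >= n or j >= n:
            return 0
        return aa[order[i]][order[j]]

    total = 0
    for i in range(n):
        for j in range(n):
            neighbours = (aff(i, j - 1) + aff(i, j + 1)
                          + aff(i - 1, j) + aff(i + 1, j))
            total += aff(i, j) * neighbours
    return total


original = global_affinity(AA, [0, 1, 2, 3])   # A1, A2, A3, A4
clustered = global_affinity(AA, [0, 2, 1, 3])  # A1, A3, A2, A4
print(original, clustered)
```

For this assumed matrix the ordering A1, A3, A2, A4 scores far higher than the original ordering, which is the effect the clustering step is after.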
BEA Example
Consider the following AA matrix and the corresponding CA matrix where A1 and A2 have been placed. Place A3:
Ordering (0-3-1):
cont(A0, A3, A1) = 2 bond(A0, A3) + 2 bond(A3, A1) - 2 bond(A0, A1) = 2*0 + 2*4410 - 2*0 = 8820
Ordering (1-3-2):
cont(A1, A3, A2) = 2 bond(A1, A3) + 2 bond(A3, A2) - 2 bond(A1, A2) = 2*4410 + 2*890 - 2*225 = 10150
Ordering (2-3-4):
cont(A2, A3, A4) = 1780
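These three contributions can be re-derived in a few lines. The sketch assumes the usual definitions bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay) and cont(Ai, Ak, Aj) = 2 bond(Ai, Ak) + 2 bond(Ak, Aj) - 2 bond(Ai, Aj). The matrix `AA` is an assumption, chosen because it reproduces the bond values 4410, 890, and 225 quoted above. A boundary column (A0, or the not-yet-placed A4) is represented by `None` and has bond 0 with everything.

```python
# Assumed attribute affinity matrix for A1..A4 (index 0 is A1).
AA = [
    [45,  0, 45,  0],
    [ 0, 80,  5, 75],
    [45,  5, 53,  3],
    [ 0, 75,  3, 78],
]


def bond(aa, x, y):
    """Sum over all rows z of aff(Az, Ax) * aff(Az, Ay); 0 at a boundary."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, mid, right):
    """Net contribution of placing column `mid` between `left` and `right`."""
    return (2 * bond(aa, left, mid)
            + 2 * bond(aa, mid, right)
            - 2 * bond(aa, left, right))


A1, A2, A3 = 0, 1, 2
c_031 = cont(AA, None, A3, A1)  # ordering (0-3-1)
c_132 = cont(AA, A1, A3, A2)    # ordering (1-3-2)
c_234 = cont(AA, A2, A3, None)  # ordering (2-3-4), A4 not yet placed
print(c_031, c_132, c_234)
```

The largest value belongs to ordering (1-3-2), so A3 goes between A1 and A2.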
BEA Example
Therefore, A3 is placed between A1 and A2, and the CA matrix has the column order A1, A3, A2.
BEA Example
When A4 is placed, the final form of the CA matrix (after row organization) is
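The whole placement procedure can be sketched as a loop: fix the first two columns, then insert each remaining column at the position with the largest contribution. This is a minimal sketch, not a definitive BEA implementation. The matrix `AA` is an assumed example, `bea_order` is an illustrative name, and ties are broken in favour of the leftmost position.

```python
# Assumed attribute affinity matrix for A1..A4 (index 0 is A1).
AA = [
    [45,  0, 45,  0],
    [ 0, 80,  5, 75],
    [45,  5, 53,  3],
    [ 0, 75,  3, 78],
]


def bond(aa, x, y):
    """Sum over rows z of aff(Az, Ax) * aff(Az, Ay); 0 at a boundary (None)."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, mid, right):
    """Net contribution of placing `mid` between `left` and `right`."""
    return (2 * bond(aa, left, mid) + 2 * bond(aa, mid, right)
            - 2 * bond(aa, left, right))


def bea_order(aa):
    """Return a column ordering (list of attribute indices) for the CA matrix."""
    order = [0, 1]  # the first two attributes are placed as they are
    for k in range(2, len(aa)):
        best_pos, best_val = 0, None
        # Try every slot, including the far left and the far right.
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, k, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, k)
    return order


ORDER = bea_order(AA)
# Rows are reorganized in the same order as the columns.
CA = [[AA[i][j] for j in ORDER] for i in ORDER]
print(["A%d" % (i + 1) for i in ORDER])
for row in CA:
    print(row)
```

With the assumed matrix, A4 lands at the far right, giving the order A1, A3, A2, A4. The two high-affinity blocks {A1, A3} and {A2, A4} then sit along the diagonal of `CA`.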
VF Algorithm
How can you divide a set of clustered attributes {A1, A2, ..., An} into two (or more) sets {A1, A2, ..., Ai} and {Ai+1, ..., An} such that there are no (or minimal) applications that access both (or more than one) of the sets?
VF Algorithm
Let TA and BA be the attribute sets of the top (upper-left) and bottom (lower-right) clusters of the CA matrix. Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications that access only TA
CBQ = total number of accesses to attributes by applications that access only BA
COQ = total number of accesses to attributes by applications that access both TA and BA
Then find the point along the diagonal that maximizes
$$CTQ \times CBQ - COQ^2$$
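The search for the split point can be sketched by scoring every point along the diagonal with CTQ * CBQ - COQ^2. The sketch assumes the clustered order A1, A3, A2, A4, takes the attribute sets of q1..q4 from the PROJ queries, and models each application as a single access count. The counts in `ACCESSES` are hypothetical, and `split_score` and `best_split` are illustrative names.

```python
# Clustered attribute order (assumed) and attributes referenced by each query.
ORDER = ["A1", "A3", "A2", "A4"]
QUERY_ATTRS = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}
# Hypothetical total access counts per query.
ACCESSES = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}


def split_score(order, point, query_attrs, accesses):
    """Score CTQ * CBQ - COQ**2 for TA = order[:point], BA = order[point:]."""
    ta, ba = set(order[:point]), set(order[point:])
    ctq = cbq = coq = 0
    for q, refs in query_attrs.items():
        if refs <= ta:       # query touches only the top fragment
            ctq += accesses[q]
        elif refs <= ba:     # query touches only the bottom fragment
            cbq += accesses[q]
        else:                # query touches both fragments
            coq += accesses[q]
    return ctq * cbq - coq ** 2


def best_split(order, query_attrs, accesses):
    """Return the best split point and the score of every candidate point."""
    scores = {p: split_score(order, p, query_attrs, accesses)
              for p in range(1, len(order))}
    return max(scores, key=scores.get), scores


POINT, SCORES = best_split(ORDER, QUERY_ATTRS, ACCESSES)
print(POINT, SCORES)
print(ORDER[:POINT], ORDER[POINT:])
```

Under these assumed counts the best point separates {A1, A3} from {A2, A4}. Squaring COQ penalizes splits that force many accesses to touch both fragments.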
VF Algorithm
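Two problems:
Cluster forming in the middle of the CA matrix
  Shift a row up and a column left and apply the algorithm to find the best partitioning point
  Do this for all possible shifts
  Cost O(m^2)
More than two clusters
  m-way partitioning
  Try 1, 2, ..., m-1 split points along the diagonal and find the best point for each
  Cost O(2^m)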
VF Correctness
A relation R, defined over attribute set A and key K, generates the vertical partitioning F_R = {R1, R2, ..., Rr}.
Completeness
The following should be true for A:
$$A = \bigcup_{i=1}^{r} A_{R_i}$$
Reconstruction
Reconstruction can be achieved by joining the fragments on the key:
$$R = \bowtie_K R_i, \quad \forall R_i \in F_R$$
Disjointness
TIDs are not considered to be overlapping since they are maintained by the system
Duplicated keys are not considered to be overlapping
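The three correctness rules can be checked directly on a small instance. The sketch below assumes PROJ is split into {PNO, BUDGET} and {PNO, PNAME, LOC} with key PNO, and treats the repeated key as non-overlapping. The sample rows and the helper names are hypothetical, and relations are modelled as lists of dicts.

```python
from functools import reduce
from itertools import combinations

ATTRS = {"PNO", "PNAME", "BUDGET", "LOC"}
KEY = {"PNO"}
FRAGMENT_ATTRS = [{"PNO", "BUDGET"}, {"PNO", "PNAME", "LOC"}]

# Hypothetical sample tuples of PROJ.
ROWS = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
]


def is_complete(attrs, fragment_attrs):
    """Completeness: the union of the fragment attribute sets equals A."""
    return set().union(*fragment_attrs) == attrs


def is_disjoint(fragment_attrs, key):
    """Disjointness: fragments may share key attributes only."""
    return all((a & b) <= key for a, b in combinations(fragment_attrs, 2))


def project(rows, attrs):
    """Vertical fragment: keep only the attributes in `attrs`."""
    return [{a: r[a] for a in attrs} for r in rows]


def join_on_key(left, right, key):
    """Equi-join of two fragments on the key attributes."""
    names = sorted(key)
    index = {tuple(r[k] for k in names): r for r in right}
    return [{**l, **index[tuple(l[k] for k in names)]}
            for l in left if tuple(l[k] for k in names) in index]


FRAGMENTS = [project(ROWS, attrs) for attrs in FRAGMENT_ATTRS]
REBUILT = reduce(lambda a, b: join_on_key(a, b, KEY), FRAGMENTS)

print(is_complete(ATTRS, FRAGMENT_ATTRS), is_disjoint(FRAGMENT_ATTRS, KEY))
print(REBUILT == ROWS)
```

Dropping the key from either fragment would make the join impossible, which is why the key is replicated in every vertical fragment.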
Hybrid Fragmentation