Schema Refinement: Normal Forms
Normal Forms
- Given <R, F>, a relation schema R together with a set F of FDs, we want to determine whether R is in “good” shape.
- If not, we need to decompose R into smaller “good” relations.
- How do we measure this goodness, and how do we achieve it?
- To address these questions, we study normal forms.
- If a relation schema is in some normal form, we know that it is in “good” shape, in the sense that it does not suffer from certain kinds of redundancy problems.
Normal Forms
- The normal forms based on FDs are:
  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce-Codd normal form (BCNF)
- These normal forms have increasingly restrictive requirements.

(Figure: BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF)


First & Second Normal Forms
A relation scheme is in first normal form (1NF) if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and that value is not a set or a list of values. A database scheme is in first normal form if every relation scheme included in the database scheme is in 1NF.

A relation scheme R<S, F> is in second normal form (2NF) if it is in 1NF and every nonprime attribute (an attribute that is not part of any candidate key) is fully functionally dependent on the relation key(s), that is, it does not depend on any proper subset of a key. A database scheme is in second normal form if every relation scheme included in the database scheme is in second normal form.
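To make the 2NF definition concrete, here is a minimal Python sketch, assuming attributes are single letters and FDs are given as (lhs, rhs) string pairs; the helper names (closure, candidate_keys, is_2nf) and the two example schemas are hypothetical. Candidate keys are found by brute force over attribute subsets, which is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if the whole LHS is already in the closure, add the RHS
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys of schema, by brute force over subsets (small schemas only)."""
    keys = []
    for size in range(len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= set(schema):
                keys.append(cand)
    return keys

def is_2nf(schema, fds):
    """True if no nonprime attribute depends on a proper subset of a candidate key."""
    keys = candidate_keys(schema, fds)
    nonprime = set(schema) - set().union(*keys)
    for key in keys:
        for size in range(len(key)):  # proper subsets of the key only
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & nonprime:
                    return False  # partial dependency on part of a key
    return True

# Hypothetical: the key is AB, but B -> D makes D depend on only part of it
print(is_2nf("ABCD", [("AB", "C"), ("B", "D")]))  # False
print(is_2nf("ABC", [("A", "B"), ("B", "C")]))    # True
```

The second schema is in 2NF even though B → C is a transitive dependency; that is exactly what 3NF rules out.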
Third Normal Form
Let R be a relation schema, F a set of FDs on R, X ⊆ R, and A ∈ R.
- We say R is in third normal form (3NF) w.r.t. F if, for each FD X → A in F, at least one of the following conditions holds:
  • A ∈ X (that is, X → A is a trivial FD), or
  • X is a superkey, or
  • A is part of some key of R.

- To determine whether <R, F> is in 3NF: for every non-trivial FD X → A in F, check whether X is a superkey. If not, check whether its RHS, A, is part of any key of R. If both checks fail, we conclude that R is not in 3NF w.r.t. F.
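The 3NF test above can be sketched in Python under the same assumptions (single-letter attributes, FDs as (lhs, rhs) string pairs, hypothetical helper names). Each FD's right-hand side is split into single attributes, so X → A1…Ak is checked as k FDs X → Ai. The second example schema is hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys of schema, by brute force over subsets (small schemas only)."""
    keys = []
    for size in range(len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # not minimal
            if closure(cand, fds) >= set(schema):
                keys.append(cand)
    return keys

def is_3nf(schema, fds):
    """Slide test: every non-trivial X -> A in F has X a superkey or A part of a key."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        for a in set(rhs) - set(lhs):              # skip trivial parts (A ∈ X)
            if closure(lhs, fds) >= set(schema):    # X is a superkey
                continue
            if a in prime:                          # A is part of some key
                continue
            return False                            # both checks fail
    return True

# R = ABC, F = {A -> B, C -> B}: the only key is AC, and A -> B fails both checks
print(is_3nf("ABC", [("A", "B"), ("C", "B")]))   # False
# Hypothetical: F = {AB -> C, C -> B} has keys AB and AC, so B is prime
print(is_3nf("ABC", [("AB", "C"), ("C", "B")]))  # True
```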
Boyce-Codd Normal Form
Let R be a relation schema, F a set of FDs on R, X ⊆ R, and A ∈ R.
- We say R is in Boyce-Codd normal form (BCNF) w.r.t. F if, for each FD X → A in F, at least one of the following holds:
  • A ∈ X (that is, the FD is trivial), or
  • X is a superkey.
- To determine whether <R, F> is in BCNF, we check every non-trivial FD in F.
  • If there exists a non-trivial FD X → A in F such that X+ ≠ R, then R is not in BCNF. Otherwise, R is in BCNF w.r.t. F.
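A minimal Python sketch of this BCNF test, assuming single-letter attributes and FDs as (lhs, rhs) string pairs; the function names are hypothetical. Note that checking only the FDs in F works for <R, F> itself; for a sub-schema produced by a decomposition, the FDs projected onto that sub-schema must be used instead.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(schema, fds):
    """Slide test: R is in BCNF w.r.t. F unless some non-trivial X -> A in F has X+ != R."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                                # trivial FD, always allowed
        if not closure(lhs, fds) >= set(schema):    # X+ != R: X is not a superkey
            return False
    return True

print(is_bcnf("ABCDE", [("A", "B"), ("C", "D")]))  # False: A+ = AB
print(is_bcnf("AB", [("A", "B")]))                 # True
```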


Decomposition into BCNF
- Consider <R, F>, where R is in 1NF.
- If R is not in BCNF, we can always obtain a lossless-join decomposition of R into a collection of BCNF relations.
- However, this decomposition may not always be dependency preserving.
- The basic step of a BCNF decomposition algorithm:
  • Suppose X → A ∈ F is an FD violating the BCNF requirement, where X ⊆ R and A ∈ R.
  • Decompose R into XA and R – A.
  • If either R – A or XA is not in BCNF, decompose it further.
Example
R = ABCDE
F = { A → B, C → D }

Decompose R on A → B:
  R1 = AB, F1 = { A → B }
  R2 = ACDE, F2 = { C → D }
Decompose R2 on C → D:
  R21 = CD, F21 = { C → D }
  R22 = ACE, F22 = { }
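The binary BCNF decomposition applied in this example can be sketched recursively in Python, assuming single-letter attributes and FDs as (lhs, rhs) strings; the names are hypothetical. To find a violating FD on a sub-schema S without building the projection of F explicitly, the sketch tries each subset X of S (smallest first) and uses X+ computed under the original F, intersected with S. This search is exponential and only meant for small examples, and the result can depend on which violation is picked first.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return (X, A) where X -> A holds on schema, A not in X, X not a superkey; else None."""
    s = set(schema)
    for size in range(1, len(s)):                  # proper, non-empty subsets X of S
        for combo in combinations(sorted(s), size):
            x = set(combo)
            implied = closure(x, fds) & s          # attributes of S determined by X
            if implied != s:                       # X is not a superkey of S
                extra = sorted(implied - x)
                if extra:
                    return x, extra[0]
    return None

def bcnf_decompose(schema, fds):
    """Split S into XA and S - A on a violation, then recurse on both parts."""
    violation = bcnf_violation(schema, fds)
    if violation is None:
        return ["".join(sorted(schema))]
    x, a = violation
    return bcnf_decompose(x | {a}, fds) + bcnf_decompose(set(schema) - {a}, fds)

print(bcnf_decompose("ABCDE", [("A", "B"), ("C", "D")]))  # ['AB', 'CD', 'ACE']
```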
Decomposition into 3NF
- We can always obtain a lossless-join, dependency-preserving decomposition of a relation into 3NF relations. How?
- We discuss two approaches to decomposing <R, F>. First:
- Approach 1: Follow the binary decomposition method for BCNF.
  Let R = { R1, R2, . . . , Rn } be the result. Recall that this decomposition is always lossless-join but may not preserve the FDs, so we may need to fix it:
  • Identify the set N of FDs in F that are lost (i.e., not preserved).
  • For each FD X → A in N, create a relation schema XA and add it to R.
  • A refinement step: if there are several FDs with the same LHS, e.g., X → A1, X → A2, . . . , X → Ak, we create just one relation with schema XA1…Ak.
  That is, we replace these k FDs (having the same LHS) with a single equivalent FD X → A1…Ak and create just one relation instead of k relation schemas XA1, … , XAk.
Example (3NF Decomposition)
R = ABCDE
F = { BD → E, C → B, CE → A }

Decompose R on BD → E:
  R1 = BDE, F1 = { BD → E }
  R2 = ABCD, F2 = { C → B, CD → A }
Decompose R2 on C → B:
  R21 = CB, F21 = { C → B }
  R22 = ACD, F22 = { CD → A }

CE → A is not preserved, since {CE}+ = {B, C, E} w.r.t. F1 ∪ F21 ∪ F22, so A ∉ {CE}+.
- We add to R a new relation R3 = CEA with F3 = { CE → A }.
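The repair step of Approach 1 for this example can be sketched in Python. The BCNF result is passed in directly (R1 = BDE, R21 = CB, R22 = ACD) rather than recomputed. To decide whether each FD is lost, the sketch swaps in a standard textbook test that computes X+ under the union of the projections without building them: repeatedly add ((Z ∩ Ri)+ ∩ Ri) to Z for every Ri, with closures taken under the original F. Attributes are single letters and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, schemas, fds):
    """True if fd's RHS lies in X+ under the union of the projections of fds."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            gained = closure(z & set(ri), fds) & set(ri)  # what R_i alone can derive
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

def add_lost_fds(schemas, fds):
    """Add one schema X A1...Ak for each LHS X among the lost FDs (refinement step)."""
    merged = {}
    for lhs, rhs in fds:
        if not preserved((lhs, rhs), schemas, fds):
            key = "".join(sorted(lhs))
            merged.setdefault(key, set()).update(rhs)  # X -> A1...Ak
    extra = ["".join(sorted(set(x) | attrs)) for x, attrs in merged.items()]
    return list(schemas) + extra

fds = [("BD", "E"), ("C", "B"), ("CE", "A")]
print(add_lost_fds(["BDE", "CB", "ACD"], fds))  # ['BDE', 'CB', 'ACD', 'ACE']
```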
Example (using a different order)
R = ABCDE
F = { BD → E, C → B, CE → A }

Decompose R on CE → A:
  R1 = CEA, F1 = { CE → A }
  R2 = BCED, F2 = { C → B, BD → E }
Decompose R2 on BD → E:
  R21 = BDE, F21 = { BD → E }
  R22 = BCD, F22 = { C → B }
Decompose R22 on C → B:
  R221 = BC, F221 = { C → B }
  R222 = CD, F222 = { }

This decomposition is dependency preserving, and of course lossless-join.
Decomposition into 3NF
- Previous approach (binary decomposition):
  • Lossless-join √
  • May not be dependency preserving. If so, add extra relations XA, one for each FD X → A that was lost.
- Now, the synthesis approach:
  • Dependency preserving √
  • However, may not be lossless-join. If so, we need to add to R only one extra relation schema, consisting of the attributes that form any key of R.
  What would be the FDs on this newly added relation?
Decomposition into 3NF (synthesis)
Consider relation schema <R, F>.
- The synthesis approach:
  • Get a canonical cover Fc of F.
  • For each FD X → A in Fc, add schema XA to R.
  • If the decomposition R is not lossless, we need to fix it: add to R an extra relation schema containing just the attributes that form any key of R.
Example
- R = (A, B, C)
- F = { A → B, C → B }
- Decompose R into R1 = (A, B) and R2 = (B, C).
- This decomposition is not lossless: the common attribute B is a key of neither R1 nor R2.
- Add R3 = (A, C), since AC is the key of R.
- The decomposition R = {R1, R2, R3} is both lossless and dependency-preserving.
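The synthesis approach can be sketched in Python, assuming single-letter attributes and FDs as (lhs, rhs) strings; all names are hypothetical. The canonical cover is computed in the usual four steps (split right-hand sides, drop extraneous LHS attributes, drop redundant FDs, merge equal LHSs). Instead of running a full lossless-join check, the sketch uses the usual textbook shortcut: it adds a key schema when no synthesized schema contains a key of R, shrinking R attribute by attribute until it is a minimal key.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(fds):
    """Canonical cover as a list of (frozenset lhs, set rhs) pairs."""
    # 1. split right-hand sides: X -> A1...Ak becomes X -> A1, ..., X -> Ak
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # 2. drop extraneous LHS attributes: B in X is extraneous if A ∈ (X - B)+
    reduced = []
    for lhs, a in cover:
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
        reduced.append((lhs, a))
    # 3. drop redundant FDs: X -> A is redundant if A ∈ X+ under the other FDs
    result = list(dict.fromkeys(reduced))
    for fd in list(result):
        others = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], others):
            result = others
    # 4. union rule: merge FDs that share a left-hand side
    merged = {}
    for lhs, a in result:
        merged.setdefault(lhs, set()).add(a)
    return list(merged.items())

def synthesize_3nf(schema, fds):
    """One schema XA per FD in the canonical cover, plus a key schema if needed."""
    schemas = []
    for lhs, rhs in canonical_cover(fds):
        s = "".join(sorted(lhs | rhs))
        if s not in schemas:
            schemas.append(s)
    # lossless fix: if no schema is a superkey of R, add one (minimal) key of R
    if not any(closure(s, fds) >= set(schema) for s in schemas):
        key = set(schema)
        for a in sorted(schema):
            if closure(key - {a}, fds) >= set(schema):
                key -= {a}  # a is not needed to determine all of R
        schemas.append("".join(sorted(key)))
    return schemas

print(synthesize_3nf("ABC", [("A", "B"), ("C", "B")]))  # ['AB', 'BC', 'AC']
```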
An Algorithm to Check Lossless Join
Suppose relation R{A1, . . . , Ak} is decomposed into R1, . . . , Rn.
To determine whether this decomposition is lossless, we use a table
L[1 … n][1 … k], with one row per relation Ri and one column per attribute Aj.

Initializing the table:

for each relation Ri do
    for each attribute Aj do
        if Aj is an attribute in Ri
        then L[i][j] ← aAj     (the distinguished symbol for column Aj)
        else L[i][j] ← biAj    (a symbol used only in row i, column Aj)
Algorithm to Check Lossless (cont’d)
repeat
    for each FD X → Y in F do:
        if ∃ rows i and j such that L[i] and L[j] agree on every column of X,
        then for each column t corresponding to an attribute At in Y do:
            if L[i][t] == aAt
            then L[j][t] ← aAt
            else if L[j][t] == aAt
            then L[i][t] ← aAt
            else L[j][t] ← L[i][t]
until no change

The decomposition is lossless if, after performing this algorithm, L contains a row of all a’s, that is, if there exists a row i in L such that L[i][t] == aAt for every column t corresponding to each attribute At in R.
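The table algorithm can be sketched in Python, assuming single-letter attributes and FDs as (lhs, rhs) strings; the function name is hypothetical. A symbol is a tuple: ("a", A) plays the role of aA and ("b", i, A) the role of biA. One detail follows the usual formulation of this "chase" rather than the slide literally: when two symbols are equated, every occurrence of the replaced symbol in that column is renamed, not only the one cell.

```python
def chase_is_lossless(schema, decomposition, fds):
    """Table (chase) test: lossless if some row ends up with only a-symbols."""
    attrs = list(schema)
    col = {a: j for j, a in enumerate(attrs)}
    # row i holds ("a", A) if A is in R_i, else a symbol ("b", i, A) unique to row i
    table = [[("a", a) if a in ri else ("b", i, a) for a in attrs]
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:                                   # repeat ... until no change
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    if any(table[i][col[x]] != table[j][col[x]] for x in lhs):
                        continue                     # rows disagree on X
                    for y in rhs:
                        t = col[y]
                        u, v = table[i][t], table[j][t]
                        if u == v:
                            continue
                        # prefer the a-symbol; otherwise keep row i's symbol
                        keep, old = (v, u) if v[0] == "a" else (u, v)
                        for row in table:            # rename old in column t
                            if row[t] == old:
                                row[t] = keep
                        changed = True
    return any(all(cell[0] == "a" for cell in row) for row in table)

fds = [("A", "B"), ("C", "B")]
print(chase_is_lossless("ABC", ["AB", "BC"], fds))        # False
print(chase_is_lossless("ABC", ["AB", "BC", "AC"], fds))  # True
```

The two calls reproduce the earlier synthesis example: {(A, B), (B, C)} is not lossless, and adding (A, C) fixes it.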
Examples
- Given <R, F>, where R = (A, B, C, D) and F = { A → B, A → C, C → D } is a set of FDs on R:
  Is the decomposition R = {R1, R2} lossless, where R1 = (A, B, C) and R2 = (C, D)?
  • To be discussed in class
- Now consider S = (A, B, C, D, E) and the set G of FDs on S, where G = { AB → CD, A → E, C → D }:
  Is the decomposition S = {S1, S2, S3} lossless, where S1 = (A, B, C), S2 = (B, C, D), and S3 = (C, D, E)?
  • To be discussed in class
Dependency-Preserving Checking
- Consider <R, F>, where F = {X1 → Y1, …, Xn → Yn}.
- Let R = {R1, …, Rk} be a decomposition of R, and let Fi be the projection of F on Ri.
Below is an algorithm that decides dependency preservation:

preserved ← TRUE
for each FD X → Y in F, while preserved == TRUE, do begin
    compute X+ under F1 ∪ . . . ∪ Fk;
    if Y ⊈ X+ then preserved ← FALSE;
end
Example
- Consider R = (A, B, C, D), F = { A → B, B → C, C → D }.
- Is the decomposition R = {R1, R2} dependency-preserving, where R1 = (A, B), F1 = { A → B }, R2 = (A, C, D), and F2 = { C → D, A → D, A → C }?
- Check whether A → B is preserved:
  • Compute A+ under { A → B } ∪ { C → D, A → D, A → C }
  • A+ = { A, B, C, D }
  • Is B ∈ A+?
  • Yes
  • A → B is preserved
- Check whether B → C is preserved:
  • Compute B+ under { A → B } ∪ { C → D, A → D, A → C }
  • B+ = { B }
  • Is C ∈ B+?
  • No
  • B → C is not preserved

The decomposition is not dependency-preserving.
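The dependency-preservation algorithm can be sketched in Python, assuming single-letter attributes and FDs as (lhs, rhs) strings; the function names are hypothetical. Each projection Fi is built by brute force: for every non-empty subset X of Ri it records X → ((X+ ∩ Ri) − X), with X+ computed under F. This is exponential in |Ri| and only meant for small examples like the one above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(fds, ri):
    """F_i: every FD X -> ((X+ ∩ R_i) - X) with X a non-empty subset of R_i."""
    ri = set(ri)
    proj = []
    for size in range(1, len(ri) + 1):
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            rhs = (closure(x, fds) & ri) - x
            if rhs:
                proj.append((x, rhs))
    return proj

def is_dependency_preserving(fds, decomposition):
    """Slide algorithm: each X -> Y in F needs Y ⊆ X+ under F1 ∪ ... ∪ Fk."""
    union = [fd for ri in decomposition for fd in project(fds, ri)]
    # all() stops at the first FD that is not preserved, like "while preserved == TRUE"
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)

fds = [("A", "B"), ("B", "C"), ("C", "D")]
print(is_dependency_preserving(fds, ["AB", "ACD"]))  # False: B -> C is lost
```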
