Normalization

Database Management System

Normalization Process

This Lecture
Schema Refinement
Normalization

Schema Refinement - Review
Conceptual Modeling is a subjective
process
Therefore, the schema after the logical
database design phase may not be very
good (it may contain redundancies)
However, there are formalisms to
ensure that the schema is good.
This process is called Normalization
Schema Refinement - Review (contd.)
Relational database schema = set of
relations

Relation = set of attributes

How we group the attributes to
relations is very important
(contd.)
Too many attributes in a relation:
Wasted space
Anomalies

Decomposing a relation into too many
smaller relations puts at risk:
Lossless-join property
Dependency-preserving property
(contd.)
Too many attributes

For example,

LECTURER(id, name, address, salary,
deptno, dname, building)
(contd.)
Insertion Anomaly
1. Inserting a new lecturer to the
LECTURER table
- Department information is repeated
(we must ensure that the correct department
information is inserted each time).
(contd.)

2. Inserting a department with no
employees
(Impossible because null values for id are
not allowed)
(contd.)
Deletion Anomalies

Deleting the last lecturer from the
department will lose information about
the department
(contd.)
Update Anomalies

Updating the department's building
needs to be done for all lecturers
working for that department
(contd.)
When redundancies exist, we should
decompose the relations into smaller
relations

(contd.)
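Decomposing the relation into too many
smaller relations

Lossless-join property: we might lose
information if we decompose relations,
because joining the smaller relations may
not give back exactly the original relation

Dependency-preserving property: when S is
decomposed into R1 and R2, the set of
dependencies in S can be verified using
only the dependencies in R1 and R2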

(contd.)
Lossless-join property:
For example, decompose S(S, P, D) into R1(S, P) and R2(P, D):

S:
S   P   D
S1  P1  D1
S2  P2  D2
S3  P1  D3

R1:
S   P
S1  P1
S2  P2
S3  P1

R2:
P   D
P1  D1
P2  D2
P1  D3
(contd.)
Joining R1 and R2 back together, we get spurious
tuples (S1 P1 D3 and S3 P1 D1 were not in S):

R1 join R2:
S   P   D
S1  P1  D1
S1  P1  D3
S2  P2  D2
S3  P1  D1
S3  P1  D3
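The spurious tuples above can be reproduced with a short sketch. It assumes relations are modelled as Python sets of tuples; the variable names (`S`, `R1`, `R2`, `joined`, `spurious`) are illustrative and not part of the lecture.

```python
# Relation S(S, P, D) from the example, as a set of tuples.
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Project S onto R1(S, P) and R2(P, D).
R1 = {(s, p) for (s, p, d) in S}
R2 = {(p, d) for (s, p, d) in S}

# Natural join of R1 and R2 on the shared attribute P.
joined = {(s, p, d) for (s, p) in R1 for (p2, d) in R2 if p == p2}

# Tuples that appear in the join but were never in S are spurious.
spurious = joined - S

print(sorted(joined))
print(sorted(spurious))
```

Because the join contains more tuples than S, this decomposition is not lossless: we can no longer tell which (S, P, D) combinations were originally stored.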
(contd.)
To avoid the above mentioned issues
in the relational schema, we can apply
a formal process called Normalization

Normalization is based on functional
dependencies
(contd.)
Key points:
Redundancy is based on functional
dependencies
Therefore, normalization is based on
functional dependencies
(contd.)
Given some FDs, we can usually infer additional FDs:
A -> B, B -> C implies A -> C

An FD f is implied by a set of FDs F if f holds whenever
all FDs in F hold.
F+ = closure of F is the set of all FDs that are implied by F.

How can we get F+?

(contd.)
Armstrong's Axioms (X, Y, Z are sets of
attributes):
Reflexivity: If X ⊆ Y, then Y -> X
Augmentation: If X -> Y, then XZ -> YZ
for any Z
Transitivity: If X -> Y and Y -> Z, then
X -> Z
These are sound and complete inference
rules for FDs!

(contd.)
A couple of additional rules (that follow from Armstrong's Axioms):
Union: If X -> Y and X -> Z, then X -> YZ
Decomposition: If X -> YZ, then X -> Y and X -> Z

Example: Contracts(cid,sid,jid,did,pid,qty,value), and:
C is the key: C -> CSJDPQV
Project purchases each part using a single contract: JP -> C
Dept purchases at most one part from a supplier: SD -> P

JP -> C, C -> CSJDPQV imply JP -> CSJDPQV
SD -> P implies SDJ -> JP
SDJ -> JP, JP -> CSJDPQV imply SDJ -> CSJDPQV

(contd.)
Why is F+ important?
X -> RHS in relation R

X is a subset of the attributes in relation R. If RHS contains
all attributes of R, then X is a superkey.
If X is not a superkey, then values for X can repeat in
different tuples, resulting in redundancy!
So determining F+ can help us find superkeys and
check for any redundancy.
(contd.)
Computing the closure of a set of FDs can be expensive.
(Size of closure is exponential in # attrs!)

Typically, we just want to check if a given FD X -> Y is in
the closure F+ of a set of FDs F. An efficient check:
Compute attribute closure of X (denoted X+) wrt F:
Set of all attributes A such that X -> A is in F+
There is a linear time algorithm to compute this.
Check if Y is in X+

(contd.)
Algorithm to find X+:
closure = X;
repeat until there is no change: {
If there is an FD U -> V in F such that U ⊆ closure
then set closure = closure ∪ V
}
Does F = {A -> B, B -> C, CD -> E} imply A -> E?
i.e., is A -> E in the closure F+? Equivalently, is E in
A+?

We can use the attribute closure to find the keys of the
relation. If X+ contains all attributes of the relation, then X
is a superkey.
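The closure algorithm above can be sketched in Python as follows. This is a minimal illustration, assuming each FD is stored as a pair of sets (left side, right side); the function name `attribute_closure` is hypothetical, and this simple loop is not the linear-time variant mentioned earlier.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure, add the right side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# F = {A -> B, B -> C, CD -> E} from the question above.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
a_plus = attribute_closure({"A"}, F)
print(sorted(a_plus))  # ['A', 'B', 'C']
print("E" in a_plus)   # False
```

Since E is not in A+, F does not imply A -> E: the FD CD -> E never fires because D cannot be derived from A.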
(contd.)
Schema Refinement Steps:
Determine F for relation R
Find all keys of R using attribute closure
Normalize
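As an illustration of the key-finding step, the sketch below tries attribute subsets in order of increasing size and keeps those whose closure is the whole relation. It assumes FDs are pairs of sets; `attribute_closure` and `candidate_keys` are hypothetical helper names, and the brute-force search is exponential, so it only suits small schemas. It uses the Contracts example (CSJDPQV) from earlier.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    # Smaller subsets first, so supersets of a found key can be skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Contracts: C -> CSJDPQV, JP -> C, SD -> P
contracts_fds = [
    (set("C"), set("CSJDPQV")),
    (set("JP"), set("C")),
    (set("SD"), set("P")),
]
keys = candidate_keys("CSJDPQV", contracts_fds)
print([sorted(k) for k in keys])
```

The result agrees with the derivation in the Contracts example: C, JP and SDJ each determine every attribute.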
(contd.)
There are many Normal Forms
proposed to reduce redundancies

Some of the well-known ones are:
1st Normal Form
2nd Normal Form
3rd Normal Form
Boyce-Codd Normal Form
(contd.)
Review of some terms
Candidate Key: Each key of a relation is called a
candidate key

Primary Key: A candidate key is chosen to be the
primary key

Prime Attribute: an attribute which is a member of
a candidate key

Nonprime Attribute: An attribute which is not
prime
(contd.)

1st Normal Form:
A relation R is in first normal form (1NF)
if domains of all attributes in the
relation are atomic (simple &
indivisible).

(contd.)
2nd Normal Form:
A relation R is in second normal form
(2NF) if no nonprime attribute A in R
is partially dependent on any key of R
(that is, A does not depend on a proper
subset of a key)

(contd.)
Example

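EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC)
FD1: {NIC, PNUM} -> HOURS
FD2: NIC -> ENAME
FD3: PNUM -> {PNAME, PLOC}
(contd.)
Decomposition into 2NF relations:
EP1(NIC, PNUM, HOURS)
EP2(NIC, ENAME)
EP3(PNUM, PNAME, PLOC)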
(contd.)
3rd Normal Form:
A relation R is in 3rd normal form (3NF)
if:
R is in 2NF, and
No nonprime attribute is transitively
dependent on any key

(contd.)
Example:

EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)
(contd.)
Decomposition into 3NF relations:
ED1(ENAME, SSN, BDATE, ADD, DNUM)
ED2(DNUM, DNAME, DMGR)
(contd.)
Boyce-Codd Normal Form (BCNF):
A relation schema R is in Boyce-Codd
Normal Form if, whenever a nontrivial
functional dependency X -> A holds in R,
X is a superkey of R
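The BCNF condition can be checked mechanically with attribute closure. The sketch below is an assumption-laden illustration: FDs are pairs of sets, `bcnf_violations` is a hypothetical helper name, and it tests only the FDs given in F, which is sufficient for the relation on which F is stated (sub-relations produced by a decomposition need the projected FDs instead). It uses the EMP_DEPT relation with its two FDs from the exercises.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs in `fds` whose left side is not a superkey."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        is_superkey = attributes <= attribute_closure(lhs, fds)
        if nontrivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADD", "DNUM", "DNAME", "DMGR"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADD", "DNUM"}),
    ({"DNUM"}, {"DNAME", "DMGR"}),
]
violations = bcnf_violations(EMP_DEPT, F)
print(violations)

# After decomposition, each part is checked against its own FD.
ed1_violations = bcnf_violations(
    {"ENAME", "SSN", "BDATE", "ADD", "DNUM"},
    [({"SSN"}, {"ENAME", "BDATE", "ADD", "DNUM"})],
)
ed2_violations = bcnf_violations(
    {"DNUM", "DNAME", "DMGR"},
    [({"DNUM"}, {"DNAME", "DMGR"})],
)
```

Only DNUM -> {DNAME, DMGR} is reported for EMP_DEPT, because DNUM is not a superkey there; ED1 and ED2 have no violations.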

(contd.)

Example relation:
(PROPERTY_ID, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)
Keys: PropertyID, (County_Name, Lot#)
[Figure: dependency diagram showing FD1 to FD5 over these attributes]
(contd.)
Decomposition into BCNF:
Consider relation R with FDs F. If X -> Y violates BCNF,
decompose R into R - Y and XY.
Repeated application of this idea will give us a collection of
relations that are in BCNF; it is a lossless join decomposition, and
is guaranteed to terminate.
e.g., CSJDPQV, key C, JP -> C, SD -> P, J -> S
To deal with SD -> P, decompose into SDP, CSJDQV.
To deal with J -> S, decompose CSJDQV into JS and CJDQV.
In general, several dependencies may cause violation of BCNF.
The order in which we deal with them could lead to very
different sets of relations!
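The repeated "split R into R - Y and XY" step can be sketched as a small recursive function. This is an illustrative sketch, not the lecture's procedure: it assumes FDs are pairs of sets, the helper names `find_violation` and `bcnf_decompose` are hypothetical, and violations are found by brute-force search over attribute subsets (exponential, fine for small schemas). Because it picks violations in its own order, the relations it returns may differ from SDP, JS, CJDQV, which is exactly the order-dependence noted above.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def find_violation(relation, fds):
    """Return (X, Y) for some X -> Y on `relation` that violates BCNF, or None."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            closure = attribute_closure(x, fds)
            y = (closure & relation) - x  # what X determines inside this relation
            if y and not relation <= closure:  # nontrivial, and X is not a superkey
                return x, y
    return None


def bcnf_decompose(relation, fds):
    """Split `relation` into R - Y and XY until no violation remains."""
    relation = set(relation)
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]
    x, y = violation
    return bcnf_decompose(relation - y, fds) + bcnf_decompose(x | y, fds)


# CSJDPQV with key C, JP -> C, SD -> P, J -> S
F = [
    (set("C"), set("CSJDPQV")),
    (set("JP"), set("C")),
    (set("SD"), set("P")),
    (set("J"), set("S")),
]
result = bcnf_decompose("CSJDPQV", F)
print(["".join(sorted(r)) for r in result])
```

Each split keeps X in both parts, and X determines every attribute of XY, so the two parts always join back without spurious tuples.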
(contd.)
In general, there may not be a dependency preserving
decomposition into BCNF.
e.g., CSZ, CS -> Z, Z -> C
Can't decompose while preserving 1st FD; not in BCNF.

Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is
not dependency preserving (w.r.t. the FDs JP -> C, SD -> P
and J -> S).
However, it is a lossless join decomposition.
In this case, adding JPC to the collection of relations gives
us a dependency preserving decomposition.
JPC tuples stored only for checking FD! (Redundancy!)

(contd.)
Obviously, the algorithm for lossless join decomposition into
BCNF can be used to obtain a lossless join decomposition
into 3NF (typically, can stop earlier).

To ensure dependency preservation, one idea:
If X -> Y is not preserved, add relation XY.
Problem is that XY may violate 3NF! e.g., consider
the addition of CJP to 'preserve' JP -> C. What if we
also have J -> C?

Refinement: Instead of the given set of FDs F, use a
minimal cover for F.

(contd.)
Minimal cover G for a set of FDs F:
Closure of F = closure of G.
Right hand side of each FD in G is a single attribute.
If we modify G by deleting an FD or by deleting
attributes from an FD in G, the closure changes.

General algorithm to obtain a minimal cover:
Put the FDs in a standard form (i.e., a single attribute on the
RHS).
Minimize the left side of each FD: for each FD, check
if we can delete attributes from the LHS while preserving
equivalence to F+.
Delete any redundant FDs.
(contd.)
Intuitively, every FD in G is needed, and is as small as
possible, in order to get the same closure as F.
e.g., A -> B, ABCD -> E, EF -> GH, ACDF -> EG has
the following minimal cover:
A -> B, ACD -> E, EF -> G and EF -> H
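The three steps of the minimal cover algorithm can be sketched as below and checked against the example. This is a minimal illustration under stated assumptions: each input FD is a pair of strings of single-letter attributes, the function names `closure` and `minimal_cover` are hypothetical, and a different processing order could yield a different (equally valid) minimal cover for other inputs.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is (lhs_frozenset, single_rhs_attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop left-hand-side attributes that are not needed.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            # b is extraneous if the remaining attributes still determine a.
            if reduced and a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)

    # Step 3: drop duplicates, then FDs implied by the remaining ones.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


# A -> B, ABCD -> E, EF -> GH, ACDF -> EG
F = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]
cover = minimal_cover(F)
print([("".join(sorted(lhs)), a) for lhs, a in cover])
```

On this input, B is dropped from ABCD -> E because A -> B already supplies it, ACDF -> E collapses into ACD -> E, and ACDF -> G is removed because ACD -> E followed by EF -> G already implies it.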

Dependency Preserving 3NF decomposition:
Let R1, R2, ..., Rn be a lossless-join decomposition of R
with a minimal cover F
Let N be the dependencies of F which are not preserved
For each FD X -> A in N, add XA to the decomposition
of R

(contd.)
Refining an ER Diagram
1st diagram translated:
Workers(S,N,L,D,S)
Departments(D,M,B)
Lots associated with
workers.
Suppose all workers in a
dept are assigned the same
lot: D -> L
Redundancy; fixed by:
Workers2(S,N,D,S)
Dept_Lots(D,L)
Can fine-tune this:
Workers2(S,N,D,S)
Departments(D,M,B,L)
[Figure: ER diagrams before and after refinement, with entities Employees and Departments, relationship Works_In, and attributes ssn, name, lot, did, dname, budget, since]
Exercise
1. Consider the following two sets of functional
dependencies

F= {A ->C, AC ->D,E ->AD, E ->H}
and
G = {A ->CD, E ->AH}

Check whether or not they are equivalent.

To show equivalence, we prove that G is covered by F
and F is covered by G.
Proof that G is covered by F:
{A}+ = {A, C, D} (with respect to F),
which covers A ->CD in G
{E}+ = {E, A, D, H, C} (with respect to F),
which covers E ->AH in G

Proof that F is covered by G:
{A}+ = {A, C, D} (with respect to G),
which covers A ->C in F
{A, C}+ = {A, C, D} (with respect to G),
which covers AC ->D in F
{E}+ = {E, A, H, C, D} (with respect to G),
which covers E ->AD and E ->H in F
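The two cover proofs above can be replayed with a short sketch. It assumes FDs are pairs of sets of single-letter attributes; `attribute_closure`, `covers` and `equivalent` are hypothetical helper names introduced for illustration.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(f, g):
    """True if every FD in g follows from f (its RHS lies in the LHS closure under f)."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)


# F = {A -> C, AC -> D, E -> AD, E -> H}, G = {A -> CD, E -> AH}
F = [(set("A"), set("C")), (set("AC"), set("D")),
     (set("E"), set("AD")), (set("E"), set("H"))]
G = [(set("A"), set("CD")), (set("E"), set("AH"))]

print(sorted(attribute_closure(set("A"), F)))  # ['A', 'C', 'D']
print(equivalent(F, G))                        # True
```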
2. Consider the relation schema EMP_DEPT and the following
set F of functional dependencies on EMP_DEPT:
F = {SSN ->{ENAME, BDATE,ADD, DNUM} ,
DNUM ->{DNAME, DMGR} }
Calculate the closures {SSN}+ and {DNUM}+ with respect to
F.
EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)
Answer:
{SSN}+ = {SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR}
{DNUM}+ = {DNUM, DNAME, DMGR}

3. Is the set of functional dependencies F in Exercise 2
minimal? If not, try to find a minimal set of functional
dependencies that is equivalent to F. Prove that your set is
equivalent to F.

Answer:
The set F of functional dependencies in Exercise 2 is not
minimal, because it violates rule 1 of minimality (every FD has
a single attribute for its right hand side).

The set G is an equivalent minimal set:
G= {SSN ->{ENAME}, SSN ->{BDATE},
SSN->{ADD}, SSN ->{DNUM} ,
DNUM ->{DNAME}, DNUM->{DMGR}}

To show equivalence, we prove that F is covered by G
and G is covered by F.

Proof that F is covered by G:
{SSN}+ = {SSN, ENAME, BDATE, ADD, DNUM,
DNAME, DMGR}
(with respect to G), which covers
SSN ->{ENAME, BDATE, ADD, DNUM} in F
{DNUM}+ = {DNUM, DNAME, DMGR}
(with respect to G), which covers
DNUM ->{DNAME, DMGR} in F

Proof that G is covered by F:
{SSN}+ = {SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR}
(with respect to F), which covers
SSN ->{ENAME}, SSN ->{BDATE}, SSN ->{ADD}, and
SSN ->{DNUM} in G
{DNUM}+ = {DNUM, DNAME, DMGR}
(with respect to F), which covers DNUM ->{DNAME} and
DNUM ->{DMGR} in G
