Part 6 Normalization

Normal Form Overview


Universe of All Data Relations (normalized / unnormalized)
  1st Normal Form
    2nd Normal Form
      3rd Normal Form
        Boyce-Codd Normal Form (BCNF)
          4th Normal Form
            5th Normal Form (PJ/NF)
              Domain/Key Normal Form (DK/NF)

Copyright 1971-2002 Thomas P. Sturm


Universe of Relations
Any sequential file is a relation
Not all relations are well formed
Normalization provides a set of criteria to evaluate the well-formedness of a relation
Normal form is only one criterion for determining a good model
In general, a sequential file may have repeating groups

Example 1 - suppliers:

part     suppliers
diode    (GE, TRW, Mot)
bulb     (GE, Syl)

Implemented as (- = empty slot):

part     supplier1   supplier2   supplier3
diode    GE          TRW         Mot
bulb     GE          Syl         -


Problem with a Relation Not in First Normal Form


Retrieval:
  which supplier of a part should be retrieved?
  in which order should suppliers be retrieved?

Storage:
  how many suppliers do you allow for?
  which spaces are kept blank, and how?

Insert:
  to add a supplier, need to retrieve all suppliers, add the new supplier to an empty slot, and replace the record

Delete:
  to delete a supplier, need to adjust the vector (read, move around, erase, re-write)

Update:
  to update a supplier name, need to retrieve all suppliers, find the one to alter, and rewrite the entire set of suppliers


Solution to Repeating Group Problem


Eliminate repeating groups by repeating the key

Example 1 - suppliers:

part     supplier
diode    GE
diode    TRW
diode    Mot
bulb     GE
bulb     Syl

This new table has a different key than the old one: the key is now part plus supplier.
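As a minimal sketch of the step above, the following Python turns the repeating group of Example 1 into one row per (part, supplier) pair. The dictionary layout and the function name to_first_normal_form are assumptions made for illustration; they are not part of the original example.

```python
# Hypothetical non-1NF records: each part carries a repeating group of suppliers.
unnormalized = {
    "diode": ["GE", "TRW", "Mot"],
    "bulb": ["GE", "Syl"],
}


def to_first_normal_form(records):
    """Emit one (part, supplier) row per supplier, repeating the part key."""
    return [
        (part, supplier)
        for part, suppliers in records.items()
        for supplier in suppliers
    ]


rows = to_first_normal_form(unnormalized)
for part, supplier in rows:
    print(part, supplier)

# Adding a supplier is now a single-row insert; no vector has to be rewritten.
rows.append(("bulb", "TRW"))
```

Because the key is now (part, supplier), questions such as "how many supplier slots do we allow for?" no longer arise.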


First Normal Form


All underlying domains contain atomic values only (no vectors / repeating groups)

Example 2 - inventory:

part#   warehouse#   wh_address   quantity
100     05           Mpls         200
100     08           StPaul       300
200     05           Mpls         250
200     10           Madison      400
300     08           StPaul       350

Update Anomalies:
  UPDATE - address of warehouse stored in many rows; if address changes, must change all rows
  DELETE - if the last row for a warehouse is deleted, the address is lost
  INSERT - to insert a new row, warehouse address must be known

The problem occurs because this table is not focused on one primary key - it is about two things: warehouses, and parts in warehouses.


Solution to Multiple Focus Problems


A relation that is in 1NF but not in a higher normal form has a composite key (more than one attribute in the key)
Establish 2 relations via projection

Example 2 - inventory:

One table about warehouses:

warehouse#   wh_address
05           Mpls
08           StPaul
10           Madison

One table about inventory with a composite key:

part#   warehouse#   quantity
100     05           200
100     08           300
200     05           250
200     10           400
300     08           350

The original table in 1NF can be reconstructed by a join
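The projection and join in Example 2 can be sketched in a few lines of Python. This is only an illustration on the sample rows; the helper name project and the tuple layout are assumptions, not something the slides prescribe.

```python
# Example 2 rows: (part#, warehouse#, wh_address, quantity)
inventory_1nf = [
    ("100", "05", "Mpls", 200),
    ("100", "08", "StPaul", 300),
    ("200", "05", "Mpls", 250),
    ("200", "10", "Madison", 400),
    ("300", "08", "StPaul", 350),
]


def project(rows, columns):
    """Keep only the given column positions and drop duplicate rows."""
    return sorted({tuple(row[i] for i in columns) for row in rows})


warehouses = project(inventory_1nf, (1, 2))    # (warehouse#, wh_address)
inventory = project(inventory_1nf, (0, 1, 3))  # (part#, warehouse#, quantity)

# Natural join on warehouse# rebuilds the original 1NF rows.
address_of = dict(warehouses)
rejoined = sorted((part, wh, address_of[wh], qty) for part, wh, qty in inventory)

print(warehouses)
print(rejoined == sorted(inventory_1nf))
```

Each warehouse address now lives in exactly one row, so the UPDATE, DELETE, and INSERT anomalies listed for the 1NF table do not occur.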


Second Normal Form


1NF + every non-key attribute is fully functionally dependent on the primary key

Example 3 - departments:

name     dept   dept_loc
smith    402    100
jones    401    200
king     402    100
turner   400    200
olson    401    200

Problem: Functional dependency is transitive

The primary key is name
dept is functionally dependent on name
dept_loc is also functionally dependent on name, but the dependency is transitive because dept functionally determines dept_loc
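The dependencies claimed for Example 3 can be checked against the sample rows with a small helper. This is a sketch only: the function name determines is hypothetical, and a check on sample data can refute a functional dependency but cannot prove it, since a dependency is a rule about all legal data.

```python
# Example 3 rows as dictionaries keyed by attribute name.
departments = [
    {"name": "smith", "dept": 402, "dept_loc": 100},
    {"name": "jones", "dept": 401, "dept_loc": 200},
    {"name": "king", "dept": 402, "dept_loc": 100},
    {"name": "turner", "dept": 400, "dept_loc": 200},
    {"name": "olson", "dept": 401, "dept_loc": 200},
]


def determines(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (in this sample)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


print(determines(departments, ["name"], ["dept"]))      # name -> dept
print(determines(departments, ["dept"], ["dept_loc"]))  # dept -> dept_loc
print(determines(departments, ["dept_loc"], ["dept"]))  # dept_loc -> dept
```

The first two checks hold on the sample, which is the chain name -> dept -> dept_loc that makes the dependency of dept_loc on name transitive. The third fails because location 200 houses more than one department.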


Problems with 2NF Relations


Update Anomalies:
  UPDATE - location appears many times; if the location of a department changes, must fetch and change all rows containing that location
  DELETE - if the last row for a department is deleted, the department location information is lost
  INSERT - to insert a new row, department location must be known

Solution: Establish 2 relations via projection

Example 3 - departments:

name     dept
smith    402
jones    401
king     402
turner   400
olson    401

and

dept   dept_loc
400    200
401    200
402    100


Third Normal Form


2NF + every non-key attribute is nontransitively functionally dependent on the primary key

OR

Every non-key attribute is
  mutually independent (none is functionally dependent on any of the others)
  fully functionally dependent on the primary key

OR (Kent)

Each attribute in the relation is functionally dependent on the key, the whole key, and nothing but the key

A relation that is 2NF but not 3NF
  can be split into a collection of 3NF relations by projection
  can be reconstructed by join


3NF Examples
Example 4 - locations:

dept#   dept_name     dept_loc
400     programming   200
401     financial     200
402     academic      100
403     support       300

dept# and dept_name are candidate keys
dept_loc is the only non-key attribute, and is, by default, non-transitively functionally dependent on the primary key
This table is fine - it is only about departments

Example 5 - stock:

s#   sname   p#    qty
10   GE      102   1000
10   GE      103   625
10   GE      104   2000
20   TRW     102   500
20   TRW     105   1200
30   Syl     103   1300

technically in 3NF
  qty is the only non-key attribute (like example 4)
  candidate keys are (s#, p#) and (sname, p#)
  3NF (as defined so far) didn't require components of an alternate key to be fully functionally dependent on the primary key


Problems with 3NF Relations


The problems associated with alternate key components were not recognized in the early formulations of the relational model.
Have the same update anomalies as second normal form
Solution: Establish 2 relations via projection

Example 5 - stock:

s#   sname
10   GE
20   TRW
30   Syl

and

s#   p#    qty
10   102   1000
10   103   625
10   104   2000
20   102   500
20   105   1200
30   103   1300

or [s#, sname] and [sname, p#, qty]

Because of this problem, 3NF (as we have described it) is sometimes referred to as early 3rd Normal Form


BCNF Boyce-Codd Normal Form


3NF + every determinant is a candidate key
(Determinant: any attribute on which some other attribute is fully functionally dependent)

In example 4, dept# determined dept_name; in example 5, s# determined sname
  In example 4, dept# was a candidate key
  In example 5, s# (by itself) was not a candidate key

A relation that is 3NF but not BCNF
  Can be split into a collection of BCNF relations by projection
  Can be reconstructed by join

BCNF is sometimes referred to as late 3rd Normal Form, or even just as 3rd Normal Form
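The determinant test can be sketched with the standard attribute-closure computation: a determinant passes if its closure covers every attribute of the relation (that is, it is a superkey), which is a common equivalent way of stating the BCNF condition. The function names and the lists of functional dependencies below are assumptions made for illustration; the dependencies are one reading of Examples 4 and 5, not something the slides spell out.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the dependencies whose determinant does not determine everything."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(attributes)]


# Example 5 (stock), assuming s# and sname determine each other.
stock_attrs = ["s#", "sname", "p#", "qty"]
stock_fds = [
    (("s#",), ("sname",)),
    (("sname",), ("s#",)),
    (("s#", "p#"), ("qty",)),
]

# Example 4 (locations), assuming dept# and dept_name are both keys.
dept_attrs = ["dept#", "dept_name", "dept_loc"]
dept_fds = [
    (("dept#",), ("dept_name", "dept_loc")),
    (("dept_name",), ("dept#",)),
]

print(bcnf_violations(stock_attrs, stock_fds))
print(bcnf_violations(dept_attrs, dept_fds))
```

Under these assumptions the stock relation reports s# and sname as offending determinants, while the locations relation reports none.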


3NF to BCNF Example


Example 6 - enrollment

Rules:
1. For each subject, each student is taught by 1 teacher
2. Each teacher teaches only 1 subject (don't I wish)
3. Each subject is taught by several teachers

Student   Subject   Teacher
Smith     Math      Dr. White
Smith     English   Dr. Brown
Jones     Math      Dr. White
Jones     English   Dr. Brown
Doe       Math      Dr. Green

a. Teacher dependent on Student + Subject
b. Subject dependent on Teacher
c. Teacher not dependent on Subject
d. (Student, Subject) is a candidate key
e. (Student, Teacher) is also a candidate key

Update anomalies, e.g., Dr. White changes name
Relation in 3NF, but not in BCNF
Teacher is a determinant (b.), but not a candidate key (d. and e.)


Solution to Example 6
Solution: Form two relations:

Student   Teacher
Smith     Dr. White
Smith     Dr. Brown
Jones     Dr. White
Jones     Dr. Brown
Doe       Dr. Green

and

Teacher     Subject
Dr. White   Math
Dr. Brown   English
Dr. Green   Math

Question: How did we know to break it up this way?
Answer: The rules help us make this decision. In this case, rule 2 gives us the crucial information - once you know the teacher, you know the subject. Therefore, we need two tables to enforce the rule.
  The [Teacher, Subject] table tells us which one subject each teacher teaches.
  Students, in general, need both a subject and a teacher
    If we specify only subject, we don't know the teacher
    If we specify teacher, however, we do know the subject because of the rule and the [Teacher, Subject] table
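A short Python sketch shows why the split on Teacher works for Example 6: storing Teacher -> Subject as a dictionary allows exactly one subject per teacher (rule 2), and joining on Teacher rebuilds the original rows. The variable names are assumptions made for illustration.

```python
# Example 6 rows: (student, subject, teacher)
enrollment = [
    ("Smith", "Math", "Dr. White"),
    ("Smith", "English", "Dr. Brown"),
    ("Jones", "Math", "Dr. White"),
    ("Jones", "English", "Dr. Brown"),
    ("Doe", "Math", "Dr. Green"),
]

# Projection 1: which students are taught by which teachers.
student_teacher = sorted({(student, teacher) for student, _, teacher in enrollment})

# Projection 2: one subject per teacher; a dict key can hold only one value.
teacher_subject = dict(sorted({(teacher, subject) for _, subject, teacher in enrollment}))

# Join on teacher to rebuild the original relation.
rejoined = sorted(
    (student, teacher_subject[teacher], teacher)
    for student, teacher in student_teacher
)

print(teacher_subject)
print(rejoined == sorted(enrollment))
```

One trade-off worth noting: after the split, rule 1 (one teacher per student per subject) is no longer guaranteed by the keys of either table alone and has to be checked across the join.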


Fourth Normal Form


As a rule of thumb, 4NF and 5NF become relevant only when all attributes in the relation are parts of the key

  if in BCNF and have a non-key attribute, usually also in 5NF

Example 7 - skills:

Suppose we wish to store employee job skills and language skills. (An employee may have many of each.)

employee   skill        language
Jones      electrical   French
Jones      electrical   German
Jones      mechanical   French
Jones      mechanical   German
Smith      plumbing     Spanish

In general:

if     Jones   x   A
and    Jones   y   B
then   Jones   x   B
and    Jones   y   A

The relation is in BCNF - because it is all key ... but there is redundancy


Converting to 4NF
Ask the following questions:
  Could the relation have non-key attributes?
  Could any combination be missing?

If the answer to both questions is NO, need to break up relation to achieve 4NF


Example 7 - skills:

(employee, skill, language)

should be broken up into two relations:

employee   skill
Jones      electrical
Jones      mechanical
Smith      plumbing

and

employee   language
Jones      French
Jones      German
Smith      Spanish

if job skill and language are independent
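The independence condition can be tested on sample data before splitting: for each employee, every skill must appear with every language. The sketch below assumes the Example 7 rows as tuples; the function name is_independent is hypothetical, and a check on sample rows only shows what holds in this data, not what the business rules guarantee.

```python
from itertools import product

# Example 7 rows: (employee, skill, language)
skills = [
    ("Jones", "electrical", "French"),
    ("Jones", "electrical", "German"),
    ("Jones", "mechanical", "French"),
    ("Jones", "mechanical", "German"),
    ("Smith", "plumbing", "Spanish"),
]


def is_independent(rows):
    """True if each employee's skills are paired with all of their languages."""
    present = set(rows)
    for employee in {e for e, _, _ in rows}:
        own_skills = {s for e, s, _ in rows if e == employee}
        own_languages = {l for e, _, l in rows if e == employee}
        for skill, language in product(own_skills, own_languages):
            if (employee, skill, language) not in present:
                return False
    return True


employee_skill = {(e, s) for e, s, _ in skills}
employee_language = {(e, l) for e, _, l in skills}

# Join on employee; equals the original only when the facts are independent.
rejoined = {
    (e, s, l)
    for e, s in employee_skill
    for e2, l in employee_language
    if e == e2
}

print(is_independent(skills))
print(rejoined == set(skills))
```

If a combination could legitimately be missing (for example, Jones does mechanical work only in French), the check fails and the two-table split would invent rows on the join.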


Problems without 4NF


Problem occurs when dealing with multiple, independent facts
How do we represent them in a single relation?

Disjoint (- = blank):

Jones   electrical   -
Jones   mechanical   -
Jones   -            French
Jones   -            German
Smith   plumbing     -
Smith   -            Spanish

Random mix:

Jones   electrical   French
Jones   mechanical   German
Smith   plumbing     Spanish

(do extras - repeat, - blank, - anything?)

Cross product:

Jones   electrical   French
Jones   mechanical   French
Jones   electrical   German
Jones   mechanical   German
Smith   plumbing     Spanish

Check for independence!


Fifth Normal Form


PJ/NF or Projection-Join Normal Form
(Kent) - Deals with cases where information can be reconstructed from smaller pieces of information which can be maintained with less redundancy

Example 8 - dealerships:
1. Agents represent Companies
2. Companies make Products
3. Agents sell Products

Which Agent sells which Product for which Company?

Agent   Company   Product
smith   ford      car
smith   gm        truck
jones   ford      car

this form is necessary in the general case

BUT if we put a rule into effect that reads:

4. if an agent sells a product, and the agent represents a company, then the agent must sell the product made by the company

So, to obey the rule, we must add

smith   ford   truck
smith   gm     car

NOW, with the rule and the new rows, we have REDUNDANCY


Converting to 5NF
This time, we must break the relation into three parts (will not break in two)

Example 8 - dealerships:

Agent   Company   Product
smith   ford      car
smith   gm        truck
jones   ford      car
smith   ford      truck
smith   gm        car

BREAK INTO 3

Agent   Company
smith   ford
smith   gm
jones   ford

Agent   Product
smith   car
smith   truck
jones   car

Company   Product
ford      car
ford      truck
gm        car
gm        truck

A relation is already in 5NF if its information content cannot be reconstructed from several smaller record types (having different keys)

Only have 5NF problems if there are symmetry constraints (a pair of rows requires the existence of one or more additional rows)
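The three-way reconstruction for Example 8 can be sketched as follows. The code joins all three projections and, for contrast, joins only [Agent, Company] with [Company, Product]; on the sample rows that particular two-table join produces a row that was never in the relation. The variable names are assumptions, and the result describes this sample only.

```python
# Example 8 rows after applying rule 4: (agent, company, product)
dealerships = {
    ("smith", "ford", "car"),
    ("smith", "gm", "truck"),
    ("jones", "ford", "car"),
    ("smith", "ford", "truck"),
    ("smith", "gm", "car"),
}

agent_company = {(a, c) for a, c, _ in dealerships}
agent_product = {(a, p) for a, _, p in dealerships}
company_product = {(c, p) for _, c, p in dealerships}

# Join all three projections: a row survives only if every pair is present.
three_way = {
    (a, c, p)
    for a, c in agent_company
    for a2, p in agent_product
    if a == a2 and (c, p) in company_product
}

# Join only [Agent, Company] with [Company, Product] on company.
two_way = {
    (a, c, p)
    for a, c in agent_company
    for c2, p in company_product
    if c == c2
}

print(three_way == dealerships)
print(sorted(two_way - dealerships))
```

The extra row from the two-table join says jones sells ford trucks, which the [Agent, Product] table rules out; that third table is what filters it.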


Domain/Key Normal Form


No insertion/deletion anomalies
Impossible to make an insertion/deletion that violates a constraint
Constraint types:
  domain constraints
  key constraints

Example 9 - customers

cust#   branch
1234    west
1325    south
1421    east
1511    south

where valid branches are west, east, and south


Enforcing Domain Integrity in DK/NF


Example 9 - customers:

cust#   branch
1234    west
1325    south
1421    east
1511    south
1600    north

If this update is possible, not in DK/NF

One possibility for prohibiting this update is to maintain a table of legal branches and write code to prohibit the entry of a branch not in the table

legal branch
west
south
east

Problem: What's to stop someone from placing north in the legal branch table?
Possible partial solution: Restrict access to the legal branch table
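One way to picture a domain constraint that cannot be edited like ordinary data is to make the set of legal branches part of the type definition rather than a table. The Python sketch below is an illustration under that assumption; the names Branch, customers, and insert_customer are hypothetical, and a real DBMS would express the same idea with a domain or check constraint.

```python
from enum import Enum


class Branch(Enum):
    """The domain of legal branches, fixed in the definition itself."""
    WEST = "west"
    EAST = "east"
    SOUTH = "south"


customers = {}  # cust# -> Branch; the dict key plays the role of the key constraint


def insert_customer(cust_no, branch):
    """Insert a row, enforcing the key constraint and the domain constraint."""
    if cust_no in customers:
        raise KeyError(f"duplicate cust# {cust_no}")
    customers[cust_no] = Branch(branch)  # raises ValueError outside the domain


for cust_no, branch in [(1234, "west"), (1325, "south"), (1421, "east"), (1511, "south")]:
    insert_customer(cust_no, branch)

try:
    insert_customer(1600, "north")
except ValueError as error:
    print("rejected:", error)
```

Changing the domain now requires changing the definition, which is the same kind of access restriction the slide suggests for the legal branch table.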


Normalization Example: AutoCAD Database


Manufacturing plant electrical wiring specifications

Blueprints contain:
  parts at locations
  wired connections
  attributes for each wire and location

AutoCAD transmits variable-length records
  only transmits data for smart parts
  one record per part
  all data must be related to one or more parts

Objectives:
  number of wires from any source to any destination
  sub-classified by voltage, shielding, and intrinsic safety characteristics
  obtain conduit count from wire count (by hand)


Wiring Problem Statement

[Figure: wiring diagram with Parts AS 303, AS 404, and AS 405, Wires 52, 65, and 53, and Panels (3) and (4)]

Problem:

Count the wires going from panel (3) to panel (4)


Wiring Database Normalization


0th Normal Form (- = blank)

Part#   Loc   Loc_Desc   Wire1   Volt   IS   Wire2   Volt   IS   Wire3   ...
AS303   (3)   Panel 3    52      240    -    53      24     IS   54      240
AS404   (4)   Panel 4    52      240    -    55      120    -
AS405   (4)   Panel 4    53      24     IS

1st Normal Form (no repeating groups)
but the 2-part key creates partial dependencies

Part#   Loc   Loc_Desc   Wire#   Volt   IS
AS303   (3)   Panel 3    52      240    -
AS303   (3)   Panel 3    53      24     IS
AS303   (3)   Panel 3    54      240    -
AS404   (4)   Panel 4    52      240    -
AS404   (4)   Panel 4    55      120    -
AS405   (4)   Panel 4    53      24     IS

2nd Normal Form (no partial dependencies)

Part#   Loc   Loc_Desc   Wire#
AS303   (3)   Panel 3    52
AS303   (3)   Panel 3    53
AS303   (3)   Panel 3    54
AS404   (4)   Panel 4    52
AS404   (4)   Panel 4    55
AS405   (4)   Panel 4    53

Wire#   Volt   IS
52      240    -
53      24     IS
54      240    -
55      120    -


Wiring Database Normalization (Continued)


2nd Normal Form (apply rule again)
with the step-by-step approach, you only eliminate one partial dependency at a time

Part#   Wire#
AS303   52
AS303   53
AS303   54
AS404   52
AS404   55
AS405   53

Wire#   Volt   IS
52      240    -
53      24     IS
54      240    -
55      120    -

Part#   Loc   Loc_Desc
AS303   (3)   Panel 3
AS404   (4)   Panel 4
AS405   (4)   Panel 4


Wiring Database Normalization (Concluded)


3rd Normal Form (no transitive dependencies)

Part#   Wire#
AS303   52
AS303   53
AS303   54
AS404   52
AS404   55
AS405   53

Wire#   Volt   IS
52      240    -
53      24     IS
54      240    -
55      120    -

Part#   Loc
AS303   (3)
AS404   (4)
AS405   (4)

Loc   Loc_Desc
(3)   Panel 3
(4)   Panel 4
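With the wiring data in 3NF, the stated problem (count the wires going from panel (3) to panel (4)) becomes a small join. The sketch below makes one assumption that the slides do not state explicitly: a wire goes from one panel to another when it is attached to at least one part in each. The function name wires_between is hypothetical.

```python
from collections import Counter

# 3NF tables from the wiring example.
part_wire = [
    ("AS303", 52), ("AS303", 53), ("AS303", 54),
    ("AS404", 52), ("AS404", 55),
    ("AS405", 53),
]
part_loc = {"AS303": "(3)", "AS404": "(4)", "AS405": "(4)"}
wire_volt = {52: 240, 53: 24, 54: 240, 55: 120}


def wires_between(loc_a, loc_b):
    """Wires attached to a part at loc_a and to a part at loc_b (assumed rule)."""
    locations_of = {}
    for part, wire in part_wire:
        locations_of.setdefault(wire, set()).add(part_loc[part])
    return sorted(
        wire for wire, locations in locations_of.items()
        if {loc_a, loc_b} <= locations
    )


connecting = wires_between("(3)", "(4)")
by_voltage = Counter(wire_volt[wire] for wire in connecting)

print(connecting)
print(dict(by_voltage))
```

The sub-classification by voltage needs only a lookup in the [Wire#, Volt, IS] table, because each wire's attributes are stored exactly once.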


A Case Study (March)


Population:
  MBA students
  MIS majors
  last quarter in the program

Language:
  Nomad 2 4GL

Case:
  Customer, with attributes
  Dealer, with attributes
  Manufacturer, with attributes
  Contracts - customer, dealer, manufacturer, with symmetry constraints

Task:
  Given case description fully analyzed
  Use existing Nomad database
  Perform 8 queries
  Perform 11 updates


Relations Given in Case Study (March)


Design:
  3 groups of 14 students
  same case, same queries, same updates
  different schemas (1NF, 3NF, and 5NF)

1NF Schema:
  (d#, m#, c#, mfgr_attr, cust_attr, dealer_attr)

3NF Schema:
  (c#, cust_attr)
  (d#, dealer_attr)
  (m#, mfgr_attr)
  (d#, c#, m#)

5NF Schema:
  (c#, cust_attr)
  (d#, dealer_attr)
  (m#, mfgr_attr)
  (d#, c#)
  (d#, m#)
  (c#, m#)


Preliminary Case Results (March)


Tasks Correctly Performed

Normal Form   Queries (8)   Updates (11)
First         7.21 (90%)    5.07 (46%)
Third         4.50 (56%)    3.64 (33%)
Fifth         4.42 (55%)    3.21 (29%)


Database Design Issues Revisited


Ease of query formulation
Ease of enforcing referential integrity constraints
Ease of avoiding update anomalies

Normalization focuses only on avoiding update anomalies
Being normal is not enough

Possible solutions:
1. Don't normalize
2. Don't normalize beyond BCNF
3. Normalize to 5NF, but back off
Problems with 1-3: update anomalies, bad data, knowledge of database storage needed
4. Don't let users at base tables
5. Create views that are in low normal forms
6. Pre-define joins that give users the data they need
Solutions 4-6 are more work, but generally worth the effort
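Solutions 4-6 can be sketched with Python's built-in sqlite3 module: the base tables stay normalized, and users query a pre-defined view that looks like the low-normal-form table. The table, column, and view names below are assumptions made for illustration, reusing the inventory data from Example 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized base tables (users are kept away from these).
    CREATE TABLE warehouse (
        warehouse_no TEXT PRIMARY KEY,
        wh_address   TEXT NOT NULL
    );
    CREATE TABLE inventory (
        part_no      TEXT NOT NULL,
        warehouse_no TEXT NOT NULL REFERENCES warehouse(warehouse_no),
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (part_no, warehouse_no)
    );
    -- Pre-defined join presented to users as one wide, 1NF-style table.
    CREATE VIEW inventory_report AS
        SELECT i.part_no, i.warehouse_no, w.wh_address, i.quantity
        FROM inventory AS i
        JOIN warehouse AS w ON w.warehouse_no = i.warehouse_no;
""")
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?)",
    [("05", "Mpls"), ("08", "StPaul"), ("10", "Madison")],
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("100", "05", 200), ("100", "08", 300), ("200", "05", 250),
     ("200", "10", 400), ("300", "08", 350)],
)

# One update in the base table changes every matching row of the view.
conn.execute("UPDATE warehouse SET wh_address = 'Minneapolis' WHERE warehouse_no = '05'")

report = conn.execute(
    "SELECT * FROM inventory_report ORDER BY part_no, warehouse_no"
).fetchall()
for row in report:
    print(row)
```

Queries stay as easy to formulate as against the single wide table, while the update anomalies are handled once, in the base tables.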
