
# Part 6 Normalization

## Normal Form Overview

Universe of All Data Relations (normalized / unnormalized), with each normal form nested inside the one before it:

- 1st Normal Form
- 2nd Normal Form
- 3rd Normal Form
- Boyce-Codd Normal Form (BCNF)
- 4th Normal Form
- 5th Normal Form (PJ/NF)
- Domain/Key Normal Form (DK/NF)

## Copyright 1971-2002 Thomas P. Sturm


## Universe of Relations

- Any sequential file is a relation
- Not all relations are well formed
- Normalization provides a set of criteria to evaluate the well-formedness of a relation
- Normal form is only one criterion for determining a good model
- In general, a sequential file may have repeating groups

Example 1 - suppliers:

| part | suppliers |
|---|---|
| diode | (GE, TRW, Mot) |
| bulb | (GE, Syl) |

Implemented as:

| part | supplier1 | supplier2 | supplier3 |
|---|---|---|---|
| diode | GE | TRW | Mot |
| bulb | GE | Syl | |


## Problem with a Relation Not in First Normal Form

Retrieval:

- which supplier of a part should be retrieved?
- in which order should suppliers be retrieved?

Storage:

- how many suppliers do you allow for?
- which spaces are kept blank, and how?

Insert:

- to add a supplier, need to retrieve all suppliers, add the new supplier to an empty slot, and replace the record

Delete:

- to delete a supplier, need to adjust the vector (read, move around, erase, re-write)

Update:

- to update a supplier name, need to retrieve all suppliers, find the one to alter, and rewrite the entire set of suppliers


## Solution to Repeating Group Problem

Eliminate repeating groups by repeating the key

Example 1 - suppliers:

| part | supplier |
|---|---|
| diode | GE |
| diode | TRW |
| diode | Mot |
| bulb | GE |
| bulb | Syl |

This new table has a different key than the old one. It is part plus supplier.
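The step of repeating the key can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `to_first_normal_form` are assumptions made for the sketch, not part of the original material.

```python
# Hypothetical in-memory form of the unnormalized suppliers file:
# each part carries a repeating group (a list) of suppliers.
unnormalized = {
    "diode": ["GE", "TRW", "Mot"],
    "bulb": ["GE", "Syl"],
}


def to_first_normal_form(records):
    """Repeat the key (part) once per member of the repeating group."""
    return [
        (part, supplier)
        for part, suppliers in records.items()
        for supplier in suppliers
    ]


rows = to_first_normal_form(unnormalized)
for part, supplier in rows:
    print(part, supplier)
```

Each output row holds only atomic values, and the pair (part, supplier) identifies a row, which matches the composite key described above.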


## First Normal Form

All underlying domains contain atomic values only (no vectors / repeating groups)

Example 2 - inventory:

| part # | warehouse # | wh_address | quantity |
|---|---|---|---|
| 100 | 05 | Mpls | 200 |
| 100 | 08 | StPaul | 300 |
| 200 | 05 | Mpls | 250 |
| 200 | 10 | Madison | 400 |
| 300 | 08 | StPaul | 350 |

Update Anomalies:

- UPDATE - address of warehouse stored in many rows; if address changes, must change all rows
- DELETE - if the last row for a warehouse is deleted, the address is lost
- INSERT - to insert a new row, warehouse address must be known

The problem occurs because this table is not focused on one primary key - it is about two things: warehouses, and parts in warehouses.


## Solution to Multiple Focus Problems

- A relation that is in 1NF but not in a higher normal form has a composite key (more than one attribute in the key)
- Establish 2 relations via projection

Example 2 - inventory:

One table about warehouses:

| warehouse# | wh_address |
|---|---|
| 05 | Mpls |
| 08 | StPaul |
| 10 | Madison |

One table about inventory with a composite key:

| part# | warehouse# | quantity |
|---|---|---|
| 100 | 05 | 200 |
| 100 | 08 | 300 |
| 200 | 05 | 250 |
| 200 | 10 | 400 |
| 300 | 08 | 350 |
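The split via projection can be illustrated with a small Python sketch over the Example 2 rows. The helper `project` and the tuple layout are assumptions chosen for the illustration.

```python
# Example 2 rows in 1NF: (part#, warehouse#, wh_address, quantity)
inventory_1nf = [
    (100, "05", "Mpls", 200),
    (100, "08", "StPaul", 300),
    (200, "05", "Mpls", 250),
    (200, "10", "Madison", 400),
    (300, "08", "StPaul", 350),
]


def project(rows, columns):
    """Relational projection: keep the given column positions, drop duplicates."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in columns)
        if picked not in result:
            result.append(picked)
    return result


# One relation about warehouses, one about parts in warehouses.
warehouses = project(inventory_1nf, (1, 2))
stock = project(inventory_1nf, (0, 1, 3))

print(warehouses)
print(stock)
```

After the projection each warehouse address is stored once, so changing an address touches a single row.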


## Second Normal Form

1NF + every non-key attribute is fully functionally dependent on the primary key

Example 3 - departments:

| name | dept | dept_loc |
|---|---|---|
| smith | 402 | 100 |
| jones | 401 | 200 |
| king | 402 | 100 |
| turner | 400 | 200 |
| olson | 401 | 200 |

Problem: Functional dependency is transitive

- The primary key is name
- dept is functionally dependent on name
- dept_loc is also functionally dependent on name, but it is transitive because dept functionally determines dept_loc
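A functional dependency can be tested against sample data with a short Python sketch. The function name `holds` and the dictionary-per-row layout are assumptions for this illustration. Note that a sample can only refute a dependency: a dependency is a rule about all legal data, so passing the check on five rows does not prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different later value violates lhs -> rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Example 3 - departments
departments = [
    {"name": "smith", "dept": 402, "dept_loc": 100},
    {"name": "jones", "dept": 401, "dept_loc": 200},
    {"name": "king", "dept": 402, "dept_loc": 100},
    {"name": "turner", "dept": 400, "dept_loc": 200},
    {"name": "olson", "dept": 401, "dept_loc": 200},
]

name_to_dept = holds(departments, ["name"], ["dept"])
dept_to_loc = holds(departments, ["dept"], ["dept_loc"])
loc_to_dept = holds(departments, ["dept_loc"], ["dept"])
print(name_to_dept, dept_to_loc, loc_to_dept)
```

In this sample, name determines dept and dept determines dept_loc (the transitive chain), while dept_loc does not determine dept because location 200 houses both department 400 and department 401.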


## Problems with 2NF Relations

Update Anomalies:

- UPDATE - location appears many times - if location of a department changes, must fetch and change all rows containing that location
- DELETE - if the last row for a department is deleted, the department location information is lost
- INSERT - to insert a new row, department location must be known

## Solution: Establish 2 relations via projection

Example 3 - departments:

| name | dept |
|---|---|
| smith | 402 |
| jones | 401 |
| king | 402 |
| turner | 400 |
| olson | 401 |

and

| dept | dept_loc |
|---|---|
| 400 | 200 |
| 401 | 200 |
| 402 | 100 |


## Third Normal Form

2NF + every non-key attribute is nontransitively functionally dependent on the primary key

OR

Every non-key attribute is

- mutually independent (none is functionally dependent on any of the others)
- fully functionally dependent on the primary key

OR (Kent)

Each attribute in the relation is functionally dependent on the key, the whole key, and nothing but the key

A relation that is 2NF but not 3NF

- can be split into a collection of 3NF relations by projection
- can be reconstructed by join
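As a minimal sketch of "split by projection, reconstruct by join", the Example 3 departments data can be projected into two relations and joined back in Python. The variable names are assumptions made for the illustration, and the join is written as a plain set comprehension rather than any particular database operation.

```python
# Example 3 rows: (name, dept, dept_loc), in 2NF but not 3NF
employees = {
    ("smith", 402, 100),
    ("jones", 401, 200),
    ("king", 402, 100),
    ("turner", 400, 200),
    ("olson", 401, 200),
}

# Projection onto [name, dept] and [dept, dept_loc]; sets drop duplicates.
emp_dept = {(name, dept) for name, dept, _ in employees}
dept_locs = {(dept, loc) for _, dept, loc in employees}

# Natural join on dept reconstructs the original relation.
rejoined = {
    (name, dept, loc)
    for name, dept in emp_dept
    for other_dept, loc in dept_locs
    if other_dept == dept
}

print(rejoined == employees)
print(sorted(dept_locs))
```

The join returns exactly the original rows, and each department location is now stored once instead of once per employee.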


## 3NF Examples

Example 4 - locations:

| dept# | dept_loc |
|---|---|
| 400 | 200 |
| 401 | 200 |
| 402 | 100 |
| 403 | 300 |

- dept# and dept_name are candidate keys
- dept_loc is the only non-key attribute, and is, by default, non-transitively functionally dependent on the primary key
- This table is fine - it is only about departments

Example 5 - stock:

| s# | sname | p# | qty |
|---|---|---|---|
| 10 | GE | 102 | 1000 |
| 10 | GE | 103 | 625 |
| 10 | GE | 104 | 2000 |
| 20 | TRW | 102 | 500 |
| 20 | TRW | 105 | 1200 |
| 30 | Syl | 103 | 1300 |

technically in 3NF

- qty is the only non-key attribute (like example 1)
- candidate keys are (s#, p#) and (sname, p#)
- didn't require components of an alternate key to be fully functionally dependent on the primary key


## Problems with 3NF Relations

- The problems associated with alternate key components were not recognized in the early formulations of the relational model.
- Have the same update anomalies as second normal form
- Solution: Establish 2 relations via projection

Example 5 - stock:

| s# | sname |
|---|---|
| 10 | GE |
| 20 | TRW |
| 30 | Syl |

and

| s# | p# | qty |
|---|---|---|
| 10 | 102 | 1000 |
| 10 | 103 | 625 |
| 10 | 104 | 2000 |
| 20 | 102 | 500 |
| 20 | 105 | 1200 |
| 30 | 103 | 1300 |

or [s#, sname] and [sname, p#, qty]

Because of this problem, 3NF (as we have described it) is sometimes referred to as early 3rd Normal Form


## BCNF Boyce-Codd Normal Form

3NF + every determinant is a candidate key
(Determinant: any attribute on which some other attribute is fully functionally dependent)

- In example 4, dept# determined dept_name; in example 5, s# determined sname
- In example 4, dept# was a candidate key
- In example 5, s# (by itself) was not a candidate key

## A relation that is 3NF but not BCNF

- Can be split into a collection of BCNF relations by projection
- Can be reconstructed by join

BCNF is sometimes referred to as late 3rd Normal Form, or even just as 3rd Normal Form


## 3NF to BCNF Example

Example 6 - enrollment

Rules:

1. For each subject, each student is taught by 1 teacher
2. Each teacher teaches only 1 subject (don't I wish)
3. Each subject is taught by several teachers

| Student | Subject | Teacher |
|---|---|---|
| Smith | Math | Dr. White |
| Smith | English | Dr. Brown |
| Jones | Math | Dr. White |
| Jones | English | Dr. Brown |
| Doe | Math | Dr. Green |

- a. Teacher dependent on Student + Subject
- b. Subject dependent on Teacher
- c. Teacher not dependent on Subject
- d. (Student, Subject) is a candidate key
- e. (Student, Teacher) is also a candidate key

- Update anomalies, e.g., Dr. White changes name
- Relation in 3NF, but not in BCNF
- Teacher is a determinant (b.), but not a candidate key (d. and e.)
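The reasoning for Example 6 can be checked mechanically with the standard attribute-closure technique. The Python sketch below is illustrative only: the function names and the encoding of the two dependencies (Student + Subject determines Teacher, Teacher determines Subject) are assumptions derived from rules 1 and 2.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            is_superkey = closure(candidate, fds) == attributes
            # Smaller keys were found first, so any superset of one is not minimal.
            if is_superkey and not any(key <= candidate for key in keys):
                keys.append(candidate)
    return keys


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


ATTRS = {"Student", "Subject", "Teacher"}
FDS = [
    (frozenset({"Student", "Subject"}), frozenset({"Teacher"})),  # rule 1
    (frozenset({"Teacher"}), frozenset({"Subject"})),             # rule 2
]

keys = candidate_keys(ATTRS, FDS)
violations = bcnf_violations(ATTRS, FDS)
print(keys)
print(violations)
```

The sketch finds the two candidate keys listed in d. and e., and reports Teacher as the one determinant that is not a candidate key.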


## Solution to Example 6

Solution: Form two relations:

| Student | Teacher |
|---|---|
| Smith | Dr. White |
| Smith | Dr. Brown |
| Jones | Dr. White |
| Jones | Dr. Brown |
| Doe | Dr. Green |

and

| Teacher | Subject |
|---|---|
| Dr. White | Math |
| Dr. Brown | English |
| Dr. Green | Math |

Question: How did we know to break it up this way?

Answer: The rules help us make this decision. In this case, rule 2 gives us the crucial information - once you know the teacher, you know the subject. Therefore, we need two tables to enforce the rule. The [Teacher, Subject] table tells us which one subject each teacher teaches.

- Students, in general, need both a subject and a teacher
- If we specify only subject, we don't know the teacher
- If we specify teacher, however, we do know the subject because of the rule and the first table


## Fourth Normal Form

- 4NF and 5NF are relevant only when all attributes in the relation are parts of the key
- If a relation is in BCNF and has a non-key attribute, it is also in 5NF

Example 7 - skills:

Suppose we wish to store employee job skills and language skills. (An employee may have many of each.)

| employee | skill | language |
|---|---|---|
| Jones | electrical | French |
| Jones | electrical | German |
| Jones | mechanical | French |
| Jones | mechanical | German |
| Smith | plumbing | Spanish |

In general:

| employee | skill | language |
|---|---|---|
| Jones | x | A |
| Jones | y | B |
| Jones | x | B |
| Jones | y | A |

The relation is in BCNF - because it is all key ... but there is redundancy


## Converting to 4NF

- Could the relation have non-key attributes?
- Could any combination be missing?

If either answer is NO, need to break up relation to achieve 4NF

Example 7 - skills:

[employee, skill, language] should be broken up into two relations:

| employee | skill |
|---|---|
| Jones | electrical |
| Jones | mechanical |
| Smith | plumbing |

and

| employee | language |
|---|---|
| Jones | French |
| Jones | German |
| Smith | Spanish |
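A short Python sketch can confirm that nothing is lost by this split: projecting the all-key Example 7 relation onto its two independent facts and joining on employee gives back the original rows. The variable names are assumptions for the illustration.

```python
# Example 7 rows: (employee, skill, language), all key
skills_all_key = {
    ("Jones", "electrical", "French"),
    ("Jones", "electrical", "German"),
    ("Jones", "mechanical", "French"),
    ("Jones", "mechanical", "German"),
    ("Smith", "plumbing", "Spanish"),
}

# Two independent facts about an employee, stored separately.
emp_skill = {(emp, skill) for emp, skill, _ in skills_all_key}
emp_lang = {(emp, lang) for emp, _, lang in skills_all_key}

# Joining on employee pairs every skill with every language.
rejoined = {
    (emp, skill, lang)
    for emp, skill in emp_skill
    for other_emp, lang in emp_lang
    if other_emp == emp
}

print(rejoined == skills_all_key)
print(len(emp_skill), len(emp_lang))
```

The join reproduces all five original rows because every skill is paired with every language for each employee, which is exactly the condition under which no combination can be missing.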


## Problems without 4NF

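Problem occurs when dealing with multiple, independent facts. How do we represent them in a single relation?

Disjoint:

| employee | skill | language |
|---|---|---|
| Jones | electrical | |
| Jones | mechanical | |
| Jones | | French |
| Jones | | German |
| Smith | plumbing | |
| Smith | | Spanish |

Random mix:

| employee | skill | language |
|---|---|---|
| Jones | electrical | French |
| Jones | mechanical | German |
| Smith | plumbing | Spanish |

(do extras - repeat, - blank, - anything?)

Cross product:

| employee | skill | language |
|---|---|---|
| Jones | electrical | French |
| Jones | mechanical | French |
| Jones | electrical | German |
| Jones | mechanical | German |
| Smith | plumbing | Spanish |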


## Fifth Normal Form

PJ/NF or Projection-Join Normal Form

(Kent) - Deals with cases where information can be reconstructed from smaller pieces of information which can be maintained with less redundancy

Example 8 - dealerships:

1. Agents represent Companies
2. Companies make Products
3. Agents sell Products

Which Agent sells which Product for which Company?

| Agent | Company | Product |
|---|---|---|
| smith | ford | car |
| smith | gm | truck |
| jones | ford | car |

This form is necessary in the general case

4. if an agent sells a product, and an agent represents a company, then the agent must sell the product made by the company

So, to obey the rule, we must add

| Agent | Company | Product |
|---|---|---|
| smith | ford | truck |
| smith | gm | car |

NOW, with the rule and the new rows, we have REDUNDANCY


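## Converting to 5NF

This time, we must break the relation into three parts (will not break in two)

Example 8 - dealerships:

| Agent | Company | Product |
|---|---|---|
| smith | ford | car |
| smith | gm | truck |
| jones | ford | car |
| smith | ford | truck |
| smith | gm | car |

BREAK INTO 3

| Agent | Company |
|---|---|
| smith | ford |
| smith | gm |
| jones | ford |

| Agent | Product |
|---|---|
| smith | car |
| smith | truck |
| jones | car |

| Company | Product |
|---|---|
| ford | car |
| ford | truck |
| gm | car |
| gm | truck |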

A relation is already in 5NF if its information content cannot be reconstructed from several smaller record types (having different keys)

Only have 5NF problems if there are symmetry constraints (a pair of rows requires the existence of one or more additional rows)
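The reconstruction from three projections can be sketched in Python using the five Example 8 rows. The variable names are assumptions for the illustration. The sketch also joins just the Agent-Company and Company-Product projections, to show one pairing that is not enough on its own for this data.

```python
# Example 8 rows after applying rule 4: (agent, company, product)
dealerships = {
    ("smith", "ford", "car"),
    ("smith", "gm", "truck"),
    ("jones", "ford", "car"),
    ("smith", "ford", "truck"),
    ("smith", "gm", "car"),
}

agent_company = {(a, c) for a, c, _ in dealerships}
agent_product = {(a, p) for a, _, p in dealerships}
company_product = {(c, p) for _, c, p in dealerships}

# Join all three projections: a row survives only if every pair is present.
three_way = {
    (a, c, p)
    for a, c in agent_company
    for other_a, p in agent_product
    if other_a == a and (c, p) in company_product
}

# Join only Agent-Company with Company-Product (on company).
two_way = {
    (a, c, p)
    for a, c in agent_company
    for other_c, p in company_product
    if other_c == c
}

spurious = two_way - dealerships
print(three_way == dealerships)
print(spurious)
```

The three-way join returns exactly the original five rows, while the two-table join shown here also produces a row for jones selling ford trucks, which the original relation does not contain.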


## Domain/Key Normal Form

- No insertion/deletion anomalies
- Impossible to make an insertion/deletion that violates a constraint
- Constraint types:
  - domain constraints
  - key constraints

Example 9 - customers

| cust# | branch |
|---|---|
| 1234 | west |
| 1325 | south |
| 1421 | east |
| 1511 | south |


## Enforcing Domain Integrity in DK/NF

Example 9 - customers:

| cust# | branch |
|---|---|
| 1234 | west |
| 1325 | south |
| 1421 | east |
| 1511 | south |
| 1600 | north |

If this update (the 1600 / north row) is possible, not in DK/NF

One possibility for prohibiting this update is to maintain a table of legal branches and write code to prohibit the entry of a branch not in the table

| legal branch |
|---|
| west |
| south |
| east |

Problem: What's to stop someone from placing north in the legal branch table?

Possible partial solution: Restrict access to the legal branch table
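The idea of checking every insertion against a domain constraint and a key constraint can be sketched in Python. Everything here is a hypothetical illustration: the class `Customers`, its `insert` method, and the use of an immutable `frozenset` to stand in for a legal branch table with restricted access are assumptions, not a description of any particular DBMS.

```python
# Immutable set standing in for the access-restricted legal branch table.
LEGAL_BRANCHES = frozenset({"west", "south", "east"})


class Customers:
    """Toy relation that rejects insertions violating its constraints."""

    def __init__(self):
        self._rows = {}

    def insert(self, cust_no, branch):
        # Domain constraint: branch must come from the legal branch table.
        if branch not in LEGAL_BRANCHES:
            raise ValueError(f"illegal branch: {branch}")
        # Key constraint: cust# must be unique.
        if cust_no in self._rows:
            raise KeyError(f"duplicate cust#: {cust_no}")
        self._rows[cust_no] = branch

    def rows(self):
        return dict(self._rows)


customers = Customers()
for cust_no, branch in [(1234, "west"), (1325, "south"), (1421, "east"), (1511, "south")]:
    customers.insert(cust_no, branch)

try:
    customers.insert(1600, "north")
    rejected = False
except ValueError:
    rejected = True

print(rejected, customers.rows())
```

The attempted 1600 / north insertion is refused, so the stored rows stay within the declared domain. As the slide notes, this only moves the problem to whoever can change the set of legal branches.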


Manufacturing plant electrical wiring specifications

Blueprints contain:

- parts at locations
- wired connections
- attributes for each wire and location

Only transmits data for smart parts; one record per part; all data must be related to one or more parts

Objectives:

- number of wires from any source to any destination
- sub-classified by voltage, shielding, and intrinsic safety characteristics
- obtain conduit count from wire count (by hand)


[Figure: wiring diagram showing Part AS 303, Wire 52, Wire 53, Panel (3), and Panel (4)]


## Wiring Database Normalization

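0th Normal Form

| Part# | Loc | Loc_Desc | Wire1 | Volt | IS | Wire2 | Volt | IS | Wire3 | Volt | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AS303 | (3) | Panel 3 | 52 | 240 | | 53 | 24 | IS | 54 | 240 | |
| AS404 | (4) | Panel 4 | 52 | 240 | | 55 | 120 | | | | |
| AS405 | (4) | Panel 4 | 53 | 24 | IS | | | | | | |

1st Normal Form (no repeating groups) but the 2-part key creates partial dependencies

| Part# | Loc | Loc_Desc | Wire# | Volt | IS |
|---|---|---|---|---|---|
| AS303 | (3) | Panel 3 | 52 | 240 | |
| AS303 | (3) | Panel 3 | 53 | 24 | IS |
| AS303 | (3) | Panel 3 | 54 | 240 | |
| AS404 | (4) | Panel 4 | 52 | 240 | |
| AS404 | (4) | Panel 4 | 55 | 120 | |
| AS405 | (4) | Panel 4 | 53 | 24 | IS |

After removing the partial dependency on Wire#:

| Part# | Loc | Loc_Desc | Wire# |
|---|---|---|---|
| AS303 | (3) | Panel 3 | 52 |
| AS303 | (3) | Panel 3 | 53 |
| AS303 | (3) | Panel 3 | 54 |
| AS404 | (4) | Panel 4 | 52 |
| AS404 | (4) | Panel 4 | 55 |
| AS405 | (4) | Panel 4 | 53 |

| Wire# | Volt | IS |
|---|---|---|
| 52 | 240 | |
| 53 | 24 | IS |
| 54 | 240 | |
| 55 | 120 | |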


## Wiring Database Normalization (Continued)

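2nd Normal Form (apply rule again) - with the step-by-step approach, you only eliminate one partial dependency at a time

| Part# | Wire# |
|---|---|
| AS303 | 52 |
| AS303 | 53 |
| AS303 | 54 |
| AS404 | 52 |
| AS404 | 55 |
| AS405 | 53 |

| Wire# | Volt | IS |
|---|---|---|
| 52 | 240 | |
| 53 | 24 | IS |
| 54 | 240 | |
| 55 | 120 | |

| Part# | Loc | Loc_Desc |
|---|---|---|
| AS303 | (3) | Panel 3 |
| AS404 | (4) | Panel 4 |
| AS405 | (4) | Panel 4 |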


## Wiring Database Normalization (Concluded)

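3rd Normal Form (no transitive dependencies)

| Part# | Wire# |
|---|---|
| AS303 | 52 |
| AS303 | 53 |
| AS303 | 54 |
| AS404 | 52 |
| AS404 | 55 |
| AS405 | 53 |

| Wire# | Volt | IS |
|---|---|---|
| 52 | 240 | |
| 53 | 24 | IS |
| 54 | 240 | |
| 55 | 120 | |

| Part# | Loc |
|---|---|
| AS303 | (3) |
| AS404 | (4) |
| AS405 | (4) |

| Loc | Loc_Desc |
|---|---|
| (3) | Panel 3 |
| (4) | Panel 4 |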


## A Case Study (March)

Population:

- MBA students
- MIS majors
- last quarter in the program

Language:

- Nomad 2 4GL

Case:

- Customer, with attributes
- Dealer, with attributes
- Manufacturer, with attributes
- Contracts - customer, dealer, manufacturer, with symmetry constraints
- Given case description fully analyzed
- Use existing Nomad database
- Perform 8 queries
- Perform 11 updates


## Relations Given in Case Study (March)

Design:

- 3 groups of 14 students
- same case, same queries, same updates
- different schemas (1NF, 3NF, and 5NF)

1NF Schema:
(d#, m#, c#, mfgr_attr, cust_attr, dealer_attr)

3NF Schema:
(c#, cust_attr) (d#, dealer_attr) (m#, mfgr_attr) (d#, c#, m#)

5NF Schema:
(c#, cust_attr) (d#, dealer_attr) (m#, mfgr_attr) (d#, c#) (d#, m#) (c#, m#)


## Preliminary Case Results (March)

| Normal Form | Queries (8) | Updates (11) |
|---|---|---|
| First | 7.21 (90%) | 5.07 (46%) |
| Third | 4.50 (56%) | 3.64 (33%) |
| Fifth | 4.42 (55%) | 3.21 (29%) |


## Database Design Issues Revisited

- Ease of query formulation
- Ease of enforcing referential integrity constraints
- Ease of avoiding update anomalies

Normalization focuses only on avoiding update anomalies. Being normal is not enough.

Possible solutions:

1. Don't normalize
2. Don't normalize beyond BCNF
3. Normalize to 5NF, but back off

Problems with 1-3: update anomalies, bad data, knowledge of database storage needed

4. Don't let users at base tables
5. Create views that are in low normal forms
6. Pre-define joins that give users the data they need

Solutions 4-6 are more work, but generally worth the effort
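Solutions 5 and 6 can be illustrated with Python's standard `sqlite3` module, reusing the Example 2 inventory data. This is a minimal sketch: the table names `warehouse` and `stock`, the view name `inventory`, and the column names are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized base tables (hypothetical names).
    CREATE TABLE warehouse (
        wh_no      TEXT PRIMARY KEY,
        wh_address TEXT NOT NULL
    );
    CREATE TABLE stock (
        part_no  INTEGER,
        wh_no    TEXT REFERENCES warehouse(wh_no),
        quantity INTEGER,
        PRIMARY KEY (part_no, wh_no)
    );
    INSERT INTO warehouse VALUES ('05', 'Mpls'), ('08', 'StPaul'), ('10', 'Madison');
    INSERT INTO stock VALUES
        (100, '05', 200), (100, '08', 300),
        (200, '05', 250), (200, '10', 400),
        (300, '08', 350);

    -- Pre-defined join: users query this 1NF-style view, not the base tables.
    CREATE VIEW inventory AS
        SELECT s.part_no, s.wh_no, w.wh_address, s.quantity
        FROM stock AS s
        JOIN warehouse AS w ON w.wh_no = s.wh_no;
""")

rows = conn.execute(
    "SELECT part_no, wh_address, quantity FROM inventory ORDER BY part_no, wh_no"
).fetchall()
for row in rows:
    print(row)
```

Users see the easy-to-query flat shape, while each warehouse address is still stored only once in the base table, so the update anomalies of the unnormalized design do not return.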
