
Normalization

Normalization is the process of efficiently organizing data in a database with two goals in mind.

First goal: eliminate redundant data, for example, storing the same data in more than one table.

Second goal: ensure data dependencies make sense, for example, only storing related data in a table.

Benefits of Normalization
Less storage space
Quicker updates
Less data inconsistency
Clearer data relationships
Easier to add data
Flexible structure

Bad database designs result in:
redundancy: inefficient storage
anomalies: data inconsistency, difficulties in maintenance

Functional Dependencies
A form of constraint, hence part of the schema.
Finding them is part of the database design; they are also used in normalizing the relations.
Warning: this is the most abstract and hardest part of the database design.

(Figure: functional dependency between A and B)

Functional Dependencies
Definition: If two tuples agree on the attributes A1, A2, …, An, then they must also agree on the attributes B1, B2, …, Bm.
Formally: $A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$

Examples

EmpID   Name    Phone   Position
E0045   Smith   1234    Clerk
E1847   John    9876    Salesrep
E1111   Smith   9876    Salesrep
E9999   Mary    1234    Lawyer

EmpID → Name, Phone, Position
Position → Phone
but Phone ↛ Position

In General
To check A → B, erase all other columns:

A    B
X1   Y1
X2   Y2

Then check whether the remaining relation is many-one (called functional in mathematics).

Example
To check Position → Phone, keep only those two columns:

Position   Phone
Clerk      1234
Salesrep   9876
Salesrep   9876
Lawyer     1234
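The "erase all other columns" check can be sketched in Python. This is a minimal sketch, assuming each row is a dict from attribute name to value; `fd_holds` is a hypothetical helper name, not part of any library. It tests a single instance, so it can refute an FD but never prove that the FD holds on every instance.

```python
# Minimal sketch (assumption: rows are dicts; fd_holds is a hypothetical helper).
def fd_holds(rows, lhs, rhs):
    """Return True if any two rows that agree on lhs also agree on rhs."""
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores val on first sight and returns the stored value;
        # a mismatch means the projection onto lhs, rhs is not many-one.
        if seen.setdefault(key, val) != val:
            return False
    return True


employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": "1234", "Position": "Lawyer"},
]

print(fd_holds(employees, ["Position"], ["Phone"]))  # True
print(fd_holds(employees, ["Phone"], ["Position"]))  # False: 1234 -> Clerk and Lawyer
```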

Typical Examples of FDs
Product: name → price, manufacturer
Person: ssn → name, age
Company: name → stockprice, president

Example
Product(name, category, color, department, price)
Consider these FDs:
name → color
category → department
color, category → price
What do they say?

Example
FDs are constraints on relations: on some instances they hold, on others they don't.
name → color
category → department
color, category → price

name      category     color   department     price
Gizmo     Gadget       Green   Toys           49
Tweaker   Gadget       Black   Toys           99
Gizmo     Stationery   Green   Office-supp.   59

Does this instance satisfy all the FDs?

Example
name → color
category → department
color, category → price

name      category   color   department   price
Gizmo     Gadget     Green   Toys         49
Tweaker   Gadget     Green   Toys         99

What about this one?
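Both questions can be answered with the same kind of check. Again a minimal sketch, with a hypothetical `fd_holds` helper and rows stored as dicts; the hypothetical `violated` helper lists which of the three FDs each instance breaks.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows agreeing on lhs always agree on rhs (hypothetical helper)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


FDS = [(["name"], ["color"]),
       (["category"], ["department"]),
       (["color", "category"], ["price"])]
COLS = ("name", "category", "color", "department", "price")

first = [dict(zip(COLS, r)) for r in [
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Black", "Toys", 99),
    ("Gizmo", "Stationery", "Green", "Office-supp.", 59),
]]
second = [dict(zip(COLS, r)) for r in [
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Green", "Toys", 99),
]]


def violated(rows, fds):
    """Return the FDs (as text) that the instance does not satisfy."""
    return [", ".join(lhs) + " -> " + ", ".join(rhs)
            for lhs, rhs in fds if not fd_holds(rows, lhs, rhs)]


print(violated(first, FDS))   # []
print(violated(second, FDS))  # ['color, category -> price']
```

In the second instance, two rows agree on (Green, Gadget) but have prices 49 and 99.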

Example
If some FDs are satisfied, then others are satisfied too.
If all these FDs are true:
name → color
category → department
color, category → price
Then this FD also holds:
name, category → price
Why?

Inference Rules for FDs
Splitting rule and Combining rule:
$A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$
is equivalent to
$A_1, A_2, \ldots, A_n \rightarrow B_1$
$A_1, A_2, \ldots, A_n \rightarrow B_2$
$\ldots$
$A_1, A_2, \ldots, A_n \rightarrow B_m$
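The "Why?" in the example can be answered mechanically with the attribute-closure algorithm, a standard technique that these slides do not spell out: starting from a set of attributes, keep adding the right-hand side of every FD whose left-hand side is already covered (repeated splitting, combining, and transitivity). An FD X → Y follows from the given FDs whenever Y is contained in the closure of X. A minimal sketch, with FDs written as pairs of Python sets and `closure` as a hypothetical helper name:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # lhs is already derivable, so everything in rhs is derivable too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


FDS = [({"name"}, {"color"}),
       ({"category"}, {"department"}),
       ({"color", "category"}, {"price"})]

print(sorted(closure({"name", "category"}, FDS)))
# ['category', 'color', 'department', 'name', 'price']
```

Since price is in the closure of {name, category}, the FD name, category → price follows. The closure of {name} alone is only {name, color}, so name → price does not follow from these FDs.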

Inference Rules for FDs (continued)
Trivial Rule:
$A_1, A_2, \ldots, A_n \rightarrow A_i$ where $i = 1, 2, \ldots, n$
Why?

Transitive Closure Rule:
If $A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$
and $B_1, B_2, \ldots, B_m \rightarrow C_1, C_2, \ldots, C_p$
then $A_1, A_2, \ldots, A_n \rightarrow C_1, C_2, \ldots, C_p$
Why?

Functional Dependencies
K is a superkey for relation schema R if and only if $K \rightarrow R$.
K is a candidate key for R if and only if $K \rightarrow R$, and for no $\alpha \subset K$, $\alpha \rightarrow R$.
Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
bor_loan = (customer_id, loan_number, amount)
We expect this functional dependency to hold:
loan_number → amount
but would not expect the following to hold:
amount → customer_id
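The two key definitions translate directly into closure tests: K → R holds exactly when the closure of K contains every attribute of R. A minimal sketch; `closure`, `is_superkey` and `is_candidate_key` are hypothetical helpers, and it assumes loan_number → amount is the only FD on bor_loan.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(k, schema, fds):
    # K -> R: the closure of K covers every attribute of R
    return closure(k, fds) >= schema


def is_candidate_key(k, schema, fds):
    # a superkey none of whose proper subsets is a superkey
    return is_superkey(k, schema, fds) and not any(
        is_superkey(set(sub), schema, fds)
        for n in range(len(k))
        for sub in combinations(sorted(k), n))


BOR_LOAN = {"customer_id", "loan_number", "amount"}
FDS = [({"loan_number"}, {"amount"})]  # assumption: the only FD

print(is_candidate_key({"customer_id", "loan_number"}, BOR_LOAN, FDS))  # True
print(is_candidate_key(BOR_LOAN, BOR_LOAN, FDS))  # False: superkey, not minimal
```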


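Functional Dependencies
A functional dependency $\alpha \rightarrow \beta$ is trivial if it is satisfied by all instances of a relation, which is the case exactly when $\beta \subseteq \alpha$.
Example:
customer_name, loan_number → customer_name
customer_name → customer_name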

Functional Dependencies
Consider the relation:
PLOTS (Prop#, State, Plot#, Area, Price, Tax_rate)
It holds information about plots available in India. The constraints on the relation are:
Prop# is unique throughout India.
Plot# values are unique within a given state.
For a given state, Tax_rate is fixed.
Plots having the same area have the same price, irrespective of the state in which they are located.

Write all the FDs on the relation PLOTS.

Functional Dependencies
FDs on PLOTS (Prop# is the primary key, PK; (State, Plot#) is a candidate key, CK):
FD1: Prop# → State, Plot#, Area, Price, Tax_rate
FD2: State, Plot# → Prop#, Area, Price, Tax_rate
FD3: State → Tax_rate
FD4: Area → Price

Identify redundancy in PLOTS.
Identify update anomalies in PLOTS.
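One way to start on these questions is to compute the closure of each FD's left-hand side. A minimal sketch, assuming the four FDs FD1 to FD4 for PLOTS and a hypothetical `closure` helper: Prop# and (State, Plot#) determine every attribute, while State and Area do not. That is where the redundancy comes from: every plot in the same state repeats its Tax_rate, and every plot with the same area repeats its Price.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}
FDS = [
    ({"Prop#"}, PLOTS - {"Prop#"}),                    # FD1 (PK)
    ({"State", "Plot#"}, PLOTS - {"State", "Plot#"}),  # FD2 (CK)
    ({"State"}, {"Tax_rate"}),                         # FD3
    ({"Area"}, {"Price"}),                             # FD4
]

for lhs, _ in FDS:
    determined = closure(lhs, FDS)
    verdict = "is a key" if determined == PLOTS else "only determines " + str(sorted(determined))
    print(sorted(lhs), verdict)
```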

Conversion to 1NF
A relational schema R is in first normal form if the domains of all attributes of R are atomic.
Repeating groups must be eliminated.
A proper primary key must be developed: it uniquely identifies attribute values (rows); here it is the combination of PROJ_NUM and EMP_NUM.
Dependencies can then be identified:
Desirable dependencies are based on the primary key.
Less desirable dependencies are partial (based on part of a composite primary key) or transitive (one nonprime attribute depends on another nonprime attribute).

First Normal Form (1NF)
A database schema is in First Normal Form if all tables are flat.

Not flat (Courses holds a set of values):
Student
Name    GPA   Courses
Alice   3.8   Math, DB, OS
Bob     3.7   DB, OS
Carol   3.9   Math, OS

Flat (1NF):
Student
Name    GPA
Alice   3.8
Bob     3.7
Carol   3.9

Course
Math
DB
OS

Takes (Student, Course): one row for each course a student takes, for example (Alice, Math), (Alice, DB), (Bob, DB).

May need to add keys.
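Flattening the nested Student table amounts to "unnesting" the set-valued Courses attribute into one row per (student, course) pair. A minimal sketch, assuming the nested table is held as Python tuples; `flatten` is a hypothetical helper:

```python
# Nested Student relation: Courses holds a list, so it is not in 1NF.
nested_students = [
    ("Alice", 3.8, ["Math", "DB", "OS"]),
    ("Bob", 3.7, ["DB", "OS"]),
    ("Carol", 3.9, ["Math", "OS"]),
]


def flatten(nested):
    """Split into flat Student(Name, GPA), Course(Course), Takes(Student, Course)."""
    student = [(name, gpa) for name, gpa, _ in nested]
    course = sorted({c for _, _, courses in nested for c in courses})
    takes = [(name, c) for name, _, courses in nested for c in courses]
    return student, course, takes


student, course, takes = flatten(nested_students)
print(student)  # [('Alice', 3.8), ('Bob', 3.7), ('Carol', 3.9)]
print(course)   # ['DB', 'Math', 'OS']
print(takes)    # every value is now atomic; (Student, Course) serves as the key
```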

Dependency Diagram (1NF)
Figure 4.4

1NF Summarized
Each attribute must be atomic (single value).
No repeating columns within a row (composite attributes).
No multivalued columns.
All key attributes are defined.
All attributes are dependent on the primary key.
1NF simplifies attributes, so queries become easier.

Conversion to 2NF
Start with the 1NF format:
Write each key component on a separate line.
Write the original key on the last line.
Each component becomes a new table.
Write the dependent attributes after each key.
Result:
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
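The conversion steps can be sketched as a small function. This is a minimal sketch, not a general algorithm: it assumes you already know, for each non-key attribute, which part of the key it depends on (`determinant_of` below, a hypothetical input), and it reproduces the PROJECT / EMPLOYEE / ASSIGN result.

```python
from itertools import combinations


def decompose_2nf(key, determinant_of):
    """One table per key part that has dependents, plus one for the full key.

    key: tuple of primary-key components
    determinant_of: non-key attribute -> tuple of the key components it
                    depends on (assumed known, listed in key order)
    """
    tables = {}
    for n in range(1, len(key) + 1):
        for part in combinations(key, n):  # components first, full key last
            deps = [a for a, d in determinant_of.items() if d == part]
            if deps or part == tuple(key):
                tables[part] = list(part) + deps
    return tables


determinant_of = {
    "PROJ_NAME": ("PROJ_NUM",),
    "EMP_NAME": ("EMP_NUM",),
    "JOB_CLASS": ("EMP_NUM",),
    "CHG_HOUR": ("EMP_NUM",),
    "HOURS": ("PROJ_NUM", "EMP_NUM"),
}
for part, cols in decompose_2nf(("PROJ_NUM", "EMP_NUM"), determinant_of).items():
    print(part, cols)
```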

Second Normal Form (2NF)
Each attribute must be functionally dependent on the whole primary key.
If the primary key is a single attribute, then a 1NF relation is in 2NF.
The test for 2NF involves testing for FDs whose left-hand-side attributes are part of the primary key.
Disallow partial dependencies, where nonkey attributes depend on part of a composite primary key.
In short, remove partial dependencies.
2NF improves data integrity by preventing the update, insert, and delete anomalies caused by partial dependencies.

2NF Conversion Results
Figure 4.5

2NF
Based on the concept of full functional dependency (FFD): if A and B are sets of attributes of R, B is said to be fully functionally dependent on A if A → B, but no proper subset of A determines B.
No partial dependencies on the PK.
Is PLOTS in 2NF? YES: its PK is a single attribute.
Every 1NF relation with a single-attribute PK is in 2NF (under this PK-based definition), so 2NF matters only for relations with composite keys.
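The full-FD definition can be tested with attribute closures: B is fully dependent on A if B is in the closure of A but not in the closure of any proper subset of A. A minimal sketch with hypothetical helpers; the FDs below are an assumption, read off the PROJECT / EMPLOYEE / ASSIGN decomposition of the original composite-key table.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(a, b, fds):
    """True if A -> B holds and no proper subset of A determines B."""
    if not b <= closure(a, fds):
        return False
    return not any(b <= closure(set(sub), fds)
                   for n in range(len(a))
                   for sub in combinations(sorted(a), n))


# Assumed FDs of the 1NF table before the 2NF conversion.
FDS = [({"PROJ_NUM"}, {"PROJ_NAME"}),
       ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
       ({"PROJ_NUM", "EMP_NUM"}, {"HOURS"})]
PK = {"PROJ_NUM", "EMP_NUM"}
NON_KEY = ["PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"]

partial = [x for x in NON_KEY if not is_full_fd(PK, {x}, FDS)]
print(partial)  # ['PROJ_NAME', 'EMP_NAME', 'JOB_CLASS', 'CHG_HOUR']
```

A single-attribute PK has no proper subset other than the empty set, which is why every 1NF relation with a single-attribute PK passes this test.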

2NF
A relation that is in 1NF and in which every non-PK attribute is fully functionally dependent on the PK is said to be in 2NF.
Removing all partial dependencies from a 1NF relation yields 2NF.

2NF Summarized
In 1NF, and includes no partial dependencies: no attribute is dependent on only a portion of the primary key.
Still possible to exhibit transitive dependency: attributes may be functionally dependent on nonkey attributes.

Conversion to 3NF
Create separate table(s) to eliminate transitive functional dependencies:
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)

3NF
Based on the concept of transitive dependency: no non-PK attribute should be transitively dependent on the PK.
Transitive dependency: if A → B and B → C, then A transitively determines C through B, provided B and C do not determine A.
Is PLOTS in 3NF? NO.

3NF
A relation that is in 1NF and 2NF, and in which no non-PK attribute is transitively dependent on the PK, is said to be in 3NF.
Removing all transitive dependencies from a 2NF relation yields 3NF.

PLOTS
Prop# transitively determines Tax_rate through State (FD1 and FD3).
Prop# transitively determines Price through Area (FD1 and FD4).
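Those two transitive chains can also be found mechanically. A minimal sketch, assuming FD1 to FD4 for PLOTS and hypothetical helpers: for each FD B → C whose left-hand side neither determines the PK (so B does not determine A) nor is part of the PK, every attribute of C outside B and the PK is transitively dependent on the PK through B.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_deps(pk, fds):
    """List (pk, B, C) triples where pk -> B -> C is a transitive chain."""
    found = []
    for lhs, rhs in fds:
        # skip keys (B would determine the PK) and parts of the PK (partial, not transitive)
        if pk <= closure(lhs, fds) or lhs <= pk:
            continue
        for c in sorted(rhs - lhs - pk):
            found.append((sorted(pk), sorted(lhs), c))
    return found


PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}
FDS = [
    ({"Prop#"}, PLOTS - {"Prop#"}),                    # FD1
    ({"State", "Plot#"}, PLOTS - {"State", "Plot#"}),  # FD2
    ({"State"}, {"Tax_rate"}),                         # FD3
    ({"Area"}, {"Price"}),                             # FD4
]

print(transitive_deps({"Prop#"}, FDS))
# [(['Prop#'], ['State'], 'Tax_rate'), (['Prop#'], ['Area'], 'Price')]
```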

2NF Example - 1
Inventory (Item, Supplier, Cost, Supplier Address), with primary key (Item, Supplier).
We first check whether Cost is fully functionally dependent on the ENTIRE primary key.
If I know just Item, can I find out Cost? No. We can have more than one supplier for the same product.
If I know just Supplier, can I find out Cost? No. We need to know what the Item is as well.
So, Cost is fully functionally dependent on the ENTIRE primary key.

2NF Example - 2
Inventory (Item, Supplier, Cost, Supplier Address)
We then check whether Supplier Address is fully functionally dependent on the ENTIRE primary key.
If I know just Item, can I find out Supplier Address? No. We can have more than one supplier for the same product.
If I know just Supplier, can I find out Supplier Address? Yes. The supplier's address does not depend on the Item.
So, Supplier Address is NOT fully functionally dependent on the ENTIRE primary key: Inventory is NOT in 2NF.

So putting things together
Inventory (Description, Supplier, Cost, Supplier Address) is split into:
Inventory (Description, Supplier, Cost)
Supplier (Supplier Name, Supplier Address)
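The split is a pair of projections with duplicate rows removed. A minimal sketch on hypothetical sample data (the item names, costs, and addresses are invented for illustration), using the attribute names from the 2NF example slides; `project` is a hypothetical helper:

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    out, seen = [], set()
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Hypothetical data: supplier S1 supplies two items, so its address
# is stored twice in the unnormalized Inventory table.
inventory = [
    {"Item": "Bolt", "Supplier": "S1", "Cost": 0.10, "Supplier Address": "1 Main St"},
    {"Item": "Nut", "Supplier": "S1", "Cost": 0.05, "Supplier Address": "1 Main St"},
    {"Item": "Bolt", "Supplier": "S2", "Cost": 0.12, "Supplier Address": "9 High St"},
]

inventory_2nf = project(inventory, ["Item", "Supplier", "Cost"])
supplier = project(inventory, ["Supplier", "Supplier Address"])
print(len(inventory_2nf), len(supplier))  # 3 2
```

Each supplier address is now stored once, so changing it is a single-row update.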

Transitive Dependence
Given a relation R (EmpNo, EName, Salary, Address), assume the following FDs hold:
EmpNo → EName
EName → Address
and therefore EmpNo → Address.
Note: Both EName and Address are non-key attributes in R. Since Address depends on the non-prime attribute EName, which in turn depends on the primary key (EmpNo), a transitive dependency exists.
R is in 2NF, because its primary key is a single attribute, but it is not in 3NF. Decomposing it removes the transitive dependency:
R1 (EmpNo, EName, Salary)
R2 (EName, Address)
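Because EName → Address holds, joining R1 and R2 back on EName recovers R exactly (a lossless decomposition). A minimal sketch on hypothetical rows, with hypothetical `project`, `natural_join`, and `as_set` helpers:

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, t)) for t in sorted(unique)]


def natural_join(r1, r2):
    """Join two non-empty lists of dicts on their common attributes."""
    common = set(r1[0]) & set(r2[0])
    return [{**x, **y} for x in r1 for y in r2 if all(x[c] == y[c] for c in common)]


def as_set(rows):
    """Compare relations as sets of rows, ignoring order."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical instance of R in which EName -> Address holds.
r = [
    {"EmpNo": "E1", "EName": "Ann", "Salary": 5000, "Address": "Oslo"},
    {"EmpNo": "E2", "EName": "Ben", "Salary": 4000, "Address": "Rome"},
    {"EmpNo": "E3", "EName": "Cid", "Salary": 4500, "Address": "Oslo"},
]
r1 = project(r, ["EmpNo", "EName", "Salary"])
r2 = project(r, ["EName", "Address"])
print(as_set(natural_join(r1, r2)) == as_set(r))  # True
```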

Database Normalization
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.
(A determinant is any attribute, or set of attributes, whose value determines other values within a row.)

A Table That Is In 3NF But Not In BCNF

If a table contains only one candidate key, 3NF and BCNF are equivalent. BCNF is a special case of 3NF: every table in BCNF is also in 3NF, but not every 3NF table is in BCNF.

Figure 5.7

The Decomposition of a Table Structure to Meet BCNF Requirements

Sample Data for a BCNF Conversion

Figure 5.8


Decomposition into BCNF

BCNF
Based on FDs that take into account all candidate keys of a relation.
For a relation with only one CK, 3NF and BCNF are equivalent.
A relation is said to be in BCNF if every determinant is a CK.
Is PLOTS in BCNF? NO.
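The BCNF test can also be run with closures. This sketch uses the common formal version of the definition, in which the left-hand side of every nontrivial FD must be a superkey; the FDs are FD1 to FD4 as given for PLOTS, and the helpers are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left-hand side is not a superkey of schema."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}
FDS = [
    ({"Prop#"}, PLOTS - {"Prop#"}),                    # FD1
    ({"State", "Plot#"}, PLOTS - {"State", "Plot#"}),  # FD2
    ({"State"}, {"Tax_rate"}),                         # FD3
    ({"Area"}, {"Price"}),                             # FD4
]

print(bcnf_violations(PLOTS, FDS))
# [(['State'], ['Tax_rate']), (['Area'], ['Price'])]
```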

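BCNF vs. 3NF
BCNF goes further than 3NF; some say too far. A 3NF table that has no overlapping composite candidate keys is in BCNF.
Example: a teacher teaches only one subject, and for a given subject a given student has only one teacher.
(student, subject, teacher): in 3NF, not in BCNF.
Keys: (student, subject) and (student, teacher).
teacher is a determinant (teacher → subject) but not a candidate key.
Decomposition: (teacher, subject) and (student, teacher).
The result is in BCNF, but the tables are not independent: the FD student, subject → teacher can no longer be checked within a single table.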
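The teacher example can be checked with the usual textbook formulation of both tests: BCNF requires the left-hand side of every nontrivial FD to be a superkey, while 3NF also accepts an FD whose right-hand side consists of prime attributes (attributes that belong to some candidate key). A minimal sketch with hypothetical helpers; the two FDs encode the two rules stated in the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for n in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), n):
            k = set(combo)
            if closure(k, fds) >= schema and not any(key <= k for key in keys):
                keys.append(k)
    return keys


def normal_forms(schema, fds):
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)  # attributes that appear in some candidate key

    def ok(lhs, rhs, allow_prime):
        if rhs <= lhs or closure(lhs, fds) >= schema:
            return True  # trivial FD, or lhs is a superkey
        return allow_prime and (rhs - lhs) <= prime

    in_3nf = all(ok(l, r, True) for l, r in fds)
    in_bcnf = all(ok(l, r, False) for l, r in fds)
    return keys, in_3nf, in_bcnf


SCHEMA = {"student", "subject", "teacher"}
FDS = [({"student", "subject"}, {"teacher"}),  # one teacher per student and subject
       ({"teacher"}, {"subject"})]             # a teacher teaches one subject

keys, in_3nf, in_bcnf = normal_forms(SCHEMA, FDS)
print([sorted(k) for k in keys])  # [['student', 'subject'], ['student', 'teacher']]
print(in_3nf, in_bcnf)            # True False
```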
