Normalization is the process of efficiently organizing data in a database with two goals in mind. First goal: eliminate redundant data, for example, storing the same data in more than one table.
Normalization
Benefits of Normalization
- Less storage space
- Quicker updates
- Less data inconsistency
- Clearer data relationships
- Easier to add data
- Flexible structure
Bad database design results in:
- redundancy: inefficient storage
- anomalies: data inconsistency and difficulties in maintenance
Functional Dependencies
A form of constraint, hence part of the schema.
Finding them is part of database design; they are also used in normalizing relations. Warning: this is the most abstract and hardest part of database design.
Functional Dependencies
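Definition: If two tuples agree on the attributes A1, A2, ..., An, then they must also agree on the attributes B1, B2, ..., Bm. This is written A1, A2, ..., An → B1, B2, ..., Bm. Formally:
$$\forall t, t' \in R:\ (t.A_1 = t'.A_1 \land \dots \land t.A_n = t'.A_n) \Rightarrow (t.B_1 = t'.B_1 \land \dots \land t.B_m = t'.B_m)$$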
Examples
EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer
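In General
To check A → B, erase all other columns, then check whether the remaining relation is many-one (called functional in mathematics):
A   B
X1  Y1
X2  Y2
Example
To check Position → Phone, keep only those two columns of the employee table:
Position  Phone
Clerk     1234
Salesrep  9876
Salesrep  9876
Lawyer    1234
Each Position maps to a single Phone, so Position → Phone holds on this instance. The converse fails: Phone 1234 maps to both Clerk and Lawyer.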
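The column-erasing test can be sketched in Python as follows. `fd_holds` is a hypothetical helper name. It only inspects one instance, so a `True` result means the data does not violate the FD, not that the FD is part of the schema.

```python
# Employee instance from the example table above.
EMPLOYEES = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": "1234", "Position": "Lawyer"},
]


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this instance.

    Keep only the lhs and rhs columns, then check that the remaining
    relation is many-one: each lhs value maps to exactly one rhs value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first rhs value seen for this key;
        # a different value later means two tuples agree on lhs but not rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(EMPLOYEES, ["Position"], ["Phone"]))  # True
print(fd_holds(EMPLOYEES, ["Phone"], ["Position"]))  # False: 1234 -> Clerk and Lawyer
print(fd_holds(EMPLOYEES, ["EmpID"], ["Name", "Phone", "Position"]))  # True
```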
Example
Product(name, category, color, department, price)
Person:
ssn
Example
FDs are constraints on relations: on some instances they hold, on others they don't. For example, on Product consider:
name → color
category → department
color, category → price
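Example
If some FDs are satisfied, then others are satisfied too. For instance, the three Product FDs above imply name, category → price.
Splitting/combining rule: A1, ..., An → B1, ..., Bm is equivalent to the set of FDs A1, ..., An → B1; ...; A1, ..., An → Bm. Why?
Transitive rule: if A1, ..., Am → B1, ..., Bm and B1, ..., Bm → C1, ..., Cp, then A1, ..., Am → C1, ..., Cp. Why?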
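One standard way to decide whether an FD follows from others is the attribute-closure algorithm. The slides do not describe it; it is named here as a swapped-in technique. The sketch below assumes FDs are given as pairs of frozensets, and the helper names `closure` and `implies` are hypothetical. It confirms that the three Product FDs imply name, category → price.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    Each FD is a (lhs, rhs) pair of frozensets. Any FD whose left side is
    already inside the result adds its right side; repeat until stable.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds iff rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# The Product FDs from the example above.
FDS = [
    (frozenset({"name"}), frozenset({"color"})),
    (frozenset({"category"}), frozenset({"department"})),
    (frozenset({"color", "category"}), frozenset({"price"})),
]

print(implies(FDS, {"name", "category"}, {"price"}))  # True
print(implies(FDS, {"name"}, {"price"}))  # False: closure of {name} is {name, color}
```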
Functional Dependencies
K is a superkey for relation schema R if and only if K → R.
K is a candidate key for R if and only if K → R and, for no proper subset α ⊂ K, α → R.
Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
bor_loan = (customer_id, loan_number, amount)
We expect this functional dependency to hold:
loan_number → amount
but would not expect the following to hold:
amount → customer_id
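The two key definitions translate directly into closure tests. This is a minimal sketch that assumes loan_number → amount is the only FD on bor_loan. That is an assumption: a loan may be shared by several customers, so nothing determines customer_id. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(k, schema, fds):
    """K is a superkey iff K -> R, i.e. the closure of K covers the schema."""
    return closure(k, fds) >= set(schema)


def is_candidate_key(k, schema, fds):
    """K is a candidate key iff K -> R and no proper subset of K does."""
    if not is_superkey(k, schema, fds):
        return False
    return not any(
        is_superkey(sub, schema, fds)
        for size in range(len(k))  # sizes 0 .. len(k) - 1: proper subsets only
        for sub in combinations(k, size)
    )


BOR_LOAN = ["customer_id", "loan_number", "amount"]
FDS = [({"loan_number"}, {"amount"})]  # assumed: the only FD on bor_loan

print(is_superkey({"loan_number"}, BOR_LOAN, FDS))  # False
print(is_candidate_key({"customer_id", "loan_number"}, BOR_LOAN, FDS))  # True
print(is_candidate_key(set(BOR_LOAN), BOR_LOAN, FDS))  # False: superkey, not minimal
```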
Functional Dependencies
A functional dependency α → β is trivial if β ⊆ α; such a dependency is satisfied by every instance of the relation.
Example:
customer_name, loan_number → customer_name
customer_name → customer_name
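The triviality test is a plain subset check. The helper name `is_trivial` below is hypothetical.

```python
def is_trivial(lhs, rhs):
    """An FD lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


print(is_trivial({"customer_name", "loan_number"}, {"customer_name"}))  # True
print(is_trivial({"customer_name"}, {"customer_name"}))  # True
print(is_trivial({"loan_number"}, {"amount"}))  # False
```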
Functional Dependencies
Consider the relation:
PLOTS (prop#, state, plot#, area, price, Tax_rate)
Information about plots available in India. The constraints on the relation are:
- Prop# is unique throughout India.
- Plot# is unique within a given state.
- For a given state, Tax_rate is fixed.
- Plots having the same area have the same price, irrespective of the state in which they are located.
Functional Dependencies
PLOTS(Prop#, State, Plot#, Area, Price, Tax_rate), with PK Prop# and CK (State, Plot#):
FD1: Prop# → State, Plot#, Area, Price, Tax_rate
FD2: State, Plot# → Prop#, Area, Price, Tax_rate
FD3: State → Tax_rate
FD4: Area → Price
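Attribute closure can confirm that Prop# and (State, Plot#) each determine the whole of PLOTS, while State and Area do not. The FD encoding is an assumed reading of the four constraints as FD1 to FD4. The `closure` helper is hypothetical.

```python
PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}

# Assumed encoding of FD1-FD4 as (lhs, rhs) pairs.
FDS = {
    "FD1": ({"Prop#"}, {"State", "Plot#", "Area", "Price", "Tax_rate"}),
    "FD2": ({"State", "Plot#"}, {"Prop#", "Area", "Price", "Tax_rate"}),
    "FD3": ({"State"}, {"Tax_rate"}),
    "FD4": ({"Area"}, {"Price"}),
}


def closure(attrs, fds):
    """Closure of attrs under a dict of named (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.values():
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(closure({"Prop#"}, FDS) == PLOTS)  # True: the PK determines everything
print(closure({"State", "Plot#"}, FDS) == PLOTS)  # True: so does the CK
print(sorted(closure({"State"}, FDS)))  # ['State', 'Tax_rate']
print(sorted(closure({"Area"}, FDS)))  # ['Area', 'Price']
```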
Conversion to 1NF
A relational schema R is in first normal form if the domains of all attributes of R are atomic.
- Repeating groups must be eliminated.
- A proper primary key is developed: it uniquely identifies attribute values (rows); here, the combination of PROJ_NUM and EMP_NUM.
- Dependencies can be identified:
  - Desirable dependencies are based on the primary key.
  - Less desirable dependencies:
    - Partial: based on part of a composite primary key.
    - Transitive: one nonprime attribute depends on another nonprime attribute.
[Figure: a Student relation (Alice 3.8, Bob 3.7, Carol 3.9) whose course attribute holds several courses per student, converted to 1NF as separate Student, Takes(Student, Course), and Course (Math, DB, OS) relations.]
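Eliminating a repeating group means giving each (student, course) pair its own row in a separate relation. The data below is illustrative, loosely modeled on the figure, and `to_1nf` is a hypothetical name.

```python
# Illustrative non-1NF data: "Courses" is a set-valued (non-atomic) attribute.
NESTED = [
    {"Student": "Bob", "GPA": 3.7, "Courses": ["DB", "OS"]},
    {"Student": "Carol", "GPA": 3.9, "Courses": ["Math", "OS"]},
]


def to_1nf(nested):
    """Move the repeating group into its own relation so every value is atomic."""
    student = [(r["Student"], r["GPA"]) for r in nested]
    takes = [(r["Student"], course) for r in nested for course in r["Courses"]]
    return student, takes


student, takes = to_1nf(NESTED)
print(student)  # [('Bob', 3.7), ('Carol', 3.9)]
print(takes)    # [('Bob', 'DB'), ('Bob', 'OS'), ('Carol', 'Math'), ('Carol', 'OS')]
```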
1NF Summarized
Figure 4.4
Conversion to 2NF
Start with the 1NF format:
- Write each key component on a separate line.
- Write the original key on the last line.
- Each component becomes a new table.
- Write the dependent attributes after each key.
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
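The 2NF split is a set of projections of the 1NF table onto each key and its dependent attributes. The rows below are hypothetical, as is the `project` helper. Employee E1's job data, stored twice in the 1NF table, is stored once in EMPLOYEE.

```python
def project(rows, cols):
    """Relational projection: keep cols and drop duplicate tuples (order kept)."""
    out = []
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in out:
            out.append(t)
    return out


COLS = ["PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"]
# Hypothetical 1NF rows; the key is (PROJ_NUM, EMP_NUM).
DATA_1NF = [dict(zip(COLS, r)) for r in [
    ("P1", "Alpha", "E1", "Ann", "Engineer", 85.0, 20.0),
    ("P1", "Alpha", "E2", "Bo", "Designer", 60.0, 10.0),
    ("P2", "Beta", "E1", "Ann", "Engineer", 85.0, 5.0),
]]

PROJECT = project(DATA_1NF, ["PROJ_NUM", "PROJ_NAME"])
EMPLOYEE = project(DATA_1NF, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
ASSIGN = project(DATA_1NF, ["PROJ_NUM", "EMP_NUM", "HOURS"])

print(PROJECT)   # [('P1', 'Alpha'), ('P2', 'Beta')]
print(EMPLOYEE)  # [('E1', 'Ann', 'Engineer', 85.0), ('E2', 'Bo', 'Designer', 60.0)]
print(ASSIGN)    # [('P1', 'E1', 20.0), ('P1', 'E2', 10.0), ('P2', 'E1', 5.0)]
```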
2NF improves data integrity: it removes the update, insertion, and deletion anomalies caused by partial dependencies.
2 NF
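Based on the concept of full functional dependency (FFD): if A and B are sets of attributes of R, B is fully functionally dependent on A if A → B but no proper subset of A determines B.
- No partial dependencies on the PK.
- Is PLOTS in 2NF? YES: its PK (Prop#) is a single attribute, and under this PK-based definition every relation with a single-attribute PK is in 2NF. (Under the general definition, which considers every candidate key, FD3 State → Tax_rate is a partial dependency on the CK (State, Plot#).)
- 2NF therefore matters only for relations with composite keys.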
2 NF
A relation that is in 1NF and in which every non-PK attribute is fully functionally dependent on the PK is said to be in 2NF.
1NF → remove all partial dependencies → 2NF
2NF Summarized
- In 1NF
- Includes no partial dependencies: no attribute is dependent on only a portion of the primary key
Conversion to 3NF
Create separate table(s) to eliminate transitive functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
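Moving CHG_HOUR into JOB removes the transitive dependency EMP_NUM → JOB_CLASS → CHG_HOUR. The sketch uses hypothetical rows and a hypothetical `split_transitive` helper. It also checks that JOB_CLASS → CHG_HOUR actually holds in the data before splitting.

```python
# Hypothetical 2NF EMPLOYEE rows: (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR).
EMPLOYEE_2NF = [
    ("E1", "Ann", "Engineer", 85.0),
    ("E2", "Bo", "Designer", 60.0),
    ("E3", "Cy", "Engineer", 85.0),
]


def split_transitive(rows):
    """Split EMPLOYEE into EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS) and JOB(JOB_CLASS, CHG_HOUR)."""
    employee = [(num, name, job_class) for num, name, job_class, _ in rows]
    charges = {}
    for _, _, job_class, chg_hour in rows:
        # Each job class must map to a single charge per hour.
        if charges.setdefault(job_class, chg_hour) != chg_hour:
            raise ValueError(f"JOB_CLASS -> CHG_HOUR violated for {job_class}")
    return employee, sorted(charges.items())


employee, job = split_transitive(EMPLOYEE_2NF)
print(employee)  # [('E1', 'Ann', 'Engineer'), ('E2', 'Bo', 'Designer'), ('E3', 'Cy', 'Engineer')]
print(job)       # [('Designer', 60.0), ('Engineer', 85.0)]
```

The engineer rate is now stored once, so changing it touches a single JOB row instead of every engineer.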
3 NF
Based on the concept of transitive dependency: no non-PK attribute should be transitively dependent on the PK.
Transitive dependency: if A → B and B → C, then A transitively determines C through B, provided B and C do not determine A.
Is PLOTS in 3NF? NO
3 NF
[Figure: PLOTS decomposed into 3NF relations; diagram labels State, Plot#, Area, Price, Tax_rate, PK, CK, FD4]
A relation that is in 1NF and 2NF, and in which no non-PK attribute is transitively dependent on the PK, is said to be in 3NF.
2NF → remove all transitive dependencies → 3NF
In PLOTS, Prop# transitively determines Tax_rate through State, and Prop# transitively determines Price through Area.
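The sketch below flags 3NF violations in PLOTS. It applies the common formal test: an FD X → A breaks 3NF when X is not a superkey and A is not part of any candidate key. The candidate keys are taken as given from the PK/CK labels rather than computed. The FD encoding is an assumed reading of FD1 to FD4, and the function names are hypothetical.

```python
PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}
CANDIDATE_KEYS = [{"Prop#"}, {"State", "Plot#"}]  # PK and CK, given

# Assumed encoding of FD1-FD4.
FDS = {
    "FD1": ({"Prop#"}, {"State", "Plot#", "Area", "Price", "Tax_rate"}),
    "FD2": ({"State", "Plot#"}, {"Prop#", "Area", "Price", "Tax_rate"}),
    "FD3": ({"State"}, {"Tax_rate"}),
    "FD4": ({"Area"}, {"Price"}),
}


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.values():
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(schema, fds, candidate_keys):
    """Return (FD name, nonprime attributes) for FDs whose lhs is not a superkey."""
    prime = set().union(*candidate_keys)
    violations = []
    for name, (lhs, rhs) in fds.items():
        if closure(lhs, fds) >= schema:  # lhs is a superkey: no problem
            continue
        nonprime = (rhs - lhs) - prime
        if nonprime:
            violations.append((name, sorted(nonprime)))
    return violations


print(third_nf_violations(PLOTS, FDS, CANDIDATE_KEYS))
# [('FD3', ['Tax_rate']), ('FD4', ['Price'])]
```

The two violations match the transitive dependencies through State and through Area.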
2NF Example - 1
Inventory (Item, Supplier, Cost, Supplier Address), with primary key (Item, Supplier).
We first check whether Cost is fully functionally dependent on the ENTIRE primary key.
- If I know just Item, can I find out Cost? No. We can have more than one supplier for the same product.
- If I know just Supplier, can I find out Cost? No. We need to know what the Item is as well.
So Cost is fully functionally dependent on the ENTIRE primary key.
2NF Example - 2
Inventory (Item, Supplier, Cost, Supplier Address)
We then check whether Supplier Address is fully functionally dependent on the ENTIRE primary key.
- If I know just Item, can I find out Supplier Address? No. We can have more than one supplier for the same product.
- If I know just Supplier, can I find out Supplier Address? Yes. The supplier's address does not depend on the Item.
So Supplier Address depends on only part of the primary key (Supplier). This is a partial dependency, so Inventory is not in 2NF.
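The two questions above amount to checking every proper subset of the primary key. This sketch assumes the FDs implied by that reasoning: (Item, Supplier) → Cost and Supplier → Supplier Address. The function names are hypothetical.

```python
from itertools import combinations

PK = ("Item", "Supplier")
# Assumed FDs, read off the reasoning above.
FDS = [
    ({"Item", "Supplier"}, {"Cost"}),
    ({"Supplier"}, {"Supplier Address"}),
]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(pk, nonkey, fds):
    """Return (subset, attribute) pairs where a proper subset of pk determines a non-key attribute."""
    found = []
    for size in range(1, len(pk)):  # proper, non-empty subsets of the key
        for subset in combinations(pk, size):
            determined = closure(subset, fds)
            for attr in nonkey:
                if attr in determined:
                    found.append((subset, attr))
    return found


print(partial_dependencies(PK, ["Cost", "Supplier Address"], FDS))
# [(('Supplier',), 'Supplier Address')]
```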
Transitive Dependence
Given a relation R(EmpNo, EName, Salary, Address) with primary key EmpNo, assume the following FD holds:
EName → Address
Note: EName and Address are both non-key attributes of R. Address depends on the non-prime attribute EName, which in turn depends on the primary key EmpNo, so a transitive dependency exists: EmpNo → EName and EName → Address, hence EmpNo → Address.
R is in 2NF, since its primary key is a single attribute, but the transitive dependency keeps it out of 3NF. Decomposing R removes the transitive dependency:
R1(EmpNo, EName, Salary)
R2(EName, Address)
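A decomposition is only useful if joining the pieces gives back the original relation. Here the shared attribute EName is a key of R2, which is the standard condition for a lossless join. The sketch uses hypothetical rows that satisfy EName → Address, and the `natural_join` helper is hypothetical.

```python
# Hypothetical instance of R(EmpNo, EName, Salary, Address) satisfying EName -> Address.
R = [
    ("E1", "Ann", 50000, "12 Oak St"),
    ("E2", "Bo", 42000, "9 Elm Rd"),
    ("E3", "Ann", 61000, "12 Oak St"),
]

# Project onto R1(EmpNo, EName, Salary) and R2(EName, Address); sets drop duplicates.
R1 = sorted({(emp, name, sal) for emp, name, sal, _ in R})
R2 = sorted({(name, addr) for _, name, _, addr in R})


def natural_join(r1, r2):
    """Join R1 and R2 on their shared attribute EName."""
    return [
        (emp, name, sal, addr)
        for emp, name, sal in r1
        for name2, addr in r2
        if name == name2
    ]


print(sorted(natural_join(R1, R2)) == sorted(R))  # True: the join is lossless
print(R2)  # [('Ann', '12 Oak St'), ('Bo', '9 Elm Rd')]
```

Ann's address is now stored once in R2 rather than once per employee named Ann.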
Database Normalization
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.
(A determinant is any attribute, or set of attributes, whose value determines other values within a row.)
If a table contains only one candidate key, 3NF and BCNF are equivalent. BCNF is a stricter form of 3NF: every relation in BCNF is also in 3NF, but not conversely.
Figure 5.7
Figure 5.8
BCNF
Based on FDs that take into account all candidate keys of a relation.
- For a relation with only one CK, 3NF and BCNF are equivalent.
- A relation is said to be in BCNF if every determinant is a CK.
- Is PLOTS in BCNF? NO
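A BCNF check can be sketched using the common formal reading: for every nontrivial FD, the left side must be a superkey. The PLOTS FD encoding is an assumed reading of FD1 to FD4, and `bcnf_violations` is a hypothetical name. The second call checks FD4 alone against a split-off (Area, Price) relation.

```python
PLOTS = {"Prop#", "State", "Plot#", "Area", "Price", "Tax_rate"}

# Assumed encoding of FD1-FD4.
FDS = {
    "FD1": ({"Prop#"}, {"State", "Plot#", "Area", "Price", "Tax_rate"}),
    "FD2": ({"State", "Plot#"}, {"Prop#", "Area", "Price", "Tax_rate"}),
    "FD3": ({"State"}, {"Tax_rate"}),
    "FD4": ({"Area"}, {"Price"}),
}


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.values():
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Names of nontrivial FDs whose left side is not a superkey of schema."""
    return [
        name
        for name, (lhs, rhs) in fds.items()
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


print(bcnf_violations(PLOTS, FDS))  # ['FD3', 'FD4']
print(bcnf_violations({"Area", "Price"}, {"FD4": ({"Area"}, {"Price"})}))  # []
```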