
Database Normalization

In relational database design, normalization is the process of organizing data to minimize
redundancy.


Normalization is a systematic way of ensuring that a database structure is suitable for general-
purpose querying and free of certain undesirable characteristics—insertion, update, and
deletion anomalies—that could lead to a loss of data integrity. Normalization usually involves
dividing a database into two or more tables and defining relationships between the tables. The
objective is to isolate data so that additions, deletions, and modifications of a field can be made
in just one table and then propagated through the rest of the database via the defined
relationships.

Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and
what we now know as the First Normal Form (1NF) in 1970. Codd went on to define the Second
Normal Form (2NF) and Third Normal Form (3NF) in 1971, and Codd and Raymond F. Boyce
defined the Boyce-Codd Normal Form (BCNF) in 1974. Higher normal forms were defined by
other theorists in subsequent years, the most recent being the Sixth Normal Form (6NF)
introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.

Functional dependency

In a given table, an attribute Y is said to have a functional dependency on a set of
attributes X (written X → Y) if and only if each X value is associated with precisely one Y
value. For example, in an "Employee" table that includes the attributes "Employee ID"
and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee
Date of Birth} would hold.
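
The definition can be tested against a set of rows. The following Python sketch is illustrative only: the helper name fd_holds, the column names, and the sample rows are assumptions made for this example. A sample of rows can show that a dependency fails, but it cannot prove that the dependency holds as a rule of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value afterwards.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows; the values are invented for illustration.
employees = [
    {"employee_id": 1, "date_of_birth": "1980-02-14", "skill": "typing"},
    {"employee_id": 1, "date_of_birth": "1980-02-14", "skill": "filing"},
    {"employee_id": 2, "date_of_birth": "1975-09-30", "skill": "typing"},
]

print(fd_holds(employees, ["employee_id"], ["date_of_birth"]))  # True
print(fd_holds(employees, ["employee_id"], ["skill"]))          # False
```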

Trivial functional dependency

A trivial functional dependency is a functional dependency of an attribute on a superset
of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is
{Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is

• functionally dependent on X, and
• not functionally dependent on any proper subset of X.

For example, {Employee Address} has a functional dependency on {Employee ID, Skill}, but not
a full functional dependency, because it is also dependent on {Employee ID}.
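
The two conditions can be checked from a list of declared dependencies by computing attribute closures. This is a minimal Python sketch, assuming dependencies are given as (determinant, dependent) pairs of tuples; the helper names closure and is_fully_dependent and the column names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_fully_dependent(attr, x, fds):
    """True if attr depends on x but on no proper subset of x."""
    x = sorted(set(x))
    if attr not in closure(x, fds):
        return False
    # Try every proper subset of x, from the empty set up to size len(x) - 1.
    for size in range(len(x)):
        for subset in combinations(x, size):
            if attr in closure(subset, fds):
                return False
    return True

# Assumed dependency from the example: Employee ID determines Employee Address.
fds = [(("employee_id",), ("employee_address",))]

print(is_fully_dependent("employee_address", ["employee_id", "skill"], fds))  # False
print(is_fully_dependent("employee_address", ["employee_id"], fds))           # True
```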

Transitive dependency

A transitive dependency is an indirect functional dependency, one in which X → Z only by
virtue of X → Y and Y → Z.

Multivalued dependency

A multivalued dependency is a constraint according to which the presence of certain
rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple
tables each having a subset of the attributes of T.

Superkey

A superkey is a combination of attributes that can be used to uniquely identify a
database record. A table might have many superkeys.

Candidate key

A candidate key is a minimal superkey: a superkey that contains no extraneous attributes,
so that removing any attribute from it leaves a set that no longer identifies records uniquely.

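Examples: Imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>.
This table has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name>
and <SSN, Name>. Of those listed, <SSN> is a candidate key, while <SSN, Name> is not, because
<Name> is not needed to identify a record. <Phone Extension, Name> is a candidate key only if
neither <Phone Extension> nor <Name> identifies a record on its own; otherwise it also contains
unnecessary information.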

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee
Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a
number of possible unique keys. A primary key is a key which the database designer
has designated for this purpose.

1NF A relation R is in first normal form (1NF) if and only if all underlying
domains contain atomic values only.

Example: 1NF but not 2NF

FIRST (supplier_no, status, city, part_no, quantity)

Functional Dependencies:

(supplier_no, part_no) → quantity

(supplier_no) → status

(supplier_no) → city

city → status (Supplier's status is determined by location)

Comments:

Non-key attributes are not mutually independent (city → status).


Non-key attributes are not fully functionally dependent on the primary key (i.e., status and city
are dependent on just part of the key, namely supplier_no).
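
The second comment can be checked mechanically. The Python sketch below lists the dependencies whose determinant is a proper subset of the primary key and whose dependent attribute is not part of the key. The helper name partial_dependencies is hypothetical, the dependencies are transcribed by hand from the list above, and only the listed dependencies are examined, not those implied by them.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs in which a non-key attribute depends on only part of the key."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        non_key_rhs = set(rhs) - key
        # A proper subset of the key determining a non-key attribute breaks 2NF.
        if set(lhs) < key and non_key_rhs:
            found.append((tuple(lhs), tuple(sorted(non_key_rhs))))
    return found

first_key = ("supplier_no", "part_no")
first_fds = [
    (("supplier_no", "part_no"), ("quantity",)),
    (("supplier_no",), ("status",)),
    (("supplier_no",), ("city",)),
    (("city",), ("status",)),
]

for lhs, rhs in partial_dependencies(first_key, first_fds):
    print(lhs, "->", rhs)
# ('supplier_no',) -> ('status',)
# ('supplier_no',) -> ('city',)
```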

Anomalies:

INSERT: We cannot enter the fact that a given supplier is located in a given city until that
supplier supplies at least one part (otherwise, we would have to enter a null value for a column
participating in the primary key, a violation of the definition of a relation).

DELETE: If we delete the last (only) row for a given supplier, we lose the information that the
supplier is located in a particular city.

UPDATE: The city value appears many times for the same supplier. This can lead to
inconsistency or the need to change many values of city if a supplier moves.

Decomposition (into 2NF):

SECOND (supplier_no, status, city)

SUPPLIER_PART (supplier_no, part_no, quantity)
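
One property worth checking for a decomposition like this is that joining the two new tables gives back exactly the original rows. The Python sketch below projects a small FIRST table onto the two new headings and joins the results. The helper names project, natural_join, and as_set are hypothetical, and the sample rows are invented so that city determines status.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (a relation is a set of rows)."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]

def natural_join(left, right):
    """Join two lists of rows on all attributes they have in common."""
    joined = []
    for left_row in left:
        for right_row in right:
            common = set(left_row) & set(right_row)
            if all(left_row[a] == right_row[a] for a in common):
                joined.append({**left_row, **right_row})
    return joined

def as_set(rows):
    """Order-independent representation of a list of rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical sample data for FIRST.
first = [
    {"supplier_no": "S1", "status": 20, "city": "London", "part_no": "P1", "quantity": 300},
    {"supplier_no": "S1", "status": 20, "city": "London", "part_no": "P2", "quantity": 200},
    {"supplier_no": "S2", "status": 10, "city": "Paris", "part_no": "P1", "quantity": 300},
]

second = project(first, ["supplier_no", "status", "city"])
supplier_part = project(first, ["supplier_no", "part_no", "quantity"])

print(len(second), len(supplier_part))                               # 2 3
print(as_set(natural_join(second, supplier_part)) == as_set(first))  # True
```

In this sample, the city and status of supplier S1 are stored once in SECOND instead of once per part supplied.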

2NF A relation R is in second normal form (2NF) if and only if it is in 1NF and
every non-key attribute is fully dependent on the primary key.

Example (2NF but not 3NF):

SECOND (supplier_no, status, city)

Functional Dependencies:

supplier_no → status

supplier_no → city

city → status

Comments:

Lacks mutual independence among non-key attributes.

Mutual dependence is reflected in the transitive dependencies: supplier_no → city, city → status.

Anomalies:

INSERT: We cannot record that a particular city has a particular status until we have a supplier
in that city.

DELETE: If we delete a supplier which happens to be the last row for a given city value, we lose
the fact that the city has the given status.

UPDATE: The status for a given city occurs many times, therefore leading to multiple updates
and possible loss of consistency.

Decomposition (into 3NF):

SUPPLIER_CITY (supplier_no, city)

CITY_STATUS (city, status)
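
The effect of this decomposition on the three anomalies can be illustrated with two Python dictionaries standing in for the two tables. This is a sketch only: the supplier numbers, cities, and status values are invented, and status_of is a hypothetical helper.

```python
# Hypothetical sample data standing in for the two decomposed tables.
supplier_city = {"S1": "London", "S2": "Paris", "S3": "Paris"}  # supplier_no -> city
city_status = {"London": 20, "Paris": 10}                        # city -> status

def status_of(supplier_no):
    """Look up a supplier's status through its city (supplier_no -> city -> status)."""
    return city_status[supplier_city[supplier_no]]

# UPDATE: changing the status of Paris touches one entry, yet both Paris suppliers see it.
city_status["Paris"] = 30
print(status_of("S2"), status_of("S3"))  # 30 30

# DELETE: removing the only London supplier keeps the fact that London has status 20.
del supplier_city["S1"]
print(city_status["London"])  # 20

# INSERT: a city's status can be recorded before any supplier is located there.
city_status["Athens"] = 30
print("Athens" in supplier_city.values())  # False
```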

3NF A relation R is in third normal form (3NF) if and only if it is in 2NF and
every non-key attribute is non-transitively dependent on the primary key. An
attribute C is transitively dependent on attribute A if there exists an
attribute B such that: A → B and B → C. Note that 3NF is concerned with
transitive dependencies which do not involve candidate keys. A 3NF relation
with more than one candidate key will clearly have transitive dependencies of
the form: primary_key → other_candidate_key → any_non-key_column.

An alternative (and equivalent) definition for relations with just one candidate key is:

A relation R having just one candidate key is in third normal form (3NF) if and
only if the non-key attributes of R (if any) are: 1) mutually independent, and
2) fully dependent on the primary key of R. A non-key attribute is any column
which is not part of the primary key. Two or more attributes are mutually
independent if none of the attributes is functionally dependent on any of the
others. Attribute Y is fully functionally dependent on attribute X if X → Y,
but Y is not functionally dependent on any proper subset of the (possibly
composite) attribute X.

For relations with just one candidate key, this is equivalent to the simpler:

A relation R having just one candidate key is in third normal form (3NF) if and
only if no non-key column (or group of columns) determines another non-key
column (or group of columns).

Example (3NF but not BCNF):

SUPPLIER_PART (supplier_no, supplier_name, part_no, quantity)

Functional Dependencies:

We assume that each supplier has a unique supplier_name. Thus we have two
candidate keys:

(supplier_no, part_no) and (supplier_name, part_no)

Thus we have the following dependencies:

(supplier_no, part_no) → quantity

(supplier_no, part_no) → supplier_name

(supplier_name, part_no) → quantity

(supplier_name, part_no) → supplier_no

supplier_name → supplier_no

supplier_no → supplier_name
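
The two candidate keys can be derived from the dependencies listed above by searching for minimal attribute sets whose closure covers the whole relation. This is a brute-force Python sketch suitable only for small relations; the helper names closure and candidate_keys are hypothetical, and the dependencies are transcribed by hand from the list.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is the whole relation."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Supersets of a key already found are superkeys but not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

attributes = ["supplier_no", "supplier_name", "part_no", "quantity"]
fds = [
    (("supplier_no", "part_no"), ("quantity",)),
    (("supplier_no", "part_no"), ("supplier_name",)),
    (("supplier_name", "part_no"), ("quantity",)),
    (("supplier_name", "part_no"), ("supplier_no",)),
    (("supplier_name",), ("supplier_no",)),
    (("supplier_no",), ("supplier_name",)),
]

print(candidate_keys(attributes, fds))
# [('part_no', 'supplier_name'), ('part_no', 'supplier_no')]
```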

Comments:

Although supplier_name → supplier_no (and vice versa), supplier_no is not a non-key column;
it is part of the primary key! Hence this relation technically satisfies the definition(s) of 3NF (and
likewise 2NF, again because supplier_no is not a non-key column).

Anomalies:

INSERT: We cannot record the name of a supplier until that supplier supplies at least one part.

DELETE: If a supplier temporarily stops supplying and we delete the last row for that supplier,
we lose the supplier's name.

UPDATE: If a supplier changes name, that change will have to be made to multiple rows
(wasting resources and risking loss of consistency).

Decomposition (into BCNF):

SUPPLIER_ID (supplier_no, supplier_name)

SUPPLIER_PARTS (supplier_no, part_no, quantity)

BCNF A relation R is in Boyce-Codd normal form (BCNF) if and only if every
determinant is a candidate key.

The definition of BCNF addresses certain (rather unlikely) situations which 3NF does not
handle. The characteristics of a relation which distinguish 3NF from BCNF are given below.
Since it is so unlikely that a relation would have these characteristics, in practical real-life design
it is usually the case that relations in 3NF are also in BCNF. Thus many authors make a "fuzzy"
distinction between 3NF and BCNF when it comes to giving advice on "how far" to normalize a
design. Since relations in 3NF but not in BCNF are slightly unusual, it is a bit more difficult to
come up with meaningful examples. To be precise, the definition of 3NF does not deal with a
relation that:

1. has multiple candidate keys, where
2. those candidate keys are composite, and
3. the candidate keys overlap (i.e., have at least one common attribute)

Example:

An example of a relation in 3NF but not in BCNF (and exhibiting the three properties listed) was
given above in the discussion of 3NF. The following relation is in BCNF (and also in 3NF):

SUPPLIERS (supplier_no, supplier_name, city, zip)


We assume that each supplier has a unique supplier_name, so that supplier_no and
supplier_name are both candidate keys.

Functional Dependencies:

supplier_no → city

supplier_no → zip

supplier_no → supplier_name

supplier_name → city

supplier_name → zip

supplier_name → supplier_no

Comments:

The relation is in BCNF since both determinants (supplier_no and supplier_name) are unique
(i.e., are candidate keys).
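
The BCNF test can be sketched in Python as a check that every non-trivial determinant contains one of the candidate keys. The helper name bcnf_violations is hypothetical, the candidate keys are supplied by hand rather than derived, and the dependencies are transcribed from the two examples in this document.

```python
def bcnf_violations(candidate_keys, fds):
    """Return the non-trivial FDs whose determinant does not contain a candidate key."""
    keys = [set(key) for key in candidate_keys]
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # a trivial dependency is always allowed
        if not any(key <= set(lhs) for key in keys):
            violations.append((lhs, rhs))
    return violations

# SUPPLIERS (supplier_no, supplier_name, city, zip): both determinants are candidate keys.
suppliers_keys = [("supplier_no",), ("supplier_name",)]
suppliers_fds = [
    (("supplier_no",), ("city",)),
    (("supplier_no",), ("zip",)),
    (("supplier_no",), ("supplier_name",)),
    (("supplier_name",), ("city",)),
    (("supplier_name",), ("zip",)),
    (("supplier_name",), ("supplier_no",)),
]
print(bcnf_violations(suppliers_keys, suppliers_fds))  # []

# SUPPLIER_PART (supplier_no, supplier_name, part_no, quantity) from the 3NF example.
supplier_part_keys = [("supplier_no", "part_no"), ("supplier_name", "part_no")]
supplier_part_fds = [
    (("supplier_no", "part_no"), ("quantity",)),
    (("supplier_name",), ("supplier_no",)),
    (("supplier_no",), ("supplier_name",)),
]
print(bcnf_violations(supplier_part_keys, supplier_part_fds))
# [(('supplier_name',), ('supplier_no',)), (('supplier_no',), ('supplier_name',))]
```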

The relation is also in 3NF since even though the non-primary-key column supplier_name
determines the non-key columns city and zip, supplier_name is a candidate key. Transitive
dependencies involving a second (or third, fourth, etc.) candidate key in addition to the primary
key do not violate 3NF.

Note that even relations in BCNF can have anomalies.

Anomalies:

INSERT: We cannot record the city for a supplier_no without also knowing the supplier_name.

DELETE: If we delete the row for a given supplier_name, we lose the information that the
supplier_no is associated with a given city.

UPDATE: Since supplier_name is a candidate key (unique), there are none.

Decomposition:

SUPPLIER_INFO (supplier_no, city, zip)

SUPPLIER_NAME (supplier_no, supplier_name)

Larry Newcomer (Updated January 06, 2000)
