
Background to normalization: definitions

Functional dependency: Attribute B has a functional dependency on attribute A (i.e., A → B) if, for each value of
attribute A, there is exactly one value of attribute B.
Whenever the same value of A appears in two tuples, the same value of B appears in both.
In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee
ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several
employees could live at the same address and therefore one Employee Address value could correspond to more
than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.)
An attribute may be functionally dependent either on a single attribute or on a combination of attributes.
It is not possible to determine the extent to which a design is normalized without understanding what functional
dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the
problem domain. For example, an Employer may require certain employees to split their time between two
locations, such as New York City and London, and therefore want to allow Employees to have more than one
Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID.
Another way to look at the above is by reviewing basic mathematical functions:
Let F(x) be a mathematical function of one independent variable. The independent variable is analogous to the
attribute A. The dependent variable (the dependent attribute, in the terms used above) is the value F(A), hence the
term functional dependency. A mathematical function yields exactly one output for each input. In mathematical
notation this relationship is commonly written F(A) = B; in dependency notation, A → B.
There are also functions of more than one independent variable, commonly referred to as multivariable
functions. They correspond to an attribute being functionally dependent on a combination of attributes. Hence,
F(x,y,z) has three independent variables, or independent attributes, and one dependent attribute, namely the
value F(x,y,z). A multivariable function still has only one output, that is, one dependent variable or attribute.
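The function analogy can be sketched in Python: a dependency holds in a set of rows if the determining values can be used as dictionary keys without ever mapping to two different results. This is a minimal sketch; the helper name `holds` and the sample rows are illustrative assumptions, not part of the original text.

```python
def holds(rows, lhs, rhs):
    """Return True if, in these rows, each combination of lhs values
    is paired with exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: two employees share one address.
employees = [
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Filing"},
    {"Employee ID": 2, "Employee Address": "12 Elm St", "Skill": "Typing"},
]

id_to_address = holds(employees, ["Employee ID"], ["Employee Address"])
address_to_id = holds(employees, ["Employee Address"], ["Employee ID"])
```

Note that sample data can only refute a dependency. Whether a dependency really applies is a statement about the problem domain, not about the rows that happen to be stored.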
Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself.
{Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} →
{Employee Address}.
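As a small illustration, triviality can be tested without looking at any data, because it depends only on the attribute sets involved. The function name `is_trivial` is an assumption made for this sketch.

```python
def is_trivial(lhs, rhs):
    """A dependency lhs -> rhs is trivial when rhs is contained in lhs."""
    return set(rhs) <= set(lhs)


both = is_trivial({"Employee ID", "Employee Address"}, {"Employee Address"})
same = is_trivial({"Employee Address"}, {"Employee Address"})
real = is_trivial({"Employee ID"}, {"Employee Address"})
```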
Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and
not functionally dependent on any proper subset of X. For example, {Employee Address} has a functional
dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also
dependent on {Employee ID}.
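The two conditions can be checked against sample rows as follows. This is a sketch under assumptions: the helper names and the data are hypothetical, and a check on sample rows can only suggest, not prove, that a dependency holds in the domain.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if each combination of lhs values maps to one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full(rows, lhs, attr):
    """True if attr depends on lhs but on no proper subset of lhs."""
    if not holds(rows, lhs, [attr]):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1: proper subsets only
        for subset in combinations(lhs, size):
            if holds(rows, list(subset), [attr]):
                return False
    return True


rows = [
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Filing"},
    {"Employee ID": 2, "Employee Address": "9 Oak Ave", "Skill": "Typing"},
]

on_id_and_skill = is_full(rows, ["Employee ID", "Skill"], "Employee Address")
on_id_alone = is_full(rows, ["Employee ID"], "Employee Address")
```

Here `on_id_and_skill` is false because {Employee ID} alone already determines the address, while `on_id_alone` is true.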

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y
and Y → Z.
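One way to see a transitive dependency is to compute the closure of an attribute set, meaning everything it determines directly or indirectly under a declared list of dependencies. The sketch below uses the abstract attributes X, Y and Z from the definition; the function name `closure` is an assumption.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
with_chain = closure({"X"}, fds)
without_first_link = closure({"X"}, [({"Y"}, {"Z"})])
```

Z appears in the closure of X only because the chain through Y exists; once X → Y is removed, X no longer determines Z.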
Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies
the presence of certain other rows.
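A sketch of what "implies the presence of certain other rows" means in practice: for X ↠ Y, whenever two rows agree on X, the row combining the Y values of the first with the remaining values of the second must also be present. The `Language` attribute and the function name `holds_mvd` are hypothetical, introduced only for this illustration.

```python
def holds_mvd(rows, attrs, x, y):
    """Check the multivalued dependency x ->> y on the given rows."""
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required row: x and y values from t1, the rest from t2
                required = tuple(t2[a] if a in z else t1[a] for a in attrs)
                if required not in present:
                    return False
    return True


attrs = ["Employee ID", "Skill", "Language"]
complete = [
    {"Employee ID": 1, "Skill": "Typing", "Language": "English"},
    {"Employee ID": 1, "Skill": "Typing", "Language": "French"},
    {"Employee ID": 1, "Skill": "Filing", "Language": "English"},
    {"Employee ID": 1, "Skill": "Filing", "Language": "French"},
]
incomplete = complete[:-1]  # drop the (Filing, French) combination

mvd_complete = holds_mvd(complete, attrs, ["Employee ID"], ["Skill"])
mvd_incomplete = holds_mvd(incomplete, attrs, ["Employee ID"], ["Skill"])
```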
Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each
having a subset of the attributes of T.
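The idea can be sketched by projecting a table onto subsets of its attributes, joining the projections back together and comparing the result with the original. The helper names and sample rows are assumptions; the check covers one table state only, whereas a join dependency is a claim about every valid state.

```python
from functools import reduce


def as_relation(dict_rows):
    """Represent each row as a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in dict_rows}


def project(relation, attrs):
    return {frozenset((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(left, right):
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Rows combine only if they agree on every shared attribute
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                joined.add(frozenset(merged.items()))
    return joined


def satisfies_join_dependency(relation, components):
    projections = [project(relation, attrs) for attrs in components]
    return reduce(natural_join, projections) == relation


table = as_relation([
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Filing"},
    {"Employee ID": 2, "Employee Address": "12 Elm St", "Skill": "Typing"},
])

by_id = satisfies_join_dependency(
    table, [{"Employee ID", "Employee Address"}, {"Employee ID", "Skill"}])
by_address = satisfies_join_dependency(
    table, [{"Employee ID", "Employee Address"}, {"Employee Address", "Skill"}])
```

Splitting on Employee ID recreates the table, while splitting on Employee Address produces a spurious row pairing employee 2 with a skill that employee does not have.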
Superkey
A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words,
two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address,
Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a
superkey.

Candidate key
A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it
is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.
Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be
a non-prime attribute in the "Employees' Skills" table.
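The three definitions above (superkey, candidate key, non-prime attribute) can be tied together in a short sketch. It assumes that the only dependency declared for the "Employees' Skills" table is {Employee ID} → {Employee Address}; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Enumerate minimal superkeys, smallest first (exponential, for small tables)."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(candidate, all_attrs, fds):
                keys.append(candidate)
    return keys


all_attrs = {"Employee ID", "Employee Address", "Skill"}
fds = [({"Employee ID"}, {"Employee Address"})]

keys = candidate_keys(all_attrs, fds)
non_prime = all_attrs - set().union(*keys)
```

Under that assumption the only candidate key is {Employee ID, Skill}, which leaves Employee Address as the single non-prime attribute.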
Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible
unique keys. A primary key is a key which the database designer has designated for this purpose.

The main normal forms are summarized below.


Normal form | Defined by | Brief definition
First normal form (1NF) | Two versions: E.F. Codd (1970), C.J. Date (2003) | Table faithfully represents a relation and has no "repeating groups"
Second normal form (2NF) | E.F. Codd (1971) | No non-prime attribute in the table is functionally dependent on a part (proper subset) of a candidate key
Third normal form (3NF) | E.F. Codd (1971); see also Carlo Zaniolo's equivalent but differently-expressed definition (1982) | Every non-prime attribute is non-transitively dependent on every key of the table
Boyce-Codd normal form (BCNF) | Raymond F. Boyce and E.F. Codd (1974) | Every non-trivial functional dependency in the table is a dependency on a superkey
Fourth normal form (4NF) | Ronald Fagin (1977) | Every non-trivial multivalued dependency in the table is a dependency on a superkey
Fifth normal form (5NF) | Ronald Fagin (1979) | Every non-trivial join dependency in the table is implied by the superkeys of the table
Domain/key normal form (DKNF) | Ronald Fagin (1981) | Every constraint on the table is a logical consequence of the table's domain constraints and key constraints
Sixth normal form (6NF) | Chris Date, Hugh Darwen, and Nikos Lorentzos (2002) | Table features no non-trivial join dependencies at all (with reference to generalized join operator)

What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization
process:
eliminating redundant data (for example, storing the same data in more than one table)
ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is
logically stored.

Advantages of normalization:
Removal of redundant data storage.
Close modeling of real world entities, processes, and their relationships.
Structuring of data to make the logical modeling flexible.
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are
referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first
normal form or 1NF) through five (fifth normal form or 5NF). We often see 1NF, 2NF, and 3NF along with the
occasional 4NF. Fifth normal form is very rarely seen.
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:

Eliminate duplicative columns from the same table.

Create separate tables for each group of related data and identify each row with a unique column or set of
columns (the primary key).
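As an illustration of removing duplicative columns, the sketch below turns a wide layout with numbered skill columns into one row per employee and skill. The column names `Skill 1` and `Skill 2` and the function `to_1nf` are hypothetical.

```python
def to_1nf(rows, key, repeating):
    """Replace a group of repeating columns with one row per value."""
    return [
        {key: row[key], "Skill": row[column]}
        for row in rows
        for column in repeating
        if row[column] is not None  # unused slots produce no row
    ]


wide = [
    {"Employee ID": 1, "Skill 1": "Typing", "Skill 2": "Filing"},
    {"Employee ID": 2, "Skill 1": "Typing", "Skill 2": None},
]

normalized = to_1nf(wide, "Employee ID", ["Skill 1", "Skill 2"])
```

In the result, each row is identified by the combination of Employee ID and Skill, and a third skill needs a new row rather than a new column.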

Second Normal Form (2NF)


Second normal form (2NF) further addresses the concept of removing duplicative data:

Meet all the requirements of the first normal form.

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

Create relationships between these new tables and their predecessors through the use of foreign keys.
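The steps above can be sketched with Python's built-in `sqlite3` module. The table and column names are assumptions chosen for the example: the address, which depends on Employee ID alone, moves to its own table, and a foreign key links the skills table back to it.

```python
import sqlite3

# Flat rows: the address is repeated for every skill of an employee.
flat_rows = [
    (1, "12 Elm St", "Typing"),
    (1, "12 Elm St", "Filing"),
    (2, "9 Oak Ave", "Typing"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE employees (
        employee_id      INTEGER PRIMARY KEY,
        employee_address TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee_skills (
        employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
        skill       TEXT NOT NULL,
        PRIMARY KEY (employee_id, skill)
    )
""")

# Each address is stored once; skills keep only the employee reference.
employees = sorted({(emp_id, address) for emp_id, address, _ in flat_rows})
conn.executemany("INSERT INTO employees VALUES (?, ?)", employees)
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?)",
    [(emp_id, skill) for emp_id, _, skill in flat_rows],
)
conn.commit()

address_rows = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]


def insert_orphan_skill():
    """Try to add a skill for an employee that does not exist."""
    try:
        conn.execute("INSERT INTO employee_skills VALUES (?, ?)", (99, "Typing"))
    except sqlite3.IntegrityError:
        return False
    return True


orphan_accepted = insert_orphan_skill()
```

The foreign key makes the database reject a skill row that refers to an unknown employee, which is the relationship between the new tables that the last step describes.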

Third Normal Form (3NF)


Third normal form (3NF) goes one large step further:

Meet all the requirements of the second normal form.

Remove columns that are not directly dependent on the primary key, that is, columns that depend on it only through another non-key column.
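A rough way to spot such columns, given declared dependencies and candidate keys, is to flag every dependency whose left side is not a superkey and whose right side is a non-prime attribute. The attributes `Department` and `Department Location` are hypothetical, as is the function name; this is a sketch, not a complete normalization tool.

```python
def third_nf_violations(fds, candidate_keys):
    """List (lhs, attribute) pairs that break third normal form.

    fds is a list of (lhs, rhs) pairs of frozensets;
    candidate_keys is a list of frozensets.
    """
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = any(key <= lhs for key in candidate_keys)
        for attr in sorted(rhs - lhs):  # skip the trivial part of the dependency
            if not lhs_is_superkey and attr not in prime:
                violations.append((lhs, attr))
    return violations


keys = [frozenset({"Employee ID"})]
fds = [
    (frozenset({"Employee ID"}), frozenset({"Department"})),
    (frozenset({"Department"}), frozenset({"Department Location"})),
]
before = third_nf_violations(fds, keys)

# After moving Department Location into its own table keyed by Department:
after = third_nf_violations(
    [(frozenset({"Department"}), frozenset({"Department Location"}))],
    [frozenset({"Department"})],
)
```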

Fourth Normal Form (4NF)


Finally, fourth normal form (4NF) has one additional requirement:

Meet all the requirements of the third normal form.

A relation is in 4NF if every non-trivial multivalued dependency in it is a dependency on a superkey.
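A minimal sketch of a 4NF decomposition, assuming a hypothetical table in which an employee's skills and languages are independent of each other. Storing them together forces one row per combination; splitting them into two tables loses nothing, because a join recreates the original rows.

```python
# Hypothetical rows: (employee_id, skill, language)
rows = {
    (1, "Typing", "English"),
    (1, "Typing", "French"),
    (1, "Filing", "English"),
    (1, "Filing", "French"),
}

# Split the two independent facts into separate tables.
employee_skills = {(emp_id, skill) for emp_id, skill, _ in rows}
employee_languages = {(emp_id, language) for emp_id, _, language in rows}

# Joining on the employee ID reproduces the original rows.
rejoined = {
    (emp_id, skill, language)
    for emp_id, skill in employee_skills
    for other_id, language in employee_languages
    if emp_id == other_id
}

rows_before = len(rows)
rows_after = len(employee_skills) + len(employee_languages)
```

In the combined table, adding a third skill would require one new row per language; after the split it requires a single row.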

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the
criteria of a 1NF database.

