
In the field of relational database design, normalization is a systematic way of ensuring
that a database structure is suitable for general-purpose querying and free of certain
undesirable characteristics (insertion, update, and deletion anomalies) that could lead to
a loss of data integrity.

According to E. F. Codd, the objectives of normalization are as follows:


1. To free the collection of relations from undesirable insertion, update and deletion
dependencies.
2. To reduce the need for restructuring the collection of relations as new types of data are
introduced, and thus increase the life span of application programs.
3. To make the relational model more informative to users.
4. To make the collection of relations neutral to the query statistics, where these statistics
are liable to change as time goes by.

E. F. Codd, the inventor of the relational model, introduced the concept of normalization
(1NF in 1970, 2NF and 3NF in 1971) and, together with R. F. Boyce, defined BCNF in 1974.

C. J. Date, H. Darwen, R. Fagin, and N. Lorentzos defined the higher normal forms, up to 6NF,
by 2002.

There are currently eight normal forms in total, as follows:


1. First normal form (1NF)
2. Second normal form (2NF)
3. Third normal form (3NF)
4. Boyce-Codd normal form (BCNF)
5. Fourth normal form (4NF)
6. Fifth normal form (5NF)
7. Domain/key normal form (DKNF)
8. Sixth normal form (6NF)

In practice, the first three normal forms are usually sufficient to keep data consistent and
non-redundant.

1. The 1st Normal Form


- There are no duplicate rows, and each row has a unique identifier (a primary key).
- The values in the domains on which each relation is defined must be atomic: a field value
cannot be decomposed into smaller pieces and must not combine more than one kind of data.
Put another way, there are no sets of values within a column.
Example: a person's Name column could be further divided into First, Middle, and Last Name
columns.
- A table should be free from repeating groups.

2. The 2nd Normal Form
- A table should be in 1st Normal Form.
- A non-prime attribute is one that does not belong to any candidate key.
- For any candidate key K and any attribute A that is not a constituent of a candidate key,
A depends on the whole of K rather than on just part of it. Equivalently, a 1NF table is in
2NF if and only if all its non-prime attributes are functionally dependent on the whole of
every candidate key.
- In terms of the primary key: every non-key column must depend on the entire primary key. In
the case of a composite primary key, a non-key column cannot depend on only part of the
composite key.
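
The partial-dependency rule above can be tried out on sample data. The sketch below is only an
illustration: the Enrolment table, its column names, and the helper `determines` are all
hypothetical, and a dependency that holds in a few sample rows is a hint rather than proof that
it holds as a design rule.

```python
def determines(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The first value seen for a key must match every later value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table whose composite key is (StudentId, CourseId).
enrolment = [
    {"StudentId": 1, "CourseId": "C1", "StudentName": "Roger", "Grade": "A"},
    {"StudentId": 1, "CourseId": "C2", "StudentName": "Roger", "Grade": "B"},
    {"StudentId": 2, "CourseId": "C1", "StudentName": "Peter", "Grade": "A"},
]

# StudentName depends on StudentId alone: a partial dependency (violates 2NF).
partial = determines(enrolment, ["StudentId"], ["StudentName"])

# Grade is not determined by StudentId alone; it needs the whole key.
grade_by_part = determines(enrolment, ["StudentId"], ["Grade"])
grade_by_key = determines(enrolment, ["StudentId", "CourseId"], ["Grade"])

print(partial, grade_by_part, grade_by_key)
```

In this sample, StudentName would be moved to a separate table keyed by StudentId to reach 2NF.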

3. The 3rd Normal Form


- A table should be in 2nd Normal Form.
- Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on
every candidate key of R.
- In terms of the primary key: all non-key columns should depend directly on the primary key.
A table violates the Third Normal Form when one non-key column depends on another non-key
column, which in turn depends on the primary key (a transitive dependency).
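
A transitive dependency is usually removed by splitting the table in two. The sketch below uses
a hypothetical Employee table in which EmpId determines DeptId and DeptId determines DeptName;
the data and column layout are assumptions made for illustration. It also checks that joining
the two smaller tables gives back the original rows, so nothing is lost.

```python
# Hypothetical rows: (EmpId, Name, DeptId, DeptName).
# DeptName depends on DeptId, which depends on EmpId: a transitive dependency.
employee = [
    (1, "Smith", 30, "Sales"),
    (2, "Harry", 30, "Sales"),
    (3, "Blake", 20, "Research"),
]

# Decompose into two projections of the original table.
emp = {(emp_id, name, dept_id) for emp_id, name, dept_id, _ in employee}
dept = {(dept_id, dept_name) for _, _, dept_id, dept_name in employee}

# Join the projections back on DeptId.
rejoined = {
    (emp_id, name, dept_id, dept_name)
    for (emp_id, name, dept_id) in emp
    for (d_id, dept_name) in dept
    if dept_id == d_id
}

print(sorted(dept))                # each department name is now stored once
print(rejoined == set(employee))   # the decomposition loses no information
```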

1st Normal Form (1NF)

A table (relation) is in 1NF if:


1. There are no duplicated rows in the table.
2. Each cell is single-valued (no repeating groups or arrays).
3. Entries in a column (field) are of the same kind.

*The requirement that there be no duplicated rows in the table means that the table has a key (although the
key might be made up of more than one column, possibly even of all the columns).
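
As a rough illustration of the single-valued rule, the sketch below takes a hypothetical
contacts table whose Phones cell holds several values and rewrites it so that each cell holds
exactly one value. The table and its data are invented for the example.

```python
# Hypothetical table that violates 1NF: the Phones cell holds a list of values.
contacts = [
    ("Roger", "555-0101, 555-0102"),
    ("Annie", "555-0103"),
]

# Give every phone number its own row so that each cell is single-valued.
contacts_1nf = [
    (name, phone.strip())
    for name, phones in contacts
    for phone in phones.split(",")
]

for row in contacts_1nf:
    print(row)
```

The key of the rewritten table is the combination (Name, Phone), since Name alone no longer
identifies a row.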

2nd Normal Form (2NF)

A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key. Since a partial
dependency occurs when a non-key attribute is dependent on only a part of the composite key, the definition
of 2NF is sometimes phrased as, “A table is in 2NF if it is in 1NF and if it has no partial dependencies.”

3rd Normal Form (3NF)

A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

Boyce-Codd Normal Form (BCNF)

A table is in BCNF if it is in 3NF and if every determinant is a candidate key.


4th Normal Form (4NF)

A table is in 4NF if it is in BCNF and if it has no non-trivial multi-valued dependencies other than those
determined by a candidate key.

5th Normal Form (5NF)

A table is in 5NF, also called “Projection-join Normal Form” (PJNF), if it is in 4NF and if every join
dependency in the table is a consequence of the candidate keys of the table.

Domain-Key Normal Form (DKNF)

A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and
domains.

Insertion Anomaly
It is a failure to place information about a new database entry into all the places in the database where
information about the new entry needs to be stored. In a properly normalized database, information about a
new entry needs to be inserted into only one place. In an inadequately normalized database, information
about a new entry may need to be inserted into more than one place and, human fallibility being what it is,
some of the needed additional insertions may be missed.

Deletion Anomaly
It is a failure to remove information about an existing database entry when it is time to remove that entry.
In a properly normalized database, information about an old entry that is to be removed needs to be deleted
from only one place. In an inadequately normalized database, information about that old entry may need to be
deleted from more than one place.

Update Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus “update
anomalies” can be either of the kinds discussed above.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the
database. Properly normalized databases are much less susceptible to corruption than un-normalized
databases.
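
The following sketch shows how such corruption can arise. It uses invented data: a department
name that is repeated on every employee row, compared with a normalized layout in which the
name is stored once. The table layouts and values are assumptions for illustration only.

```python
# Un-normalized (hypothetical): the department name is repeated on every row.
rows = [
    {"EmpId": 1, "DeptId": 30, "DeptName": "Sales"},
    {"EmpId": 2, "DeptId": 30, "DeptName": "Sales"},
]

# A rename that reaches only one of the two copies leaves the data inconsistent.
rows[0]["DeptName"] = "Marketing"
names_for_30 = {r["DeptName"] for r in rows if r["DeptId"] == 30}

# Normalized (hypothetical): the name is stored in exactly one place.
departments = {30: "Sales"}
employees = [{"EmpId": 1, "DeptId": 30}, {"EmpId": 2, "DeptId": 30}]

# A single change is now enough, and every employee sees the same name.
departments[30] = "Marketing"
normalized_names = {departments[e["DeptId"]] for e in employees}

print(names_for_30)       # two conflicting names for department 30
print(normalized_names)   # one name
```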

An algebra is a formal structure that consists of sets and of operations performed on these sets.
Relational algebra is a formal structure for manipulating relations. There are various relational
operators for manipulating relations. These operators take one or two tables as input, perform the
operation, and return a new table as output.

There are eight basic operators in relational algebra for performing operations on tables. Each of
them is discussed in detail below.
1. Union
2. Intersection
3. Difference
4. Cartesian Product
5. Project
6. Restrict
7. Divide
8. Join

The first four operators, i.e. Union, Intersection, Difference, and Cartesian Product are taken from
set theory with some changes to make them work with tables. The last four, i.e. Project, Restrict,
Divide, and Join are designed specifically for tables. The most important operators for logical
design are project and join.

Union

The union operator takes two tables and returns a table including all the rows appearing in either
or both of the tables.

Union is denoted by the ‘∪’ symbol. Suppose there is a table named P and a table named Q with the
same columns; then the union of the two is written P ∪ Q.

An example, which describes the union operation, is given below.

P Q
Name    EmpId
Roger   A11
John    A12
Jones   A13

Result of P Union Q

Name    EmpId
Roger   A11
John    A12
Jones   A13
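
A minimal sketch of union, modelling each table as a Python set of (Name, EmpId) rows. The way
the three rows are split between P and Q here is an assumption made for the example; only the
combined result is taken from the text above.

```python
# Each table is a set of (Name, EmpId) rows.
# Which rows sit in P and which in Q is assumed for illustration.
P = {("Roger", "A11"), ("John", "A12")}
Q = {("John", "A12"), ("Jones", "A13")}

# Union keeps every row found in P, in Q, or in both, without duplicates.
p_union_q = P | Q

for row in sorted(p_union_q, key=lambda r: r[1]):
    print(row)
```

The row that appears in both tables shows up only once in the result.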

Intersection

The intersection operator takes two tables and returns a table that consists of all the rows
appearing in both tables. If no row is common to the two tables, then the result is an empty
table (a table with no rows), not NULL.

Below is an example, which describes the intersection operation.

P                   Q
Name    EmpId       Name    EmpId
Roger   A11         Roger   A11
Binnie  A12
Annie   A13

Result of P Intersection Q

Name    EmpId
Roger   A11

Here, only one row is common between P and Q.
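
The same example can be sketched with Python sets. The second intersection uses a made-up table
R that shares no row with P, to show that the result is then an empty table.

```python
# Tables P and Q from the example, as sets of (Name, EmpId) rows.
P = {("Roger", "A11"), ("Binnie", "A12"), ("Annie", "A13")}
Q = {("Roger", "A11")}

# Intersection keeps only the rows present in both tables.
common = P & Q

# Hypothetical table R with no row in common with P.
R = {("Emma", "A99")}
nothing_in_common = P & R

print(common)             # one common row
print(nothing_in_common)  # an empty set, i.e. a table with no rows
```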

Difference

The difference operator takes two tables and returns a table that consists of all rows appearing
in the first table but not in the second. The symbol for difference is ‘-’. If there are two tables
named A and B, then the difference between them is written “A-B”. It consists of all the rows that
are in A but not in B.

Below is an example, which describes the difference operation.

A           B
Name        Name
Roger       Anil
Annie       Emma
Binnie      Rocky
Richie

Result of A Difference B

Name
Roger
Annie
Binnie
Richie
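
A short sketch of the example with Python sets, with each row written as a one-element tuple.
It also computes B-A to show that the order of the operands matters.

```python
# Tables A and B from the example; each row has a single Name column.
A = {("Roger",), ("Annie",), ("Binnie",), ("Richie",)}
B = {("Anil",), ("Emma",), ("Rocky",)}

# A-B: rows of A that do not appear in B.
a_minus_b = A - B

# B-A: rows of B that do not appear in A. Difference is not commutative.
b_minus_a = B - A

print(sorted(a_minus_b))
print(sorted(b_minus_a))
```

Because A and B share no rows here, A-B is simply every row of A.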

Cartesian Product

The Cartesian product operator takes two tables and returns a table that consists of every
possible combination of two rows, one from each table. The symbol of the Cartesian product
operator is ‘×’.

Below is an example, which describes the Cartesian product.

A           B
Name        Working days
Roger       Monday
Binnie      Tuesday
            Wednesday

Resultant Table

Name        Working days
Roger       Monday
Binnie      Monday
Roger       Tuesday
Binnie      Tuesday
Roger       Wednesday
Binnie      Wednesday

Here, the resultant table contains every possible combination of rows from the two tables.
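
The example can be sketched with `itertools.product` from the Python standard library, which
pairs every row of the first table with every row of the second.

```python
from itertools import product

# Tables A and B from the example; each row is a one-element tuple.
A = [("Roger",), ("Binnie",)]
B = [("Monday",), ("Tuesday",), ("Wednesday",)]

# Concatenate each pair of rows into one wider row.
a_times_b = [a + b for a, b in product(A, B)]

for row in a_times_b:
    print(row)

# The number of result rows is the product of the two row counts.
print(len(a_times_b) == len(A) * len(B))
```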

Project

The project operator takes a table and returns a table that consists of a subset of the columns of
the original table. The result obtained is known as a projection. Decomposing a table during
normalization is done by taking projections of it.

To understand projection, let’s take an example. A table named Employee is given below. The
projection of the table on two attributes, Name and Salary, is also shown below. The
projection contains only those two columns of the original table.

Employee

Name    Salary  DeptId
Smith   800     30
Harry   1250    30
Warden  1260    30
Blake   1245    30

Result of Projection on the Employee table

Name    Salary
Smith   800
Harry   1250
Warden  1260
Blake   1245
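
A minimal sketch of projection on the Employee table above. The helper name `project` is
hypothetical. Because the result is built as a set, duplicate rows disappear, which is visible
when projecting on DeptId alone.

```python
# The Employee table from the example, one dict per row.
employee = [
    {"Name": "Smith", "Salary": 800, "DeptId": 30},
    {"Name": "Harry", "Salary": 1250, "DeptId": 30},
    {"Name": "Warden", "Salary": 1260, "DeptId": 30},
    {"Name": "Blake", "Salary": 1245, "DeptId": 30},
]


def project(rows, columns):
    """Keep only the named columns; the set removes duplicate rows."""
    return {tuple(row[c] for c in columns) for row in rows}


name_salary = project(employee, ["Name", "Salary"])
dept_only = project(employee, ["DeptId"])

print(sorted(name_salary))
print(dept_only)  # four input rows collapse to a single row
```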

Restrict

The restrict operator takes a table and returns a table that consists of a subset of its rows. The
selected rows must satisfy a specific criterion, called the restriction condition. In SQL, it is
implemented using the WHERE clause of the SELECT statement.

The result so obtained is known as a selection.

An example, which describes the restrict operation, is given below.


The table named Student is given below:

Name     Age
Lucky    12
Linda    13
Melinda  13
Binnie   13
James    12

Now, when the following query is issued on this table, the result will be as follows:

SELECT Name
FROM Student
WHERE Age > 12;

Linda
Melinda
Binnie
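
To try the query, the sketch below loads the Student table into an in-memory SQLite database
using Python's standard `sqlite3` module. The column types and the ORDER BY clause are
additions made here so that the output order is predictable; they are not part of the original
query.

```python
import sqlite3

# Build the Student table in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("Lucky", 12), ("Linda", 13), ("Melinda", 13), ("Binnie", 13), ("James", 12)],
)

# Restrict: keep only the rows that satisfy the WHERE condition.
cursor = conn.execute("SELECT Name FROM Student WHERE Age > 12 ORDER BY Name")
names = [row[0] for row in cursor]
conn.close()

print(names)
```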

Divide

The divide operator takes two tables, one table having two columns and the other having one
column. It searches the two-column table for those values in its first column that are paired,
in the second column, with every value of the one-column table. The result obtained is a
single-column table that consists of all such values from the first column.

An example, which describes the division operation, is given below.

A table named Student is given with two columns, Name and Grade, and a table named Grade is
given with one column, Grade.

Student             Grade
Name     Grade      Grade
Roger    A          A
Roger    B          B
Melinda  A
Melinda  C
Peter    A
Peter    B
Doreen   B
Doreen   C

Resultant table

Name
Roger
Peter

The result contains values from the Name column of the Student table. The two names obtained are
paired with both of the values that are in the Grade table. Only Roger and Peter have matching
rows for every value, i.e. A and B, from the Grade table.
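
The division above can be sketched in a few lines of Python. The helper name `divide` is
hypothetical, and the sketch handles only this two-column by one-column case.

```python
# The Student table as (Name, Grade) rows and the Grade table as a set of values.
student = {
    ("Roger", "A"), ("Roger", "B"),
    ("Melinda", "A"), ("Melinda", "C"),
    ("Peter", "A"), ("Peter", "B"),
    ("Doreen", "B"), ("Doreen", "C"),
}
grade = {"A", "B"}


def divide(pairs, divisor):
    """Return the names that are paired with every value in divisor."""
    names = {name for name, _ in pairs}
    return {
        name
        for name in names
        # Subset test: all divisor values must appear among this name's grades.
        if divisor <= {g for n, g in pairs if n == name}
    }


result = divide(student, grade)
print(sorted(result))
```

Melinda and Doreen are left out because each of them is missing one of the two required grades.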

Join

A join, or more precisely an equijoin, combines the rows of two tables that have equal values in
the specified columns. The two tables are joined using the equality operator “=”. It is also known
as a simple join or an inner join. A join is basically a combination of the Cartesian product and
restrict operations: it takes all the possible combinations of rows from the two tables, as in a
Cartesian product, and then selects those records for which the values in a common attribute
meet the specified criterion. For example, consider the two tables given below, named Employee
and Salary. Here, the common attribute between the two tables is EmpId. When we perform a
join on these two tables, all the records that have the same value of EmpId in both tables
are retrieved.

Employee            Salary
Name     EmpId      EmpId  Salary
Richard  A11        A11    2300
Doreen   A12        A13    3000
Peter    A13
Nancy    A14

When we perform a join operation on these two tables, we get a result as given below:

SELECT Employee.EmpId, Employee.Name, Salary.Salary
FROM Employee, Salary
WHERE Employee.EmpId = Salary.EmpId;

Result

EmpId  Name     Salary
A11    Richard  2300
A13    Peter    3000

The join and project operators are very important to normal forms. They are used to determine
whether a particular decomposition of a table improves the logical design of the database or not.
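
The description of a join as a Cartesian product followed by a restrict can be checked directly
on the Employee and Salary tables. This is only a sketch of the idea; a real database engine
would normally compute a join far more efficiently than by building the full product first.

```python
from itertools import product

# Employee rows are (Name, EmpId); Salary rows are (EmpId, Salary).
employee = [("Richard", "A11"), ("Doreen", "A12"), ("Peter", "A13"), ("Nancy", "A14")]
salary = [("A11", 2300), ("A13", 3000)]

# Step 1: Cartesian product, every employee row paired with every salary row.
all_pairs = list(product(employee, salary))

# Step 2: restrict to the pairs whose EmpId values are equal,
# keeping the columns EmpId, Name, Salary.
joined = [
    (emp_id, name, amount)
    for (name, emp_id), (sal_emp_id, amount) in all_pairs
    if emp_id == sal_emp_id
]

print(len(all_pairs))
for row in joined:
    print(row)
```

Doreen and Nancy do not appear in the result because the Salary table has no row with their
EmpId.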

Note: Why is a relational database said to be “closed” under relational operators?

When a relational operator is applied to a table in a relational database, the result is another
table. That means relational operators can in turn be applied to the results of relational
operations, and the result is always another table. This is why a relational database is said to
be closed under the relational operators.
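
Closure is what makes it possible to chain operators. The sketch below applies a restrict to the
Student table from the restrict example and then a project to the result; at each step the
value is again a set of rows, i.e. a table.

```python
# The Student table as a set of (Name, Age) rows.
student = {("Lucky", 12), ("Linda", 13), ("Melinda", 13), ("Binnie", 13), ("James", 12)}

# Restrict: the result is itself a table (a set of rows).
older = {row for row in student if row[1] > 12}

# Project on Name, applied to the result of the restrict: again a table.
names = {(name,) for name, _ in older}

print(sorted(older))
print(sorted(names))
```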
