
DATABASE PERFORMANCE TUNING AND

NORMALIZATION
Colloquium Report Submitted in Partial Fulfillment of the Requirements
for the Award of the Degree of Master of Computer Applications

Submitted by:

KM BABITA

Roll No: 1503814936

Course: MCA

Under the supervision of

Prof. Chandra Mani Sharma

Department of IT

Institute of Technology & Science Mohan Nagar Ghaziabad

Department of Information Technology


Institute of Technology & Science
Mohan Nagar, Ghaziabad
(2017)

Declaration

I, Km Babita, student of MCA-VI Semester (2015-2017) at I.T.S
Mohan Nagar, Ghaziabad, undertake that the colloquium report on the topic
Database Normalization is my authentic work. It has not been
published earlier verbatim, nor has it been submitted before for the award
of any other degree. I have appropriately mentioned the references of
the content that I have used in my report. In case of any conflict of
interest, I shall be solely responsible.

(Signature of Student)

Km Babita

Roll No. 1503814936

MCA-VI Semester

Index
1. Abstract

2. Objectives

3. Introduction

Database
Component
Advantages
Disadvantages

4. Normalization of database

5. Normalization tips

6. Problem without Normalization

7. Normalization rules

First normal form


Second normal form
Third normal form
Fourth normal form
Fifth normal form
BCNF

8. Advantages

9. Disadvantages

10. Literature review

11. Comparative analysis

12. Related tools

13. Recent issues & research challenges

14. Future trends

15. Conclusion

16. References

Abstract:-
In mathematical logic and theoretical computer science, a rewrite
system has the strong normalization property if every term is
strongly normalizing; that is, if every sequence of rewrites
eventually terminates with an irreducible term, also called a normal
form. A rewrite system may also have the weak normalization
property, meaning that every term eventually yields a normal form.
The use of relational database (RDBMS) technology and different
levels of normalization (1st, 2nd, 3rd and 4th normal forms) is
proliferating throughout the data processing industry. RDBMS
systems are valued for their ability to maintain the integrity of data,
reduce unnecessary data redundancy, and provide maximum
flexibility in retrieval.

This report is about designing a complete interactive tool, named
JMathNorm, for relational database (RDB) normalization using
Mathematica. It is an extension of the prototype developed by the
same authors [1], with the inclusion of Second Normal Form
(2NF) and Boyce-Codd Normal Form (BCNF) in addition to the
existing Third Normal Form (3NF) module. The tool developed in
this study is complete and can be used for real-time database
design as well as an aid in teaching fundamental concepts of DB
normalization to students with limited mathematical background.
JMathNorm also supports interactive use of its modules for
experimenting with fundamental set operations such as closure and
full closure, together with modules to obtain the minimal cover of
the functional dependency set and to test an attribute for a
candidate key. JMathNorm's GUI is written in Java and
utilizes Mathematica's J/Link facility to drive the Mathematica
kernel.

Objectives:-

A basic objective of the first normal form defined by Codd in 1970
was to permit data to be queried and manipulated using a "universal
data sub-language" grounded in first-order logic. (SQL is an example
of such a data sub-language, albeit one that Codd regarded as
seriously flawed.)

The objectives of normalization beyond 1NF (First Normal Form)


were stated as follows by Codd:

1. To free the collection of relations from undesirable insertion,
update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations,
as new types of data are introduced, and thus increase the life
span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query
statistics.

Introduction:-
Database:-
A database is a collection of information that is organized
so that it can easily be accessed, managed, and updated. In
one view, databases can be classified according to types of
content: bibliographic, full-text, numeric, and images. In
computing, databases are sometimes classified according
to their organizational approach. The most prevalent
approach is the relational database, a tabular database in
which data is defined so that it can be reorganized and
accessed in a number of different ways. A distributed
database is one that can be dispersed or replicated among
different points in a network.
A database management system (DBMS) is system
software for creating and managing databases. The DBMS
provides users and programmers with a systematic way to
create, retrieve, update and manage data. A DBMS makes
it possible for end users to create, read, update and
delete data in a database. The DBMS essentially serves as
an interface between the database and end users
or application programs, ensuring that data is consistently
organized and remains easily accessible.
The DBMS manages three important things: the data; the
database engine, which allows data to be accessed, locked and
modified; and the database schema, which defines the
database's logical structure. These three foundational
elements help provide concurrency, security, data
integrity and uniform administration procedures. Typical
database administration tasks supported by the DBMS
include change management, performance
monitoring/tuning and backup and recovery. Many
database management systems are also responsible for
automated rollbacks, restarts and recovery as well as
the logging and auditing of activity.
The DBMS is perhaps most useful for providing a
centralized view of data that can be accessed by multiple
users, from multiple locations, in a controlled manner. A
DBMS can limit what data the end user sees, as well as
how that end user can view the data, providing many views
of a single database schema. End users and software
programs are free from having to understand where the
data is physically located or on what type of storage media
it resides because the DBMS handles all requests. The
DBMS can offer both logical and physical data
independence. That means it can protect users and
applications from needing to know where data is stored or
having to be concerned about changes to the physical
structure of data. With relational DBMSs (RDBMSs), that
interface is SQL, a standard programming language for
defining, protecting and accessing data in an RDBMS.
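The role of SQL as that interface can be sketched with Python's built-in sqlite3 module. The table, columns, and data below are illustrative only, not taken from the report:

```python
import sqlite3

# An in-memory SQLite database stands in for a full RDBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SQL defines the schema (the database's logical structure)...
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (1, 'Adam')")
cur.execute("INSERT INTO student VALUES (2, 'Alex')")
conn.commit()

# ...and retrieves data without the user needing to know where or how
# it is physically stored -- the data independence described above.
rows = cur.execute("SELECT name FROM student ORDER BY roll_no").fetchall()
names = [r[0] for r in rows]
```

The same SELECT works regardless of how the engine lays the rows out on disk, which is the point of logical and physical data independence.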

Components of the Database System Environment:-
There are five major interrelated components in the database system
environment:

Hardware
Software
Data
Users
Procedures

1. Hardware: The hardware is the actual computer system used for


keeping and accessing the database. Conventional DBMS hardware
consists of secondary storage devices, usually hard disks, on which
the database physically resides, together with the associated Input-
Output devices, device controllers and so forth. Databases run on a
range of machines, from microcomputers to large mainframes. Another
hardware issue for a DBMS is database machines: hardware designed
specifically to support a database system.

2. Software: The software is the actual DBMS. Between the physical


database itself (i.e. the data as actually stored) and the users of the
system is a layer of software, usually called the Database

Management System or DBMS. All requests from users for access to
the database are handled by the DBMS. One general function
provided by the DBMS is thus the shielding of database users from
complex hardware-level detail.

The DBMS allows the users to communicate with the database. In a


sense, it is the mediator between the database and the users. The
DBMS controls the access and helps to maintain the consistency of
the data. Utilities are usually included as part of the DBMS. Some of
the most common utilities are report writers and application
development tools.

3. Data: It is the most important component of DBMS environment


from the end user's point of view. Data acts as a bridge between the
machine components and the user components. The database contains
the operational data and the meta-data, the 'data about data'.

The database should contain all the data needed by the organization.
One of the major features of databases is that the actual data are
separated from the programs that use the data. A database should
always be designed, built and populated for a particular audience and
for a specific purpose.

4. Users: There are a number of users who can access or retrieve data
on demand using the applications and interfaces provided by the

DBMS. Each type of user needs different software capabilities. The
users of a database system can be classified in the following groups,
depending on their degrees of expertise or the mode of their
interactions with the DBMS. The users can be:

Naive Users

Online Users

Sophisticated Users

Specialized Users

Application Programmers

Database Administrator (DBA)

Naive Users: Naive Users are those users who need not be aware of
the presence of the database system or any other system supporting
their usage. Naive users are end users of the database who work
through a menu driven application program, where the type and range
of response is always indicated to the user.

A user of an Automatic Teller Machine (ATM) falls in this category.


The user is instructed through each step of a transaction. He or she
then responds by pressing a coded key or entering a numeric value.
The operations that can be performed by naive users are very limited
and affect only a precise portion of the database. For example, in the
case of the user of the Automatic Teller Machine, user's action affects
only one or more of his/her own accounts.

Online Users: Online users are those who may communicate with the
database directly via an online terminal or indirectly via a user
interface and application program. These users are aware of the
presence of the database system and may have acquired a certain
amount of expertise within the limited interaction permitted with the
database.

Sophisticated Users: Such users interact with the system without
writing programs. Instead, they form their requests in a database
query language. Each such query is submitted to a query processor,
whose function is to break down DML statements into instructions
that the storage manager understands.

Specialized Users: Such users are those who write specialized
database applications that do not fit into the traditional data-processing
framework. For example: computer-aided design systems, knowledge
bases and expert systems, and systems that store data with complex
data types (for example, graphics data and audio data).

Application Programmers: Professional programmers are those who


are responsible for developing application programs or user interface.
The application programs could be written using a general-purpose
programming language or the commands available to manipulate a
database.

Database Administrator: The database administrator (DBA) is the
person or group in charge of implementing the database system
within an organization. The DBA has all the system privileges
allowed by the DBMS and can assign (grant) and remove (revoke)
levels of access (privileges) to and from other users. The DBA is also
responsible for the evaluation, selection and implementation of the
DBMS package.

5. Procedures: Procedures refer to the instructions and rules that


govern the design and use of the database. The users of the system
and the staff that manage the database require documented procedures
on how to use or run the system.

These may consist of instructions on how to:

Log on to the DBMS.

Use a particular DBMS facility or application program.

Start and stop the DBMS.

Make backup copies of the database.

Handle hardware or software failures.

Advantages of DBMS:-

The database management system has promising potential


advantages, which are explained below:

1. Controlling Redundancy: In a file system, each application has its
own private files, which cannot be shared between multiple
applications. This can often lead to considerable redundancy in the
stored data, which results in wastage of storage space. By having a
centralized database, most of this can be avoided. It is not possible to
eliminate all redundancy; sometimes there are sound business and
technical reasons for maintaining multiple copies of the same data. In
a database system, however, this redundancy can be controlled.

For example: In case of college database, there may be the number of


applications like General Office, Library, Account Office, Hostel etc.
Each of these applications may maintain the following information
in its own private files:

It is clear from the above file systems, that there is some common
data of the student which has to be mentioned in each application, like
Roll no, Name, Class, Phone No Address etc. This will cause the
problem of redundancy which results in wastage of storage space and
difficult to maintain, but in case of centralized database, data can be
shared by number of applications and the whole college can maintain
its computerized data with the following database:

It is clear in the above database that Roll no, Name, Class, Father
Name, Address, Phone No, Date_of_birth which are stored repeatedly
in file system in each application, need not be stored repeatedly in
case of database, because every other application can access this
information by joining of relations on the basis of common column
i.e. Roll no. Suppose a user of the Library system needs the Name
and Address of a particular student; by joining the Library and
General Office relations on the basis of the column Roll no, he/she can
easily retrieve this information.

Thus, we can say that the centralized system of a DBMS reduces the
redundancy of data to a great extent but cannot eliminate it
completely, because Roll no is still repeated in all the relations.
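The join described above can be sketched with Python's sqlite3 module. The table layouts below are a simplified, hypothetical version of the college example, not the report's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# General Office holds the shared student details once; Library stores
# only Roll no plus its own data, avoiding a second copy of name/address.
cur.execute("CREATE TABLE general_office "
            "(roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT)")
cur.execute("CREATE TABLE library (roll_no INTEGER, book_issued TEXT)")
cur.execute("INSERT INTO general_office VALUES (401, 'Adam', 'Noida')")
cur.execute("INSERT INTO library VALUES (401, 'DBMS Concepts')")

# The Library application retrieves name and address by joining on Roll no.
row = cur.execute("""
    SELECT g.name, g.address, l.book_issued
    FROM library l JOIN general_office g ON g.roll_no = l.roll_no
""").fetchone()
```

Only Roll no is repeated across the two tables; every other student attribute lives in a single place.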

2. Integrity can be enforced: Integrity of data means that data in


database is always accurate, such that incorrect information cannot be
stored in database. In order to maintain the integrity of data, some
integrity constraints are enforced on the database. A DBMS should
provide capabilities for defining and enforcing the constraints.

For example: Let us consider the case of the college database, and
suppose that the college has only BTech, MTech, MSc, BCA, BBA
and BCOM classes. But if a user enters the class MCA, then this
incorrect information must not be stored in the database, and the user
must be prompted that this is an invalid data entry. In order to enforce
this, the integrity constraint must be applied to the class attribute of
the student entity. But in the case of a file system, this constraint must
be enforced on all the applications separately (because all applications
have a class field).

In case of DBMS, this integrity constraint is applied only once on the


class field of the

General Office table (because the class field appears only once in the
whole database), and all other applications will get the class
information about the student from the General Office table, so the
integrity constraint applies to the whole database. So, we can
conclude that integrity constraints can be enforced more easily in a
centralized DBMS than in a file system.
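A constraint like the one described can be declared once in SQL with a CHECK clause; this sketch uses sqlite3 and hypothetical table/column names matching the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The CHECK constraint lists the only classes the college offers;
# it is declared once and enforced for every application that inserts rows.
cur.execute("""
    CREATE TABLE general_office (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT,
        class   TEXT CHECK (class IN ('BTech','MTech','MSc','BCA','BBA','BCOM'))
    )
""")
cur.execute("INSERT INTO general_office VALUES (1, 'Adam', 'BCA')")  # accepted

try:
    # MCA is not an allowed class, so the DBMS itself rejects the row.
    cur.execute("INSERT INTO general_office VALUES (2, 'Alex', 'MCA')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

No application code has to re-check the rule; the database refuses invalid data at the single point where the class field is stored.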

3. Inconsistency can be avoided: When the same data is duplicated


and changes are made at one site, which is not propagated to the other
site, it gives rise to inconsistency and the two entries regarding the
same data will not agree. At such times the data is said to be
inconsistent. So, if the redundancy is removed, the chances of having
inconsistent data are also removed.

4. Data can be shared: As explained earlier, the data about Name,


Class, Father_name etc. of the General Office is shared by multiple
applications in centralized DBMS as compared to file system so now
applications can be developed to operate against the same stored data.
The applications may be developed without having to create any new
stored files.

5. Standards can be enforced: Since the DBMS is a central system,
standards can be enforced easily, whether at the company level,
department level, national level or international level. Standardized
data is very helpful during migration or interchange of data. A file
system is an independent system, so standards cannot be easily
enforced on multiple independent applications.

6. Restricting unauthorized access: When multiple users share a


database, it is likely that some users will not be authorized to access
all information in the database. For example, account office data is
often considered confidential, and hence only authorized persons are
allowed to access such data. In addition, some users may be permitted
only to retrieve data, whereas others are allowed both to retrieve and
to update. Hence, the type of access operation (retrieval or update) must
also be controlled. Typically, users or user groups are given account
numbers protected by passwords, which they can use to gain access to
the database. A DBMS should provide a security and authorization
subsystem, which the DBA uses to create accounts and to specify
account restrictions. The DBMS should then enforce these restrictions
automatically.

7. Serving Enterprise Requirements rather than Individual
Requirements: Since many types of users with varying levels of
technical knowledge use a database, a DBMS should provide a variety
of user interfaces. The overall requirements of the enterprise are more
important than the individual user requirements. So, the DBA can
structure the database system to provide an overall service that is
"best for the enterprise".

For example: A representation can be chosen for the data in storage


that gives fast access for the most important application at the cost of
poor performance in some other applications. But the file system
favors individual requirements over enterprise requirements.

8. Providing Backup and Recovery: A DBMS must provide
facilities for recovering from hardware or software failures. The
backup and recovery subsystem of the DBMS is responsible for
recovery. For example, if the computer system fails in the middle of a
complex update program, the recovery subsystem is responsible for
making sure that the database is restored to the state it was in before
the program started executing.

9. Cost of developing and maintaining system is lower: It is much


easier to respond to unanticipated requests when data is centralized in
a database than when it is stored in a conventional file system.
Although the initial cost of setting up a database can be large, the
cost of developing and maintaining application programs is far
lower than for similar services using conventional systems. The
productivity of programmers can be higher using the non-procedural
languages that have been developed with DBMSs than using
procedural languages.

10. Data Model can be developed: The centralized system is able to


represent complex data and inter-file relationships, which results in
better data modeling. The data modeling of the relational model is
based on entities and their relationships.

11. Concurrency Control: DBMS systems provide mechanisms to


provide concurrent access of data to multiple users.
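The mechanism behind both recovery (point 8) and safe concurrent access (point 11) is the transaction: a failed update is rolled back so no partial state is ever visible. A minimal sketch with sqlite3, using a hypothetical account table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()

# A transfer that fails halfway is rolled back by the DBMS, so the
# debit from account 1 never becomes visible without the matching credit.
try:
    with conn:  # sqlite3 runs this block as one transaction
        conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
        raise RuntimeError("simulated crash before crediting account 2")
except RuntimeError:
    pass  # the with-block rolled the transaction back on the exception

balances = [r[0] for r in conn.execute("SELECT balance FROM account ORDER BY id")]
```

After the simulated failure, both balances are unchanged; this all-or-nothing behavior is what the recovery subsystem guarantees.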

Disadvantages of DBMS:-

The disadvantages of the database approach are summarized as


follows:

1. Complexity: The provision of the functionality that is expected of


a good DBMS makes the DBMS an extremely complex piece of
software. Database designers, developers, database administrators and
end-users must understand this functionality to take full advantage of
it. Failure to understand the system can lead to bad design decisions,
which can have serious consequences for an organization.

2. Size: The complexity and breadth of functionality makes the
DBMS an extremely large piece of software, occupying many
megabytes of disk space and requiring substantial amounts
of memory to run efficiently.

3. Performance: Typically, a File Based system is written for a


specific application, such as invoicing. As result, performance is
generally very good. However, the DBMS is written to be more
general, to cater for many applications rather than just one. The effect
is that some applications may not run as fast as they used to.

4. Higher impact of a failure: The centralization of resources


increases the vulnerability of the system. Since all users and
applications rely on the availability of the DBMS, the failure of any
component can bring operations to a halt.

5. Cost of DBMS: The cost of DBMS varies significantly, depending


on the environment and functionality provided. There is also the
recurrent annual maintenance cost.

6. Additional Hardware costs: The disk storage requirements for the


DBMS and the database may necessitate the purchase of additional
storage space. Furthermore, to achieve the required performance it
may be necessary to purchase a larger machine, perhaps even a
machine dedicated to running the DBMS. The procurement of
additional hardware results in further expenditure.

7. Cost of Conversion: In some situations, the cost of the DBMS and
extra hardware may be insignificant compared with the cost of
converting existing applications to run on the new DBMS and
hardware. This cost also includes the cost of training staff to use these

new systems and possibly the employment of specialist staff to help
with conversion and running of the system. This cost is one of the
main reasons why some organizations feel tied to their current
systems and cannot switch to modern database technology.

Normalization of Database:-

Database Normalization is a technique of organizing the data in the


database. Normalization is a systematic approach of decomposing
tables to eliminate data redundancy and undesirable characteristics
like Insertion, Update and Deletion Anomalies. It is a multi-step
process that puts data into tabular form by removing duplicated data
from the relation tables. If a database design is not perfect, it may
contain anomalies, which are like a bad dream for any database
administrator. Managing a database with anomalies is next to
impossible.

Normalization is a process of organizing the data in database to avoid


data redundancy, insertion anomaly, update anomaly & deletion
anomaly. In other words, we can say that normalization is a database
design technique which organizes tables in a manner that reduces
redundancy and dependency of data.

Normalization is the process of reducing duplication in a database,


with the ultimate goal of eliminating duplicate data entirely. While
duplicated data can cause a database to be greedy with disk space, the
bigger issue is consistency. Duplication creates the risk of data
corruption when information is inserted, updated, or deleted, by
having a particular piece of information in more than one place.

Normalization is used mainly for two purposes:

Eliminating redundant (useless) data.

Ensuring data dependencies make sense, i.e. data is logically
stored.

Normalization tips:-

Achieving a Well-Designed Database:- In relational-database design


theory, normalization rules identify certain attributes that must be
present or absent in a well-designed database. There are a few rules
that can help you achieve a sound database design:

A table should have an identifier. The fundamental rule of


database design theory is that each table should have a unique row
identifier, a column or set of columns used to distinguish any
single record from every other record in the table. Each table
should have an ID column, and no two records can share the same
ID value. The column or columns serving as the unique row
identifier for a table are the primary key of the table. In the
AdventureWorks database, each table contains an identity column
as the primary key column. For example, VendorID is the primary
key for the Purchasing.Vendor table.

A table should store only data for a single type of entity.


Trying to store too much information in a table can hinder the
efficient and reliable management of the data in the table. In the
AdventureWorks sample database, the sales order and customer
information is stored in separate tables. Although you can have
columns that contain information for both the sales order and the
customer in a single table, this design leads to several problems.
The customer information, name and address, must be added and
stored redundantly for each sales order. This uses additional

storage space in the database. If a customer address changes, the
change must be made for each sales order. Also, if the last sales
order for a customer is removed from the Sales.SalesOrderHeader
table, the information for that customer is lost.

A table should [try to] avoid nullable columns. Tables can


have columns defined to allow for null values. A null value
indicates that there is no value. Although it can be useful to allow
for null values in isolated cases, you should use them sparingly.
This is because they require special handling that increases the
complexity of data operations. If you have a table with several
nullable columns and several of the rows have null values in the
columns, you should consider putting these columns in another
table linked to the primary table. By storing the data in two
separate tables, the primary table can be simple in design and still
handle the occasional need for storing this information.

A table should not have repeating values or columns. The


table for an item in the database should not contain a list of values
for a specific piece of information. For example, a product in the
AdventureWorks database might be purchased from multiple
vendors. If there is a column in the Production.Product table for
the name of the vendor, this creates a problem. One solution is to
store the name of all vendors in the column. However, this makes
it difficult to show a list of the individual vendors. Another
solution is to change the structure of the table to add another
column for the name of the second vendor. However, this allows
for only two vendors. Additionally, another column must be added
if a product has three vendors. If you find that you have to store a list
of values in a single column, or if you have multiple columns for a
single piece of data, such as TelephoneNumber1, and
TelephoneNumber2, you should consider putting the duplicated
data in another table with a link back to the primary table. The

AdventureWorks database has a Production.Product table for
product information, a Purchasing.Vendor table for vendor
information, and a third table, Purchasing.ProductVendor. This
third table stores only the ID values for the products and the IDs of
the vendors of the products. This design allows for any number of
vendors for a product without modifying the definition of the
tables, and without allocating unused storage space for products
with a single vendor.
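The junction-table design just described can be sketched with sqlite3. The tables mirror the roles of Production.Product, Purchasing.Vendor and Purchasing.ProductVendor, but the names and data here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# product and vendor each hold one kind of entity; product_vendor holds
# only the ID pairs, allowing any number of vendors per product.
cur.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vendor  (vendor_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_vendor (
        product_id INTEGER REFERENCES product(product_id),
        vendor_id  INTEGER REFERENCES vendor(vendor_id),
        PRIMARY KEY (product_id, vendor_id)
    );
    INSERT INTO product VALUES (1, 'Chain');
    INSERT INTO vendor  VALUES (10, 'Acme'), (11, 'Globex');
    INSERT INTO product_vendor VALUES (1, 10), (1, 11);
""")

# Listing all vendors of a product is a join, with no repeated columns
# like VendorName1, VendorName2 and no unused storage.
vendors = [r[0] for r in cur.execute("""
    SELECT v.name FROM product_vendor pv
    JOIN vendor v ON v.vendor_id = pv.vendor_id
    WHERE pv.product_id = 1 ORDER BY v.name
""")]
```

Adding a third vendor is just one more row in product_vendor; the table definitions never change.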

Problem Without Normalization:-

Without normalization, it becomes difficult to handle and update the
database without facing data loss. Insertion, updation and deletion
anomalies are very frequent if the database is not normalized. To
understand these anomalies, let us take the example of a Student table.

S_id S_Name S_Address Subject_opted

401 Adam Noida Bio

402 Alex Panipat Maths

403 Stuart Jammu Maths

404 Adam Noida Physics

Updation Anomaly: To update address of a student who occurs


twice or more than twice in a table, we will have to
update S_Address column in all the rows, else data will become
inconsistent. If data items are scattered and are not linked to
each other properly, then it could lead to strange situations. For
example, when we try to update one data item having its copies
scattered over several places, a few instances get updated
properly while a few others are left with old values. Such
instances leave the database in an inconsistent state.
Insertion Anomaly: Suppose for a new admission we have the
Student id (S_id), name and address of a student, but if the student
has not opted for any subject yet, then we have to
insert NULL there, leading to an Insertion Anomaly: we cannot
record the student without inventing a value for a fact that does
not exist yet.
Deletion Anomaly: If (S_id) 401 has opted for only one subject and
temporarily drops it, then deleting that row deletes the entire
student record along with it. We intended to delete only one piece
of data, but the rest of the record was lost with it because it was
not saved anywhere else.
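The deletion anomaly can be demonstrated directly on the Student table above, here loaded into sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The unnormalized Student table from the example above.
cur.execute("CREATE TABLE student "
            "(s_id INTEGER, s_name TEXT, s_address TEXT, subject_opted TEXT)")
cur.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    (401, 'Adam',   'Noida',   'Bio'),
    (402, 'Alex',   'Panipat', 'Maths'),
    (403, 'Stuart', 'Jammu',   'Maths'),
    (404, 'Adam',   'Noida',   'Physics'),
])

# Deletion anomaly: student 402 drops his only subject, and the delete
# destroys everything the database knew about Alex.
cur.execute("DELETE FROM student WHERE s_id = 402")
remaining = cur.execute(
    "SELECT COUNT(*) FROM student WHERE s_name = 'Alex'").fetchone()[0]
```

After the delete, no trace of Alex remains, because his name and address were stored only in the row that carried his subject.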

26
Normalization Rule:-

Normalization rules are divided into the following normal forms.

1. First Normal Form

2. Second Normal Form

3. Third Normal Form

4. Fourth Normal Form

5. Fifth Normal Form

6. BCNF

First Normal Form (1NF):-

As per First Normal Form, no two rows of data may contain a
repeating group of information, i.e. each set of columns must have a
unique value, such that multiple columns cannot be used to fetch the
same row. Each table should be organized into rows, and each row
should have a primary key that distinguishes it as unique.

In other words, we can say that as per the rule of First Normal Form, an
attribute (column) of a table cannot hold multiple values. It should
hold only atomic values. The First Normal Form rules are:

Each table cell should contain single value.


Each record needs to be unique.

The primary key is usually a single column, but sometimes more than
one column can be combined to create a single primary key. For
example, consider a table which is not in First Normal Form:

Student Table :

Student Age Subject

Adam 15 Biology, Maths

Alex 14 Maths

Stuart 17 Maths

In First Normal Form, any row must not have a column in which
more than one value is saved, like separated with commas. Rather
than that, we must separate such data into multiple rows.

Student Table following 1NF will be :

Student Age Subject

Adam 15 Biology

Adam 15 Maths

Alex 14 Maths

Stuart 17 Maths

Using the First Normal Form, data redundancy increases, as there will
be many columns with the same data in multiple rows, but each row
as a whole will be unique.
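The 1NF transformation shown above — splitting a comma-separated Subject column into one row per value — can be sketched in a few lines of Python:

```python
# Rows of the unnormalized Student table: (Student, Age, Subject),
# where Subject may hold several comma-separated values.
raw = [("Adam", 15, "Biology, Maths"),
       ("Alex", 14, "Maths"),
       ("Stuart", 17, "Maths")]

# 1NF: one atomic subject value per row.
normalized = [(name, age, subject.strip())
              for name, age, subjects in raw
              for subject in subjects.split(",")]
```

Adam's single row becomes two rows, one per subject, matching the 1NF table above.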

Second Normal Form (2NF):-

As per the Second Normal Form there must not be any partial
dependency of any column on primary key. It means that for a table
that has concatenated primary key, each column in the table that is not
part of the primary key must depend upon the entire concatenated key
for its existence. If any column depends only on one part of the
concatenated key, then the table fails Second normal form. There are
following second normal form rules:-

Rule 1- Be in 1NF
Rule 2- Have no partial dependency (every non-key column must
depend on the whole primary key)

In example of First Normal Form there are two rows for Adam, to
include multiple subjects that he has opted for. While this is
searchable, and follows First normal form, it is an inefficient use of
space. Also in the above Table in First Normal Form, while the
candidate key is {Student, Subject}, Age of Student only depends on
Student column, which is incorrect as per Second Normal Form. To
achieve second normal form, it would be helpful to split out the

subjects into an independent table, and match them up using the
student names as foreign keys.

New Student Table following 2NF will be :

Student Age

Adam 15

Alex 14

Stuart 17

In Student Table the candidate key will be Student column, because


all other column i.e Age is dependent on it.

New Subject Table introduced for 2NF will be :

Student Subject

Adam Biology

Adam Maths

Alex Maths

Stuart Maths

In the Subject table the candidate key is the {Student, Subject}
combination. Now both tables qualify for Second Normal Form and
will not suffer from the update anomalies above. There are, however,
a few complex cases in which a table in Second Normal Form still
suffers update anomalies; Third Normal Form handles those
scenarios.
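The split described above can be sketched with SQLite (the table and column names are illustrative, not prescribed by the report):

```python
import sqlite3

# Sketch of the 2NF split: Age moves to a Student table keyed by
# Student; subjects go to a separate table keyed by {Student, Subject}.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT PRIMARY KEY, age INTEGER);
    CREATE TABLE subject (
        student TEXT REFERENCES student(name),
        subject TEXT,
        PRIMARY KEY (student, subject)
    );
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [("Adam", "Biology"), ("Adam", "Maths"),
                  ("Alex", "Maths"), ("Stuart", "Maths")])

# Joining reproduces the original 1NF table without storing Age twice.
rows = conn.execute("""
    SELECT s.name, s.age, j.subject
    FROM student s JOIN subject j ON j.student = s.name
    ORDER BY s.name, j.subject
""").fetchall()
print(rows)
```

Each fact (Adam's age, Adam's subjects) is now stored exactly once, so updating Adam's age touches a single row.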

Third Normal Form (3NF):-

Third Normal Form requires that every non-prime attribute of a table
be dependent on the primary key; that is, a non-prime attribute must
not be determined by another non-prime attribute. Such a transitive
functional dependency must be removed, and the table must also be
in Second Normal Form. For example, consider a table with the
following fields. The Third Normal Form rules are:

Rule 1- Be in 2NF
Rule 2- Has no transitive functional dependencies

Student_Detail Table :

Student_id Student_name DOB Street city State Zip

In this table Student_id is the primary key, but Street, City and State
depend upon Zip. The dependency between Zip and these other fields
is called a transitive dependency. Hence, to apply 3NF, we move
Street, City and State to a new table, with Zip as the primary key.

New Student_Detail Table :

Student_id Student_name DOB Zip

Address Table :

Zip Street City State

The advantages of removing the transitive dependency are:

The amount of data duplication is reduced.

Data integrity is achieved.
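A minimal SQLite sketch of the same split (the sample address values below are assumptions for illustration, not data from the report):

```python
import sqlite3

# Sketch of the 3NF split: Street, City and State move to an Address
# table keyed by Zip, removing the transitive dependency on Zip.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (zip TEXT PRIMARY KEY,
                          street TEXT, city TEXT, state TEXT);
    CREATE TABLE student_detail (
        student_id INTEGER PRIMARY KEY,
        student_name TEXT, dob TEXT,
        zip TEXT REFERENCES address(zip)
    );
""")
conn.execute("INSERT INTO address VALUES "
             "('201007', 'Mohan Nagar', 'Ghaziabad', 'UP')")
conn.execute("INSERT INTO student_detail VALUES "
             "(1, 'Adam', '2000-01-01', '201007')")

# The full address is recovered with a join on Zip.
row = conn.execute("""
    SELECT d.student_name, a.city, a.state
    FROM student_detail d JOIN address a ON a.zip = d.zip
""").fetchone()
print(row)  # ('Adam', 'Ghaziabad', 'UP')
```

If a city is renamed, only one Address row changes, regardless of how many students share that Zip.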

Fourth Normal Form:-


Under fourth normal form, a record type should not contain two or
more independent multi-valued facts about an entity. In addition, the
record must satisfy third normal form. The term "independent" will be
discussed after considering an example.

Consider employees, skills, and languages, where an employee may
have several skills and several languages. We have here two many-to-
many relationships: one between employees and skills, and one
between employees and languages. Under Fourth Normal Form, these
two relationships should not be represented in a single record such as

| EMPLOYEE | SKILL | LANGUAGE |

Instead, they should be represented in the two records

| EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE |

Note that other fields, not involving multi-valued facts, are permitted
to occur in the record, as in the case of the QUANTITY field in the
earlier PART/WAREHOUSE example.
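The decomposition can be illustrated in plain Python (the skill and language values are hypothetical):

```python
# Sketch of the 4NF split: two independent multi-valued facts about
# an employee are stored in two relations instead of one.
emp_skill = {("Smith", "cooking"), ("Smith", "typing")}
emp_language = {("Smith", "French"), ("Smith", "German")}

def combinations(skills, languages):
    """Join the two relations on employee to enumerate all triples
    that the single EMPLOYEE | SKILL | LANGUAGE record would hold."""
    return {(emp, skill, lang)
            for (emp, skill) in skills
            for (emp2, lang) in languages if emp2 == emp}

print(sorted(combinations(emp_skill, emp_language)))
```

Because skills and languages are independent, the single-record form would need every skill-language pairing (four rows here), while the split stores only two facts per relation.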

Fifth Normal Form:-
Fifth normal form deals with cases where information can be
reconstructed from smaller pieces of information that can be
maintained with less redundancy. Second, third, and fourth normal
forms also serve this purpose, but fifth normal form generalizes to
cases not covered by the others.
We will not attempt a comprehensive exposition of fifth normal form,
but illustrate the central concept with a commonly used example,
namely one involving agents, companies, and products. If agents
represent companies, companies make products, and agents sell
products, then we might want to keep a record of which agent sells
which product for which company. This information could be kept in
one record type with three fields:

AGENT COMPANY PRODUCT


Smith Ford Car

Smith GM Truck

This form is necessary in the general case. For example, although
agent Smith sells cars made by Ford and trucks made by GM, he does
not sell Ford trucks or GM cars. Thus we need the combination of
three fields to know which combinations are valid and which are not.

But suppose that a certain rule was in effect: if an agent sells a certain
product, and he represents a company making that product, then he
sells that product for that company.

AGENT COMPANY PRODUCT

Smith Ford Car
Smith Ford Truck
Smith GM Car
Smith GM Truck
Jones Ford Car

Under that rule the three-field table is redundant: it can be
decomposed losslessly into three two-field tables, (AGENT,
COMPANY), (COMPANY, PRODUCT) and (AGENT, PRODUCT),
whose join reconstructs it, and Fifth Normal Form requires this
decomposition.
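The rule can be checked in plain Python: joining the three two-column projections reconstructs exactly the rows of the table above.

```python
# Sketch of the 5NF join dependency: the AGENT/COMPANY/PRODUCT
# table equals the join of its three two-column projections.
asp = {("Smith", "Ford", "Car"), ("Smith", "Ford", "Truck"),
       ("Smith", "GM", "Car"), ("Smith", "GM", "Truck"),
       ("Jones", "Ford", "Car")}

# The three projections (smaller pieces of information).
agent_company = {(a, c) for (a, c, p) in asp}
company_product = {(c, p) for (a, c, p) in asp}
agent_product = {(a, p) for (a, c, p) in asp}

# Natural join of the three projections.
rejoined = {(a, c, p)
            for (a, c) in agent_company
            for (c2, p) in company_product if c2 == c
            for (a2, p2) in agent_product if a2 == a and p2 == p}

print(rejoined == asp)  # True
```

Note that (Jones, Ford, Truck) appears in none of the reconstructed rows: the agent_product projection records that Jones does not sell trucks, so the join does not invent spurious rows for this data set.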

Boyce-Codd Normal Form (BCNF):-

Boyce-Codd Normal Form is a stricter version of the Third Normal
Form. It deals with a certain type of anomaly that is not handled by
3NF. A 3NF table which does not have multiple overlapping
candidate keys is automatically in BCNF. A database table is in
BCNF if and only if there are no non-trivial functional dependencies
of attributes on anything other than a superkey (a superset of a
candidate key). A relational schema R is in Boyce-Codd Normal
Form if, for every one of its dependencies X -> Y, one of the
following conditions holds:

X -> Y is a trivial functional dependency (i.e., Y is a subset of X)

X is a superkey for schema R

For a table to be in BCNF, the following conditions must be satisfied:

R must be in Third Normal Form

and, for each functional dependency X -> Y, X must be a superkey.

Consider the relation R(A, B, C, D) with the following dependencies:

A->BCD

BC->AD

D->B

This relation is already in 3NF; the keys are A and BC. In the
functional dependency A -> BCD, A is a superkey, and in BC -> AD,
BC is a key. In D -> B, however, D is not a superkey, so the relation
violates BCNF.

Hence we can break the relation R into two relations R1 and R2:

R (A, B, C, D)

R1 (A, D, C) R2 (D, B)

That is, one table holds A, D and C while the other holds D and B.
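A small sketch makes the check mechanical: compute the attribute closure of each left-hand side and flag any dependency whose left side is not a superkey.

```python
# Sketch: test each functional dependency X -> Y of R(A, B, C, D)
# against the BCNF condition "X is a superkey".
attrs = frozenset("ABCD")
fds = [(frozenset("A"), frozenset("BCD")),
       (frozenset("BC"), frozenset("AD")),
       (frozenset("D"), frozenset("B"))]

def closure(xs, fds):
    """Attribute closure X+ of xs under the given dependencies."""
    result = set(xs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

# X is a superkey iff its closure covers every attribute of R.
violations = [lhs for lhs, rhs in fds if closure(lhs, fds) != attrs]
print(violations)  # only D -> B violates BCNF
```

Running it confirms the analysis above: the closures of A and BC both cover ABCD, while the closure of D is only {B, D}, so D -> B is the dependency that forces the decomposition.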

Usually, tables that are in Third Normal Form are already in Boyce-
Codd Normal Form; BCNF is considered a special condition of Third
Normal Form. A table is in BCNF if every determinant is a candidate
key. A table can be in 3NF but not in BCNF; this occurs when a
non-key attribute is a determinant of a key attribute. The dependency
diagram may look like the one below.

The table is in 3NF: A and B together form the key, and C and D
depend on both A and B. There are no transitive dependencies
between the non-key attributes C and D.

The table is not in BCNF, however, because a dependency exists
between C and B. In other words, if we know the value of C, we can
determine the value of B.

We can also show the dependencies as

AB->CD

C->B

Advantages of normalization:-

Reduced data redundancy.

A more flexible database design.

Better overall database organization.

Data consistency within the database.

Disadvantages of normalization:-

More tables to join: By spreading out your data into more tables,
you increase the need to join tables.
Tables contain codes instead of real data: Repeated data is stored
as codes rather than meaningful data. Therefore, there is always
a need to go to the lookup table for the value.
Data model is difficult to query against: The data model is
optimized for applications, not for ad hoc querying.

Literature review:-
Data normalization is a standardized way of making a data structure
clean and keeping it efficient by eliminating data duplication and
errors in data operations. It is a process in which the data attributes
within a data model are organized to increase the cohesion of entity
types. The aim of normalizing a set of data is to eliminate
redundancy, because it is difficult in a relational database to store
objects sharing similar attributes in several tables. Data
normalization plays a vital role in successful database design;
without it, database operations can generate errors and the database
system can be poor, inefficient and inaccurate.

Comparative Analysis:-

Related Tools:-

This tool is designed mainly to help students learn functional
dependencies, normal forms, and normalization. It can also be used to
test your table for normal forms or normalize your table to 2NF, 3NF
or BCNF using a given set of functional dependencies.

Recent Issues & research challenges:-

Database normalization, or data normalization, is a technique to
organize the contents of the tables for transactional databases and
data
warehouses. Normalization is part of successful database design;
without normalization, database systems can be inaccurate, slow, and
inefficient, and they might not produce the data you expect.

Future trends:-

Conclusion:-

With the help of data normalization, the data in the database can be
modified: inserted, updated and deleted. Vince's warehouse has a
large number of items that need to be entered into the database. The
data present in the warehouse contains a huge amount of
denormalized data, and this data has to be summarized and
normalized, which helps increase the performance of the database.
Such updates and modifications are common in any database. If the
data is not normalized, or is poorly normalized, the burden of
excessive disk I/O results in poor database performance.
While we have tried to present the normal forms in a simple and
understandable way, we are by no means suggesting that the data
design process is correspondingly simple. The design process
involves many complexities which are quite beyond the scope of this
report. In the first place, an initial set of data elements and records has
to be developed, as candidates for normalization. Then the factors
affecting normalization have to be assessed:

Single-valued vs. multi-valued facts.

Dependency on the entire key.

Independent vs. dependent facts.

The presence of mutual constraints.

The presence of non-unique or non-singular representations.

And, finally, the desirability of normalization has to be assessed, in
terms of its performance impact on retrieval applications.

References:-

1. Codd, E.F. "Further Normalization of the Data Base Relational
Model". (Presented at Courant Computer Science Symposia Series
6, "Data Base Systems", New York City, May 24-25, 1971.) IBM
Research Report RJ909 (August 31, 1971). Republished in
Randall J. Rustin (ed.), Data Base Systems: Courant Computer
Science Symposia Series 6. Prentice-Hall, 1972.

2. Codd, E.F. "Recent Investigations into Relational Data Base
Systems". IBM Research Report RJ1385 (April 23, 1974).
Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974).
New York, N.Y.: North-Holland (1974).

3. C.J. Date. An Introduction to Database Systems. Addison-Wesley
(1999), p. 290.

4. "The adoption of a relational model of data ... permits the
development of a universal data sub-language based on an
applied predicate calculus. A first-order predicate calculus
suffices if the collection of relations is in first normal form.
Such a language would provide a yardstick of linguistic power
for all other proposed data languages, and would itself be a
strong candidate for embedding (with appropriate syntactic
modification) in a variety of host languages (programming,
command- or problem-oriented)." Codd, "A Relational Model
of Data for Large Shared Data Banks", p. 381.

5. Codd, E.F. Chapter 23, "Serious Flaws in SQL", in The
Relational Model for Database Management: Version 2.
Addison-Wesley (1990), pp. 371-389.

6. Codd, E.F. "Further Normalization of the Data Base Relational
Model", p. 34.

7. Retrieved May 31, 2015, from http://sqlmag.com/database-
performance-tuning/sql-design-why-you-need-database-
normalization

