NORMALIZATION
Colloquium Report Submitted for the Partial Requirement for
the Award of Degree of Master of Computer Applications
Submitted by:
KM BABITA
Course: MCA
Department of IT
Declaration
(Signature of Student)
Km Babita
MCA-VI Semester
Index
1. Abstract
2. Objectives
3. Introduction
Database
Component
Advantages
Disadvantages
4. Normalization of database
5. Normalization tips
7. Normalization rules
8. Advantages
9. Disadvantages
15. Conclusion
16. References
Abstract:-
In mathematical logic and theoretical computer science, a rewrite
system has the normalization property, or is terminating, if every
term is strongly normalizing; that is, if every sequence of rewrites
eventually terminates with an irreducible term, also called a normal
form. A rewrite system may also have the weak normalization
property, meaning that for every term at least one sequence of
rewrites eventually yields a normal form, that is, an irreducible
term. The use of relational database (RDBMS) technology and
different levels of normalization (1st, 2nd, 3rd and 4th normal
form data structures) is proliferating throughout the data
processing industry. RDBMSs are valued for their ability to
maintain the integrity of data, reduce unnecessary data redundancy,
and provide maximum flexibility in retrieval.
Objectives:-
A basic objective of the first normal form defined by Codd in 1970
was to permit data to be queried and manipulated using a "universal
data sub-language" grounded in first-order logic. (SQL is an example
of such a data sub-language, albeit one that Codd regarded as
seriously flawed.)
Introduction:-
Database:-
A database is a collection of information that is organized
so that it can easily be accessed, managed, and updated. In
one view, databases can be classified according to types of
content: bibliographic, full-text, numeric, and images. In
computing, databases are sometimes classified according
to their organizational approach. The most prevalent
approach is the relational database, a tabular database in
which data is defined so that it can be reorganized and
accessed in a number of different ways. A distributed
database is one that can be dispersed or replicated among
different points in a network.
A database management system (DBMS) is system
software for creating and managing databases. The DBMS
provides users and programmers with a systematic way to
create, retrieve, update and manage data. A DBMS makes
it possible for end users to create, read, update and
delete data in a database. The DBMS essentially serves as
an interface between the database and end users
or application programs, ensuring that data is consistently
organized and remains easily accessible.
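The create, read, update and delete operations mentioned above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the student table, its columns and the sample values are assumptions made for the example, not part of the report.

```python
import sqlite3

# In-memory database; table, columns and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, class TEXT)"
)

# Create: add one row.
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (1, "Adam", "MCA"))

# Read: fetch the row back by its key.
row = conn.execute(
    "SELECT name, class FROM student WHERE roll_no = ?", (1,)
).fetchone()

# Update: change one column of the row.
conn.execute("UPDATE student SET class = ? WHERE roll_no = ?", ("MCA-VI", 1))
updated = conn.execute(
    "SELECT class FROM student WHERE roll_no = ?", (1,)
).fetchone()[0]

# Delete: remove the row again.
conn.execute("DELETE FROM student WHERE roll_no = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

conn.close()
```

The application never touches files on disk directly; every request goes through the DBMS interface.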
The DBMS manages three important things: the data, the
database engine that allows data to be accessed, locked and
modified, and the database schema, which defines the
database's logical structure. These three foundational
elements help provide concurrency, security, data
integrity and uniform administration procedures. Typical
database administration tasks supported by the DBMS
include change management, performance monitoring and
tuning, and backup and recovery. Many database
management systems are also responsible for
automated rollbacks, restarts and recovery, as well as
the logging and auditing of activity.
The DBMS is perhaps most useful for providing a
centralized view of data that can be accessed by multiple
users, from multiple locations, in a controlled manner. A
DBMS can limit what data the end user sees, as well as
how that end user can view the data, providing many views
of a single database schema. End users and software
programs are free from having to understand where the
data is physically located or on what type of storage media
it resides because the DBMS handles all requests. The
DBMS can offer both logical and physical data
independence. That means it can protect users and
applications from needing to know where data is stored or
having to be concerned about changes to the physical
structure of data. With relational DBMSs (RDBMSs), the
interface that applications use is SQL, a standard language
for defining, protecting and accessing data in an RDBMS.
Components of the Database System Environment:-
The database system environment has five major, interrelated
components:
Hardware
Software
Data
Users
Procedures
Management System or DBMS. All requests from users for access to
the database are handled by the DBMS. One general function
provided by the DBMS is thus the shielding of database users from
complex hardware-level detail.
The database should contain all the data needed by the organization.
One of the major features of databases is that the actual data are
separated from the programs that use the data. A database should
always be designed, built and populated for a particular audience and
for a specific purpose.
4. Users: There are a number of users who can access or retrieve data
on demand using the applications and interfaces provided by the
DBMS. Each type of user needs different software capabilities. The
users of a database system can be classified in the following groups,
depending on their degrees of expertise or the mode of their
interactions with the DBMS. The users can be:
Naive Users
Online Users
Application Programmers
Sophisticated Users
Naive Users: Naive Users are those users who need not be aware of
the presence of the database system or any other system supporting
their usage. Naive users are end users of the database who work
through a menu driven application program, where the type and range
of response is always indicated to the user.
Online Users: Online users are those who may communicate with the
database directly via an online terminal or indirectly via a user
interface and application program. These users are aware of the
presence of the database system and may have acquired a certain
amount of expertise within the limited interaction permitted with a
database.
Sophisticated Users: Such users interact with the system without
writing programs.
Log on to the DBMS.
Advantages of DBMS:-
It is clear from the above file systems that some common data about
the student has to be stored in each application, such as Roll no,
Name, Class, Phone No and Address. This causes redundancy, which
wastes storage space and makes the data difficult to maintain. In a
centralized database, data can be shared by a number of
applications, and the whole college can maintain its computerized
data with the following database:
It is clear in the above database that Roll no, Name, Class, Father
Name, Address, Phone No and Date_of_birth, which are stored
repeatedly by each application in the file system, need not be stored
repeatedly in the database, because every other application can
access this information by joining relations on the common column,
i.e. Roll no. Suppose a user of the Library system needs the Name and
Address of a particular student. By joining the Library and
General Office relations on the Roll no column, he/she can
easily retrieve this information.
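A join of this kind might look as follows. This is a sketch only: the table layouts, the book_title column and the sample row are hypothetical, since the report's original database diagram is not available. Only the idea of joining Library and General Office on Roll no comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Student details are stored once, in the General Office relation.
CREATE TABLE general_office (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT,
    class   TEXT,
    address TEXT
);
-- The Library relation keeps only the roll number plus its own data.
CREATE TABLE library (
    roll_no    INTEGER REFERENCES general_office(roll_no),
    book_title TEXT
);
INSERT INTO general_office VALUES (101, 'Adam', 'MCA', 'Noida');
INSERT INTO library VALUES (101, 'Database Systems');
""")

# The Library application obtains name and address through a join on roll_no.
issued = conn.execute("""
    SELECT l.book_title, g.name, g.address
    FROM library AS l
    JOIN general_office AS g ON g.roll_no = l.roll_no
""").fetchall()

conn.close()
```

Because the name and address live only in general_office, a change of address is made in one place and every application sees it.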
General Office (because class field appears only once in the whole
database), and all other applications will get the class information
about the student from the General Office table so the integrity
constraint is applied to the whole database. So, we can conclude that
integrity constraint can be easily enforced in centralized DBMS
system as compared to file system.
Department level, National level or International level. The
standardized data is very helpful during migration or interchanging of
data. The file system is an independent system so standard cannot be
easily enforced on multiple independent applications.
backup and recovery subsystem of the DBMS is responsible for
recovery. For example, if the computer system fails in the middle of a
complex update program, the recovery subsystem is responsible for
making sure that the database is restored to the state it was in before
the program started executing.
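The behaviour described here can be imitated on a small scale with a transaction that is rolled back. The sketch below uses Python's sqlite3 module with a hypothetical account table, and a raised exception stands in for the failure; a real recovery subsystem handles actual system crashes, which this example does not attempt to reproduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, amount, fail_midway=False):
    """Move `amount` from account 1 to account 2 as a single transaction."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with connection:
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated failure in the middle of the update")
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 2", (amount,)
        )


try:
    transfer(conn, 30, fail_midway=True)
except RuntimeError:
    pass  # the half-finished update has been rolled back

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
conn.close()
```

After the simulated failure both balances are unchanged, so the database is back in the state it was in before the update began.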
Disadvantages of DBMS:-
2. Size: The complexity and breadth of functionality make the
DBMS an extremely large piece of software, occupying many
megabytes of disk space and requiring substantial amounts
of memory to run efficiently.
new systems and possibly the employment of specialist staff to help
with conversion and running of the system. This cost is one of the
main reasons why some organizations feel tied to their current
systems and cannot switch to modern database technology.
Normalization of Database:-
Normalization tips:-
storage space in the database. If a customer address changes, the
change must be made for each sales order. Also, if the last sales
order for a customer is removed from the Sales.SalesOrderHeader
table, the information for that customer is lost.
AdventureWorks database has a Production.Product table for
product information, a Purchasing.Vendor table for vendor
information, and a third table, Purchasing.ProductVendor. This
third table stores only the ID values for the products and the IDs of
the vendors of the products. This design allows for any number of
vendors for a product without modifying the definition of the
tables, and without allocating unused storage space for products
with a single vendor.
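The three-table design can be sketched as below. The tables are simplified stand-ins for Production.Product, Purchasing.Vendor and Purchasing.ProductVendor; the column names and sample rows are assumptions for illustration and do not reproduce the real AdventureWorks schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vendor  (vendor_id  INTEGER PRIMARY KEY, name TEXT);
-- The linking table stores only the two ID values.
CREATE TABLE product_vendor (
    product_id INTEGER REFERENCES product(product_id),
    vendor_id  INTEGER REFERENCES vendor(vendor_id),
    PRIMARY KEY (product_id, vendor_id)
);
INSERT INTO product VALUES (1, 'Chain'), (2, 'Pedal');
INSERT INTO vendor  VALUES (10, 'Vendor A'), (11, 'Vendor B'), (12, 'Vendor C');
-- One product has three vendors, the other has one; no schema change needed.
INSERT INTO product_vendor VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

vendor_counts = conn.execute("""
    SELECT p.name, COUNT(pv.vendor_id)
    FROM product AS p
    JOIN product_vendor AS pv ON pv.product_id = p.product_id
    GROUP BY p.product_id
    ORDER BY p.product_id
""").fetchall()

conn.close()
```

Adding a fourth vendor for a product is just one more row in product_vendor, and a product with a single vendor carries no empty vendor columns.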
404 Adam Noida Physics
Normalization Rule:-
1. First Normal Form
6. BCNF
In other words, as per the rule of first normal form, an
attribute (column) of a table cannot hold multiple values. It should
hold only atomic values. The first normal form rules are as follows:-
The primary key is usually a single column, but sometimes more than
one column can be combined to create a single primary key. For
example, consider a table which is not in First Normal Form:
Student Table :
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column that holds more than
one value, such as a comma-separated list. Instead, such data must be
separated into multiple rows.
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
Using the First Normal Form, data redundancy increases, as the same
data will appear in some columns across multiple rows, but each row
as a whole will be unique.
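The conversion to First Normal Form can be sketched in a few lines of Python. The unnormalized Adam row is assumed here to hold the comma-separated value "Biology, Maths", which is inferred from the two Adam rows in the normalized table; the function name is hypothetical.

```python
# Unnormalized rows: the subject column may hold several comma-separated values.
unnormalized = [
    ("Adam", 15, "Biology, Maths"),  # assumed multi-valued row
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]


def to_first_normal_form(rows):
    """Emit one row per (student, age, single subject)."""
    result = []
    for student, age, subjects in rows:
        for subject in subjects.split(","):
            result.append((student, age, subject.strip()))
    return result


first_nf = to_first_normal_form(unnormalized)
```

Adam's age now appears twice, which is the redundancy noted above, yet no two rows are identical.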
Second Normal Form (2NF):-
As per the Second Normal Form, no column may have a partial
dependency on the primary key. This means that, for a table
that has a concatenated primary key, each column that is not
part of the primary key must depend upon the entire concatenated key
for its existence. If any column depends on only one part of the
concatenated key, then the table fails Second Normal Form. The
second normal form rules are as follows:-
Rule 1- Be in 1NF
Rule 2- No partial dependency (a Single Column Primary Key satisfies this automatically)
In the example of First Normal Form there are two rows for Adam, to
include the multiple subjects that he has opted for. While this is
searchable and follows First Normal Form, it is an inefficient use of
space. Also, in the above table in First Normal Form, the
candidate key is {Student, Subject}, but the Age of a student depends
only on the Student column, which is a partial dependency and
violates Second Normal Form. To achieve Second Normal Form, it helps
to split the subjects out into an independent table and match them
up using the student names as foreign keys.
Student Table :
Student Age
Adam 15
Alex 14
Stuart 17
Subject Table :
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the Subject Table the candidate key is {Student, Subject}. Now
both of the above tables qualify for Second Normal Form and no
longer suffer from the update anomalies caused by partial
dependencies. There are, however, a few more complex cases in which
a table in Second Normal Form still suffers from update anomalies,
and Third Normal Form exists to handle those scenarios.
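The split into a Student table and a Subject table can be checked with plain Python sets. This is a sketch using the sample rows from the example; the variable names are illustrative.

```python
# The table in First Normal Form: (Student, Age, Subject).
first_nf = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# Age depends only on Student, so it moves to its own table.
student_table = sorted({(student, age) for student, age, _ in first_nf})

# The Subject table keeps the full key {Student, Subject}.
subject_table = sorted({(student, subject) for student, _, subject in first_nf})

# Joining the two tables on Student gives back the original rows,
# so no information was lost by the split.
rejoined = sorted(
    (student, age, subject)
    for student, age in student_table
    for other_student, subject in subject_table
    if student == other_student
)
```

Adam's age is now stored once, so changing it is a single-row update.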
Third Normal Form (3NF):-
Rule 1- Be in 2NF
Rule 2- Has no transitive functional dependencies
Student_Detail Table :
In this table Student_id is the primary key, but street, city and
state depend upon Zip. The dependency between Zip and the other
fields is called a transitive dependency. Hence, to apply 3NF, we
need to move street, city and state to a new table, with Zip as the
primary key.
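A possible shape for the two tables after this step is sketched below with sqlite3. The student_name column and all sample values are made up for illustration; only Student_id, street, city, state and Zip come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Address details are stored once per Zip.
CREATE TABLE address (
    zip    TEXT PRIMARY KEY,
    street TEXT,
    city   TEXT,
    state  TEXT
);
-- Student_Detail keeps only the Zip as a reference to the address.
CREATE TABLE student_detail (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT,                        -- hypothetical column
    zip          TEXT REFERENCES address(zip)
);
INSERT INTO address VALUES ('00001', 'Main Street', 'Springfield', 'State A');
INSERT INTO student_detail VALUES (1, 'Adam', '00001'), (2, 'Alex', '00001');
""")

# Correcting the city for a Zip is a single-row update...
conn.execute("UPDATE address SET city = 'Shelbyville' WHERE zip = '00001'")

# ...and every student with that Zip sees the corrected value.
cities = conn.execute("""
    SELECT s.student_name, a.city
    FROM student_detail AS s
    JOIN address AS a ON a.zip = s.zip
    ORDER BY s.student_id
""").fetchall()

conn.close()
```

With the address columns left in Student_Detail, the same correction would have to be repeated for every student sharing that Zip.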
Address Table :
The advantage of removing transitive dependency is,
Note that other fields, not involving multi-valued facts, are permitted
to occur in the record, as in the case of the QUANTITY field in the
earlier PART/WAREHOUSE example.
Fifth Normal Form:-
Fifth normal form deals with cases where information can be
reconstructed from smaller pieces of information that can be
maintained with less redundancy. Second, third, and fourth normal
forms also serve this purpose, but fifth normal form generalizes to
cases not covered by the others.
We will not attempt a comprehensive exposition of fifth normal form,
but illustrate the central concept with a commonly used example,
namely one involving agents, companies, and products. If agents
represent companies, companies make products, and agents sell
products, then we might want to keep a record of which agent sells
which product for which company. This information could be kept in
one record type with three fields:
AGENT COMPANY PRODUCT
Smith GM Truck
But suppose that a certain rule was in effect: if an agent sells a certain
product, and he represents a company making that product, then he
sells that product for that company.
AGENT COMPANY PRODUCT
Smith Ford Car
Smith Ford Truck
Smith GM Car
Smith GM Truck
Jones Ford Car
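Under the stated rule, the three-field table above can be rebuilt from three smaller two-field tables. The Python sketch below checks this for the sample rows; the variable names are illustrative.

```python
# The three-field record type: (AGENT, COMPANY, PRODUCT).
sales = {
    ("Smith", "Ford", "Car"),
    ("Smith", "Ford", "Truck"),
    ("Smith", "GM", "Car"),
    ("Smith", "GM", "Truck"),
    ("Jones", "Ford", "Car"),
}

# Three smaller record types, each a projection onto two fields.
agent_company = {(agent, company) for agent, company, _ in sales}
company_product = {(company, product) for _, company, product in sales}
agent_product = {(agent, product) for agent, _, product in sales}

# Apply the rule: an agent sells a product for a company if the agent
# represents the company, the company makes the product, and the agent
# sells the product.
rebuilt = {
    (agent, company, product)
    for agent, company in agent_company
    for maker, product in company_product
    if maker == company and (agent, product) in agent_product
}
```

The rebuilt set equals the original five rows. Jones with a Ford truck is correctly left out, because Jones does not sell trucks.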
Boyce and Codd Normal Form (BCNF):-
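Consider a relation R (A, B, C, D) with the following functional
dependencies:
A->BCD
BC->AD
D->B
Here D->B violates BCNF, because D determines B but is not a
candidate key of R. The relation is therefore decomposed as:
R1 (A, D, C) R2 (D, B)
That is, the table is broken into two tables, one with A, D and C and
the other with D and B.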
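Whether a dependency violates BCNF can be checked by computing the closure of its left-hand side. The sketch below is a small illustrative helper, not a full normalization tool; the function name closure is an assumption of this example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("ABCD")
fds = [("A", "BCD"), ("BC", "AD"), ("D", "B")]

# A dependency violates BCNF when its left side does not determine all of R.
violations = [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != R]

# In R2 (D, B), D determines every attribute of R2, so D is its key.
d_is_key_of_r2 = closure("D", fds) >= set("DB")
```

Only D->B is reported, which is why B is moved out into R2 together with D.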
Usually tables that are in Third Normal Form are already in Boyce
Codd Normal Form. Boyce Codd Normal Form (BCNF) is considered
a special condition of third Normal form. A table is in BCNF if every
determinant is a candidate key. A table can be in 3NF but not in
BCNF. This occurs when a non-key attribute is a determinant of a key
attribute. The dependency diagram may look like the one below
The table is in 3NF. A and B together form the key, and C and D
depend on both A and B. There are no transitive dependencies between
the non-key attributes C and D.
AB->CD
C->B
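The difference between 3NF and BCNF for these two dependencies can be tested mechanically. The following is a brute-force sketch suitable only for small examples; all function names are assumptions of this illustration. Note that the search also reports {A, C} as a candidate key, because C determines B.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Find all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def is_bcnf(attributes, fds):
    """BCNF: the left side of every non-trivial dependency is a superkey."""
    return all(
        closure(lhs, fds) == set(attributes)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)
    )


def is_3nf(attributes, fds):
    """3NF: as BCNF, except the right side may consist of key attributes."""
    prime = set().union(*candidate_keys(attributes, fds))
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra or closure(lhs, fds) == set(attributes):
            continue
        if not extra <= prime:
            return False
    return True


fds = [("AB", "CD"), ("C", "B")]
keys = candidate_keys("ABCD", fds)
in_3nf = is_3nf("ABCD", fds)
in_bcnf = is_bcnf("ABCD", fds)
```

For this relation the check reports 3NF but not BCNF: C->B has a left side that is not a superkey, yet B belongs to a candidate key, which 3NF tolerates and BCNF does not.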
Advantages of normalization:-
Disadvantages of normalization:-
More tables to join: By spreading out your data into more tables,
you increase the need to join tables.
Tables contain codes instead of real data: Repeated data is stored
as codes rather than meaningful data. Therefore, there is always
a need to go to the lookup table for the value.
Data model is difficult to query against: The data model is
optimized for applications, not for ad hoc querying.
Literature review:-
Data normalization is a standardized way of keeping a data
structure clean and efficient by eliminating data duplication
and errors in data operations. It is a process in which data
attributes within a data model are organized to increase the
cohesion of entity types. The aim of conducting the data
normalization process on a set of data is to eliminate data
redundancy, because it is difficult in a relational database to
store objects sharing similar attributes in several tables. Data
normalization plays a vital role in successful database design.
Without normalization, database operations can generate errors
and the database system can be poor, inefficient and inaccurate.
Comparative Analysis:-
Related Tools:-
Future trends:-
Conclusion:-
With the help of data normalization, the data in the database can be
modified reliably when inserting, updating and deleting. Vince's
warehouse has a large number of items that need to be entered into
the database. The data present in the warehouse includes a huge
amount of denormalized data, which has to be summarized and
normalized; this helps increase the performance of the database. These
updates and modifications are common in any database. If the data is
References:-
Republished in Proc. 1974 Congress (Stockholm, Sweden,
1974). N.Y.: North-Holland (1974).