
2.1. Entity Relationship Modeling


Introduction
The Entity-Relationship model (or ER model) is a way of graphically representing the logical
relationships of objects in order to create a database. Creation of an ER diagram is the first
step in designing a database. It helps the designer(s) to understand and to specify the desired
components of the database and the relationships among those components. An ER model is a
graphical representation which contains entities or "items", relationships among the entities
and attributes of the entities and relationships.
The following are the three basic elements in the ER model.

Entities: the objects or items of interest

Attributes: the properties of an entity

Relationships: the links between entities

Let us take a University database as an example and see how its ER model is derived.
Example:
A university consists of a number of departments. Each department offers several courses.
Each course includes a number of modules. Students enroll in a particular course and study
modules towards the completion of that course. Each module is taught by a lecturer from the
appropriate department, and each lecturer teaches a group of students.
Entities
Entities are real world items or concepts that exist on their own and are represented as objects
or things of interest. An entity type is a collection of entities that share a common definition.
Identify all the nouns in our university example:
A university consists of a number of departments. Each department offers several courses.
Each course includes a number of modules. Students enroll in a particular course and study
modules towards the completion of that course. Each module is taught by a lecturer from the
appropriate department, and each lecturer teaches a group of students.
This scenario involves students, lecturers, modules, courses and departments. Physical things
(things that exist in the world and that we can touch), such as students and lecturers, and
abstract things (ideas or concepts that we cannot physically touch, smell, hear, taste or see),
such as modules and departments, each make an entity type. If we take Student as an entity
type, then each student in the university is an entity. Entities appear as nouns in the
description because they are objects or things.
An entity is a specific instance, whereas an entity type is simply the idea that groups those
instances. Student is an entity type for physical things, while Scott, Nancy, Lindsey and
Mackenzie are entities of that type. Department is an entity type for abstract things, while
IT, CSE, ECE and CIVIL are its entities.
Entity Diagrams

In an E-R Diagram, an entity is usually drawn as a rectangle.


The box is labeled with the name of the entity type. The entities identified in our
example are shown in Figure 2.1.

Figure 2.1 : Entities


Weak Entity
If an entity depends on another existing entity then it is considered weak. A weak entity
cannot be identified by its own attributes. A weak entity is represented by a double rectangle
in an E-R diagram.
Example:
SubModule is a good example of a weak entity. A SubModule is meaningless without a
Module entity, so it depends on the existence of Module, as shown in Figure 2.2.

Figure 2.2 : Weak Entity


Attributes
Attributes represent properties, facts, aspects or details of an entity. There are attributes or
particular properties that describe each entity.


In our University database each student in the university will have a Student ID, Name,
Course taken etc. Similarly each lecturer will have his/her own properties of ID, Name,
department etc.
Each attribute has a name and belongs to an associated entity, of which it describes a
property. Attributes are also often nouns.
Attributes in ER diagram

In an E/R Diagram attributes are represented by an oval.

A line is used to link an attribute to its entity.

The figure below represents the entities and their corresponding attributes in the University
database.

Figure 2.3 : Entities and Attributes


Multivalued Attribute
A multivalued attribute is an attribute that can hold more than one value. For instance, if
phone number and graduating degree are attributes of an entity called Person, then those
attributes could have multiple values, as a person could have multiple phone numbers or hold
multiple degrees. We represent a multivalued attribute by a double oval in an E-R diagram.
Single Valued Attribute: an attribute that holds a single value. In our example, attributes of
Student such as Roll number, Age, Date of Birth and City can each have only a single value.
In our example, a Student can have multiple phone numbers, so Phone number is a
multivalued attribute.

Figure 2.4 : Multivalued Attributes


Relationships
The association between two or more entities is called a relationship. In our University
database, each Student studies several Modules and each Lecturer teaches several Students.
Here the entity types Student and Module are related, as are Lecturer and Student. Verbs
most often describe the relationships between entities.
Identify the verbs (relationships) in our University database example:
A university consists of a number of departments. Each department offers several courses.
Each course includes a number of modules. Students enroll in a particular course and study
modules towards the completion of that course. Each module is taught by a lecturer from the
appropriate department, and each lecturer teaches a group of students.
Each relationship has a name, a set of entities that participate in it, a degree and a cardinality
ratio. The degree is the number of entity types that participate in the relationship. Most
relationships have degree 2. For example, in Figure 2.3 each Lecturer teaches several Students,
so this relationship has degree 2 because two entity types, Lecturer and Student, are related
by it.
Relationships in an ER diagram
Relationships denote the links between entities.

The name of the relationship is given in a diamond box (for example, Belongs to,
as shown in Figure 5.1).

Cardinality Ratio
Each entity can be involved in three types of relationships as shown:
One to One (1:1)

Each student belongs to one University. We can illustrate this ratio by writing
ones on the lines indicating the relationship as shown in Figure 2.5.

Figure 2.5 : One-one Mapping

The notation for the 1:1 relationship is shown in Figure 2.6.

Figure 2.6 : One-one Mapping


One to Many (1:M)

A lecturer teaches many students, and this One to Many relationship is illustrated
in figure 2.7.

Figure 2.7 : One-Many

The notation for the 1:M relationship is shown in Figure 2.8.

Figure 2.8 : One-Many


Many to Many (M:M)

Each student takes many modules, and each module is taken by many students as
shown in figure 2.9.

Figure 2.9 : Many-Many


Making E/R Models
So far we have seen how to identify the basic elements of an ER diagram. To make an E/R
model, you need to identify:

Entities

Attributes

Relationships

Cardinality ratios

Now let's see what an ER model looks like when all these elements are put together. The
final ER model of our University database is shown in Figure 2.10. The figure shows the
entities and the relationships between them, which together depict the complete ER model of
a University. Here Department, Course, Module, Lecturer and Student are the entities.
The relationships in Figure 2.10 are defined as follows. A Department offers many Courses
(one (1) to many (n)). A Department assigns many Lecturers (one (1) to many (n)). Each
Lecturer teaches many Students (one (1) to many (n)). Every Student takes several Modules
(many (n) to many (n)). Every Course includes many Modules (many (n) to many (n)). Many
Students enroll in a Course (one (1) to many (n)).
The complete ER model for our University database is shown in the diagram below. It is an
integrated ER model containing the entities and relationships for a University database.

Figure 2.10 : University ER Model
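As a rough illustration, the relationships described for Figure 2.10 can be written down as
plain data and then queried. The tuple layout and the short relationship names below are
assumptions made for this sketch; only the entities and cardinality ratios come from the text.

```python
# Each tuple is (entity, relationship name, entity, cardinality ratio).
# The layout is illustrative; the cardinalities follow the description of Figure 2.10.
relationships = [
    ("Department", "offers", "Course", "1:N"),
    ("Department", "assigns", "Lecturer", "1:N"),
    ("Lecturer", "teaches", "Student", "1:N"),
    ("Student", "takes", "Module", "M:N"),
    ("Course", "includes", "Module", "M:N"),
    ("Course", "enrolls", "Student", "1:N"),
]

# The entity types are every name that appears on either side of a relationship.
entities = sorted({name for left, _, right, _ in relationships for name in (left, right)})

# Every relationship here links exactly two entity types, so each has degree 2.
degrees = {name: 2 for _, name, _, _ in relationships}

# Relationships with a many-to-many cardinality ratio.
many_to_many = [name for _, name, _, ratio in relationships if ratio == "M:N"]
```

Listing the model this way makes it easy to check that the five entity types named in the
text are exactly the ones that take part in some relationship.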


Summary

ER Diagrams play a major role in database designing.

The ER Diagrams act as a non-technical communication tool.

This tool is used by both technical and non-technical users.

Entities represent real world things. They can be conceptual, such as a transaction, or
physical, such as a bank.

Figure 2.11 : ER Model Summary

2.2. Normalization - First Normal Form, Second Normal Form and Third Normal Form
Normalization is the database design technique used to organize tables in a manner that
reduces redundancy and dependency of data. It is a systematic process of decomposing
complex tables (relations) into smaller, easily manageable tables. Normalization helps ensure
that data can be retrieved accurately from the database. Without normalization, database
systems can be inaccurate, redundant, slow and inefficient, and they might not produce the
data that is expected. The advantages of normalization are listed below.
Advantages

Smaller, simpler and well-structured relations.

Avoids unnecessary duplication of data. That is, it helps to reduce redundancy.

Provides data integrity.

Helps to avoid update anomalies. That is, it isolates data so that additions, deletions,
and modifications of a field can be made in just one table. The changes are then
propagated to the rest of the database through the defined relationships.

Saves storage space.

Edgar Codd invented the relational model and proposed the theory of normalization with
the introduction of First Normal Form. He continued to extend the theory with Second and
Third Normal Forms. Later, Codd worked with Raymond F. Boyce to develop Boyce-Codd
Normal Form (BCNF).
The theory of normalization is still developing. For example, discussions on a Sixth Normal
Form are in progress. However, in most practical applications normalization achieves its best
in Third Normal Form. The evolution of normalization theory is illustrated below:

Figure 2.12 : Normalization Evolution


Let's understand a few things before we proceed:

What is a KEY?
A KEY is a column, or a combination of multiple columns, whose value uniquely identifies a
row in a table.
Note: The columns in a table that are NOT used to uniquely identify a record or row in a table
are called non-key columns.
What is a primary Key?
A primary key is the column (or, for a composite key, the combination of columns) whose
value is used to uniquely identify a database record.

Figure 2.13 : Primary Key

The primary key column in a table must always have a value.

The primary key column in a table cannot have duplicate values. Each primary key
value must be unique.

The primary key values should not be modified.

The primary key column should have a value when a new record is inserted into the
table.

Example:
The table below contains the details of students. Here studentId is Primary Key which is used
to uniquely identify the details of a student from the table.

Figure 2.14 : Primary Key Illustration


Composite Key
If two or more columns are used to uniquely identify a record then the combination of those
multiple columns constitutes a composite key.
In the Student table given below, we have StudentId, TestId and Mark. Here one student can
take multiple tests and one test can be taken by multiple students. In this case in order to
uniquely identify the mark of a student in a test we require both StudentId and TestId. This is
a composite key.
Student Table

Table 2.1
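The composite key described above can be tried out with Python's built-in sqlite3 module.
This is a minimal sketch: the table name StudentMark and the sample marks are made up for
illustration, and only the three column names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# StudentId and TestId together form the composite primary key.
conn.execute(
    """
    CREATE TABLE StudentMark (
        StudentId INTEGER NOT NULL,
        TestId    INTEGER NOT NULL,
        Mark      INTEGER,
        PRIMARY KEY (StudentId, TestId)
    )
    """
)

# Made-up sample rows: one student takes two tests, one test is taken by two students.
rows = [(1, 10, 78), (1, 11, 64), (2, 10, 91)]
conn.executemany("INSERT INTO StudentMark VALUES (?, ?, ?)", rows)

# A second row with the same (StudentId, TestId) pair violates the composite key.
try:
    conn.execute("INSERT INTO StudentMark VALUES (1, 10, 55)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Both parts of the key are needed to pick out a single mark.
mark = conn.execute(
    "SELECT Mark FROM StudentMark WHERE StudentId = ? AND TestId = ?", (1, 11)
).fetchone()[0]
```

Neither StudentId nor TestId is unique on its own in this table; only the pair is.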
Functional Dependency
In simple terms, functional dependency can be explained as follows. If knowing the value of
one attribute lets you determine the value of another attribute, then the second attribute is
said to be functionally dependent on the first. In the Student table given below, we can get
the attribute 'Name' if we know the attribute 'StudentId', so Name is functionally dependent
on StudentId. Here StudentId is the determinant and Name is the dependent.
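One way to make this definition concrete is to test it against sample rows. The helper
holds below is a hypothetical function written for this sketch, and the rows are invented.
It can only show that a dependency is not contradicted by the data at hand.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample data: two different students happen to share a name.
students = [
    {"StudentId": 101, "Name": "Scott"},
    {"StudentId": 102, "Name": "Nancy"},
    {"StudentId": 103, "Name": "Scott"},
]

id_determines_name = holds(students, ["StudentId"], ["Name"])
name_determines_id = holds(students, ["Name"], ["StudentId"])
```

Here StudentId determines Name, but Name does not determine StudentId, because the same
name appears with two different identifiers.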
For example, let's consider the Student table given below. Table 2.2 stores student
details (StudentId, Name, Languages Known), the student's department details (Dept_No,
Dept_Name) and lecturer details (LecturerInCharge, Designation).
In this approach, the languages known are packed into a single field and the department
details are repeated for every student. This is called an unnormalized table. Instead of storing
the same data again and again, we can normalize the data and create related tables.

Let's see how to normalize this unnormalized Student table, create related tables and learn
the normal forms along the way:
Student Table (UnNormalized Table):

Table 2.2
First Normal Form
To move from unnormalized form to first normal form, all multi-valued attributes (called
repeating groups) must be eliminated. All attributes must be atomic.
Table 2.2 is not in 1NF since there are repeating groups (more than one value in a field). The
column "Languages Known" has (English, Hindi and Tamil) in row (tuple) 1 and (English
and Hindi) in row (tuple) 2. To satisfy 1NF we can create a separate row for each value in
Languages Known by duplicating the values in the remaining columns. Table 2.3 shows the
result.
1NF Rules

Each column in a table should contain a single value.

Each record needs to be unique, as shown in Table 2.3.

Table 2.3 : 1NF Form
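The step from Table 2.2 to Table 2.3 can be sketched in a few lines of Python. The student
names and the comma-separated storage of the languages are assumptions for this example;
the language values are the ones quoted in the text.

```python
# Unnormalized rows: "Languages" holds several values in one field (a repeating group).
unnormalized = [
    {"StudentId": 101, "Name": "Scott", "Languages": "English, Hindi, Tamil"},
    {"StudentId": 102, "Name": "Nancy", "Languages": "English, Hindi"},
]


def to_first_normal_form(rows, column):
    """Create one row per value in `column`, duplicating the remaining columns."""
    result = []
    for row in rows:
        for value in row[column].split(","):
            new_row = dict(row)          # copy the other columns unchanged
            new_row[column] = value.strip()
            result.append(new_row)
    return result


first_nf = to_first_normal_form(unnormalized, "Languages")
```

Each resulting row now holds one atomic language value, at the cost of repeating
StudentId and Name, which is the redundancy the later normal forms remove.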


Second Normal Form
Partial functional dependencies must be removed. If two attributes of a table are combined to
form a composite key, then the non-key attributes of that table must depend on both
attributes of the composite key. They must not depend on only one of the attributes that make
up the composite key.
2NF Rules

Rule 1- The table should be in 1NF.

Rule 2- No non-key attribute may depend on only part of the primary key. A table whose
primary key is a single column satisfies this automatically.

A relation in 1NF will be in second normal form (2NF) if there are no partial
dependencies.

Partial dependency
It is a functional dependency on part of the primary key instead of the entire primary key.
We cannot bring our simple database into second normal form unless we partition the
columns in Table 2.3. Here, assume that StudentId and Dept_No together act as the key
(composite key). As per 2NF, all non-key attributes must depend on the whole key.
In Table 2.3, the attribute 'Dept_Name' can be obtained from Dept_No alone, and the student
attributes can be obtained by providing just 'StudentId'. Each of these depends on only one
part of the composite key (StudentId+Dept_No), so they are partial dependencies. So split the
table as given below to satisfy 2NF. StudentId then acts as the primary key of the Student
table and Dept_No as the primary key of the Department table.
Student

Table 2.4
Department

Table 2.5
Languages

Table 2.6
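The split into Student, Department and Languages tables can be imitated by projecting the
1NF rows onto the columns each table keeps. The sample rows and the exact column choice
per table are assumptions for this sketch, not a copy of Tables 2.4 to 2.6.

```python
# Assumed 1NF rows: (StudentId, Name, Language, Dept_No, Dept_Name)
first_nf = [
    (101, "Scott", "English", "D001", "CSE"),
    (101, "Scott", "Hindi", "D001", "CSE"),
    (102, "Nancy", "English", "D002", "IT"),
]

# Student attributes depend only on StudentId; Dept_No stays behind as a link.
student = sorted({(sid, name, dept_no) for sid, name, _, dept_no, _ in first_nf})

# Dept_Name depends only on Dept_No.
department = sorted({(dept_no, dept_name) for _, _, _, dept_no, dept_name in first_nf})

# Each language a student knows becomes one row keyed by (StudentId, Language).
languages = sorted({(sid, language) for sid, _, language, _, _ in first_nf})
```

Using sets removes the duplicates that the single wide table forced us to store, so each
student and each department now appears once.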
Introducing Foreign Key
A foreign key is a field in a table that matches the primary key column of another table.
Foreign keys are used to cross-reference tables.
In Table 2.7, Dept_No is the foreign key.

Table 2.7

Figure 2.15 : Foreign Key


A foreign key refers to the primary key of another table. It helps to connect the two tables.

A foreign key column may have a different name from the primary key column it references.

The foreign key ensures that a row in a table is mapped to a corresponding row in
another table.

A foreign key does not have to be unique; most often it is not unique.

Foreign Key

Figure 2.16 : Foreign Key Illustration


Why do you need a foreign key?
A foreign key is required in an RDBMS to enforce referential integrity.
Referential integrity
It is a concept used in databases to ensure that relationships between tables stay consistent.
If one table has a foreign key to another table, referential integrity states that you cannot add
a record to the table that contains the foreign key unless there is a corresponding record in
the linked table.
For example, consider Figure 2.16 given earlier, where Dept_No in the Student table is a
foreign key referring to Dept_No in the Department table. Let's try to add a student with
StudentId "103" and Dept_No "D003" to the Student table as shown below. There is no entry
for Dept_No "D003" in the Department table, which means we would be adding a student to
a department that does not exist. This leads to inconsistency of data across related tables.
Hence an RDBMS enforces referential integrity, which does not allow a record to be added to
the table that contains the foreign key unless there is a corresponding record in the table to
which it is linked.
Student

Table 2.8
Department

Table 2.9
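The rejected insert described above can be reproduced with Python's built-in sqlite3
module. This is a sketch under assumptions: the student names and department names are
invented, and the PRAGMA line is specific to SQLite, which only checks foreign keys when
that setting is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable foreign key checks

conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId INTEGER PRIMARY KEY,
        Name      TEXT,
        Dept_No   TEXT REFERENCES Department (Dept_No)
    )
    """
)

# Only D001 and D002 exist in the Department table.
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)", [("D001", "CSE"), ("D002", "IT")]
)
conn.execute("INSERT INTO Student VALUES (101, 'Scott', 'D001')")

# Student 103 points at department D003, which does not exist.
try:
    conn.execute("INSERT INTO Student VALUES (103, 'Lindsey', 'D003')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
```

The database refuses the second student, so the Student table never refers to a
department that is missing from the Department table.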
Transitive functional dependencies
When changing a non-key column might cause any of the other non-key columns to change,
it is called a transitive functional dependency. Attributes that are not part of the key must not
depend on any non-key attribute.
Consider Table 2.9. Changing the non-key column Lecturer In Charge may change
Designation. Here Dept_No acts as the key, and all other columns are non-key attributes. As
per 3NF, non-key attributes should not be dependent on any other non-key attributes, but
'Designation' is dependent on 'Lecturer In Charge'. Both Lecturer In Charge and Designation
are non-key attributes, so this forms a transitive dependency. To satisfy 3NF, we will split the
table shortly.
Third Normal Form
Third normal form (3NF) is the third step in database normalization and it builds on the first
(1NF) and second (2NF) normal forms.
Third Normal Form states that every column that does not depend directly on the primary
key of its table should be removed from that table. Another way of putting this is that only
foreign key columns should be used to reference another table, and the other columns from
the parent table should not exist in the referencing table.
Second Normal Form (2NF) deals with dependencies on part of a multi-column primary key.
3NF also covers tables with single-column keys, by removing the transitive functional
dependencies described above.
3NF Rules

Rule 1- The table should be in 2NF.

Rule 2- The table has no transitive functional dependencies, as explained above.

We need to divide our table to move it from second normal form (2NF) into third normal
form (3NF). In the Department table, Dept_No acts as the key and all other columns are
non-key attributes. As per third normal form, non-key attributes should not be dependent on
any other non-key attributes. However, 'Designation' is dependent on 'Lecturer In Charge',
and both are non-key attributes. This forms a transitive dependency. So, to satisfy 3NF, split
the table as follows.
Student

Table 2.10

Department

Table 2.11
Lecturer

Table 2.12
Languages

Table 2.13
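The removal of the transitive dependency can be sketched as follows. The lecturer names,
designations and department rows are invented for illustration; the column roles
(Dept_No as key, Designation depending on Lecturer In Charge) follow the text.

```python
# Assumed 2NF Department rows: (Dept_No, Dept_Name, LecturerInCharge, Designation)
department_2nf = [
    ("D001", "CSE", "Dr. Rao", "Professor"),
    ("D002", "IT", "Dr. Mehta", "Associate Professor"),
    ("D003", "ECE", "Dr. Rao", "Professor"),
]

# Department keeps only the columns that depend directly on Dept_No.
department = [(no, name, lecturer) for no, name, lecturer, _ in department_2nf]

# Designation depends on the lecturer, so it moves to a Lecturer table.
lecturer = sorted({(name, designation) for _, _, name, designation in department_2nf})

# Joining the two tables back on LecturerInCharge recovers the original rows,
# which shows that the split did not lose information.
designation_of = dict(lecturer)
rebuilt = [(no, name, lect, designation_of[lect]) for no, name, lect in department]
```

Each designation is now stored once per lecturer, so changing it is a single update
rather than one update per department.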
The example given above cannot be decomposed further to attain higher forms of
normalization because it is already normalized to the highest level. Normally only complex
databases need the next levels of normalization.
2.3. Joins

What are Joins?
A join is a technique where records from two or more tables are retrieved through a single
SQL query and shown as a single output. Because the output forms a set, it can be saved as a
table or used as it is. A join combines columns from two tables by using values common to
both tables, so data from more than one table ends up in a single result set. A join condition
can be given in the WHERE clause of select, update and delete queries.
Note: If the join condition is omitted, the query returns the Cartesian product of the two
tables (a Cartesian product is all possible combinations of rows from all the tables). Every row
of the first table is joined with every row of the second table. For example, if the first table
has 30 rows and the second table has 10 rows, the result will be 30 * 10, or 300 rows. Such a
query can take a long time to execute.
Let's use the two tables below to explain the join conditions.

Table "Student"

Table 2.14
Table "Department"

Table 2.15
In the above example, the column common to both tables is Dept_No. Using Dept_No, the
Student and Department tables can be joined to combine data from both tables, as shown
below.

Figure 2.17 : Joining of tables

Let's consider a scenario in which we retrieve the details of students who belong to the 'CSE'
department. We have to join the two tables on the common column present in both.

Figure 2.18 : Mapping data


Result: After joining two tables:

Table 2.16
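The CSE scenario can be run end to end with Python's built-in sqlite3 module. The rows
below are assumed sample data, not the contents of Tables 2.14 and 2.15. The last query
leaves out the join condition to show the Cartesian product mentioned in the note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    "CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Dept_No TEXT)"
)

# Assumed sample data for the two tables.
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)", [("D001", "CSE"), ("D002", "IT")]
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Scott", "D001"), (102, "Nancy", "D002"), (103, "Lindsey", "D001")],
)

# Join on the common column Dept_No, then keep only the CSE department.
cse_students = conn.execute(
    """
    SELECT s.StudentId, s.Name, d.Dept_Name
    FROM Student AS s
    INNER JOIN Department AS d ON s.Dept_No = d.Dept_No
    WHERE d.Dept_Name = 'CSE'
    ORDER BY s.StudentId
    """
).fetchall()

# Without a join condition every Student row pairs with every Department row.
cartesian_rows = conn.execute("SELECT COUNT(*) FROM Student, Department").fetchone()[0]
```

With 3 students and 2 departments the Cartesian product has 3 * 2 = 6 rows, while the
join with a condition returns only the matching pairs.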

2.4. Summary

The Entity-Relationship model (or ER model) is a way of graphically representing the
logical relationships of objects in order to create a database.

An ER model is a graphical representation which contains entities or "items",
relationships among them and attributes of the entities and the relationships.

The database design technique which is used to organize tables in a manner that
reduces redundancy and dependency of data is called Normalization.

Three normal forms were covered: First Normal Form (1NF), Second Normal Form (2NF)
and Third Normal Form (3NF).

A key is a value used to uniquely identify a row in a table. One or more columns
could be used to form a key for a table.

A primary key is the column (or combination of columns) whose value is used to identify a
database record uniquely.

A composite key is a primary key derived by combining multiple columns and is used
to identify a record uniquely.

The field in a table which matches the primary key column of another table is called
a foreign key. Foreign keys are used to cross-reference tables.

First Normal Form - The multi-valued attributes (called repeating groups) should be
removed, i.e. repeating groups are eliminated. All attributes must be atomic.

Second Normal Form - Partial functional dependencies must be removed. The
attributes that are not a part of the key should be dependent on the entire key for that
entity.

Third Normal Form - Columns that depend on another non-key column rather than directly
on the primary key (transitive dependency) should be removed.

A join is a means of combining fields from two tables by using values common to both.
It allows data from more than one table to be combined into a single result set.

ADDITIONAL DATA

SQL JOIN
An SQL JOIN clause is used to combine rows from two or more tables, based on a common
field between them.
The most common type of join is the SQL INNER JOIN (simple join). An SQL INNER JOIN
returns all rows from multiple tables where the join condition is met.
Let's look at a selection from the "Orders" table:
OrderID | CustomerID | OrderDate
10308   |            | 1996-09-18
10309   | 37         | 1996-09-19
10310   | 77         | 1996-09-20

Then, have a look at a selection from the "Customers" table:


CustomerID | CustomerName                       | ContactName    | Country
           | Alfreds Futterkiste                | Maria Anders   | Germany
           | Ana Trujillo Emparedados y helados | Ana Trujillo   | Mexico
           | Antonio Moreno Taquería            | Antonio Moreno | Mexico

Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the
"Customers" table. The relationship between the two tables above is the "CustomerID"
column.
Then, if we run the following SQL statement (that contains an INNER JOIN):

Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;

it will produce something like this:


OrderID | CustomerName                       | OrderDate
10308   | Ana Trujillo Emparedados y helados | 9/18/1996
10365   | Antonio Moreno Taquería            | 11/27/1996
10383   | Around the Horn                    | 12/16/1996
10355   | Around the Horn                    | 11/15/1996
10278   | Berglunds snabbköp                 | 8/12/1996

Different SQL JOINs


Before we continue with examples, we will list the types of the different SQL JOINs you can
use:

INNER JOIN: Returns all rows for which there is at least one match in BOTH
tables

LEFT JOIN: Returns all rows from the left table, and the matched rows from
the right table

RIGHT JOIN: Returns all rows from the right table, and the matched rows
from the left table

FULL JOIN: Returns all rows for which there is a match in ONE of the tables
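The difference between INNER JOIN and LEFT JOIN can be seen on a cut-down version of the
Orders and Customers tables, run through Python's built-in sqlite3 module. The customer
identifiers 1 to 3 and the CustomerID 2 on order 10308 are assumed values chosen for this
sketch. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not
support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)"
)

# Customer IDs 1-3 and the ID on order 10308 are assumptions for illustration.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
    ],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10308, 2, "1996-09-18"), (10309, 37, "1996-09-19"), (10310, 77, "1996-09-20")],
)

# INNER JOIN keeps only orders whose CustomerID exists in Customers.
inner_rows = conn.execute(
    """
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
    """
).fetchall()

# LEFT JOIN keeps every customer; customers without orders get NULL (None in Python).
left_rows = conn.execute(
    """
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Customers.CustomerID
    """
).fetchall()
```

In this sample only one order matches a listed customer, so the INNER JOIN returns one
row, while the LEFT JOIN returns all three customers and fills the missing order with None.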