# What is RDBMS? Explain its features.

An RDBMS (relational database management system) is a database management system based on the relational model
defined by E. F. Codd. Data is stored in tables made up of rows and columns, and the relationships among tables are
also stored in the form of tables. An RDBMS is used to manage a relational database, that is, an organized collection
of tables from which data can be accessed easily. It consists of a number of tables, and each table has its own primary key.
Features:
- Stores data in tables
- Persists data in the form of rows and columns
- Provides primary keys to uniquely identify rows
- Creates indexes for quicker data retrieval
- Provides views: virtual tables that can restrict access to sensitive data and simplify queries
- Links two or more tables through a shared column (primary key and foreign key)
- Provides multi-user access that can be controlled for individual users
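
As a rough illustration of several of these features together, here is a minimal sketch using Python's built-in `sqlite3` module. The table, column, index and view names are assumptions made up for this example; they do not come from any particular system.

```python
import sqlite3

# In-memory database; all names below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                      -- primary key identifies each row
    name    TEXT NOT NULL,
    salary  REAL,
    dept_id INTEGER REFERENCES department(dept_id)    -- shared column (foreign key)
);
CREATE INDEX idx_employee_name ON employee(name);     -- index for quicker lookups by name
CREATE VIEW employee_public AS                        -- view that leaves out the salary column
    SELECT emp_id, name, dept_id FROM employee;
""")

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (100, 'Asha', 50000, 1)")

# The view exposes only the non-sensitive columns.
view_columns = [d[0] for d in conn.execute("SELECT * FROM employee_public").description]
print(view_columns)
```

Multi-user access control is not shown here, because SQLite has no user accounts; server-based systems handle that part with their own permission commands.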

Keys:
A key is a column (attribute) or set of columns of a database table that is used to identify records. For example, if a
table has the columns id, name and address, each of them is an attribute of the table, but only a column whose values
are unique for every row, such as id, can identify each record. The following are the various types of keys
available in a DBMS.

1. Candidate key
A candidate key is a minimal set of one or more attributes that can uniquely identify a row in a given table.
Let's explain candidate key in detail using an example. Consider a table employee as given below.

(Assume that employee_id and email are unique)


In this example, the attribute Employee_ID can uniquely identify a row of the table employee. Hence Employee_ID is a
candidate key. Similarly, the attribute Email can also uniquely identify a row of the table employee. Hence Email is
also a candidate key.
So the table employee has two candidate keys: Employee_ID and Email.

2. Super key
Any superset of a candidate key is a super key. A candidate key is itself a super key.
Let's explain super key in detail using the same employee table mentioned above. Here we can have many super keys,
such as {Employee_ID, FirstName}, {Employee_ID, LastName}, {Employee_ID, FirstName, LastName} and
{Email, FirstName}, because any superset of a candidate key is a super key.
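
The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is only an illustration: the rows are hypothetical (the original employee table is not reproduced here), and uniqueness in a sample does not prove that an attribute set is a key of the schema, it can only disprove it.

```python
from itertools import combinations

# Hypothetical sample rows, invented for this example.
rows = [
    {"Employee_ID": 1, "FirstName": "Anil", "LastName": "Rao", "Email": "anil@example.com"},
    {"Employee_ID": 2, "FirstName": "Anil", "LastName": "Nair", "Email": "anil.n@example.com"},
    {"Employee_ID": 3, "FirstName": "Mina", "LastName": "Rao", "Email": "mina@example.com"},
    {"Employee_ID": 4, "FirstName": "Anil", "LastName": "Rao", "Email": "a.rao@example.com"},
]

def is_superkey(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A candidate key is a super key with no proper subset that is also a super key."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_candidate_key(rows, ("Employee_ID",)))             # minimal and unique
print(is_superkey(rows, ("Employee_ID", "FirstName")))      # unique, but not minimal
print(is_candidate_key(rows, ("Employee_ID", "FirstName")))
```

In this sample, {Employee_ID, FirstName} is a super key but not a candidate key, because Employee_ID alone already identifies every row.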

3. Primary key
The database designer chooses one key from the list of available candidate keys to uniquely identify rows in the
given table. This key is known as the primary key.
When selecting the primary key from the candidate keys:
- Preference is given to numeric column(s)
- Preference is given to a single attribute
- Preference is given to a minimal composite key

Let's explain primary key in detail using the same employee table mentioned above. Here we have two candidate
keys: Employee_ID and Email. Remember our selection criteria for primary keys. Usually preference is given to
numeric column(s). Hence we can select Employee_ID as the primary key of the table "employee".

4. Foreign key
A foreign key is a set of attribute(s) whose values are required to match values of a candidate key in the same or
another table.

Let's explain foreign key in detail using an example. Consider an employee table and a department table as given below:
Here we can say that Department_ID of the employee table is a foreign key referring to Department_Number, which is a
candidate key of the department table.
A table which has a foreign key referring to its own candidate key is known as a self-referencing table.
The constraint that values of a given foreign key must match the values of the corresponding candidate key is
known as a referential constraint.
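
A minimal sketch of a referential constraint in action, using Python's built-in `sqlite3` module. The column names Department_ID and Department_Number follow the text; Department_Name and the sample values are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign key checks are off by default in SQLite
conn.executescript("""
CREATE TABLE department (
    Department_Number INTEGER PRIMARY KEY,
    Department_Name   TEXT                 -- hypothetical extra column
);
CREATE TABLE employee (
    Employee_ID   INTEGER PRIMARY KEY,
    Department_ID INTEGER REFERENCES department(Department_Number)
);
""")
conn.execute("INSERT INTO department VALUES (10, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (1, 10)")       # matches an existing department

try:
    conn.execute("INSERT INTO employee VALUES (2, 99)")   # there is no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True                                       # the referential constraint blocked it
print(rejected)
```

The second insert is refused because its Department_ID value has no matching Department_Number, which is exactly what the referential constraint requires.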

5. Non-Key Attributes
The attributes other than the candidate key attributes in a table/relation are called non-key attributes. In other
words, the attributes which do not participate in any of the candidate keys are called non-key attributes.

6. Composite Key
A composite key is a key that contains more than one attribute.
# Explain Normalization with proper example.
Database normalization is a technique for organizing the data in a database. It is a systematic
approach of decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion,
update and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated
data from the relation tables. Normalization is the part of database design which makes sure that there will
not be any redundant data.
Normalization is used mainly for two purposes:
- Eliminating redundant (useless) data.
- Ensuring data dependencies make sense, i.e. data is logically stored.

The normalization rules are divided into the following normal forms:

1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF

First Normal Form (1NF):
As per First Normal Form, no row of data may contain a repeating group of information, i.e. each column
must hold a single (atomic) value in every row. Each table should be
organized into rows, and each row should have a primary key that distinguishes it as unique.
The primary key is usually a single column, but sometimes more than one column can be combined to create a
single primary key. For example, consider a table which is not in First Normal Form.
Student Table:

| Student | Age | Subject |
| --- | --- | --- |
| Ram | 15 | Biology, Maths |
| Shyam | 14 | Maths |
| Hari | 17 | Maths |

In First Normal Form, no row may have a column in which more than one value is saved, for example as a
comma-separated list. Instead, we must separate such data into multiple rows.
Student Table following 1NF will be:

| Student | Age | Subject |
| --- | --- | --- |
| Ram | 15 | Biology |
| Ram | 15 | Maths |
| Shyam | 14 | Maths |
| Hari | 17 | Maths |
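
The conversion to First Normal Form can be sketched in a few lines of Python. The data is the Student table from the text; the function name `to_first_normal_form` is made up for this illustration.

```python
# The unnormalized Student table: (Student, Age, Subject)
unnormalized = [
    ("Ram", 15, "Biology, Maths"),
    ("Shyam", 14, "Maths"),
    ("Hari", 17, "Maths"),
]

def to_first_normal_form(rows):
    """Emit one row per subject so that every column holds a single value."""
    result = []
    for student, age, subjects in rows:
        for subject in subjects.split(","):
            result.append((student, age, subject.strip()))
    return result

normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Ram now appears in two rows, one per subject, which matches the 1NF table shown in the text.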

With First Normal Form, data redundancy increases, as the same values are repeated in multiple
rows, but each row as a whole will be unique.
Second Normal Form (2NF):
As per Second Normal Form, there must not be any partial dependency of any column on the primary key. This means
that for a table that has a concatenated (composite) primary key, each column that is not part of the primary key must
depend upon the entire concatenated key for its existence. If any column depends on only one part of the
concatenated key, then the table fails Second Normal Form.
In the First Normal Form example there are two rows for Ram, to include the multiple subjects that he has opted for.
While this is searchable and follows First Normal Form, it is an inefficient use of space. Also, in the above table in
First Normal Form, while the candidate key is {Student, Subject}, the Age of a student depends only on the Student column,
which violates Second Normal Form. To achieve Second Normal Form, it would be helpful to split out the
subjects into an independent table, and match them up using the student names as foreign keys.
New Student Table following 2NF will be:

| Student | Age |
| --- | --- |
| Ram | 15 |
| Shyam | 14 |
| Hari | 17 |

In the Student table, the candidate key is the Student column, because the only other column, Age, depends on it.
New Subject Table introduced for 2NF will be:

| Student | Subject |
| --- | --- |
| Ram | Biology |
| Ram | Maths |
| Shyam | Maths |
| Hari | Maths |

In the Subject table, the candidate key is {Student, Subject}. Now both of the above tables qualify for
Second Normal Form and no longer suffer from the update anomalies caused by partial dependencies. There are, however,
a few more complex cases in which a table in Second Normal Form still suffers from update anomalies, and Third Normal
Form exists to handle those scenarios.
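
The 2NF decomposition described above can be sketched in Python using the same rows. This is only an illustration of the idea; the variable names are invented for the example.

```python
# Rows of the 1NF Student table from the text: (Student, Age, Subject)
rows_1nf = [
    ("Ram", 15, "Biology"),
    ("Ram", 15, "Maths"),
    ("Shyam", 14, "Maths"),
    ("Hari", 17, "Maths"),
]

# Age depends only on Student, so it moves to its own table (one row per student).
student_table = sorted({(student, age) for student, age, _ in rows_1nf})

# The remaining table keeps the full key {Student, Subject}.
subject_table = sorted({(student, subject) for student, _, subject in rows_1nf})

# Joining the two tables on Student rebuilds the original rows, so no information is lost.
age_of = dict(student_table)
rejoined = sorted((student, age_of[student], subject) for student, subject in subject_table)

print(student_table)
print(rejoined == sorted(rows_1nf))
```

After the split, Ram's age is stored once instead of twice, so changing it means updating a single row.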
Third Normal Form (3NF):
Third Normal Form requires that every non-prime attribute of a table depend directly on the primary key, and not on
another non-prime attribute. Transitive functional dependencies should be removed from the table. The table must
also be in Second Normal Form.
For example, consider a table with the following fields.
Student_Detail Table:

| Student_id | Student_name | DOB | Street | City | State | Zip |
| --- | --- | --- | --- | --- | --- | --- |

In the above table, Student_id is the primary key, but Street, City and State depend upon Zip. The dependency
between Zip and the other fields is called a transitive dependency. Hence to apply 3NF, we need to move Street, City
and State to a new table, with Zip as the primary key.
New Student_Detail Table:

| Student_id | Student_name | DOB | Zip |
| --- | --- | --- | --- |

Address Table:

| Zip | Street | City | State |
| --- | --- | --- | --- |

The advantages of removing transitive dependencies are:
- The amount of data duplication is reduced.
- Data integrity is achieved.
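
A small Python sketch of the idea behind this decomposition. The sample rows are invented for illustration, and DOB and Street are left out to keep the example short. The helper `determines` is a hypothetical name; it only checks whether a dependency holds in the given rows, which is weaker than knowing it holds for the schema.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data for the Student_Detail table.
students = [
    {"Student_id": 1, "Student_name": "Ram", "Zip": "110001", "City": "New Delhi", "State": "Delhi"},
    {"Student_id": 2, "Student_name": "Shyam", "Zip": "110001", "City": "New Delhi", "State": "Delhi"},
    {"Student_id": 3, "Student_name": "Hari", "Zip": "400001", "City": "Mumbai", "State": "Maharashtra"},
]

# Student_id -> Zip and Zip -> (City, State): City and State depend on the key only through Zip.
zip_determines_city = determines(students, ["Zip"], ["City", "State"])

# 3NF decomposition: address facts are stored once per Zip.
address_table = {row["Zip"]: (row["City"], row["State"]) for row in students}
student_table = [(row["Student_id"], row["Student_name"], row["Zip"]) for row in students]

print(zip_determines_city, len(address_table))
```

Two students share the same Zip, so the city and state for that Zip are stored once in `address_table` rather than once per student.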

Boyce and Codd Normal Form (BCNF):
Boyce and Codd Normal Form is a stricter version of Third Normal Form. It deals with a certain type of
anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to
be in BCNF. More generally, a table is in BCNF when the determinant of every non-trivial functional dependency is a
super key.

# Explain ACID properties in database.

The ACID model is one of the oldest and most important concepts of database theory. It sets forward four goals that
every database management system must strive to achieve: atomicity, consistency, isolation and durability. No
database that fails to meet any of these four goals can be considered reliable.

Let's take a moment to examine each one of these characteristics in detail:

Atomicity states that database modifications must follow an all-or-nothing rule. Each transaction is said
to be atomic. If one part of the transaction fails, the entire transaction fails. It is critical that the database
management system maintain the atomic nature of transactions in spite of any DBMS, operating system or
hardware failure.

Consistency states that only valid data will be written to the database. If, for some reason, a transaction is
executed that violates the database's consistency rules, the entire transaction will be rolled back and the database
will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it
will take the database from one state that is consistent with the rules to another state that is also consistent with
the rules.

Isolation requires that multiple transactions occurring at the same time not impact each other's execution.
For example, if Joe issues a transaction against a database at the same time that Mary issues a different
transaction, both transactions should operate on the database in an isolated manner. The database should behave as if
it performed Joe's entire transaction before executing Mary's, or vice versa. This prevents Joe's transaction from
reading intermediate data produced as a side effect of part of Mary's transaction that will not eventually be
committed to the database. Note that the isolation property does not ensure which transaction will execute first,
merely that they will not interfere with each other.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured
through the use of database backups and transaction logs that facilitate the restoration of committed transactions
in spite of any subsequent software or hardware failures.
Take a few minutes to review these characteristics and commit them to memory. If you spend any significant
portion of your career working with databases, you'll see them again and again. They provide the basic building
blocks of any database transaction model.
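
Atomicity and consistency can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` module; the `account` table, the `transfer` function and the balances are assumptions invented for the example, with a CHECK constraint standing in for a consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Joe", 100), ("Mary", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "Joe", "Mary", 500)  # the debit would make Joe's balance negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

The credit to Mary runs first and succeeds, but when the debit violates the rule the whole transaction is rolled back, so Mary's balance returns to its original value as well.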

Data processing
What are the steps that may occur in systematic data processing?
Any data processing system may use all or a subset of the activities given below:

1. Collection of data
2. Recording of data
3. Sorting of data
4. Data classification
5. Calculation
6. Retrieval of data
7. Summarizing
8. Communicating

What are the major data processing models?

1. Batch processing: In this model, transactions are collected in a group and processed together.
2. On-line (interactive) processing: In this model, transactions are processed as and when they appear.
3. Real-time processing: In this model, processing runs in parallel with an ongoing activity, and the information
produced is used to control that activity as it happens.

Traditional Approach for Data Storage and the Need of DBMS

Traditional Data Storage Model

1. In the traditional approach, information is stored in flat files which are maintained by the file system
under the operating system's control.
2. Application programs go through the file system in order to access these flat files.

How data is stored in flat files

- Data is stored in flat files as records.
- Records consist of various fields which are delimited by a space, comma, pipe or other special character.
- The end of each record and the end of the file are marked using a predetermined character set or special
characters in order to identify them.
Example: Storing employee data in flat files
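
A minimal sketch of what such a flat file and the program that reads it might look like. The file contents, the field order in `FIELDS` and the pipe delimiter are assumptions chosen for this example.

```python
import csv
import io

# Hypothetical contents of an employee flat file: one record per line, fields separated by "|".
flat_file = (
    "101|Asha|Verma|asha@example.com\n"
    "102|Ravi|Kumar|ravi@example.com\n"
)

# The file itself carries no field names; the program has to know the layout.
FIELDS = ["employee_id", "firstname", "lastname", "emailid"]

def read_records(text, delimiter="|"):
    """Split the text into records (lines) and each record into named fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(FIELDS, row)) for row in reader]

records = read_records(flat_file)
print(records[0]["firstname"], len(records))
```

Note that the meaning of each field lives in the program, not in the file, which is the root of several of the problems discussed next.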

Problems with traditional approach for storing data

1. Data Security
The data stored in flat files can be easily accessed and hence is not secure.
Example: Consider an online banking application where we store the account-related information of all
customers in flat files. A customer should have access only to his or her own account details. However, it is
difficult to enforce such constraints on a flat file. This is a big security issue.

2. Data Redundancy
In this storage model, the same information may get duplicated in two or more files. This may lead to
higher storage and access costs. It may also lead to data inconsistency.
For example, assume the same data is repeated in two or more files. If a change is made to the data stored in
one file, the other files also need to be changed accordingly.
Example: Assume employee details such as firstname, lastname and emailid are stored in both the employee_details
file and the employee_salary file. If a change needs to be made to emailid, both the employee_details file and the
employee_salary file need to be updated; otherwise the data becomes inconsistent.
However, it is possible to design file systems with minimal redundancy. Also note that data redundancy is
sometimes preferred.
Example: Assume employee details such as firstname, lastname and emailid are stored only in the
employee_details file and not in the employee_salary file. If we need to access an employee's salary along with
the firstname of the employee, we have to retrieve details from two files. This would mean an increased
overhead.

3. Data Isolation
Data isolation means that the related data is not all available in one file. Usually the data is scattered across
various files having different formats. Hence writing new application programs to retrieve the appropriate
data is difficult.

4. Program/Data Dependence
In the traditional file approach, application programs are closely dependent on the files in which data is
stored. If we make any change in the physical format of the file(s), such as adding a data field, all
application programs need to be changed accordingly. Consequently, for each of the application
programs that a programmer writes or maintains, the programmer must be concerned with data
management. There is no centralized execution of the data management functions. Data management is
scattered among all the application programs.
Example: Consider the banking system. An employee_salary file exists which has details about the salary
of employees. An employee_salary record is described by:
- employee_id
- firstname
- lastname
- salary_amount

An application program is available to display all the details about the salary of all employees. Assume a
new data field, date_of_joining, is added to the employee_salary file. Since the application program
depends on the file, it also needs to be altered.
If the physical format of the employee_salary file changes, for example the field delimiter or the record delimiter,
the application programs which depend on it must also be altered.

5. Lack of Flexibility
Traditional systems are able to retrieve information for predetermined requests for data. If we need
unanticipated data, a huge programming effort is needed to make the information available, provided the
information is there in the files. By the time the information is made available, it may no longer be
required or useful.
Example: Consider a software application which is able to generate an employee salary report. Assume that
all the data is stored in flat files. Suppose we now have a requirement to retrieve the details of all employees
whose salary is greater than Rs. 10000. It is not easy to generate such on-demand reports, and a lot of time is
needed for application developers to modify the application to meet such requirements.
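
For contrast, here is a minimal sketch of how the same unanticipated request looks once the data is held in a relational database rather than flat files. It uses Python's built-in `sqlite3` module; the field names follow the employee_salary record described above, while the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_salary ("
    "employee_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, salary_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Verma", 9000),    # hypothetical sample rows
        (2, "Ravi", "Kumar", 12000),
        (3, "Meena", "Iyer", 15000),
    ],
)

# The new requirement is a one-line query; no application code has to be rewritten.
high_paid = conn.execute(
    "SELECT employee_id, firstname FROM employee_salary "
    "WHERE salary_amount > ? ORDER BY employee_id",
    (10000,),
).fetchall()
print(high_paid)
```

A different threshold or a different column needs only a different query, which is the kind of flexibility the flat-file approach lacks.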

6. Concurrent Access Anomalies
Many traditional systems allow multiple users to access and update the same piece of data
simultaneously. However, these concurrent updates may result in inconsistent data. To guard against this
possibility, the system must maintain some form of supervision. But supervision is difficult because data
may be accessed by many different application programs, and these application programs may not have
been coordinated previously.
Example: Consider a personal information system which has the data of all employees. Now there may be
an employee updating his address details in the system and, at the same time, an administrator may be
taking a report containing the data of all employees. This is called concurrent access. Since the employee's
address is being updated at the same time, there is a possibility of the administrator reading an incorrect
address.

Introduction to Database Technology and DBMS

What is Database?
A database is a computer-based record-keeping system which is used to record, maintain and retrieve data.
It is an organized collection of interrelated (persistent) data.

What is Database Management System (DBMS)?


A Database Management System (DBMS) is a collection of interrelated files and a set of programs which
allow users to access and modify those files. It provides a convenient and efficient way to store, retrieve and
modify information. Application programs ask the DBMS to retrieve, modify, insert or delete data for them,
and thus it acts as a layer of abstraction between the application programs and the file system.

DBMS acts as a layer of abstraction on top of the file system.

For interacting with the DBMS we use a query language called Structured Query Language (SQL).
General Block Diagram

Difference Between File based Data Storage System and DBMS

Types of Databases

1. Centralized Database
In a centralized database system, all data is stored at a single site. It offers great control over accessing and
updating data. However, the risk of failure is high because the system depends on the availability of
resources at the central site.
Example: Think about a banking application which uses a centralized database. In this case the data is
stored in a common place. Applications running in various banks may communicate with the common
database over a network to access or insert/update/delete information.

2. Distributed Database
In a distributed database system, the database is stored on several computers. Computers in a distributed
system may communicate with one another through the internet, an intranet, telephone lines etc. Most
distributed systems are geographically separated and managed. Distributed databases can also be
administered separately.
Example: Think about a banking application which uses a distributed database. The bank's head office may be
in India whereas branch offices may be in the United States and the United Kingdom. In this case the bank
database can be distributed across the branch offices and head office, with the individual offices
connected through a network.

Services provided by a DBMS


1. Data management
2. Data definition
3. Transaction support
4. Concurrency control
5. Recovery
6. Security and integrity
7. Facilities to import and export data
8. User management
9. Backup
10. Performance analysis
11. Logging
12. Audit
13. Physical storage control


Three layer Architecture

Detailed System Architecture

The external view is how the user views it.

The Conceptual view is how the DBA views it.

The Internal view is how the data is actually stored.

An example of the three levels

Users of a DBMS
1. Database Administrator (DBA)
The DBA takes care of the administrative tasks of the DBMS, as the name suggests. The DBA's major responsibilities
are given below.
- Management of information
- Liaison with users
- Enforcing security and integrity rules
- Database backup and recovery
- Monitoring database performance

2. Database designers
Database designers design the database components.
3. Application programmers
Application programmers write programs to access/insert/update/delete data from/to the database by making use of
the various database components.
4. End users
End users use the DBMS.

Advantages of a DBMS
1. Data independence
2. Reduced data redundancy
3. Increased security
4. Better flexibility
5. Effective data sharing
6. Enforces integrity constraints
7. Enables backup and recovery

Conceptual Database Design - Entity Relationship(ER) Modeling


Database Design Techniques
1. ER Modeling (top-down approach)
2. Normalization (bottom-up approach)


Let's see ER Modeling in detail.
What is ER Modeling?
ER modeling is a graphical technique for understanding and organizing data independently of the actual database
implementation.
We need to be familiar with the following terms to go further.
Entity
An entity is anything that has an independent existence and about which we collect data. It is also known as an
entity type.
In ER modeling, notation for entity is given below.

Entity instance
Entity instance is a particular member of the entity type.
Example for entity instance : A particular employee
Regular Entity
An entity which has its own key attribute is a regular entity.
Example for regular entity : Employee.
Weak entity
An entity which depends on other entity for its existence and doesn't have any key attribute of its own is a
weak entity.
Example for a weak entity : In a parent/child relationship, a parent is considered as a strong entity and
the child is a weak entity.
In ER modeling, notation for weak entity is given below.

Attributes
Properties/characteristics which describe entities are called attributes.
In ER modeling, notation for attribute is given below.

Domain of Attributes

The set of possible values that an attribute can take is called the domain of the attribute. For example, the
attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this set can be termed
as the domain of the attribute day.
Key attribute
The attribute (or combination of attributes) which is unique for every entity instance is called the key
attribute.
E.g. the employee_id of an employee, the pan_card_number of a person, etc. If the key attribute consists of two
or more attributes in combination, it is called a composite key.
In ER modeling, notation for key attribute is given below.

Simple attribute
If an attribute cannot be divided into simpler components, it is a simple attribute.
Example for simple attribute : employee_id of an employee.
Composite attribute
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute : Name of the employee which can be split into First_name,
Middle_name, and Last_name.
Single valued Attributes
If an attribute can take only a single value for each entity instance, it is a single valued attribute.
Example for single valued attribute : age of a student. It can take only one value for a particular student.
Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute.
Example for multi-valued attribute : telephone number of an employee. A particular employee may have
multiple telephone numbers.
In ER modeling, notation for multi-valued attribute is given below.

Stored Attribute
An attribute which needs to be stored permanently is a stored attribute.
Example for stored attribute : name of a student
Derived Attribute
An attribute which can be calculated or derived based on other attributes is a derived attribute.
Example for derived attribute : age of employee which can be calculated from date of birth and current
date.
In ER modeling, notation for derived attribute is given below.

Relationships
Associations between entities are called relationships.
Example : An employee works for an organization. Here "works for" is a relationship between the entities
employee and organization.
In ER modeling, notation for relationship is given below.

However, in ER modeling, to connect a weak entity with others, you should use the weak relationship
notation as given below.

Degree of a Relationship
The degree of a relationship is the number of entity types involved. The n-ary relationship is the general form
for degree n. Special cases are unary, binary and ternary, where the degree is 1, 2 and 3, respectively.
Example for unary relationship : An employee is a manager of another employee.
Example for binary relationship : An employee works for a department.
Example for ternary relationship : A customer purchases an item from a shopkeeper.
Cardinality of a Relationship
Relationship cardinalities specify how many instances of each entity type are allowed. Relationships can have four
possible connectivities, as given below.

- One to one (1:1) relationship
- One to many (1:N) relationship
- Many to one (M:1) relationship
- Many to many (M:N) relationship

The minimum and maximum values of this connectivity are called the cardinality of the relationship.
Example for Cardinality One-to-One (1:1)
An employee is assigned a parking space.
One employee is assigned only one parking space and one parking space is assigned to only one
employee. Hence it is a 1:1 relationship and the cardinality is One-to-One (1:1).
In ER modeling, this can be mentioned using notations as given below.

Example for Cardinality One-to-Many (1:N)
An organization has employees.
One organization can have many employees, but one employee works in only one organization. Hence it
is a 1:N relationship and the cardinality is One-to-Many (1:N).
In ER modeling, this can be mentioned using notations as given below.

Example for Cardinality Many-to-One (M:1)
It is the reverse of the One-to-Many relationship: an employee works in an organization.
One employee works in only one organization, but one organization can have many employees. Hence it is
an M:1 relationship and the cardinality is Many-to-One (M:1).
In ER modeling, this can be mentioned using notations as given below.

Example for Cardinality Many-to-Many (M:N)
Students enroll for courses.
One student can enroll for many courses and one course can be taken by many students. Hence it is an
M:N relationship and the cardinality is Many-to-Many (M:N).
In ER modeling, this can be mentioned using notations as given below.

Relationship Participation
1. Total
In total participation, every entity instance is connected through the relationship to an instance
of the other participating entity type.
2. Partial
In partial participation, only some of the entity instances are connected through the relationship.
Example for relationship participation
Consider the relationship: Employee is head of the department.
Here, not all employees will be the head of a department. Only one employee will be the head of each
department. In other words, only a few instances of the employee entity participate in the above relationship.
So the employee entity's participation is partial in the said relationship.
However, each department will be headed by some employee. So the department entity's participation is total
in the said relationship.

Advantages and Disadvantages of ER Modeling ( Merits and Demerits of ER Modeling )


Advantages

1. ER modeling is simple and easily understandable. It is represented in the business users' language and
it can be understood by non-technical specialists.
2. Intuitive and helps in physical database creation.
3. Can be generalized and specialized based on needs.
4. Can help in database design.
5. Gives a higher-level description of the system.

Disadvantages

1. A physical design derived from an E-R model may have some amount of ambiguity or inconsistency.
2. Sometimes diagrams may lead to misinterpretations.


Entity Relationship (ER) Modeling - Learn With a Complete Example
Prerequisite: Basic knowledge of ER modeling. It is recommended to read the previous topic before proceeding
further if you have not done so.
Here we are going to design an Entity Relationship (ER) model for a college database. Say we have the
following statements.

1. A college contains many departments
2. Each department can offer any number of courses
3. Many instructors can work in a department
4. An instructor can work only in one department
5. For each department there is a Head
6. An instructor can be head of only one department
7. Each instructor can take any number of courses
8. A course can be taken by only one instructor
9. A student can enroll for any number of courses
10. Each course can have any number of students

Good to go. Let's start our design. (Remember our previous topic and the notations we have used for
entities, attributes, relations etc.)
Step 1: Identify the entities
What are the entities here?
From the statements given, the entities are:

1. Department
2. Course
3. Instructor
4. Student

Step 2: Identify the relationships

1. One department offers many courses, but one particular course can be offered by only one
department. Hence the cardinality between department and course is One to Many (1:N).
2. One department has multiple instructors, but an instructor belongs to only one department. Hence
the cardinality between department and instructor is One to Many (1:N).
3. One department has only one head and one head can be the head of only one department. Hence
the cardinality is One to One (1:1).
4. One course can be taken by many students and one student can enroll for many courses. Hence
the cardinality between course and student is Many to Many (M:N).
5. One course is taught by only one instructor, but one instructor teaches many courses. Hence the
cardinality between course and instructor is Many to One (N:1).
Step 3: Identify the key attributes

- Department_Name can identify a department uniquely. Hence Department_Name is the key
attribute for the "Department" entity.
- Course_ID is the key attribute for the "Course" entity.
- Student_ID is the key attribute for the "Student" entity.
- Instructor_ID is the key attribute for the "Instructor" entity.

Step 4: Identify other relevant attributes

- For the department entity, the other attribute is location
- For the course entity, the other attributes are course_name and duration
- For the instructor entity, the other attributes are first_name, last_name and phone
- For the student entity, the other attributes are first_name, last_name and phone

Step 5: Draw the complete ER diagram
By connecting all these details, we can now draw the ER diagram as given below.
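
One possible way to carry this ER model over into tables is sketched below with Python's built-in `sqlite3` module. This mapping is an assumption for illustration, not part of the original design: the `enrollment` table, the `head_id` column and the column types are invented here, and other mappings are equally valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    department_name TEXT PRIMARY KEY,
    location        TEXT,
    -- 1:1 head-of relationship: UNIQUE stops one instructor heading two departments
    head_id         INTEGER UNIQUE REFERENCES instructor(instructor_id)
);
CREATE TABLE instructor (
    instructor_id   INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, phone TEXT,
    -- 1:N department to instructor
    department_name TEXT NOT NULL REFERENCES department(department_name)
);
CREATE TABLE course (
    course_id       INTEGER PRIMARY KEY,
    course_name TEXT, duration TEXT,
    -- 1:N department to course, N:1 course to instructor
    department_name TEXT NOT NULL REFERENCES department(department_name),
    instructor_id   INTEGER REFERENCES instructor(instructor_id)
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, phone TEXT
);
-- M:N student to course becomes a junction table with a composite key
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The one-to-many relationships become foreign key columns on the "many" side, while the many-to-many relationship between student and course needs its own table.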

ER Model : Basic Concepts


The entity relationship model defines the conceptual view of a database. It works around real-world entities and the associations among them. At the view
level, the ER model is considered well suited for designing databases.

Entity
An entity is a real-world thing, either animate or inanimate, that is easily identifiable and distinguishable. For example, in a school database, students,
teachers, classes and courses offered can be considered entities. All entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes share similar values. For example,
a Students set may contain all the students of a school; likewise a Teachers set may contain all the teachers of the school from all faculties. Entity
sets need not be disjoint.

Attributes
Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have
name, class and age as attributes.
There exists a domain or range of values that can be assigned to each attribute. For example, a student's name cannot be a numeric value. It has
to be alphabetic. A student's age cannot be negative, etc.

TYPES OF ATTRIBUTES:

Simple attribute:
Simple attributes are atomic values, which cannot be divided further. For example, student's phone-number is an atomic value of 10 digits.

Composite attribute:

Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first_name and
last_name.
Derived attribute:

Derived attributes are attributes that do not exist physically in the database; their values are derived from other attributes present in
the database. For example, average_salary in a department should not be saved in the database; instead, it can be derived. As another example,
age can be derived from date_of_birth.
Single-valued attribute:

Single-valued attributes contain a single value. For example: Social_Security_Number.


Multi-value attribute:

Multi-valued attributes may contain more than one value. For example, a person can have more than one phone number, email address,
etc.
These attribute types can be combined as follows:

simple single-valued attributes

simple multi-valued attributes

composite single-valued attributes

composite multi-valued attributes
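
The attribute types above can be sketched in code. The following is a minimal illustration only, assuming a hypothetical Student entity; the class names, field names and sample values are not from the notes. It shows a composite attribute (name), a multi-valued attribute (phone_numbers) and a derived attribute (age, computed from date_of_birth rather than stored).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class FullName:
    """Composite attribute: made of two simple attributes."""
    first_name: str
    last_name: str


@dataclass
class Student:
    roll_number: int                 # simple, single-valued
    name: FullName                   # composite, single-valued
    date_of_birth: date              # simple, single-valued
    phone_numbers: List[str] = field(default_factory=list)  # simple, multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


# Hypothetical sample entity.
mira = Student(1, FullName("Mira", "Rao"), date(2005, 6, 15), ["5550100", "5550101"])
```

Because age is computed on demand, it can never disagree with date_of_birth, which is the usual reason for not storing derived attributes.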

ENTITY-SET AND KEYS


A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set.
For example, the roll_number of a student makes her/him identifiable among students.

Super Key: A set of attributes (one or more) that collectively identifies an entity in an entity set.

Candidate Key: A minimal super key is called a candidate key, that is, a super key for which no proper subset is a super key. An entity
set may have more than one candidate key.

Primary Key: This is one of the candidate keys, chosen by the database designer to uniquely identify entities in the entity set.
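
The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is an illustration under stated assumptions: the rows are hypothetical, and the function names is_super_key and candidate_keys are made up for this example. Note that a real key is a property of the schema, so passing this test on one set of rows only shows that the data does not contradict the key.

```python
from itertools import combinations

# Hypothetical sample rows; employee_id and email are assumed to be unique.
employees = [
    {"employee_id": 1, "email": "a@example.com", "name": "Asha"},
    {"employee_id": 2, "email": "b@example.com", "name": "Ravi"},
    {"employee_id": 3, "email": "c@example.com", "name": "Asha"},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: super keys with no proper subset that is a super key."""
    names = sorted(rows[0])
    found = []
    # Try smaller attribute sets first so that minimal keys are found first.
    for size in range(1, len(names) + 1):
        for attrs in combinations(names, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in found)
            if is_super_key(rows, attrs) and not contains_smaller_key:
                found.append(attrs)
    return found
```

Here (employee_id, name) is a super key but not a candidate key, because employee_id alone already identifies every row.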

Relationship
The association among entities is called a relationship. For example, an employee entity has the relationship works_at with a department. Another example
is a student who enrolls in some course. Here, works_at and enrolls are called relationships.

RELATIONSHIP SET:

A set of relationships of a similar type is called a relationship set. Like entities, a relationship can also have attributes. These attributes are called
descriptive attributes.

DEGREE OF RELATIONSHIP
The number of participating entities in a relationship defines the degree of the relationship.

Binary = degree 2

Ternary = degree 3

n-ary = degree n

MAPPING CARDINALITIES:
Cardinality defines the number of entities in one entity set that can be associated with entities of the other set via a relationship set.

One-to-one: one entity from entity set A can be associated with at most one entity of entity set B and vice versa.

[Image: One-to-one relation]

One-to-many: One entity from entity set A can be associated with more than one entity of entity set B, but one entity from entity set B
can be associated with at most one entity of entity set A.

[Image: One-to-many relation]

Many-to-one: One entity from entity set A can be associated with at most one entity of entity set B, but one entity from
entity set B can be associated with more than one entity from entity set A.

[Image: Many-to-one relation]

Many-to-many: one entity from A can be associated with more than one entity from B and vice versa.

[Image: Many-to-many relation]
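
The four mapping cardinalities can be illustrated by inspecting a relationship set given as pairs. This is a minimal sketch; the function name mapping_cardinality and the sample pairs are assumptions made for illustration. It classifies what a particular set of pairs exhibits, whereas a real cardinality is a design constraint on every possible instance.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a relationship set given as (a, b) pairs between sets A and B."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some entity of A relate to several entities of B, and vice versa?
    a_has_many_b = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many_a = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many_b and b_has_many_a:
        return "many-to-many"
    if a_has_many_b:
        return "one-to-many"
    if b_has_many_a:
        return "many-to-one"
    return "one-to-one"
```

For example, one department with two employees gives "one-to-many" when read from department to employee, and "many-to-one" when the pairs are reversed.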

Entity-Relationship Model
The Entity-Relationship model is based on the notion of real-world entities and the relationships among them. When formulating a real-world scenario
into a database model, the ER Model creates entity sets, relationship sets, general attributes and constraints.
The ER Model is best used for the conceptual design of a database.
ER Model is based on:

Entities and their attributes

Relationships among entities


These concepts are explained below.

[Image: ER Model]

Entity
An entity in the ER Model is a real-world entity that has some properties called attributes. Every attribute is defined by its set of values,
called its domain.
For example, in a school database, a student is considered an entity. A student has various attributes like name, age and class.

Relationship
The logical association among entities is called a relationship. Relationships are mapped with entities in various ways. Mapping cardinalities
define the number of associations between two entities.

Mapping cardinalities:

one to one

one to many

many to one

many to many

Relational Model
The most popular data model in DBMS is the Relational Model. It is a more scientific model than the others. This model is based on first-order
predicate logic and defines a table as an n-ary relation.

[Image: Table in relational Model]


The main highlights of this model are:

Data is stored in tables called relations.

Relations can be normalized.

In normalized relations, values saved are atomic values.

Each row in a relation is unique.

Each column in a relation contains values from the same domain.

ER Diagram Representation

Now we shall learn how the ER Model is represented by means of an ER diagram. Every object, such as an entity, the attributes of an entity, a relationship set,
and the attributes of a relationship set, can be represented with the tools of the ER diagram.

Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.

[Image: Entities in a school database]

Attributes
Attributes are properties of entities. Attributes are represented by means of ellipses. Every ellipse represents one attribute and is directly
connected to its entity (rectangle).

[Image: Simple Attributes]

If the attributes are composite, they are further divided in a tree-like structure. Every node is then connected to its attribute. That is,
composite attributes are represented by ellipses that are connected to an ellipse.

[Image: Composite Attributes]

Multivalued attributes are depicted by a double ellipse.

[Image: Multivalued Attributes]

Derived attributes are depicted by a dashed ellipse.

[Image: Derived Attributes]

Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written in the diamond box. All entities (rectangles)
participating in the relationship are connected to it by a line.

BINARY RELATIONSHIP AND CARDINALITY


A relationship in which two entities participate is called a binary relationship. Cardinality is the number of instances of an entity from a
relation that can be associated with the relation.

One-to-one
When only one instance of an entity is associated with the relationship, it is marked as '1'. The image below shows that only 1 instance of each
entity should be associated with the relationship. It depicts a one-to-one relationship.

[Image: One-to-one]

One-to-many
When more than one instance of an entity is associated with the relationship, it is marked as 'N'. The image below shows that only 1 instance of the
entity on the left and more than one instance of the entity on the right can be associated with the relationship. It depicts a one-to-many relationship.

[Image: One-to-many]

Many-to-one
When more than one instance of an entity is associated with the relationship, it is marked as 'N'. The image below shows that more than one
instance of the entity on the left and only one instance of the entity on the right can be associated with the relationship. It depicts a many-to-one
relationship.

[Image: Many-to-one]

Many-to-many
The image below shows that more than one instance of the entity on the left and more than one instance of the entity on the right can be associated
with the relationship. It depicts a many-to-many relationship.

[Image: Many-to-many]

PARTICIPATION CONSTRAINTS

Total Participation: Each entity in the entity set is involved in the relationship. Total participation is represented by double lines.

Partial participation: Not all entities are involved in the relationship. Partial participation is represented by a single line.

[Image: Participation Constraints]

Generalization Aggregation
The ER Model can express database entities in a conceptual hierarchy: as we go up the hierarchy, the view of the entities becomes more
general, and as we go deeper in the hierarchy, it gives us the detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized view. For example, a
particular student named Mira can be generalized along with all the students; the entity is then student, and further, a student is a person. The
reverse is called specialization, where a person is a student, and that student is Mira.

Generalization
As mentioned above, generalization is the process of generalizing entities, where the generalized entity contains the properties of all the entities it
generalizes. In generalization, a number of entities are brought together into one generalized entity based on their similar
characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.

[Image: Generalization]

Specialization
Specialization is the opposite of generalization, as mentioned above. In specialization, a group of entities is divided into subgroups based on their characteristics. Take a group Person, for example. A person has a name, date of birth, gender, etc. These properties are
common to all persons. But in a company, a person can be identified as an employee, employer, customer or vendor based on
the role they play in the company.

[Image: Specialization]
Similarly, in a school database, a person can be specialized as teacher, student or staff, based on the role they play in the school.

Inheritance
We use all the above features of the ER Model to create classes of objects in object-oriented programming. This makes it easier for the
programmer to concentrate on what she is programming. Details of entities are generally hidden from the user; this process is known as
abstraction.

One of the important features of generalization and specialization is inheritance, that is, the attributes of higher-level entities are inherited by
the lower-level entities.

[Image: Inheritance]
For example, attributes of a person like name, age, and gender can be inherited by lower-level entities like student and teacher.
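
Since the notes relate inheritance to classes in object-oriented programming, the Person example can be sketched as a small class hierarchy. This is only an illustration: the extra attributes roll_number and subject, and the sample values, are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Higher-level (generalized) entity."""
    name: str
    age: int
    gender: str


@dataclass
class Student(Person):
    """Lower-level entity: inherits name, age and gender from Person."""
    roll_number: int


@dataclass
class Teacher(Person):
    """Lower-level entity with its own extra attribute."""
    subject: str


# Hypothetical sample entities.
mira = Student(name="Mira", age=16, gender="F", roll_number=7)
ravi = Teacher(name="Ravi", age=40, gender="M", subject="Physics")
```

Reading the hierarchy upwards (Student and Teacher are both Person) corresponds to generalization; reading it downwards corresponds to specialization.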

Relation Data Model


The relational data model is the primary data model, used widely around the world for data storage and processing. This model is simple
and has all the properties and capabilities required to process data with storage efficiency.

Concepts
Tables: In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows
and columns, where rows represent records and columns represent the attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have
duplicate tuples.
Relation schema: This describes the relation name (table name), attributes and their names.
Relation key: One or more attributes that can identify each row in the relation (table) uniquely are called the relation key.
Attribute domain: Every attribute has some pre-defined value scope, known as the attribute domain.

Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints.
There are three main integrity constraints.

Key Constraints

Domain constraints

Referential integrity constraints

KEY CONSTRAINTS:
There must be at least one minimal subset of attributes in the relation that can identify a tuple uniquely. This minimal subset of attributes is
called the key for that relation. If there is more than one such minimal subset, these are called candidate keys.
Key constraints enforce that:

in a relation with a key attribute, no two tuples can have identical values for the key attributes.

a key attribute cannot have NULL values.


Key constraints are also referred to as Entity Constraints.

DOMAIN CONSTRAINTS
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same constraints are
applied to the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero
and a telephone number cannot contain characters outside 0-9.

REFERENTIAL INTEGRITY CONSTRAINTS


This integrity constraint works on the concept of a foreign key. A key attribute of a relation can be referred to in another relation, where it is
called a foreign key.
The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, that key element must exist.
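
The three kinds of integrity constraints can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration under stated assumptions: the department and employee tables, their columns, and the helper violates are hypothetical and not part of the notes. SQLite is used only because it ships with Python; it enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,                     -- key constraint
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                     -- key constraint
    age     INTEGER CHECK (age >= 0),                -- domain constraint
    dept_id INTEGER REFERENCES department(dept_id)   -- referential integrity
);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (10, 30, 1);
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

With this setup, inserting a second department with dept_id 1 breaks the key constraint, an employee with age -5 breaks the domain constraint, and an employee in the non-existent department 99 breaks referential integrity.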

Relational Algebra
Relational database systems are expected to be equipped with a query language that helps users query the database instances and
retrieve the results they require. There are two kinds of query languages: relational algebra and
relational calculus.

Relational algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It
uses operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation, and intermediate results are also considered relations.
Fundamental operations of relational algebra:

Select

Project

Union

Set difference

Cartesian product

Rename
These are defined briefly as follows:

Select Operation (σ)
Selects tuples that satisfy the given predicate from a relation.
Notation: σ_p(r)
Where σ denotes selection, p stands for the selection predicate and r stands for the relation. p is a propositional logic formula which may use connectors like and, or and
not. These terms may use relational operators like: =, ≠, ≥, <, >, ≤.
For example:
σ_{subject = "database"}(Books)
Output: Selects tuples from Books where subject is 'database'.
σ_{subject = "database" and price = "450"}(Books)
Output: Selects tuples from Books where subject is 'database' and price is 450.
σ_{subject = "database" and price < "450" or year > "2010"}(Books)
Output: Selects tuples from Books where subject is 'database' and price is less than 450, or where the publication year is greater than 2010, that is, published
after 2010.
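
The select operation can be mimicked in a few lines of Python by treating a relation as a set of tuples. This is a sketch, not how a DBMS implements selection; the Book tuple, the sample rows and the function name select are assumptions made for illustration.

```python
from collections import namedtuple

Book = namedtuple("Book", ["title", "subject", "price", "year"])

# Hypothetical sample relation, modelled as a set of tuples.
books = {
    Book("DB Basics", "database", 450, 2008),
    Book("DB Design", "database", 300, 2012),
    Book("Networks", "networking", 500, 2015),
    Book("Old Algorithms", "algorithms", 200, 2001),
}


def select(predicate, relation):
    """Select: keep exactly the tuples of the relation that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


# subject = "database" and price < 450, or year > 2010
chosen = select(
    lambda b: (b.subject == "database" and b.price < 450) or b.year > 2010,
    books,
)
```

The result is again a set of Book tuples, so it can be fed into another operation, which is what "intermediate results are also relations" means in practice.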

Project Operation (Π)
Projects the listed column(s) from a relation.
Notation: Π_{A1, A2, ..., An}(r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example:

Π_{subject, author}(Books)

Selects and projects the columns named subject and author from the relation Books.
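
Projection, including the automatic removal of duplicates, can be sketched as follows. The rows, author names and the function name project are hypothetical; rows are modelled as dictionaries and the result as a set of tuples so that duplicates disappear on their own.

```python
# Hypothetical sample relation: each row maps attribute names to values.
books = [
    {"title": "DB Basics", "subject": "database", "author": "A. Rao"},
    {"title": "DB Design", "subject": "database", "author": "A. Rao"},
    {"title": "Networks", "subject": "networking", "author": "B. Sen"},
]


def project(attrs, relation):
    """Project: keep only the named attributes; the set drops duplicate rows."""
    return {tuple(row[a] for a in attrs) for row in relation}


subject_author = project(["subject", "author"], books)
```

Two of the three books share the same subject and author, so the projected relation has two tuples rather than three.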

Union Operation (∪)
The union operation performs a binary union between two given relations and is defined as:
r ∪ s = { t | t ∈ r or t ∈ s }
Notation: r ∪ s
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold:

r and s must have the same number of attributes.

Attribute domains must be compatible.


Duplicate tuples are automatically eliminated.

Π_{author}(Books) ∪ Π_{author}(Articles)

Output: Projects the names of authors who have written either a book or an article or both.
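
Union maps directly onto Python's set union. In this sketch the two single-attribute relations and the function name union are assumptions; the arity check stands in for the "same number of attributes" condition, while domain compatibility is not checked.

```python
# Hypothetical results of projecting the author attribute from Books and Articles.
book_authors = {("A. Rao",), ("B. Sen",)}
article_authors = {("B. Sen",), ("C. Das",)}


def union(r, s):
    """Union: tuples present in r or in s; both must have the same number of attributes."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same number of attributes")
    return r | s


all_authors = union(book_authors, article_authors)
```

B. Sen appears in both inputs but only once in the result, because duplicate tuples are eliminated.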

Set Difference (−)
The result of the set difference operation is the tuples that are present in one relation but not in the second relation.
Notation: r − s
Finds all tuples that are present in r but not in s.

Π_{author}(Books) − Π_{author}(Articles)

Output: Returns the names of authors who have written books but not articles.
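
Set difference can likewise be sketched with Python sets. The sample relations and the function name difference are hypothetical.

```python
# Hypothetical results of projecting the author attribute from Books and Articles.
book_authors = {("A. Rao",), ("B. Sen",)}
article_authors = {("B. Sen",), ("C. Das",)}


def difference(r, s):
    """Set difference: tuples that are present in r but not in s."""
    return r - s


books_only = difference(book_authors, article_authors)
```

Unlike union, the operation is not symmetric: swapping the arguments gives the authors who have written articles but not books.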

Cartesian Product (×)
Combines information of two different relations into one.
Notation: r × s
Where r and s are relations and their output will be defined as:
r × s = { q t | q ∈ r and t ∈ s }

σ_{author = 'tutorialspoint'}(Books × Articles)

Output: Yields a relation that shows all books and articles written by tutorialspoint.
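
The Cartesian product pairs every tuple of one relation with every tuple of the other. The sketch below uses hypothetical (title, author) relations and a made-up function name cartesian_product. Because both relations have an author attribute, the selection is written here as a condition on both author columns; that reading of the example is an assumption.

```python
from itertools import product

# Hypothetical relations with attributes (title, author).
books = {("DB Basics", "tutorialspoint"), ("Networks", "A. Rao")}
articles = {("Indexing", "tutorialspoint"), ("Routing", "B. Sen")}


def cartesian_product(r, s):
    """Cartesian product: concatenate every tuple q of r with every tuple t of s."""
    return {q + t for q, t in product(r, s)}


pairs = cartesian_product(books, articles)

# Selection applied to the product: both the book and the article are by 'tutorialspoint'.
by_tutorialspoint = {
    row for row in pairs
    if row[1] == "tutorialspoint" and row[3] == "tutorialspoint"
}
```

The product of two relations with 2 tuples each has 4 tuples, which is why a selection usually follows to keep only the meaningful combinations.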

Rename Operation (ρ)
Results of relational algebra are also relations, but without any name. The rename operation allows us to rename the output relation. The rename
operation is denoted with the small Greek letter rho (ρ).
Notation: ρ_x(E)
Where the result of expression E is saved with the name x.
Additional operations are:

Set intersection

Assignment

Natural join

Relational Calculus
In contrast with relational algebra, relational calculus is a non-procedural query language, that is, it tells what to do but never explains
how to do it.
Relational calculus exists in two forms:

Tuple relational calculus (TRC)


The filtering variable ranges over tuples.
Notation: { T | Condition }
Returns all tuples T that satisfy the condition.
For example:
{ T.name | Author(T) AND T.article = 'database' }

Output: Returns tuples with 'name' from Author who has written an article on 'database'.
TRC can also be quantified. We can use existential (∃) and universal (∀) quantifiers.
For example:
{ R | ∃T ∈ Authors(T.article = 'database' AND R.name = T.name) }

Output: The query will yield the same result as the previous one.
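
A tuple relational calculus query reads much like a Python set comprehension, where the variable ranges over whole tuples. The sketch below is an illustration only; the Author tuple and the sample data are hypothetical. The second query mirrors the existentially quantified form with any().

```python
from collections import namedtuple

Author = namedtuple("Author", ["name", "article"])

# Hypothetical sample relation.
authors = {
    Author("A. Rao", "database"),
    Author("B. Sen", "networking"),
    Author("C. Das", "database"),
}

# { T.name | Author(T) AND T.article = 'database' }
direct = {(t.name,) for t in authors if t.article == "database"}

# Quantified form: a name r qualifies if there exists a tuple t in authors
# with t.article = 'database' and t.name = r.
names = {t.name for t in authors}
quantified = {
    (r,) for r in names
    if any(t.article == "database" and t.name == r for t in authors)
}
```

Both comprehensions state what the result must satisfy rather than a sequence of algebra operations, which is the sense in which the calculus is non-procedural.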

Domain relational calculus (DRC)


In DRC, the filtering variable ranges over the domains of attributes instead of entire tuple values (as done in TRC, mentioned above).
Notation:
{ a1, a2, a3, ..., an | P(a1, a2, a3, ..., an) }
where a1, a2, ..., an are attributes and P stands for a formula built from those attributes.
For example:
{ < article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database' }

Output: Yields article, page and subject from the relation TutorialsPoint where subject is 'database'.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational operators.
The expressive power of tuple relational calculus and domain relational calculus is equivalent to that of relational algebra.
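
In domain relational calculus the variables stand for individual attribute values rather than whole tuples. A rough Python analogue unpacks each tuple into separate variables; the relation contents below are hypothetical and the variable name tutorials_point simply mirrors the relation named in the example.

```python
# Hypothetical relation with attributes (article, page, subject).
tutorials_point = {
    ("Intro to SQL", 12, "database"),
    ("TCP Basics", 30, "networking"),
    ("Normal Forms", 45, "database"),
}

# { <article, page, subject> | <article, page, subject> in TutorialsPoint AND subject = 'database' }
result = {
    (article, page, subject)
    for (article, page, subject) in tutorials_point
    if subject == "database"
}
```

The condition mentions only the domain variable subject, not a tuple variable, which is the practical difference from the TRC style.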

E-R Diagram of Airline Booking System

E-R Diagram of Payroll Management System

E-R Diagram of School Management System

E-R Diagram of Banking System

E-R Diagram of Bus Reservation System

E-R Diagram of Hotel Reservation System

Functional Dependency

Functional dependency is the starting point for the process of normalization. A functional dependency exists when the
value of one attribute (or set of attributes) uniquely determines the value of another attribute. If X is known,
and as a result you are able to uniquely identify Y, there is a functional dependency from X to Y. Combined with keys, normal forms
are defined for relations.
Examples
Bear Number determines Student Name:
BearNum ---> StuName
Department Number and Job Rank determine Security Clearance:
(DeptNum, JRank) --->SecClear
Social Security Number determines Employee Name and Salary:
SSN ---> (EmpName, Salary)
Additionally, the above can be read as:
SSN ---> EmpName and SSN ---> Salary
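
Whether a functional dependency is contradicted by a given set of rows can be checked with a short function. This is a sketch under stated assumptions: the function name holds and the sample rows (including the made-up SSN values) are hypothetical. A dependency is a rule about all valid data, so sample rows can refute it but never prove it.

```python
def holds(rows, lhs, rhs):
    """True if, in these rows, equal values on lhs always come with equal values on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value; any later
        # disagreement contradicts lhs ---> rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows.
employees = [
    {"SSN": "111", "EmpName": "Asha", "Salary": 50000},
    {"SSN": "222", "EmpName": "Ravi", "Salary": 50000},
    {"SSN": "333", "EmpName": "Asha", "Salary": 60000},
]
```

In this sample, SSN ---> (EmpName, Salary) is not contradicted, while EmpName ---> Salary is, because the two rows named Asha have different salaries.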
Armstrong's Axioms
William W. Armstrong established a set of rules which can be used to infer the functional dependencies in a relational
database (from umbc.edu - no external linking, Google Database Design UMBC):

Reflexivity rule: If A is a set of attributes, and B is a set of attributes that is completely contained in A,
then A implies B.

Augmentation rule: If A implies B, and C is a set of attributes, then AC implies BC.

Transitivity rule: If A implies B and B implies C, then A implies C.

These can be simplified if we also use:

Union rule: If A implies B and A implies C, then A implies BC.

Decomposition rule: If A implies BC, then A implies B and A implies C.

Pseudotransitivity rule: If A implies B and CB implies D, then AC implies D.
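
In practice, the rules above are usually applied through the attribute closure: the set of all attributes determined by a given set under the known dependencies. The sketch below uses the standard closure algorithm, which the notes do not describe; the function names closure and implies and the single-letter attributes are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs ---> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


# A implies B, and CB implies D (attributes are single letters here).
fds = [(frozenset("A"), frozenset("B")), (frozenset("CB"), frozenset("D"))]
```

With these two dependencies, implies(fds, "AC", "D") is true, which is exactly the pseudotransitivity rule, while implies(fds, "A", "D") is false because C is not available.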

Normalization
Normalization, as previously mentioned, makes use of functional dependencies that exist in relations and the primary key
or candidate keys when analyzing tables. Multivalued Dependencies are also part of the normalization process, at levels
higher than Third Normal Form.

# What are the moral dimensions of an information system? Explain.

The five (5) moral dimensions of the information age are:


1) Rights and obligations of information: What rights do individuals and corporations have
regarding information about themselves? What are the legal means to protect it? And what are the
obligations regarding that information? These rights include:
Privacy is the right of individuals to be left in peace. Technology and information systems
threaten the privacy of individuals by making invasion cheap, efficient and effective.
Due process requires the existence of a set of rules or laws that clearly define how we treat
information about individuals, and that appeal mechanisms are available.
2) Property rights: How can the classical concepts of patents and intellectual property be carried
over to digital technology? What are these rights and how can they be protected? Information technology has
hindered the protection of property because it is very easy to copy or distribute information
over computer networks. Intellectual property is subject to various protections under three legal traditions:
Trade secrets: Any intellectual work product used for business purposes may be classified as
secret.
Copyright: It is a concession granted by law to protect creators of intellectual property against
copying by others for any purpose for a period of 28 years.
Patents: A patent gives the holder, for 17 years, an exclusive monopoly on the ideas behind an
invention.
3) Responsibility and control: Who is responsible for, and who controls, the use and abuse of
information about people? The new information technologies are challenging existing liability laws
and social practices for holding individuals and institutions accountable for their
actions.
4) Quality systems: What standards for data and information processing programs should be required
to ensure the protection of individual rights and of society? Individuals and organizations can be held
responsible for avoidable and foreseeable consequences that they had an obligation to see and
correct.
5) Quality of life: What values should be preserved and protected in a society based on
information and knowledge? What institutions should protect and which should be protected? The
negative social costs of introducing information technologies and systems are growing along with
the power of technology. Computers and information technologies can destroy valuable elements
of culture and society, while providing benefits.
These five dimensions are a good guide to the ethical questions a company should consider
when introducing a new technology.

Five Moral Dimensions


The five moral dimensions of the information age.
1. Information rights and obligations- What information rights do people have?
2. Property rights and obligations- How will traditional intellectual property rights be protected in a digital society in
which tracing and accounting for ownership are difficult and ignoring such property rights is so easy?
3. Accountability and control- Who will end up being accountable?
4. System quality- What standards of data should we protect?
5. Quality of life- What values should be preserved in an information-based society?
