
A file can store records, and different application programs can extract those records.

• Even the simplest data retrieval task from a file requires extensive programming, which is time consuming and demands a high level of skill.
• To access the data in a file, the programmer must be aware of the physical structure of the file.
• Security features such as effective password protection or locking parts of a file are very difficult to program.
• The file system exhibits structural dependence: a change in the file structure, such as the addition or deletion of a field, requires the modification of all programs that use that file.
• Data dependence: a change in a data characteristic of the file, such as changing a field's data type from integer to decimal, requires changes in all programs that access the file.
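
Structural dependence can be illustrated with a small sketch. The record layouts, field names, and values below are hypothetical and are not taken from any particular system. The sketch shows an application program with the old record layout built in, which silently misreads a record once a field has been added to the file.

```python
import struct

# Hypothetical fixed-width layout: 10-byte name, 4-byte integer balance.
OLD_LAYOUT = struct.Struct("<10si")
# Later the file owner adds a 2-byte branch code between the two fields.
NEW_LAYOUT = struct.Struct("<10s2si")


def read_balance_old_program(record: bytes) -> int:
    """An application program that has the OLD layout hard-coded."""
    _name, balance = OLD_LAYOUT.unpack(record[:OLD_LAYOUT.size])
    return balance


old_record = OLD_LAYOUT.pack(b"ALICE", 7500)
new_record = NEW_LAYOUT.pack(b"ALICE", b"NY", 7500)

# Correct while the file keeps its original structure.
balance_before = read_balance_old_program(old_record)
# Wrong after the structure changes: the branch code bytes are read as part of the balance.
balance_after = read_balance_old_program(new_record)
```

Every program that embeds the layout would need the same correction, which is the maintenance burden described in the list above.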

A typical file processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it uses various application programs to extract
records from, and add records to, the appropriate files. Before DBMSs were used to store and
retrieve data, organizations stored information in file processing systems.

As the number of files in the system grows, system administration becomes difficult as well.
Each file must have its own file management system, composed of programs that allow the
user to create the file structure, add data to the file, delete data from the file, modify the
data in the file, and list the file contents.

With five such programs per file (create, add, delete, modify, list), even a simple file
processing system containing 25 files requires 5 * 25 = 125 file management programs. Each
department in the organization owns its data by creating its own files, so the number of files
can multiply rapidly.

Security features such as effective password protection, locking out parts of files or parts of
the system itself, and other data confidentiality measures are difficult to program and are
usually omitted.

The file system's structure and lack of security make it difficult to pool data. The same
basic data is stored in different locations, and it is very unlikely that the copies stored in
different locations will always be updated consistently, so different versions of the same
data come to exist. The file processing system is simply not suitable for modern data
management and information requirements.

Disadvantages of File Processing System


The conventional file processing system suffers from the following
shortcomings.
Data Redundancy
Data redundancy means that the same information is duplicated in several files.

Data Inconsistency
Data inconsistency means that different copies of the same data do not
match, so different versions of the same basic data exist.
This occurs when an update operation does not update every copy of the
data stored at different places.

Example: Address information of a customer is recorded differently
in different files.

Difficulty in Accessing Data


It is not easy to retrieve information using a conventional file processing
system. Each new kind of request needs its own program, so convenient and
efficient information retrieval is almost impossible.

Data Isolation
Data are scattered in various files, and the files may be in different formats,
so writing a new application program to retrieve data is difficult.

Integrity Problems
The data values may need to satisfy some integrity constraints. For
example, the value of the balance field must be greater than 5000. In a file
processing system we have to enforce this through program code, but in a
database we can declare the integrity constraints along with the definition itself.
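
As a minimal sketch of declaring such a constraint with the definition, the following uses Python's built-in sqlite3 module. The table name account and its columns are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance must be greater than 5000" is part of the table definition.
conn.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 5000))"
)
conn.execute("INSERT INTO account VALUES ('A', 6000)")

try:
    # Violates the declared constraint, so the DBMS itself refuses the row.
    conn.execute("INSERT INTO account VALUES ('B', 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

No application program has to repeat the check; every program that writes to the table is subject to the same rule.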

Atomicity Problem
It is difficult to ensure atomicity in a file processing system. For example,
consider transferring $100 from Account A to Account B. If a failure occurs during
execution, the $100 may have been deducted from Account A but not yet
credited to Account B.
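
The following sketch shows how a DBMS transaction avoids the half-finished transfer. It uses Python's sqlite3 module; the account table, the transfer function, and the simulated failure are hypothetical and serve only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 500)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


failed = transfer(conn, "A", "B", 100, fail_midway=True)
balances_after_failure = dict(conn.execute("SELECT name, balance FROM account"))

succeeded = transfer(conn, "A", "B", 100)
balances_after_success = dict(conn.execute("SELECT name, balance FROM account"))
```

After the simulated failure the debit is rolled back, so neither account has changed.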

Concurrent Access anomalies


If multiple users update the same data simultaneously, the data can be left in
an inconsistent state. In a file processing system it is very difficult to prevent
this through program code, and the result is concurrent access anomalies.

Security Problems
Enforcing security constraints in a file processing system is very difficult because
application programs are added to the system in an ad hoc manner.

Observations and Conclusions

Data redundancy may lead to data inconsistency if the redundant copies are
not all updated together. Data inconsistency in turn leaves the system in an
inconsistent state, because operations based on inconsistent data produce further
inconsistency.

The term database system refers to an organization of components that define and regulate
the collection, storage, management, and use of data within a database environment. At a
high level, the database system is composed of the following five major parts.

• Hardware
• Software
• People
• Procedures
• Data

Hardware Components in a Database System Environment

Hardware identifies all the system's physical devices. It includes computers, computer
peripherals, network components etc.

Software Components in a Database System Environment

Software refers to the collection of programs used within the database system. It includes
the operating system, the DBMS software, and application programs and utilities.

Operating System
DBMS Software
Application Programs and Utilities

The operating system manages all the hardware components and makes it possible for all
other software to run on the computers. UNIX, Linux, and Microsoft Windows are
popular operating systems used in database environments.

DBMS software manages the database within the database system. Oracle Corporation's
Oracle Database and MySQL, IBM's DB2, and Microsoft's Access and SQL Server are
popular DBMS (RDBMS) products used in database environments.

Application programs and utilities software are used to access and manipulate the data in
the database and to manage the operating environment of the database.

People in a Database System Environment


The people component includes all users associated with the database system. On the basis of
primary job function we can identify five types of users in a database system: System
Administrators, Database Administrators, Data Modelers, System Analysts and Programmers,
and End Users.

System Administrators
Data Modelers
Database Administrators
System Analysts and Programmers
End Users

System Administrators oversee the database system's general operations.

Data Modelers (Architects) prepare the conceptual design from the requirements. The ER model
represents the conceptual design of an OLTP application.

The Database Administrator (DBA) physically implements the database according to the logical
design and is responsible for the maintenance of the database system.

System Analysts and Programmers design and implement the application programs. They
create the input screens, reports, and procedures through which end users access and
manipulate the database.

End Users are the people who use the application. For example, in the case of a banking system,
the employees and the customers who use an ATM or online banking facility are end users.

Procedures in a Database Environment

Procedures are the instructions and business rules that govern the design and use of the
database system.

Data in the Database

Data are the most basic and important component of a database: the collection of facts stored in
the database.

DBMS Functions:

The Database Management System performs the following functions.

• Data Dictionary Management
• Data Storage Management
• Data Transformation and Presentation
• Security Management
• Multi User Access Control
• Backup and Recovery Management
• Data Integrity Management
• Database Access Languages and Application Interface
• Database Communication Interface

Data Dictionary Management

The data dictionary stores the definitions of data elements and their relationships. This
information is termed metadata. The metadata includes the definitions of data, data types,
relationships between data, integrity constraints, and so on. Any changes made to the database
structure are automatically reflected in the data dictionary. In short, the DBMS provides data
abstraction, and it removes structural and data dependence from the system.
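
As a small sketch of the data dictionary in practice, the following asks SQLite for its own metadata through Python's sqlite3 module. The student table mirrors the DDL example used in these notes; the added EMAIL column and the helper name columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (SNAME CHAR(10), ROLLNO CHAR(10))")


def columns(conn, table):
    """Read column names and declared types from SQLite's catalog."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


before = columns(conn, "student")
# A structural change is recorded in the catalog automatically.
conn.execute("ALTER TABLE student ADD COLUMN EMAIL VARCHAR(40)")
after = columns(conn, "student")
```

Programs that look up the structure this way, instead of hard-coding it, see the new column without being modified.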

Data Storage Management

The DBMS creates the complex structures required for data storage. The users are freed
from defining, programming, and implementing the complex physical data characteristics.

Data Transformation and Presentation

The DBMS supports data independence. It translates logical requests into
commands that physically locate and retrieve the requested data, and it formats the
physically retrieved data according to the logical data format specifications.

Security Management

The DBMS creates a security system that enforces user security and data privacy within the
database. Security rules determine the access rights of the users, for example whether a
user is given read or write access to particular data.

Multiuser Access Control


The DBMS ensures that multiple users can access the database concurrently without
compromising the integrity of the database. Hence the database ensures data integrity and
data consistency.

Backup and Recovery Management


The DBMS provides backup and data recovery procedures to ensure data safety and
integrity. It provides special utilities that allow the DBA to perform routine and
special backup and restore procedures. Recovery management deals with the recovery of
the database after a failure.

Data Integrity Management


The DBMS promotes and enforces integrity rules to eliminate data integrity problems, thus
minimizing data redundancy and maximizing data consistency.

Database Access Languages and Application Interface

The DBMS provides data access via a query language. A query language is a non-procedural
language: the user only needs to specify what must be done, without
specifying how it is to be done. The DBMS's query language contains two components: a
data definition language (DDL) and a data manipulation language (DML). The DBMS also
provides data access to programmers via programming languages.

Database Communication Interfaces


Different users may access the database through a network environment, so the DBMS
provides communication functions for accessing the database over a computer network.

Data Independence

A major purpose of a database system is to provide the users with an abstract view of data.
To hide the complexity from users, the database applies different levels of abstraction. The
levels of abstraction are the following.

Physical Level
Logical Level
View Level

Physical Level
The physical level is the lowest level of abstraction, and it defines the storage structure. The
physical level describes complex low-level data structures in detail. The database system
hides many of the lowest-level storage details from the database programmers. Database
administrators may be aware of certain details of the physical organization of data.

Logical Level
This is the next higher level of abstraction. It describes what data are stored in the database,
the relationships between the data, the types of the data, and so on. Database programmers and
DBAs know the logical structure of the data.

View Level
This is the highest level of abstraction. It provides different views to different users. At the view
level, users see a set of application programs that hide details such as data types. Each user
is given a view of, or access to, only the part of the data permitted by that user's access
rights.
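
A view in SQL gives a concrete feel for the view level. The sketch below uses Python's sqlite3 module; the employee table, its rows, and the view name employee_directory are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000), ("Ravi", "IT", 61000)],
)

# The view exposes only part of the logical schema; salary stays hidden.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

A user who is allowed to query only employee_directory never sees the salary column, even though it exists at the logical level.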

Physical Data Independence


Changes at the physical level do not affect, and are not visible at, the logical level. This is called
physical data independence.

Logical Data Independence


The changes in the logical level do not affect the view level. This is called logical data
independence.

Database Languages

Data Definition Language (DDL)
Data Manipulation Language (DML)

Data Definition Language (DDL)


We specify the database schema (data fields, data types, constraints) by a set of
definitions expressed in a language called the DDL.
Example: CREATE TABLE student (SNAME CHAR(10), ROLLNO CHAR(10));
This DDL statement creates a table named student with the fields SNAME and ROLLNO. This information
(metadata) is stored in the data dictionary.

Data Manipulation Language (DML)


Data Manipulation Language (DML) is used for data manipulation.

Data manipulation covers the following operations.
Retrieval of information stored in the database
Insertion of information into the database
Deletion of information from the database
Updating of information stored in the database
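
The four operations map onto the SQL statements INSERT, UPDATE, DELETE, and SELECT. The following is a minimal sketch using Python's sqlite3 module and the student table from the DDL example; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (SNAME CHAR(10), ROLLNO CHAR(10))")

# Insertion of information into the database
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Anu", "R1"), ("Binu", "R2"), ("Cini", "R3")],
)
# Updating of information stored in the database
conn.execute("UPDATE student SET SNAME = 'Anupama' WHERE ROLLNO = 'R1'")
# Deletion of information from the database
conn.execute("DELETE FROM student WHERE ROLLNO = 'R3'")
# Retrieval of information stored in the database
rows = conn.execute("SELECT ROLLNO, SNAME FROM student ORDER BY ROLLNO").fetchall()
```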

Types of Data Manipulation Language (DML)


Procedural DML
Declarative DML (Non Procedural DML)

Procedural DML:
In a procedural data manipulation language, the user has to specify what data are needed and
how to get them.

Declarative DML (Non Procedural DML):


In a declarative data manipulation language, the user has to specify what data are needed, without
specifying how to get them.
Example: SQL (Structured Query Language)
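
The difference can be sketched by answering the same question twice. The list of marks and the pass mark of 50 are invented for illustration; the loop stands in for the procedural style, and the SQL query (run through Python's sqlite3 module) shows the declarative style.

```python
import sqlite3

students = [("Anu", 72), ("Binu", 48), ("Cini", 91)]

# Procedural style: the program spells out HOW to scan, test, and collect.
passed_procedural = []
for name, mark in students:
    if mark >= 50:
        passed_procedural.append(name)
passed_procedural.sort()

# Declarative style: the query states WHAT is wanted; the DBMS decides how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (name TEXT, mark INTEGER)")
conn.executemany("INSERT INTO result VALUES (?, ?)", students)
passed_declarative = [
    row[0]
    for row in conn.execute("SELECT name FROM result WHERE mark >= 50 ORDER BY name")
]
```

Both approaches give the same answer, but only the first one fixes the access strategy in the program itself.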

Database System Structure

A database system is divided into modules based on their function. The functional
components of a database system can be broadly divided into the storage manager and the
query processor.

• Storage Manager
• Query Processor

Storage Manager
The storage manager is important because databases typically require a large amount of
storage space. It is therefore very important to use storage efficiently and to minimize
the movement of data to and from disk.

A storage manager is a program module that provides the interface between the low-level
data stored in the database and the application programs and queries submitted to the
system. The storage manager is responsible for the interaction with the file manager. It
translates the various DML statements into low-level file system
commands. Thus the storage manager is responsible for storing, retrieving, and updating
data in the database. The storage manager components include the following.

Authorization and Integrity Manager
Transaction Manager
File Manager
Buffer Manager

The Authorization and Integrity Manager tests for the satisfaction of integrity constraints and
checks the authority of users to access data. The Transaction Manager ensures that the
database remains in a consistent state while allowing concurrent transactions to proceed
without conflicting. The File Manager manages the allocation of space on disk storage and
the data structures used to represent information stored on disk. The Buffer Manager is
responsible for fetching data from disk storage into main memory and deciding
what data to cache in main memory.

The storage manager implements the following data structures as part of the physical
system implementation: data files, the data dictionary, and indices. Data files store the database
itself. The data dictionary stores metadata about the structure of the database, in particular
the schema of the database. Indices provide fast access to data items.

The Query Processor

The Query Processor simplifies and facilitates access to data. The Query Processor includes
the following components.

DDL Interpreter
DML Compiler
Query Evaluation Engine

The DDL interpreter interprets DDL statements and records the definitions in the data
dictionary. The DML compiler translates DML statements in a query language into an
evaluation plan consisting of low-level instructions that the query evaluation engine
understands. The DML compiler also performs query optimization; that is, it picks the lowest
cost evaluation plan from among the alternatives. The query evaluation engine executes the
low-level instructions generated by the DML compiler.
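
One way to watch an optimizer choose between plans is SQLite's EXPLAIN QUERY PLAN statement, used here through Python's sqlite3 module. This is specific to SQLite and only a sketch: the exact wording of the plan text varies between SQLite versions, and the index name idx_student_rollno and helper name plan are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (SNAME CHAR(10), ROLLNO CHAR(10))")
query = "SELECT SNAME FROM student WHERE ROLLNO = 'R1'"


def plan(conn, sql):
    """Return SQLite's description of the evaluation plan it picked."""
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# With no index available, the only alternative is a scan of the whole table.
plan_without_index = plan(conn, query)

# Once an index exists, the optimizer can pick the cheaper index lookup.
conn.execute("CREATE INDEX idx_student_rollno ON student (ROLLNO)")
plan_with_index = plan(conn, query)
```

The query text is identical in both cases; only the plan chosen by the DBMS changes.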

Functions of a Database Administrator

One of the main reasons for using a DBMS is to have central control of both the data and the
programs accessing those data. A person who has such control over the system is called a
Database Administrator (DBA). The following are the functions of a Database Administrator.

• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting authorization for data access
• Routine maintenance

Schema Definition
The Database Administrator creates the database schema by executing DDL
statements. The schema includes the logical structure of each database table (relation), such as the data
types of attributes, the lengths of attributes, and integrity constraints.

Storage structure and access method definition


The DBA decides how database tables and indexes are stored, for example as flat files, heaps, or B+ trees.

Schema and physical organization modification


The DBA carries out changes to the existing schema and physical organization.

Granting authorization for data access


The DBA provides different access rights to users according to their level. Ordinary users
might have highly restricted access to data, while users higher up in the hierarchy, up to the
administrator, get more access rights.

Routine Maintenance
Some of the routine maintenance activities of a DBA are given below.

Taking backups of the database periodically.
Ensuring that enough disk space is available at all times.
Monitoring jobs running on the database.
Ensuring that performance is not degraded by an expensive task submitted by a user.
Performance tuning.

Entity Relationship Model (E-R Model)

Database design is an iterative process. The ER model is used at the
conceptual design stage of database design, and an ER diagram is used to represent this
conceptual design. The results of requirement analysis are modeled in the conceptual design. The ER
model is very expressive, so people can easily understand the requirements. The data
modeler prepares the ER diagram, which is then verified with the functional domain experts to
ensure that all the requirements are properly incorporated in the conceptual design. The
process is repeated until the end users and designers agree that the ER diagram is a fair
representation of the requirements. An ER diagram can easily be mapped to a relational
schema.

The basic constructs of the ER model are entities, attributes, and relationships. An entity is an
object that exists in the real world and is distinguishable from other entities.

• Conceptual Design
• ER Diagram
• Entities
• Attributes
• Relationships

Entities

An entity is a thing or object in the real world that is distinguishable from all other objects.
For example, 'Person' in an organization is an entity. An entity has a set of properties. At
the ER modeling level, an entity actually refers to an entity set. In other words, an entity in the ER
model corresponds to a table.

• Entity
• Entity Set

An entity may be concrete, such as a person or a book, or abstract, such as an
account or a loan. The ER model refers to a specific table row as an entity instance or entity
occurrence. A collection of similar entities (an entity set) often corresponds to a table. Each
entity set has a key. All entities in an entity set have the same set of attributes. Thus an entity
set is a set of entities of the same type that share the same properties or attributes. An
entity is represented by a rectangle containing the entity name, which is a noun usually
written in capital letters.

Attributes
An entity is described by a set of attributes. Each attribute corresponds to a field in a table. For each
attribute there is a set of permitted values called the domain or value set of the attribute.
Attributes are represented by ovals and are connected to the entity with a line. Each oval
contains the name of the attribute it represents.

Relationships

A relationship describes an association among entities. In the ER model the association among
entities is described as one-to-one, one-to-many, or many-to-many. Relationships are
represented by a diamond connected to the related entities. The relationship name (an
active or passive verb) is written inside the diamond.

Attribute Types

In the Entity Relationship (ER) model, attributes can be classified into the following types.

• Simple and Composite Attributes
• Single Valued and Multi Valued Attributes
• Stored and Derived Attributes
• Complex Attribute

Simple and Composite Attribute

A simple attribute consists of a single atomic value. A simple attribute cannot be subdivided.
For example, the attributes age and sex are simple attributes.

A composite attribute is an attribute that can be further subdivided. For example the
attribute ADDRESS can be subdivided into street, city, state, and zip code.

Simple Attribute: an attribute that consists of a single atomic value.
Example: salary, age

Composite Attribute: an attribute whose value is not atomic.
Example: Address: 'House_no:City:State'
Name: 'First Name:Middle Name:Last Name'

Single Valued and Multi Valued attribute

A single valued attribute can have only a single value. For example, a person can have only
one 'date of birth' and one 'age'. A single valued attribute can be either simple or
composite: 'date of birth' is a composite attribute and 'age' is a simple attribute, but
both are single valued attributes.

Multivalued attributes can have multiple values. For instance, a person may have multiple
phone numbers or multiple degrees. Multivalued attributes are shown by a double line
connecting to the entity in the ER diagram.

Single Valued Attribute: an attribute that holds a single value.
Example 1: Age
Example 2: City
Example 3: Customer id

Multi Valued Attribute: an attribute that holds multiple values.
Example 1: A customer can have multiple phone numbers, email ids, etc.
Example 2: A person may have several college degrees.

Stored and Derived Attributes

The value of a derived attribute is derived from a stored attribute. For example, the 'Date of
Birth' (DOB) of a person is a stored attribute. The value of the attribute 'AGE' can be derived by
subtracting the date of birth from the current date. The stored attribute supplies a value
to the related attribute.

Stored Attribute: an attribute that supplies a value to the related attribute.
Example: Date of Birth

Derived Attribute: an attribute whose value is derived from a stored attribute.
Example: age, whose value is derived from the stored attribute Date of Birth.
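
A minimal sketch of deriving age from the stored date of birth follows. The function name age_on is hypothetical, and the date on which the age is wanted is passed in explicitly so that the result does not depend on when the code is run.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive AGE (in completed years) from the stored attribute DOB."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


age_day_before_birthday = age_on(date(2000, 6, 15), date(2024, 6, 14))
age_on_birthday = age_on(date(2000, 6, 15), date(2024, 6, 15))
```

Because the value can always be recomputed, a derived attribute usually does not need to be stored in the table.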

Complex Attribute

A complex attribute is an attribute that is both composite and multivalued.

KEYS in Database: Primary Key, Candidate Key, Super Key

Super Keys
Super key stands for superset of a key. A super key is a set of one or more attributes that,
taken collectively, uniquely determine all the other attributes.

Candidate Keys
Candidate Keys are super keys for which no proper subset is a super key. In other words
candidate keys are minimal super keys.

Primary Key:
It is the candidate key that is chosen by the database designer to identify entities within an
entity set. A primary key is therefore a minimal super key. In the ER diagram the primary key is
represented by underlining the primary key attribute. Ideally a primary key is composed of
only a single attribute, but it is possible to have a primary key composed of more than one
attribute.

Composite Key
A composite key consists of more than one attribute.

Example: Consider a relation (table) R with the attributes A, B, C, D, E.
R(A,B,C,D,E)
A→BCDE This means the attribute 'A' uniquely determines the other attributes B, C, D, E.
BC→ADE This means the attributes 'B' and 'C' jointly determine all the other attributes A, D, E in
the relation.

Primary Key : A
Candidate Keys : A, BC
Super Keys : A, BC, ABC, AD (among others)

ABC and AD are not candidate keys because neither is a minimal super key: each contains a smaller key (A or BC).
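
The example can be checked mechanically with the standard attribute closure computation. The sketch below is one straightforward way to do it; the function names are illustrative, and the brute-force search over attribute subsets is only practical for small relations like this one.

```python
from itertools import combinations

ATTRS = frozenset("ABCDE")
# The functional dependencies A -> BCDE and BC -> ADE from the example.
FDS = [(frozenset("A"), frozenset("BCDE")), (frozenset("BC"), frozenset("ADE"))]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_super_key(attrs, all_attrs=ATTRS, fds=FDS):
    return closure(attrs, fds) == all_attrs


def candidate_keys(all_attrs=ATTRS, fds=FDS):
    """Minimal super keys, found by trying subsets from smallest to largest."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            subset = frozenset(combo)
            # Skip any super key that properly contains a key found earlier.
            if is_super_key(subset, all_attrs, fds) and not any(k < subset for k in keys):
                keys.append(subset)
    return sorted("".join(sorted(k)) for k in keys)


keys = candidate_keys()
```

Here is_super_key("AD") and is_super_key("ABC") are both true, but neither set is minimal, so only A and BC are reported as candidate keys.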

Entity Types

The Entity Relationship (ER) model consists of different types of entities. The existence of an
entity may depend on the existence of one or more other entities; such an entity is said to
be existence dependent. An entity whose existence does not depend on any other entity is
termed not existence dependent.

Entities are classified as follows, based on their characteristics.

• Strong Entities
• Weak Entities
• Recursive Entities
• Composite Entities

Strong Entity Vs Weak Entity


An entity set that has a primary key of its own is termed a strong entity set. An entity set that does
not have sufficient attributes to form a primary key is termed a weak entity set.

A weak entity is existence dependent. That is, the existence of a weak entity depends on the
existence of an identifying entity set.

The discriminator (or partial key) distinguishes the weak entities that depend on the same identifying entity.

The primary key of a weak entity set is formed by the primary key of the identifying entity set together with
the discriminator of the weak entity set.

A weak entity set is indicated by a double rectangle in the ER diagram.

The discriminator of a weak entity set is underlined with a dashed line in the ER diagram.

Recursive Entity
A recursive entity is one in which a relationship can exist between occurrences of the same
entity set. This occurs in a unary relationship.

Composite Entities
If a many-to-many relationship exists, we must create a bridge entity to convert it into
one-to-many relationships. The bridge entity is composed of the primary keys of each of the entities to be connected.
The bridge entity is known as a composite entity. A composite entity is represented by a
diamond shape within a rectangle in an ER diagram.
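
As a sketch of a bridge entity, the following resolves a many-to-many relationship between students and courses. The table names, column names, and rows are all invented for illustration, and the SQL is run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (rollno TEXT PRIMARY KEY, sname TEXT);
CREATE TABLE course  (code TEXT PRIMARY KEY, title TEXT);
-- Bridge (composite) entity: its key is built from both parents' primary keys.
CREATE TABLE enrollment (
    rollno TEXT REFERENCES student(rollno),
    code   TEXT REFERENCES course(code),
    PRIMARY KEY (rollno, code)
);
INSERT INTO student VALUES ('R1', 'Anu'), ('R2', 'Binu');
INSERT INTO course  VALUES ('DB', 'Databases'), ('OS', 'Operating Systems');
INSERT INTO enrollment VALUES ('R1', 'DB'), ('R1', 'OS'), ('R2', 'DB');
""")

# One student relates to many enrollment rows, and so does one course.
courses_of_r1 = [
    row[0]
    for row in conn.execute("SELECT code FROM enrollment WHERE rollno = 'R1' ORDER BY code")
]
students_in_db = [
    row[0]
    for row in conn.execute("SELECT rollno FROM enrollment WHERE code = 'DB' ORDER BY rollno")
]
```

The many-to-many relationship between student and course is replaced by two one-to-many relationships that meet in enrollment.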

ER Diagram Symbols

The Entity Relationship Diagram (ER Diagram) is used to represent the requirement analysis
at the conceptual design stage. The database is designed from the ER diagram; in other
words, the ER diagram is converted to the database.

Each entity in the ER diagram corresponds to a table in the database.

The attributes of an entity correspond to the fields of a table.

[Figure: Entity Relationship Diagram (ER Diagram) symbols]

Normalization

Normalization is the process of decomposing a relation (table) based on functional
dependencies and the primary key.

• Un-Normalized Form
• First Normal Form (1 NF)
• Second Normal Form (2 NF)
• Third Normal Form (3 NF)
• Boyce – Codd Normal Form (BCNF)
• Fourth Normal Form (4 NF)
• Fifth Normal Form (5 NF)

Un-Normalized Form
An un-normalized relation contains non-atomic values. Each row may contain multiple sets of
values for some of the columns. These multiple values in a single row are called non-atomic
values.

First Normal Form


A relation is said to be in 1NF if the values in the domain of each attribute of the relation are
atomic. Each cell of the table must hold a single value, and no two rows in a table may be
identical.
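
The step from an un-normalized table to 1NF can be sketched in a few lines. The student rows and the field names below are invented, and the helper to_first_normal_form is hypothetical.

```python
# Un-normalized rows: the phones column holds several values in one cell.
unnormalized = [
    {"rollno": "R1", "sname": "Anu", "phones": ["111", "222"]},
    {"rollno": "R2", "sname": "Binu", "phones": ["333"]},
]


def to_first_normal_form(rows, multi_valued_field, atomic_name):
    """Emit one row per value so that every cell holds a single atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_valued_field]:
            atomic = {k: v for k, v in row.items() if k != multi_valued_field}
            atomic[atomic_name] = value
            flat.append(atomic)
    return flat


students_1nf = to_first_normal_form(unnormalized, "phones", "phone")
```

The result has three rows, one per phone number. Note that sname is now repeated for R1, which is the kind of redundancy the higher normal forms go on to remove.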

Second Normal Form


A relation R is said to be in 2NF if it is in 1NF and has no partial
dependency. That is, every non-key attribute depends on the whole key, and no non-key attribute
depends on only a part of the key. Any relation in 1NF whose key is a single attribute is in 2NF.

Third Normal Form


A relation R is in 3NF if it is in 2NF and has no transitive dependency. That is, every non-key
attribute depends on the key alone, and there is no dependency among the
non-key attributes.

Boyce – Codd Normal Form BCNF


A relation R is in BCNF if every determinant is a candidate key.
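
A rough way to test this condition is to check, for each functional dependency, whether its determinant is a super key (which every candidate key is). The sketch below does that with the attribute closure. The relation R(S, C, T) with SC→T and T→C is a commonly used textbook-style illustration and is an assumption here, not part of these notes; trivial dependencies are not treated specially.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the dependencies whose determinant is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(all_attrs)]


# R(A,B,C,D,E) with A -> BCDE and BC -> ADE: both determinants are keys.
r_violations = bcnf_violations("ABCDE", [("A", "BCDE"), ("BC", "ADE")])

# Hypothetical R(S,C,T) with SC -> T and T -> C: T is not a key.
sct_violations = bcnf_violations("SCT", [("SC", "T"), ("T", "C")])
```

The first relation satisfies the condition, while the second one is reported as violating it because of T→C.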

Problem with BCNF: Given a relation R and a set of functional dependencies F, a decomposition
into BCNF may or may not preserve all the given functional dependencies.

Fourth Normal Form
A relation is in 4NF if it is in BCNF and has no multivalued dependency.

Fifth Normal Form


Fifth normal form deals with join dependencies. A relation R is in 5NF if it has no join dependency.
Lossless join decomposition: when we join the decomposed relations, we must get back the
original relation without any loss.
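
The lossless join requirement can be tried out on a tiny relation. Everything below is an invented illustration: an employee relation in which dept determines city, plus small helper functions for projection and natural join over lists of dictionaries.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    seen = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in seen:
            seen.append(projected)
    return seen


def natural_join(left, right):
    """Combine rows that agree on every attribute the two sides share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                merged = {**l, **r}
                if merged not in joined:
                    joined.append(merged)
    return joined


def same_relation(x, y):
    canon = lambda rows: sorted(tuple(sorted(row.items())) for row in rows)
    return canon(x) == canon(y)


employee = [
    {"emp": "E1", "dept": "Sales", "city": "Pune"},
    {"emp": "E2", "dept": "Sales", "city": "Pune"},
    {"emp": "E3", "dept": "IT", "city": "Delhi"},
    {"emp": "E4", "dept": "HR", "city": "Pune"},
]

# Splitting on dept (which determines city) is lossless.
good = natural_join(project(employee, ["emp", "dept"]), project(employee, ["dept", "city"]))
# Splitting on city is lossy: the join invents rows that were never stored.
bad = natural_join(project(employee, ["emp", "city"]), project(employee, ["dept", "city"]))
```

In the lossy case no stored row disappears; the loss is of information, because the join can no longer tell which employee belongs to which Pune department.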

Transaction

The term transaction refers to a collection of operations that form a single logical unit of
work. A transaction T is a logical unit of database processing that includes one or more
database access operations.

Transaction Properties: ACID Properties

• Atomicity
• Consistency
• Isolation
• Durability

Atomicity
A transaction must be atomic. That is, either all the operations of the transaction are
reflected properly in the database, or none of them are.

Consistency
If the database is in a consistent state before the execution of the transaction, the database
remains consistent after the execution of the transaction.

Example: Transaction T1 transfers $100 from Account A to Account B. Account A and
Account B each contain $500 before the transaction.

Transaction T1
Read (A)
A = A - 100
Write (A)
Read (B)
B = B + 100
Write (B)

Consistency Constraint
Before transaction execution: Sum = A + B
Sum = 500 + 500
Sum = 1000

After transaction execution: Sum = A + B
Sum = 400 + 600
Sum = 1000

SUM must be the same before and after the execution of the transaction.
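
The steps of T1 and the consistency check can be traced in a short sketch. A dictionary stands in for the database here, which is an assumption for illustration only, and the full $100 is credited to B, as the sums above require.

```python
def transfer_t1(accounts, amount=100):
    """Transaction T1 written out step by step."""
    a = accounts["A"]      # Read (A)
    a = a - amount         # A = A - 100
    accounts["A"] = a      # Write (A)
    b = accounts["B"]      # Read (B)
    b = b + amount         # B = B + 100
    accounts["B"] = b      # Write (B)


accounts = {"A": 500, "B": 500}
sum_before = sum(accounts.values())
transfer_t1(accounts)
sum_after = sum(accounts.values())
```

The consistency constraint holds because the amount deducted from A equals the amount added to B.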

Isolation
When multiple transactions execute concurrently, each transaction is unaware of the
other transactions executing concurrently in the system. That is, the execution of one transaction
must not interfere with another.

Durability
Changes applied to a database by a committed transaction must be made permanent even
if the system fails.

Components Ensuring Transaction Properties

• Query Processor
• Transaction Manager
• Recovery Manager

The Query Processor handles DML (data manipulation) operations. The Transaction Manager
handles the isolation property of transactions. The Recovery Manager handles atomicity and
durability.

No component of the DBMS ensures the consistency property of a transaction. It is the responsibility
of the programmer to ensure consistency through program code.
