
TRAJECTORY EDUCATION SCHOOL OF COMPUTER SCIENCE

THE NO 1 INSTITUTE FOR UGC-NET IN COMPUTER SCIENCE

1. INTRODUCTION

A Database Management System (DBMS) is a set of computer programs that


controls the creation, maintenance, and the use of the database of an
organization and its end users. It allows organizations to place control of
organization-wide database development in the hands of database
administrators (DBAs) and other specialists. DBMSes may use any of a variety of
database models, such as the network model or relational model. In large
systems, a DBMS allows users and other software to store and retrieve data in a
structured way. It helps to specify the logical organization for a database and
access and use the information within a database. It provides facilities for
controlling data access, enforcing data integrity, managing concurrency control, and restoring the database after failures.

The first DBMSs appeared during the 1960s, at a time when projects of momentous scale were being contemplated, planned, and engineered. Never before had such large datasets been assembled in this new technology. Problems were identified on the shop floor, and solutions were researched and developed, often in real time.

The DBMS became necessary because the data was far more volatile than had earlier been planned, and because the costs associated with data storage media were still a major limiting factor. Data grew as a collection, and it also needed to be managed at a detailed, transaction-by-transaction level. In the 1980s, all the major vendors of hardware systems large enough to support the evolving computerized record-keeping needs of larger organizations bundled some form of DBMS with their system solutions.

The first DBMSs were thus very much vendor specific. IBM, as usual, led the field, but there was a growing number of competitors and clones whose database solutions offered varying entry points into the bandwagon of computerized record-keeping systems.


1.1. DBMS Definitions


Some of the technical terms of DBMS are defined below:

1.1.1. Database

A database is a logically coherent collection of data with some inherent meaning, representing some aspect of the real world, and which is designed, built, and populated with data for a specific purpose.

Example: consider names, telephone numbers, and addresses. You could record this data in an indexed address book; to maintain it as a database we generally use software such as dBASE IV, MS Access, or Excel.

1.1.2. DBMS

It is a collection of programs that enables users to create and maintain a

database. In other words, it is general-purpose software that provides users with facilities for defining, constructing, and manipulating the database for various applications.

1.1.3. Database system

The database and the DBMS software together are called a database system.

1.2. Components of a database system

1.2.1. Database administrator (DBA)


In any organization where many persons use the same resources, there is a need for a chief administrator to manage these resources.
In a database environment, the primary resource is the database itself, and the secondary resources are the DBMS and the related software. To manage these resources we need a database administrator.
The DBA is responsible for authorizing access to the database and for acquiring software and hardware resources as needed.

1.2.2. Database designer


They are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. It is the database designer's responsibility to communicate with the database users and to understand their requirements.

1.2.3. End users


These are the persons whose jobs require access to the database for querying, updating, and generating reports. The database generally exists for their use.

There are several categories of end users:

A. Casual end users: access the database occasionally, but they may need different information each time.
B. Parametric end users: make up a sizable portion of database end users. Their main job function involves constantly querying and updating the database, using standard types of queries and updates, called canned transactions, that have been carefully programmed and tested. For example, bank tellers check account balances and post withdrawals and deposits.
C. Sophisticated end users: include engineers, scientists, and business analysts who thoroughly familiarize themselves with the facilities of the DBMS so as to implement their applications and meet complex requirements.

D. Stand-alone end users: maintain personal databases by using ready-made software packages that provide easy-to-use menu-based or graphical interfaces. Example: tax packages that store a variety of personal financial data for tax purposes.
E. System analysts and application programmers: system analysts determine the requirements of end users, especially parametric end users, and develop specifications for canned transactions that meet those requirements. Application programmers implement these specifications as programs and then test, debug, document, and maintain the canned transactions. These programmers are also known as software engineers.

1.3. Advantages of DBMS


1. Controlling redundancy
2. Restricting unauthorized access
3. Providing persistent storage for program objects and data structures
4. Database interfacing
5. Providing multiple user interfaces
6. Representing complex relationships among data
7. Enforcing integrity constraints
8. Providing backup and recovery

1.4. Disadvantages of a File Processing System


1. Data redundancy and inconsistency
2. Difficulty in accessing data
3. Data isolation
4. Data integrity problems
5. Concurrent-access anomalies
6. Security problems

2. DATA MODELS
A data model is a set of concepts that can be used to describe the structure of a database.

By the structure of the database we mean the data types, relationships, and constraints that should hold for the data. Most data models also include a set of basic operations for specifying retrievals and modifications of the data.

2.1. Categories of data models


A. High-level or conceptual data models: provide concepts that are close to the way users perceive and will use the database. High-level data models use concepts such as entities, attributes, and relationships.
B. Entity: represents a real-world object, such as an employee or a project, that is stored in the database.
C. Attribute: represents some property of interest that further describes an entity, such as the employee's name or salary.
D. Relationship: represents an association among two or more entities.
E. Low-level or physical data models: describe how the data is stored in the computer.
F. Representational or implementation data models: hide some of the details of data storage but can be implemented on a computer system in a direct way.

2.2. Schemas and instances


The description of the database is called the database schema. The database schema is specified during database design and is not expected to change frequently. A displayed schema is called a schema diagram.

The actual data in the database may change frequently; in a database, changes occur all the time, for example when we add a new student or enter a new grade for a student. The data in the database at a particular moment in time is called the database state, instance, or snapshot.

2.3. DBMS architecture


Three important characteristics of the database approach are:
1. Insulation of programs and data
2. Support of multiple user views
3. Use of a catalog to store the database schema

The architecture of the database system is called the three-schema architecture. It defines schemas at three levels:


1. Internal schema
2. Conceptual schema

3. External schema

1. Internal schema: describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
2. Conceptual schema: describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures. A high-level data model or an implementation data model can be used at this level.
3. External schema: describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.
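
As a rough illustration in SQL (the table, view, and index names here are assumed examples, not taken from the text), the three levels might be pictured as follows:

-- Conceptual level: logical structure of the data
CREATE TABLE student (
    ssn   CHAR(9) PRIMARY KEY,
    name  VARCHAR(50),
    dept  VARCHAR(30),
    grade DECIMAL(3,1)
);

-- External level: the part of the database one user group sees (grade hidden)
CREATE VIEW student_directory AS
    SELECT ssn, name, dept FROM student;

-- Internal level: a physical access structure (storage detail)
CREATE INDEX idx_student_dept ON student (dept);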

2.4. Data independence


The three-schema architecture can be used to explain the concept of data independence, which is defined as the capacity to change the schema at one level of the database system without having to change the schema at the next higher level.

There are two types of data independence:

2.4.1. Logical data independence

This is the capacity to change the conceptual schema without having to change the external schemas or application programs. We may change the conceptual schema to expand the database or to reduce it.

2.4.2. Physical data independence


This is the capacity to change the internal schema without having to change the conceptual or external schemas. Changes to the internal schema may be needed because some physical files have to be reorganized, for example by creating additional access structures to improve the performance of retrievals or updates.

2.5. Classification of database management systems


We can categorize DBMSs according to the data model they are based on:

1. Relational data model
2. Network data model
3. Hierarchical data model
4. Object-oriented data model

2.5.1. Relational data model


The relational data model represents a database as a collection of tables, where each table can be stored as a separate file. Most relational databases have a high-level query language and support a limited form of user views.

2.5.2. Network data model


The network model represents data as record types and also represents a limited type of 1:N relationship, called a set type.


2.5.3. Hierarchical data model


It represents data as hierarchical tree structures. Each hierarchy represents a number of related records. There is no standard language for the hierarchical model.

2.5.4. Object-oriented data model


It defines a database in terms of objects, their properties, and their operations. Objects with the same structure and behaviour belong to a class, and classes are organized into hierarchies or acyclic graphs.

2.6. Database languages and interfaces

2.6.1. DBMS languages


The first step is to specify the conceptual and internal schemas for the database and any mappings between the two. In many DBMSs, where no strict separation of levels is maintained, one language, called the data definition language (DDL), is used by the DBA and the database designers to define both schemas.
The DBMS has a DDL compiler whose function is to process DDL statements, identify the descriptions of the schema constructs, and store the schema description in the DBMS catalog. Where a clear separation is maintained between the
• conceptual schema and
• internal schema:

A. the DDL is used to specify the conceptual schema only, and
B. the SDL (storage definition language) is used to specify the internal schema only.

The mapping between the two levels may be specified in either of these languages. In some DBMSs a VDL (view definition language) is used to specify the user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to

specify both the conceptual and external schemas. Once the database schemas are created and the database is filled with data, users must be able to manipulate the database. Typical manipulations include:

• Retrieval
• Insertion
• Deletion
• Modification

For that purpose the DBMS provides a DML (data manipulation language).
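
As a small, hedged sketch of these languages (the employee table and its columns are assumptions used only for illustration), a DDL statement defines part of the schema and DML statements perform the four manipulations listed above:

-- DDL: defining part of the conceptual schema
CREATE TABLE employee (
    ssn    CHAR(9) PRIMARY KEY,
    ename  VARCHAR(40),
    salary DECIMAL(10,2)
);

-- DML: the four typical manipulations
SELECT ename, salary FROM employee WHERE salary > 50000;         -- retrieval
INSERT INTO employee VALUES ('123456789', 'John Smith', 30000);  -- insertion
UPDATE employee SET salary = 32000 WHERE ssn = '123456789';      -- modification
DELETE FROM employee WHERE ssn = '123456789';                    -- deletion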

2.6.1.1. DML (data manipulation language)

There are two main types of DML:


1. High-level or non-procedural DML (e.g., SQL)
2. Low-level or procedural DML

1. High-level or non-procedural DML: can be used to specify complex database operations. Many DBMSs allow high-level DML statements either to be entered interactively from a terminal or to be embedded in a general-purpose programming language. Embedded DML statements must be identified within the program so that they can be extracted by a pre-compiler and processed by the DBMS. High-level DMLs such as SQL can specify and retrieve many records in a single statement and hence are called set-at-a-time or set-oriented DMLs.

2. Low-level or procedural DML: must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records or objects from the database and processes each separately. Hence it needs to use programming-language constructs, such as looping, to retrieve and process each record from a set of records; low-level DMLs are also called record-at-a-time DMLs because of this property. Whenever DML commands, whether high or low level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage. A high-level DML used in a stand-alone, interactive manner is called a query language.
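
A hedged sketch of the difference (table and column names are assumptions): one set-oriented SQL statement versus a cursor that is processed record-at-a-time from a host-language loop.

-- Set-at-a-time (high-level, non-procedural): one statement touches many records
UPDATE employee SET salary = salary * 1.05 WHERE dno = 5;

-- Record-at-a-time (procedural style): a cursor fetched row by row
DECLARE emp_cur CURSOR FOR
    SELECT ssn, salary FROM employee WHERE dno = 5;
OPEN emp_cur;
-- a host-language loop would repeatedly execute:
--   FETCH emp_cur INTO :ssn, :salary;   and process one record at a time
CLOSE emp_cur;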

2.6.2. DBMS interfaces

User friendly interfaces provided by a DBMS may include the following:

Menu-based interfaces: these interfaces present the user with lists of options, called menus, which lead the user through the formulation of a request. The

query is composed step by step by picking options from a menu that is displayed by the system.

Forms-based interfaces: a forms-based interface displays a form to each user.


Users can fill out all of the form entries to insert new data, or they fill in only certain entries. Forms are usually designed and programmed for parametric end users.

Graphical user interfaces: a GUI displays a schema to the user in diagrammatic form. The user can then specify a query by manipulating the diagram. Most GUIs use a pointing device, such as a mouse, to pick certain parts of the displayed schema.

Natural language interfaces: a natural language interface refers to the words in its schema, as well as to a set of standard words, to interpret the request. If the interpretation is successful, the interface generates a high-level query corresponding to the natural language request and submits it to the DBMS for processing.

Interfaces for parametric users: parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly. System analysts and programmers design and implement a special interface for such users, typically assigning keys so that each command runs automatically.

Interfaces for the DBA: these interfaces are used by the DBA staff. They include commands for creating accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing the storage structures of a database.

2.7. Database system environment


The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system, which schedules disk input/output.

2.7.1. Stored data manager


This module of the DBMS:
A. controls access to the DBMS information that is stored on disk;
B. uses basic operating-system services for carrying out low-level data transfer between the disk and computer main storage;
C. handles the buffers in main memory.

2.7.2. DDL compiler


It processes schema definitions specified in the DDL and stores the descriptions of the schemas in the DBMS catalog. The DBMS catalog includes information such as:
• Names of the files
• Data items

• Storage details of each file
• Mapping information

2.7.3. Run-time database processor


It handles database accesses. It receives retrieval or update operations and carries them out on the database. Access to the disk goes through the stored data manager.

2.7.4. Query compiler


It handles high-level queries that are entered interactively and generates calls to the run-time database processor for executing the code.

2.7.5. Pre-compiler
It extracts DML commands from an application program written in a host language. These commands are then sent to the DML compiler for compilation into object code.

2.8. Entity Relationship Model


For designing a successful database application, two things play a major role:

• the database application
• the application programs

Database application: refers to a particular database (for example, a bank database) and the associated programs that implement the database queries and updates.

Example: programs that implement database updates corresponding to customers making deposits and withdrawals. Such programs provide user-friendly graphical user interfaces (GUIs) utilizing forms, and these application programs must also be tested.

2.8.1. Entities and attributes

Entities: the basic object that the ER model represents is an entity. An entity may be an object with a physical existence (a particular person, car, house, or employee) or an object with a conceptual existence (a company, a job, or a university course).

Attribute: a particular property that describes an entity.


Example: an EMPLOYEE entity may be described by the employee's name, age, address, salary, and job.

Composite attributes: can be divided into subparts, which represent more basic attributes with independent meanings.

Simple or atomic attributes: attributes that are not divisible are called simple or atomic attributes.

Single-valued attributes: most attributes have a single value for a particular entity; such attributes are called single-valued attributes. Example: Age is a single-valued attribute of a person.

Multi-valued attributes: attributes that may have more than one value for a particular entity. Example: the Colors attribute of a car; one car may have a single color while another has several. Such attributes are called multi-valued attributes.

Stored and derived attributes: in some cases two attribute values are related, for example the Age and BirthDate of a person. The value of Age can be determined from the current date and the value of the person's BirthDate; the Age attribute is therefore called a derived attribute, and BirthDate is called a stored attribute.

Null values: in some cases a particular entity may not have an applicable value for an attribute. Example: apartment number.

Complex attributes: composite attributes are written between parentheses ( ), with their components separated by commas, and multi-valued attributes between braces { }. Attributes that nest composite and multi-valued attributes in this way are called complex attributes.
Example: {AddressPhone ({Phone (AreaCode, PhoneNumber)})}

2.8.2. Entity types, entity sets, keys and value sets


Entity types: an entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes.

Entity sets: the collection of all entities of a particular entity type in the database at any point in time is called an entity set.


Key attributes: an entity type usually has a key attribute, whose values are distinct for each individual entity in the collection.

Value sets (domains of attributes): each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity.
Example: for EMPLOYEE, the Age attribute may be specified to lie in the range 16 to 70.
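
In SQL terms, such a value set can be approximated with a CHECK constraint; a minimal sketch (the table itself is an assumed example):

CREATE TABLE employee (
    ssn CHAR(9) PRIMARY KEY,
    age INT CHECK (age BETWEEN 16 AND 70)   -- value set (domain) of the Age attribute
);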

2.8.3. Relationship types, sets and instances


An association among entities is called a relationship. A relationship type R among n entity types E1, E2, ..., En defines a set of associations among entities from these types. In other words, the relationship set R is a set of relationship instances.


Degree of a relationship type: the degree of a relationship type is the number of participating entity types.
Example: the WORKS_FOR relationship is of degree two.

Degree two: binary relationship

Degree three: ternary relationship

Role names: each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance and helps to explain what the relationship means.

Recursive relationships: role names are not technically necessary when all the participating entity types are distinct, since each entity type name can then be used as the role name.

In some cases, however, the same entity type participates more than once in a relationship type, in different roles. In such cases role names become essential for distinguishing the meaning of each participation. Such relationship types are called recursive relationships.

For example, employee and supervisor entities are both members of the same EMPLOYEE entity type.

Weak entity types: entity types that do not have a key attribute of their own are called weak entity types. A weak entity type is sometimes called a child entity type.

Regular/strong entity types: entity types that do have a key attribute are called regular or strong entity types. The identifying entity type is also sometimes called the parent entity type or dominant entity type.

2.8.4. Notations for ER diagram

The common symbols and their meanings are:
• Rectangle: entity type
• Double rectangle: weak entity type
• Diamond: relationship type
• Double diamond: identifying relationship type of a weak entity
• Ellipse (oval): attribute
• Double ellipse: multi-valued attribute
• Dashed ellipse: derived attribute
• Underlined attribute name: key attribute
• Straight line: connects an attribute to its entity type, or an entity type to a relationship type


2.8.5. Generalization
Generalization is a reverse abstraction process in which we suppress the differences among several entity types, identify their common features, and generalize them into a single superclass.


2.8.6. Aggregation
Aggregation is an abstraction concept for building composite objects from their component objects. There are cases where this concept is useful and relates to the EER model:
• when we aggregate attribute values of an object to form the whole object;
• when we represent an aggregation relationship as an ordinary relationship;
• when we combine objects that are related by a particular relationship instance.

3. RELATIONAL MODEL
The relational model represents the database as a collection of relations. A relation can be thought of as a table of values, where each row in the table represents a collection of related data values.
In the relational model, each row in a table corresponds to an entity or a relationship. In relational model terminology, a row is called a tuple, a column is called an attribute, and the table is called a relation.

The data type describing the kinds of values that can appear in each column is called a domain.

Domain:
A domain D is a set of atomic values. Atomic means that each value in the domain is indivisible.
Example: USA_phone_number - the set of ten-digit phone numbers.

Relation schemas:
A relation schema R is denoted R(A1, A2, ..., An), where R is the relation name and A1, ..., An is a list of attributes.

Example: STUDENT (name, SSN, home_phone, address, office_phone, age)

3.1. Characteristics of relation


1. Ordering of tuples in a relation: a relation is defined as a set of tuples; tuples in a relation do not have any particular order.
2. Ordering of values within a tuple: an n-tuple is an ordered list of n values, so the ordering of values within a tuple is significant; each attribute value occupies its own position within the tuple.

3. Values in the tuples: each value in a tuple is an atomic value, i.e. it is not divisible into components. In the basic relational model, composite and multi-valued attributes are not allowed.
4. Interpretation of a relation: a relation schema can be interpreted as a declaration or as a type of assertion.

Relational constraints: these are restrictions that apply to the database schema. They include:

Domain constraints: specify that the value of each attribute must be an atomic value from the domain of that attribute.
Key constraints: a relation is defined as a set of tuples, and all elements of a set are distinct; hence all tuples in a relation must be distinct. No two tuples can have the same combination of values for all their attributes.
Entity integrity constraints: no primary key value can be null, because the primary key is used to identify individual tuples in a relation.
Referential integrity constraints: specified between two relations and used to maintain consistency among tuples of the two relations. They are based on the concept of a foreign key.
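
A minimal SQL sketch of these constraints (the two tables are assumed for illustration): the column types express domain constraints, PRIMARY KEY expresses the key and entity integrity constraints, and FOREIGN KEY expresses referential integrity.

CREATE TABLE department (
    dnumber INT PRIMARY KEY,       -- key constraint; cannot be NULL (entity integrity)
    dname   VARCHAR(30) NOT NULL
);

CREATE TABLE employee (
    ssn     CHAR(9) PRIMARY KEY,
    ename   VARCHAR(40),
    dnumber INT,
    FOREIGN KEY (dnumber) REFERENCES department (dnumber)   -- referential integrity
);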

3.2. Operations of the relational model


Operations on the relational model can be categorized into retrievals and updates. There are three basic update operations on relations.
Insert operation: provides a list of attribute values for a new tuple t that is to be inserted into a relation R.
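
For symmetry with the delete and update examples below, an insert might look like this (the attribute values are purely illustrative):

INSERT INTO employee (ename, ssn, age)
VALUES ('John Smith', '985676', 24);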

Delete operation: used to delete a tuple from a relation. If the tuple being deleted is referenced by a foreign key from other tuples in the database, the deletion can violate referential integrity. A condition is used to select the tuple(s) to delete.

Example:

DELETE FROM employee
WHERE ssn = '985676';

Update operation: used to change the values of one or more attributes in a tuple of a relation R.

Example:

UPDATE employee
SET age = 25
WHERE ssn = '576787';

3.3. Relational algebra operation

1. Select operation: used to select a subset of the tuples from a relation that satisfy a selection condition; that is, it selects some of the rows of a relation.

2. Project operation: used to select some of the columns (a set of attributes) from a relation.

3. Rename operation: used to rename either the relation name or the attribute names, or both.

RENAME (old relation name) TO (new relation name)
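
A hedged sketch of how these three operations map to SQL (relation and attribute names are assumed):

-- Select (σ): choose rows that satisfy a condition
SELECT * FROM employee WHERE salary > 30000;

-- Project (π): choose a subset of the columns (DISTINCT removes duplicate tuples)
SELECT DISTINCT ename, ssn FROM employee;

-- Rename (ρ): give the relation and/or attributes new names within a query
SELECT e.ename AS employee_name FROM employee AS e;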

3.4. Set theoretic operation


Several set theoretic operations are used to merge the elements of two sets in
various ways. These operations are as follows.

3.4.1. Union
The result of this operation, denoted R ∪ S, is a relation that includes all tuples that are either in R, or in S, or in both R and S. Duplicate tuples are eliminated.

R ∪ S = S ∪ R (union is commutative)

SELECT salesman_id, name
FROM sales_master
WHERE city = 'Mumbai'
UNION
SELECT client_id, name
FROM client_master
WHERE city = 'Mumbai';

3.4.1.1. Restrictions on using a union operation

1. The number of columns in all the queries should be the same.
2. The data types of the corresponding columns in each query must be the same.
3. Union cannot be used in a subquery.

4. Aggregate functions cannot be used with a union clause.

3.4.2. Intersection
The result of this operation, denoted R ∩ S, is a relation that includes all tuples that are in both R and S.

SELECT salesman_id, name
FROM sales_master
WHERE city = 'Mumbai'
INTERSECT
SELECT client_id, name
FROM client_master
WHERE city = 'Mumbai';

3.4.3. Set difference


The result of this operation, denoted R − S, is a relation that includes all tuples that are in R but not in S.

SELECT product_no FROM product_master
MINUS
SELECT product_no FROM sales_order;

3.4.4. Join operation


The join operation, denoted ⋈, is used to combine related tuples from two relations into single tuples. This operation is very important because it allows us to process relationships among relations.

R ⋈ (join condition) S

There are several categories of join operations:
1. Cartesian product (cross product or cross join): the main difference between the Cartesian product and the join is that in a join only the combinations of tuples satisfying the join condition appear in the result.
2. Equijoin: a join in which the only comparison operator used is =. In the result of an equijoin, each pair of join attributes has identical values in every tuple, so one of them is superfluous; the natural join, denoted R * S, removes the superfluous attribute.
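
A hedged SQL sketch of these joins (the employee and department tables are assumed):

-- Equijoin: combine tuples whose department numbers are equal
SELECT e.ename, d.dname
FROM employee e JOIN department d ON e.dnumber = d.dnumber;

-- Cartesian product (cross join): every combination of tuples, no join condition
SELECT * FROM employee CROSS JOIN department;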

3.4.5. Division operation


The division operation is used for a special kind of query that sometimes occurs in database applications, typically of the form "find the entities that are related to all entities of another set".
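
One hedged SQL formulation of a division-style query, assuming relations employee(ssn, ...), project(pno, ...), and works_on(ssn, pno), uses a double NOT EXISTS:

-- Employees who work on ALL projects
SELECT e.ssn
FROM employee e
WHERE NOT EXISTS (
    SELECT p.pno
    FROM project p
    WHERE NOT EXISTS (
        SELECT *
        FROM works_on w
        WHERE w.ssn = e.ssn AND w.pno = p.pno));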


3.4.6. Aggregate functions


These functions operate on collections of values from the database. Common aggregate functions are:

SUM, AVERAGE, MAX, MIN

3.4.7. COUNT
This function is used to count tuples or attribute values.

3.4.8. Grouping
This is used to group the tuples of a relation on the values of certain attributes, usually so that an aggregate function can be applied to each group.

SELECT company, SUM(amount) FROM sales
GROUP BY company
HAVING SUM(amount) > 10000;

3.4.9. Recursive closure operation


This operation is applied to a recursive relationship, such as retrieving all supervisees of an employee at all levels.

3.4.10. Outer join

The natural join is denoted R * S, where R and S are relations.

Only tuples of R that have matching tuples in S appear in the result; tuples without a match are eliminated, as are tuples with null values in the join attributes.
A set of operations, called outer joins, can be used when we want to keep all the tuples of R, or all the tuples of S, or all tuples of both relations in the result, whether or not they have matching tuples.
The outer union is used to take the union of tuples of two relations that are not union compatible but are partially compatible, i.e. only some of their attributes are union compatible; these attributes should include a key for both relations.

Left outer join, R ⟕ S:
keeps every tuple of R; if no matching tuple is found in S, the S attributes are filled with null values.

Right outer join, R ⟖ S:
keeps every tuple of S; unmatched R attributes are set to null.

Full outer join, R ⟗ S:
keeps all tuples of both relations; where no match is found, the missing attributes are set to null.
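
In SQL these outer joins are written as follows (a sketch; the tables are assumed):

-- Left outer join: keep every employee, pad missing department data with NULLs
SELECT e.ename, d.dname
FROM employee e LEFT OUTER JOIN department d ON e.dnumber = d.dnumber;

-- Right outer join: keep every department
SELECT e.ename, d.dname
FROM employee e RIGHT OUTER JOIN department d ON e.dnumber = d.dnumber;

-- Full outer join: keep unmatched tuples from both sides
SELECT e.ename, d.dname
FROM employee e FULL OUTER JOIN department d ON e.dnumber = d.dnumber;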

Outer union:
STUDENT (name, SSN, department, advisor)
FACULTY (name, SSN, department, rank)
Result (name, SSN, department, advisor, rank)

All the tuples of both relations appear in the result; attributes that do not apply to a tuple are set to null.

3.5. Tuple relational calculus


Relational calculus is a formal query language. We write one declarative expression to specify a retrieval request, and hence there is no description of how to evaluate the query.
The tuple relational calculus is based on specifying a number of tuple variables. Each tuple variable ranges over a particular relation and may take as its value any individual tuple from that relation. A simple tuple relational calculus query is of the form

{t | COND(t)}

The result is the set of all tuples t that satisfy COND(t).
Example: find all employees whose salary is greater than 50,000.

{t | EMPLOYEE(t) AND t.salary > 50000}
Attribute names are qualified with tuple-variable names, which resembles SQL notation:
{t.fname, t.lname | EMPLOYEE(t) AND t.salary > 50000}

The equivalent SQL query is:

SELECT t.fname, t.lname
FROM employee AS t
WHERE t.salary > 50000;

3.5.1. Expressions and formulas in tuple relational calculus


A general expression of the tuple relational calculus is of the form
{t1.A1, t2.A2, ..., tn.An | COND(t1, t2, ..., tn)}
where
t1, t2, ..., tn are tuple variables,
each Ai is an attribute of the relation on which ti ranges, and
COND is a condition or formula.

Formula:
A formula is made up of predicate calculus atoms, which can be one of the following:
1. An atom of the form R(ti), where R is a relation name and ti is a tuple variable. This atom identifies the range of the tuple variable ti as the relation whose name is R.
2. An atom of the form ti.A op tj.B, where op is a comparison operator in the set {=, <, <=, >, >=, ≠}, ti and tj are tuple variables, A is an attribute of the relation on which ti ranges, and B is an attribute of the relation on which tj ranges.
3. An atom of the form ti.A op c or c op tj.B, where op is a comparison operator as above, ti and tj are tuple variables, A is an attribute of the relation on which ti ranges, B is an attribute of the relation on which tj ranges, and c is a constant value.

A formula is made up of one or more atoms connected via the logical operators AND, OR, and NOT, and is defined recursively as follows:
1. Every atom is a formula.
2. If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2), NOT (F1), and NOT (F2).

3. The truth values of these formulas are derived from their component formulas F1 and F2 as follows:
a. (F1 AND F2) is true if both F1 and F2 are true; otherwise it is false.
b. (F1 OR F2) is false if both F1 and F2 are false; otherwise it is true.
c. NOT (F1) is true if F1 is false; it is false if F1 is true.
d. NOT (F2) is true if F2 is false; it is false if F2 is true.

3.5.2. Existential and universal quantifiers


Two special symbols called quantifiers can appear in formulas; these are
1. the universal quantifier (∀), and
2. the existential quantifier (∃).

First we need to define the concept of free and bound tuple variables in a formula.
Bound: a tuple variable t is bound if it is quantified, meaning that it appears in an (∃t) or (∀t) clause.
Free: otherwise it is free.
We can define a tuple variable in a formula as free or bound according to the following rules:
1. An occurrence of a tuple variable in a formula F that is an atom is free in F.
2. An occurrence of a tuple variable t is free or bound in a formula made up of logical connectives, (F1 AND F2), (F1 OR F2), NOT (F1), NOT (F2), depending on whether it is free or bound in F1 and F2; a tuple variable may be free in one of them and bound in the other.
3. All free occurrences of a tuple variable t in F are bound in a formula F' of the form
F' = (∃t)(F) or
F' = (∀t)(F)
The tuple variable is bound to the quantifier specified in F'.
Example:
F1 = d.dname = 'Research'
F2 = (∃t)(d.dname = t.DNO)
F3 = (∀d)(d.mgrssn = '12345677')
The tuple variable d is free in both F1 and F2, whereas it is bound to the universal quantifier in F3;
t is bound to the existential quantifier in F2.

3.5.3. Rules for the definition of a formula

1. If F is a formula, then so is (∃t)(F), where t is a tuple variable.

The formula (∃t)(F) is true if F evaluates to true for some (at least one) tuple assigned to the free occurrences of t in F; otherwise (∃t)(F) is false.
2. If F is a formula, then so is (∀t)(F), where t is a tuple variable.
The formula (∀t)(F) is true if F evaluates to true for every tuple (in the universe) assigned to the free occurrences of t in F; otherwise (∀t)(F) is false.

Note: ∃ is called the existential quantifier because a formula (∃t)(F) is true if there exists some tuple that makes F true.

∀ is called the universal quantifier because (∀t)(F) is true only if F is true for every possible tuple.

3.6. Transforming the universal and existential quantifier


We now use some transformations from mathematical logic that relate the universal and existential quantifiers. It is possible to transform a universal quantifier into an existential quantifier and vice versa; for example, (∀t)(F) ≡ NOT (∃t)(NOT (F)).

3.6.1. Domain relational calculus


There is another type of relational calculus called the domain relational calculus, or simply domain calculus. The QBE language is related to the domain calculus; the formal specification of the domain calculus was proposed after the development of the QBE language.
The domain calculus differs from the tuple calculus in the type of variables used in formulas: the variables range over single values drawn from domains of attributes rather than over whole tuples.
An expression of the domain calculus is of the form
{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, ..., xn+m)}
where
x1, x2, ..., xn, xn+1, ..., xn+m are domain variables that range over domains of attributes, and
COND is the condition or formula of the domain relational calculus.

A formula is made up of atoms. An atom can be one of the following:
1. An atom of the form R(x1, x2, ..., xj), where R is the name of a relation of degree j and each xi, 1 <= i <= j, is a domain variable.
2. An atom of the form xi op xj, where op is a comparison operator in the set {=, <, <=, >, >=, ≠} and xi and xj are domain variables.
3. An atom of the form xi op c or c op xj, where op is a comparison operator in the set, xi and xj are domain variables, and c is a constant value.

4. DATABASE DESIGN
Conceptual database design gives us a set of relational schemas and integrity constraints (ICs) that can be regarded as a good starting point for the final database design. This initial design must be refined by taking the ICs into account more fully than is possible with just the ER model constructs, and also by considering performance criteria and typical workloads.
We concentrate on an important class of constraints called functional dependencies. Other kinds of ICs, for example multi-valued dependencies and join dependencies, also provide useful information; they can sometimes reveal redundancies that cannot be detected using functional dependencies alone.

4.1. Schema Refinement


Redundant storage of information is the root cause of many schema problems. Although decomposition can eliminate redundancy, it can lead to problems of its own and should be used with caution.

4.1.1. Guidelines for relation schema

1. Semantics of the attributes: every attribute in a relation must belong to that relation; as we know, a relation is a collection of attributes that together have a meaning. Semantics refers to how the attribute values in a tuple relate to one another.
Example: (ename, ssn, bdate, address, dnumber)
Each attribute gives information about an employee.

2. Redundant information in tuples:


Redundant tuples waste storage space and lead to update anomalies, so we should design relations that avoid them. The anomalies to guard against are:
• Insertion anomalies

• Deletion Anomalies
• Modification Anomalies

3. Reducing null values in tuples:


Nulls waste space at the storage level and may create problems with understanding the meaning of the attributes, since null values can have multiple interpretations:
• the attribute value does not apply to this tuple;
• the attribute value is unknown for this tuple;
• the value is known but has not been recorded yet.

4. Spurious tuples:
Spurious tuples are tuples that represent wrong or invalid information; they can appear when a badly decomposed relation is joined back together. (In worked examples spurious tuples are often marked with asterisks (*).)
Example:
Emp_loc (ename, plocation)
Emp_proj (ssn, pno, hours, pname, plocation)

4.2. Functional Dependencies


A functional dependency, denoted X → Y, is a constraint between two sets of attributes X and Y. It specifies that for any two tuples t1 and t2 in r(R),

if t1[X] = t2[X], then we must also have t1[Y] = t2[Y].

This means that the values of the Y component of a tuple depend on, or are determined by, the values of the X component (the reverse does not necessarily hold).
X is called the left-hand side of the FD, and Y the right-hand side.
X functionally determines Y in a relation R if and only if, whenever two tuples of r(R) agree on their X values, they also agree on their Y values.

1. If X is a candidate key of R, then X → Y for any subset of attributes Y of R, because the key constraint implies that no two tuples can have the same value of X.
2. If X → Y holds in R, this does not say whether or not Y → X holds in R.


4.2.1. Inference rules for Functional Dependencies


The set of functional dependencies specified on a relation schema R is denoted by F. It is impossible to specify explicitly all functional dependencies that may hold; the set of all dependencies that can be inferred from F is called the closure of F and is denoted by F+.

F = {ssn → {ename, bdate, address, dnumber},
     dnumber → {dname, dmgrssn}}

Some dependencies that can be inferred from F are:
ssn → {dname, dmgrssn}
ssn → ssn
dnumber → dname

The dependencies in F+ are also known as inferred dependencies. To infer them in a systematic way we use inference rules; the notation F |= X → Y denotes that the functional dependency X → Y is inferred from the set of functional dependencies F.

4.2.2. Axioms to check whether an FD holds (Armstrong's inference rules)

IR1 (reflexivity): if Y ⊆ X, then X → Y.
IR2 (augmentation): if X → Y, then XZ → YZ.
IR3 (transitivity): if X → Y and Y → Z, then X → Z.
From these, further rules can be derived: decomposition (if X → YZ, then X → Y and X → Z), union (if X → Y and X → Z, then X → YZ), and pseudotransitivity (if X → Y and WY → Z, then WX → Z).
IR1 through IR3 form a sound and complete set of inference rules.


4.2.3. An Algorithm to Compute Attribute Closure X+ with respect to F


Let X be a subset of the attributes of a relation R, and let F be the set of functional dependencies that hold for R.
1. Create a hypergraph in which the nodes are the attributes of the relation in question.
2. Create hyperedges for all functional dependencies in F.
3. Mark all attributes belonging to X.
4. Recursively continue marking unmarked attributes of the hypergraph that can be reached by a hyperedge whose ingoing edges are all marked.
Result: X+ is the set of attributes that have been marked by this process.
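
As a worked illustration using the set F given in 4.2.1: to compute {ssn}+, first mark ssn. The dependency ssn → {ename, bdate, address, dnumber} has its entire left-hand side marked, so mark ename, bdate, address, and dnumber. Now dnumber → {dname, dmgrssn} has its left-hand side marked, so mark dname and dmgrssn. No further attributes can be marked, so {ssn}+ = {ssn, ename, bdate, address, dnumber, dname, dmgrssn}; since ssn determines every attribute mentioned in F, it is a key of a relation containing exactly those attributes.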

4.2.3.1. Hyper graph for F


4.3. NORMALIZATION

4.3.1. Basics of normal forms


A set of functional dependencies is specified for each relation, and the normalization process proceeds in a top-down fashion, evaluating each relation and decomposing relations as necessary. Codd (1972) initially proposed 1NF, 2NF, and 3NF. A stronger definition of 3NF, called Boyce-Codd normal form, was proposed later by Boyce and Codd.
All these normal forms are based on the functional dependencies among the attributes of a relation. Later, 4NF and 5NF were proposed, based on the concepts of multi-valued dependencies and join dependencies respectively.

4.3.1.1. 1NF (first normal form)


First normal form was defined to disallow multi-valued attributes, composite attributes, and their combinations. It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
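
A hedged SQL sketch of moving a design into 1NF (the department/locations example is an assumption, not from the text): a multi-valued Locations attribute is given a relation of its own instead of being stored as a list inside one column.

CREATE TABLE department (
    dnumber INT PRIMARY KEY,
    dname   VARCHAR(30)
);

CREATE TABLE dept_locations (
    dnumber   INT REFERENCES department (dnumber),
    dlocation VARCHAR(30),
    PRIMARY KEY (dnumber, dlocation)   -- each (department, location) pair stored once
);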

4.3.1.2. 2NF (second normal form)

Second normal form is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency no longer holds; that is, for any A ∈ X, (X − {A}) does not determine Y. X → Y is a partial dependency if some attribute can be removed from X and the dependency still holds. A relation is in 2NF if every non-prime attribute is fully functionally dependent on every key of the relation.

4.3.1.3. 3NF (third normal form)


Third normal form is based on the concept of transitive dependency. A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R such that both

X → Z and Z → Y hold.
A relation schema R is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the primary key.
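
As a worked illustration using the dependencies from 4.2.1 (the relation itself is assumed for the example): in EMP_DEPT(ename, ssn, bdate, address, dnumber, dname, dmgrssn) we have ssn → dnumber and dnumber → dname, where dnumber is neither a key nor a subset of a key, so ssn → dname is a transitive dependency and EMP_DEPT is not in 3NF. Decomposing it into EMPLOYEE(ename, ssn, bdate, address, dnumber) and DEPARTMENT(dnumber, dname, dmgrssn) removes the transitive dependency.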

4.3.1.4. Boyce-Codd normal form (BCNF)


Boyce-Codd normal form (or BCNF) is a normal form used in database
normalization. It is a slightly stronger version of the third normal form (3NF). A
table is in Boyce-Codd normal form if and only if, for every one of its non-trivial
functional dependencies X → Y, X is a superkey - that is, X is either a candidate
key or a superset thereof.


Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF
table which does not have multiple overlapping candidate keys is guaranteed to
be in BCNF. Depending on what its functional dependencies are, a 3NF table
with two or more overlapping candidate keys may or may not be in BCNF.

An example of a 3NF table that does not meet BCNF is:

Today's Court Bookings

Court Start Time End Time Rate Type

1 09:30 10:30 SAVER

1 11:00 12:00 SAVER

1 14:00 15:30 STANDARD

2 10:00 11:30 PREMIUM-B

2 11:30 13:30 PREMIUM-B

2 15:00 16:30 PREMIUM-A

• Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass court (Court 2).
• A booking is defined by its Court and the period for which the Court is reserved.
• Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:
  • SAVER, for Court 1 bookings made by members
  • STANDARD, for Court 1 bookings made by non-members
  • PREMIUM-A, for Court 2 bookings made by members
  • PREMIUM-B, for Court 2 bookings made by non-members

The table's candidate keys are {Court, Start Time} and {Court, End Time}. The dependency Rate Type → Court holds (each rate type is used on only one court), but Rate Type is not a superkey, so the table violates BCNF even though it satisfies 3NF.

4.3.1.5. Algorithms for relational database design


For a database, a universal relation schema R = (A1, A2, ..., An) includes all the attributes of the database. The universal relation assumption states that every attribute name is unique. The database designers specify a set of functional dependencies that should hold on the attributes of R. Using these functional dependencies, the algorithms decompose the universal relation schema R into a set of relation schemas D = (R1, R2, ..., Rm).

D is the relational database schema (D is called a decomposition of R).

We must make sure that each attribute in R appears in at least one relation schema Ri in the decomposition, so that no attributes are lost:

R = R1 ∪ R2 ∪ ... ∪ Rm

This is called the attribute preservation condition of the decomposition.

4.3.1.6. Decomposition and dependency preservation


Each functional dependency X → Y specified in F should either appear directly in one of the relation schemas Ri in the decomposition D or be inferable from the dependencies that appear in some Ri. This is the dependency preservation condition.
We want to preserve the dependencies because each dependency in F represents a constraint on the database; if a dependency is not preserved, checking it requires joining two or more relations.
Given a relation R, a set of functional dependencies F, and its closure F+, a decomposition D = {R1, R2, ..., Rm} of R is said to be dependency-preserving with respect to F if the dependencies of F can still be enforced on the individual relations Ri.

4.3.1.7. Decomposition and lossless (non-additive) joins

Another property a decomposition D should possess is the lossless-join or non-additive-join property, which ensures that no spurious tuples are generated when a natural join operation is applied to the relations in the decomposition. The condition of no spurious tuples should hold on every legal relation state, i.e. every state that satisfies the functional dependencies in F.

A decomposition D = {R1, R2, ..., Rm} of R has the lossless (non-additive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the natural join of the projections of r onto the Ri equals r (where * denotes the natural join of all the relations in D).

The word loss in lossless refers to loss of information, not loss of tuples; if a decomposition does not have the lossless-join property, we may get additional spurious tuples after the join.

4.3.1.8. Multi-valued dependencies and fourth normal forms


In this section we study multi-valued dependencies, which are a consequence of first normal form (1NF), which disallows an attribute in a tuple from having a set of values. When we have two or more independent multi-valued attributes, we must repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent. This constraint is specified by a multi-valued dependency.

For example, an employee may work on several projects and may have several dependents, but projects and dependents are independent of each other. To keep the relation consistent, we must have a separate tuple to represent every combination of an employee's dependent and an employee's project. This constraint is specified as a multi-valued dependency.


4.3.1.8.1. Inference rules for functional and multi-valued dependency


We develop inference rules that include both FDs and MVDs, so that both types of constraints can be considered together.
Inference rules IR1 through IR8 form a complete set for inferring FDs and MVDs from a given set of dependencies, where

R = {A1, A2, ..., Am} and X, Y, Z, W are subsets of R.

4.3.1.8.2. Fourth normal forms


A relation schema R is in 4NF with respect to a set of dependencies F (which includes FDs and MVDs) if, for every non-trivial MVD X →→ Y in F+, X is a superkey for R.
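
As a worked illustration of the employee/project/dependent situation described in 4.3.1.8 (the relation names are assumed): in EMP(ename, pname, dname) the multi-valued dependencies ename →→ pname and ename →→ dname hold, but ename is not a superkey, so EMP is not in 4NF. Decomposing it into EMP_PROJECTS(ename, pname) and EMP_DEPENDENTS(ename, dname) gives two relations that are each in 4NF, and their natural join recovers EMP without spurious tuples.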


4.3.1.9. Loss-less join decomposition

4.3.1.10. Join decomposition and fifth normal form


A join dependency (JD), denoted JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.
The constraint states that every legal state r of R should have a lossless join decomposition into R1, R2, ..., Rn.

A join dependency JD(R1, R2, ..., Rn) specified on relation schema R is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R. Such a dependency is called trivial because it has the lossless-join property for any relation state r of R and hence does not specify any real constraint on R.

4.3.1.10.1. Fifth normal forms (Project join normal form)

A relation schema R is in 5NF, or project-join normal form (PJNF), with respect to a set F of functional, multi-valued, and join dependencies if, for every non-trivial join dependency JD(R1, R2, ..., Rn) in F+ (i.e., implied by F), every Ri is a superkey of R.


4.4. Inclusion dependency


Inclusion dependencies were defined in order to formalize certain inter-relational constraints.

Example:
Foreign key constraints cannot be specified as FDs or MVDs, because they relate attributes across relations; they can, however, be specified as inclusion dependencies. An inclusion dependency is used to represent a constraint between two relations.

An inclusion dependency R.X < S.Y, between a set of attributes X of relation R and a set of attributes Y of relation S, specifies that at any point in time the set of X values appearing in r(R) must be a subset of the set of Y values appearing in s(S).

X of R and Y of S must have the same number of attributes.


Example:
If X = {A1, A2, ..., An}
and Y = {B1, B2, ..., Bn},
then Ai corresponds to Bi for 1 <= i <= n.

Inference rules for inclusion dependencies:

1. IDIR1 (reflexive rule): R.X < R.X
2. IDIR2 (attribute correspondence): if R.X < S.Y, where X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn} and Ai corresponds to Bi, then R.Ai < S.Bi for 1 <= i <= n.
3. IDIR3 (transitive rule): if R.X < S.Y and S.Y < T.Z, then R.X < T.Z.

Inclusion dependencies can represent all referential integrity constraints.
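
A hedged SQL sketch (table names assumed): the inclusion dependency EMPLOYEE.dnumber < DEPARTMENT.dnumber is exactly what a foreign key clause declares.

ALTER TABLE employee
    ADD FOREIGN KEY (dnumber) REFERENCES department (dnumber);
-- every dnumber value appearing in employee must also appear in department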

5. TRANSACTION MANAGEMENT

5.1. Transaction Concept


A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction usually results from the execution of a user program written in a high-level language, a data manipulation language, or another programming language (for example SQL, COBOL, C, or Pascal), and is delimited by statements or system calls of the form begin transaction and end transaction.
The transaction consists of all the operations executed between the begin and the end. To ensure the integrity of the data, we require that the database system maintain the following properties.

1. Atomicity: either all operations of the transaction are reflected properly in the database, or none are.
2. Consistency: execution of a transaction in isolation (i.e., with no other transaction executing concurrently) preserves the consistency of the

3. Isolation: even though multiple transactions may execute concurrently, each transaction is unaware of the others. For any pair of transactions Ti and Tj, it appears to Ti that either Tj finished its execution before Ti started, or Tj started its execution after Ti finished.

4. Durability: after a transaction completes successfully, the changes it has made to the database persist, even if there is a system failure.

These properties are called the ACID properties.

Access to the database is accomplished by the following two operations:

1. Read(X): transfers the data item X from the database to a local buffer belonging to the transaction that executes the read operation.
2. Write(X): transfers the data item X from the local buffer of the transaction back to the database.

Example:
Let Ti be a transaction that transfers $50 from account A to account B.
Ti:
READ(A)
A := A - 50;
WRITE(A)
READ(B)
B := B + 50;
WRITE(B)
Suppose the initial values of A and B are $1000 and $2000, and a system failure occurs after WRITE(A) and before WRITE(B). Then the account information becomes
A = $950
B = $2000
and $50 has been lost unless atomicity is enforced.
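
The same transfer written as a hedged SQL sketch (the account table is assumed; the exact transaction-control syntax varies between systems):

START TRANSACTION;   -- some systems use BEGIN / BEGIN TRANSACTION
UPDATE account SET balance = balance - 50 WHERE acc_no = 'A';
UPDATE account SET balance = balance + 50 WHERE acc_no = 'B';
COMMIT;   -- atomicity: either both updates become durable, or (after ROLLBACK/failure) neither does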

5.2. Transaction state


Compensating transaction: the only way to undo the effects of a committed transaction is to execute a compensating transaction.
We establish a simple abstract transaction model; a transaction must be in one of the following states:

Active: the initial state; the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: after the discovery that normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
Committed: after successful completion.

A transaction enters the failed state after the system determines that the
transaction can no longer proceed with its normal execution.

Example:
Hardware or logical errors can cause such a failure. The failed transaction must be rolled back and then enters the aborted state; at this point the system has two options:

1. Restart the transaction: appropriate if the failure was caused by a hardware or software error and not by an internal logical error.

2. Kill the transaction: appropriate for an internal logical error that can be corrected only by rewriting the application program, or because the input was bad.

5.3. Implementation of atomicity and durability


The recovery-management component of a database system implements the support for atomicity and durability.
Shadow-database scheme: a transaction that wants to update the database first creates a complete copy of the database. All updates are made on the new copy, leaving the original copy, called the shadow copy, untouched.
If at any time the transaction has to be aborted, the new copy is simply deleted; the old copy of the database is unaffected. If the transaction completes, the operating system is asked to write the new copy out to disk.
In the UNIX operating system the flush command is used for this purpose. After the flush has completed, db_pointer is updated to point to the new current copy of the database (a small sketch of this pointer-swap idea follows).
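The pointer-swap idea can be sketched as follows; this is only an illustrative Python approximation in which the database is a small JSON file, db_pointer is a file holding the name of the current copy, and fsync stands in for the flush step.

# Minimal sketch of the shadow-database scheme (file names are illustrative).
import json, os

DB_POINTER = "db_pointer"            # holds the name of the current database copy

def read_db():
    with open(DB_POINTER) as f:
        current = f.read().strip()
    with open(current) as f:
        return json.load(f), current

def commit_transaction(update_fn):
    data, current = read_db()
    new_copy = current + ".new"
    update_fn(data)                  # all updates go to the new copy only
    with open(new_copy, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())         # force the new copy to disk (the "flush")
    with open(DB_POINTER, "w") as f: # switch db_pointer to the new current copy
        f.write(new_copy)
        f.flush()
        os.fsync(f.fileno())

# Set up an initial database copy, then run one committed update.
with open("db_v0", "w") as f:
    json.dump({"A": 1000, "B": 2000}, f)
with open(DB_POINTER, "w") as f:
    f.write("db_v0")

commit_transaction(lambda d: d.update(A=d["A"] - 50, B=d["B"] + 50))
print(read_db()[0])                  # {'A': 950, 'B': 2050}

If the transaction aborts before db_pointer is rewritten, the new copy is simply discarded and the shadow copy remains the database.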

5.4. Concurrent Execution


A database system must control the interaction among concurrent transactions to ensure the consistency of the database. In this section we focus on the concept of concurrent execution.

Example:
Consider a set of transactions that access and update a bank account.
Let T1 and T2 be two transactions.

T1:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)

T2:
READ(A)

TEMP:=A*0.1;
A:=A-TEMP;
WRITE(A)
READ(B)
B:=B+TEMP;
WRITE(B)

The initial values of A and B are $1000 and $2000.

CASE 1. If T1 is followed by T2:
A = $855
B = $2145

CASE 2. If T2 is followed by T1:
A = $850
B = $2150

(The arithmetic behind both cases is replayed in the sketch below.)
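A minimal Python sketch that replays the two serial orders (the dictionaries standing in for the accounts are an assumption of the example):

# Minimal sketch: the two serial executions of T1 and T2 (illustrative only).

def t1(acct):                        # transfer $50 from A to B
    acct["A"] -= 50
    acct["B"] += 50

def t2(acct):                        # move 10% of A from A to B
    temp = acct["A"] * 0.1
    acct["A"] -= temp
    acct["B"] += temp

case1 = {"A": 1000, "B": 2000}
t1(case1); t2(case1)
print("T1 then T2:", case1)          # {'A': 855.0, 'B': 2145.0}

case2 = {"A": 1000, "B": 2000}
t2(case2); t1(case2)
print("T2 then T1:", case2)          # {'A': 850.0, 'B': 2150.0}

# Both serial orders preserve A + B = 3000; a bad interleaving need not.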

5.5. Schedule
Execution sequences that show the chronological order in which the instructions of the transactions are executed are called schedules. A schedule in which the instructions belonging to a single transaction appear together, one transaction after the other, is called a serial schedule. If two transactions run concurrently, the CPU switches between them (it is shared among all the transactions), so their instructions are interleaved in the schedule.


The concurrent schedule shown in the first figure produces the final values A = $855 and B = $2145, the same result as the serial schedule T1 followed by T2.

Some schedules, however, leave the database in an inconsistent state. In the concurrent schedule of the second figure the final values are A = $900 and B = $2150; the sum A + B is now $3050 instead of $3000, so $50 has been gained.

5.6. Serializability
The database system must control the execution of concurrent transactions to ensure that the database remains consistent. To do so, we must first understand which schedules ensure consistency and which do not.

Generally, a transaction performs two kinds of operations on a data item:

I. Read operations
II. Write operations

A transaction performs this sequence of operations on the copy of Q that resides in the local buffer of the transaction. Here we will discuss two forms of schedule equivalence:
I. Conflict serializability
II. View serializability

5.6.1. Conflict Serializability

Consider a schedule S in which Ii and Ij are two consecutive instructions belonging to transactions Ti and Tj respectively (i ≠ j).

1. If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the result of any instruction in the schedule.
2. If Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we deal only with read and write operations, there are four cases to consider:

a. Ii = READ(Q), Ij = READ(Q): the order does not matter, because the same value of Q is read by Ti and Tj in either order.
b. Ii = READ(Q), Ij = WRITE(Q): the order matters.
c. Ii = WRITE(Q), Ij = READ(Q): the order matters.
d. Ii = WRITE(Q), Ij = WRITE(Q): since both instructions are write operations, their order does not directly affect Ti or Tj, but the value obtained by the next READ(Q) instruction of S is affected.

We say that Ii and Ij conflict if they are operations by different transactions on the same data item and at least one of them is a write operation.


A serial schedule is one in which all the instructions of each transaction execute together, one transaction after another.

If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent. The concept of conflict equivalence leads to the concept of conflict serializability: a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
There are schedules that produce the same outcome without being conflict equivalent, but detecting such equivalence requires analysis of operations other than read and write; such analysis is hard to implement and computationally expensive, so we will consider a less restrictive definition of equivalence instead. A small sketch of the conflict test itself follows.
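Before moving on, the conflict test itself can be stated very compactly. A minimal Python sketch, assuming each instruction is encoded as a (transaction, operation, data item) triple:

# Minimal sketch: do two schedule instructions conflict?
# An instruction is represented as (transaction, operation, data_item).

def conflicts(i, j):
    ti, op_i, q_i = i
    tj, op_j, q_j = j
    return (ti != tj                      # different transactions
            and q_i == q_j                # same data item
            and "W" in (op_i, op_j))      # at least one of them is a write

print(conflicts(("T1", "R", "Q"), ("T2", "R", "Q")))   # False: read/read never conflicts
print(conflicts(("T1", "R", "Q"), ("T2", "W", "Q")))   # True
print(conflicts(("T1", "W", "Q"), ("T2", "W", "P")))   # False: different data items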

5.6.2. View Serializability

View serializability is a less stringent notion of equivalence that, like conflict serializability, is based only on the read and write operations of transactions. Consider two schedules S and S' in which the same set of transactions participates. The schedules S and S' are said to be view equivalent if they satisfy the following three conditions:

1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must also read the initial value of Q in schedule S'.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S and the value it reads was produced by transaction Tj, then Ti must also read the value of Q produced by Tj in schedule S'.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S'.

A schedule S is view serializable if it is view equivalent to a serial schedule.

5.7. Recoverability
So far we have discussed which schedules ensure the consistency of the database under the assumption that no transaction fails. We now address the effect of transaction failures during concurrent execution.
If a transaction Ti fails, for whatever reason, we need to undo its effects to ensure the atomicity property. In a system that allows concurrent execution, any transaction Tj that is dependent on Ti (i.e. Tj has read a data item written by Ti) must also be aborted.
That is why we need to place some restrictions on the types of schedules permitted.

5.7.1. Recoverable schedule

Most database systems require that all schedules be recoverable. A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.

5.7.2. Cascadeless schedules

Consider the example shown in the figure:


T10 writes a value that is read by T11, and T11 writes a value that is read by T12. Suppose T10 fails: T10 must be rolled back, and since T11 depends on T10 and T12 depends on T11, all the remaining transactions must be rolled back as well.
The phenomenon in which a single transaction failure leads to a series of transaction rollbacks is called cascading rollback. It is desirable that cascading rollbacks not occur in a schedule; such schedules are called cascadeless schedules. Formally, for every pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit of Ti must appear before the read operation of Tj. It is then easy to see that every cascadeless schedule is also recoverable.

5.8. Testing for Serializability

Since every schedule must be serializable, we first need a method to determine whether a given schedule S is serializable or not.
Let S be a schedule. We construct a directed graph, called a precedence graph, from S:
G = (V, E)
where V is a set of vertices and E is a set of edges.
Vertices: all the transactions participating in the schedule.
Edges: an edge Ti → Tj is added if one of the following conditions holds:
1. Ti executes write(Q) before Tj executes read(Q)
2. Ti executes read(Q) before Tj executes write(Q)
3. Ti executes write(Q) before Tj executes write(Q)
If an edge Ti → Tj exists in the precedence graph, then in any serial schedule S' equivalent to S, all the instructions of Ti must execute before the first instruction of Tj. The schedule S is conflict serializable if and only if the precedence graph contains no cycle. A sketch of this construction follows.
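A minimal Python sketch of this construction, assuming a schedule is given as a list of (transaction, operation, data item) triples in execution order; the cycle test then decides conflict serializability.

# Minimal sketch: build a precedence graph from a schedule and test for cycles.
# A schedule is a list of (transaction, operation, data_item) triples in execution order.

def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and "W" in (op_i, op_j):
                edges.add((ti, tj))           # conflicting pair: Ti precedes Tj
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, target, seen=None):
        seen = seen or set()
        for nxt in graph.get(start, ()):
            if nxt == target or (nxt not in seen and reachable(nxt, target, seen | {nxt})):
                return True
        return False
    return any(reachable(u, u) for u in graph)

# An interleaving of the two bank transactions in which both read A before either writes it.
schedule = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"),
            ("T2", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
            ("T2", "R", "B"), ("T2", "W", "B")]

edges = precedence_graph(schedule)
print(edges)                                             # contains both (T1, T2) and (T2, T1)
print("conflict serializable:", not has_cycle(edges))    # False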


5.9. Precedence graph

By the precedence-graph test, the schedule shown in the figure is not conflict serializable, but it is view serializable: the edge T4 → T3 arises only from writes whose values are never read (useless, or blind, writes).
To test view serializability, we therefore develop a scheme for deciding whether an edge needs to be inserted in the precedence graph.

Consider a schedule S in which Tj reads a value written by Ti (giving Ti → Tj). If schedule S is view serializable, then in any serial schedule S' that is equivalent to S, a transaction Tk that executes write(Q) must appear
either before Ti (Tk → Ti)
or after Tj (Tj → Tk);
it cannot appear between Ti and Tj.

To test view serializability, we extend the precedence graph to include labeled edges; this type of graph is termed a labeled precedence graph.
Rules for inserting labeled edges into the precedence graph:
Consider a schedule S consisting of transactions {T1, T2, …, Tn}. Let Tb and Tf be two dummy transactions such that
Tb issues write(Q) for each data item Q accessed in S, and
Tf issues read(Q) for each data item Q accessed in S.
We construct a new schedule S' from S by inserting
Tb at the beginning of S and
Tf at the end of S.

We construct the labeled precedence graph for schedule S' as follows:

1. Add an edge Ti → Tj labeled 0 if Tj reads the value of a data item Q written by Ti.
2. Remove all edges incident on useless transactions. A transaction Ti is useless if there exists no path in the precedence graph from Ti to Tf.
3. For each data item Q such that Tj reads the value of Q written by Ti, and Tk executes write(Q) with Tk ≠ Tb, do the following:
a. If Ti = Tb and Tj ≠ Tf, then insert the edge Tj → Tk labeled 0.
b. If Ti ≠ Tb and Tj = Tf, then insert the edge Tk → Ti labeled 0.
c. If Ti ≠ Tb and Tj ≠ Tf, then insert the pair of edges Tk → Ti and Tj → Tk, both labeled p, where p is a unique number not used for any earlier labeled edge.

6. CONCURRENCY CONTROL
When several transactions execute concurrently in the database, the isolation property may no longer be preserved. It is necessary for the system to control the interaction among concurrent transactions; the mechanisms used for this purpose are termed concurrency-control schemes.

6.1. Lock based protocols


One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner; i.e. while one transaction is accessing a data item, no other transaction can modify that data item.
The most common way to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that data item.

6.1.1. Locks
There are two modes in which a data item may be locked.
Shared mode: if a transaction Ti has obtained a shared-mode lock (denoted by S) on data item Q, then Ti can read, but cannot write, Q.
Exclusive mode: if Ti has obtained an exclusive-mode lock (denoted by X) on Q, then Ti can both read and write Q.
(A compatibility check for these two modes is sketched below.)
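A minimal Python sketch of the compatibility test used when granting locks (the matrix encoding is an illustrative assumption):

# Minimal sketch: lock compatibility for shared (S) and exclusive (X) modes.
# compatible[held][requested] is True if the request can be granted
# while another transaction holds a lock in mode `held`.
compatible = {
    "S": {"S": True,  "X": False},
    "X": {"S": False, "X": False},
}

def can_grant(requested_mode, held_modes):
    """A lock request is granted only if it is compatible with every lock held."""
    return all(compatible[held][requested_mode] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True : many readers may share Q
print(can_grant("X", ["S"]))        # False: a writer must wait for readers
print(can_grant("S", ["X"]))        # False: readers wait for the writer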

Example:
T1: LOCK-X(B)
READ(B)
B:=B-50;
WRITE(B)
UNLOCK(B)
LOCK-X(A)
READ(A)
A:=A+50;
WRITE(A)
UNLOCK(A);

T2: LOCK-S(A)
READ(A)
UNLOCK(A)
LOCK-S(B)
READ(B)
UNLOCK(B)
DISPLAY(A+B);

Initial amounts:
A = $100
B = $200

Case 1. T1 followed by T2: DISPLAY(A+B) shows the correct total of $300.
Case 2. T2 followed by T1: DISPLAY(A+B) again shows $300.
Case 3. A concurrent schedule (shown in the figure) in which T2 executes completely between T1's UNLOCK(B) and LOCK-X(A):


This schedule displays (A+B) as $250 rather than the correct total of $300, because T2 reads B after T1 has debited it by $50 but reads A before T1 has credited the $50; T1 released its lock on B too early.


The schedule shown in the figure, in which each transaction is waiting for a lock held by the other, illustrates a deadlock. When a deadlock occurs, the system must roll back one of the two transactions; the data items that were locked by that transaction are unlocked and become available to the other transaction.

6.1.2. Granting of locks


When a transaction requests a lock on a data item in a particular mode, and no other transaction has a lock on the same data item in a conflicting mode, the lock can be granted.
Suppose transaction T2 holds a shared-mode lock on Q and T1 requests an exclusive-mode lock; T1 has to wait for T2 to release its lock.

T1 will wait:
T2 lock-S(Q) (holds)
T1 lock-X(Q) (waits)
T3 lock-S(Q) (requests, granted)
T4 lock-S(Q) (requests, granted)

T1 may still be waiting, since newly arriving shared-mode requests keep being granted. This situation, in which a particular transaction waits indefinitely for a lock on a data item, is called starvation.

6.1.3. Avoiding starvation of transactions when granting locks

When a transaction Ti requests a lock on data item Q in a particular mode M, the lock is granted provided that:
1. There is no other transaction holding a lock on Q in a mode that conflicts with M, and
2. There is no other transaction that is waiting for a lock on Q and that made its lock request before Ti.

6.2. Two phase locking protocol


One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue its lock and unlock requests in two phases:
1. Growing phase: a transaction may obtain locks but may not release any lock.
2. Shrinking phase: a transaction may release locks but may not obtain any new lock.
The point in the schedule at which a transaction has obtained its final lock is called the lock point of the transaction.

Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase locking protocol. This protocol requires that all exclusive-mode locks taken by a transaction be held until that transaction commits. This requirement ensures that any data item written by an uncommitted transaction remains locked in exclusive mode until the transaction commits, preventing any other transaction from reading it. Another variant is the rigorous two-phase locking protocol, which requires all locks to be held until the transaction commits. Under two-phase locking it can easily be verified that transactions can be serialized in the order of their lock points. (A small check of the two-phase property is sketched below.)
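A minimal Python sketch that checks whether a single transaction's sequence of lock and unlock operations obeys the two-phase rule, i.e. that no lock is acquired after the first unlock; the operation encoding is an illustrative assumption.

# Minimal sketch: does a transaction's lock/unlock sequence satisfy two-phase locking?
# Operations are ("LOCK", item) or ("UNLOCK", item) in program order.

def is_two_phase(ops):
    unlocked_seen = False
    for kind, _item in ops:
        if kind == "UNLOCK":
            unlocked_seen = True
        elif kind == "LOCK" and unlocked_seen:
            return False          # acquiring a lock in the shrinking phase violates 2PL
    return True

t1 = [("LOCK", "B"), ("UNLOCK", "B"), ("LOCK", "A"), ("UNLOCK", "A")]
t2 = [("LOCK", "B"), ("LOCK", "A"), ("UNLOCK", "B"), ("UNLOCK", "A")]
print(is_two_phase(t1))   # False: T1 acquires a lock after unlocking B
print(is_two_phase(t2))   # True : all locks precede all unlocks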

6.3. Graph based protocol


If we wish to develop protocols that are not two-phase, we need additional information about how each transaction will access the database. In the graph-based model we assume prior knowledge of the order in which the database items will be accessed.

To formalize this prior knowledge we impose a partial ordering → on the set D = {d1, d2, …, dn} of all data items: if di → dj, then any transaction accessing both di and dj must access di before accessing dj. This ordering can be depicted as a directed acyclic graph, called a database graph. Here we restrict attention to protocols that employ only exclusive locks.
In the tree protocol, the only lock instruction allowed is LOCK-X. Each transaction Ti can lock a data item at most once and must follow these rules:
a. The first lock by Ti may be on any data item.
b. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.
c. Data items may be unlocked at any time.
d. A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.

Advantages:
1. Unlocking may occur earlier than under two-phase locking, which leads to shorter waiting times and increased concurrency.
2. The protocol is deadlock free, so no rollbacks are required.

Disadvantages:
1. A transaction may have to lock data items that it does not access, which results in increased locking overhead.
2. Additional waiting time.
3. A potential decrease in concurrency.

6.4. Time-stamp based protocol

In lock-based protocols, the ordering between every pair of conflicting transactions is determined at execution time. Timestamp-based protocols instead select this ordering in advance.

Timestamp:
With each transaction Ti in the system we associate a unique, fixed timestamp, denoted by TS(Ti).

This timestamp is assigned by the database system before the transaction Ti starts execution.

If a transaction Ti has been assigned timestamp TS(Ti) and a new transaction Tj enters the system afterwards, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; a transaction's timestamp is equal to the value of the counter (see the sketch below).
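A minimal Python sketch of the logical-counter method (itertools.count stands in for the DBMS-internal counter; the names are illustrative):

# Minimal sketch: assigning unique, monotonically increasing timestamps
# with a logical counter (illustrative only).
import itertools

_counter = itertools.count(1)            # the logical counter, advanced per transaction

timestamps = {}

def begin_transaction(name):
    timestamps[name] = next(_counter)    # TS(Ti) is fixed before Ti starts executing
    return timestamps[name]

begin_transaction("T1")
begin_transaction("T2")
print(timestamps)                        # {'T1': 1, 'T2': 2} -> TS(T1) < TS(T2)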

6.5. Validation based protocol


In cases where the majority of transactions are read-only, the rate of conflicts among transactions may be low, but we do not know in advance which transactions will be involved in a conflict. To gain that knowledge we need a scheme for monitoring the system. In the validation protocol we assume that each transaction Ti executes in the following phases:
1. Read phase: during this phase the transaction Ti executes; the values of the various data items are read and stored in variables local to Ti. All write operations are performed on temporary local variables, without updating the actual database.
2. Validation phase: transaction Ti performs a validation test to determine whether the temporary local values that hold the results of its write operations can be copied to the database without causing a violation of serializability.
3. Write phase: if transaction Ti succeeds in validation (step 2), then the actual updates are applied to the database; otherwise Ti is rolled back.

To perform the validation test, each transaction is associated with three timestamps:
a. Start(Ti): the time when Ti started its execution.
b. Validation(Ti): the time when Ti finished its read phase and started its validation phase.
c. Finish(Ti): the time when Ti finished its write phase.

6.6. Recovery system

6.6.1. Failure Classification

There are various types of failure that can occur in a system, each of which must be dealt with in a different manner. Some failures are simple to handle because they do not result in the loss of information; others are more difficult because information is lost. Here we consider only the following types of failure:

6.6.1.1. Transaction failure

There are two types of error that may cause a transaction to fail.
Logical error: the transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or a resource limit being exceeded.

System error: the system has entered an undesirable state (for example, deadlock), as a result of which the transaction cannot continue with its normal execution; the transaction can, however, be re-executed at a later time.

System crash: a bug in the database software, an operating-system failure, or a hardware malfunction causes the loss of the contents of volatile storage.

Disk failure: a disk block loses its contents, for example because of a head crash. To recover from this type of failure, copies on other disks or archival backups on tapes are used.

6.6.2. Log based recovery:


The most widely used structure for recording database modifications is the log. The log is a sequence of log records, and it maintains a record of all the update activities in the database.
An update log record has the following fields:

Transaction identifier:
The unique identifier of the transaction that performed the write operation.

Data-item identifier:
The unique identifier of the data item written; typically it is the location of the data item on disk.

Old value:
The value of the data item prior to the write operation.

New value:
The value the data item will have after the write operation.

Other log records exist to record significant events during transaction processing:
<Ti, start> transaction Ti has started.

<Ti, Xj, V1, V2> transaction Ti has performed a write on data item Xj; Xj had value V1 before the write and will have value V2 after the write.
<Ti, commit> transaction Ti has committed.
<Ti, abort> transaction Ti has aborted.

6.7. Deferred Database Modification


The deferred-modification technique records all database writes in the log and defers their execution until the transaction partially commits. When a transaction partially commits, the information on the log associated with the transaction is used in executing the deferred writes. If the system crashes before the transaction completes its execution, or if the transaction aborts, the information on the log is simply ignored.

T0:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)

T1:
READ(C)
C:=C-100;
WRITE(C)

6.8. Immediate Database Modification

The immediate-modification technique allows database modifications to be output to the database while the transaction is still in the active state. Database modifications written by active transactions are called uncommitted modifications. In the event of a crash or a transaction failure, the system must use the old-value field of the log records to restore the modified data items.

< T0, start>


< T0,A,1000,950>
< T0,B,2000,2050>
< T0,commit>
< T1, start>
< T1,C,700,600>
< T1,commit>
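A minimal Python sketch of recovery from a log of this form, using the old values to undo uncommitted transactions and the new values to redo committed ones; the in-memory db dictionary and the tuple encoding of the log records are illustrative assumptions (under deferred modification only the redo pass would be needed).

# Minimal sketch: undo/redo recovery from a log of the form shown above.
# Update records are ("UPDATE", T, X, old, new); others are ("START"|"COMMIT", T).

log = [
    ("START", "T0"), ("UPDATE", "T0", "A", 1000, 950), ("UPDATE", "T0", "B", 2000, 2050),
    ("COMMIT", "T0"),
    ("START", "T1"), ("UPDATE", "T1", "C", 700, 600),
    # crash happens here: T1 never committed
]

def recover(db, log):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Undo uncommitted transactions, scanning the log backwards (restore old values).
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            db[rec[2]] = rec[3]
    # Redo committed transactions, scanning the log forwards (reapply new values).
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db

db = {"A": 1000, "B": 2000, "C": 600}     # C's new value reached disk before the crash
print(recover(db, log))                   # {'A': 950, 'B': 2050, 'C': 700}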

7. CENTRALIZED AND DISTRIBUTED DATABASE


In the traditional enterprise computing model, an Information Systems
department maintains control of a centralized corporate database system.
Mainframe computers, usually located at corporate headquarters, provide the
required performance levels. Remote sites access the corporate database
through wide-area networks (WANs) using applications provided by the
Information Systems department.
Changes in the corporate environment toward decentralized operations have
prompted organizations to move toward distributed database systems that
complement the new decentralized organization.
Today’s global enterprise may have many local-area networks (LANs) joined with
a WAN, as well as additional data servers and applications on the LANs. Client
applications at the sites need to access data locally through the LAN or remotely
through the WAN. For example, a client in Tokyo might locally access a table
stored on the Tokyo data server or remotely access a table stored on the New
York data server.
Both centralized and distributed database systems must deal with the problems
associated with remote access:
• Network response slows when WAN traffic is heavy. For example, a
mission-critical transaction-processing application may be adversely
affected when a decision-support application requests a large number of
rows.
• A centralized data server can become a bottleneck as a large user
community contends for data server access.
• Data is unavailable when a failure occurs on the network.

7.1. Distributed Database System
A distributed database system is a collection of data that belongs logically to the
same system but is physically spread over the sites of a computer network.

7.2. Some advantages of the DDBMS are as follows:


1. Distributed nature of some database applications: some database applications are naturally distributed over different sites.
2. Increased reliability and availability: these are two of the most commonly cited advantages. Reliability is broadly defined as the probability that a system is up at a particular moment; availability is the probability that the system is continuously available during a time interval.
3. Allowing data sharing while maintaining some measure of local control: it is possible to control the data and software locally at each site. However, certain data can be accessed by users at other remote sites through the DDBMS software; this allows controlled sharing of data throughout the distributed system.
4. Improved performance: when a large database is distributed over multiple sites, a smaller database exists at each site. As a result, local queries and transactions accessing data at a single site have better performance because of the smaller local database. If all transactions were submitted to a single centralized database, performance would be worse.

7.3. Some additional properties:


1. The ability to access remote sites and transmit queries and data among
the various sites via a communication network.
2. The ability to decide on which copy of a replicated data item to access.
3. The ability to maintain the consistency of copies of a replicated data item.
4. The ability to recover from individual site crashes and from new types of
failure such as the failure of the communication links.

7.4. Physical hardware level


The following main factors distinguish a DDBMS from a centralized system:
1. There are multiple computers, called sites or nodes.
2. These sites must be connected by some type of communication network to transmit data and commands among the sites.

The sites may be within the same building or group of adjacent buildings, connected via a local area network, or they may be geographically distributed over large distances and connected via a long-haul network. Local area networks typically use cables, whereas long-haul networks use telephone lines or satellites; it is also possible to use a combination of the two types of network. Networks may have different topologies that define different communication paths among the sites.

7.5. Client Server Architecture


The client-server architecture has been developed to deal with the newer computing environment in which a large number of personal computers, workstations, file servers, peripherals and other equipment are connected together via a network. The idea is to define specialized servers with specific functionalities.

The interaction between client and server during the processing of an SQL query proceeds as follows:
1. The client parses a user query and decomposes it into a number of independent site queries. Each site query is sent to the appropriate server site.
2. Each server processes its local query and sends the resulting relation to the client site.
3. The client site combines the results of the subqueries to produce the result of the originally submitted query. In this approach the SQL server is also called a database processor (DP) or back-end machine, whereas the client is called an application processor (AP) or front-end machine.

In a DDBMS, it is customary to divide the software modules into three levels:

1. The server software is responsible for local data management at a site.

2. The client software is responsible for most of the distribution functions; it accesses the data-distribution information from the DDBMS catalog and processes all requests that require access to more than one site.
3. The communication software provides the communication primitives that are used by the client to transmit commands and data among the various sites as needed.

7.6. Data fragmentation


If a relation r is fragmented, r is divided into a number of fragments r1, r2, …, rn. These fragments contain sufficient information to allow reconstruction of the original relation r. The reconstruction can take place through the application of either the union operation or a special type of join operation on the various fragments.
There are three different schemes for fragmenting a relation:
I. Horizontal fragmentation
II. Vertical fragmentation
III. Mixed fragmentation

7.6.1. Horizontal fragmentation


In horizontal fragmentation, the tuples of r are distributed among one or more fragments. The relation r is partitioned into a number of subsets r1, r2, …, rn. Each tuple of relation r must belong to at least one of the fragments, so that the original relation can be reconstructed. Each fragment can be defined as a selection on the relation.

For reconstruction we use the union operation:
r = r1 ∪ r2 ∪ … ∪ rn

7.6.2. Vertical fragmentation


In vertical fragmentation, the columns (attributes) of r are distributed among one or more fragments. Vertical fragmentation of r(R) involves defining subsets of attributes R1, R2, …, Rn of the schema R such that
R = R1 ∪ R2 ∪ … ∪ Rn
Each fragment of r is defined by a projection operation; to allow lossless reconstruction, each fragment normally includes the key of R.

For reconstruction we use the natural-join operation:
r = r1 ⋈ r2 ⋈ … ⋈ rn

7.6.3. Mixed fragmentation


In mixed fragmentation, horizontal and vertical fragmentation are combined. A relation r is divided into a number of fragments r1, r2, …, rn, where each fragment is obtained as the

result of applying either the horizontal-fragmentation or the vertical-fragmentation scheme to relation r, or to a fragment of r that was obtained previously. (A small sketch of horizontal and vertical fragmentation follows.)
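A minimal Python sketch of horizontal and vertical fragmentation over an in-memory relation; the EMP relation, its attributes and the fragmentation criteria are illustrative assumptions, not taken from the text.

# Minimal sketch: horizontal and vertical fragmentation of a relation held in memory.

EMP = [
    {"eno": 1, "name": "Asha",  "city": "Delhi",  "salary": 50000},
    {"eno": 2, "name": "Ravi",  "city": "Mumbai", "salary": 60000},
    {"eno": 3, "name": "Meena", "city": "Delhi",  "salary": 55000},
]

# Horizontal fragmentation: a selection per fragment; reconstruction is a union.
emp_delhi  = [t for t in EMP if t["city"] == "Delhi"]
emp_mumbai = [t for t in EMP if t["city"] == "Mumbai"]
reconstructed_h = emp_delhi + emp_mumbai
assert sorted(t["eno"] for t in reconstructed_h) == [1, 2, 3]

# Vertical fragmentation: a projection per fragment, each keeping the key "eno";
# reconstruction is a join on the key.
frag1 = [{"eno": t["eno"], "name": t["name"]} for t in EMP]
frag2 = [{"eno": t["eno"], "city": t["city"], "salary": t["salary"]} for t in EMP]
by_key = {t["eno"]: t for t in frag2}
reconstructed_v = [{**a, **by_key[a["eno"]]} for a in frag1]
assert reconstructed_v == EMP
print("both reconstructions recover EMP")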

7.7. Data Replication


If relation r is replicated, a copy of r is stored at two or more sites. In full replication, a copy is stored at every site in the system.

Availability: if one of the sites containing relation r fails, the relation may still be found at another site, so the system can continue to process queries involving r.
Increased parallelism: in the case where the majority of accesses to relation r result only in reading the relation, several sites can process queries involving r in parallel, and there is a greater chance that the needed data is found at the site where the transaction is executing.
Increased overhead on update: the system must ensure that all replicas of relation r are consistent; otherwise erroneous computations may result. Whenever r is updated, the update must be propagated to all sites containing replicas.

7.8. Deadlock handling


A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set.
Suppose there is a set of waiting transactions {T0, T1, …, Tn} such that
T0 is waiting for a data item held by T1,
T1 is waiting for a data item held by T2,
…
and Tn is waiting for a data item held by T0.
No transaction can make progress in this situation.
There are two principal methods for dealing with the deadlock problem:
a. Deadlock prevention: a protocol that ensures that the system will never enter a deadlock state.
b. Deadlock detection and recovery: we allow the system to enter a deadlock state, detect it, and then try to recover.

7.8.1. Deadlock prevention


There are two approaches to deadlock prevention.
Approach 1:
i. Ensure that no cyclic waits can occur.
ii. Require all locks to be acquired together.
Approach 2:
i. This approach is closer to deadlock recovery.

ii. We roll back transactions instead of waiting, whenever the wait could lead to a deadlock.

7.8.1.1. The first approach

Under the first approach, each transaction locks all the data items it needs before it begins execution.

Disadvantages:
i. It is often hard to predict, before the transaction begins, what data items need to be locked.
ii. Data-item utilization will be very low, since many data items may be locked but remain unused for a long time.

7.8.1.2. The second approach


The second approach for preventing deadlocks is to use preemption and transaction rollback.
In preemption, when a transaction T2 requests a lock held by T1, the lock granted to T1 may be preempted by rolling T1 back and granting the lock to T2.
To control preemption, we assign a unique timestamp to each transaction. The system uses these timestamps only to decide whether a transaction should wait or roll back.
Two different deadlock-prevention schemes using timestamps have been proposed (both decision rules are sketched below):
1. Wait-die: this scheme is based on a non-preemptive technique. When Ti requests a data item held by Tj, Ti is allowed to wait only if
TS(Ti) < TS(Tj)
i.e. Ti is older than Tj; otherwise Ti is rolled back (it dies).
2. Wound-wait: this scheme is a preemptive technique and is a counterpart to the wait-die scheme. When Ti requests a data item held by Tj, Ti is allowed to wait only if
TS(Ti) > TS(Tj)
i.e. Ti is younger than Tj; otherwise Tj is rolled back (it is wounded by Ti).
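A minimal Python sketch of the two decision rules, given the timestamps of the requesting and the holding transaction (the function names are illustrative):

# Minimal sketch: wait-die vs wound-wait decisions from transaction timestamps.
# ts_req is the timestamp of the requesting transaction Ti,
# ts_hold is the timestamp of the transaction Tj holding the lock.

def wait_die(ts_req, ts_hold):
    # Non-preemptive: an older requester waits, a younger requester dies.
    return "Ti waits" if ts_req < ts_hold else "Ti rolled back (dies)"

def wound_wait(ts_req, ts_hold):
    # Preemptive: an older requester wounds (rolls back) the holder,
    # a younger requester waits.
    return "Tj rolled back (wounded)" if ts_req < ts_hold else "Ti waits"

print(wait_die(1, 2), "|", wait_die(2, 1))       # Ti waits | Ti rolled back (dies)
print(wound_wait(1, 2), "|", wound_wait(2, 1))   # Tj rolled back (wounded) | Ti waits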

7.8.1.3. Time-out based scheme

Another simple technique is based on lock timeouts. In this approach, a transaction that has requested a lock waits for at most a specified amount of time. If the lock has not been granted within that time, the transaction is said to time out; it rolls itself back and restarts.

Disadvantages:
i. It is hard to decide how long a transaction should wait; too long a wait causes unnecessary delay once a deadlock has occurred.

ii. Too short a wait results in transaction rollback even when there is no deadlock,
iii. leading to wasted resources.
iv. Starvation is also a possibility with this scheme.

7.8.2. Deadlock detection and recovery


If a system does not employ a protocol that ensures deadlock freedom, then a detection and recovery scheme must be used. Under such a scheme the system periodically determines whether a deadlock has occurred; if one has, the system must attempt to recover from it.

To do this, the system must:

i. Maintain information about the current allocation of data items to transactions, as well as any outstanding data-item requests.
ii. Provide an algorithm that uses this information to determine whether the system has entered a deadlock state.
iii. Recover from the deadlock when the detection algorithm determines that a deadlock exists.

7.8.2.1. Deadlock detection


To describe deadlocks we use a directed graph called a wait-for graph.
The graph consists of a pair
G = (V, E)
where V is the set of vertices (all the transactions in the system)
and E is the set of edges.

7.8.2.1.1. Directed graph


Ti → Tj
An edge Ti → Tj means that Ti is waiting for transaction Tj to release a data item that it needs. A deadlock exists in the system if and only if the wait-for graph contains a cycle; each transaction involved in the cycle is said to be deadlocked. To detect deadlocks, the system maintains the wait-for graph and periodically searches for a cycle in the graph.


7.8.2.2. Recovery from the deadlock


When the system determines that a deadlock exists, it must recover from it. The most common solution is to roll back one or more transactions to break the deadlock. The following actions need to be taken:
1. Select a victim: decide which transaction to roll back, based on factors such as:
a. How long the transaction has been running and how close it is to completing its task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs in order to complete.
d. How many transactions will be involved in the rollback.

2. Rollback: once we have decided that a particular transaction must be rolled back, we must determine how far this transaction should be rolled back (total or partial rollback); for partial rollback the system must maintain information about the state of all running transactions.
3. Starvation: the system must ensure that the same transaction is not always picked as the victim, since otherwise that transaction would never complete its designated task; this situation is called starvation.

8. SQL (STRUCTURED QUERY LANGUAGE)
SQL (Structured Query Language) is a database sublanguage for querying and
modifying relational databases. It was developed by IBM Research in the mid
70's and standardized by ANSI in 1986.

The Relational Model defines two root languages for accessing a relational
database -- Relational Algebra and Relational Calculus. Relational Algebra is a
low-level, operator-oriented language. Creating a query in Relational Algebra
involves combining relational operators using algebraic notation. Relational
Calculus is a high-level, declarative language. Creating a query in Relational
Calculus involves describing what results are desired.

SQL is a version of Relational Calculus. The basic structure in SQL is the


statement. Semicolons separate multiple SQL statements.

8.1. DDL Statements

DDL stands for data definition language. DDL statements are SQL Statements
that define or alter a data structure such as a table.

DDL statements are used to define the database structure or schema. Some
examples:

• CREATE - to create objects in the database


• ALTER - alters the structure of the database
• DROP - delete objects from the database
• TRUNCATE - remove all records from a table; the space allocated for the records is also released
• COMMENT - add comments to the data dictionary
• RENAME - rename an object

8.1.1. Implicit commits

In Oracle, a (successful) DDL statement implicitly commits a transaction;

SQL> create table ddl_test_1 (a number);

SQL> insert into ddl_test_1 values (1);

SQL> commit;


SQL> insert into ddl_test_1 values (2);

SQL> create table ddl_test_2 (a number);

SQL> rollback;

The create table statement implicitly committed the transaction. The insertion of
the value 2 into ddl_test_1 cannot be rolled back any more.

SQL> select * from ddl_test_1;

         A
----------
         1
         2

Both rows are returned: the value 1 was committed explicitly, and the value 2 was committed implicitly by the create table statement, so the rollback had nothing to undo.

8.1.2. Data dictionary

Since DDL changes the definitions of database objects, a DDL statement is always reflected in the data dictionary.

8.2. DML

Data Manipulation Language (DML) statements are used for managing data
within schema objects. Some examples:

• SELECT - retrieve data from a database


• INSERT - insert data into a table
• UPDATE - updates existing data within a table
• DELETE - deletes all records from a table; the space for the records remains
• MERGE - UPSERT operation (insert or update)
• CALL - call a PL/SQL or Java subprogram
• EXPLAIN PLAN - explain access path to data
• LOCK TABLE - control concurrency

8.3. Language Structure

SQL is a keyword based language. Each statement begins with a unique
keyword. SQL statements consist of clauses which begin with a keyword. SQL
syntax is not case sensitive.

The other lexical elements of SQL statements are:

• names -- names of database elements: tables, columns, views, users,


schemas; names must begin with a letter (a - z) and may contain digits (0
- 9) and underscore (_)
• literals -- quoted strings, numeric values, date-time values
• Delimiters -- + - , ( ) = < > <= >= <> . * / || ? ;

Basic database objects (tables, views) can optionally be qualified by schema


name. A dot -- ".", separates qualifiers:

schema-name. table-name
Column names can be qualified by table name with optional schema
qualification.

8.4. Basic SQL Queries


There are 3 basic categories of SQL Statements:

• SQL-Data Statements -- query and modify tables and columns


o SELECT Statement -- query tables and views in the database
o INSERT Statement -- add rows to tables
o UPDATE Statement -- modify columns in table rows
o DELETE Statement -- remove rows from tables
• SQL-Transaction Statements -- control transactions
o COMMIT Statement -- commit the current transaction
o ROLLBACK Statement -- roll back the current transaction
• SQL-Schema Statements -- maintain schema (catalog)
o CREATE TABLE Statement -- create tables
o CREATE VIEW Statement -- create views
o DROP TABLE Statement -- drop tables
o DROP VIEW Statement -- drop views
o GRANT Statement -- grant privileges on tables and views to other
users
o REVOKE Statement -- revoke privileges on tables and views from
other users

8.4.1. SQL data statements

8.4.1.1. SELECT Statement

The SQL SELECT statement queries data from tables in the database. The
statement begins with the SELECT keyword. The basic SELECT statement has 3
clauses:

• SELECT
• FROM
• WHERE

The SELECT clause specifies the table columns that are retrieved. The FROM
clause specifies the tables accessed. The WHERE clause specifies which table
rows are used. The WHERE clause is optional; if missing, all table rows are
used.

For example,

SELECT name FROM s WHERE city='Rome'

8.4.1.2. INSERT Statement


The INSERT Statement adds one or more rows to a table. It has two formats:

INSERT INTO table-1 [(column-list)] VALUES (value-list)


and,
INSERT INTO table-1 [(column-list)] (query-specification)

The first form inserts a single row into table-1 and explicitly specifies the column
values for the row. The second form uses the result of query-specification to
insert one or more rows into table-1. The result rows from the query are the rows
added to the insert table. Note: the query cannot reference table-1.

INSERT Examples

INSERT INTO p (pno, color) VALUES ('P4', 'Brown')

Before:
pno  descr   color
P1   Widget  Blue
P2   Widget  Red
P3   Dongle  Green

After:
pno  descr   color
P1   Widget  Blue
P2   Widget  Red
P3   Dongle  Green
P4   NULL    Brown

INSERT INTO sp
SELECT s.sno, p.pno, 500

FROM s, p
WHERE p.color='Green' AND s.city='London'

Before:
sno  pno  qty
S1   P1   NULL
S2   P1   200
S3   P1   1000
S3   P2   200

After:
sno  pno  qty
S1   P1   NULL
S2   P1   200
S3   P1   1000
S3   P2   200
S2   P3   500

8.4.1.3. UPDATE Statement


The UPDATE statement modifies columns in selected table rows. It has the
following general format:

UPDATE table-1 SET set-list [WHERE predicate]

The optional WHERE Clause has the same format as in the SELECT Statement.
The set-list contains assignments of new values for selected columns.

UPDATE Examples

UPDATE sp SET qty = qty + 20

Before:
sno  pno  qty
S1   P1   NULL
S2   P1   200
S3   P1   1000
S3   P2   200

After:
sno  pno  qty
S1   P1   NULL
S2   P1   220
S3   P1   1020
S3   P2   220

UPDATE s
SET name = 'Tony', city = 'Milan'
WHERE sno = 'S3'

Before:
sno  name    city
S1   Pierre  Paris
S2   John    London
S3   Mario   Rome

After:
sno  name    city
S1   Pierre  Paris
S2   John    London
S3   Tony    Milan

8.4.1.4. DELETE Statement


The DELETE Statement removes selected rows from a table. It has the following
general format:

DELETE FROM table-1 [WHERE predicate]

The optional WHERE Clause has the same format as in the SELECT Statement.

DELETE Examples

DELETE FROM sp WHERE pno = 'P1'

Before:
sno  pno  qty
S1   P1   NULL
S2   P1   200
S3   P1   1000
S3   P2   200

After:
sno  pno  qty
S3   P2   200

DELETE FROM p WHERE pno NOT IN (SELECT pno FROM sp)

Before:
pno  descr   color
P1   Widget  Blue
P2   Widget  Red
P3   Dongle  Green

After:
pno  descr   color
P1   Widget  Blue
P2   Widget  Red

8.4.2. SQL-Transaction Statements

SQL-Transaction Statements control transactions in database access. This


subset of SQL is also called the Data Control Language for SQL (SQL DCL).

8.4.2.1. COMMIT Statement


The COMMIT Statement terminates the current transaction and makes all
changes under the transaction persistent. It commits the changes to the
database. The COMMIT statement has the following general format:

COMMIT [WORK]

WORK is an optional keyword that does not change the semantics of COMMIT.

8.4.2.2. ROLLBACK Statement


The ROLLBACK Statement terminates the current transaction and rescinds all
changes made under the transaction. It rolls back the changes to the database.
The ROLLBACK statement has the following general format:

ROLLBACK [WORK]

WORK is an optional keyword that does not change the semantics of


ROLLBACK.

8.4.3. SQL-Schema Statements

SQL-Schema Statements provide maintenance of catalog objects for a schema --


tables, views and privileges. This subset of SQL is also called the Data Definition
Language for SQL (SQL DDL).

8.4.3.1. CREATE TABLE Statement


The CREATE TABLE Statement creates a new base table. It adds the table
description to the catalog. A base table is a logical entity with persistence. The
logical description of a base table consists of:

• Schema -- the logical database schema the table resides in


• Table Name -- a name unique among tables and views in the Schema
• Column List -- an ordered list of column declarations (name, data type)
• Constraints -- a list of constraints on the contents of the table

The CREATE TABLE Statement has the following general format:

CREATE TABLE table-name ({column-descr|constraint} [,{column-


descr|constraint}]...)

table-name is the new name for the table. column-descr is a column declaration.
constraint is a table constraint.

8.4.3.2. CREATE VIEW Statement


The CREATE VIEW statement creates a new database view. A view is effectively
a SQL query stored in the catalog. The CREATE VIEW has the following general
format:

CREATE VIEW view-name [ ( column-list ) ] AS query-1
[ WITH [CASCADED|LOCAL] CHECK OPTION ]

view-name is the name for the new view. column-list is an optional list of names
for the columns of the view, comma separated. query-1 is any SELECT
statement without an ORDER BY clause. The optional WITH CHECK OPTION
clause is a constraint on updatable views.

column-list must have the same number of columns as the select list in query-1.
If column-list is omitted, all items in the select list of query-1 must be named. In
either case, duplicate column names are not allowed for a view.
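As a hedged illustration, the following minimal sketch runs a CREATE TABLE and a CREATE VIEW through Python's built-in sqlite3 module; the table, columns and view are invented for the example, and SQLite's dialect omits some options described here (such as CASCADE/RESTRICT on DROP).

# Minimal sketch: CREATE TABLE and CREATE VIEW executed via sqlite3 (illustrative schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (
        sno   TEXT PRIMARY KEY,          -- column declarations: name, data type
        name  TEXT NOT NULL,
        city  TEXT,
        CHECK (city <> '')               -- a table constraint
    );
    CREATE VIEW rome_suppliers (sno, name) AS
        SELECT sno, name FROM s WHERE city = 'Rome';
""")
conn.execute("INSERT INTO s VALUES ('S3', 'Mario', 'Rome')")
print(conn.execute("SELECT * FROM rome_suppliers").fetchall())   # [('S3', 'Mario')]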

8.4.3.3. DROP TABLE Statement


The DROP TABLE Statement removes a previously created table and its
description from the catalog. It has the following general format:

DROP TABLE table-name {CASCADE|RESTRICT}

table-name is the name of an existing base table in the current schema. The
CASCADE and RESTRICT specifiers define the disposition of other objects
dependent on the table. A base table may have two types of dependencies:

• A view whose query specification references the drop table.


• Another base table that references the drop table in a constraint - a
CHECK constraint or REFERENCES constraint.

RESTRICT specifies that the table not be dropped if any dependencies exist. If
dependencies are found, an error is returned and the table isn't dropped.

CASCADE specifies that any dependencies are removed before the drop is
performed:

• Views that reference the base table are dropped, and the sequence is
repeated for their dependencies.
• Constraints in other tables that reference this table are dropped; the
constraint is dropped but the table retained.

8.4.3.4. DROP VIEW Statement


The DROP VIEW Statement removes a previously created view and its
description from the catalog. It has the following general format:

DROP VIEW view-name {CASCADE|RESTRICT}

view-name is the name of an existing view in the current schema. The
CASCADE and RESTRICT specifiers define the disposition of other objects
dependent on the view. A view may have two types of dependencies:

• A view whose query specification references the drop view.


• A base table that references the drop view in a constraint - a CHECK
constraint.

RESTRICT specifies that the view not be dropped if any dependencies exist. If
dependencies are found, an error is returned and the view isn't dropped.

CASCADE specifies that any dependencies are removed before the drop is
performed:

• Views that reference the drop view are dropped, and the sequence is
repeated for their dependencies.
• Constraints in base tables that reference this view are dropped; the
constraint is dropped but the table retained.

8.4.3.5. GRANT Statement


The GRANT Statement grants access privileges for database objects to other
users. It has the following general format:

GRANT privilege-list ON [TABLE] object-list TO user-list

privilege-list is either ALL PRIVILEGES or a comma-separated list of privileges: SELECT, INSERT, UPDATE, DELETE. object-list is a comma-separated list of table and view names. user-list is either PUBLIC or a comma-separated list of user names.

The GRANT statement grants each privilege in privilege-list for each object
(table) in object-list to each user in user-list. In general, the access privileges
apply to all columns in the table or view, but it is possible to specify a column list
with the UPDATE privilege specifier:

UPDATE [ ( column-1 [, column-2] ... ) ]

If the optional column list is specified, UPDATE privileges are granted for those
columns only.

The user-list may specify PUBLIC. This is a general grant, applying to all users
(and future users) in the catalog.

Privileges granted are revoked with the REVOKE Statement.

The optional specifier WITH GRANT OPTION may follow user-list in the GRANT
statement. WITH GRANT OPTION specifies that, in addition to access privileges,
the privilege to grant those privileges to other users is granted.

GRANT Statement Examples

GRANT SELECT ON s,sp TO PUBLIC

GRANT SELECT,INSERT,UPDATE(color) ON p TO art,nan

GRANT SELECT ON supplied_parts TO sam WITH GRANT OPTION

8.4.3.6. REVOKE Statement


The REVOKE Statement revokes access privileges for database objects
previously granted to other users. It has the following general format:

REVOKE privilege-list ON [TABLE] object-list FROM user-list

The REVOKE Statement revokes each privilege in privilege-list for each object
(table) in object-list from each user in user-list. All privileges must have been
previously granted.

The user-list may specify PUBLIC. This must apply to a previous GRANT TO
PUBLIC.

REVOKE Statement Examples

REVOKE SELECT ON s,sp FROM PUBLIC

REVOKE SELECT,INSERT,UPDATE(color) ON p FROM art,nan

REVOKE SELECT ON supplied_parts FROM sam

8.5. Union, Intersect and Except

The UNION, EXCEPT, and INTERSECT operators all operate on multiple result
sets to return a single result set:

• The UNION operator combines the output of two query expressions into a
single result set. Query expressions are executed independently, and
their output is combined into a single result table.
• The EXCEPT operator evaluates the output of two query expressions and
returns the difference between the results. The result set contains all
rows returned from the first query expression except those rows that are
also returned from the second query expression.

• The INTERSECT operator evaluates the output of two query expressions
and returns only the rows common to each.

The following figure illustrates these concepts with Venn diagrams, in which the
shaded portion indicates the result set.


UNION combines the rows from two or more result sets into a single result set.

EXCEPT evaluates two result sets and returns all rows from the first set that are
not also contained in the second set.

INTERSECT computes a result set that contains the common rows from two
result sets.

UNION, INTERSECT, and EXCEPT operators can be combined in a single


UNION expression.

In statements that include multiple operators, the default order of evaluation


(precedence) for these operators is left to right; however, the INTERSECT
operator is evaluated before UNION and EXCEPT. The order of evaluation can
be modified with parentheses.

8.5.1. ALL
If ALL is specified, duplicate rows returned by union_expression are retained. If
two query expressions return the same row, two copies of the row are returned in
the final result. If ALL is not specified, duplicate rows are eliminated from the
result set.

In statements with multiple UNION, EXCEPT, and INTERSECT operators, in


which the ALL keyword is used, the order of evaluation can affect the results.
The placement of the ALL keyword in relation to the query evaluation determines
the duplicates that are retained and eliminated. If the last operation performed
does not contain the ALL keyword, any duplicates retained from previous
evaluations are eliminated.

Examples:

This example contains a simple UNION operation that combines data from the
City and Hq_City columns in the Store and Market tables:

select hq_city as ca_cities


from market
where hq_state like 'CA%'
union
select city
from store
where state like 'CA%'

CA_CITIES
Cupertino
Los Angeles
Los Gatos
Oakland
San Francisco
San Jose

This example adds the ALL keyword, so duplicate Hq_City and City entries are
retained:

select hq_city as ca_cities


from market
where hq_state like 'CA%'
union all
select city
from store
where state like 'CA%'

CA_CITIES
San Jose
San Francisco
Oakland
Los Angeles
Los Gatos
San Jose
Cupertino
Los Angeles
San Jose

The following example replaces the UNION operator with EXCEPT;


consequently, only those California cities that are in the Market table but not the
Store table are returned:

select hq_city as ca_cities
from market
where hq_state like 'CA%'
except
select city
from store
where state like 'CA%'

CA_CITIES
Oakland
San Francisco

The following example replaces the UNION operator with INTERSECT;
consequently, only those California cities that are in both the Market table and
the Store table are returned:

select hq_city as ca_cities
from market
where hq_state like 'CA%'
intersect
select city
from store
where state like 'CA%'

CA_CITIES
Los Angeles
San Jose

The following query uses multiple INTERSECT operations to return a list of
common key values in five different tables:

select prodkey as common_keys from product
intersect select classkey from class
intersect select promokey from promotion
intersect select perkey from period
intersect select storekey from store

COMMON_KEYS
1
3
4
5
12

The following example uses parentheses to force the order of evaluation in a
query that contains a UNION operation and an INTERSECT operation. The
parentheses force the UNION operator to be evaluated first; without them, the
INTERSECT operator would take precedence and the result set might differ.

(select prod_name from product
natural join sales_canadian
union
select prod_name from product
natural join sales_mexican)
intersect
select prod_name from product
natural join sales

Because parentheses override the default order of evaluation for UNION,
EXCEPT, and INTERSECT operations, they sometimes determine whether
duplicate rows are retained or eliminated from the final result set. The following
two queries illustrate this point.
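
Both statements below are a rough sketch using the Market and Store tables
from the earlier examples; the 'WA%' branch is hypothetical and only gives each
statement a third operand. In the first statement the operators are evaluated left
to right, so the final UNION (without ALL) eliminates any duplicates produced by
the UNION ALL; in the second, the parentheses make UNION ALL the last
operation performed, so those duplicates are retained.

select hq_city as cities
from market
where hq_state like 'CA%'
union all
select city
from store
where state like 'CA%'
union
select city
from store
where state like 'WA%'

select hq_city as cities
from market
where hq_state like 'CA%'
union all
(select city
from store
where state like 'CA%'
union
select city
from store
where state like 'WA%')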

8.6. Cursors
Every SQL statement executed by the RDBMS has a private SQL area that
contains information about the SQL statement and the set of data returned. In
PL/SQL, a cursor is a name assigned to a specific private SQL area for a specific
SQL statement. There can be either static cursors, whose SQL statement is
determined at compile time, or dynamic cursors, whose SQL statement is
determined at runtime. Static cursors are covered in greater detail in this section.
Dynamic cursors in PL/SQL are implemented via the built-in package
DBMS_SQL.

8.6.1. Explicit Cursors

Explicit cursors are SELECT statements that are DECLAREd explicitly in the
declaration section of the current block or in a package specification. Use OPEN,
FETCH, and CLOSE in the execution or exception sections of your programs.

8.6.1.1. Declaring explicit cursors

To use an explicit cursor, you must first declare it in the declaration section of a
block or package. There are three types of explicit cursor declarations:

• A cursor without parameters, such as:

CURSOR company_cur IS
   SELECT company_id FROM company;

• A cursor that accepts arguments through a parameter list:

CURSOR company_cur (id_in IN NUMBER) IS
   SELECT name FROM company
   WHERE company_id = id_in;

• A cursor header that contains a RETURN clause in place of the SELECT
statement:

CURSOR company_cur (id_in IN NUMBER)
   RETURN company%ROWTYPE IS
   SELECT * FROM company;

This technique can be used in packages to hide the implementation of the cursor
in the package body.
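
Once declared, an explicit cursor is driven with OPEN, FETCH, and CLOSE in the
execution section. The block below is a minimal sketch assuming the same
COMPANY table used above, with COMPANY_ID and NAME columns:

DECLARE
   CURSOR company_cur IS
      SELECT company_id, name FROM company;
   company_rec company_cur%ROWTYPE;    -- record matching the cursor's SELECT list
BEGIN
   OPEN company_cur;                   -- execute the query
   LOOP
      FETCH company_cur INTO company_rec;
      EXIT WHEN company_cur%NOTFOUND;  -- stop when no row was fetched
      DBMS_OUTPUT.PUT_LINE(company_rec.name);
   END LOOP;
   CLOSE company_cur;                  -- release the cursor resources
END;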

8.6.2. Implicit Cursors

Whenever a SQL statement appears directly in the execution or exception section
of a PL/SQL block, you are working with implicit cursors. These statements include
INSERT, UPDATE, DELETE, and SELECT INTO statements. Unlike explicit
cursors, implicit cursors do not need to be declared, OPENed, FETCHed, or
CLOSEd.

SELECT statements handle the %FOUND and %NOTFOUND attributes
differently from explicit cursors. When an implicit SELECT statement does not
return any rows, PL/SQL immediately raises the NO_DATA_FOUND exception
and control passes to the exception section. When an implicit SELECT returns
more than one row, PL/SQL immediately raises the TOO_MANY_ROWS
exception and control passes to the exception section.

Implicit cursor attributes are referenced via the SQL cursor. For example:

BEGIN
   UPDATE activity SET last_accessed = SYSDATE
    WHERE uid = user_id;

   IF SQL%NOTFOUND THEN
      INSERT INTO activity_log (uid, last_accessed)
      VALUES (user_id, SYSDATE);
   END IF;
END;

8.7. Triggers

Triggers are programs that execute in response to changes in table data or
certain database events. There is a predefined set of events that can be
"hooked" with a trigger, enabling you to integrate your own processing with that
of the database. A triggering event fires or executes the trigger.

8.7.1. Creating Triggers

The syntax for creating a trigger is:

CREATE [OR REPLACE] TRIGGER trigger_name
{ BEFORE | AFTER | INSTEAD OF } trigger_event
ON { [NESTED TABLE nested_table_column OF] view
   | table_or_view_reference
   | DATABASE }
[FOR EACH ROW]
trigger_body;

INSTEAD OF triggers are valid only on views; Oracle8i is required to create a
trigger on a nested table column.

Trigger events are defined in the following table.

INSERT: Fires whenever a row is added to the table_reference.

UPDATE: Fires whenever an UPDATE changes the table_reference. UPDATE
triggers can additionally specify an OF clause to restrict firing to updates OF
certain columns. See the following examples.

DELETE: Fires whenever a row is deleted from the table_reference. Does not
fire on TRUNCATE of the table.

CREATE (Oracle8i): Fires whenever a CREATE statement adds a new object to
the database. In this context, objects are things like tables or packages (found in
ALL_OBJECTS). Can apply to a single schema or the entire database.

ALTER (Oracle8i): Fires whenever an ALTER statement changes a database
object. In this context, objects are things like tables or packages (found in
ALL_OBJECTS). Can apply to a single schema or the entire database.

DROP (Oracle8i): Fires whenever a DROP statement removes an object from
the database. In this context, objects are things like tables or packages (found in
ALL_OBJECTS). Can apply to a single schema or the entire database.

SERVERERROR (Oracle8i): Fires whenever a server error message is logged.
Only AFTER triggers are allowed in this context.

LOGON (Oracle8i): Fires whenever a session is created (a user connects to the
database). Only AFTER triggers are allowed in this context.

LOGOFF (Oracle8i): Fires whenever a session is terminated (a user disconnects
from the database). Only BEFORE triggers are allowed in this context.

STARTUP (Oracle8i): Fires when the database is opened. Only AFTER triggers
are allowed in this context.

SHUTDOWN (Oracle8i): Fires when the database is closed. Only BEFORE
triggers are allowed in this context.

Triggers can fire BEFORE or AFTER the triggering event. AFTER data triggers
are slightly more efficient than BEFORE triggers.
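
As an illustration of the syntax above, the following sketch defines a row-level
data trigger; the EMPLOYEE table and its SALARY column are hypothetical and
stand in for any table whose changes need to be validated:

CREATE OR REPLACE TRIGGER employee_salary_check
   BEFORE INSERT OR UPDATE OF salary ON employee
   FOR EACH ROW
BEGIN
   -- :NEW holds the row values as they will stand after the triggering statement
   IF :NEW.salary < 0 THEN
      RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
   END IF;
END;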

8.8. Dynamic SQL

Dynamic SQL is a programming technique that enables you to build SQL
statements dynamically at runtime. You can create more general purpose,
flexible applications by using dynamic SQL because the full text of a SQL
statement may be unknown at compilation. For example, dynamic SQL lets you
create a procedure that operates on a table whose name is not known until
runtime.

Oracle includes two ways to implement dynamic SQL in a PL/SQL application:

• Native dynamic SQL, where you place dynamic SQL statements directly
into PL/SQL blocks (see the sketch after this list).
• Calling procedures in the DBMS_SQL package.
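
The sketch below shows the first approach. It assumes nothing beyond a
hypothetical table name that becomes known only at runtime; the statement text
is assembled as a string and run with EXECUTE IMMEDIATE:

DECLARE
   table_name  VARCHAR2(30) := 'EMPLOYEE';  -- hypothetical; could be a procedure parameter
   row_count   NUMBER;
BEGIN
   -- parsed and executed at runtime, not at compile time
   EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || table_name INTO row_count;
   DBMS_OUTPUT.PUT_LINE(table_name || ' contains ' || row_count || ' rows');
END;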

Static SQL statements do not change from execution to execution. The full texts
of static SQL statements are known at compilation, which provides the following
benefits:

• Successful compilation verifies that the SQL statements reference valid
database objects.
• Successful compilation verifies that the necessary privileges are in place
to access the database objects.
• Performance of static SQL is generally better than dynamic SQL.


9. QBE
Stands for "Query By Example." QBE is a feature included with various database
applications that provides a user-friendly method of running database queries.
Typically without QBE, a user must write input commands using correct SQL
(Structured Query Language) syntax. This is a standard language that nearly all
database programs support. However, if the syntax is slightly incorrect the query
may return the wrong results or may not run at all.

The Query By Example feature provides a simple interface for a user to enter
queries. Instead of writing an entire SQL command, the user can just fill in blanks
or select items to define the query she wants to perform. For example, a user
may want to select an entry from a table called "Table1" with an ID of 123. Using
SQL, the user would need to input the command, "SELECT * FROM Table1
WHERE ID = 123". The QBE interface may allow the user to just click on Table1,
type in "123" in the ID field and click "Search."

QBE is offered with most database programs, though the interface is often
different between applications. For example, Microsoft Access has a QBE
interface known as "Query Design View" that is completely graphical. The
phpMyAdmin application used with MySQL, offers a Web-based interface where
users can select a query operator and fill in blanks with search terms. Whatever
QBE implementation is provided with a program, the purpose is the same – to
make it easier to run database queries and to avoid the frustrations of SQL
errors.

10. QUERY PROCESSING AND OPTIMIZATION


SQL query processing requires that the DBMS identify and execute a strategy for
retrieving the results of the query. The SQL query determines what data is to be
found, but does not define the method by which the data manager searches the
database. Hence, query optimization is necessary for high-level relational queries
and provides an opportunity for the DBMS to systematically evaluate alternative
query execution strategies and to choose an optimal strategy.

10.1. Query Processing


The processing of an SQL query involves several steps. The SQL query
statement is first parsed into its constituent parts. The basic SELECT
statement is formed from the three clauses SELECT, FROM, and WHERE.
These parts identify the various tables and columns that participate in the data
selection process. The WHERE clause is used to determine the order and
precedence of the various attribute comparisons through a conditional
expression. An example query to determine the names and addresses of all
patients of Doctor 1234 is shown as query Q1 below. The WHERE clause uses a
conjunctive clause which combines two attribute comparisons. More complex
conditions are possible.

Q1: SELECT Name, Address, Dr_Name
    FROM Patient, Physician
    WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234

The query optimizer has the task of determining the optimum query execution
plan. The term “optimizer” is actually a misnomer, because in many cases the
optimum strategy is not found. The goal is to find a reasonably efficient strategy
for executing the query. Finding the perfect strategy is usually too time
consuming and can require detailed information on both the data storage
structure and the actual data content. Usually this information is simply not
available.

Once the execution plan is established, the query code is generated. Various
techniques such as memory management, disk caching, and parallel query
execution can be used to improve the query performance. However, if the plan is
not correct, then the query performance cannot be optimum.


10.2. Query Optimizing


There are two main techniques for query optimization. The first approach is to
use a rule based or heuristic method for ordering the operations in a query
execution strategy. The rules usually state general characteristics for data
access, such as it is more efficient to search a table using an index, if available,
than a full table scan. The second approach systematically estimates the cost of
different execution strategies and chooses the least cost solution. This approach
uses simple statistics about the data structure size and organization as
arguments to a cost estimating equation. In practice most commercial database
systems use a combination of both techniques.
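
Most systems also let you inspect the strategy the optimizer has chosen. In
Oracle, for example, a sketch along the following lines displays the plan for a
query such as Q2 below; it assumes the default PLAN_TABLE exists and a
release recent enough to include the DBMS_XPLAN package:

EXPLAIN PLAN FOR
   SELECT * FROM Patient WHERE Patient.SSN = 11111111;

-- DBMS_XPLAN formats the plan rows recorded in PLAN_TABLE
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);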

10.3. Indexes
Consider, for example, a rule-based technique for query optimization that states
that indexed access to data is preferable to a full table scan. Whenever a single
condition specifies the selection, it is a simple matter to check whether or not an
indexed access path exists for the attribute involved in the condition. Queries Q2
and Q3 are two queries which, from a syntactic structure, are identical. However,

query Q2 uses an index on the patient number, and query Q3 does not have an
index on the patient name. Assuming a balanced tree based index, query Q2 will
at worst case access on the order of log2 (n) entries to locate the required row in
the table. Conversely, query Q3 must search on average n/2 rows to find the
entry during a full table scan, and n rows if the entry does not exist in the table.
When n = 1,000,000 this is the difference between accessing 20 rows versus
500,000 rows for a successful search. Clearly, indexing can significantly improve
query performance. However, it is not always practical to index every attribute in
every table, thus certain types of user queries can respond quite differently from
others.
Q2: SELECT *
FROM Patient
WHERE Patient.SSN = 11111111

In this query, the SSN attribute is the primary key index for the Patient table.

Q3: SELECT *
FROM Patient
WHERE Patient.Name = “Doe, John Q.”

In this query, no index exists on the Name attribute. This requires a full table
scan.
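
If queries like Q3 were frequent, the missing access path could simply be
created; the index name below is arbitrary:

CREATE INDEX patient_name_idx ON Patient (Name);

With such an index in place, the optimizer can satisfy Q3 with an index lookup
rather than a full table scan, at the usual cost of extra storage and slower inserts
and updates on the Patient table.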

10.4. Selectivities
A more significant problem occurs when more than one condition is used in a
conjunctive selection. In this case the selectivity of each condition must be
considered. Selectivity is defined as the ratio between the numbers of rows that
satisfy the condition to the total number of rows in the table. This is the
probability that a row satisfies the condition, assuming a uniform distribution. If
the selectivity is small, then only a few rows are selected by the condition, and it
is desirable to use this condition first when retrieving records. To calculate
selectivities, the database manager needs statistics on all table and attribute
values. The heuristic rule states that, for multiple conjunctive conditions, the
order of application is from smallest selectivity to largest.
Queries Q4 and Q5 illustrate multiple conditions in a conjunctive selection on the
Patient table. Consider the case where the selectivity on Age is 10,000/1,000,000
= 0.01 (Age is assumed to be uniformly distributed between 0 and 100). The
selectivity on Gender is 500,000/1,000,000 = 0.5 (Gender is assumed to be
either M or F). It is clear that by using age as the first retrieval condition, 10,000
rows are accessed for testing against the gender condition, versus accessing
500,000 rows if the gender attribute was chosen first. This is a 50 times

performance difference. Selectivities can be used only if statistics are maintained
by the database manager. If this information is not available, then the order of
condition testing often defaults to the order of conditions as specified in the
WHERE clause.

Q4: SELECT *
FROM Patient
WHERE Age = 45 AND Gender = M

In this query, the Age attribute is specified first.

Q5: SELECT *
FROM Patient
WHERE Gender = M AND Age = 45

This query specifies Gender first.

10.5. Uniformity
In many cases the actual data does not follow a uniform distribution. Consider
the case where 95% of the patients live in the province of New Brunswick and
the remaining 5% live in 199 different states and countries of the world. In this
case there are 200 different values for the Area attribute. The selectivity of the
Area attribute, assuming a uniform distribution, is 5,000/1,000,000 = 0.005. Thus,
this attribute will be accessed first given any query with a conjunctive clause
relating Area and Age. In the example below, query Q6 selects Area based on
the province of Ontario. We estimate that (5% of 1,000,000) / 199, or 251
patients live in Ontario. These rows are accessed first and then tested against
the Age condition. Conversely, query Q7 selects patients in the province of New
Brunswick. In this case, 950,000 patient rows are accessed, or more than 3,700
times the number of rows for the Ontario example. The distribution was skewed
sufficiently to result in a poor choice by the query optimizer. Clearly, non-uniform
data distributions can significantly affect query performance.

Q6: SELECT *
FROM Patient
WHERE Area = “Ontario” AND Age = 45

A uniform distribution for out of province residents predicts that 251 patients live
in Ontario.

Q7: SELECT *
FROM Patient
WHERE Area = “New Brunswick” AND Age = 45

Actual data has 950,000 patients living in New Brunswick.

10.6. Disjunctive Clauses


A disjunctive clause occurs when simple conditions are connected by the OR
logical connective rather than AND. These clauses are much harder to process
and optimize. For example, consider query Q8, which uses a disjunctive clause
relating a specific doctor and the patient area of residence. With such a
condition, little optimization can be done because the rows satisfying the query
are the union of the rows satisfying each of the individual conditions. If any one of
the search conditions does not have an access path, then the query optimizer is
compelled to choose a full table scan to satisfy the query. Performance can only
be improved if an access path exists on every condition in the disjunctive clause.
In this case, row sets can be found satisfying each condition and then combined
through applying a union operation across the result sets to eliminate duplicate
rows. However, set union operations can also be expensive. The customary way
to implement union operations is to sort the relations on the same attributes and
then scan the sorted files to eliminate duplicate rows. Superficially, the
differences between query Q8 and Q9 appear trivial, yet the queries can have
profound differences in performance. In many cases the use of disjunctive
clauses in queries results in either a brute force linear search of the table, or a
sort of a potentially large amount of data.

Q8: SELECT *
FROM Patient
WHERE Doctor = 1234 OR Area = “Ontario”

Group one doctor’s patients with Ontario patients.

Q9: SELECT *
FROM Patient
WHERE Doctor = 1234 AND Area = “Ontario”

Identify only the Ontario patients of a particular doctor.

10.7. Join Selectivities

The JOIN operation is one of the most time consuming operations in query
processing. A join operation matches two tables across domain compatible
attributes. One common technique for performing a join is a nested (inner-outer)
loop or brute force approach. In this case, for every row in the first table a scan of
the second table is performed and every record is tested for satisfying the join
condition. A second technique is to use an access structure or index to retrieve
the matching records. In this case, for every row in the first table an index is used
to access the matching records from the second table.

One factor that significantly affects performance of the join is the percentage of
rows in one table that will be joined with rows in the other table. This is called the
join selection factor. This factor depends not only on the two tables to be joined,
but also on the join fields if there are multiple join conditions between the two
tables. For example, query Q10 joins each Physician row with the Patient rows.
Each physician is expected to exist once in the Patient table (after all, a physician
is also a patient), but 999,000 patient rows will not be joined. Suppose indexes
exist on each of the join attributes. There are two options for performing the join.
The first retrieves each Patient row and then uses the index into the Physician
table to find the matching record. In this case, no matching records will be found
for those patients who are not also physicians. The second option first retrieves
each Physician row and then uses the index into the Patient table to find the
matching Patient row. In this case, every physician will have one matching
patient row.

It is clear that the second option is more efficient than the first option. This occurs
because the join selection factor of Physician with respect to the join condition is
1. Conversely, the Patient selection factor with respect to the same join condition
is 1,000/1,000,000. Choosing optimum join methods requires that various table
sizes and other statistics be used to compute estimated join selectivities.
Q10: SELECT *
FROM Patient, Physician
WHERE Patient.SSN = Physician.Dr_SSN

Q11: SELECT *
FROM Patient, Physician
WHERE Physician.Dr_SSN = Patient.SSN

If join selectivities are not used, then these two queries can exhibit quite different
performance.

10.8. Views
A view in SQL is a single table that is derived from other tables. A view can be
considered as a virtual table or as a stored query. A view is often used to specify
a frequently used query. This is of particular benefit if tables must be joined or
restricted. One difficulty with views is that a view can hide the query complexity
from the user. For example, view V1 describes a virtual table that contains the
same number of rows as the Physician table. Query Q12 accesses the Patient,
Provider, and Treatment tables through view V1 to determine the total cost of
services that Ophthalmologists have rendered. Conversely, query Q13 accesses
only the Physician table to retrieve (different) data on Ophthalmologists. The
problem is that both Q12 and Q13 appear to be of the same order of complexity,
given that knowledge of the view is hidden, yet each query will clearly have a
different performance profile.
V1: CREATE VIEW DrService (Dr, Specialty, Age, TotCost) AS
    SELECT Provider, Specialty, Age, SUM(Cost)
    FROM Patient, Physician, Treatment
    WHERE SSN = Dr_SSN AND DrNum = Provider
    GROUP BY Provider, Specialty, Age

This view matches the Physician table to the Treatment table, and then joins the
result to the Patient table.

Q12: SELECT *
FROM DrService
WHERE Specialty = “Ophthalmologist”

This query performs a three-way join, through the view.

Q13: SELECT *
FROM Physician
WHERE Specialty = “Ophthalmologist”

This query simply scans one table.

11. OODBMS
An object-oriented database management system (OODBMS), sometimes
shortened to ODBMS (object database management system), is a database
management system (DBMS) that supports the modelling and creation of data as
objects. This includes some kind of support for classes of objects and the
inheritance of class properties and methods by subclasses and their objects.
There is currently no widely agreed-upon standard for what constitutes an
OODBMS, and OODBMS products are considered to be still in their infancy. In
the meantime, the object-relational database management system (ORDBMS),
the idea that object-oriented database concepts can be superimposed on
relational databases, is more commonly encountered in available products. An
object-oriented database interface standard is being developed by an industry
group, the Object Data Management Group (ODMG). The Object Management
Group (OMG) has already standardized an object-oriented data brokering
interface between systems in a network.

An object-oriented database system must satisfy two criteria:

• It should be a DBMS,
• and it should be an object-oriented system, i.e., to the extent possible, it
should be consistent with the current crop of object-oriented programming
languages.

The first criterion translates into five features: persistence, secondary storage
management, concurrency, recovery and an ad hoc query facility. The second
one translates into eight features: complex objects, object identity, encapsulation,
types or classes, inheritance, overriding combined with late binding, extensibility
and computational completeness.

11.1. Characteristics of Object-Oriented Database

Object-oriented database technology is a marriage of object-oriented
programming and database technologies; these programming and database
concepts have come together to provide what we now call object-oriented
databases.

Perhaps the most significant characteristic of object-oriented database
technology is that it combines object-oriented programming with database
technology to provide an integrated application development system. There are
many advantages to including the definition of operations with the definition of
data. First, the defined operations apply ubiquitously and are not dependent on
the particular database application running at the moment. Second, the data
types can be extended to support complex data such as multi-media by defining
new object classes that have operations to support the new kinds of information.

11.2. Advantage of OODBMS

The OODBMS has many advantages and benefits. First, the object-oriented
approach is a more natural way of thinking. Second, the defined operations of
these types of systems are not dependent on the particular database application
running at a given moment. Third, the data types of object-oriented databases
can be extended to support complex data such as images and digital audio and
video, along with other multi-media operations. Further benefits of OODBMS are
its reusability, stability, and reliability. Another benefit of OODBMS is that
relationships are represented explicitly, often supporting both navigational and
associative access to information. This translates to an improvement in data
access performance versus the relational model.

Another important benefit is that users are allowed to define their own methods of
access to data and how it will be represented or manipulated. The most
significant benefit of the OODBMS is that these databases have extended into
areas not known by the RDBMS. Medicine, multimedia, and high-energy physics
are just a few of the new industries relying on object-oriented databases.

11.3. Disadvantage of OODBMS

As with the relational database method, object-oriented databases also have
disadvantages or limitations. One disadvantage of OODBMS is that it lacks a
common data model. There is also no current standard, since it is still considered
to be in the development stages.

12. ORACLE
The Oracle Database (commonly referred to as Oracle RDBMS or simply Oracle)
is a relational database management system (RDBMS) produced and marketed
by Oracle Corporation. As of 2009, Oracle remains a major presence in database
computing.

12.1. Storage

The Oracle RDBMS stores data logically in the form of tablespaces and
physically in the form of data files. Tablespaces can contain various types of
memory segments, such as Data Segments, Index Segments, etc. Segments in
turn comprise one or more extents. Extents comprise groups of contiguous data
blocks. Data blocks form the basic units of data storage.

Oracle database management tracks its computer data storage with the help of
information stored in the SYSTEM tablespace. The SYSTEM tablespace
contains the data dictionary — and often (by default) indexes and clusters. A
data dictionary consists of a special collection of tables that contains information
about all user-objects in the database.
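
The dictionary can be queried like any other set of tables. A minimal sketch that
lists the objects owned by the current user, using the standard USER_OBJECTS
dictionary view:

SELECT object_name, object_type, created
FROM   user_objects
ORDER BY object_type, object_name;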

12.2. Database Schema


Oracle database conventions refer to defined groups of object ownership
(generally associated with a "username") as schemas.

Most Oracle database installations traditionally came with a default schema
called SCOTT. After the installation process has set up the sample tables, the
user can log into the database with the username scott and the password tiger.
The name of the SCOTT schema originated with Bruce Scott, one of the first
employees at Oracle (then Software Development Laboratories), who had a cat
named Tiger.

The SCOTT schema has seen less use as it uses few of the features of the more
recent releases of Oracle. Most recent examples supplied by Oracle Corporation
reference the default HR or OE schemas.

Other default schemas include:

• SYS (essential core database structures and utilities)
• SYSTEM (additional core database structures and utilities, and privileged
account)
• OUTLN (utilized to store metadata for stored outlines for stable query-
optimizer execution plans)
• BI, IX, HR, OE, PM, and SH (expanded sample schemas containing more
data and structures than the older SCOTT schema).

12.3. Memory architecture

Each Oracle instance uses a System Global Area or SGA — a shared-memory
area — to store its data and control-information.

Each Oracle instance allocates itself an SGA when it starts and de-allocates it at
shut-down time. The information in the SGA consists of the following elements,
each of which has a fixed size, established at instance startup:

• the database buffer cache: this stores the most recently-used data blocks.
These blocks can contain modified data not yet written to disk (sometimes
known as "dirty blocks"), unmodified blocks, or blocks written to disk since
modification (sometimes known as clean blocks). Because the buffer
cache keeps blocks based on a most-recently-used algorithm, the most
active buffers stay in memory to reduce I/O and to improve performance.
• the redo log buffer: this stores redo entries — a log of changes made to
the database. The instance writes redo log buffers to the redo log as
quickly and efficiently as possible. The redo log aids in instance recovery
in the event of a system failure.
• the shared pool: this area of the SGA stores shared-memory structures
such as shared SQL areas in the library cache and internal information in
the data dictionary. An insufficient amount of memory allocated to the
shared pool can cause performance degradation.
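
As a small illustration, the overall sizes of these SGA components can be read
(in bytes) from the V$SGA dynamic performance view:

SELECT name, value
FROM   v$sga;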

12.3.1. Library cache

The library cache stores shared SQL, caching the parse tree and the execution
plan for every unique SQL statement.

If multiple applications issue the same SQL statement, each application can
access the shared SQL area. This reduces the amount of memory needed and
reduces the processing-time used for parsing and execution planning.
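
How well this sharing works can be checked with the V$LIBRARYCACHE view;
a rough sketch is shown below (a GETHITS count close to GETS suggests
statements are being reused rather than re-parsed):

SELECT namespace, gets, gethits, reloads
FROM   v$librarycache;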

12.3.2. Data dictionary cache

The data dictionary comprises a set of tables and views that map the structure of
the database.

Oracle databases store information here about the logical and physical structure
of the database. The data dictionary contains information such as:

• user information, such as user privileges
• integrity constraints defined for tables in the database
• names and datatypes of all columns in database tables
• information on space allocated and used for schema objects

The Oracle instance frequently accesses the data dictionary in order to parse
SQL statements. The operation of Oracle depends on ready access to the data
dictionary: performance bottlenecks in the data dictionary affect all Oracle users.
Because of this, database administrators should make sure that the data
dictionary cache has sufficient capacity to cache this data. Without enough
memory for the data-dictionary cache, users see a severe performance
degradation. Allocating sufficient memory to the shared pool where the data
dictionary cache resides precludes these particular performance problems.

12.3.3. Program Global Area

The Program Global Area or PGA memory-area of an Oracle instance contains
data and control-information for Oracle's server-processes.

The size and content of the PGA depends on the Oracle-server options installed.
This area consists of the following components:

• stack-space: the memory that holds the session's variables, arrays, and so
on.
• session-information: unless using the multithreaded server, the instance
stores its session-information in the PGA. (In a multithreaded server, the
session-information goes in the SGA.)
• private SQL-area: an area in the PGA which holds information such as
bind-variables and runtime-buffers.
• sorting area: an area in the PGA which holds information on sorts, hash-
joins, etc.

12.4. Configuration

Database administrators control many of the tunable variations in an Oracle
instance by means of values in a parameter file. This file in its ASCII default form
("pfile") normally has a name of the format init<SID-name>.ora. The default binary
equivalent server parameter file ("spfile") (dynamically reconfigurable to some
extent) defaults to the format spfile<SID-name>.ora. Within an SQL-based
environment, the views V$PARAMETER and V$SPPARAMETER give access to
reading parameter values.
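
For example, a single parameter can be inspected with a query such as the
following sketch (db_block_size is just one commonly present parameter):

SELECT name, value, isdefault
FROM   v$parameter
WHERE  name = 'db_block_size';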

13. OBJECTIVE QUESTIONS

1. The advantages of Structured Query Language (SQL) include which of the
following in relation to GIS databases?
a. It is widely used.
b. It is good at handling geographical concepts.

c. It is simple and easy to understand.
d. It uses a pseudo-English style of questioning.

2. Which of the following are characteristics of an RDBMS?
a. Tables are linked by common data known as keys.
b. It cannot use SQL.
c. Queries are possible on individual or groups of tables.
d. Keys may be unique or have multiple occurrences in the database.
e. Data are organized in a series of two-dimensional tables each of
which contains records for one entity.

3. What is a 'tuple'?
a. An attribute attached to a record.
b. Another name for the key linking different tables in a database.
c. A row or record in a database table.
d. Another name for a table in an RDBMS.

4. Which of the following are issues to be considered by users of large
corporate GIS databases?
a. The need for multiple copies of the same data and subsequent
merging after separate updates.
b. The need for manual transfer of records to paper.
c. The need for concurrent access and multi-user update.
d. The need to manage long transactions.
e. The need for multiple views or different windows into the same
databases.

5. Which of the following are features of the object-oriented approach to
databases?
a. The ability to develop more realistic models of the real world.
b. The ability to develop databases using natural language
approaches.
c. The ability to represent the world in a non-geometric way.
d. The need to split objects into their component parts.
e. The ability to develop database models based on location rather
than state and behavior.

6. Redundancy is minimized with a computer based database approach.
a. True
b. False

7. The relational database model is based on concepts proposed in the
1960s and 1970s.
a. True
b. False

8. A row in a database can also be called a domain.
a. True
b. False

9. A first step in database creation should be needs analysis.
a. True
b. False

10. In entity attribute modelling a many to many relationship is represented by
M:M.
a. True
b. False

11. In a networked web based GIS all communications must go through an
internet map server.
a. True
b. False

12. In an OO database approach 'object = attributes + behaviour'.
a. True
b. False

13. In an OO database objects may inherit some or all of the characteristics of
other objects.
a. True
b. False

14. Referring to the following table, what type of relationship exists between
the Product table and the Manufacturer table?

PRODUCT
=======
Product ID
Product Description
Manufacturer ID

MANUFACTURER
============
Manufacturer ID
Manufacturer Name

a. Product - Many; Manufacturer - Many

b. Product - One or Many; Manufacturer - One or Many
c. Product - Many; Manufacturer - One
d. Product - One; Manufacturer - One
e. Product - One; Manufacturer - Many

15. You are writing a database application to run on your DBMS. You do not
want your users to be able to view the underlying table structures. At the
same time you want to allow certain update operations.

Referring to the above scenario, what structure will you deploy?

a. Cursor table
b. Table filter
c. Dynamic procedure
d. View
e. Summary table

16. You are defining the operational process of your RDBMS.

Referring to the scenario above, which one of the following is a valid
ongoing "operational process"?

a. OS requirement
b. User analysis
c. Performance monitoring
d. Data dictionary specification
e. System requirement

17. You have been asked to construct a query in the company's RDBMS. You
have deployed a Right Outer Join operation.

Referring to the scenario above, what will happen to the final results when
there is NO match between the tables?

a. The right table will return ALL rows.
b. The right table will return NULL.
c. Both tables will return NULL.
d. The left table will return ALL rows.
e. The left table will return NULL.

18. Which phase of the data modeling process contains security review?

a. Structure
b. Design issue
c. Data source

Main Office, 126 2nd Floor, Kingsway Camp, Delhi-09, 011-47041845,


www.trajectoryeducation.com/socm
Page no. 99
TRAJECTORY EDUCATION SCHOOL OF COMPUTER SCIENCE
THE NO 1 INSTITUTE FOR UGC-NET IN COMPUTER SCIENCE
d. Storage issue
e. Operational process

19. Which one of the following is NOT a characteristic of metadata?

a. Data about data
b. Describes a data dictionary
c. Self-describing
d. Includes user data
e. Supports its own structure

20. Which one of the following capabilities do you expect to see in a majority
of RDBMS extensions to ANSI SQL-92?

a. Encryption key management
b. Graphical User Interface Widgets
c. Thread creation, execution, & coordination
d. Network socket creation/operation
e. If/Then, for, do/while statements

21. What can a mandatory one to one relationship indicate?

a. More entities are needed.
b. The model should be denormalized.
c. The tables are not properly indexed.
d. The model cannot be implemented physically.
e. More attributes are needed.

22. For performance, you denormalize your database design and create some
redundant columns.

Referring to the scenario above, what RDBMS construct can you use to
automatically prevent the repeated columns from getting out of sync?

a. Cursors
b. Constraints
c. Views
d. Stored procedures
e. Trigger

23. You are running a query against a relational database.

Referring to the scenario above, what clause or command do you use in
the query to help avoid a costly tablescan?

a. GROUP BY clause
b. INDEX command
c. HAVING clause
d. FROM clause
e. WHERE clause
