Information Technology
Database Management System
Semester I
Amity University
A Database Management System is a primary ingredient of modern computing systems. Although
database concepts, technology and architectures have been developed and consolidated over the last
three decades, many aspects are still subject to technological evolution and revolution. Developing
study material on this classical yet continuously evolving field is therefore a great
challenge.
Key features
This study material provides a comprehensive treatment of databases, covering the complete
syllabus for both an introductory course and an advanced course on databases. It offers a
balanced view of concepts, languages and architectures, with concrete reference to current
technology and to commercial database management systems (DBMS). It draws on the
authors' experience of teaching both UG and PG classes, in theory and in application.
The study material is composed of seven chapters. Chapters 1 and 2 are designed to expose
students to the fundamental principles of database management and RDBMS concepts. They give
an idea of how to design a database and develop its schema. The discussion of design techniques
starts with the elements of the E-R (Entity-Relationship) model and proceeds
through a well-defined, staged process from conceptual design to logical design, which
produces a relational schema.
Chapters 3 and 4 are devoted to advanced concepts, including normalization, functional
dependency and the use of the Structured Query Language required for mastering database technology.
Chapter 5 describes fundamental and advanced concepts of the procedural query language
commonly known as PL/SQL, which extends the power of the Structured Query Language. The PL/SQL
engine executes PL/SQL blocks and subprograms; it can run in the Oracle server or in
application development tools such as Oracle Forms, Oracle Reports, etc.
Chapters 6 and 7 focus on advanced concepts of database systems, including
transaction management, concurrency control, and backup and
recovery methods.
Updated Syllabus
Course Contents:
Text:
Fundamentals of Database Systems, Elmasri & Navathe, Pearson Education, Asia
Database Management Systems, Leon & Leon, Vikas Publications
Database System Concepts, Korth & Sudarshan, TMH
References:
Introduction to Database Systems, Bipin C Desai, Galgotia
Oracle 9i The Complete Reference, Oracle Press
Index:
1. Introductory Concepts
There are innumerable Database Management System (DBMS) software packages available in
the market. Some of the most popular ones include Oracle, IBM's DB2, Microsoft Access,
Microsoft SQL Server and MySQL. MySQL, one of the most popular database management
systems used by online entrepreneurs, is an example of a relational DBMS. Microsoft
Access (another popular DBMS), on the other hand, is not a fully object-oriented system, even
though it does exhibit certain aspects of one.
Example: A database may contain detailed student information; certain users may only be
allowed access to student names, addresses and phone numbers, while other users may be able
to view students' payment details or marks. Access and change logs can be
programmed to add even more security to a database, recording the date, time and details of any
user making any alteration to the database.
Furthermore, Database Management Systems employ a query language and report
writers to interrogate the database and analyze its data. Queries allow users to search, sort, and
analyze specific data by granting them efficient access to the required information.
Example: one would use a query command to make the system retrieve data regarding all
courses of a particular department. The most common query language used to access database
systems is the Structured Query Language (SQL).
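For instance, a query of this kind might look as follows in SQL. This is only a sketch: the table name Course and the columns course_id, course_name and dept_name are assumed for illustration rather than taken from the text.
-- list all courses offered by a particular department
SELECT course_id, course_name
FROM Course
WHERE dept_name = 'Computer Science';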
3.1 Hierarchical databases organize data under the premise of a basic parent/child relationship.
Each parent can have many children, but each child can have only one parent. In hierarchical
databases, attributes of specific records are listed under an entity type, and entity types are
connected to each other through one-to-many relationships, also known as 1:N mapping.
Originally, hierarchical relationships were most commonly used in mainframe systems, but with
the advent of increasingly complex relationships they have become too restrictive
and are thus rarely used in modern databases. If any of the one-to-many relationships is
compromised, for example an employee having more than one manager, the database structure
switches from hierarchical to network.
3.2 Network model: In the network model of a database it is possible for a record to have
multiple parents, making the system more flexible than the strict single-parent model of
the hierarchical database. The model is made to accommodate many-to-many relationships,
which allows a more realistic representation of the relationships between entities. Even
though the network database model enjoyed popularity for a short while, it never really lifted off
the ground in terms of staging a revolution. It is now rarely used because of the availability of
more competitive models that offer the higher flexibility demanded in today's ever-advancing
age.
3.3 Relational databases (RDBMS) differ markedly from the aforementioned models: records are
organized around a set of tables (with unique identifiers) that represent both the data and their
relationships. The fields to be used for matching are often indexed in order to speed up the
process, and the data can be retrieved and manipulated in a number of ways without the need to
reorganize the original database tables. Working under the assumption that file systems (which
often use the hierarchical or network models) are not considered databases, the relational
database model is the most commonly used system today. While the concepts behind the
hierarchical and network database models are older than the relational model, the latter was in
fact the first one to be formally defined.
After the relational DBMS soared to popularity, the most recent development in DBMS
technology came in the form of the object-oriented database model, which offers more flexibility
than the hierarchical, network and relational models put together. Under this model, data exists
in the form of objects, which include both the data and the data's behavior. Certain modern
information systems contain such convoluted combinations of information that traditional data
models (including the RDBMS) are too restrictive to model this complex data adequately.
The object-oriented model also exhibits better cohesion and coupling than prior models, resulting
in a database that is not only more flexible and more manageable but also better able to model
real-life processes. However, due to the immaturity of this model, certain problems arise,
the major ones being the lack of an SQL equivalent and the lack of standardization.
Furthermore, the most common use of the object-oriented model is to have an object point to the
OID (object ID) of its child or parent object, leaving many programmers with the impression
that the object-oriented model is simply a reincarnation of the network model. That is, however,
an over-simplification of an innovative technology.
4. Components of a DBMS
The components of a Database Management System (DBMS) are described below.
4.1. Database Engine: The database engine is the foundation for storing, processing, and securing
data. It provides controlled access and rapid transaction processing to meet the
requirements of the most demanding data-consuming applications within an enterprise. The
database engine is used to create relational databases for online transaction processing or online
analytical processing. This includes creating tables for storing data, and database objects such as
indexes, views, and stored procedures for viewing, managing, and securing data. In SQL Server,
for example, you can use SQL Server Management Studio to manage the database objects and
SQL Server Profiler to capture server events.
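As a small illustration of the objects mentioned above, the following sketch creates a table, an index and a view; all names and data types here are assumptions:
-- a table for online transaction processing
CREATE TABLE Student (
  roll_no   INT PRIMARY KEY,
  name      VARCHAR(50) NOT NULL,
  dept_name VARCHAR(30)
);

-- an index to speed up lookups by department
CREATE INDEX idx_student_dept ON Student (dept_name);

-- a view exposing only a subset of the data
CREATE VIEW cs_students AS
  SELECT roll_no, name FROM Student WHERE dept_name = 'CS';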
4.2. Data dictionary: A data dictionary is a reserved space within a database which is used to store
information about the database itself. It is a set of tables and views which can only
be read, never altered. Most data dictionaries contain different kinds of information about the data used
in the enterprise. In terms of the database representation of the data, the data dictionary defines all
schema objects, including views, tables, clusters, indexes, sequences, synonyms, procedures,
packages, functions, triggers and many more. This ensures that all these objects follow one
standard defined in the dictionary. The data dictionary also records how much space has been
allocated for, and/or is currently in use by, each schema object. A data dictionary is consulted when
finding information about users, objects, schemas and storage structures. Every time a data
definition language (DDL) statement is issued, the data dictionary is modified.
A data dictionary may contain information such as:
Database design information
Stored SQL procedures
User permissions
User statistics
Database process information
Database growth statistics
Database performance statistics
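In many SQL products the dictionary can itself be read with ordinary queries. The sketch below assumes a system that supports the standard INFORMATION_SCHEMA views; the table name STUDENT is hypothetical:
-- list the columns and data types recorded in the dictionary for one table
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name = 'STUDENT';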
4.3. Query Processor: A relational database consists of many parts, but at its heart are two major
components: the storage engine and the query processor. The storage engine writes data to and
reads data from the disk. It manages records, controls concurrency, and maintains log files. The
query processor accepts SQL syntax, selects a plan for executing the syntax, and then executes the
chosen plan. The user or program interacts with the query processor, and the query processor in
turn interacts with the storage engine. The query processor isolates the user from the details of
execution: The user specifies the result, and the query processor determines how this result is
obtained. The query processor components include
DDL interpreter
DML compiler
Query evaluation engine
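Many products also let you inspect the plan the query processor has chosen. The sketch below assumes a DBMS that supports an EXPLAIN statement; the Persons table is the one used in the SQL examples later in this material:
-- show the chosen execution plan instead of running the query
EXPLAIN SELECT * FROM Persons WHERE LastName = 'Hansen';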
4.4. Report writer: Also called a report generator, a report writer is a program, usually part of a database
management system, that extracts information from one or more files and presents the information
in a specified format. Most report writers allow you to select records that meet certain conditions
and to display selected fields in rows and columns. You can also format data into pie charts, bar
charts, and other diagrams. Once you have created a format for a report, you can save the format
specification in a file and reuse it for new data.
5. Database Languages
5.1 Data Definition Language (DDL): The Data Definition Language (DDL) is used to define the
structure of a database. The database structure definition (schema) typically includes the
following:
Defining all data elements; defining data element fields and records; defining the name, field
length, and field type for each data element; and defining controls for fields that can have only selective
values.
Typical DDL operations (with their respective keywords in the Structured Query Language, SQL) are shown below:
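A minimal sketch of the usual DDL keywords CREATE, ALTER and DROP; the Department table and its columns are assumptions:
CREATE TABLE Department (dept_no INT, dept_name VARCHAR(40));  -- define a new table
ALTER TABLE Department ADD location VARCHAR(40);               -- change its structure
DROP TABLE Department;                                         -- remove the table and its data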
5.2 Data Manipulation Language (DML): Once the structure is defined, the database is ready for
entry and manipulation of data. The Data Manipulation Language (DML) includes the commands
to enter and manipulate the data; with these commands the user can add new records, navigate
through the existing records, view the contents of various fields, modify the data, delete existing
records, and sort the records in the desired sequence. Typical
DML operations (with their respective keywords in the Structured Query Language, SQL) are shown below:
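A minimal sketch of the usual DML keywords INSERT, SELECT, UPDATE and DELETE; the Student table and the values are assumptions:
INSERT INTO Student (roll_no, name) VALUES (10, 'Ravi');    -- add a new record
SELECT * FROM Student WHERE roll_no = 10;                   -- view records
UPDATE Student SET name = 'Ravi Kumar' WHERE roll_no = 10;  -- modify data
DELETE FROM Student WHERE roll_no = 10;                     -- delete a record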
5.3 Data Control Language (DCL): Data control commands in SQL control access privileges
and security issues of a database system or parts of it. These commands are closely tied to the
DBMS (Database Management System) and can therefore vary across SQL
implementations. Typical commands are GRANT and REVOKE.
Since these commands depend on the actual database management system (DBMS), we will not
cover DCL in this module.
6. Database Users
6.1 Database Administrator (DBA): The DBA is a person, or a group of persons, responsible
for the management of the database. The DBA is responsible for authorizing access
to the database by granting and revoking permissions to users, for coordinating and monitoring its
use, for managing backups and repairing damage due to hardware and/or software failures, and for
acquiring hardware and software resources as needed. In small organizations the role of the
DBA is performed by a single person, while in large organizations there is a group of
DBAs who share these responsibilities.
6.2 Database Designers: They are responsible for identifying the data to be stored in the
database and for choosing appropriate structures to represent and store the data. It is the
responsibility of database designers to communicate with all prospective database users in
order to understand their requirements, so that they can create a design that meets those
requirements.
6.3 End Users: End Users are the people who interact with the database through applications or
utilities. The various categories of end users are:
• Casual End Users - These users access the database occasionally but may need
different information each time. They use a sophisticated database query language to specify their
requests. For example: high-level managers who access the data weekly or biweekly.
• Naive End Users - These users frequently query and update the database using standard
types of queries. The operations that can be performed by this class of users are very limited and
affect a precise portion of the database.
For example: reservation clerks for airlines/hotels who check availability for a given request and
make reservations. Persons using Automated Teller Machines (ATMs) also fall under this
category, as they have access to a limited portion of the database.
• Standalone End Users / Online End Users - Those end users who interact with the
database directly via an online terminal, or indirectly through menu- or graphics-based interfaces.
For example: users of a text package or of library management software that stores a variety of
library data, such as the issue and return of books for fine purposes.
Application Programmers are responsible for writing application programs that use the database.
These programs can be written in general-purpose programming languages such as Visual
Basic, Developer, C, FORTRAN, COBOL, etc. to manipulate the database. These application
programs operate on the data to perform various operations such as retrieving information and
creating new records.
7. ADVANTAGES OF DBMS
The DBMS (Database Management System) is preferred over the conventional file
processing system due to the following advantages:
Controlling Data Redundancy - In the conventional file processing system, every user group
maintains its own files for handling its data. This may lead to:
Duplication of the same data in different files.
Wastage of storage space.
Errors generated due to updating of the same data in different files.
Time wasted in entering the same data again and again.
Needless use of computer resources.
Difficulty in combining information.
All of the above-mentioned problems are eliminated in a Database Management System.
Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to another
file. This may lead to inconsistent data, so we need to remove this duplication of data in multiple
files to eliminate inconsistency.
For example, consider a student result system. Suppose that the
STUDENT file indicates that Roll No. 10 has opted for the 'Computer' course, but the RESULT
file indicates that Roll No. 10 has opted for the 'Accounts' course. In this case the two
entries for this particular student do not agree with each other, and the database is said to be in an
inconsistent state. Hence, to eliminate this conflicting information we need to centralize the
database. On centralizing the database the duplication will be controlled and hence
inconsistency will be removed. Data inconsistencies are often encountered in everyday life.
Consider another example: we have all come across situations when a new address is
communicated to an organization that we deal with (e.g. a telecom provider, gas company or bank), and we find
that some of the communications from that organization are received at the new address while
others continue to be mailed to the old address. Combining all the data in a database
involves a reduction in redundancy as well as inconsistency, so it is likely to reduce the costs of
collecting, storing and updating data.
Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a file
processing system where the programmer may need to write new programs to meet every new
demand.
Flexibility of the system is improved - Since changes are often necessary to the contents of the
data stored in any system, these changes are made more easily in a centralized database than in a
conventional system. Application programs need not be changed when the data in the
database changes. This also maintains the consistency and integrity of the data in the database.
Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce integrity
constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
For example, consider the result system that we have already discussed. Since multiple files
are to be maintained, you may sometimes enter a value for a course which does not exist. Suppose
the course can have the values (Computer, Accounts, Economics, Arts) but we enter the value 'Hindi'
for it; this leads to inconsistent data and a lack of integrity. Even if we centralize the
database it may still contain incorrect data. For example:
• The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
• A student may be shown to have borrowed books but have no enrollment.
• A list of employee numbers for a given department may include a number of non-existent
employees.
These problems can be avoided by defining validation procedures that are invoked whenever any
update operation is attempted.
Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data, the
structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of
data interchange or migration between systems.
For example, a DBA can choose the best file structure and access method to give faster response
to highly critical applications as compared to less critical applications.
Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, one
normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems, since
the productivity of programmers can be higher when using the non-procedural languages that
come with a DBMS than when using procedural languages.
Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which help the database recover from an inconsistent state to the state that existed prior
to the occurrence of the failure, though the methods are very complex.
8. Three-Schema Architecture
The objective of the three-schema architecture is to separate the user application programs from the
physical database. The three-schema architecture is an effective tool with which the user can
visualize the schema levels in a database system. The three-level ANSI architecture has an
important place in database technology development because it clearly separates the users'
external level, the system's conceptual level, and the internal storage level for designing a
database. In the three-schema architecture, schemas can be defined at three different levels.
8.3 Conceptual Schema: It describes the structure of the whole database for the entire user
community. The conceptual schema hides the details of the physical storage structures and
concentrates on describing entities, data types, relationships and constraints. The
implementation of the conceptual schema is based on a conceptual schema design in a high-level data
model.
9. Data Independence
With knowledge of the three-schema architecture, the term data independence can be
explained as follows: each higher level of the data architecture is immune to changes at the
next lower level of the architecture.
Data independence is normally thought of in terms of two levels or types. Logical data
independence makes it possible to change the structure of the data without
modifying the application programs that make use of the data. There is no need to rewrite current
applications as part of the process of adding data to or removing data from the system.
The second type or level of data independence is known as physical data independence. It
has to do with altering the organization or storage procedures related to the data, rather
than modifying the data itself. Changing the file organization or the indexing
strategy used for the data does not require any modification to the external structure of the
applications, meaning that users of the applications are not likely to notice any difference at all in
the function of their programs.
Database Instance: The term instance is typically used to describe a complete database
environment, including the RDBMS software, table structures, stored procedures and other
functionality. It is most commonly used when administrators describe multiple
instances of the same database. Also known as: environment.
Relational Schema: A relation schema can be thought of as the basic information describing a
table or relation. This includes a set of column names, the data types associated with each
column, and the name associated with the entire table.
The Entity-Relationship Model (E-R Model) is a high-level conceptual data model developed
by Chen in 1976 to facilitate database design. Conceptual modeling is an important phase in
designing a successful database. A conceptual data model is a set of concepts that describe the
structure of a database and the associated retrieval and update transactions on the database. A high-level
model is chosen so that all the technical aspects are also covered. The E-R data model grew
out of the exercise of using commercially available DBMSs to model databases. The E-R
model is a generalization of the earlier commercial models like the hierarchical and
the network model. It also allows the representation of various constraints as well as
relationships.
To sum up, the Entity-Relationship (E-R) Model is based on a view of the real world that
consists of a set of objects called entities and relationships among entity sets, which are basically
groups of similar objects. A relationship between entity sets is represented by a named E-R
relationship and is of 1:1, 1:N or M:N type, which tells the mapping from one entity set to
another.
An ER Diagram
Entity
An entity is a real-world object (living or non-living) or concept about which you want to
store information.
Weak Entity
A weak entity is an entity that must be defined through a relationship with another entity, because
its own attributes cannot uniquely identify its instances.
Key attribute
A key attribute uniquely identifies an instance of an entity. For example, an employee's social
security number might be the employee's key attribute.
Derived attribute
A derived attribute is based on another attribute. For example, an employee's monthly salary may
be derived from the employee's annual salary.
Relationships
Relationships illustrate how two entities share information in the database structure.
First, connect the two entities, then drop the relationship notation on the line.
Cardinality
Cardinality specifies how many instances of an entity relate to one instance of another entity.
Ordinality is closely linked to cardinality: while cardinality specifies the occurrences of a
relationship, ordinality describes the relationship as either mandatory or optional. In other words,
cardinality specifies the maximum number of relationships and ordinality specifies the absolute
minimum number of relationships.
Recursive relationship
In some cases, entities can be self-linked. For example, employees can supervise other
employees.
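In a relational schema such a self-link is typically implemented with a foreign key that refers back to the same table. The sketch below is only illustrative; the table and column names are assumptions:
CREATE TABLE Employee (
  emp_id        INT PRIMARY KEY,
  emp_name      VARCHAR(50),
  supervisor_id INT REFERENCES Employee (emp_id)  -- recursive: a supervisor is also an employee
);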
10.3 How to design an Effective ER Diagrams
1) Make sure that each entity only appears once per diagram.
3) Examine relationships between entities closely. Are they necessary? Are there any
relationships missing? Eliminate any redundant relationships. Don't connect relationships to each
other.
4) Using colors can help you highlight important features in your diagram.
5) Create a polished diagram by adding shadows and color. You can choose from a number of
ready-made styles in the Edit menu under Colors and Shadows, or you can create your own.
1. The E-R diagram used for representing E-R Model can be easily converted into Relations
(tables) in Relational Model.
2. The E-R Model is used by the database developer for good database design, so that the resulting
data model can be used in various DBMSs.
3. It is helpful as a problem decomposition tool as it shows the entities and the relationships
between those entities.
4. It is inherently an iterative process; on later modifications, new entities can be inserted into the
model.
5. It is very simple and easy to understand by various types of users and designers because
specific standards are used for their representation.
For example, members of the entity Employee can be grouped further into Secretary, Engineer,
Manager, Technician and Salaried_Employee.
Each set listed is a subset of the entities that belong to the Employee entity type, which means that
every entity that belongs to one of the subsets is also an Employee.
Each of these sub-groupings is called a subclass, and the Employee entity is called the super
class.
An entity cannot be a member of a subclass only; it must also be a member of the super
class.
An entity can be included as a member of a number of subclasses; for example, a Secretary
may also be a salaried employee. However, not every member of the super class must be a
member of a subclass.
Type Inheritance
The type of an entity is defined by the attributes it possesses and the relationship types in which it
participates.
Because an entity in a subclass represents the same real-world entity as in the super class, it should
possess values for its specific attributes as well as for the attributes it has as a member of the super
class.
This means that an entity that is a member of a subclass inherits all the attributes of the entity
as a member of the super class; the entity also inherits all the relationships in which the
super class participates.
Figure: the entities Employee and Department connected by a Works For relationship.
Specialization
The process of defining a set of subclasses of a super class.
Specialization is the top-down refinement into (super) classes and subclasses
The set of subclasses is based on some distinguishing characteristic of the entities in the super class.
For example, the set of subclasses Secretary, Engineer and Technician
differentiates among employees based on job type.
There may be several specializations of an entity type based on different distinguishing
characteristics.
Another example is the specialization, Salaried_Employee and Hourly_Employee, which
distinguish employees based on their method of pay.
To represent a specialization, the subclasses that define it are attached by lines
to a circle that represents the specialization, which is in turn connected to the super class.
The subset symbol (half-circle), shown on each line connecting a subclass to the super class,
indicates the direction of the super class/subclass relationship.
Attributes that apply only to the subclass are attached to the rectangle representing the
subclass. They are called specific attributes.
A sub class can also participate in specific relationship types. See Example.
Figure: the entities Employee and Department connected by a Works For relationship; a subclass also participates in a Belongs To relationship with a Professional Organization entity.
Certain attributes may apply to some but not all entities of the super class; a subclass can be
defined in order to group the entities to which those attributes apply.
The second reason for using subclasses is that some relationship types may be participated in
only by entities that are members of the subclass.
Summary of Specialization
Specialization allows for:
Defining a set of subclasses of an entity type
Creating additional specific attributes for each subclass
Creating additional specific relationship types between each subclass and other entity types or
other subclasses.
Generalization
Generalization is the reverse of specialization: two or more entity types with common features are
generalized into a single super class. It proceeds bottom-up, from the subclasses to the super class.
Types of Specializations
Predicate-defined (condition-defined) specialization
Example: the subclass Secretary of Employee, defined by the condition Job Type = 'Secretary'.
The condition is called the defining predicate of the subclass.
The condition is a constraint specifying that exactly those entities of the Employee entity type
whose attribute value for Job Type is 'Secretary' belong to the subclass.
Predicate defined subclasses are displayed by writing the predicate condition next to the line
that connects the subclass to the specialization circle.
Attribute-defined specialization
If all subclasses in a specialization have their membership condition on the same attribute of
the super class, the specialization is called an attribute-defined specialization, and the
attribute is called the defining attribute.
Attribute-defined specializations are displayed by placing the defining attribute name next to
the arc from the circle to the super class.
User-defined specialization
When we do not have a condition for determining membership in a subclass, the subclass is
called user-defined.
Membership in a subclass is then determined by the database users when they add an entity to the
subclass.
Disjointness Constraint
The disjointness constraint specifies that the subclasses of the specialization must be disjoint, which
means that an entity can be a member of, at most, one subclass of the specialization.
The d in the specialization circle stands for disjoint.
If the subclasses are not constrained to be disjoint, they may overlap.
Overlap means that an entity can be a member of more than one subclass of the
specialization.
The overlap constraint is shown by placing an o in the specialization circle.
Completeness Constraint
Disjointness constraints and completeness constraints are independent of each other. The following
combinations of constraints on specializations are possible:
Figure: example specializations illustrating the possible constraint combinations — Disjoint, total (Department, Academic, Administrative, Employee); Disjoint, partial (Part, Manufactured, Purchased); Overlapping, partial (Movie).
Q2. A relationship is
(a) an item in an application
(b) a meaningful dependency between entities
(c) a collection of related entities
(d) related data
Q4. In the three-schema architecture, a specific view of data given to a particular user is defined at
a) Internal Level
b) External Level
c) Conceptual Level
d) Physical Level
Q11. Vehicle identification number, color, weight, and horsepower best exemplify:
a.) entities.
b.) entity types.
c.) data markers.
d.) attributes.
Q12. If each employee can have more than one skill, then skill is referred to as a:
a.) gerund.
b.) multivalued attribute.
c.) nonexclusive attribute.
d.) repeating attribute
A Relational Database Management System (RDBMS) provides a complete and integrated approach
to information management. A relational model provides the basis for a relational
database. A relational model has three aspects:
Structures
Operations
Integrity rules
Structures consist of a collection of objects or relations that store data. An example of a relation is
a table. You can store information in a table and use the table to retrieve and modify data.
Operations are used to manipulate data and structures in a database. When using operations,
you must adhere to a predefined set of integrity rules.
Integrity rules are laws that govern the operations allowed on data in a database. This ensures
data accuracy and consistency.
A Table is a basic storage structure of an RDBMS and consists of columns and rows. A table
represents an entity. For example, the S_DEPT table stores information about the departments of
an organization.
A Row is a combination of column values in a table and is identified by a primary key. Rows are
also known as records. For example, a row in the table S_DEPT contains information about one
department.
A Column is a collection of one type of data in a table. Columns represent the attributes of an
object. Each column has a column name and contains values that are bound by the same type and
size. For example, a column in the table S_DEPT specifies the names of the departments in the
organization.
A Field is an intersection of a row and a column. A field contains one data value. If there is no
data in the field, the field is said to contain a NULL value.
Figure Table, Row, Column & Field
A Primary key is a column or a combination of columns that is used to uniquely identify each
row in a table. For example, the column containing department numbers in the S_DEPT table is
created as a primary key and therefore every department number is different. A primary key must
contain a value. It cannot contain a NULL value.
A Foreign key is a column or set of columns that refers to a primary key in the same table or in
another table. You use foreign keys to establish connections between, or within, tables.
A foreign key must either match a primary key or else be NULL. Rows are connected logically
when required. The logical connections are based upon conditions that define a relationship
between corresponding values, typically between a primary key and a matching foreign key. This
relational method of linking provides great flexibility as it is independent of physical links
between records.
Figure Primary & Foreign key
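A minimal sketch of how such keys are declared in SQL, loosely following the S_DEPT example above; the S_EMP table and all column names and data types are assumptions:
CREATE TABLE S_DEPT (
  dept_no   INT PRIMARY KEY,               -- primary key: unique and never NULL
  dept_name VARCHAR(40) NOT NULL
);

CREATE TABLE S_EMP (
  emp_no  INT PRIMARY KEY,
  dept_no INT REFERENCES S_DEPT (dept_no)  -- foreign key: must match an S_DEPT row or be NULL
);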
RDBMS Properties
An RDBMS is easily accessible. You execute commands in the Structured Query Language
(SQL) to manipulate data. SQL is the International Organization for Standardization (ISO) standard
language for interacting with an RDBMS.
An RDBMS provides full data independence. The organization of the data is independent of the
applications that use it. You do not need to specify the access routes to tables or know how data
is physically arranged in a database.
A relational database is a collection of individual, named objects. The basic unit of data storage
in a relational database is called a table. A table consists of rows and columns used to store
values. For access purposes, the order of rows and columns is insignificant. You can control the
access order as required.
Figure SQL & Database
When querying the database, you use conditional operations such as joins and restrictions. A join
combines data from separate database rows. A restriction limits the specific rows returned by a
query.
The entity constraint ensures the uniqueness of rows, and the column constraint ensures
consistency of the type of data within a column. The referential constraint ensures the validity of
foreign keys, and user-defined constraints are used to enforce specific business rules.
An RDBMS minimizes the redundancy of data. This means that similar data is not repeated
unnecessarily throughout the database.
3. Codd's 12 rules
Codd's 12 rules are a set of twelve rules proposed by E. F. Codd, a pioneer of the relational
model for databases, designed to define what is required from a database management system in
order for it to be considered relational, i.e., an RDBMS. Codd produced these rules as part of a
personal campaign to prevent his vision of the relational database being diluted.
Table B
EmpNo DeptID EmpName
1001 F-1001 Tommy
1002 S-2012 Will
1003 H-0001 Jonathan
4. Relational algebra
Relational algebra is a procedural query language, which consists of a set of operations that take
one or two relations as input and produce a new relation as their result. The fundamental
operations that will be discussed in this section are: select, project, union, and set difference.
Besides the fundamental operations, the following additional operations will be discussed: set-
intersection.
Each operation will be applied to tables of a sample database. Each table is otherwise known as a
relation, and each row within the table is referred to as a tuple. The sample database consists of
tables that one might see in a bank. It contains the following six
relations:
Account
branch-name account-number balance
Downtown A-101 500
Mianus A-215 700
Perryridge A-102 400
Round Hill A-305 350
Brighton A-201 900
Redwood A-222 700
Brighton A-217 750
Branch
branch-name branch-city assets
Downtown Brooklyn 9000000
Redwood Palo Alto 2100000
Perryridge Horseneck 1700000
Mianus Horseneck 400000
Round Hill Horseneck 8000000
Pownal Bennington 300000
North Town Rye 3700000
Brighton Brooklyn 7100000
Customer
customer-name customer-street customer-city
Jones Main Harrison
Smith North Rye
Hayes Main Harrison
Curry North Rye
Lindsay Park Pittsfield
Turner Putnam Stamford
Williams Nassau Princeton
Adams Spring Pittsfield
Johnson Alma Palo Alto
Glenn Sand Hill Woodside
Brooks Senator Brooklyn
Green Walnut Stamford
Depositor
customer-name account-number
Johnson A-101
Smith A-215
Hayes A-102
Turner A-305
Johnson A-201
Jones A-217
Lindsay A-222
Loan
branch-name loan-number amount
Downtown L-17 1000
Redwood L-23 2000
Perryridge L-15 1500
Downtown L-14 1500
Mianus L-93 500
Round Hill L-11 900
Perryridge L-16 1300
Borrower
customer-name loan-number
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Adams L-16
The Select operation is a unary operation, which means it operates on one relation. Its function is
to select tuples that satisfy a given predicate. To denote selection, the lowercase Greek letter
sigma (σ) is used. The predicate appears as a subscript to σ, and the argument relation is given in
parentheses following the σ.
For example, to select those tuples of the loan relation where the branch is "Perryridge," we
write:
σ branch-name = "Perryridge" (loan)
Comparisons such as =, ≠, <, ≤, >, ≥ can also be used in the selection predicate. An example query using
a comparison, to find all tuples in which the amount lent is more than $1200, would be written:
σ amount > 1200 (loan)
Another, more complicated example query, to find those customers who live in Harrison, uses the
project operation (Π, which keeps only the listed attributes) and is written as:
Π customer-name (σ customer-city = "Harrison" (customer))
The union operation yields the results that appear in either or both of two relations; it is a binary
operation, denoted by ∪. To find the names of all customers with a loan in the bank, we would write
Π customer-name (borrower), and to find the names of all customers with an account in the bank, we would write:
Π customer-name (depositor)
Then, by using the union operation on these two queries, we obtain the query we need for the
wanted results. The final query is written as:
Π customer-name (borrower) ∪ Π customer-name (depositor)
and its result is:
customer-name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams
The set intersection operation is denoted by the symbol ∩. It is not a fundamental operation;
however, it is a more convenient way to write r − (r − s).
An example query using the operation, to find all customers who have both a loan and an account,
can be written as:
Π customer-name (borrower) ∩ Π customer-name (depositor)
Set Difference Operation: Set difference is denoted by the minus sign (−). It finds tuples that are
in one relation but not in another. Thus r − s results in a relation containing the tuples that are in r
but not in s.
Cartesian Product Operation: The Cartesian product of two relations r and s is denoted by a cross (×),
written r × s.
The result of r × s is a new relation with a tuple for each possible pairing of tuples from r
and s.
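For comparison, the relational algebra examples above have direct SQL equivalents. The sketch below writes the column names with underscores in place of the hyphens used in the sample relations:
SELECT * FROM loan WHERE branch_name = 'Perryridge';   -- selection (sigma)

SELECT customer_name FROM customer
WHERE customer_city = 'Harrison';                      -- selection followed by projection

SELECT customer_name FROM borrower
UNION
SELECT customer_name FROM depositor;                   -- union of the two name lists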
Chapter-2
RELATIONAL DATABASE MODEL
End Chapter quizzes:
1. Functional Dependency
Consider a relation R that has two attributes A and B. The attribute B of the relation is
functionally dependent on the attribute A if and only if for each value of A no more than one
value of B is associated. In other words, the value of attribute A uniquely determines the value of
B and if there were several tuples that had the same value of A then all these tuples will have an
identical value of attribute B. That is, if t1 and t2 are two tuples in the relation R and t1(A) =
t2(A) then we must have t1(B) = t2(B).
A and B need not be single attributes. They could be any subsets of the attributes of a relation R
(possibly single attributes). We may then write
R.A -> R.B
if B is functionally dependent on A (or A functionally determines B). Note that functional
dependency does not imply a one-to-one relationship between A and B, although a one-to-one
relationship may exist between A and B.
A simple example of the above functional dependency is when A is the primary key of an entity
(e.g. student number) and B is some single-valued property or attribute of the entity (e.g. date of
birth). A -> B must then always hold.
Functional dependencies also arise in relationships. Let C be the primary key of an entity and D
be the primary key of another entity. Let the two entities have a relationship. If the relationship is
one-to-one, we must have C -> D and D -> C. If the relationship is many-to-one, we would have
C -> D but not D -> C. For many-to-many relationships, no functional dependencies hold. For
example, if C is student number and D is subject number, there is no functional dependency
between them. If however, we were storing marks and grades in the database as well, we would
have
(student_number, subject_number) -> marks and we might have
marks -> grades
The second functional dependency above assumes that the grades are dependent only on the
marks. This may sometime not be true since the instructor may decide to take other
considerations into account in assigning grades, for example, the class average mark.
For example, in the student database that we have discussed earlier, we have the following
functional dependencies:
sno -> sname
sno -> address
cno -> cname
cno -> instructor
instructor -> office
These functional dependencies imply that there can be only one name for each sno, only one
address for each student and only one subject name for each cno. It is of course possible that
several students may have the same name and several students may live at the same address. If
we consider cno -> instructor, the dependency implies that no subject can have more than one
instructor (perhaps this is not a very realistic assumption). Functional dependencies therefore
place constraints on what information the database may store. In the above example, one may be
wondering if the following FDs hold
sname -> sno cname -> cno
Certainly there is nothing in the instance of the example database presented above that
contradicts the above functional dependencies. However, whether the above FDs hold or not would
depend on whether the university or college whose database we are considering allows duplicate
student names and subject names. If it is the enterprise policy to have unique subject names,
then cname -> cno holds. If duplicate student names are possible, and one would think there
is always the possibility of two students having exactly the same name, then sname -> sno does
not hold.
Functional dependencies arise from the nature of the real world that the database models. Often
A and B are facts about an entity where A might be some identifier for the entity and B some
characteristic. Functional dependencies cannot be automatically determined by studying one or
more instances of a database. They can be determined only by a careful study of the real world
and a clear understanding of what each attribute means.
We have noted above that the definition of functional dependency does not require that A and B
be single attributes. In fact, A and B may be collections of attributes; for example,
(sno, cno) -> (mark, date) expresses that mark and date are together determined by the combination
of student number and subject number.
When dealing with a collection of attributes, the concept of full functional dependence is an
important one. Let A and B be distinct collections of attributes from a relation R and let R.A ->
R.B. B is then fully functionally dependent on A if B is not functionally dependent on any proper
subset of A. The above example of students and subjects shows full functional dependence if mark
and date are not functionally dependent on either the student number (sno) or the subject number (cno) alone.
This implies that we are assuming that a student may take more than one subject and a subject
may be taken by many different students. Furthermore, it has been assumed that there is at
most one enrolment of each student in the same subject.
The above example illustrates full functional dependence. However, the dependence
(sno, cno) -> instructor is not a full functional dependence, because cno -> instructor holds.
As noted earlier, the concept of functional dependency is related to the concept of candidate key
of a relation since a candidate key of a relation is an identifier which uniquely identifies a tuple
and therefore determines the values of all other attributes in the relation. Therefore, any subset X
of the attributes of a relation R that satisfies the property that all remaining attributes of the
relation are functionally dependent on it (that is, on X) is a candidate key, as long as no
attribute can be removed from X while still preserving that property of functional dependence. In the
example above, the attributes (sno, cno) form a candidate key (and the only one), since they
functionally determine all the remaining attributes.
Functional dependence is an important concept and a large body of formal theory has been
developed about it. We discuss the concept of closure that helps us derive all functional
dependencies that are implied by a given set of dependencies. Once a complete set of functional
dependencies has been obtained, we will study how these may be used to build normalised
relations.
Rules about Functional Dependencies
Let F be the set of FDs specified on R.
We must be able to reason about the FDs in F.
The schema designer usually explicitly states only those FDs which are obvious.
Without knowing exactly what all the tuples are, we must be able to deduce the other FDs that hold on
R.
This is essential when we discuss the design of "good" relational schemas.
Design of Relational Database Schemas
Problems such as redundancy that occur when we try to cram too much into a single relation are
called anomalies. The principal kinds of anomalies that we encounter are:
Redundancy: information may be repeated unnecessarily in several tuples.
Update anomalies: we may change information in one tuple but leave the same information
unchanged in another.
Deletion anomalies: if a set of values becomes empty, we may lose other information as a side
effect.
2 Normalization
When designing a database, a data model is usually translated into a relational schema. The important
question is whether there is a design methodology or whether the process is arbitrary. A simple answer to
this question is affirmative: there are certain properties that a good database design must possess,
as dictated by Codd's rules. There are many different ways of designing a good database. One
such methodology is the method involving normalization. Normalization theory is built
around the concept of normal forms. Normalization reduces redundancy. Redundancy is
unnecessary repetition of data, and it can cause problems with the storage and retrieval of data. During
the process of normalization, dependencies can be identified which can cause problems during
deletion and updating. Normalization theory is based on the fundamental notion of dependency.
Normalization helps in simplifying the structure of schemas and tables.
To exemplify the normal forms, we will take an example of a database with the following logical
design:
Relation S {S#, SUPPLIERNAME, SUPPLYSTATUS, SUPPLYCITY}, Primary Key {S#}
Relation P {P#, PARTNAME, PARTCOLOR, PARTWEIGHT, SUPPLYCITY}, Primary Key {P#}
Relation SP {S#, SUPPLYCITY, P#, PARTQTY}, Primary Key {S#, P#}
Foreign Key {S#} References S
Foreign Key {P#} References P
S# SUPPLYCITY P# PARTQTY
S1 Bombay P1 3000
S1 Bombay P2 2000
S1 Bombay P3 4000
S1 Bombay P4 2000
S1 Bombay P5 1000
S1 Bombay P6 1000
S2 Mumbai P1 3000
S2 Mumbai P2 4000
S3 Mumbai P2 2000
S4 Madras P2 2000
S4 Madras P4 3000
S4 Madras P5 4000
Let us examine the table above to find any design discrepancy. A quick glance reveals that some
of the data is repeated. That is data redundancy, which is of course undesirable. The
fact that a particular supplier is located in a city has been repeated many times. This redundancy
causes many other related problems. For instance, after an update a supplier may be shown to
be from Madras in one entry while from Mumbai in another. This further gives rise to many
other problems.
Therefore, for the above reasons, the tables need to be refined. This process of refinement of a
given schema into another schema or a set of schemas possessing the qualities of a good database is
known as normalization. Database experts have defined a series of normal forms, each
conforming to some specified design criteria.
Decomposition: Decomposition is the process of splitting a relation into two or more relations.
This is nothing but the projection process. Decompositions may or may not lose information. As
you will learn shortly, the normalization process involves breaking a given relation into one
or more relations, and these decompositions should be reversible as well, so that no
information is lost in the process. Thus, we will be interested in the decompositions that
incur no loss of information rather than the ones in which information is lost.
Lossless decomposition: The decomposition which results in relations without losing any
information is known as a lossless (or nonloss) decomposition. A decomposition
that results in loss of information is known as a lossy decomposition.
Consider the relation S {S#, SUPPLYSTATUS, SUPPLYCITY} with some instances of the entries as
shown below.
S
S# SUPPLYSTATUS SUPPLYCITY
S3 100 Madras
S5 100 Mumbai
Decomposition (1) projects S onto {S#, SUPPLYSTATUS} and {S#, SUPPLYCITY}:
S# SUPPLYSTATUS      S# SUPPLYCITY
S3 100               S3 Madras
S5 100               S5 Mumbai
Decomposition (2) projects S onto {S#, SUPPLYSTATUS} and {SUPPLYSTATUS, SUPPLYCITY}:
S# SUPPLYSTATUS      SUPPLYSTATUS SUPPLYCITY
S3 100               100 Madras
S5 100               100 Mumbai
Let us examine these decompositions. In decomposition (1) no information is lost. We can still
say that S3's status is 100 and its location is Madras, and also that supplier S5 has status 100
and location Mumbai. This decomposition is therefore lossless.
In decomposition (2), however, we can still say that the status of both S3 and S5 is 100, but the
location of the suppliers cannot be determined from the two tables. The information regarding the
location of the suppliers has been lost in this case. This is a lossy decomposition. Certainly, a
lossless decomposition is more desirable, because otherwise the decomposition is
irreversible. The decomposition process is in fact projection, where some attributes are selected
from a table. A natural question arises here: why is the first decomposition lossless while the
second one is lossy? How should a given relation be decomposed so that the resulting
projections are nonlossy? The answer to these questions lies in functional dependencies and may be
given by the following theorem.
Heath's theorem: Let R {A, B, C} be a relation, where A, B and C are sets of attributes. If R
satisfies the functional dependency A -> B, then R is equal to the join of its projections on {A,
B} and {A, C}.
Let us apply this theorem to the decompositions described above. We observe that relation S
satisfies the functional dependencies S# -> SUPPLYSTATUS and S# -> SUPPLYCITY. In
decomposition (1) both projections are taken over the determinant S#, so by Heath's theorem no
information is lost. Decomposition (2) takes one of its projections over SUPPLYSTATUS, which does
not functionally determine the other attributes, so the theorem does not apply and information is
lost.
An alternative criterion for lossless decomposition is as follows. Let R be a relation schema, and
let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This
decomposition is a lossless-join decomposition of R if at least one of the following functional
dependencies is in F+:
R1 ∩ R2 -> R1
R1 ∩ R2 -> R2
2.1 First Normal Form
A relation is in First Normal Form (1NF) if and only if, in every legal value of that relation, every
tuple contains exactly one value for each attribute.
Although it is the simplest normal form, a 1NF relation suffers from a number of discrepancies, and
therefore 1NF is not the most desirable form of a relation.
Let us take a relation (modified to illustrate the point under discussion) REL1 {S#, SUPPLYSTATUS,
SUPPLYCITY, P#, PARTQTY}, with primary key {S#, P#}.
The redundancies in the above relation cause many problems, usually known as update
anomalies, that is, problems with the INSERT, DELETE and UPDATE operations. Let us look at these
problems, which arise from the supplier-city redundancy.
INSERT: In this relation, unless a supplier supplies at least one part, we cannot insert the
information regarding that supplier. Thus, a supplier located in Kolkata is missing from the relation
because he has not supplied any part so far.
DELETE: Let us see what problem we may face during deletion of a tuple. If we delete the
tuple of a supplier (when there is a single entry for that supplier), we not only delete the fact that the
supplier supplied a particular part but also the fact that the supplier is located in a particular city.
In our case, if we delete the entries corresponding to S# = S2, we lose the information that the
supplier is located at Mumbai. This is definitely undesirable. The problem here is that too
much information is attached to each tuple, therefore deletion forces us to lose too much
information.
UPDATE: If we modify the city of supplier S1 from Madras to Mumbai, we have to make sure
that all the entries corresponding to S# = S1 are updated, otherwise inconsistency will be
introduced. As a result, some entries would suggest that the supplier is located at Madras while
others would contradict this fact.
2.2 Second Normal Form
A relation is in 2NF if and only if it is in 1NF and every non-key attribute is fully functionally
dependent on the primary key. Here it has been assumed that there is only one candidate key,
which is of course the primary key.
A relation in 1NF can always be decomposed into an equivalent set of 2NF relations. The reduction
process consists of replacing the 1NF relation by suitable projections.
We have seen the problems arising from the under-normalized (1NF) relation REL1. The
remedy is to break the relation into two simpler relations:
REL2 {S#, SUPPLYSTATUS, SUPPLYCITY} and
REL3 {S#, P#, PARTQTY}
REL2 and REL3 are in 2NF, with primary keys {S#} and {S#, P#} respectively. This is because
each non-key attribute of REL2 (SUPPLYSTATUS and SUPPLYCITY) is fully functionally dependent on
its primary key S#. By a similar argument, REL3 is also in 2NF. Evidently, these two relations
have overcome all the update anomalies stated earlier. It is now possible to insert the facts
regarding supplier S5 even when he has not supplied any part, which was earlier not possible. This
solves the insert problem. Similarly, the delete and update problems are also resolved.
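A sketch of this 2NF decomposition expressed as SQL tables; the data types are assumptions, and S# and P# are written as SNO and PNO so that the identifiers are valid:
CREATE TABLE REL2 (
  SNO          VARCHAR(5) PRIMARY KEY,      -- S#
  SUPPLYSTATUS INT,
  SUPPLYCITY   VARCHAR(20)
);

CREATE TABLE REL3 (
  SNO     VARCHAR(5) REFERENCES REL2 (SNO), -- S#
  PNO     VARCHAR(5),                       -- P#
  PARTQTY INT,
  PRIMARY KEY (SNO, PNO)
);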
These relations in 2NF are still not free from all anomalies. REL3 is free from most of the
problems we are going to discuss here; however, REL2 still carries some problems. The reason is
that the dependency of SUPPLYSTATUS on S#, though functional, is transitive via SUPPLYCITY
(S# -> SUPPLYCITY and SUPPLYCITY -> SUPPLYSTATUS). We will see that this transitive
dependency gives rise to another set of anomalies.
INSERT: We are unable to insert the fact that a particular city has a particular status until we
have some supplier actually located in that city.
DELETE: If we delete the sole REL2 tuple for a particular city, we delete the information that that
city has that particular status.
UPDATE: The status for a given city is still stored redundantly. This causes the usual redundancy
problems related to updates.
2.3 Third Normal Form
A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively
dependent on the primary key.
To convert the 2NF relation into 3NF, REL2 is once again split into two simpler relations,
REL4 and REL5, as shown below.
RELATION 4 RELATION 5
S# SUPPLYCITY SUPPLYCITY SUPPLYSTATUS
S1 Madras Madras 200
S2 Mumbai Mumbai 100
S3 Mumbai Kolakata 300
S4 Madras
S5 Kolkata
Evidently, the above relations RELATION 4 and RELATION5 are in 3NF, because there is no transitive
dependencies. Every 2NF can be reduced into 3NF by decomposing it further and removing any
transitive dependency.
To conclude, we note that it follows from the definition that 5NF is the ultimate normal form
with respect to projection and join (which accounts for its alternative name, projection-join
normal form). That is, a relation in 5NF is guaranteed to be free of anomalies that can be
eliminated by taking projections. For a relation in 5NF, the only join dependencies are those that are implied by candidate keys, and so the only valid decompositions are the ones based on those candidate keys.
Chapter-3
FUNCTIONAL DEPENDENCY AND NORMALIZATION
End Chapter quizzes:
Q2 A relation is said to be in 2 NF if
(i) it is in 1 NF
(ii) non-key attributes dependent on key attribute
(iii) non-key attributes are independent of one another
(iv) if it has a composite key, no non-key attribute should be dependent on
part of the composite key.
Q8. If a non-key attribute depends on another non-key attribute, it is known as
a) Full F D
b) Partial F D
c) TRANSITIVE F D
d) None of the above
Q9. Decomposition of relation should always be
a) Lossy
b) Lossless
c) Both a and b
d) None of the above
Chapter: 4
1. INTRODUCTORY CONCEPT
1.1 What is SQL?
SQL stands for Structured Query Language
SQL allows you to access a database
SQL is an ANSI standard computer language
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert new records in a database
SQL can delete records from a database
SQL can update records in a database
SQL is easy to learn
SQL is an ANSI (American National Standards Institute) standard computer language for
accessing and manipulating database systems. SQL statements are used to retrieve and update
data in a database. SQL works with database programs like MS Access, DB2, Informix, MS SQL Server, Oracle, Sybase, etc.
2. DATABASE LANGUAGE
2.1 SQL Data Definition Language (DDL)
The Data Definition Language (DDL) part of SQL permits database tables to be created or
deleted. We can also define indexes (keys), specify links between tables, and impose constraints
between database tables.
Example
This example demonstrates how you can create a table named "Person", with four columns. The
column names will be "LastName", "FirstName", "Address", and "Age":
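The CREATE TABLE statement itself appears to have been dropped from the text; a minimal sketch consistent with the description above (the data types are assumed) would be:
CREATE TABLE Person
(
   LastName  varchar(255),
   FirstName varchar(255),
   Address   varchar(255),
   Age       int
);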
ALTER TABLE
The ALTER TABLE statement is used to add, drop and modify columns in an existing table.
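For illustration, hedged examples of adding, modifying and dropping a column on the Person table sketched above could look like this (MODIFY is the Oracle form; other systems use ALTER COLUMN):
ALTER TABLE Person ADD City varchar(30);      -- add a new column
ALTER TABLE Person MODIFY City varchar(50);   -- change the column definition (Oracle syntax)
ALTER TABLE Person DROP COLUMN City;          -- remove the column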
The INSERT INTO statement is used to insert new rows into a table. You can also specify the columns for which you want to insert data:
INSERT INTO table_name (column1, column2,...)
VALUES (value1, value2,....)
The UPDATE statement is used to modify existing records in a table:
UPDATE table_name
SET column_name = new_value
WHERE column_name = some_value
Operators that can be used with the WHERE clause include:
Operator   Description
=          Equal
A sample result set from the Persons table:
LastName FirstName Address City Year
Hansen Ola Timoteivn 10 Sandnes 1951
Svendson Tove Borgvn 23 Sandnes 1978
Svendson Stale Kaivn 18 Sandnes 1980
A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after
the pattern.
Using LIKE
The following SQL statement will return persons with first names that start with an 'O':
SELECT *
FROM Persons
WHERE FirstName LIKE 'O%'
The ORDER BY keyword is used to sort the result.
The Orders table:
Company     OrderNumber
Sega        3412
ABC Shop    5678
W3Schools   2312
W3Schools   6798
Example
To display the companies in alphabetical order:
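The SQL statement for this example appears to be missing from the text; following the pattern of the second example below, it would presumably be:
SELECT Company, OrderNumber FROM Orders
ORDER BY Company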
Result:
Company OrderNumber
ABC Shop 5678
Sega 3412
W3Schools 6798
W3Schools 2312
Example
To display the companies in alphabetical order AND the order numbers in numerical order:
SELECT Company, OrderNumber FROM Orders
ORDER BY Company, OrderNumber
Result:
Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   2312
W3Schools   6798
GROUP BY...
Aggregate functions (like SUM) often need an added GROUP BY functionality.
GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate
of all column values every time they are called, and without the GROUP BY function it was
impossible to find the sum for each individual group of column values.
The syntax for the GROUP BY function is:
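The syntax itself seems to have been omitted; a generic form (the column and table names are placeholders) is:
SELECT column_name, aggregate_function(column_name)
FROM table_name
GROUP BY column_name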
GROUP BY Example
This "Sales" Table:
Company Amount
W3Schools 5500
IBM 4500
W3Schools 7100
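The example query and its result also appear to be missing; a hedged reconstruction over the Sales table shown above would be:
SELECT Company, SUM(Amount) FROM Sales
GROUP BY Company
Result:
Company     SUM(Amount)
W3Schools   12600
IBM         4500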
3. What is a View?
In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement.
A view contains rows and columns, just like a real table. The fields in a view are fields from one
or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements
to a view and present the data as if the data were coming from a single table.
Syntax
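The syntax block itself appears to be missing here; the standard form for creating a view is:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition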
Views are of two types: updateable views and non-updateable views. Through an updateable view the data of the base table can be modified, whereas through a non-updateable view the base table cannot be updated.
4. Rename of a Table Column
ALTER TABLE <table>
RENAME COLUMN <oldname> TO <newname>;
The following SQL*Plus session illustrates renaming a table, its columns, its primary key constraint and the supporting index. First, the existing constraint and index of table TEST1 are listed:
SQL> SELECT constraint_name
  2  FROM user_constraints
  3  WHERE table_name = 'TEST1'
  4  AND constraint_type = 'P';
CONSTRAINT_NAME
------------------------------
TEST1_PK
1 row selected.
SQL> SELECT index_name, column_name
2 FROM user_ind_columns
3 WHERE table_name = 'TEST1';
INDEX_NAME COLUMN_NAME
-------------------- --------------------
TEST1_PK COL1
1 row selected.
SQL> -- Rename the table, columns, primary key
SQL> -- and supporting index.
SQL> ALTER TABLE test1 RENAME TO test;
Table altered.
SQL> ALTER TABLE test RENAME COLUMN col1 TO id;
Table altered.
SQL> ALTER TABLE test RENAME COLUMN col2 TO description;
Table altered.
SQL> ALTER TABLE test RENAME CONSTRAINT test1_pk TO test_pk;
Table altered.
SQL> ALTER INDEX test1_pk RENAME TO test_pk;
Index altered.
SQL> DESC test
Name Null? Type
-------------------- -------- --------------------
ID NOT NULL NUMBER(10)
DESCRIPTION NOT NULL VARCHAR2(50)
SQL> SELECT constraint_name
2 FROM user_constraints
3 WHERE table_name = 'TEST'
4 AND constraint_type = 'P';
CONSTRAINT_NAME
--------------------
TEST_PK
1 row selected.
INDEX_NAME COLUMN_NAME
-------------------- --------------------
TEST_PK ID
1 row selected.
STRUCTURED QUERY LANGUAGE
End Chapter quizzes:
Q1 SELECT statement is used for
Q3. Which of the following statements are NOT TRUE about ORDER BY clauses?
A. Ascending or descending order can be defined with the asc or desc keywords.
B. Only one column can be used to define the sort order in an order by clause.
C. Multiple columns can be used to define sort order in an order by clause.
D. Columns can be represented by numbers indicating their listed order in the select clause.
1. Introduction to PL/SQL
PL/SQL is a procedural extension of Oracle’s Structured Query Language. PL/SQL is not a separate language but rather a technology; this means that you will not have a separate place or prompt for executing your PL/SQL programs. PL/SQL technology is like an engine that executes
PL/SQL blocks and subprograms. This engine can be started in Oracle server or in application
development tools such as Oracle Forms, Oracle Reports etc.
As shown in the above figure PL/SQL engine executes procedural statements and sends SQL
part of statements to SQL statement processor in the Oracle server. PL/SQL combines the data
manipulating power of SQL with the data processing power of procedural languages.
2 Block Structure of PL/SQL:
PL/SQL is a block-structured language. It means that Programs of PL/SQL contain logical
blocks. PL/SQL block consists of SQL and PL/SQL statements.
The Declaration section of a PL/SQL Block starts with the reserved keyword DECLARE. This
section is optional and is used to declare any placeholders like variables, constants, records and
cursors, which are used to manipulate data in the execution section. Placeholders may be any of
Variables, Constants and Records, which stores data temporarily. Cursors are also declared in
this section.
The Execution section of a PL/SQL Block starts with the reserved keyword BEGIN and ends
with END. This is a mandatory section and is the section where the program logic is written to
perform any task. The programmatic constructs like loops, conditional statement and SQL
statements form the part of execution section.
The Exception section of a PL/SQL Block starts with the reserved keyword EXCEPTION. This
section is optional. Any errors in the program can be handled in this section, so that the PL/SQL
Blocks terminates gracefully. If the PL/SQL Block contains exceptions that cannot be handled,
the Block terminates abruptly with errors. Every statement in the above three sections must end
with a semicolon (;). PL/SQL blocks can be nested within other PL/SQL blocks. Comments can
be used to document code.
DECLARE
   Variable declarations
BEGIN
   Program execution
EXCEPTION
   Exception handling
END;
Variables and Constants: Variables are used to store query results. Forward references are not
allowed. Hence you must first declare the variable and then use it.
Variables can have any SQL data type, such as CHAR, DATE, NUMBER etc or any PL/SQL
data type like BOOLEAN, BINARY_INTEGER etc.
Declaring Variables: Variables are declared in DECLARE section of PL/SQL.
DECLARE
SNO NUMBER (3);
SNAME VARCHAR2 (15);
BEGIN
Assigning values to variables: a value can be assigned either at the time of declaration, for example
SNO NUMBER := 1001;
or later in the execution section, for example
SNAME := ‘JOHN’;
The following screen shot explains how to write a simple PL/SQL program and execute it. The same program can also be written as a text file in the Notepad editor and then executed, as explained in the subsequent screen shot.
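In case the screen shots are not available, a minimal sketch of such a program (run from SQL*Plus with server output enabled) would be:
SET SERVEROUTPUT ON
DECLARE
   SNO   NUMBER(4)    := 1001;     -- sample numeric variable with an initial value
   SNAME VARCHAR2(15) := 'JOHN';   -- sample character variable
BEGIN
   DBMS_OUTPUT.PUT_LINE('Student number: ' || SNO);
   DBMS_OUTPUT.PUT_LINE('Student name  : ' || SNAME);
END;
/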
4. Control Statements
This section explains how to structure the flow of control through a PL/SQL program. The control structures of PL/SQL are simple yet powerful. Control structures in PL/SQL can be divided into three categories:
Conditional (selection),
Iterative and
Sequential.
4.1 Conditional Control (Selection): This structure tests a condition, depending on the
condition is true or false it decides the sequence of statements to be executed.
Syntax for IF-THEN:
IF <condition> THEN
   Statements
END IF;
Syntax for IF-THEN-ELSIF:
IF <condition> THEN
   Statements
ELSIF <condition> THEN
   Statements
ELSE
   Statements
END IF;
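The worked examples at this point appear as screen shots in the original material; a minimal hedged sketch of an IF-THEN-ELSIF block (the variable and grade boundaries are invented for illustration) is:
DECLARE
   marks NUMBER(3) := 72;          -- assumed sample value
   grade VARCHAR2(15);
BEGIN
   IF marks >= 75 THEN
      grade := 'Distinction';
   ELSIF marks >= 60 THEN
      grade := 'First class';
   ELSE
      grade := 'Pass';
   END IF;
   DBMS_OUTPUT.PUT_LINE('Grade: ' || grade);
END;
/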
4.2 Iterative Control
LOOP statement executes the body statements multiple times. The statements are placed
between LOOP – END LOOP keywords. The simplest form of LOOP statement is an infinite
loop. EXIT statement is used inside LOOP to terminate it.
Syntax for LOOP- END LOOP
LOOP
Statements
END LOOP;
Example:
BEGIN
LOOP
DBMS_OUTPUT.PUT_LINE (‘Hello’);
END LOOP;
END;
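The block above loops forever; in practice an EXIT (or EXIT WHEN) statement terminates the loop. A small hedged variant:
DECLARE
   i NUMBER := 1;
BEGIN
   LOOP
      DBMS_OUTPUT.PUT_LINE('Hello ' || i);
      i := i + 1;
      EXIT WHEN i > 5;             -- leave the loop after five iterations
   END LOOP;
END;
/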
5. CURSOR
For every SQL statement execution, a certain area in memory is allocated. PL/SQL allows you to name this area. This private SQL area is called the context area or cursor. A cursor acts as a handle or pointer into the context area. A PL/SQL program controls the context area using the cursor.
A cursor represents a structure in memory and is different from a cursor variable. When you declare a cursor, you get a pointer variable, which does not point to anything. When the cursor is opened, memory is allocated and the cursor structure is created. The cursor variable now points to the cursor. When the cursor is closed, the memory allocated for the cursor is released.
Cursors allow the programmer to retrieve data from a table and perform actions on that data one
row at a time. There are two types of cursors implicit cursors and explicit cursors.
5.1 Implicit cursors
For SQL queries returning a single row, PL/SQL declares implicit cursors. Implicit cursors are simple SELECT statements written in the BEGIN block (executable section) of the PL/SQL program. Implicit cursors are easy to code, and they retrieve exactly one row. PL/SQL implicitly declares cursors for all DML statements.
BEGIN
----
---
END;
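As an illustration (a hedged sketch assuming the standard EMP table of the SCOTT schema), an implicit cursor is created automatically for a SELECT ... INTO statement:
DECLARE
   v_ename emp.ename%TYPE;
BEGIN
   SELECT ename INTO v_ename
   FROM emp
   WHERE empno = 7369;                                      -- assumed employee number
   DBMS_OUTPUT.PUT_LINE('Name: ' || v_ename);
   DBMS_OUTPUT.PUT_LINE('Rows fetched: ' || SQL%ROWCOUNT);  -- implicit cursor attribute
END;
/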
Processing multiple rows is similar to file processing. To process a file you need to open it, process its records and then close it. Similarly, a user-defined explicit cursor needs to be opened before reading the rows, after which it is closed. Just as a file pointer marks the current position in file processing, a cursor marks the current position in the active set.
The FETCH statement retrieves one row at a time; the BULK COLLECT clause needs to be used to fetch more than one row at a time. Closing the cursor: after retrieving all the rows from the active set, the cursor should be closed. The resources allocated to the cursor are then freed. Once the cursor is closed, executing a FETCH statement will lead to errors.
CLOSE <cursor-name>;
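A minimal hedged sketch of the complete DECLARE-OPEN-FETCH-CLOSE sequence (again assuming the EMP table) is:
DECLARE
   CURSOR c_emp IS
      SELECT ename, sal FROM emp;   -- query defining the active set
   v_ename emp.ename%TYPE;
   v_sal   emp.sal%TYPE;
BEGIN
   OPEN c_emp;                      -- allocate the context area and execute the query
   FETCH c_emp INTO v_ename, v_sal; -- fetch the first row
   DBMS_OUTPUT.PUT_LINE(v_ename || ' earns ' || v_sal);
   CLOSE c_emp;                     -- release the resources
END;
/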
5.5 Explicit Cursor Attributes
Every cursor defined by the user has 4 attributes. When appended to the cursor name these
attributes let the user access useful information about the execution of a multi row query.
The attributes are:
1. %NOTFOUND: It is a Boolean attribute, which evaluates to true, if the last fetch failed.
i.e. when there are no rows left in the cursor to fetch.
2. %FOUND: A Boolean attribute, which evaluates to true if the last fetch succeeded.
3. %ROWCOUNT: It’s a numeric attribute, which returns number of rows fetched by the
cursor so far.
4. %ISOPEN: A Boolean variable, which evaluates to true if the cursor is opened otherwise
to false.
In the above example a separate FETCH was written for each row; instead, a loop statement can be used. The following example explains the usage of LOOP.
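The example itself appears as a screen shot in the original; a hedged equivalent that uses %NOTFOUND to end the loop would be:
DECLARE
   CURSOR c_emp IS
      SELECT ename, sal FROM emp;
   v_ename emp.ename%TYPE;
   v_sal   emp.sal%TYPE;
BEGIN
   OPEN c_emp;
   LOOP
      FETCH c_emp INTO v_ename, v_sal;
      EXIT WHEN c_emp%NOTFOUND;     -- true when the last fetch returned no row
      DBMS_OUTPUT.PUT_LINE(v_ename || ' earns ' || v_sal);
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Rows processed: ' || c_emp%ROWCOUNT);
   CLOSE c_emp;
END;
/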
6. Exceptions
An exception is an error situation which arises during program execution. When an error occurs, an exception is raised; normal execution is stopped and control transfers to the exception-handling part. Exception handlers are routines written to handle the exception. Exceptions can be internally defined (system-defined or pre-defined) or user-defined.
6.1 Predefined exception is raised automatically whenever there is a violation of Oracle coding
rules. Predefined exceptions are those like ZERO_DIVIDE, which is raised automatically when
we try to divide a number by zero. Other built-in exceptions are given below. You can handle
unexpected Oracle errors using OTHERS handler. It can handle all raised exceptions that are not
handled by any other handler. It must always be written as the last handler in exception block.
The biggest advantage of exception handling is that it improves the readability and reliability of the code. Errors from many statements of code can be handled with a single handler. Instead of checking for an error at every point, we can just add an exception handler, and if any exception is raised it is handled there.
For checking errors at a specific spot it is always better to have those statements in a separate
begin – end block.
The DUP_VAL_ON_INDEX is raised when a SQL statement tries to create a duplicate value in
a column on which a primary key or unique constraints are defined.
Example: To demonstrate the exception DUP_VAL_ON_INDEX.
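The demonstration referred to above is a screen shot; a hedged sketch (assuming a table DEPT whose primary key column DEPTNO already contains the value 10) could be:
BEGIN
   INSERT INTO dept (deptno, dname, loc)
   VALUES (10, 'RESEARCH', 'DELHI');   -- deptno 10 is assumed to exist already
EXCEPTION
   WHEN DUP_VAL_ON_INDEX THEN
      DBMS_OUTPUT.PUT_LINE('A department with this number already exists.');
END;
/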
More than one Exception can be written in a single handler as shown below.
EXCEPTION
When NO_DATA_FOUND or TOO_MANY_ROWS then
Statements;
END;
Raising Exception:
BEGIN
RAISE myexception;
-------
Handling Exception:
BEGIN
------
----
EXCEPTION
WHEN myexception THEN
Statements;
END;
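Putting the pieces together, a minimal hedged sketch of declaring, raising and handling a user-defined exception (the exception name and the balance check are invented for illustration) is:
DECLARE
   myexception EXCEPTION;              -- user-defined exception
   balance NUMBER := -50;              -- assumed sample value
BEGIN
   IF balance < 0 THEN
      RAISE myexception;               -- raise it explicitly
   END IF;
   DBMS_OUTPUT.PUT_LINE('Balance is fine.');
EXCEPTION
   WHEN myexception THEN
      DBMS_OUTPUT.PUT_LINE('Balance cannot be negative.');
END;
/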
Points To Ponder:
An Exception cannot be declared twice in the same block.
Exceptions declared in a block are considered local to that block and global to its sub-blocks.
An enclosing block cannot access Exceptions declared in its sub-blocks, whereas it is possible for a sub-block to refer to the Exceptions of its enclosing block.
emp_rec is an automatically created variable of %ROWTYPE. We have not used OPEN, FETCH and CLOSE in the above example, as the cursor FOR loop does all of this automatically. The above example can be rewritten, as shown in the figure, with fewer lines of code. This is called an implicit (cursor) FOR loop.
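Since the figure is not reproduced here, a hedged sketch of such a cursor FOR loop over the EMP table is:
BEGIN
   FOR emp_rec IN (SELECT ename, sal FROM emp) LOOP   -- emp_rec is declared implicitly
      DBMS_OUTPUT.PUT_LINE(emp_rec.ename || ' earns ' || emp_rec.sal);
   END LOOP;                                          -- OPEN, FETCH and CLOSE happen automatically
END;
/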
Deletion or Updation Using Cursors:
All the previous examples explained how to retrieve data using cursors. Now we will see how to modify or delete rows in a table using cursors. In order to update or delete rows, the cursor must be defined with the FOR UPDATE clause, and the UPDATE or DELETE statement must use the WHERE CURRENT OF <cursor-name> clause.
The following example updates the comm of all employees with salary less than 2000 by adding 100 to the existing comm.
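The example exists as a screen shot in the original; a hedged sketch consistent with that description (EMP table assumed, NVL used in case comm is NULL) is:
DECLARE
   CURSOR c_emp IS
      SELECT empno, sal, comm FROM emp
      WHERE sal < 2000
      FOR UPDATE;                    -- lock the selected rows
BEGIN
   FOR emp_rec IN c_emp LOOP
      UPDATE emp
      SET comm = NVL(comm, 0) + 100  -- add 100 to the existing comm
      WHERE CURRENT OF c_emp;        -- update the row just fetched
   END LOOP;
   COMMIT;
END;
/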
7. PL/SQL subprograms
A subprogram is a named block of PL/SQL. There are two types of subprograms in PL/SQL, namely procedures and functions. Every subprogram has a declarative part, an executable part or body, and an optional exception-handling part. Subprograms offer the following advantages:
1. They allow you to write PL/SQL programs that meet our needs.
2. They allow you to break a program into manageable modules.
3. They provide reusability and maintainability for the code.
7.1 Procedures
A procedure is a subprogram used to perform a specific action. A procedure contains two parts: the specification and the body. The procedure specification begins with CREATE and ends with the procedure name or its parameter list. Procedures that do not take parameters are written without parentheses. The body of the procedure starts after the keyword IS or AS and ends with the keyword END.
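The syntax block referred to in the next sentence appears to have been dropped from the text; the usual Oracle form (angular brackets mark user-supplied names, square brackets mark optional parts) is roughly:
CREATE [OR REPLACE] PROCEDURE <procedure_name> [(<parameter list>)]
[AUTHID DEFINER | CURRENT_USER]
IS | AS
   <local declarations>
BEGIN
   <executable statements>
[EXCEPTION
   <exception handlers>]
END [<procedure_name>];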
In the above syntax, the items enclosed in angular brackets (“< >”) are user defined and those enclosed in square brackets (“[ ]”) are optional.
OR REPLACE is used to overwrite a procedure with the same name, if one already exists.
The AUTHID clause is used to decide whether the procedure should execute with the rights of the invoker (the current user, i.e. the person who executes it) or of the definer (the owner, i.e. the person who created it).
Example:
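The body of the example procedure appears only as a screen shot in the original; based on the discussion that follows (it deletes rows from the EMP table), a hedged sketch could be:
CREATE OR REPLACE PROCEDURE delete_emp IS   -- procedure name assumed for illustration
BEGIN
   DELETE FROM emp;                         -- removes all rows from EMP in the owner's schema
END;
/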
Let us assume that the above procedure is created in the SCOTT schema (SCOTT's user area) and is executed by, say, user SEENU. It will delete rows from the table EMP owned by SCOTT, but not from the EMP owned by SEENU. It is nevertheless possible to use a procedure owned by one user on tables owned by other users. This is achieved by setting invoker rights:
AUTHID CURRENT_USER
Parameter Modes
Parameters are used to pass values to the procedure being called. There are three modes, used with parameters depending on their purpose: IN, OUT and IN OUT. An IN mode parameter is used to pass values to the called procedure; inside the program, an IN parameter acts like a constant, i.e. it cannot be modified. An OUT mode parameter allows you to return a value from the procedure; inside the procedure, the OUT parameter acts like an uninitialized variable, therefore its value cannot be assigned to another variable.
IN OUT mode parameter allows you to both pass to and return values from the subprogram.
Default mode of an argument is IN.
Example
BEGIN
PROC1; --- PROC1 is name of the procedure.
END;
/
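A small hedged sketch of a procedure using IN and OUT parameters (the names are invented; the EMP table is assumed) is:
CREATE OR REPLACE PROCEDURE get_sal (
   p_empno IN  emp.empno%TYPE,   -- IN: value passed by the caller
   p_sal   OUT emp.sal%TYPE      -- OUT: value returned to the caller
) IS
BEGIN
   SELECT sal INTO p_sal FROM emp WHERE empno = p_empno;
END;
/
-- Calling it from an anonymous block:
DECLARE
   v_sal emp.sal%TYPE;
BEGIN
   get_sal(7369, v_sal);         -- 7369 is an assumed employee number
   DBMS_OUTPUT.PUT_LINE('Salary: ' || v_sal);
END;
/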
Functions:
A function is a PL/SQL subprogram which is used to compute a value. A function is similar to a procedure except for the difference that it has a RETURN clause.
Examples
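No example survives in the text at this point; a hedged sketch of a simple function and its use is:
CREATE OR REPLACE FUNCTION annual_sal (
   p_monthly_sal IN NUMBER       -- monthly salary passed in
) RETURN NUMBER IS
BEGIN
   RETURN p_monthly_sal * 12;    -- computed value returned to the caller
END;
/
-- Using the function:
BEGIN
   DBMS_OUTPUT.PUT_LINE('Annual salary: ' || annual_sal(2500));
END;
/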
Q4. Which cursors are used in queries that return multiple rows?
a) Explicit cursor
b) Implicit cursors
c) Open Cursor
d) Both a and c
2. ACID properties
When a transaction processing system creates a transaction, it will ensure that the transaction
will have certain characteristics. The developers of the components that comprise the transaction
are assured that these characteristics are in place. They do not need to manage these
characteristics themselves. These characteristics are known as the ACID properties. ACID is an
acronym for atomicity, consistency, isolation, and durability.
2.1 Atomicity
The atomicity property identifies that the transaction is atomic. An atomic transaction is either
fully completed, or is not begun at all. Any updates that a transaction might affect on a system
are completed in their entirety. If for any reason an error occurs and the transaction is unable to complete all of its steps, then the system is returned to the state it was in before the transaction was started. An example of an atomic transaction is an account transfer transaction. The money is removed from account A and then placed into account B. If the system fails after removing the money from account A, then the transaction processing system will put the money back into account A, thus returning the system to its original state. This is known as a rollback, as we said at the beginning of this chapter.
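In SQL terms, such a transfer can be sketched as follows (a hedged example; the accounts table and its column names are assumed):
-- Move 100 from account A to account B as one atomic unit
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 'A';
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 'B';
COMMIT;      -- on success: make both changes permanent together
-- On failure before the COMMIT, the transaction would instead issue:
-- ROLLBACK;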
2.2 Consistency
A transaction enforces consistency in the system state by ensuring that at the end of any
transaction the system is in a valid state. If the transaction completes successfully, then all
changes to the system will have been properly made, and the system will be in a valid state. If
any error occurs in a transaction, then any changes already made will be automatically rolled
back. This will return the system to its state before the transaction was started. Since the system
was in a consistent state when the transaction was started, it will once again be in a consistent
state.
Looking again at the account transfer system, the system is consistent if the total of all accounts
is constant. If an error occurs and the money is removed from account A and not added to
account B, then the total in all accounts would have changed. The system would no longer be
consistent. By rolling back the removal from account A, the total will again be what it should be,
and the system back in a consistent state.
2.3 Isolation
When a transaction runs in isolation, it appears to be the only action that the system is carrying
out at one time. If there are two transactions that are both performing the same function and are
running at the same time, transaction isolation will ensure that each transaction thinks it has
exclusive use of the system. This is important in that as the transaction is being executed, the
state of the system may not be consistent. The transaction ensures that the system remains
consistent after the transaction ends, but during an individual transaction, this may not be the
case. If a transaction was not running in isolation, it could access data from the system that may
not be consistent. By providing transaction isolation, this is prevented from happening.
2.4 Durability
A transaction is durable in that once it has been successfully completed, all of the changes it
made to the system are permanent. There are safeguards that will prevent the loss of information,
even in the case of system failure. By logging the steps that the transaction performs, the state of
the system can be recreated even if the hardware itself has failed. The concept of durability
allows the developer to know that a completed transaction is a permanent part of the system,
regardless of what happens to the system later on.
3 The Concept of Schedules
When transactions are executing concurrently in an interleaved fashion, not only do the actions of each transaction become important, but so does the order of execution of operations from each of these transactions. Hence, for analyzing any problem, it is not just the history of previous transactions that one should be worrying about, but also the “schedule” of operations.
3.1 Schedule (History of transaction):
We formally define a schedule S of n transactions T1, T2 … Tn as an ordering of the operations of the transactions, subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti must appear in the same order in which they appear in Ti. That is, if two operations Ti1 and Ti2 are listed in Ti such that Ti1 is earlier than Ti2, then in the schedule also Ti1 should appear before Ti2. However, if Ti2 appears immediately after Ti1 in Ti, the same may not be true in S, because some other operation Tj1 (of a transaction Tj) may be interleaved between them. In short, a schedule lists the sequence of operations on the database in the same order in which they were effected in the first place.
For the recovery and concurrency control operations, we concentrate mainly on read and write
operations of the transactions, because these operations actually effect changes to the database.
The other two (equally) important operations are commit and abort, since they decide when the
changes effected have actually become active on the database.
Since listing each of these operations becomes a lengthy process, we adopt a notation for describing a schedule. The read operations (Readtr), write operations (Writetr), commit and abort are indicated by r, w, c and a respectively, and each of them comes with a subscript indicating the transaction number.
For example, SA: r1(x); r2(y); w2(y); r1(y); w1(x); a1
Indicates the following operations in the same order:
Readtr(x) transaction 1
Read tr (y) transaction 2
Write tr (y) transaction 2
Read tr(y) transaction 1
Write tr(x) transaction 1
Abort transaction 1
3.2 Conflicting operations: Two operations in a schedule are said to be in conflict if they satisfy these conditions:
i) The operations belong to different transactions,
ii) They access the same item x, and
iii) At least one of the operations is a write operation.
For example, r1(x) and w2(x) conflict, and so do w1(x) and r2(x); w1(y) and w2(y) conflict because both of them try to write the same item.
But r1(x) and w2(y) do not conflict, and neither do r1(x) and r2(x), because in the first case the read and write are on different data items, and in the second case both are only trying to read the same data item, which they can do without any conflict.
A “partial order” of the schedule is said to occur if the first two conditions of the complete schedule are satisfied, but whenever there are non-conflicting operations in the schedule, they can occur without indicating which should appear first. This can happen because non-conflicting operations can anyway be executed in any order without affecting the actual outcome.
However, in a practical situation, it is very difficult to come across complete schedules. This is because new transactions keep getting included in the schedule. Hence, one often works with the “committed projection” C(S) of a schedule S. This set includes only those operations in S that belong to committed transactions, i.e. transactions Ti whose commit operation ci is in S.
Put in simpler terms, since uncommitted operations do not get reflected in the actual outcome of the schedule, only those transactions that have completed their commit operations contribute to the set, and this schedule is good enough in most cases.
3.4 Schedules and Recoverability :
Recoverability is the ability to recover from transaction failures. The success or otherwise of recovery depends on the schedule of transactions. If fairly straightforward operations without much interleaving of transactions are involved, error recovery is a straightforward process. On the other hand, if a lot of interleaving of different transactions has taken place, then recovering from the failure of any one of these transactions could be an involved affair. In certain cases, it may not be possible to recover at all. Thus, it is desirable to characterize schedules based on their recovery capabilities.
To do this, we observe certain features of recoverability and of schedules. To begin with, we note that any recovery process most often involves a “roll back” operation, wherein the operations of the failed transaction have to be undone. However, we also note that the roll back needs to go back only as long as the transaction T has not committed. If the transaction T has committed once, it need not be rolled back. The schedules that satisfy this criterion are called “recoverable schedules” and those that do not are called “non-recoverable schedules”. As a rule, such non-recoverable schedules should not be permitted.
Formally, a schedule S is recoverable if no transaction T which appears in S commits until all transactions T1 that have written an item which is read by T have committed. The concept is a simple one. Suppose the transaction T reads an item X from the database, completes its operations (based on this and other values) and commits, i.e. the output values of T become permanent values of the database.
But suppose this value X was written by another transaction T’ (before it was read by T), and T’ aborts after T has committed. What happens? The values committed by T are no longer valid, because the basis of these values (namely X) itself has been changed. Obviously T also needs to be rolled back (if possible), leading to other rollbacks and so on.
The other aspect to note is that in a recoverable schedule, no committed transaction needs to be
rolled back. But, it is possible that a cascading roll back scheme may have to be effected, in
which an uncommitted transaction has to be rolled back, because it read from a value contributed
by a transaction which later aborted. But such cascading rollbacks can be very time consuming
because at any instant of time, a large number of uncommitted transactions may be operating.
Thus, it is desirable to have “cascadeless” schedules, which avoid cascading rollbacks.
This can be ensured by ensuring that transactions read only those values which are written by
committed transactions i.e. there is no fear of any aborted or failed transactions later on. If the
schedule has a sequence wherein a transaction T1 has to read a value X by an uncommitted
transaction T2, then the sequence is altered, so that the reading is postponed, till T2 either
commits or aborts.
It may be noted that recoverable schedules, cascadeless schedules and strict schedules are each more stringent than their predecessor. Greater stringency facilitates the recovery process, but the process of scheduling may get delayed, or may sometimes even become impossible.
4 Serializability
Given two transactions T1 and T2 to be scheduled, they can be scheduled in a number of ways. The simplest way is to schedule them without bothering about interleaving them, i.e. schedule all operations of transaction T1 followed by all operations of T2, or alternatively schedule all operations of T2 followed by all operations of T1.
(In the schedules below, operations are listed in time order from top to bottom; the prefix indicates the transaction performing the operation.)
Serial schedule A (T1 followed by T2):
T1: read_tr(X)
T1: X = X + N
T1: write_tr(X)
T1: read_tr(Y)
T1: Y = Y + N
T1: write_tr(Y)
T2: read_tr(X)
T2: X = X + P
T2: write_tr(X)
Serial schedule B (T2 followed by T1):
T2: read_tr(X)
T2: X = X + P
T2: write_tr(X)
T1: read_tr(X)
T1: X = X + N
T1: write_tr(X)
T1: read_tr(Y)
T1: Y = Y + N
T1: write_tr(Y)
Interleaved (non-serial) schedule C:
T1: read_tr(X)
T1: X = X + N
T2: read_tr(X)
T2: X = X + P
T1: write_tr(X)
T1: read_tr(Y)
T2: write_tr(X)
T1: Y = Y + N
T1: write_tr(Y)
Interleaved (non-serial) schedule D:
T1: read_tr(X)
T1: X = X + N
T1: write_tr(X)
T2: read_tr(X)
T2: X = X + P
T2: write_tr(X)
T1: read_tr(Y)
T1: Y = Y + N
T1: write_tr(Y)
For a value X = 2, both produce the same result. Can we conclude that they are equivalent? Though this may look like a simplistic example, with some imagination one can always come up with more sophisticated examples wherein the “bugs” of treating them as equivalent are less obvious. But the concept still holds: result equivalence cannot mean schedule equivalence. A more refined method of finding equivalence is available; it is called “conflict equivalence”.
Two schedules are said to be conflict equivalent if the order of any two conflicting operations in both the schedules is the same (note that conflicting operations essentially belong to two different transactions, access the same data item, and at least one of them is a write_tr(x) operation). If two such conflicting operations appear in different orders in the two schedules, then it is obvious that they can produce two different databases in the end, and hence they are not equivalent.
4.1 Testing for conflict serializability of a schedule:
We suggest an algorithm that tests a schedule for conflict serializability.
1. For each transaction Ti participating in the schedule S, create a node labeled Ti in the precedence graph.
2. For each case where Tj executes a readtr(x) after Ti executes write_tr(x), create an
edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes write_tr(x) after Ti executes a read_tr(x), create an
edge from Ti to Tj in the graph.
4. For each case where Tj executes a write_tr(x) after Ti executes a write_tr(x), create
an edge from Ti to Tj in the graph.
5. The schedule S is serialisable if and only if there are no cycles in the graph.
If we apply these methods to write the precedence graphs for the four cases of section 4,
we get the following precedence graphs.
(Precedence graphs, with nodes T1 and T2 and edges labelled by the data item involved: Schedule A has the single edge T1 -> T2; Schedule B has the single edge T2 -> T1; Schedule C has edges in both directions, T1 -> T2 and T2 -> T1, forming a cycle; Schedule D has only the edge T1 -> T2. Hence schedules A, B and D are serializable, while schedule C is not.)
Two schedules S and S1 are said to be “view equivalent” if the following conditions are
satisfied.
i) The same set of transactions participates in S and S1 and S and S1 include the
same operations of those transactions.
ii) For any operation ri(X) of Ti in S, if the value of X read by the operation has been
written by an operation wj(X) of Tj(or if it is the original value of X before the
schedule started) the same condition must hold for the value of x read by
operation ri(X) of Ti in S1.
iii) If the operation Wk(Y) of Tk is the last operation to write the item Y in S, then Wk(Y) of Tk must also be the last operation to write the item Y in S1.
The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both the schedules, the write operations of each transaction must produce the same results. Hence, the read operations are said to see the same view of both the schedules. It can easily be verified that when S or S1 operate independently on a database with the same initial state, they produce the same end states. A schedule S is said to be view serializable if it is view equivalent to a serial schedule.
It can also be verified that the definitions of conflict serializability and view serializability are similar if a condition of “constrained write assumption” holds on all transactions of the schedules. This condition states that any write operation wi(X) in Ti is preceded by an ri(X) in Ti and that the value written by wi(X) in Ti depends only on the value of X read by ri(X). This assumes that the computation of the new value of X is a function f(X) based on the old value of X read from the database. However, the definition of view serializability is less restrictive than that of conflict serializability under the “unconstrained write assumption”, where the value written by the operation wi(X) in Ti can be independent of its old value in the database. This is called a “blind write”.
But the main problem with view serializability is that it is extremely complex
computationally and there is no efficient algorithm to do the same.
Hence, instead of generating the schedules, checking them for serializability and then
using them, most DBMS protocols use a more practical method – impose restrictions on
the transactions themselves. These restrictions, when followed by every participating
transaction, automatically ensure serializability in all schedules that are created by these
participating transactions.
Also, since transactions are being submitted at different times, it is difficult to determine when a schedule begins and when it ends. Hence serializability theory deals with the problem by considering only the committed projection C(S) of the schedule. Hence, as an approximation, we can define a schedule S as serializable if its committed projection C(S) is equivalent to some serial schedule.
5.1 The lost update problem: This problem occurs when two transactions that access the same database items have their operations interleaved in such a way as to make the value of some database item incorrect. Suppose the transactions T1 and T2 are submitted at (approximately) the same time. Because of the concept of interleaving, each operation is executed for some period of time and then control is passed on to the other transaction, and this sequence continues. Because of the delay in updating, this creates a problem. This is what happened in the previous example. Let the transactions be called TA and TB.
(Operations are listed in time order from top to bottom.)
TA: read_tr(X)
TB: read_tr(X)
TA: X = X - NA
TB: X = X - NB
TB: write_tr(X)
TA: write_tr(X)
Note that the problem occurred because the transaction TB did not take into account the update made by TA, i.e. TB lost TA's update. Similarly, since TA did its write later on, TA's write wiped out the update made by TB.
5.2 The Dirty Read (Temporary Update) Problem: This happens when a transaction TA updates a data item, but later on (for some reason) the transaction fails. It could be due to a system failure or any other operational reason, or the system may later have noticed that the operation should not have been done and cancels it. To be fair, it also ensures that the original value is restored.
But in the meanwhile, another transaction TB has accessed the data and since it has no indication
as to what happened later on, it makes use of this data and goes ahead. Once the original value is
restored by TA, the values generated by TB are obviously invalid.
(Operations are listed in time order from top to bottom.)
TA: read_tr(X)
TA: X = X - N
TA: write_tr(X)
TB: read_tr(X)
TB: X = X - N
TB: write_tr(X)
TA: failure
TA: X = X + N
TA: write_tr(X)
The value generated by TA in a transaction that does not survive is “dirty data”; it is read by TB, which consequently produces an invalid value. Hence the problem is called the dirty read problem.
5.3 The Incorrect Summary Problem: Consider two concurrent transactions, again called TA and TB. TB is calculating a summary (an average, a standard deviation or some such operation) by accessing all the elements of a database (note that it is not updating any of them; it only reads them and uses the resultant data to calculate some values). In the meanwhile, TA is updating these values. Since the operations are interleaved, TB, for some of its calculations, will be using values that are not yet updated, whereas for the others it will be using updated values. This is called the incorrect summary problem.
(Operations are listed in time order from top to bottom.)
TB: Sum = 0
TB: read_tr(A)
TB: Sum = Sum + A
TA: read_tr(X)
TA: X = X - N
TA: write_tr(X)
TB: read_tr(X)
TB: Sum = Sum + X
TB: read_tr(Y)
TB: Sum = Sum + Y
TA: read_tr(Y)
TA: Y = Y - N
TA: write_tr(Y)
In the above example, TA updates both X and Y. But since it first updates X and then Y, and the operations are so interleaved that the transaction TB reads both of them in between those updates, TB ends up using the old value of Y with the new value of X. As a result, the sum it obtains corresponds neither to the old set of values nor to the new set of values.
6 Locking techniques for concurrency control
Many of the important techniques for concurrency control make use of the concept of the lock. A
lock is a variable associated with a data item that describes the status of the item with respect to
the possible operations that can be done on it. Normally every data item is associated with a
unique lock. They are used as a method of synchronizing the access of database items by the
transactions that are operating concurrently. Such controls, when implemented properly can
overcome many of the problems of concurrent operations listed earlier. However, the locks
themselves may create a few problems, which we shall be seeing in some detail in subsequent
sections.
The locking scheme provides operations such as write_lock(X), read_lock(X) and unlock(X). The unlock(X) operation, for example, can be described as follows:
unlock(X):
   if lock(X) = “write locked”
   then { lock(X) := “unlocked”;
          wake up one of the waiting transactions, if any }
   else if lock(X) = “read locked”
   then { no_of_reads(X) := no_of_reads(X) - 1;
          if no_of_reads(X) = 0
          then { lock(X) := “unlocked”;
                 wake up one of the waiting transactions, if any }
        }
Two-phase locking, though it provides serializability, has a disadvantage. Since the locks are not released immediately after the use of the item is over, but are retained till all the other needed locks are also acquired, the desired amount of interleaving may not be achieved. Worse, while a transaction T may be holding an item X, even though it is not using it, just to satisfy the two-phase locking protocol, another transaction T1 may genuinely need the item but will be unable to get it till T releases it. This is the price that is to be paid for the guaranteed serializability provided by the two-phase locking system.
(Operations are listed in time order from top to bottom.)
T11: read_lock(Y)
T11: read_tr(Y)
T21: read_lock(X)
T21: read_tr(X)
T11: write_lock(X)   (must wait for T21 to release X)
T21: write_lock(Y)   (must wait for T11 to release Y)
The status graph of the figure shows T11 and T21 each waiting for the other, i.e. a deadlock.
A better way to deal with deadlocks is to identify the deadlock when it occurs and then take some decision. The transaction involved in the deadlock may be blocked or aborted, or the transaction can preempt and abort the other transaction involved. In a typical case, the concept of a transaction time stamp TS(T) is used. Based on when the transaction was started (given by the time stamp; the larger the value of TS, the younger the transaction), two methods of deadlock recovery are devised.
It may be noted that in both cases, the younger transaction gets aborted. But the actual method of aborting is different. Both these methods can be proved to be deadlock free, because no cycles of waiting, as seen earlier, are possible with these arrangements.
There is another class of protocols that do not require any time stamps. They include the “no waiting” algorithm and the “cautious waiting” algorithm. In the no-waiting algorithm, if a transaction cannot get a lock, it is aborted immediately (no waiting) and restarted again at a later time. But since there is no guarantee that the new situation is deadlock free, it may have to be aborted again. This may lead to a situation where a transaction ends up getting aborted repeatedly.
To overcome this problem, the cautious waiting algorithm was proposed. Here, suppose the
transaction Ti tries to lock an item X, but cannot get X since X is already locked by another
transaction Tj. Then the solution is as follows: If Tj is not blocked (not waiting for same other
locked item) then Ti is blocked and allowed to wait. Otherwise Ti is aborted. This method not
only reduces repeated aborting, but can also be proved to be deadlock free, since out of Ti & Tj,
only one is blocked, after ensuring that the other is not blocked.
6.3 Starvation:
The other side effect of locking is starvation, which happens when a transaction cannot proceed for indefinitely long periods, though the other transactions in the system are continuing normally. This may happen if the waiting scheme for locked items is unfair, i.e. if some transactions may never be able to get the items, because one or the other of the high-priority transactions may continuously be using them. Then the low-priority transaction will be forced to “starve” for want of resources.
The solution to starvation problems lies in choosing proper priority algorithms, like first-come-first-served. If this is not possible, then the priority of a transaction may be increased every time it is made to wait or is aborted, so that eventually it becomes a high-priority transaction and gets the required service.
6.4.2 An algorithm for ordering the time stamp: The basic concept is to order the transactions based on their time stamps. A schedule made of such transactions is then serializable. This concept is called time stamp ordering (TO). The algorithm should ensure that, whenever a data item is accessed by conflicting operations in the schedule, the data is available to them in the serializability order. To achieve this, the algorithm uses two time stamp values.
1. Read_Ts (X): This indicates the largest time stamp among the transactions that have
successfully read the item X. Note that the largest time stamp actually refers to the
youngest of the transactions in the set (that has read X).
2. Write_Ts(X): This indicates the largest time stamp among all the transactions that have
successfully written the item-X. Note that the largest time stamp actually refers to the
youngest transaction that has written X.
The above two values are often referred to as “read time stamp” and “write time stamp” of the
item X.
6.4.3 The concept of basic time stamp ordering: Whenever a transaction T tries to read or write an item X, the algorithm compares the time stamp of T with the read time stamp or the write time stamp of the item X, as the case may be. This is done to ensure that T does not violate the order of time stamps. A violation can come about in the following ways.
1. Transaction T is trying to write X:
a) If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T and reject the operation. In plain words, if a transaction younger than T has already read or written X, the time stamp ordering is violated and hence T is to be aborted, and all the values written by T so far need to be rolled back, which may also involve cascaded rolling back.
b) Otherwise (i.e. if neither condition in (a) holds), execute the write_tr(X) operation and set write_TS(X) to TS(T), i.e. allow the operation and set the write time stamp of X to that of T, since T is now the latest transaction to have accessed X.
In the multiversion technique based on time stamp ordering, whenever a transaction T writes X, a new version Xk+1 is created, with both write_TS(Xk+1) and read_TS(Xk+1) being set to TS(T). Whenever a transaction T reads X, the value of read_TS(Xi) is set to the larger of the two values, namely read_TS(Xi) and TS(T).
To ensure serializability, the following rules are adopted:
i) If T issues a write_tr(X) operation, and Xi is the version with the highest write_TS(Xi) that is less than or equal to TS(T), and read_TS(Xi) > TS(T), then abort and roll back T; else create a new version of X, say Xk, with read_TS(Xk) = write_TS(Xk) = TS(T).
In plain words, if the version with the highest write time stamp not exceeding that of T has already been read by a transaction younger than T, then we have no option but to abort T and roll back all its effects; otherwise a new version of X is created with its read and write time stamps initialized to that of T.
ii) If a transaction T issues a read_tr(X) operation, find the version Xi with the highest write_TS(Xi) that is less than or equal to TS(T); then return the value of Xi to T and set the value of read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
This only means: find the latest version of X that T is eligible to read, and return its value to T. Since T has now read the value, find out whether it is the youngest transaction to read X by comparing its time stamp with the current read_TS of that version. If T is younger (its time stamp is higher), store its time stamp as that of the youngest transaction to visit X; else retain the earlier value.
The recovery manager of the DBMS will decide at what intervals, check points need to be
inserted (in turn, at what intervals data is to be written back to the disk). It can be either after
specific periods of time (say M minutes) or specific number of transaction (t transactions) etc.,
When the protocol decides to check point it does the following:-
The force writing need not only refer to the modified data items, but can include the various lists
and other auxiliary information indicated previously.
However, the force writing of all the data pages may take some time and it would be wasteful to
halt all transactions until then. A better way is to make use of the “Fuzzy check pointing” where
in the check point is inserted and while the buffers are being written back (beginning from the
previous check point) the transactions are allowed to restart. This way the i/o time is saved.
Until all data up to the new check point is written back, the previous check point is held valid for
recovery purposes.
Secondly, if all pages are updated once the transaction commits, then it is a “force approach”,
otherwise it is called a “no force” approach.
Most protocols make use of steal / no force strategies, so that there is no urgency of writing back
to the buffer once the transaction commits.
However, just the before image (BIM) and After image (AIM) values may not be sufficient for
successful recovery. A number of lists, including the list of active transaction (those that have
started operating, but have not committed yet), committed transactions as also aborted
transactions need to be maintained, to avoid a brute force method of recovery.
4. Recovery techniques based on Deferred Update:
This is a very simple method of recovery. Theoretically, no transaction can write back
into the database, until it has committed. Till then, it can only write into a buffer. Thus in case
of any crash, the buffer needs to be reconstructed, but the DBMS need not be recovered.
However, in practice, most transactions are very long and it is dangerous to hold all their updates in the buffer, since the buffers can run out of space and may need a page replacement. To avoid situations wherein a page is removed inadvertently, a simple two-pronged protocol is used.
1. A transaction cannot change the DBMS values on the disk until it commits.
2. A transaction does not reach its commit stage until all its update values are written on to the log and the log itself is force-written on to the disk.
Notice that in case of failures, recovery is by the NO-UNDO/REDO technique, since all the data will be in the log if a transaction fails after committing.
4.1 An algorithm for recovery using the deferred update in a single-user environment
In a single-user environment, the algorithm is a straight application of the REDO procedure, i.e. it uses two lists of transactions: the transactions committed since the last check point and the transactions active when the crash occurred. Apply REDO to all the write_tr operations of the committed transactions from the log, and let the active transactions run again.
The assumption is that the REDO operations are “idempotent”, i.e. the operations produce the same results irrespective of the number of times they are redone, provided they start from the same initial state. This is essential to ensure that the recovery operation does not produce a result that is different from the case where no crash had occurred to begin with.
(Though this may look like a trivial constraint, students may verify for themselves that not all DBMS applications satisfy this condition.)
Also since there was only one transaction active (because it was a single user system)
and it had not updated the buffer yet, all that remains to be done is to restart this
transaction.
To simplify matters, we presume that we are talking of strict and serializable schedules, i.e. there is strict two-phase locking and the locks remain effective till the transactions commit. In such a scenario, an algorithm for recovery could be as follows:
Use two lists: the list of committed transactions T since the last check point and the list of active transactions T1. REDO all the write operations of the committed transactions in the order in which they were written into the log. The active transactions are simply cancelled and resubmitted.
Note that once we put the strict serializability conditions, the recovery process does not
vary too much from the single user system.
Note that in the actual process, a given item X may be updated a number of times, either by the same transaction or by different transactions at different times. What is important to the user is its final value. However, the above algorithm simply updates the value whenever an update of it appears in the log. This can be made more efficient in the following manner: instead of starting from the check point and proceeding towards the time of the crash, traverse the log from the time of the crash backwards. When an update of a value is met for the first time in this traversal, apply it and note that the value has been updated; any further (earlier) updates of the same item can be ignored.
This method, though it guarantees correct recovery, has some drawbacks. Since the items remain locked by the transactions until the transactions commit, the efficiency of concurrent execution comes down. Also, a lot of buffer space is wasted holding the values till the transactions commit. The number of such values can be large when long transactions are working in concurrent mode, and they delay the commit operations of one another.
5.1 A typical UNDO/REDO algorithm for an immediate update single-user environment
Here, at the time of failure, the changes envisaged by the transaction may have
already been recorded in the database. These must be undone. A typical procedure
for recovery should follow the following lines:
a) The system maintains two lists: The list of committed transactions since the last
checkpoint and the list of active transactions (only one active transaction, infact,
because it is a single user system).
b) In case of failure, undo all the write_tr operations of the active transaction, by using
the information on the log, using the UNDO procedure.
c) For undoing a write_tr(X) operation, examine the corresponding log entry
writetr(T,X,oldvalue, newvalue) and set the value of X to oldvalue. The sequence of
undoing must be in the reverse order, in which operations were written on to the log.
d) REDO the writetr operations of the committed transaction from the log in the order in
which they were written in the log, using the REDO procedure.
a) Use two lists maintained by the system: The committed transactions list(since the last
check point) and the list of active transactions.
b) Undo all writetr(X) operations of the active transactions which have not yet
committed, using the UNDO procedure. The undoing operation must be in the
reverse order of writing process in the log.
c) Redo all writetr(X) operations of the committed transactions from the log in the order
in which they were written into the log.
Normally, the process of redoing the writetr(X) operations begins at the end of the log and proceeds in the reverse order, so that when X has been written more than once in the log, only the most recent value needs to be applied; the earlier updates of X can be ignored.
6. Shadow paging
It is not always necessary that the original database is updated by overwriting the
previous values. As discussed in an earlier section, we can make multiple versions of
the data items, whenever a new update is made. The concept of shadow paging
illustrates this:
(Figure: shadow paging. Both the shadow directory and the current directory have entries 1 to 8 pointing to the database pages. For the pages that have been updated, pages 2, 5 and 7 in this example, the current directory points to the new versions Page 2 (new), Page 5 (new) and Page 7 (new), while the shadow directory continues to point to the old versions.)
In a typical case, the database is divided into pages and only those pages that need updating are brought into the main memory (or cache, as the case may be). A shadow directory holds pointers to these pages. Whenever an update is done, a new block of the page is created (indicated by the suffix “(new)” in the figure) and the updated values are included there. Note that the new pages are created in the order of the updates and not in the serial order of the pages. A current directory holds pointers to these new pages. For all practical purposes, these are the “valid pages” and they are written back to the database at regular intervals.
Now, if any roll back is to be done, the only operation to be done is to discard the current
directory and treat the shadow directory as the valid directory.
One difficulty is that the new, updated pages are kept at unrelated locations and hence the concept of a “contiguous” database is lost. More importantly, what happens when the “new” pages are discarded as a part of the UNDO strategy? These blocks form “garbage” in the system. (The same thing happens when a transaction commits: the new pages become valid pages, while the old pages become garbage.) A mechanism to systematically identify all these pages and reclaim them becomes essential.
It may be noted that in all these cases, the role of the DBA becomes critical. He normally logs into the system under a DBA account or a superuser account, which provides full capabilities to manage the database, ordinarily not available to the other users. Under the superuser account, he can manage the following aspects of security.
i) Account creation: He can create new accounts and passwords to users
or user groups.
ii) Privilege granting: He can pass on privileges like ability to access
certain files or certain records to the users.
iii) Privilege revocation: The DBA can revoke certain or all privileges
granted to one/several users.
iv) Security level assignment: The security level of a particular user account
can be assigned, so that based on the policies, the users become
eligible /not eligible for accessing certain levels of information.
Another concept is the creation of “views”. While the database record may have
large number of fields, a particular user may be authorized to have information only
about certain fields. In such cases, whenever he requests for the data item, a “view” is
created for him of the data item, which includes only those fields which he is authorized
to have access to. He may not even know that there are many other fields in the
records.
The concept of views becomes very important when large databases, which
cater to the needs of various types of users are being maintained. Every user can have
and operate upon his view of the database, without being bogged down by the details.
It also makes the security maintenance operations convenient.
Chapter: 7
DATABASE RECOVERY, BACKUP & SECURITY
End Chapter quizzes
Q2. The granting of a right or privilege that enables a subject to have legitimate access to a
system or a system’s objects.
a) Authentication
b) Authorization
c) Data Unlocking
d) Data Encryption
Q3. The process of periodically taking a copy of the database and log file on to offline
storage media
a) Back up
b) Data Recovery
c) Data Mining
d) Data Locking
Q4. The encoding of the data by a special algorithm that renders the data unreadable
a) Data hiding
b) Encryption
c) Data Mining
d) Both a and c
a) Geographical Database
b) Statistical Database
c) Web Database
d) Time Database