Information Technology
(DISTANCE MODE)
DCS 114
Database Management Systems
I SEMESTER
COURSE MATERIAL
Mr. G. Nagappan
Lecturer
Department of Computer Science and Engineering
Anna University Chennai
Chennai – 600 025
Reviewer
Dr. D. Manjula
Assistant Professor
Department of Computer Science and Engineering
Anna University Chennai
Chennai – 600 025
Editorial Board
Copyrights Reserved
(For Private Circulation only)
ACKNOWLEDGEMENT
The author also owes a special debt of gratitude to the Principal and colleagues of DMI
College of Engineering, family members and friends for their constant encouragement.
The author has drawn inputs from several sources for the preparation of this course
material, to meet the requirements of the syllabus. The author gratefully acknowledges
the following sources:
1. Ramez Elmasri and Shamkant B. Navathe, “Fundamentals of Database Systems”,
Third Edition, Pearson Education, Delhi, 2002.
2. Abraham Silberschatz, Henry F. Korth and S. Sudarshan, “Database System
Concepts”, Fourth Edition, McGraw-Hill, 2002.
3. C.J. Date, “An Introduction to Database Systems”, Seventh Edition, Pearson
Education, Delhi, 2002.
4. Web resources.
Mr.G.Nagappan,
ASSISTANT PROFESSOR,
Department of Computer Science and Engineering,
DMI College of Engineering,
Palanchur,
Chennai – 602 103.
DCS 114 DATABASE MANAGEMENT SYSTEMS
UNIT I
UNIT II
UNIT III
UNIT IV
Query and Transaction Processing: Algorithms for Executing Query Operations – Using
Heuristics in Query Optimization – Cost Estimation – Semantic Query Optimization –
Transaction Processing – Properties of Transactions – Serializability – Transaction
Support in SQL.
UNIT V
TEXT BOOK
REFERENCES
UNIT 1
1.1 INTRODUCTION
1.2 FILE SYSTEMS VERSUS DATABASE SYSTEMS
1.3 DATA MODELS
1.4 DBMS ARCHITECTURE
1.5 DATA MODELING USING ENTITY-RELATIONSHIP DIAGRAMS
1.6 EER MODEL DESIGN EXAMPLE
1.7 RELATION SCHEMES AND INSTANCES
UNIT 2
STORAGE STRUCTURES
2.1 INTRODUCTION
2.2 RAID
2.3 FILE OPERATIONS
2.4 HASHING
2.5 INDEXING
UNIT 3
RELATIONAL MODEL
3.1 INTRODUCTION
3.2 RELATIONAL ALGEBRA
3.3 STRUCTURED QUERY LANGUAGE (SQL)
3.4 BASIC QUERIES USING SINGLE ROW FUNCTIONS
3.5 COMPLEX QUERIES USING GROUP FUNCTIONS
3.6 VIEWS
3.7 INTEGRITY CONSTRAINTS
3.8 RELATIONAL ALGEBRA AND CALCULUS
3.9 TUPLE RELATIONAL CALCULUS
UNIT 1
INTRODUCTION TO DATABASE
MANAGEMENT SYSTEMS
Learning Objectives
To learn the difference between file systems and database systems
To learn about various data models
To study the architecture of a DBMS
To learn about data independence
To learn various data modeling techniques
1.1 INTRODUCTION
A database is a collection of data elements (facts) stored in a computer in such
a systematic way that a computer program can consult it to answer questions. The
answers to those questions become information that can be used to make decisions
that may not be made with the data elements alone. The computer program used to
manage and query a database is known as a database management system (DBMS).
A database is a collection of related data that we can use for
Defining (specifying types of data)
Constructing (storing & populating)
Manipulating (querying, updating, reporting)
A Database Management System (DBMS) is a software package to
facilitate the creation and maintenance of a computerized database. A Database System
(DB) is a DBMS together with the data.
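The three activities above (defining, constructing, manipulating) can be sketched with Python's built-in sqlite3 module; the table and column names here are illustrative assumptions, not part of the course syllabus.

```python
import sqlite3

# A minimal sketch of the three DBMS activities. The database lives
# in memory; the student table and its rows are made-up examples.
conn = sqlite3.connect(":memory:")

# Defining: specify the types of data.
conn.execute("CREATE TABLE student (rno INTEGER PRIMARY KEY, name TEXT)")

# Constructing: store and populate the database.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Anita"), (2, "Ravi")])

# Manipulating: query the stored data.
names = [row[0] for row in
         conn.execute("SELECT name FROM student ORDER BY rno")]
print(names)  # ['Anita', 'Ravi']
```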
1.1.1 Features of a database
It is a persistent (stored) collection of related data.
The data is input (stored) only once.
The data is organized (in some fashion).
The data is accessible and can be queried (effectively and efficiently).
Definition:
Data models are a collection of conceptual tools for describing data, data
relationships, data semantics and data constraints.
Data independence means that the way data is used is not tied to the way it is
stored in the DBMS: programs and views are in no way dependent upon the data's
physical storage and logical arrangements. As shown in Fig. 1.5, the database
interacts only with various external factors such as new hardware, new users, new
technology, new linkages and linkages to other databases.
o Logical data independence
o change the conceptual schema without having to change the external
schemas
o Physical data independence
o change the internal schema without having to change the conceptual
schema
The DBMS architecture consists of the disk storage, storage manager, query
processor and database users, connected through various tools and applications such as
query and application tools, application programs and application interfaces.
Disk storage consists of data in the form of logical tables, indices, the data dictionary
and statistical data. The data dictionary stores data about the data, i.e. its structure, etc.
Indices are used for fast searching in a database. Statistical data is the log storage of
details about the various transactions which occur on the database.
The user submits a query, which passes to the optimizer where it is optimized;
the resulting physical execution plan goes to the execution engine.
The query execution engine passes the request to the index/file/record manager,
which in turn passes the request to the buffer manager, requesting it to allocate memory to
store pages. The buffer manager in turn sends the pages to the storage manager, which takes
care of physical storage.
The resulting data out of physical storage comes back in the reverse order. The catalog
is the data dictionary, which contains statistics and schemas. Every query execution
which takes place in the execution engine is logged, so that it can be recovered when required.
[Figure: query processing layers — an API/GUI submits a query to the optimizer, which
consults the catalog (schemas and statistics) and produces a physical plan for the
execution engine (with logging and recovery); the execution engine sends data requests
to the index/file/record manager, which requests logical pages from the buffer manager,
which in turn exchanges physical pages with the storage manager that services data
requests against storage.]
Fig 1.7 Query processing layers
Application Architectures
1.5.3 ER Notations
Example:
The subclass is specialization of the super class and it adds additional data or
behaviours or overrides behaviours of the super class.
Super classes are generalization of their sub classes.
This is related to instances of entities that are involved in a specialization/generalization
relationship.
If E1 specializes E2, then each instance of E1 is also an instance of E2.
Therefore Class(E1) ⊆ Class(E2)
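The subset relationship above can be sketched directly with Python sets; the entity instances below are made-up examples, assuming Actor specializes Person.

```python
# Specialization as set containment: every instance of the subclass
# Actor is also an instance of the superclass Person.
person = {"p1", "p2", "p3", "p4"}   # Class(Person)
actor = {"p2", "p4"}                # Class(Actor) specializes Person

# Class(Actor) is a subset of Class(Person).
is_subset = actor <= person
print(is_subset)  # True
```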
Example
Constraint on whether the entities may belong to more than one lower-level entity
set within a single generalization.
Disjoint
An entity can belong to only one lower-level entity set.
It is noted in E-R diagram by writing disjoint next to the ISA triangle.
Overlapping
An entity can belong to more than one lower-level entity set.
Completeness constraint — It specifies whether or not an entity in the
higher-level entity set must belong to at least one of the lower-level entity sets
within a generalization.
total : an entity must belong to one of the lower-level entity sets
partial: an entity need not belong to one of the lower-level entity sets
Notations
• Disjoint, Total d
• Disjoint, Partial d
• Overlapping, Total O
• Overlapping, Partial O
Example:
Keywords:
Entity, Attribute, Relationship, Entity Type, Entity Instance, Entity Class, Participation
Constraints, Weak Entity, Strong Entity, Generalization, Specialization, Subclass,
Super Class.
Short Questions:
1. Define Entity, Attribute, Relationship, Entity Type, Entity Instance, Entity
Class.
2. Differentiate Weak and Strong Entity Set.
3. What are the various notations for ER Diagrams?
4. What is participation constraint? Mention different types of participation
constraint.
5. Define Generalization and Specialization.
Descriptive Questions:
1. Explain various types of Attributes with suitable examples.
2. State different types of Participation Constraint and explain with a diagram-
matic example.
• For films, we want to store the title, year of production and type (thriller,
comedy, etc.)
• We want to know who directed and who acted in each film. Every film has one
director. We store the salary of each actor for each film
• An actor can receive an award for his part in a film. We store information about
who got which award for which film, along with the name of the award and
year.
• We also store the name and telephone number of the organization who gave
the award. Two different organizations can give an award with the same name.
A single organization does not give more than one award with a particular name
per year.
Let us see a sample ER Model before discussing it in detail
[Figure: ER diagram for the movie database — Person (id, name, address, birthday)
with subclasses Actor and Director via an ISA triangle; Film (id/title, year, type);
Award (name, year); Organization (name, phone number); relationships Directed,
Acted In (with attribute salary) and Won.]
Example

[Figure: Actor (id, name, address, birthday) — Acted In (salary, year) — Film
(title, type).]
Recursive Relationships
• An entity set can participate more than once in a relationship.
• In this case, we add a description of the role to the ER-diagram.
[Figure: recursive relationship — Employee (id, name, address, phone number)
participates twice in Manages, in the roles manager and worker.]

[Figures: further examples — Director (id, name) and Actor (id, name) related to
Film (id, title) through the relationships Produced and Acted In.]
Key Constraints
• Key constraints specify whether an entity can participate in one or more than
one relationships in a relationship set.
• When there is no key constraint an entity can participate any number of times.
• When there is a key constraint, the entity can participate one time at most.
Key constraints are drawn using an arrow from the entity set to the relationship set.
• We express cardinality constraints by drawing either a directed line (→),
signifying “one,” or an undirected line (—), signifying “many,” between the
relationship set and the entity set.
One-to-Many
A film is directed by one director at most.
A director can direct any number of films.
[Figure: one-to-many — Director (id, name), Directed, Film (title), with an arrow
from Film to Directed showing that each film has at most one director.]
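A one-to-many key constraint of this kind can be sketched in SQL (via Python's sqlite3): each film row references at most one director, while a director may appear in many film rows. The table and column names below are illustrative assumptions.

```python
import sqlite3

# One-to-many: Film carries a foreign key to Director, so a film is
# directed by at most one director, but a director can direct many films.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE director (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE film (
                    title TEXT PRIMARY KEY,
                    director_id INTEGER REFERENCES director(id))""")
conn.execute("INSERT INTO director VALUES (1, 'Director A')")
conn.execute("INSERT INTO film VALUES ('Film A', 1)")
conn.execute("INSERT INTO film VALUES ('Film B', 1)")  # many films, one director

count = conn.execute(
    "SELECT COUNT(*) FROM film WHERE director_id = 1").fetchone()[0]
print(count)  # 2
```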
Many-to-Many
A film is directed by any number of directors.
A director can direct any number of films.
[Figure: many-to-many — Director (id, name), Directed, Film (title), with no
arrows, so both entity sets may participate any number of times.]
[Figures: further examples — a Directed relationship between Director (id, name)
and Film (title); a recursive FatherOf relationship on Person (id, name, age) with
the roles father and child; a Produced relationship between Director (id, name)
and Film (id, title).]
Example (2)
• We can combine key and participation constraints.
• What does this diagram mean?
[Figure: Director (id, name), Directed, Film (title), combining a key constraint
and a participation constraint.]
Keywords:
Entity, Entity Set, Attribute, Relation, Relationship set, Key Constraint, One to one,
one to many, Many to many, Recursive relations.
Descriptive Questions:
1. Explain various Relationship types with examples.
2. Compare recursive relationships and participation constraints with examples.
Relational scheme
A relation scheme is the definition; i.e. a set of attributes
A relational database scheme is a set of relation schemes: i.e. a set of sets of
attributes
Relation instance (simply relation)
A relation is an instance of a relation scheme.
A relation r over a relation scheme R = {A1, ..., An} is a subset of the
Cartesian product of the domains of all attributes, i.e.
r ⊆ D(A1) × D(A2) × … × D(An)
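The definition above can be checked mechanically: build the Cartesian product of the domains with itertools and verify that a relation instance is a subset of it. The two small domains below are made-up examples.

```python
from itertools import product

# A relation instance r over a two-attribute scheme must be a subset
# of D(A1) x D(A2). The domains here are illustrative.
D_colour = {"red", "green"}
D_size = {"S", "M"}

cart = set(product(D_colour, D_size))   # D(A1) x D(A2), 4 tuples
r = {("red", "S"), ("green", "M")}      # a relation instance

print(r <= cart)  # True: r is a subset of the Cartesian product
```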
1.7.1 Domains
A domain is a set of acceptable values for a variable.
Example: Names of Canadians, Salaries of professors
1.7.2 Simple/Composite domains
Address = Street Name + Street Number + City + Province + Postal Code
1.7.3 Domain compatibility
Two domains are compatible if binary operations (e.g. comparison to one another,
addition, etc.) can be performed on values drawn from them.
Full support for domains is not provided in many current relational DBMSs.
1.7.6 Properties
Based on the set theory
No ordering among attributes and tuples
No duplicate tuples allowed
Value-oriented: tuples are identified by the attributes values
All attribute values are atomic
No repeating groups
1.7.7 Degree
The number of attributes available in a relation is called the degree of the
relation.
1.7.8 Cardinality
It is the number of tuples available in a relation.
Cardinality constraints are specified in the form l..h, where l denotes the
minimum and h the maximum number of relationships an entity can participate in.
Beware: the positioning of the constraints is exactly the reverse of the
positioning of constraints in E-R diagrams.
The cardinality is 3
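Degree and cardinality can be computed directly from a relation represented as tuples; the Marks relation below is a made-up example with three attributes and three tuples.

```python
# Degree = number of attributes; cardinality = number of tuples.
# The attribute names and rows are illustrative examples.
attributes = ("Rno", "Name", "DOB")
marks = [(1, "Anita", "2001-04-12"),
         (2, "Ravi", "2000-11-03"),
         (3, "Kumar", "2001-07-21")]

degree = len(attributes)
cardinality = len(marks)
print(degree, cardinality)  # 3 3
```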
1.7.9 Integrity Constraints
# Key Constraints
Referring to Mark sheet relation as shown above
Key: A set of attributes that uniquely identifies tuples (e.g.: Rno; Name)
A super key of an entity set is a set of one or more attributes whose values
uniquely determine each entity (e.g.: Rno; Name & DOB; Rno & Name;
Rno & Name & DOB)
A candidate key of an entity set is a minimal super key (e.g.: Rno; Name
& DOB)
Although several candidate keys may exist, one of the candidate keys with the
minimum number of attributes is selected to be the primary key (e.g.: Rno)
# Data Constraints
~ Check constraints (e.g.: to check percentage >80)
# Others
~ Entity Integrity Constraint: No primary key value can be null. (e.g.: Rno
cannot be null)
~ Referential Integrity Constraint: A tuple in one relation must refer to an
existing tuple.
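These constraints can be sketched in SQL via Python's sqlite3: a NOT NULL primary key enforces entity integrity, and a CHECK clause enforces the percentage rule from the text. The table name and columns are illustrative assumptions based on the Mark sheet example.

```python
import sqlite3

# Entity integrity (Rno cannot be null) and a check constraint
# (percentage > 80) on a made-up marksheet relation.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE marksheet (
                    rno INTEGER PRIMARY KEY NOT NULL,
                    name TEXT,
                    percentage REAL CHECK (percentage > 80))""")
conn.execute("INSERT INTO marksheet VALUES (1, 'Anita', 92.5)")

try:
    # 55.0 violates the check constraint, so the insert is rejected.
    conn.execute("INSERT INTO marksheet VALUES (2, 'Ravi', 55.0)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True
```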
1.7.10 Characteristics of Relations
Ordering of Tuples in a Relation
Ordering of values within a tuple and alternative definition of a relation
Values in tuples: Each value in a tuple is an atomic value.
1.7.11 Advantages over other models
Simple concepts
Solid mathematical foundation
Set theory
Powerful query languages
Keywords:
Relation, Relational Scheme, Relational Instance, Domain, Degree, Cardinality, Integrity
Constraints, Candidate Key, Primary Key.
Short Questions:
1. Define the following:
a. Relation
b. Relational Scheme
c. Relational Instance
d. Domain
e. Degree
f. Cardinality
g. Integrity Constraints
h. Candidate Key
i. Primary Key
2. What is Domain Compatibility?
3. State the properties of a Relational Model.
Descriptive Questions:
1. Design a database in Relational Model and represent it in Relational Schema.
2. Explain various constraints in Relational Model.
Summary
DBMS is used to maintain and query large datasets.
Benefits include recovery from system crashes, concurrent access, quick
application development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest, most exciting areas in CS.
Conceptual design follows requirements analysis and yields a high-level
description of the data to be stored.
ER model is popular for conceptual design: its constructs are expressive, close to
the way people think about their applications.
Basic constructs: entities, relationships, and attributes (of entities and
relationships).
Some additional constructs: weak entities, ISA hierarchies, and aggregation.
Note: There are many variations on ER model.
Several kinds of integrity constraints can be expressed in the ER model: key
constraints, participation constraints, and overlap/covering constraints
for ISA Hierarchies. Some foreign key constraints are also implicit in the
definition of a relationship set.
Some constraints (notably, functional dependencies) cannot be expressed in
the ER model.
Constraints play an important role in determining the best database design for
an enterprise.
ER design is subjective. There are often many ways to model a given scenario!
Analyzing alternatives can be tricky, especially for a large enterprise.
Common choices include: entity vs. attribute, entity vs. relationship, binary or
n-ary relationship, whether or not to use ISA hierarchies, and whether or not to
use aggregation.
Ensuring good database design: resulting relational schema should be analyzed
and refined further. FD information and normalization techniques are especially
useful.
References:
1. Ramez Elmasri and Shamkant B. Navathe, “Fundamentals of Database
Systems”, Third Edition, Pearson Education, Delhi, 2002.
2. Abraham Silberschatz, Henry F. Korth and S. Sudarshan, “Database System
Concepts”, Fourth Edition, McGraw-Hill, 2002.
3. C.J. Date, “An Introduction to Database Systems”, Seventh Edition, Pearson
Education, Delhi, 2002.
UNIT 2
STORAGE STRUCTURES
Learning Objectives
To learn about secondary storage devices
To learn about RAID technology
To study file operations
To learn about hashing techniques
To learn about indexing (both single and multi-level)
To study B+ trees and indexes on multiple keys
2.1 INTRODUCTION
2.1.1 Storage Structure
A database file is partitioned into fixed-length storage units called
blocks. Blocks are units of both storage allocation and data transfer.
Database system seeks to minimize the number of block transfers
between the disk and memory. We can reduce the number of disk
accesses by keeping as many blocks as possible in main memory.
Secondary storage devices are used to store data permanently.
Buffer – portion of main memory available to store copies of disk
blocks.
Buffer manager – subsystem responsible for allocating buffer space in
main memory.
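The buffer manager's job can be sketched as a small class that keeps at most a fixed number of disk blocks in memory and evicts the least recently used one on overflow. The class, block contents, and the LRU policy are illustrative assumptions, not a description of any particular DBMS.

```python
from collections import OrderedDict

# A minimal buffer-manager sketch: fetch() serves a block from the
# in-memory pool when possible and reads from the simulated disk
# otherwise, evicting the least recently used block when full.
class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # simulated disk: block number -> data
        self.pool = OrderedDict()     # in-memory buffer (LRU order)
        self.disk_reads = 0

    def fetch(self, block_no):
        if block_no in self.pool:             # hit: no disk transfer
            self.pool.move_to_end(block_no)
        else:                                 # miss: read from disk
            self.disk_reads += 1
            if len(self.pool) >= self.capacity:
                self.pool.popitem(last=False)  # evict LRU block
            self.pool[block_no] = self.disk[block_no]
        return self.pool[block_no]

disk = {n: f"block-{n}" for n in range(10)}
buf = BufferManager(capacity=2, disk=disk)
for b in [0, 1, 0, 2, 0]:   # block 0 stays hot; block 1 is evicted
    buf.fetch(b)
print(buf.disk_reads)  # 3
```

Keeping hot blocks in the pool is exactly how a DBMS reduces the number of block transfers between disk and memory.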
Volatile storage:
o does not survive system crashes
examples: main memory, cache memory
Nonvolatile storage:
o survives system crashes
examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
Life of storage:
Volatile storage: loses its contents when power is switched off.
Non-volatile storage: retains its contents even when power is switched off;
includes secondary and tertiary storage as well as battery-backed main memory.
2.1.1.2 Categories of Physical Storage Media
Cache
The fastest and most costly form of storage
Volatile
Managed by the computer system hardware
Main memory
Fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds)
Generally too small (or too expensive) to store the entire database
Capacities of a few gigabytes are widely used currently
Capacities have gone up and per-byte costs have decreased steadily and rapidly
Volatile: contents of main memory are usually lost if a power failure or system
crash occurs
Flash memory
Data survives power failure
Data can be written at a location only once, but the location can be erased and
written again
Magnetic-disk
Data is stored on a spinning disk, and it is read/written magnetically
Primary medium for the long-term storage of data; typically stores the entire
database
Data must be moved from disk to main memory for access, and written back for
storage
Much slower access than main memory (more on this later)
Optical storage
Non-volatile; data is read optically from a spinning disk using a laser
CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular forms
Write-once, read-many (WORM) optical disks are used for archival storage (CD-R
and DVD-R)
Multiple-write versions are also available (CD-RW, DVD-RW, and DVD-RAM)
Reads and writes are slower than with magnetic disk
Juke-box systems, with large numbers of removable disks, a few drives, and a
mechanism for automatic loading/unloading of disks, are available for storing large
volumes of data
Optical Disks
Compact disk-read only memory (CD-ROM)
Tape storage
Non-volatile; used primarily for backup (to recover from disk failure), and for
archival data
Sequential-access – much slower than disk
Very high capacity (40 to 300 GB tapes available)
Tape can be removed from the drive; storage costs are much cheaper than disk, but
drives are expensive
Tape jukeboxes available for storing massive amounts of data
Hundreds of terabytes (1 terabyte = 10^12 bytes) to even a petabyte (1 petabyte
= 10^15 bytes)
Magnetic Tapes
Hold large volumes of data and provide high transfer rates:
Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT (Digital
Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB with Ampex
helical scan format
Transfer rates from a few to 10s of MB/s
Very slow access time in comparison with magnetic disks and optical disks
Magnetic Disks
Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)
To read/write a sector, the disk arm swings to position the head on the right track
Head-disk assemblies: multiple disk platters on a single spindle (typically 2 to 4)
Seek Time – time it takes to reposition the arm over the correct track.
Average seek time is 1/2 the worst-case seek time (it would be 1/3 if all
tracks had the same number of sectors and we ignore the time to start and
stop arm movement).
Rotational latency – time it takes for the sector to be accessed to appear under
the head.
Average latency is 1/2 of the worst-case latency.
Data-Transfer Rate – the rate at which data can be retrieved from or stored to the
disk.
4 to 8 MB per second is typical.
Multiple disks may share a controller, so the rate that the controller can handle is also
important.
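The access-time components above add up: average access time = average seek time + average rotational latency + transfer time. The arithmetic can be sketched with illustrative numbers (the seek time, RPM, and block size below are assumptions; only the 4 MB/s transfer rate comes from the text).

```python
# Disk access-time arithmetic with assumed example figures.
avg_seek_ms = 8.0                        # assumed average seek time
rpm = 7200                               # assumed rotational speed
avg_latency_ms = 0.5 * (60_000 / rpm)    # half of one full rotation
transfer_rate_mb_s = 4.0                 # "4 to 8 MB per second is typical"
block_kb = 4                             # assumed block size

transfer_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000
access_ms = avg_seek_ms + avg_latency_ms + transfer_ms
print(round(access_ms, 2))  # roughly 13 ms for these figures
```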
2.1.5 Mean Time To Failure (MTTF) – the average time the disk is expected to run
continuously without any failure.
4 Typically 3 to 5 years
4 Probability of failure of new disks is quite low, corresponding to a “theoretical
MTTF” of 30,000 to 1,200,000 hours for a new disk.
2.2 RAID
2.2.1 RAID: Redundant Arrays of Independent Disks
o Disk organization techniques that manage a large number of disks,
providing a view of a single disk of:
high capacity and high speed by using multiple disks in
parallel, and
high reliability by storing data redundantly, so that data can
be recovered even if a disk fails.
The chance that some disk out of a set of N disks will fail is much higher than
the chance that a specific single disk will fail.
E.g. a system with 100 disks, each with an MTTF of 100,000
hours (approx. 11 years), will have a system MTTF of 1,000
hours (approx. 41 days).
o Techniques for using redundancy to avoid data loss are critical with
large numbers of disks.
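The MTTF figure in the example follows from simple division: with N independent disks, the expected time until some disk fails is roughly the single-disk MTTF divided by N.

```python
# System MTTF for an array of independent disks (the example from the
# text: 100 disks, each with an MTTF of 100,000 hours).
single_disk_mttf_hours = 100_000
n_disks = 100

system_mttf_hours = single_disk_mttf_hours / n_disks
print(system_mttf_hours)                  # 1000.0 hours
print(int(system_mttf_hours // 24))       # approx. 41 days
```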
Requests for different blocks can run in parallel if the blocks reside
on different disks.
A request for a long sequence of blocks can utilize all disks in parallel.
2.2.2 RAID Levels
o Schemes to provide redundancy at lower cost by using disk striping
combined with parity bits
Different RAID organizations, or RAID levels, have differing
cost, performance and reliability characteristics
RAID Level 0: Block striping; non-redundant.
o Used in high-performance applications where loss of data is not
critical.
RAID Level 1: Mirrored disks with block striping
o Offers best write performance.
o Popular for applications such as storing log files in a database
system.
RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit
striping.
RAID Level 3: Bit-Interleaved Parity
o a single parity bit is enough for error correction, not just detection,
since we know which disk has failed
When writing data, corresponding parity bits must also be
computed and written to a parity bit disk.
To recover data in a damaged disk, compute XOR of bits
from other disks (including parity bit disk).
o Faster data transfer than with a single disk, but fewer I/Os per
second since every disk has to participate in every I/O.
o Subsumes Level 2 (provides all its benefits at lower cost).
RAID Level 4: Block-Interleaved Parity; uses block-level striping, and
keeps a parity block on a separate disk for corresponding blocks from N
other disks.
o When writing data block, corresponding block of parity bits must
also be computed and written to parity disk.
o To find the value of a damaged block, compute XOR of bits from
corresponding blocks (including parity block) of other disks.
o Provides higher I/O rates for independent block reads than Level 3
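The parity mechanism of Levels 3 and 4 can be sketched with XOR over byte blocks: the parity block is the XOR of the data blocks, so any one lost block is reconstructed by XOR-ing the surviving blocks with the parity block. The block contents below are made-up.

```python
from functools import reduce

# XOR corresponding bytes of several equal-sized blocks.
def xor_blocks(*blocks):
    return bytes(reduce(lambda a, b: a ^ b, byte_group)
                 for byte_group in zip(*blocks))

d0, d1, d2 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"  # made-up data blocks
parity = xor_blocks(d0, d1, d2)                     # written to the parity disk

# Disk holding d1 fails; rebuild it from the other disks plus parity.
recovered = xor_blocks(d0, d2, parity)
print(recovered == d1)  # True
```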
Free Lists
o Store the address of the first deleted record in the file header.
o Use this first record to store the address of the second deleted
record, and so on
o Can think of these stored addresses as pointers since they “point” to
the location of a record.
o More space efficient representation: reuse space for normal attributes
of free records to store pointers. (No pointers stored in in-use
records.)
Record types that allow variable lengths for one or more fields.
Record types that allow repeating fields (used in some older
data models).
o Byte string representation
Attach an end-of-record (⊥) control character to the end of
each record
Difficulty with deletion
Difficulty with growth
Variable-Length Records: Slotted Page Structure
Pointer method
A variable-length record is represented by a list of fixed-length records,
chained together via pointers.
It can be used even if the maximum record length is not known.
Disadvantage of the pointer structure: space is wasted in all records except the first
in a chain.
Solution is to allow two kinds of block in the file:
Anchor block – contains the first records of chains
Overflow block – contains records other than those that are the first records
of chains.
The result specifies in which block of the file the record should be placed.
Records of each relation may be stored in a separate file. In a clustering
file organization, records of several different relations can be stored in the
same file.
Motivation: store related records on the same block to minimize I/O
o Worst Hash function maps all search-key values to the same bucket; this
makes access time proportional to the number of search-key values in the
file.
o An ideal hash function is uniform, i.e. each bucket is assigned the same
number of search-key values from the set of all possible values.
o Ideal hash function is random, so each bucket will have the same number
of records assigned to it irrespective of the actual distribution of search-
key values in the file.
o Typical hash functions perform computation on the internal binary
representation of the search-key.
o For example, for a string search-key, the binary representations
of all the characters in the string could be added and the sum modulo
the number of buckets could be returned.
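That string hash function can be sketched in a few lines; the bucket count is an assumption for illustration.

```python
# Add the ordinal (binary) values of the characters and take the sum
# modulo the number of buckets. N_BUCKETS is an assumed bucket count.
N_BUCKETS = 8

def string_hash(key: str) -> int:
    return sum(ord(ch) for ch in key) % N_BUCKETS

print(string_hash("Perryridge"))                   # some bucket in 0..7
print(string_hash("Perryridge") == string_hash("Perryridge"))  # deterministic
```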
2.4.3 Handling of Bucket Overflows
o Bucket overflow can occur because of
Insufficient buckets
Multiple entries in the bucket address table may point to a bucket.
Thus, the actual number of buckets is < 2^i.
The number of buckets also changes dynamically due to
coalescing and splitting of buckets.
2.4.6.2 Use of Extendable Hash Structure
o Each bucket j stores a value i_j; all the entries that point to the
same bucket have the same values on the first i_j bits.
o To locate the bucket containing search-key K_j:
1. Compute h(K_j) = X
2. Use the first i high-order bits of X as a displacement into the
bucket address table, and follow the pointer to the appropriate
bucket.
o To insert a record with search-key value K_j,
follow the same procedure as look-up and locate the bucket, say j.
o If there is room in bucket j, insert the record in the bucket.
o Otherwise the bucket must be split and the insertion re-attempted.
Overflow buckets are used instead in some cases.
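The lookup step can be sketched with bit arithmetic: take the hash value and shift away all but the first i high-order bits to index the bucket address table. The 4-bit hash function and directory contents below are made-up for illustration.

```python
# Extendable-hash directory lookup: the first i high-order bits of h(K)
# select an entry in the bucket address table. Several entries may share
# a bucket (here the two "1x" entries point to the same bucket).
HASH_BITS = 4
i = 2                                    # global depth

def h(key: int) -> int:
    return key % 16                      # toy 4-bit hash function

directory = ["bucket-00", "bucket-01", "bucket-1x", "bucket-1x"]  # 2**i entries

def lookup(key: int) -> str:
    x = h(key)
    index = x >> (HASH_BITS - i)         # first i high-order bits of X
    return directory[index]

print(lookup(0b0111))   # hash 0111 -> high bits 01 -> bucket-01
print(lookup(0b1101))   # hash 1101 -> high bits 11 -> bucket-1x
```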
2.4.6.3 Example of Extendable Hash Structure
In this structure, i_2 = i_3 = i, whereas i_1 = i – 1
Hash structure after insertion of one Brighton and two Downtown records
Fig 2.10 Hash structure after insertion of Redwood and Round Hill records
Records in a relation are assumed to be numbered sequentially from, say, 0.
o All the search-keys in the subtree to which P1 points are less
than K1.
o For 2 ≤ i ≤ n – 1, all the search-keys in the subtree to which Pi
points have values greater than or equal to Ki–1 and less than
Ki.
Example of a B+-tree
o Insertions and deletions to the main file can be handled efficiently,
as the index can be restructured in logarithmic time (as we shall
see).
2.5.5.3 Queries on B+-Trees
Find all records with a search-key value of k.
o Start with the root node
Examine the node for the smallest search-key value
> k.
If such a value exists, assume it is Kj. Then follow
Pj to the child node.
Otherwise k ≥ Km–1, where there are m pointers in
the node. Then follow Pm to the child node.
o If the node reached by following the pointer above is not a leaf
node, repeat the above procedure on the node, and follow the
corresponding pointer.
o Eventually reach a leaf node. If for some i, key Ki = k follow
pointer Pi to the desired record or bucket. Otherwise, no
record with search-key value k exists.
In processing a query, a path is traversed in the tree from the root to
some leaf node.
If there are K search-key values in the file, the path is no longer than
log⌈n/2⌉(K).
A node is generally the same size as a disk block, typically 4 kilobytes,
and n is typically around 100 (40 bytes per index entry).
With 1 million search key values and n = 100, at most
log50(1,000,000) = 4 nodes are accessed in a lookup.
Contrast this with a balanced binary tree with 1 million search key
values — around 20 nodes are accessed in a lookup.
o The above difference is significant since every node access may need
a disk I/O, costing around 20 milliseconds!
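The two node counts in the example follow from the logarithms directly:

```python
import math

# Lookup cost: with K search-key values and fanout n, a B+-tree path is
# at most ceil(log base n/2 of K) nodes; a balanced binary tree needs
# about ceil(log2 K) nodes.
K = 1_000_000
n = 100

bptree_nodes = math.ceil(math.log(K, n // 2))   # log50(1,000,000)
binary_nodes = math.ceil(math.log2(K))

print(bptree_nodes)   # 4
print(binary_nodes)   # 20
```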
Updates on B+-Trees: Insertion
o Find the leaf node in which the search-key value would appear.
o If the search-key value is already there in the leaf node, record is
added to file and if necessary a pointer is inserted into the bucket.
o If the search-key value is not there, then add the record to the main
file and create a bucket if necessary. Then:
o The removal of the leaf node containing “Downtown” does not
result in its parent having too few pointers. So the cascaded
deletions stop with the deleted leaf node’s parent.
o Insertion and deletion are handled in the same way as insertion
and deletion of entries in a B+-tree index.
Fig 2.14 Nonleaf node – pointers Bi are the bucket or file record pointers.
Key words:
Secondary storage, magnetic disk, cache, main memory, optical, primary, secondary,
tertiary, hashing, indexing, B tree, B+ tree, node, leaf node, bucket
Indexes can be classified as clustered vs. unclustered, primary vs. secondary,
and dense vs. sparse. Differences have important consequences for utility/
performance.
Understanding the nature of the workload for the application, and the
performance goals, is essential to develop a good design.
What are the important queries and updates? What attributes/relations are
involved?
Indexes must be chosen to speed up important queries (and perhaps some
updates!).
Index maintenance overhead on updates to key fields.
Choose indexes that can help many queries, if possible.
Build indexes to support index-only strategies.
Clustering is an important decision; only one index on a given relation can
be clustered!
Order of fields in composite index key can be important.
References:
1. Ramez Elmasri and Shamkant B. Navathe, “Fundamentals of
Database Systems”, Third Edition, Pearson Education, Delhi, 2002.
2. Abraham Silberschatz, Henry F. Korth and S. Sudarshan, “Database
System Concepts”, Fourth Edition, McGraw-Hill, 2002.
3. C.J. Date, “An Introduction to Database Systems”, Seventh
Edition, Pearson Education, Delhi, 2002.
UNIT 3
RELATIONAL MODEL
Learning Objectives
To learn relational model concepts
To learn about relational algebra
To study SQL, queries, views and constraints
To learn about relational calculus
To have an idea of commercial RDBMSs
To learn to design databases with functional dependencies
To learn about various normal forms and database tuning
_____________________________________________________________
3.1 INTRODUCTION
3.1.1 Data Models
o A data model is a collection of concepts for describing data.
o A schema is a description of a particular collection of data, using a given
data model.
o The relational model of data is the most widely used model today.
o Main concept: relation, basically a table with rows and columns.
o Every relation has a schema, which describes the columns, or fields
(that is, the data’s structure).
3.1.2 Relational Database: Definitions
o Relational database: a set of relations
o Relation: made up of 2 parts:
o Instance : a table, with rows and columns.
o Schema : specifies name of relation, plus name and type of each
column.
E.G. Students(sid: string, name: string, login: string, age:
integer, gpa: real).
o We can think of a relation as a set of rows or tuples that share the same structure.
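The Students schema in the example above can be sketched as a SQL table via Python's sqlite3 (types are mapped to SQLite equivalents; the sample row is a made-up example).

```python
import sqlite3

# The Students(sid, name, login, age, gpa) schema from the text,
# created as a table; one illustrative tuple is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Students (
                    sid TEXT, name TEXT, login TEXT,
                    age INTEGER, gpa REAL)""")
conn.execute(
    "INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

cols = [d[0] for d in conn.execute("SELECT * FROM Students").description]
print(cols)  # ['sid', 'name', 'login', 'age', 'gpa']
```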
Bottom-up
o Consider relationships between attributes.
o Build up relations.
o It is also called design by synthesis.
3.1.6.1 Informal Measures for Design
Semantics of the attributes.
o Design a relation schema so that it is easy to explain its meaning.
o A relation schema should correspond to one semantic object (entity
or relationship).
o Example: The first schema is good due to clear meaning.
Faculty (name, number, office)
Department (dcode, name, phone)
or
Faculty_works (number, name, Salary, rank, phone, email)
Reduce redundant data
o Design has a significant impact on storage requirements.
o The second schema needs more storage due to redundancy.
Faculty and Department
or
Faculty_works
Avoid update anomalies
Relation schemes can suffer from update anomalies
Insertion anomaly
1) Insert new faculty into faculty_works
o We must keep the values for the department consistent
between tuples
2) Insert a new department with no faculty members into faculty_works
o We would have to insert nulls for the faculty info.
o We would have to delete this entry later.
Deletion anomaly
o Delete the last faculty member for a department from the faculty_works
relation.
o If we delete the last faculty member for a department from the database,
all the department information disappears as well.
o This is like deleting the department from the database.
Example relation: Beers(name, manf)
Relation as table                 Set theoretic
Rows = tuples                     Domain — a set of values, like a data type
Columns = components              Cartesian product D1 × D2 × ... × Dn: the set of
Names of columns = attributes       all n-tuples (v1, v2, ..., vn) such that
Set of attribute names = schema     v1 ∈ D1, v2 ∈ D2, ..., vn ∈ Dn
REL (A1, A2, ..., An)             Relation — subset of the Cartesian product of one
                                    or more domains; FINITE only; empty set allowed
                                  Tuples = members of a relation instance
                                  Arity = number of domains
                                  Components = values in a tuple
                                  Domains — correspond with attributes
                                  Cardinality = number of tuples
Fig 3.2 Relation as Table
The customer relation instance below has the attributes (columns)
customer-name, customer-street and customer-city; each row is a tuple.
customer-name   customer-street   customer-city
Jones           Main              Harrison
Smith           North             Rye
Curry           North             Rye
Lindsay         Park              Pittsfield
Fig 3.3 Relation instances
Name Address Telephone
Bob 123 Main St 555-1234
Bob 128 Main St 555-1235
Pat 123 Main St 555-1235
Harry 456 Main St 555-2221
Sally 456 Main St 555-2221
Relation r:   A    B    C
              α   10    1
              α   20    1
              β   30    1
              β   40    2

Π A,C (r):    A    C          A    C
              α    1          α    1
              α    1    =     β    1
              β    1          β    2
              β    2
Relations r, s:   r:  A  B       s:  A  B
                      α  1           α  2
                      α  2           β  3
                      β  1

r ∪ s:   A  B
         α  1
         α  2
         β  1
         β  3
For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (e.g., 2nd column
of r deals with the same type of values as does the 2nd
column of s)
E.g. to find all customers with either an account or a loan
Π customer-name (depositor) ∪ Π customer-name (borrower)

r − s:   A  B
         α  1
         β  1
Relations r, s:   r:  A  B       s:  C   D  E
                      α  1           α  10  a
                      β  2           β  10  a
                                     β  20  b
                                     γ  10  b
r × s:   A  B  C   D  E
         α  1  α  10  a
         α  1  β  10  a
         α  1  β  20  b
         α  1  γ  10  b
         β  2  α  10  a
         β  2  β  10  a
         β  2  β  20  b
         β  2  γ  10  b
r × s = {t q | t ∈ r and q ∈ s}
Assume that attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅).
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
3.2.6 Composition of Operations
Applying a selection to the product r × s above:
σ A=C (r × s):   A  B  C   D  E
                 α  1  α  10  a
                 β  2  β  10  a
                 β  2  β  20  b
Find the names of all customers who have a loan at the Perryridge
branch.
Query 1
Π customer-name (σ branch-name = “Perryridge” (
σ borrower.loan-number = loan.loan-number (borrower × loan)))
Query 2
Π customer-name (σ loan.loan-number = borrower.loan-number (
(σ branch-name = “Perryridge” (loan)) × borrower))
Find the largest account balance
Rename account relation as d
The query is:
Π balance (account) − Π account.balance (
σ account.balance < d.balance (account × ρ d (account)))
o E1 − E2
o E1 × E2
o σ P (E1), P is a predicate on attributes in E1
o Π S (E1), S is a list consisting of some of the attributes in E1
o ρ x (E1), x is the new name for the result of E1
3.2.9.1 Additional Operations
Additional operations do not add any power to the
relational algebra, but simplify common queries.
Set intersection
Natural join
Division
Assignment
Set-Intersection Operation
Notation: r ∩ s
Defined as:
r ∩ s = { t | t ∈ r and t ∈ s }
Assume:
o r, s have the same arity
o Attributes of r and s are compatible.
Note: r ∩ s = r − (r − s)
Example :
Relations r, s:   r:  A  B       s:  A  B
                      α  1           α  2
                      α  2           β  3
                      β  1

r ∩ s:   A  B
         α  2
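The identity above can be checked directly by treating relations as Python sets of tuples; a minimal sketch (the sample data is invented for illustration):

```python
# Relations as sets of tuples; verifying r ∩ s = r - (r - s).
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

intersect_direct = r & s            # built-in intersection
intersect_via_difference = r - (r - s)

print(intersect_direct)             # {('a', 2)}
assert intersect_direct == intersect_via_difference
```

This is why set intersection adds no expressive power to the algebra: it can always be rewritten using difference alone.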
Example :
Relations r, s:
r:  A  B  C  D       s:  B  D  E
    α  1  α  a           1  a  α
    β  2  γ  a           3  a  β
    γ  4  β  b           1  a  γ
    α  1  γ  a           2  b  δ
    δ  2  β  b           3  b  ε

r ⋈ s:   A  B  C  D  E
         α  1  α  a  α
         α  1  α  a  γ
         α  1  γ  a  α
         α  1  γ  a  γ
         δ  2  β  b  δ
Fig 3.12 Natural Join Operation
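As a sketch of the natural-join semantics (combine tuples wherever all shared attributes agree, then merge them), the following illustrative Python treats a relation as a non-empty list of dicts; the attribute names and data are invented:

```python
# A minimal natural-join sketch over relations represented as lists of
# dicts (attribute name -> value). Assumes both relations are non-empty.
def natural_join(r, s):
    common = set(r[0]) & set(s[0])       # shared attribute names
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})   # merge tuples that agree
    return out

r = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
s = [{"B": "x", "C": 10}, {"B": "z", "C": 20}]
print(natural_join(r, s))  # [{'A': 1, 'B': 'x', 'C': 10}]
```

When the two schemas share no attributes, `common` is empty and every pair matches, so the natural join degenerates to the Cartesian product, as the theory predicts.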
r ÷ s = { t | t ∈ Π R−S (r) ∧ ∀ u ∈ s ( tu ∈ r ) }
Example 1 :
Relations r, s:
r:  A  B       s:  B
    α  1           1
    α  2           2
    α  3
    β  1
    γ  1
    δ  1
    δ  3
    δ  4
    ε  6
    ε  1
    β  2

r ÷ s:   A
         α
         β
Example 2 :
Relations r, s:
r:  A  B  C  D  E       s:  D  E
    α  a  α  a  1           a  1
    α  a  γ  a  1           b  1
    α  a  γ  b  1
    β  a  γ  a  1
    β  a  γ  b  3
    γ  a  γ  a  1
    γ  a  γ  b  1
    γ  a  β  b  1

r ÷ s:   A  B  C
         α  a  γ
         γ  a  γ
Property
o Let q = r ÷ s
o Then q is the largest relation satisfying q × s ⊆ r
Definition in terms of the basic algebra operations
Let r(R) and s(S) be relations, and let S ⊆ R
r ÷ s = Π R−S (r) − Π R−S ( (Π R−S (r) × s) − Π R−S,S (r) )
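Division can be sketched in Python for the common case of r over attributes (A, B) divided by s over (B): keep each A value that is paired in r with every B value of s. The data below is invented for illustration:

```python
# A sketch of relational division r ÷ s, with r a set of (A, B) pairs
# and s a set of B values.
def divide(r, s):
    a_values = {a for a, _ in r}
    return {a for a in a_values if all((a, b) in r for b in s)}

r = {("alpha", 1), ("alpha", 2), ("beta", 1), ("gamma", 2)}
s = {1, 2}
print(divide(r, s))  # {'alpha'}
```

Only "alpha" survives, since it is the only A value paired with both 1 and 2 — the "for all" quantification that makes division useful for queries such as "suppliers who supply every part".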
Assignment Operation
The assignment operation (←) provides a convenient way to express complex
queries.
o Write query as a sequential program consisting of
a series of assignments
followed by an expression whose value is displayed as a
result of the query.
o Assignment must always be made to a temporary relation variable.
Example: the customers with accounts at both the Downtown and Uptown branches:
temp1 ← Π CN (σ BN = “Downtown” (depositor ⋈ account))
temp2 ← Π CN (σ BN = “Uptown” (depositor ⋈ account))
result = temp1 ∩ temp2
Relation r:   A  B  C
              α  α  7
              α  β  7
              β  β  3
              β  β  10

g sum(C) (r):   sum-C
                27

Grouping by branch, branch-name g sum(balance) (account):
branch-name   balance
Perryridge    1300
Brighton      1500
Redwood       700
Fig 3.16 Aggregate Operation - group by
Relation loan (loan-number, branch-name, amount)
Relation borrower:
customer-name   loan-number
Jones           L-170
Smith           L-230
Hayes           L-155
Inner Join
loan ⋈ borrower
It is possible for tuples to have a null value, denoted by null, for some of their
attributes.
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null.
Aggregate functions simply ignore null values.
o It is an arbitrary decision. It could have returned null as result
instead.
o We follow the semantics of SQL in its handling of null values.
For duplicate elimination and grouping, null is treated like any other value,
and two nulls are assumed to be the same.
o Alternative: assume each null is different from each other
o Both are arbitrary decisions, so we simply follow SQL.
Comparisons with null values return the special truth value unknown.
o If false was used instead of unknown, then not (A < 5)
would not be equivalent to A >= 5
Three-valued logic using the truth value unknown:
o OR: (unknown or true) = true,
(unknown or false) = unknown
(unknown or unknown) = unknown
o AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
o NOT: (not unknown) = unknown
Insertion
To insert data into a relation, we either:
o specify a tuple to be inserted
o write a query whose result is a set of tuples to be inserted
In relational algebra, an insertion is expressed by:
r ← r ∪ E
where r is a relation and E is a relational algebra expression.
The insertion of a single tuple is expressed by letting E be a constant relation
containing one tuple.
Examples
Insert information in the database specifying that Smith has
$1200 in account A-973 at the Perryridge branch.
account ← account ∪ {(“Perryridge”, A-973, 1200)}
depositor ← depositor ∪ {(“Smith”, A-973)}
Provide as a gift for all loan customers in the Perryridge
branch, a $200 savings account. Let the loan number serve
as the account number for the new savings account.
r1 ← (σ branch-name = “Perryridge” (borrower ⋈ loan))
account ← account ∪ Π branch-name, loan-number, 200 (r1)
depositor ← depositor ∪ Π customer-name, loan-number (r1)
Updating
Examples
SQL can be:
o Used stand-alone within a DBMS command
o Embedded in triggers and stored procedures
o Used in scripting or programming languages
Example:
CREATE TABLE PROJECT (ProjectID Integer Primary Key, Name
Char(25) Unique Not Null, Department VarChar (100) Null, MaxHours
Numeric(6,1) Default 100);
3.3.5 Constraints
Constraints can be defined within the CREATE TABLE statement, or they can be
added to the table after it is created using the ALTER table statement.
Five types of constraints:
o PRIMARY KEY — may not have null values.
o UNIQUE — may have null values.
o NULL/NOT NULL
o FOREIGN KEY
o CHECK
Example:
CREATE TABLE PROJECT (ProjectID Integer Primary Key, Name
Char(25) Unique Not Null, Department VarChar (100) Null, MaxHours
Numeric(6,1) Default 100);
Example
ALTER TABLE ASSIGNMENT ADD CONSTRAINT EmployeeFK FOREIGN
KEY (EmployeeNum) REFERENCES EMPLOYEE
(EmployeeNumber) ON UPDATE CASCADE ON DELETE NO ACTION;
The DROP TABLE statement removes tables and their data from the database.
A table cannot be dropped if it contains foreign key values needed by other tables.
o Use ALTER TABLE DROP CONSTRAINT to remove integrity constraints in
the other table first.
Example:
o DROP TABLE CUSTOMER;
o ALTER TABLE ASSIGNMENT DROP CONSTRAINT ProjectFK;
Require quotes around values for Char and VarChar columns, but no quotes for
Integer and Numeric columns.
AND may be used for compound conditions.
IN and NOT IN test whether a value matches any value in a set, or no value in the set, respectively.
Wildcards _ and % can be used with LIKE to specify a single or multiple unknown
characters, respectively.
IS NULL can be used to test for null values
Example: SELECT Statement
SELECT Name, Department, MaxHours FROM PROJECT
WHERE Name = 'XYX';
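The predicates above can be tried out against SQLite from Python; a runnable sketch in which the PROJECT rows are invented for illustration:

```python
import sqlite3

# In-memory database with the PROJECT table from the earlier example.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE PROJECT (
    ProjectID  INTEGER PRIMARY KEY,
    Name       VARCHAR(25) NOT NULL,
    Department VARCHAR(100) NULL,
    MaxHours   NUMERIC(6,1) DEFAULT 100)""")
con.executemany("INSERT INTO PROJECT VALUES (?, ?, ?, ?)",
                [(100, "Q3 Tax Prep", "Accounting", 100),
                 (200, "Q4 Tax Prep", "Accounting", 120),
                 (300, "Recruiting", None, 75)])

# _ matches a single unknown character, % matches any run of characters.
like_rows = con.execute(
    "SELECT Name FROM PROJECT WHERE Name LIKE 'Q_ Tax%'").fetchall()
print(like_rows)   # [('Q3 Tax Prep',), ('Q4 Tax Prep',)]

# IS NULL tests for missing values (= NULL would never match).
null_rows = con.execute(
    "SELECT Name FROM PROJECT WHERE Department IS NULL").fetchall()
print(null_rows)   # [('Recruiting',)]
```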
3.3.8.3 Sorting the Results
3.3.9 INSERT INTO Statement
The order of the column names must match the order of the values.
Values for all NOT NULL columns must be provided.
No value needs to be provided for a surrogate primary key.
It is possible to use a select statement to provide the values for bulk inserts
from a second table.
Examples:
– INSERT INTO PROJECT VALUES (1600, ‘Q4 Tax Prep’, ‘Accounting’, 100);
– INSERT INTO PROJECT (Name, ProjectID) VALUES (‘Q1+ Tax Prep’, 1700);
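The bulk-insert-from-SELECT idea mentioned above can be demonstrated with SQLite; both tables and their rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE OLD_PROJECT (ProjectID INTEGER, Name TEXT)")
con.execute("CREATE TABLE PROJECT (ProjectID INTEGER, Name TEXT)")
con.executemany("INSERT INTO OLD_PROJECT VALUES (?, ?)",
                [(1600, "Q4 Tax Prep"), (1700, "Q1 Tax Prep")])

# The SELECT supplies one row per matching tuple in the source table,
# so every OLD_PROJECT row is copied in a single statement.
con.execute("INSERT INTO PROJECT SELECT ProjectID, Name FROM OLD_PROJECT")
count = con.execute("SELECT COUNT(*) FROM PROJECT").fetchall()
print(count)  # [(2,)]
```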
months_between(date1, date2)
1. select empno, ename, months_between (sysdate, hiredate)/12 from emp;
2. select empno, ename, round((months_between(sysdate, hiredate)/12),
0) from emp;
3. select empno, ename, trunc((months_between(sysdate, hiredate)/12),
0) from emp;
add_months(date, months)
Trunc(num_col,size) NOTES
1. select trunc(123.26516, 3) from dual;
sqrt(num_column)
1. select sqrt(100) from dual;
3.5 COMPLEX QUERIES USING GROUP FUNCTIONS
3.5.1 Group Functions
There are five built-in functions for SELECT statement:
1. COUNT counts the number of rows in the result.
2. SUM totals the values in a numeric column.
3. AVG calculates an average value.
4. MAX retrieves a maximum value.
5. MIN retrieves a minimum value.
Result is a single number (relation with a single row and a single column).
Column names cannot be mixed with built-in functions.
Built-in functions cannot be used in WHERE clauses.
Example: Built-in Functions
1. Select count (distinct department) from project;
2. Select min (maxhours), max (maxhours), sum (maxhours) from project;
3. Select Avg(sal) from emp;
3.5.2 Built-in Functions and Grouping
GROUP BY allows a column and a built-in function to be used together.
GROUP BY sorts the table by the named column and applies the built-in
function to groups of rows having the same value of the named column.
WHERE condition must be applied before GROUP BY phrase.
Example
1. Select department, count (*) from employee where employee_number
< 600 group by department having count (*) > 1
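The example above can be run against SQLite as well; the employee rows below are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (employee_number INTEGER, department TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(100, "Accounting"), (200, "Accounting"),
                 (300, "Marketing"), (700, "Accounting")])

# WHERE filters rows first (drops 700), GROUP BY forms groups per
# department, and HAVING then filters the groups themselves.
rows = con.execute("""
    SELECT department, COUNT(*)
    FROM employee
    WHERE employee_number < 600
    GROUP BY department
    HAVING COUNT(*) > 1""").fetchall()
print(rows)  # [('Accounting', 2)]
```

Marketing is eliminated by HAVING (only one qualifying row), illustrating the WHERE-before-GROUP BY-before-HAVING evaluation order stated above.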
3.6 VIEWS
3.6.1 Base relation
A named relation whose tuples are physically stored in the database is called
as Base Relation or Base Table.
3.6.2 Definition
It is the tailored presentation of the data contained in one or more tables.
A view is a virtual relation that does not actually exist in the database but is
produced upon request by a particular user, at the time of request.
A view is defined using the create view statement which has the form
create view v as <query expression>
where <query expression> is any legal relational algebra query expression.
The view name is represented by v.
Once a view is defined, the view name can be used to refer to the virtual
relation that the view generates.
View definition is not the same as creating a new relation by evaluating the
query expression.
Rather, a view definition causes the saving of an expression; the expression is
substituted into queries using the view.
3.6.3 Usage
Provide a powerful and flexible security mechanism by hiding parts of the
database from certain users.
The user is not aware of the existence of any attributes or tuples that are missing
from the view.
Permit users to access data in a way that is customized to their needs, so that
different users can see the same data in different ways, at the same time.
Simplify (for the user) complex operations on the base relations.
3.6.4 Advantages
The following are the advantages of views over normal tables.
1. They provide additional level of table security by restricting access to the data
2. They hide data complexity.
3. They simplify commands by allowing the user to select information from mul-
tiple tables.
4. They isolate applications from changes in data definition.
5. They provide data in different perspectives.
3.6.5 Creation of a view
The following SQL Command is used to create a View in Oracle.
SQL> CREATE [OR REPLACE] VIEW <VIEW-NAME> AS < ANY
VALID SELECT STATEMENT>;
Example:
SQL> CREATE VIEW V1 AS SELECT EMPNO, ENAME, JOB FROM EMP;
An optional WHERE (condition) clause may be added to the SELECT to restrict
the rows visible through the view.
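A runnable sketch of view creation using SQLite from Python (the EMP rows are invented, and SQLite omits the SQL> prompt and Oracle-specific options):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, JOB TEXT, SAL REAL)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?)",
                [(7369, "SMITH", "CLERK", 800),
                 (7499, "ALLEN", "SALESMAN", 1600)])

# The view stores only the defining expression; queries against V1
# are answered by substituting it over EMP at query time.
con.execute("CREATE VIEW V1 AS SELECT EMPNO, ENAME, JOB FROM EMP")
rows = con.execute("SELECT ENAME FROM V1 WHERE JOB = 'CLERK'").fetchall()
print(rows)  # [('SMITH',)]
```

Note that the SAL column is invisible through V1 — the security use of views described in section 3.6.3.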
Π customer-name (σ branch-name = “Perryridge” (all-customer))
3.7.4.1 Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a
tuple t1 in r1 such that t1[K] = t2[α]. That is,
t2[α] ∈ Π K (r1)
3.7.4.2 Delete. If a tuple t1 is deleted from r1, the system must compute the set of
tuples in r2 that reference t1:
σ α = t1[K] (r2)
If this set is not empty, the delete command is rejected as an error, or the tuples
that reference t1 must themselves be deleted (cascading deletions are possible).
• Set operators are binary and will only work on two relations or sets of data.
• Can only be used on union compatible sets
R (A1, A2, …, AN) and S(B1, B2, …, BN) are union compatible if:
degree (R) = degree (S) = N
domain (Ai) = domain (Bi) for all i
Sailors × Reserves:
sid sname rating age sid bid day
22 Dustin 7 45.0 22 101 10/10/96
22 Dustin 7 45.0 58 103 11/12/96
31 Lubber 8 55.5 22 101 10/10/96
31 Lubber 8 55.5 58 103 11/12/96
58 Rusty 10 35.0 22 101 10/10/96
58 Rusty 10 35.0 58 103 11/12/96
Example:
Faculty (fnum, name, office, salary, rank)
• σ salary > 27000 (Faculty)
• σ rank = associate (Faculty)
• σ fnum = 34567 (Faculty)
• σ(salary>26000) and (rank=associate) (Faculty)
• σ(salary<=26000) and (rank !=associate) (Faculty)
• σmax(salary) (Faculty)
Example:
Faculty (fnum, name, office, salary, rank)
• Π name, office (Faculty)
• Π fnum, salary (Faculty)
Example:
Faculty (fnum, name, office, salary, rank)
Associates ← σ rank = associate (Faculty)
Result ← Π name (Associates)
Example Queries
Find the names of all customers having a loan, an account, or both at the bank:
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
∨ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Find the names of all customers who have a loan and an account
at the bank:
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Find the names of all customers who have an account at all
branches located in Brooklyn:
{⟨c⟩ | ∃ s, n (⟨c, s, n⟩ ∈ customer) ∧
∀ x, y, z (⟨x, y, z⟩ ∈ branch ∧ y = “Brooklyn” ⇒
∃ a, b (⟨a, x, b⟩ ∈ account ∧ ⟨c, a⟩ ∈ depositor))}
Find the names of all customers having a loan at the Perryridge branch:
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan (u[branch-name] = “Perryridge”
∧ u[loan-number] = s[loan-number]))}
Find the names of all customers who have a loan at the
Perryridge branch, but no account at any branch of the bank:
Find the names of all customers who have an account at all branches located in
Brooklyn:
{t | ∃ c ∈ customer (t[customer-name] = c[customer-name]) ∧
∀ s ∈ branch (s[branch-city] = “Brooklyn” ⇒
∃ u ∈ account (s[branch-name] = u[branch-name]
∧ ∃ d ∈ depositor (t[customer-name] = d[customer-name]
∧ d[account-number] = u[account-number])))}
Find the names of all customers who have a loan from the
Perryridge branch and the loan amount:
{⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ∃ b (⟨l, b, a⟩ ∈ loan
∧ b = “Perryridge”))}
or {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ⟨l, “Perryridge”, a⟩ ∈ loan)}
Find the names of all customers having a loan, an account, or both at the
Perryridge branch:
{⟨c⟩ | ∃ l (⟨c, l⟩ ∈ borrower
∧ ∃ b, a (⟨l, b, a⟩ ∈ loan ∧ b = “Perryridge”))
∨ ∃ a (⟨c, a⟩ ∈ depositor
∧ ∃ b, n (⟨a, b, n⟩ ∈ account ∧ b = “Perryridge”))}
Find the names of all customers who have an account at all
branches located in Brooklyn:
{⟨c⟩ | ∃ s, n (⟨c, s, n⟩ ∈ customer) ∧
∀ x, y, z (⟨x, y, z⟩ ∈ branch ∧ y = “Brooklyn” ⇒
∃ a, b (⟨a, x, b⟩ ∈ account ∧ ⟨c, a⟩ ∈ depositor))}
Safety of Expressions
{⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn)}
is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P)
(that is, the values appear either in P or in a tuple of a relation mentioned in P).
2. For every “there exists” subformula of the form ∃ x (P1(x)), the subformula
is true if and only if there is a value of x in dom(P1) such that P1(x) is true.
3. For every “for all” subformula of the form ∀ x (P1(x)), the subformula is
true if and only if P1(x) is true for all values x from dom(P1).
3.11 RELATIONAL DATABASE DESIGN
A functional dependency is a particular relationship between two attributes:
for every valid instance, the value of one attribute uniquely determines the
value of the other attribute. For a relation R, attribute B is functionally
dependent on attribute A, written A → B.
3.11.1 Functional Dependencies Definition
A  B
1  4
1  5
3  7
o On this instance, A → B does NOT hold, but B → A does hold.
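Whether a dependency holds on a given instance can be checked mechanically: no two tuples may agree on the left-hand side yet differ on the right-hand side. A minimal sketch using the instance above:

```python
# Test whether the functional dependency lhs -> rhs holds on an instance
# represented as a list of dicts (attribute name -> value).
def fd_holds(rows, lhs, rhs):
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False                 # same lhs, different rhs
    return True

rows = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(rows, ["A"], ["B"]))  # False (A=1 maps to both 4 and 5)
print(fd_holds(rows, ["B"], ["A"]))  # True
```

Note that an instance can only refute a dependency; holding on one instance does not prove the dependency holds for the schema in general.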
o We use normal form tests to determine the level of normalization for
the scheme.
3.13.2 Pitfalls in Relational Database Design
Relational database design requires that we find a “good” collection of
relation schemas. A bad design may lead to
o Repetition of Information.
o Inability to represent certain information.
Design Goals:
o Avoid redundant data.
o Ensure that relationships among attributes are represented.
o Facilitate the checking of updates for violation of database integrity
constraints.
Example
o Consider the relation schema:
Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
3.13.3 Redundancy:
Data for branch-name, branch-city, assets are repeated for each loan
that a branch makes.
o Wastes space
o Complicates updating, introducing possibility of inconsistency of
assets value
o Null values
o Cannot store information about a branch if no loans exist
o Can use null values, but they are difficult to handle.
r:  A  B
    α  1
    α  2
    β  1

Π A (r):  A        Π B (r):  B
          α                  1
          β                  2

Π A (r) × Π B (r):  A  B
                    α  1
                    α  2
                    β  1
                    β  2
The result contains the spurious tuple (β, 2), so this decomposition of r is lossy.
1NF
2NF
3NF
Boyce-Codd NF
4NF
5NF
Atomicity is actually a property of how the elements of the domain are used.
o E.g. Strings would normally be considered indivisible
o Suppose that students are given roll numbers which are strings of the
form CS0012 or EE1127
o If the first two characters are extracted to find the department, the
domain of roll numbers is not atomic.
o Doing so is a bad idea: leads to encoding of information in
application program rather than in the database.
R is in 3NF
JK → L    (JK is a superkey)
L → K     (K is contained in a candidate key)
BCNF decomposition has (JL) and (LK)
Testing for JK → L requires a join
o There is some redundancy in this schema
o Equivalent to example in book:
Banker-schema = (branch-name, customer-name, banker-name)
banker-name → branch-name
branch-name customer-name → banker-name
J     L    K
j1    l1   k1
j2    l1   k1
j3    l1   k1
null  l2   k2
BUT:
Space overhead: for storing the materialized view
Time overhead: Need to keep materialized view up to date when
relations are updated
Database system may not support key declarations on
materialized views.
3.18.2 Multivalued Dependencies
There are database schemas in BCNF that do not seem to be sufficiently
normalized.
Consider a database
o classes(course, teacher, book)
such that (c, t, b) ∈ classes means that t is qualified to teach c, and b
is a required textbook for c
The database is supposed to list for each course the set of teachers any one
of which can be the course’s instructor, and the set of books, all of which are
required for the course (no matter who teaches it).
course teacher
database Avi
database Hank
database Sudarshan
operating systems Avi
operating systems Jim
teaches
course book
database DB Concepts
database Ullman
operating systems OS Concepts
operating systems Shaw
text
Fig 3.23 Decomposed Tables
We shall see that these two relations are in Fourth Normal Form (4NF).
In our example:
course →→ teacher
course →→ book
The above formal definition is supposed to formalize the notion that given a
particular value of Y (course) it has an associated set of values of Z (teacher)
and a set of values of W (book), and these two sets are in some sense
independent of each other.
Note:
o If Y → Z then Y →→ Z
o Indeed we have (in above notation) Z1 = Z2
The claim follows.
We concern ourselves only with relations that satisfy a given set of functional
and multivalued dependencies.
If a relation r fails to satisfy a given multivalued dependency, we can construct
a relation r′ that does satisfy the multivalued dependency by adding tuples to r.
Theory of MVDs
From the definition of multivalued dependency, we can derive the following rule:
o If α → β, then α →→ β
That is, every functional dependency is also a multivalued dependency.
The closure D+ of D is the set of all functional and multivalued dependencies
logically implied by D.
o We can compute D+ from D, using the formal definitions of functional depen-
dencies and multivalued dependencies.
o We can manage with such reasoning for very simple multivalued dependen-
cies, which seem to be most common in practice.
o For complex dependencies, it is better to reason about sets of dependencies
using a system of inference rules .
3.18.5 Fourth Normal Form
A relation schema R is in 4NF with respect to a set D of functional and
multivalued dependencies if for all multivalued dependencies in D+ of the form
α →→ β, where α ⊆ R and β ⊆ R, at least one of the following hold:
o α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
o α is a superkey for schema R
If a relation is in 4NF it is in BCNF.
3.18.6 Restriction of Multivalued Dependencies
The restriction of D to Ri is the set Di consisting of
o All functional dependencies in D+ that include only attributes of Ri
o All multivalued dependencies of the form
α →→ (β ∩ Ri)
where α ⊆ Ri and α →→ β is in D+
3.18.7 4NF Decomposition Algorithm
result := {R}; done := false; compute D+;
(Di denotes the restriction of D+ to Ri)
while (not done)
  if (some Ri in result is not in 4NF w.r.t. Di) then begin
    let α →→ β be a nontrivial MVD on Ri with α → Ri not in Di and α ∩ β = ∅;
    result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
  end else done := true;
Fig 3.28 The loan Relation Fig 3.29 The branch Relation
Short Answers :
1. Define Data model.
2. Define Schema.
3. Define Relational model.
4. Write a short note on the following.
a. Two design approaches
b. Informal Measures for Design
c. Modifications of the Database
d. BASIC QUERIES USING SINGLE ROW FUNCTIONS
e. COMPLEX QUERIES USING GROUP FUNCTIONS
f. INTEGRITY CONSTRAINTS
g. Functional Dependencies
h. Multivalued Dependencies
Answer in Detail :
1. Explain Basic Structure of Relational Model.
2. Explain Relational Algebra Operations.
3. Explain Formal Definition of Relational Algebra.
4. Explain Aggregate Functions and Operations.
5. Explain STRUCTURED QUERY LANGUAGE.
6. Explain VIEWS & View Operations.
7. Explain RELATIONAL ALGEBRA AND CALCULUS Operations.
8. Explain Tuple Relational Calculus.
9. Explain Domain Relational Calculus.
10. Explain Relational Database Design.
11. Explain Normalization – Types of Normal forms.
Summary :
Relation is a tabular representation of data.
Simple and intuitive, currently the most widely used.
Integrity constraints can be specified by the DBA, based on application
semantics. DBMS checks for violations.
Two important ICs: primary and foreign keys
In addition, we always have domain constraints.
UNIT 4
4.1 INTRODUCTION
4.1.1 Basic Principles of Query Execution
o Many DB operations require reading tuples, comparing a tuple with
previous tuples, or comparing tuples with tuples in another table.
o Techniques generally used for implementing operations:
Iteration: for/while loop comparing with all tuples on disk
Index: if comparison of attribute that’s indexed, look up matches in
index & return those
Sort/merge: iteration against presorted data (interesting orders)
Hash: build hash table of the tuple list, probe the hash table
Must be able to support larger-than-memory data
n Evaluation
The query-execution engine takes a
query-evaluation plan, executes that plan, and returns the answers to the query.
n Optimizer
The selection of optimal execution plan
4.1.2.1 Optimization
n A relational algebra expression may have many equivalent expressions, and an
annotated expression specifying a detailed execution strategy is called an
evaluation plan.
E.g., can use an index on balance to find accounts with balance < 2500,
or can perform complete relation scan and discard accounts with balance
≥ 2500
n Query Optimization: Amongst all equivalent evaluation plans choose the one
with the lowest estimated cost.
n Algorithm A1 (linear search). Scan each file block and test all records to see
whether they satisfy the selection condition.
è For σ A≥v (r) use index to find first index entry ≥ v and scan
index sequentially from there, to find pointers to records.
è For σ A≤v (r) just scan leaf pages of index finding pointers to
records, till first entry > v
Hash joins usually done with static number of hash buckets
Generally have fairly long chains at each bucket
What happens when memory is too small?
Hash Join
Read entire inner relation into hash table (join attributes as key)
For each tuple from outer, look up in hash table & join
Very efficient, very good for databases
Not fully pipelined
Supports equijoins only
Delay-sensitive
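The build/probe steps above can be sketched in a few lines of Python; the Sailors/Reserves rows below are invented, simplified from the schema used later in this unit:

```python
from collections import defaultdict

# A minimal in-memory hash join: build a hash table on the inner
# relation keyed by the join attribute, then probe with the outer.
def hash_join(inner, outer, key):
    table = defaultdict(list)
    for t in inner:                     # build phase
        table[t[key]].append(t)
    result = []
    for u in outer:                     # probe phase
        for t in table.get(u[key], []):
            result.append({**t, **u})   # supports equijoins only
    return result

sailors = [{"sid": 22, "sname": "Dustin"}, {"sid": 58, "sname": "Rusty"}]
reserves = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}]
print(hash_join(sailors, reserves, "sid"))
```

The equality lookup in the probe phase is exactly why hash join handles equijoins only, as noted above: a hash table cannot answer range predicates.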
Other Operators
Duplicate removal very similar to grouping
All attributes must match
No aggregate
Union, difference, intersection:
o Read table R, build hash/search tree
o Read table S, add/discard tuples as required
Cost: b(R) + b(S)
4.2 OVERVIEW OF COST ESTIMATION & QUERY OPTIMIZATION
A query plan: algebraic tree of operators, with choice of algorithm for each op
Two main issues in optimization:
For a given query, which possible plans are considered?
o Algorithm to search plan space for cheapest (estimated) plan
How is the cost of a plan estimated?
o Ideally: Want to find best plan
o Practically: Avoid worst plans!
4.2.1 The System-R Optimizer:
Establishing the Basic Model
Most widely used model; works well for < 10 joins
Cost estimation: Approximate art at best
o Statistics, maintained in system catalogs, used to estimate cost of
operations and result sizes
o Considers combination of CPU and I/O costs
Plan Space: Too large, must be pruned
o Only the space of left-deep plans is considered.
o Left-deep plans allow output of each operator to be pipelined into the
next operator without storing it in a temporary relation.
o Cartesian products are avoided
Schema for Examples
o Reserves:
Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
o Sailors:
Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
4.2.2 Query Blocks: Units of Optimization
An SQL query is parsed into a collection of query blocks, and these
are optimized one block at a time.
Nested blocks are usually treated as calls to a subroutine, made
once per outer tuple.
Relational Algebra Equivalences
Allow us to choose different join orders and to ‘push’ selections and
projections ahead of joins.
4.2.2.1 Selections:
More Equivalences
A projection commutes with a selection that only uses attributes retained by
the projection.
Selection between attributes of the two arguments of a cross-product
converts cross-product to a join.
A selection on ONLY attributes of R commutes with R ⋈ S:
σ (R ⋈ S) ≡ σ (R) ⋈ S
If a projection follows a join R ⋈ S, we can “push” it by retaining only
attributes of R (and S) that are needed for the join or are kept by the
projection
4.2.2.2 Enumeration of Alternative Plans
There are two main cases:
o Single-relation plans
o Multiple-relation plans
For queries over a single relation, queries consist of a combination of selects,
projects, and aggregate ops:
o Each available access path (file scan / index) is considered, and the one
with the least estimated cost is chosen.
The different operations are essentially carried out together (e.g., if an index is
used for a selection, projection is done for each retrieved tuple, and the resulting
tuples are pipelined into the aggregate computation).
4.2.3 Cost Estimation
For each plan considered, must estimate cost:
Must estimate cost of each operation in plan tree.
o Depends on input cardinalities.
Must also estimate size of result for each operation in tree!
o Use information about the input relations.
For selections and joins, assume independence of predicates.
Assume can calculate a “reduction factor” (RF) for each selection predicate.
4.2.3.1 Cost Estimates for Single-Relation Plans
Index I on primary key matches selection:
o Cost is Height(I)+1 for a B+ tree, about 1.2 for hash index.
Clustered index I matching one or more selects:
o (NPages(I)+NPages(R)) * product of RF’s of matching selects.
Non-clustered index I matching one or more selects:
o (NPages(I)+NTuples(R)) * product of RF’s of matching selects.
Sequential scan of file:
o NPages(R).
Example
Given an index I on rating: (RF = 1/NKeys(I))
o (1/NKeys(I)) * NTuples(R) = (1/10) * 40000 tuples retrieved
Clustered index: (1/NKeys(I)) * (NPages(I)+NPages(R)) = (1/10) *
(50+500) pages are retrieved
Unclustered index: (1/NKeys(I)) * (NPages(I)+NTuples(R)) = (1/10) *
(50+40000) pages are retrieved
Given an index on sid:
o Would have to retrieve all tuples/pages. With a clustered index, the
cost is 50+500, with unclustered index, 50+40000
A simple sequential scan:
o We retrieve all file pages (500)
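The arithmetic above can be reproduced directly from the text's statistics and formulas:

```python
# Statistics from the example: index on rating with NKeys = 10 and
# 50 index pages; Sailors has 500 pages and 40000 tuples.
NKEYS, NPAGES_I = 10, 50
NPAGES_R, NTUPLES_R = 500, 40000

tuples_retrieved = NTUPLES_R // NKEYS              # RF = 1/NKeys(I)
clustered_pages = (NPAGES_I + NPAGES_R) // NKEYS
unclustered_pages = (NPAGES_I + NTUPLES_R) // NKEYS

print(tuples_retrieved)     # 4000
print(clustered_pages)      # 55
print(unclustered_pages)    # 4005
```

The gap between 55 and 4005 pages is why clustering matters: an unclustered index here is far worse than the 500-page sequential scan.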
4.3 SEMANTIC QUERY OPTIMIZATION
Remember that a relational query does not specify how the system should
compute the result; the optimizer may choose any strategy, but it must still
compute the correct result. Consider the following schema:
supplier ( S# integer not null, Sname char (30), Saddress char(60), ..........)
cardinality = 100
supplies-parts ( S# integer, P# integer)
cardinality = 10,000
part ( P# integer not null, Pname char(20) - not needed )
Select Distinct S.Sname
from S, SP
where S.S# = SP.S#
and SP.P# = ‘p2’.
50 parts are for p2 therefore the output will be only 50 tuples
READ tuple from disk to memory 50 microseconds
RESTRICT (select function per tuple) 20 microseconds
JOIN (per tuple) 30 microseconds
PROJECT (per tuple ) 10 microseconds
Answer:
Unoptimised Optimised
read S & SP read SP only
join S to SP restrict for ‘p2’
restrict for ‘p2’ read S only
project S.Sname join S with SP’
project S.Sname
Total time
SQL Optimization
o Query optimization, access path selection, compilation
Access path selection
o based on statistics accumulated
o cost of an access path is the weighted sum of calculated IO + CPU
cost
IO cost : function of the estimated number of pages read for a query
CPU cost: function of the estimated number of calls to the DB2 component
is retrieved and compared with each tuple in the indexed relation (which uses
the indexes to access the tuples directly + is efficient if index fits into
memory)
Database Performance
In DB2 the aim is to provide maximum concurrency with minimal
o number of instructions executed by CPU
o number of I/O operations
o time to perform I/O operation
o failure due to resource overcommitment
To achieve this, four aspects are considered:
o SQL Optimization
o optimized internal design of DB2
o facility to the user for application design
o tuning facilities
DBMS Tuning
With the help of database tuning, performance of a system is determined by
a number of factors such as usability, portability and timing.
Tuning is measurement, by direct observation or by programmed processes,
of activity at various points in a computer system to find out where
bottlenecks and delays are taking place.
DB2 generates statistics which can be used for tuning.
The user then uses SQL to repartition base tables, create new indexes etc.
Run time tuning : e.g. buffer space allocation is also possible
Focuses on:
o Utilisation of resources
o Responsiveness to enquires
o general productivity
Interest:
o Computer Management : summary reports
o Database Manager/Administrator : timing reports -> capacity planning
o Research / System Designers: very detailed timing reports
[Figure: the tuning feedback loop. The system produces results; the results are
interpreted; interpretation drives system tuning, environmental changes and user
changes.]
[Figure: an IBM monitoring kit. Database, buffer, file and log-tape activity
(including response times) feeds an extraction file; a database tool and SAS
process it into daily performance reports and monthly capacity-planning
performance reports.]
Metrics
o Responsiveness:
o time (user input -> user output)
o response time, I/O time
o influences: I/O calls, queue length, buffer, paging rates, locality of access
o Utilisation
o ratio of the time a particular system component is in use to the length of
the interval
o device time / total process time
o influences: program size, paging rules, overflow techniques, unbalanced
scheduling
o Productivity
o information processed over time
o influences: time-slice quantum, switching time, scheduling and multi-
processing, contention
o Cost
o hardware prices, development time, depreciation over time
o only used in benchmarking
4.3.6 System Components
Basic components which affect performance:
CPU
o In processor bound applications, the throughput of a CPU is very
important, especially so in batch processing.
o Multi-processor systems try to distribute workload over many
processors, but there still may be contention for other resources.
Disk Subsystem
o A low-performance disk subsystem can waste CPU, i.e. leave it under-utilized.
o VLDBs (very large databases) demand large and fast disk systems.
o Methods of organizing disk allocation at the physical level have as great an
impact on performance as disk organization at the logical level.
Terminal Subsystems
The influence of workstations and users tends to be overlooked in performance
studies.
It is especially important if processing utilizes workstations.
An example tool is TPNS (Teleprocessing Network Simulator) from IBM.
Operating Systems
o The OS can provide some services to the DBMS, e.g. locking, which affect
performance.
o The OS can actually impair performance; some DBMSs choose to ignore
some sub-components of the OS.
DBMS Application Software
o Most relational systems comply with a minimal SQL standard, but the
performance of each system differs greatly because of different
application types.
o Remember!
o Measurement and analysis of overall system performance is difficult
because of the many interoperating components.
o So standard benchmarks were developed to test most of these
components in a particular type of environment.
4.4 TRANSACTION PROCESSING & PROPERTIES
4.4.1 Transaction Concept
A transaction is a unit of program execution that accesses and possibly
updates various data items.
A transaction must see a consistent database.
During transaction execution the database may be inconsistent.
When the transaction is committed, the database must be consistent.
Two main issues to deal with:
o Failures of various kinds such as hardware failures and system
crashes
o Concurrent execution of multiple transactions
4.4.2 ACID Properties
To preserve the integrity of data, the database system must ensure:
Atomicity. Either all operations of the transaction are properly reflected in the
database or none are.
Consistency. Execution of a transaction in isolation preserves the consistency of
the database.
Isolation. Although multiple transactions may execute concurrently, each
transaction must be unaware of other concurrently executing transactions;
intermediate transaction results must be hidden from them.
Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
4.4.2.1 Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement — the sum of A and B is unchanged by the execution of the
transaction.
Atomicity requirement — if the transaction fails after step 3 and before step 6, the
system should ensure that its updates are not reflected in the database, or else an
inconsistency will result.
Durability requirement — once the user has been notified that the transaction has
completed (i.e. the transfer of the $50 has taken place), the updates to the database by
the transaction must persist despite failures.
Isolation requirement — if between steps 3 and 6, another transaction is allowed to
access the partially updated database, it will see an inconsistent database (the sum A+
B will be less than it should be).
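The six steps and the atomicity requirement can be sketched in a few lines of Python (a toy model, not a real DBMS): a snapshot taken before the first write serves as the undo information that makes the transfer all-or-nothing.

```python
def transfer(accounts, src, dst, amount):
    """All-or-nothing transfer of `amount` from src to dst."""
    snapshot = dict(accounts)      # undo information for atomicity
    try:
        a = accounts[src]          # 1. read(A)
        a = a - amount             # 2. A := A - 50
        if a < 0:
            raise ValueError("insufficient funds")
        accounts[src] = a          # 3. write(A)
        b = accounts[dst]          # 4. read(B)
        b = b + amount             # 5. B := B + 50
        accounts[dst] = b          # 6. write(B)
    except Exception:
        accounts.clear()           # failure between steps 3 and 6:
        accounts.update(snapshot)  # roll back so no partial update is visible
        raise

accounts = {"A": 100, "B": 100}
transfer(accounts, "A", "B", 50)
print(accounts)                    # → {'A': 50, 'B': 150}; the sum A + B is unchanged
```

If the function fails after step 3, the rollback restores the snapshot, so no observer ever sees a state where A + B is less than it should be.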
[Figure: two timelines contrasting logging protocols. In each, a first transaction
executes begin, A = A + 1, commit, producing WAL(A) and LOG(A) records in
memory, while a second transaction performs Read(A) and Write(A) against disk;
the timelines differ in when the log record is forced relative to the commit.]
NOTES All updates are made on a shadow copy of the database, and db pointer is
made to point to the updated shadow copy only after the transaction reaches
partial commit and all updated pages have been flushed to disk.
In case transaction fails, old consistent copy pointed to by db pointer can be
used, and the shadow copy can be deleted.
4.4.5.4. Schedules
Schedules are sequences that indicate the chronological order in which instructions
of concurrent transactions are executed.
A schedule for a set of transactions must consist of all instructions of those
transactions.
It must preserve the order in which the instructions appear in each individual
transaction.
Example Schedules
Let T1 transfer $50 from A to B, and
T2 transfer 10% of the balance from A to B.
The following is a serial schedule in which
T1 is followed by T2.
Let T1 and T2 be the transactions defined
previously. Schedule 2 is not a serial
schedule, but it is equivalent to Schedule 1.
Figure 4.8 Schedule 1
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.
Example of a schedule that is not conflict serializable:
We are unable to swap instructions in the above
schedule to obtain either the serial schedule < T3, T4 >,
or the serial schedule < T4, T3 >.
Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2
follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule
3 is conflict serializable.
4.5.3. View Serializability
Let S and S′ be two schedules with the same set of transactions. S and S′ are
view equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule
S, then transaction Ti must, in schedule S′, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and
that value was produced by transaction Tj (if any), then transaction Ti must in
schedule S′ also read the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S′.
As can be seen, view equivalence is also based purely on reads and writes
alone.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
The following schedule is view-serializable but not conflict serializable.
Every view serializable schedule, which is not conflict serializable, has blind writes.
4.5.4. Precedence Graph
A precedence graph is a directed graph where the vertices are the transactions
(names). We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed
the data item on which the conflict arose earlier. We may label the arc by the item that
was accessed.
Example :
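Since the example figure is not reproduced here, a sketch of the test it illustrates: build the precedence graph from a schedule and check it for a cycle; the schedule is conflict serializable iff the graph is acyclic. (The schedules below are small illustrations, not the ones in the figure.)

```python
def precedence_graph(schedule):
    """schedule: chronological list of (txn, op, item), op in {'r', 'w'}."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # two operations conflict if they are from different transactions,
            # touch the same item, and at least one of them is a write
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}
    def dfs(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY or (colour[v] == WHITE and dfs(v)):
                return True          # back edge found: cycle
        colour[u] = BLACK
        return False
    return any(colour[n] == WHITE and dfs(n) for n in graph)

serial = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A")]
bad = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q")]
print(has_cycle(precedence_graph(serial)))  # → False: conflict serializable
print(has_cycle(precedence_graph(bad)))     # → True: not conflict serializable
```

In `bad`, T3 must precede T4 (its read conflicts with T4's later write) and T4 must precede T3 (its write conflicts with T3's later write), so the graph has a cycle and no equivalent serial order exists.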
Summary :
Many DB operations require reading tuples, comparing a tuple with previous
tuples, or comparing tuples with tuples in another table.
The basic steps in query processing are parsing, translation, optimization and
evaluation.
Basic query operators are one-pass operators and multi-pass operators.
A query plan is an algebraic tree of operators, with a choice of algorithm for each
operator.
Database tuning improves the performance of a system, which is determined by a
number of factors such as usability, portability and timing.
The ACID properties are Atomicity, Consistency, Isolation and Durability.
Questions :
1. Explain the properties of transaction.
2. Define ACID.
3. Differentiate One pass and Two pass Operators.
4. Define Database Tuning.
5. Explain Transaction failures.
6. Explain serializability, query processing, optimization and recoverability.
References :
UNIT 5
CONCURRENCY, RECOVERY AND
SECURITY
Learning Objectives
To learn various locking techniques
To learn about timestamp ordering
To study various validation techniques
To learn about granularity of data items
To study recovery concepts
To learn about shadow paging and log-based recovery
To learn about data security issues such as access control and statistical
database security
A lock is a mechanism to control concurrent access to a data item. Data items can be
locked in two modes:
1. Exclusive (X) mode. Data item can be both read as well as written. X-lock
is requested using lock-X instruction.
2. Shared (S) mode. Data item can only be read. S-lock is requested using
lock-S instruction.
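The compatibility between the two modes can be written down as a small function: lock-S is compatible with lock-S, while lock-X is compatible with nothing. (A sketch; a real lock manager would also queue waiting requests rather than just answer yes or no.)

```python
# Compatibility matrix for the two lock modes
COMPAT = {("S", "S"): True, ("S", "X"): False,
          ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    """A requested lock can be granted only if it is compatible
    with every lock mode currently held on the item."""
    return all(COMPAT[(requested, h)] for h in held_modes)

print(can_grant("S", ["S", "S"]))  # → True: any number of readers may share
print(can_grant("X", ["S"]))       # → False: a writer excludes readers
print(can_grant("X", []))          # → True: nobody holds the item
```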
Given a transaction Ti that does not follow two-phase locking, we can find a
transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not
conflict serializable.
5.1.6.2. Lock Conversions
Two-phase locking with lock conversions:
First Phase:
_ can acquire a lock-S on item
_ can acquire a lock-X on item
_ can convert a lock-S to a lock-X (upgrade)
Second Phase:
_ can release a lock-S
_ can release a lock-X
_ can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability, but it still relies on the programmer to insert the
various locking instructions.
5.1.6.3. Automatic Acquisition of Locks
A transaction Ti issues the standard read/write instruction, without explicit locking calls.
The operation read( D) is processed as:
if Ti has a lock on D
then
read( D)
else
begin
if necessary wait until no other
transaction has a lock-X on D
grant Ti a lock-S on D;
read( D)
end;
write(D) is processed as:
if Ti has a lock-X on D
then
write(D)
else
begin
if necessary wait until no other
transaction has any lock on D;
if Ti has a lock-S on D
then
upgrade the lock on D to lock-X
else
grant Ti a lock-X on D;
write(D)
end;
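The automatic-acquisition pseudocode can be mimicked with a toy lock table (illustrative only: instead of making a transaction wait, an incompatible request here simply raises an error).

```python
class LockTable:
    def __init__(self):
        self.locks = {}                      # item -> {txn: mode}

    def read(self, txn, item):
        held = self.locks.setdefault(item, {})
        if txn not in held:                  # no lock yet: acquire lock-S
            if any(m == "X" for t, m in held.items() if t != txn):
                raise RuntimeError(f"{txn} must wait for lock-X on {item}")
            held[txn] = "S"
        return f"read({item})"

    def write(self, txn, item):
        held = self.locks.setdefault(item, {})
        if held.get(txn) != "X":             # need lock-X (grant or upgrade)
            if any(t != txn for t in held):
                raise RuntimeError(f"{txn} must wait for locks on {item}")
            held[txn] = "X"
        return f"write({item})"

lt = LockTable()
lt.read("T1", "D")     # T1 acquires lock-S on D
lt.read("T2", "D")     # shared locks coexist on the same item
# lt.write("T1", "D") would raise: T2 still holds lock-S on D
```

A transaction that already holds a lock-S and is alone on the item gets its lock upgraded to lock-X, exactly as in the pseudocode.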
5.1.6.6 Locks
[Figure: timeline showing exclusive and shared locks of varying granularity held
on pages pA and pB.]
• Deadlock happens in a DBMS due to blockage of resources by a few processes.
[Figure: process 1 reads R then requests S; process 2 reads S then requests R;
each waits on the resource held by the other.]
• The DBMS maintains 'wait-for' graphs showing which processes are waiting
and which resources are held.
• Furthermore:
– most recent data
– the number, and how far back they reach, depends on disk space
– a number of rollback segments for uncommitted transactions
– a log of committed transactions
• Failures can leave database in a corrupted or inconsistent state -weakening
Consistency and Durability features of transaction processing.
• Oracle levels of failure:
– statement: causes the relevant transaction to be rolled back; the database
returns to its previous state, releasing resources
• Log archiving provides the power to do on-line backups without shutting down
the system, and full automatic recovery. It is obviously a necessary option for
operationally critical database applications.
Use this information during recovery to find blocks that may be inconsistent,
and only compare copies of these.
Used in hardware RAID systems
If either copy of an inconsistent block is detected to have an error (bad
checksum), overwrite it by the other copy. If both have no error, but are
different, overwrite the second block by the first block.
5.4.3 Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through the
following two operations:
o input(B) transfers the physical block B to main memory.
o output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.
[Figure: data item x has a copy x2 on disk and a copy x1 in main memory; item y
has a copy y1 in memory; the in-memory copies live in the private work areas of
transactions T1 and T2.]
Transaction failure:
o transaction abort, forced by a local transaction error
Disk failure:
o the disk copy of the DB is lost
Any transactions which have committed before a checkpoint will not have to
have their WRITE operations redone in the event of a crash, since we can be
guaranteed that all updates were done during the checkpoint operation.
Consists of:
Suspend all executing transactions
Force-write all modified buffers to disk
Write a [checkpoint] record to log
Force the log to disk
Resume executing transactions
5.6.2. Transaction Rollback
If data items have been changed by a failed transaction, the old values must be
restored.
If T is rolled back, and any other transaction has read a value modified by T, then
that transaction must also be rolled back: a cascading effect.
5.6.3. Two Main Techniques
Deferred Update
No physical updates to db until after a transaction commits.
During the commit, log records are made then changes are made permanent on
disk.
What if a Transaction fails?
o No UNDO required
o REDO may be necessary if the changes have not yet been made permanent
before the failure.
Immediate Update
Physical updates to db may happen before a transaction commits.
All changes are written to the permanent log (on disk) before changes are made
to the DB.
What if a Transaction fails?
o After changes are made but before commit – need to UNDO the changes
o REDO may be necessary if the changes have not yet been made permanent
before the failure.
o Transaction is allowed to commit before all its changes are written to the
database
o (Note that the log files would be complete at the commit point)
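The UNDO/REDO decisions above can be sketched over a toy log (illustrative; a real log also records checkpoints, LSNs and page identifiers). Old values in the log drive UNDO of uncommitted transactions; new values drive REDO of committed ones:

```python
def recover(log, db):
    """log: ordered records ('start', T), ('write', T, item, old, new),
    or ('commit', T). db: on-disk state at the moment of the crash."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO: roll back writes of uncommitted transactions, newest first
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, t, item, old, new = rec
            db[item] = old
    # REDO: reapply writes of committed transactions, oldest first
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, t, item, old, new = rec
            db[item] = new
    return db

log = [("start", "T1"), ("write", "T1", "A", 100, 50), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "B", 100, 150)]  # T2 never committed
print(recover(log, {"A": 50, "B": 150}))  # → {'A': 50, 'B': 100}
```

Under pure deferred update the UNDO pass would be unnecessary, because no uncommitted write ever reaches the database.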
Whenever any page is about to be written for the first time:
o A copy of this page is made onto an unused page.
o The current page table is then made to point to the copy
o The update is performed on the copy
To commit a transaction :
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
o keep a pointer to the shadow page table at a fixed (known) location
on disk.
o to make the current page table the new shadow page table, simply
update the pointer to point to current page table on disk
Once pointer to shadow page table has been written, transaction is commit-
ted.
No recovery is needed after a crash — new transactions can start right
away, using the shadow page table.
Pages not pointed to from current/shadow page table should be freed
(garbage collected).
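The copy-on-write and pointer-swap steps can be sketched as follows (a toy in-memory model; a real implementation flushes modified pages and the page table to disk before updating the pointer):

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.pages = dict(pages)              # page id -> contents ("disk")
        self.shadow = {p: p for p in pages}   # shadow page table
        self.current = dict(self.shadow)      # current page table
        self.next_free = max(pages) + 1

    def write(self, page, value):
        # copy-on-write: the first update of a page goes to an unused page
        if self.current[page] == self.shadow[page]:
            copy = self.next_free
            self.next_free += 1
            self.pages[copy] = self.pages[self.current[page]]
            self.current[page] = copy
        self.pages[self.current[page]] = value

    def commit(self):
        # committing is just swinging the pointer to the current table
        self.shadow = dict(self.current)

    def abort(self):
        # the old consistent copy is still intact; discard the current table
        self.current = dict(self.shadow)

    def read(self, page):
        return self.pages[self.shadow[page]]  # committed view

db = ShadowPagedDB({0: "a", 1: "b"})
db.write(0, "A")
db.abort()          # db.read(0) is still "a"
db.write(0, "A")
db.commit()         # db.read(0) is now "A"
```

Abort costs nothing because the shadow table still points at the old pages; commit installs the current table as the new shadow table in one step.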
Advantages of shadow-paging over log-based schemes
o no overhead of writing log records
o recovery is trivial
Disadvantages :
o Copying the entire page table is very expensive
n Can be reduced by using a page table structured like a B+-tree
n No need to copy entire tree, only need to copy paths in the tree
that lead to updated leaf nodes
o Commit overhead is high even with above extension
n Need to flush every updated page, and page table
o Data gets fragmented (related pages get separated on disk)
o After every transaction completion, the database pages containing old
versions of modified data need to be garbage collected
o Hard to extend algorithm to allow transactions to run concurrently
Easier to extend log based schemes
5.7.2 Recovery With Concurrent Transactions
We modify the log-based recovery schemes to allow multiple transactions to
execute concurrently.
All transactions share a single disk buffer and a single log.
We assume concurrency control using strict two-phase locking;
i.e. the updates of uncommitted transactions should not be visible
to other transactions.
(Otherwise, how could we perform undo if T1 updates A, then T2 updates A
and commits, and finally T1 has to abort?)
Log records of different transactions may be interspersed in the log.
n The checkpointing technique and actions taken on recovery
have to be changed
since several transactions may be active when a checkpoint is
performed.
n Checkpoints are performed as before, except that the checkpoint
log record is now of the form
<checkpoint L>
where L is the list of transactions active at the time of the
checkpoint
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume that log records are written directly to stable storage (that is, they
are not buffered)
Two approaches using logs
o Deferred database modification
o Immediate database modification
5.8.3 Checkpoints
[Figure: transactions T1–T4 on a timeline, with a checkpoint at time Tc and a
system failure at time Tf. T1 completes before the checkpoint; T2 and T3 commit
between the checkpoint and the failure; T4 is still active at the failure.]
Fig 5.14 Check point
T1 can be ignored (its updates were already output to disk due to the checkpoint).
T2 and T3 are redone.
T4 is undone.
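The rule illustrated by Fig 5.14 can be sketched as a classifier (a toy model; an end time of None means the transaction was still active at the failure):

```python
def recovery_actions(txns, checkpoint, failure):
    """txns: {name: (start_time, end_time or None if active at failure)}.
    Returns the recovery action required for each transaction."""
    actions = {}
    for name, (start, end) in txns.items():
        if end is not None and end <= checkpoint:
            actions[name] = "ignore"   # updates already on disk at checkpoint
        elif end is not None and end <= failure:
            actions[name] = "redo"     # committed after the checkpoint
        else:
            actions[name] = "undo"     # still active at the failure
    return actions

txns = {"T1": (1, 4), "T2": (3, 7), "T3": (6, 9), "T4": (8, None)}
print(recovery_actions(txns, checkpoint=5, failure=10))
# → {'T1': 'ignore', 'T2': 'redo', 'T3': 'redo', 'T4': 'undo'}
```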
• database administrators
[Figure: security controls around a DBMS: computer-based and non-computer-based
controls applied to data, hardware, the DBMS, the operating system,
communication links and users.]
• Passwords
– low storage overhead
– with many passwords, users forget them and write them down!
– user time is high: typing in many passwords
– held in a file and encrypted
• Lists
– initial password entry to the system
– the user name is checked against a control list
– the access control list itself has very limited access (superuser only)
– with many users, applications and data, the list can become large
• READ, WRITE, MODIFY access controls
• Restrictions at many levels of granularity
– Database Level: ‘add a new DB’
– Record Level: ‘delete a record’
– Data Level: ‘delete an attribute’
• Remember there are overheads with security mechanisms
• views
– subschema
– dynamic result of one or more relational operations operating on
base relations to produce another relation
– a virtual relation: it doesn’t exist but is produced at run time
• back-up
– periodic copy of database and log file (programs) onto offline
storage
– stored in secure location
• Journaling
– keeping log file of all changes made to database to enable recovery
in the event of failure
• Check pointing
– synchronization point where all buffers in the DBMS are force-
written to secondary storage
• integrity (see later)
• encryption
– data encoding by a special algorithm that renders data unreadable
without the decryption key
• File systems identify certain access privileges on files, E.g., read, write,
execute.
• In partial analogy, SQL identifies six access privileges on relations, of which
the most important are:
1. SELECT = the right to query the relation.
2. INSERT = the right to insert tuples into the relation – may refer to one
attribute, in which case the privilege is to specify only one column of the
inserted tuple.
3. DELETE = the right to delete tuples from the relation.
4. UPDATE = the right to update tuples of the relation – may refer to one
attribute.
a) If you have been given a privilege by several different people, then all
of them have to revoke it in order for you to lose the privilege.
b) Revocation is transitive.
Example :
If A granted P to B, who then granted P to C, and then A revokes P from B, it is as if
B also revoked P from C.
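The two rules can be modelled as a grant graph: a user keeps privilege P while some path of surviving grants connects them back to the owner. (A sketch; SQL's actual semantics also involve GRANT OPTION and the CASCADE/RESTRICT choice.)

```python
def holders(owner, grants):
    """grants: set of (grantor, grantee) edges for one privilege P.
    A user holds P iff reachable from the owner via surviving grants."""
    held, frontier = {owner}, [owner]
    while frontier:
        g = frontier.pop()
        for grantor, grantee in grants:
            if grantor == g and grantee not in held:
                held.add(grantee)
                frontier.append(grantee)
    return held

grants = {("A", "B"), ("B", "C")}
assert holders("A", grants) == {"A", "B", "C"}
grants.discard(("A", "B"))             # A revokes P from B ...
assert holders("A", grants) == {"A"}   # ... and transitively from C
```

C loses the privilege even though nobody explicitly revoked it from C: the path A -> B -> C was broken, which is exactly the transitivity in the example.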
Summary :
A lock is a mechanism to control concurrent access to a data item.
The two-phase locking protocol ensures conflict-serializable schedules, and so
avoids conflicts between the various schedules.
Failure classifications are transaction failure, system crash and disk failure.
The DBMS recovery subsystem deals with active transactions, committed
transactions since the last checkpoint, and aborted transactions since the last
checkpoint.
Recovery-and-atomicity techniques are log-based recovery and shadow paging.
Log-record buffering keeps log records in main memory instead of writing them
directly to stable storage.
The database maintains an in-memory buffer of data blocks.
The deferred database modification scheme records all modifications in the
log, but defers all the writes until after partial commit.
The immediate database modification scheme allows database updates of an
uncommitted transaction to be made as the writes are issued.
Recovery is streamlined by periodically performing checkpointing.
Integrity constraints include implicit constraints, relational constraints, domain
constraints and referential constraints.
Questions :
1. Define Lock.
2. Define Two phase locking protocol.
3. Define Check point.
4. Explain System failures.
5. Explain Integrity Constraints.
6. Explain Database Modifications.
7. Explain Various Recovery procedures.
References :