1. INTRODUCTION
The first DBMSs appeared during the 1960s, at a time in human history when
projects of momentous scale were being contemplated, planned, and engineered.
Never before had such large datasets been assembled in this new technology.
Problems on the floor were identified, and solutions were researched and
developed, often in real time.
The DBMS became necessary because the data was far more volatile than had
been planned for earlier, and because the costs of data storage media were still a
major limiting factor. Data grew as a collection, and it also needed to be managed
at a detailed, transaction-by-transaction level. By the 1980s, all the major vendors
of hardware systems large enough to support the evolving computerized record
keeping needs of larger organizations bundled some form of DBMS with their
system solution.
The first DBMS species were thus very much vendor-specific. IBM, as usual, led
the field, but there were a growing number of competitors and clones whose
database solutions offered varying entry points onto the bandwagon of
computerized record keeping systems.
1.1.1. Database
1.1.2. DBMS
A. Casual end users: occasionally access the database, but need different
information each time.
B. Parametric end users: make up a sizable portion of database end users.
Their main job function involves constantly querying and updating the
database, using standard types of queries and updates called canned
transactions that have been carefully programmed and tested. Examples
are bank tellers checking account balances and posting withdrawals and
deposits.
C. Sophisticated end users: include engineers, scientists, and business
analysts, who thoroughly familiarize themselves with the facilities of the
DBMS in order to implement applications that meet their complex
requirements.
2. DATA MODELS
A data model is a set of concepts that can be used to describe the structure of the
database.
The actual data in the database may change frequently: changes occur every
time we add a new student or enter a new grade for a student. The data in the
database at a particular moment in time is called the database state,
instance, or snapshot.
The mapping between two levels is specified by one of two languages. In some
DBMSs a VDL (view definition language) is used to specify the users' views and
their mapping to the conceptual schema, but in most DBMSs the DDL is used to
define both schemas. A DML (data manipulation language) then provides the
basic operations on the data:
• Retrieval
• Insertion
• Deletion
• Modification
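These four operations can be exercised end to end with Python's built-in sqlite3 module. The student table and its columns below are illustrative assumptions, not taken from the text.

```python
import sqlite3

# Illustrative schema; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

# Insertion
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ann", "A"), (2, "Bob", "B")])

# Retrieval
a_students = conn.execute("SELECT name FROM student WHERE grade = 'A'").fetchall()

# Modification
conn.execute("UPDATE student SET grade = 'A' WHERE sid = 2")

# Deletion
conn.execute("DELETE FROM student WHERE sid = 1")

remaining = conn.execute("SELECT sid, name, grade FROM student").fetchall()
```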
Menu-based interfaces: these interfaces present the user with a list of options,
called menus, which lead the user through the formulation of a request.
Graphical user interface: a GUI displays a schema to the user in diagrammatic
form. The user can then specify a query by manipulating the diagram. Most GUIs
use a pointing device such as a mouse to pick certain parts of the displayed
schema.
Natural language interface: a natural language interface refers to the words in its
schema, as well as to a set of standard words, to interpret the request. If the
interpretation is successful, the interface generates a high-level query
corresponding to the natural language request and submits it to the DBMS for
processing.
Interfaces for parametric users: parametric users, such as bank tellers, often
have a small set of operations that they must perform repeatedly. Systems
analysts and programmers design and implement a special interface for
parametric users, defining function keys by which each command runs
automatically.
Interfaces for the DBA: the DBA staff uses these interfaces. The commands
provided are for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structure of a
database.
2.7.5. Pre-compiler
Extracts DML commands from an application program written in a host language.
The commands are then sent to the DML compiler for compilation into object code.
• Database application
• Application program
Entities: the basic object that the ER model represents is an entity. An entity
may be an object with a physical existence (a particular person, car, house, or
employee) or an object with a conceptual existence (a company, a job, or a
university course).
Simple or atomic attribute: Attributes that are not divisible are called simple or
atomic attribute.
Single-valued attributes: most attributes have a single value for a particular
entity; such attributes are called single-valued attributes. Ex: age is a
single-valued attribute of a person.
Multi-valued attributes: attributes which may have more than one value. Ex: the
color attribute of a car. A car with one color has a single value, whereas a
two-tone car has two values. Such attributes are called multi-valued attributes.
Stored and derived attributes: in some cases two attribute values are related,
e.g. the age and birth date of a person. The value of age can be determined from
the current date and the value of the person's birth date. The age attribute is
called a derived attribute, and the birth date is called a stored attribute.
Null values: in some cases a particular entity may not have an applicable value
for an attribute. Ex: apartment number.
Entity sets: the collection of all entities of a particular entity type in the database
at any point in time is called an entity set.
Key attributes: an entity type usually has an attribute whose values are
distinct for each individual entity in the collection. Such an attribute is called a
key attribute.
Value sets (domains of attributes): each simple attribute of an entity type is
associated with a value set (or domain of values), which specifies the set of
values that may be assigned to that attribute for each individual entity.
Ex: for an employee, Age may be specified to lie in the range 16 to 70.
Role names: each entity type that participates in a relationship type plays a
particular role in the relationship. The role name specifies the role that a
participating entity plays in each relationship instance and helps to explain
what the relationship means.
Recursive relationships: role names are not essential when all the participating
entity types are distinct, since each entity type name can be used as the role
name. In a recursive relationship, however, the same entity type participates
more than once, in different roles; for example, the employee and supervisor
entities are both members of the same EMPLOYEE entity type.
Weak entity types: entity types that do not have a key attribute are called weak
entity types. A weak entity type is sometimes called a child entity type.
Regular/strong entity types: entity types that have a key attribute are called
regular or strong entity types. The identifying entity type is also sometimes
called the parent entity type or dominant entity type.
2.8.5. Generalization
We can think of generalization as a reverse process of abstraction in which we
suppress the differences among several entity types, identify their common
features, and generalize them into a single superclass.
2.8.6. Aggregation
Aggregation is an abstraction concept for building composite objects from their
component objects. There are cases where this concept can be used and related
to the EER model:
• When we aggregate attribute values of an object to form the whole object.
• When we represent an aggregation relationship as an ordinary relationship.
• When we combine objects that are related by a particular relationship instance.
3. RELATIONAL MODEL
The relational model represents the database as a collection of relations.
Relation is thought of as a table of values, each row in the table represents a
collection of related data values.
In the relational model, each row in a table corresponds to an entity or
relationship. In relational model terminology, a row is called a tuple, a column
is called an attribute, and the table is called a relation.
The data type describing the type of values that can appear in each column is
called a domain.
Domain:
A domain D is a set of atomic values. Atomic means that each value in the
domain is indivisible.
Ex: USA_phone_number, the set of valid ten-digit phone numbers in the USA.
Relation schemas:
A relation schema R is denoted R(A1, A2, ..., An), where R is the relation
name and the Ai are its attributes, for i = 1, 2, ..., n.
Domain constraints: specify that the value of each attribute must be an atomic
value.
Key constraints: a relation is defined as a set of tuples, and all elements of a set
are distinct; hence all tuples in a relation must be distinct. No two tuples can have
the same combination of values for all their attributes.
Entity integrity constraint: no primary key value can be null, because the
primary key is used to identify the individual tuples in a relation.
Referential integrity constraint: specified between two relations and used to
maintain consistency among the tuples of the two relations. It is based on the
foreign key concept.
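A minimal sketch of referential integrity enforcement, using sqlite3 (where foreign key checks must be switched on explicitly); the dept and emp tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("CREATE TABLE emp (ssn INTEGER PRIMARY KEY, "
             "dno INTEGER REFERENCES dept(dno))")
conn.execute("INSERT INTO dept VALUES (1, 'Research')")
conn.execute("INSERT INTO emp VALUES (100, 1)")  # ok: department 1 exists

# A foreign key value with no matching dept tuple violates referential integrity.
try:
    conn.execute("INSERT INTO emp VALUES (101, 9)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
```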
Delete operation: used to delete a tuple from a relation. The deletion can violate
referential integrity if the tuple being deleted is referenced by foreign keys from
other tuples in the database. A condition on the attributes selects the tuple(s) to
delete.
Update operation: used to change the values of one or more attributes in a tuple
of a relation R.
3.4.1. Union
The result of this operation, denoted R U S, is a relation that includes all
tuples that are in R, in S, or in both R and S. Duplicate tuples are
eliminated.
R U S = S U R {commutative operation}
SELECT salesman_id, name
FROM sales_master
WHERE city = 'Mumbai'
UNION
SELECT client_id, name
FROM client_master
WHERE city = 'Mumbai';
3.4.2. Intersection
The result of this operation, denoted R ∩ S, is a relation that includes all tuples
that are in both R and S.
R ⋈(join condition) S
There are several categories of join operations:
1. Cartesian product (cross product, or cross join): the main difference
between the Cartesian product and the join is that in a join, only the
combinations of tuples that satisfy the join condition appear in the result.
2. Equi-join: a join in which the only comparison operator used is =. In the
result of an equi-join, each pair of join attributes has identical values, so
one attribute of each pair is superfluous. Removing these superfluous
attributes from the equi-join yields the natural join, R * S.
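The difference between these joins can be seen with two tiny hypothetical relations in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (a INTEGER, b INTEGER);
CREATE TABLE s (b INTEGER, c TEXT);
INSERT INTO r VALUES (1, 10), (2, 20);
INSERT INTO s VALUES (10, 'x'), (30, 'y');
""")

# Cartesian product: every tuple of r paired with every tuple of s.
cross = conn.execute("SELECT * FROM r CROSS JOIN s").fetchall()

# Equi-join: only pairs satisfying r.b = s.b survive; b appears twice.
equi = conn.execute("SELECT * FROM r JOIN s ON r.b = s.b").fetchall()

# Natural join: same matching, but the superfluous duplicate b is dropped.
natural = conn.execute("SELECT * FROM r NATURAL JOIN s").fetchall()
```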
3.4.7. COUNT
This function is used to count tuples and attributes.
3.4.8. Grouping
This is used to group the attribute of any relation.
Only tuples from R that have a matching tuple in S will appear in the result;
tuples without a match are eliminated, as are tuples whose join attributes are
null. A set of operations called outer joins can be used when we want to keep
all the tuples of R, all the tuples of S, or the tuples of both in the result,
whether or not they have matching tuples.
Outer union is used to take the union of tuples from two relations that are not
union compatible but are partially compatible, meaning that only some of their
attributes are union compatible. These shared attributes must include a key of
both relations.
Outer union:
Student (name, SSN, department, advisor)
Faculty (name, SSN, department, Rank)
Result (name, SSN, department, advisor, Rank)
All the tuples of both relations will appear in the result.
{t | cond(t)}
The result is the set of all tuples t that satisfy cond(t).
Ex: find all employees whose salary is greater than $50,000.
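The calculus expression reads naturally as a Python set comprehension over a small in-memory relation (the employee data below is made up):

```python
# A relation is a set of tuples; here each tuple is a (name, salary) pair.
employee = {("Ann", 60000), ("Bob", 45000), ("Eve", 52000)}

# { t | EMPLOYEE(t) AND t.salary > 50000 }
result = {t for t in employee if t[1] > 50000}
```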
Formula:
Formula is made up of predicate calculus atoms which can be one of the
followings.
1. An atom of the form R(ti), where R is a relation name and ti is a tuple
variable. This atom identifies the range of the tuple variable ti as the
relation whose name is R.
2. An atom of the form ti.A op tj.B, where op is a comparison operator in
the set {=, <, <=, >, >=, #}, ti and tj are tuple variables, A is an attribute
of the relation on which ti ranges, and B is an attribute of the relation on
which tj ranges.
3. An atom of the form ti.A op c or c op tj.B, where op is a comparison
operator, ti and tj are tuple variables, A and B are attributes of the
relations on which ti and tj range, and c is a constant value.
A formula is made up of one or more atoms connected by the logical operators
AND, OR, and NOT, and is defined as follows:
1. Every atom is a formula.
2. If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2),
NOT(F1), and NOT(F2).
First we need to define the concepts of free and bound tuple variables in
formulas.
Bound: a tuple variable t is bound if it is quantified, meaning that it appears in
an (∃t) or (∀t) clause.
Free: otherwise, it is free.
We can classify the tuple variables in a formula as free or bound according to
the following rules:
1. An occurrence of a tuple variable in a formula F that is an atom is free in
F.
2. An occurrence of a tuple variable t is free or bound in a formula made up
of logical connectives, (F1 AND F2), (F1 OR F2), NOT(F1), and NOT(F2),
depending on whether it is free or bound in F1 and F2; a tuple variable
may be free in one of F1, F2 and bound in the other.
3. All free occurrences of a tuple variable t in F are bound in a formula F'
of the form
F' = (∃t)(F) or
F' = (∀t)(F).
The tuple variable is bound to the quantifier specified in F'.
F1 = d.DNAME = 'Research'
F2 = (∃t)(d.DNAME = t.DNO)
F3 = (∀d)(d.MGRSSN = '12345677')
Tuple variable d is free in both F1 and F2, whereas it is bound to the universal
quantifier in F3.
t is bound to the existential quantifier in F2.
1. If F is a formula, then so are (∃t)(F) and (∀t)(F),
where t is a tuple variable.
4. DATABASE DESIGN
Conceptual database design gives us a set of relational schemas and integrity
constraints (ICs) that can be regarded as a good starting point for the final
database design. This initial design must be refined by taking the ICs into
account more fully than is possible with just the ER model constructs, and also by
considering performance criteria and typical workloads.
We concentrate on an important class of constraints called functional
dependencies. Other kind of ICs, for example, multi-valued dependencies and
join dependencies, also provide useful information. They can sometimes reveal
redundancies that cannot be detected using functional dependencies alone.
4. Spurious tuples:
Spurious tuples are tuples that represent wrong (invalid) information. Such
tuples are conventionally marked with asterisks (*).
Example:
Emp_loc (ename, plocation)
Emp_proj (ssn, pno, hours, pname, plocation)
A functional dependency X → Y means that the values of the Y component of a
tuple depend on, or are determined by, the values of the X component, and not
vice versa.
X is called the left-hand side of the FD.
Y is called the right-hand side of the FD.
X functionally determines Y in a relation R if and only if, whenever two
tuples of r(R) agree on their X values, they also agree on their Y values.
1. If X is a candidate key, then X → Y for any set of attributes Y, because
the key constraint implies that no two tuples will have the same value of X.
2. If X → Y holds in R, this says nothing about whether or not Y → X holds in R.
F = {ssn → {ename, bdate, address, dnumber},
dnumber → {dname, dmgrssn}}
Additional FDs that can be inferred from F:
ssn → {dname, dmgrssn},
ssn → ssn,
dnumber → dname
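Inferring FDs like these can be made mechanical with the standard attribute-closure algorithm; a compact sketch, where the FD set F mirrors the example above:

```python
def closure(attrs, fds):
    """Compute X+, the set of attributes functionally determined by attrs
    under the given FDs. Each FD is a (lhs, rhs) pair of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ covers the left-hand side, the right-hand side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [({"ssn"}, {"ename", "bdate", "address", "dnumber"}),
     ({"dnumber"}, {"dname", "dmgrssn"})]
```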
4.3. NORMALIZATION
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF
table which does not have multiple overlapping candidate keys is guaranteed to
be in BCNF. Depending on what its functional dependencies are, a 3NF table
with two or more overlapping candidate keys may or may not be in BCNF.
∪(i=1..m) Ri = R, i.e., R = R1 ∪ R2 ∪ ... ∪ Rm.
This is called the attribute preservation condition of a decomposition.
The word loss in lossless refers to loss of information, not loss of tuples. If a
decomposition does not have the lossless join property, we may get additional
spurious tuples when the pieces are joined back together.
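A lossy decomposition producing spurious tuples can be demonstrated in a few lines of Python (the employee/project data is invented):

```python
# Original relation: (ename, pno, plocation).
emp_proj = {("Smith", "P1", "Bellaire"), ("Wong", "P2", "Bellaire")}

# A lossy decomposition on the non-key attribute plocation.
emp_loc = {(e, loc) for (e, p, loc) in emp_proj}    # (ename, plocation)
proj_loc = {(p, loc) for (e, p, loc) in emp_proj}   # (pno, plocation)

# Natural join of the two projections on plocation.
rejoined = {(e, p, la) for (e, la) in emp_loc
            for (p, lb) in proj_loc if la == lb}

# Tuples in the join that were never in the original relation.
spurious = rejoined - emp_proj
```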
An employee may work on several projects and have several dependents, but
projects and dependents are independent of each other. To keep the relation
consistent, we must have a separate tuple to represent every combination of an
employee's dependent and an employee's project. This constraint is specified as
a multi-valued dependency.
Example:
Foreign key constraints cannot be specified as FDs or MVDs, because they relate
attributes across relations. They can be specified as inclusion dependencies,
which are also used to represent constraints between two relations.
An inclusion dependency
R.X < S.Y between two sets of attributes,
X of relation R
and Y of relation S,
states that the set of values appearing in R.X must be a subset of the set of
values appearing in S.Y.
5. TRANSACTION MANAGEMENT
A transaction is delimited by statements or system calls of the form begin
transaction and end transaction, and consists of all the operations executed
between the begin and the end. To ensure the integrity of the data, we require
that the database maintain the following properties.
1. read(X): transfers the data item X from the database to a local buffer
belonging to the transaction that executes the read operation.
2. write(X): transfers the data item X from the local buffer of the transaction
back to the database.
Example:
A transaction Ti transfers $50 from account A to account B.
Ti:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE (B)
The initial values of A and B are $1000 and $2000. Suppose a system failure
occurs after write(A) and before write(B). Then the account information reads
A = $950
B = $2000
and the $50 removed from A is lost.
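The atomicity requirement in this example can be reproduced with sqlite3: if the transfer fails between the two writes, the whole transaction is rolled back and neither write survives. The simulated-crash flag is purely an illustrative device.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

def transfer(amount, fail_midway=False):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                     (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash after write(A), before write(B)")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                     (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # undo the partial transfer: atomicity preserved

transfer(50, fail_midway=True)
after_crash = dict(conn.execute("SELECT name, balance FROM account"))

transfer(50)  # a successful run commits both writes
after_commit = dict(conn.execute("SELECT name, balance FROM account"))
```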
Active: the initial state, the transaction stays in this state while executing.
Partially committed: after the final statement has been executed.
Failed: after the discovery that normal executing can no longer proceed.
Aborted: after the transaction has been rolled back and the database has
been restored to its state prior to the start of the transaction.
Committed: after successful completion.
Example:
Hardware or logical errors may force a transaction to be rolled back; it then
enters the aborted state, at which point the system has two options (restart
the transaction or kill it).
Example:
Consider the set of transaction that access and updates the bank account.
Let T1 and T2 be two transactions.
T1:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)
T2:
READ(A)
temp := A * 0.1;
A := A - temp;
WRITE(A)
READ(B)
B := B + temp;
WRITE(B)
T2 transfers 10 percent of A's balance to B, which is consistent with the case
results below.
CASE1.
If T1 followed by T2
A=855$
B=2145$
CASE2
If T2 followed by T1
A=850$
B=2150$
5.5. Schedule
Execution sequences that represent the order in which transaction instructions
are executed are called schedules. A schedule in which the instructions
belonging to each single transaction appear together in the execution is called a
serial schedule: each serial schedule consists of a sequence of instructions from
the various transactions, executed one transaction at a time. If two transactions
run concurrently, the CPU switches between the two transactions, or is shared
among all the transactions.
If both instructions are write operations, their order does not directly affect
either Ti or Tj, but the value obtained by the next read(Q) instruction of S is
affected.
We say that Ii and Ij conflict if they are operations by different transactions on
the same data item, and at least one of these instructions is a write operation.
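This definition of conflict is easy to state directly in code; representing each operation as a (transaction, action, item) triple is a choice made for this sketch:

```python
def conflict(op1, op2):
    """Ii and Ij conflict iff they belong to different transactions,
    access the same data item, and at least one of them is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "write" in (a1, a2)
```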
A serial schedule is one in which all the instructions of each transaction execute
together.
View serializability is similar to conflict serializability and is based on only the
read and write operations of transactions. Consider two schedules S and S',
where the same set of transactions participates in both. The schedules S and S'
are said to be view equivalent if they satisfy the following three conditions:
5.7. Recoverability
So far we have discussed which schedules ensure the consistency of the
database and which do not, assuming that no transaction fails. We now address
the effect of transaction failures during concurrent execution.
If a transaction Ti fails, for whatever reason, we need to undo its effects to
ensure the atomicity property. In a system that allows concurrent execution, any
transaction Tj that is dependent on Ti (i.e., Tj has read a data item written by Ti)
must also be aborted.
That is why we need to place some restrictions on schedules.
Suppose T10 writes a value that is read by T11, and T10 fails. T10 must be rolled
back; since T11 is dependent on T10, T11 must be rolled back too, and likewise
T12, and so on: all the dependent transactions must be rolled back.
The phenomenon in which a single transaction failure leads to a series of
transaction rollbacks is called cascading rollback. It is desirable that cascading
rollbacks not occur in a schedule; such schedules are called cascadeless
schedules. In a cascadeless schedule, for every pair of transactions Ti and Tj
such that Tj reads a data item previously written by Ti, the commit of Ti appears
before the read by Tj. It is easy to see that every cascadeless schedule is also
recoverable.
In schedule S, Tj reads a value written by Ti (Ti → Tj). If schedule S is view
serializable, then in any schedule S' that is view equivalent to S, for every
transaction Tk that executes write(Q), either Tk → Ti or Tj → Tk must hold:
Tk cannot appear between Ti and Tj.
6. CONCURRENCY CONTROL
When several transactions execute concurrently in the database, the isolation
property may no longer be preserved. It is necessary for the system to control
the interaction among concurrent transactions; these controls are termed
concurrency-control schemes.
6.1.1. Locks
There are various modes in which a data item may be locked.
Shared mode: if a transaction Ti holds a shared-mode lock (denoted by S) on the
data item Q, then Ti can read Q but cannot write Q.
Exclusive mode: if Ti holds an exclusive-mode lock (denoted by X) on Q, then Ti
can both read and write Q.
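The compatibility of the two modes (S is compatible only with S; X with nothing) can be sketched as a tiny lock table. The class below is an illustration only, not a full lock manager: there is no request queuing and no deadlock handling.

```python
class LockTable:
    def __init__(self):
        self.locks = {}  # data item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        """Grant the lock and return True if compatible; return False if the
        requesting transaction would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":  # S is compatible only with S
            holders.add(txn)
            return True
        return False  # any combination involving X conflicts

    def unlock(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
granted_s = lt.request("T2", "Q", "S")   # shared lock granted
granted_x = lt.request("T1", "Q", "X")   # exclusive request must wait
```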
T2: LOCK-S(A)
READ(A)
UNLOCK(A)
LOCK-S(B)
READ(B)
UNLOCK(B)
DISPLAY(A+B);
Initial amount
A=100$
B=200$
Case1. T1 followed by T2
Case 2. T2 followed by T1
Case 3.
This situation is called a deadlock. When a deadlock occurs, the system must
roll back one of the two transactions; the data items locked by that transaction
are then unlocked and become available to the other transactions.
T1 will wait
T2 lock-S(Q)( has)
T1 lock-X(Q) (wait)
T3 lock-S(Q)( request)
T4 lock-S(Q)( request)
Advantages:
1. Unlocking may occur earlier, which leads to shorter waiting times and
increased concurrency.
2. The protocol is deadlock free, so no rollbacks are required.
Disadvantages:
1. Locking results in increased locking overhead.
2. Additional waiting time.
3. Potential decrease in concurrency.
Time-stamp:
With each transaction Ti in the system we associate a unique, fixed timestamp,
denoted TS(Ti).
This timestamp is assigned by the database system before the transaction Ti
starts execution.
If TS(Ti) < TS(Tj), the system must ensure that any schedule produced is
equivalent to a serial schedule in which Ti appears before Tj.
There are various types of failure that can occur in a system, each of which must
be dealt with in a different manner.
Simple failure: does not cause loss of information in the system.
Difficult failure: causes loss of information in the system.
Here we consider only the following types of failure:
There are two types of error that may cause transaction to fail.
Logical error: the transaction can no longer proceed with its normal execution,
due to an internal condition such as bad input, data not found, overflow, or a
resource limit being exceeded.
System crash: a hardware malfunction, or a bug in the database software or the
operating system, causes the loss of the contents of volatile storage.
Disk failure: a disk block loses its contents, for example because of a head
crash. To recover from this type of failure, tape backups are used.
Transaction identifier:
It is a unique identifier of the transaction that performs write operation.
Old value:
Value of the data item prior to the write operation.
New value:
Value of the data item will have after the write operation.
Log records exist to record significant events during transaction processing.
<Ti, start>: transaction Ti has started.
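A log record with these fields is enough to undo a failed transaction by scanning the log backwards and restoring old values; the in-memory database and record layout below are illustrative:

```python
from collections import namedtuple

# <Ti, X, old_value, new_value>: Ti wrote item X, changing old_value to new_value.
LogRecord = namedtuple("LogRecord", "txn item old_value new_value")

def undo(db, log, txn):
    """Restore the old value of every item the failed transaction wrote,
    scanning the log backwards (newest record first)."""
    for rec in reversed(log):
        if rec.txn == txn:
            db[rec.item] = rec.old_value

# Crash after write(A) but before write(B): A already holds the new value 950.
db = {"A": 950, "B": 2000}
log = [LogRecord("T0", "A", 1000, 950)]
undo(db, log, "T0")  # recovery rolls A back to 1000
```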
T0:
READ(A)
A:=A-50;
WRITE(A)
READ(B)
B:=B+50;
WRITE(B)
T1:
READ(C)
C:=C-100;
WRITE(C)
The interaction between client and server might proceed as follows during
processing of an SQL query.
1. The client parses a user's query and decomposes it into a number of
independent site queries. Each site query is sent to the appropriate
server site.
2. Each server processes its local query and sends the resulting relation to
the client site.
3. The client site combines the results of the subqueries to produce the result
of the originally submitted query. In this approach the SQL server has been
called a database processor (DP) or back-end machine, whereas the client
has been called an application processor (AP) or front-end machine.
In a DDBMS, it is customary to divide the software modules into three levels.
Availability: if one site fails, the relation may be found at another site, and the
system can continue processing.
Increased parallelism: when the majority of accesses to a relation r involve only
reading the relation, several sites can process queries involving r in parallel. The
more replicas there are, the greater the chance that the needed data is found at
the site where the transaction is executing.
Increased overhead on update:
The system must ensure that all replicas of a relation r are consistent; otherwise
erroneous computations may result. Whenever r is updated, the update must be
propagated to all sites containing replicas.
Each transaction locks all the data items it uses before it begins execution.
Disadvantages:
i. It is often hard to predict, before the transaction begins, which data items
need to be locked.
ii. Data-item utilization may be very low, since many data items may be
locked but unused for a long time.
Disadvantages:
i. One or more transactions may become involved in a deadlock.
The Relational Model defines two root languages for accessing a relational
database -- Relational Algebra and Relational Calculus. Relational Algebra is a
low-level, operator-oriented language. Creating a query in Relational Algebra
involves combining relational operators using algebraic notation. Relational
Calculus is a high-level, declarative language. Creating a query in Relational
Calculus involves describing what results are desired.
DDL stands for data definition language. DDL statements are SQL Statements
that define or alter a data structure such as a table.
DDL statements are used to define the database structure or schema. Some
examples:
SQL> commit;
SQL> rollback;
The CREATE TABLE statement implicitly commits the transaction. The insertion
of the value 2 into ddl_test_1 can no longer be rolled back.
8.2. DML
Data Manipulation Language (DML) statements are used for managing data
within schema objects. Some examples:
schema-name. table-name
Column names can be qualified by table name with optional schema
qualification.
• SELECT
• FROM
• WHERE
The SELECT clause specifies the table columns that are retrieved. The FROM
clause specifies the tables accessed. The WHERE clause specifies which table
rows are used. The WHERE clause is optional; if missing, all table rows are
used.
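The three clauses can be seen working together in sqlite3; the supplier table s echoes the examples used later in this section:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (sno TEXT, name TEXT, city TEXT)")
conn.executemany("INSERT INTO s VALUES (?, ?, ?)",
                 [("S1", "Pierre", "Paris"), ("S2", "John", "London")])

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
paris = conn.execute("SELECT name FROM s WHERE city = 'Paris'").fetchall()

# With the WHERE clause omitted, all table rows are used.
everyone = conn.execute("SELECT name FROM s").fetchall()
```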
For example,
The first form inserts a single row into table-1 and explicitly specifies the column
values for the row. The second form uses the result of query-specification to
insert one or more rows into table-1. The result rows from the query are the rows
added to the insert table. Note: the query cannot reference table-1.
INSERT Examples
Before (p):              After (p):
pno  descr   color       pno  descr   color
P1   Widget  Blue        P1   Widget  Blue
P2   Widget  Red         P2   Widget  Red
P3   Dongle  Green       P3   Dongle  Green
                         P4   NULL    Brown
INSERT INTO sp
SELECT s.sno, p.pno, 500
Before (sp):             After (sp):
sno  pno  qty            sno  pno  qty
S1   P1   NULL           S1   P1   NULL
S2   P1   200            S2   P1   200
S3   P1   1000           S3   P1   1000
S3   P2   200            S3   P2   200
                         S2   P3   500
The optional WHERE Clause has the same format as in the SELECT Statement.
The set-list contains assignments of new values for selected columns.
UPDATE Examples
Before (sp):             After (sp):
sno  pno  qty            sno  pno  qty
S1   P1   NULL           S1   P1   NULL
S2   P1   200            S2   P1   220
S3   P1   1000           S3   P1   1020
S3   P2   200            S3   P2   220
UPDATE s
SET name = 'Tony', city = 'Milan'
WHERE sno = 'S3'
Before (s):              After (s):
sno  name    city        sno  name    city
S1   Pierre  Paris       S1   Pierre  Paris
S2   John    London      S2   John    London
The optional WHERE Clause has the same format as in the SELECT Statement.
DELETE Examples
Before (sp):             After (sp):
sno  pno  qty            sno  pno  qty
S1   P1   NULL           S3   P2   200
S2   P1   200
S3   P1   1000
S3   P2   200
Before (p):              After (p):
pno  descr   color       pno  descr   color
P1   Widget  Blue        P1   Widget  Blue
P2   Widget  Red         P2   Widget  Red
P3   Dongle  Green
WORK is an optional keyword that does not change the semantics of COMMIT.
ROLLBACK [WORK]
table-name is the new name for the table. column-descr is a column declaration.
constraint is a table constraint.
view-name is the name for the new view. column-list is an optional list of names
for the columns of the view, comma separated. query-1 is any SELECT
statement without an ORDER BY clause. The optional WITH CHECK OPTION
clause is a constraint on updatable views.
column-list must have the same number of columns as the select list in query-1.
If column-list is omitted, all items in the select list of query-1 must be named. In
either case, duplicate column names are not allowed for a view.
table-name is the name of an existing base table in the current schema. The
CASCADE and RESTRICT specifiers define the disposition of other objects
dependent on the table. A base table may have two types of dependencies:
RESTRICT specifies that the table not be dropped if any dependencies exist. If
dependencies are found, an error is returned and the table isn't dropped.
CASCADE specifies that any dependencies are removed before the drop is
performed:
• Views that reference the base table are dropped, and the sequence is
repeated for their dependencies.
• Constraints in other tables that reference this table are dropped; the
constraint is dropped but the table retained.
RESTRICT specifies that the view not be dropped if any dependencies exist. If
dependencies are found, an error is returned and the view isn't dropped.
CASCADE specifies that any dependencies are removed before the drop is
performed:
• Views that reference the dropped view are dropped, and the sequence is
repeated for their dependencies.
• Constraints in base tables that reference this view are dropped; the
constraint is dropped but the table retained.
The GRANT statement grants each privilege in privilege-list for each object
(table) in object-list to each user in user-list. In general, the access privileges
apply to all columns in the table or view, but it is possible to specify a column list
with the UPDATE privilege specifier:
If the optional column list is specified, UPDATE privileges are granted for those
columns only.
The user-list may specify PUBLIC. This is a general grant, applying to all users
(and future users) in the catalog.
The REVOKE Statement revokes each privilege in privilege-list for each object
(table) in object-list from each user in user-list. All privileges must have been
previously granted.
The user-list may specify PUBLIC. This must apply to a previous GRANT TO
PUBLIC.
The UNION, EXCEPT, and INTERSECT operators all operate on multiple result
sets to return a single result set:
• The UNION operator combines the output of two query expressions into a
single result set. Query expressions are executed independently, and
their output is combined into a single result table.
• The EXCEPT operator evaluates the output of two query expressions and
returns the difference between the results. The result set contains all
rows returned from the first query expression except those rows that are
also returned from the second query expression.
The following figure illustrates these concepts with Venn diagrams, in which the
shaded portion indicates the result set.
UNION combines the rows from two or more result sets into a single result set.
EXCEPT evaluates two result sets and returns all rows from the first set that are
not also contained in the second set.
INTERSECT computes a result set that contains the common rows from two
result sets.
8.5.1. ALL
If ALL is specified, duplicate rows returned by union_expression are retained. If
two query expressions return the same row, two copies of the row are returned in
the final result. If ALL is not specified, duplicate rows are eliminated from the
result set.
Examples:
CA_CITIES
Cupertino
Los Angeles
Los Gatos
Oakland
San Francisco
San Jose
This example adds the ALL keyword, so duplicate Hq_City and City entries are
retained:
CA_CITIES
San Jose
San Francisco
Oakland
Los Angeles
Los Gatos
San Jose
Cupertino
Los Angeles
San Jose
CA_CITIES
Oakland
San Francisco
CA_CITIES
Los Angeles
San Jose
COMMON_KEYS
1
3
4
5
12
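The four set operators, including the effect of ALL on duplicates, can be checked in sqlite3; the two city tables here are cut-down stand-ins for the examples above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hq (city TEXT);
CREATE TABLE office (city TEXT);
INSERT INTO hq VALUES ('San Jose'), ('Oakland');
INSERT INTO office VALUES ('San Jose'), ('Los Angeles');
""")

def q(sql):
    return [row[0] for row in conn.execute(sql)]

union = q("SELECT city FROM hq UNION SELECT city FROM office ORDER BY city")
union_all = q("SELECT city FROM hq UNION ALL SELECT city FROM office")  # keeps dups
except_ = q("SELECT city FROM hq EXCEPT SELECT city FROM office")
intersect = q("SELECT city FROM hq INTERSECT SELECT city FROM office")
```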
8.6. Cursors
Every SQL statement executed by the RDBMS has a private SQL area that
contains information about the SQL statement and the set of data returned. In
PL/SQL, a cursor is a name assigned to a specific private SQL area for a specific
SQL statement. There can be either static cursors, whose SQL statement is
determined at compile time, or dynamic cursors, whose SQL statement is
determined at runtime. Static cursors are covered in greater detail in this section.
Dynamic cursors in PL/SQL are implemented via the built-in package
DBMS_SQL.
Explicit cursors are SELECT statements that are DECLAREd explicitly in the
declaration section of the current block or in a package specification. Use OPEN,
FETCH, and CLOSE in the execution or exception sections of your programs.
To use an explicit cursor, you must first declare it in the declaration section of a
block or package. There are three types of explicit cursor declarations:
CURSOR company_cur
This technique can be used in packages to hide the implementation of the cursor
in the package body.
Implicit cursor attributes are referenced via the SQL cursor. For example:
BEGIN
  UPDATE activity SET last_accessed = SYSDATE
   WHERE uid = user_id;
  IF SQL%NOTFOUND THEN
    INSERT INTO activity_log (uid, last_accessed)
    VALUES (user_id, SYSDATE);
  END IF;
END;
INSTEAD OF triggers are valid only on views. Oracle8i added the ability to
create an INSTEAD OF trigger on a nested table column.
Triggers can fire BEFORE or AFTER the triggering event. AFTER data triggers
are slightly more efficient than BEFORE triggers.
• Native dynamic SQL, where you place dynamic SQL statements directly
into PL/SQL blocks.
• Calling procedures in the DBMS_SQL package.
Static SQL statements do not change from execution to execution. The full texts
of static SQL statements are known at compilation, which provides the following
benefits:
9. QBE
Stands for "Query By Example." QBE is a feature included with various database
applications that provides a user-friendly method of running database queries.
Typically, without QBE, a user must write input commands using correct SQL
(Structured Query Language) syntax, a standard language that nearly all
database programs support. However, if the syntax is slightly incorrect, the
query may return the wrong results or may not run at all.
The Query By Example feature provides a simple interface for a user to enter
queries. Instead of writing an entire SQL command, the user can just fill in blanks
or select items to define the query she wants to perform. For example, a user
may want to select an entry from a table called "Table1" with an ID of 123. Using
SQL, the user would need to input the command, "SELECT * FROM Table1
WHERE ID = 123". The QBE interface may allow the user to just click on Table1,
type in "123" in the ID field and click "Search."
QBE is offered with most database programs, though the interface often differs
between applications. For example, Microsoft Access has a QBE interface known
as "Query Design View" that is completely graphical. The phpMyAdmin
application, used with MySQL, offers a Web-based interface where users can
select a query operator and fill in blanks with search terms. Whatever
QBE implementation is provided with a program, the purpose is the same – to
make it easier to run database queries and to avoid the frustrations of SQL
errors.
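The fill-in-the-blanks idea behind QBE can be sketched in a few lines: collect only the fields the user filled in and generate a parameterized SELECT from them. The table and column names below are hypothetical, and parameter binding is used so the generated SQL is safe as well as syntactically correct.

```python
import sqlite3

def qbe_query(table, criteria):
    """Build a parameterized SELECT from a dict of filled-in fields."""
    if criteria:
        where = " AND ".join(f"{col} = ?" for col in criteria)
        sql = f"SELECT * FROM {table} WHERE {where}"
    else:
        sql = f"SELECT * FROM {table}"
    return sql, list(criteria.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Name TEXT)")
conn.execute("INSERT INTO Table1 VALUES (123, 'Alice')")
conn.execute("INSERT INTO Table1 VALUES (456, 'Bob')")

# The user typed "123" into the ID blank and clicked "Search".
sql, params = qbe_query("Table1", {"ID": 123})
print(sql)                                   # SELECT * FROM Table1 WHERE ID = ?
print(conn.execute(sql, params).fetchall())  # [(123, 'Alice')]
```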
10.3. Indexes
Consider, for example, a rule-based technique for query optimization that states
that indexed access to data is preferable to a full table scan. Whenever a single
condition specifies the selection, it is a simple matter to check whether or not an
indexed access path exists for the attribute involved in the condition. Queries Q2
and Q3 are syntactically identical; however, in Q2 the SSN attribute carries the
primary key index for the Patient table.
Q3: SELECT *
FROM Patient
WHERE Patient.Name = “Doe, John Q.”
In this query, no index exists on the Name attribute. This requires a full table
scan.
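The same indexed-versus-scan distinction can be observed directly in SQLite, whose EXPLAIN QUERY PLAN output names the access path chosen for each table. The Patient schema below is a simplified stand-in for the one in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (SSN TEXT PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Patient VALUES ('123-45-6789', 'Doe, John Q.')")

def access_path(sql):
    # EXPLAIN QUERY PLAN reports how SQLite will access each table.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return rows[0][-1]   # the human-readable "detail" column

# Q2-style: the predicate is on the primary key, so an index is used.
print(access_path("SELECT * FROM Patient WHERE SSN = '123-45-6789'"))

# Q3-style: no index exists on Name, forcing a full table scan.
print(access_path("SELECT * FROM Patient WHERE Name = 'Doe, John Q.'"))
```

The first plan reports a SEARCH using the primary-key index; the second reports a SCAN of the whole table, exactly the rule-based preference described above.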
10.4. Selectivities
A more significant problem occurs when more than one condition is used in a
conjunctive selection. In this case the selectivity of each condition must be
considered. Selectivity is defined as the ratio of the number of rows that satisfy
the condition to the total number of rows in the table. This is the
probability that a row satisfies the condition, assuming a uniform distribution. If
the selectivity is small, then only a few rows are selected by the condition, and it
is desirable to use this condition first when retrieving records. To calculate
selectivities, the database manager needs statistics on all table and attribute
values. The heuristic rule states that, for multiple conjunctive conditions, the
order of application is from smallest selectivity to largest.
Queries Q4 and Q5 illustrate multiple conditions in a conjunctive selection on the
Patient table. Consider the case where the selectivity on Age is 10,000/1,000,000
= 0.01 (Age is assumed to be uniformly distributed between 0 and 100). The
selectivity on Gender is 500,000/1,000,000 = 0.5 (Gender is assumed to be
either M or F). It is clear that by using Age as the first retrieval condition, 10,000
rows are accessed for testing against the Gender condition, versus accessing
500,000 rows if the Gender attribute were chosen first. This is a 50-fold
difference in the number of rows accessed.
Q4: SELECT *
FROM Patient
WHERE Age = 45 AND Gender = “M”
Q5: SELECT *
FROM Patient
WHERE Gender = “M” AND Age = 45
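The arithmetic behind the heuristic is simple enough to verify. This sketch recomputes the selectivities used above and orders the conditions accordingly; the row counts are the ones assumed in the text.

```python
total_rows = 1_000_000

# Rows satisfying each condition, under the uniformity assumptions above.
matching = {
    "Age = 45":   10_000,    # Age uniform over 0-100
    "Gender = M": 500_000,   # Gender is either M or F
}

# Selectivity = matching rows / total rows.
selectivity = {cond: n / total_rows for cond, n in matching.items()}
print(selectivity)           # {'Age = 45': 0.01, 'Gender = M': 0.5}

# Heuristic: apply conjunctive conditions from smallest selectivity to largest.
order = sorted(selectivity, key=selectivity.get)
print(order)                 # ['Age = 45', 'Gender = M']

# Rows touched: 10,000 via Age first versus 500,000 via Gender first.
print(matching["Gender = M"] // matching["Age = 45"])   # 50-fold difference
```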
10.5. Uniformity
In many cases the actual data does not follow a uniform distribution. Consider
the case where 95% of the patients live in the province of New Brunswick and
the remaining 5% live in 199 different states and countries of the world. In this
case there are 200 different values for the Area attribute. The selectivity of the
Area attribute, assuming a uniform distribution, is 5,000/1,000,000 = 0.005. Thus,
this attribute will be accessed first given any query with a conjunctive clause
relating Area and Age. In the example below, query Q6 selects Area based on
the province of Ontario. We estimate that (5% of 1,000,000) / 199, or 251
patients live in Ontario. These rows are accessed first and then tested against
the Age condition. Conversely, query Q7 selects patients in the province of New
Brunswick. In this case, 950,000 patient rows are accessed, or more than 3,700
times the number of rows for the Ontario example. The distribution was skewed
sufficiently to result in a poor choice by the query optimizer. Clearly, non-uniform
data distributions can significantly affect query performance.
Q6: SELECT *
FROM Patient
WHERE Area = “Ontario” AND Age = 45
A uniform distribution for out-of-province residents predicts that 251 patients
live in Ontario.
Q7: SELECT *
FROM Patient
WHERE Area = “New Brunswick” AND Age = 45
Q8: SELECT *
FROM Patient
WHERE Doctor = 1234 OR Area = “Ontario”
Q9: SELECT *
FROM Patient
WHERE Doctor = 1234 AND Area = “Ontario”
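The effect of skew can be quantified the same way: with per-value statistics (a frequency histogram) the optimizer would estimate the New Brunswick predicate correctly, whereas the uniform assumption underestimates it by a large factor. The figures below are the ones used in the text.

```python
total_rows = 1_000_000
distinct_areas = 200

# Uniform assumption: every Area value is equally likely.
uniform_estimate = total_rows // distinct_areas          # 5,000 rows per value

# Actual skewed distribution from the text.
actual = {"New Brunswick": 950_000}
other_areas = (total_rows - actual["New Brunswick"]) // (distinct_areas - 1)
actual["Ontario"] = other_areas                          # about 251 rows

print(uniform_estimate)                                  # 5000
print(actual["Ontario"])                                 # 251
# For New Brunswick the uniform estimate is off by a factor of 190.
print(actual["New Brunswick"] // uniform_estimate)       # 190
# Retrieving on Area first touches ~3,700x more rows for Q7 than for Q6.
print(actual["New Brunswick"] // actual["Ontario"])      # 3784
```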
Q11: SELECT *
FROM Patient, Physician
WHERE Physician.Dr_SSN = Patient.SSN
If join selectivities are not used, then these two queries can exhibit quite different
performance.
10.8. Views
A view in SQL is a single table that is derived from other tables. A view can be
considered as a virtual table or as a stored query. A view is often used to specify
a frequently used query. This is of particular benefit if tables must be joined or
filtered in the same way repeatedly.
This view matches the Physician table to the Treatment table, and then joins the
result to the Patient table.
Q12: SELECT *
FROM DrService
WHERE Specialty = “Ophthalmologist”
Q13: SELECT *
FROM Physician
WHERE Specialty = “Ophthalmologist”
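The virtual-table behavior can be demonstrated in any SQL engine. The sketch below defines a much-simplified DrService view (the real view definition is not shown in the text, so this particular join and these column names are assumptions) and then queries it like an ordinary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Physician (Dr_SSN TEXT, Name TEXT, Specialty TEXT);
CREATE TABLE Treatment (Dr_SSN TEXT, Patient_SSN TEXT);
CREATE TABLE Patient   (SSN TEXT, Name TEXT);

INSERT INTO Physician VALUES ('111', 'Dr. Adams', 'Ophthalmologist');
INSERT INTO Physician VALUES ('222', 'Dr. Baker', 'Cardiologist');
INSERT INTO Treatment VALUES ('111', '900');
INSERT INTO Patient   VALUES ('900', 'Doe, John Q.');

-- A view is a stored query: no rows are materialized here.
CREATE VIEW DrService AS
  SELECT ph.Name AS Doctor, ph.Specialty, pa.Name AS Patient
  FROM Physician ph
  JOIN Treatment t  ON t.Dr_SSN = ph.Dr_SSN
  JOIN Patient  pa  ON pa.SSN   = t.Patient_SSN;
""")

# Q12-style query against the view, as if it were a single table.
rows = conn.execute(
    "SELECT * FROM DrService WHERE Specialty = 'Ophthalmologist'").fetchall()
print(rows)   # [('Dr. Adams', 'Ophthalmologist', 'Doe, John Q.')]
```

The query against DrService reads like Q13's query against the base table, but the engine expands the stored join each time the view is referenced.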
11. OODBMS
An object-oriented database management system (OODBMS), sometimes
shortened to ODBMS (for object database management system), is a database
management system (DBMS) that supports the modelling and creation of data as
objects. This includes some kind of support for classes of objects and the
inheritance of class properties and methods by subclasses and their objects.
There is currently no widely agreed-upon standard for what constitutes an
OODBMS, and OODBMS products are considered to be still in their infancy. In
the meantime, the object-relational database management system (ORDBMS),
the idea that object-oriented database concepts can be superimposed on
relational databases, is more commonly encountered in available products.
Another important benefit is that users are allowed to define their own methods
of access to data and how it will be represented or manipulated. The most
significant benefit of the OODBMS is that these databases have extended into
areas not well served by the RDBMS. Medicine, multimedia, and high-energy
physics are just a few of the industries relying on object-oriented databases.
12. ORACLE
The Oracle Database (commonly referred to as Oracle RDBMS or simply Oracle)
consists of a relational database management system (RDBMS) produced and
marketed by Oracle Corporation. As of 2009, Oracle remains a major presence
in database computing.
The Oracle RDBMS stores data logically in the form of tablespaces and
physically in the form of data files. Tablespaces can contain various types of
segments (for example, data segments and index segments). Segments in
turn comprise one or more extents. Extents comprise groups of contiguous data
blocks. Data blocks form the basic units of data storage.
Oracle database management tracks its computer data storage with the help of
information stored in the SYSTEM tablespace. The SYSTEM tablespace
contains the data dictionary — and often (by default) indexes and clusters. A
data dictionary consists of a special collection of tables that contains information
about all user-objects in the database.
The SCOTT schema has seen less use as it uses few of the features of the more
recent releases of Oracle. Most recent examples supplied by Oracle Corporation
reference the default HR or OE schemas.
Each Oracle instance allocates itself an SGA when it starts and de-allocates it at
shut-down time. The information in the SGA consists of the following elements,
each of which has a fixed size, established at instance startup:
• the database buffer cache: this stores the most recently-used data blocks.
These blocks can contain modified data not yet written to disk (sometimes
known as "dirty blocks"), unmodified blocks, or blocks written to disk since
modification (sometimes known as clean blocks). Because the buffer
cache keeps blocks based on a most-recently-used algorithm, the most
active buffers stay in memory to reduce I/O and to improve performance.
• the redo log buffer: this stores redo entries — a log of changes made to
the database. The instance writes redo log buffers to the redo log as
quickly and efficiently as possible. The redo log aids in instance recovery
in the event of a system failure.
• the shared pool: this area of the SGA stores shared-memory structures
such as shared SQL areas in the library cache and internal information in
the data dictionary. An insufficient amount of memory allocated to the
shared pool can cause performance degradation.
The library cache stores shared SQL, caching the parse tree and the execution
plan for every unique SQL statement.
If multiple applications issue the same SQL statement, each application can
access the shared SQL area. This reduces the amount of memory needed and
reduces the processing-time used for parsing and execution planning.
The data dictionary comprises a set of tables and views that map the structure of
the database.
Oracle databases store information here about the logical and physical structure
of the database.
The size and content of the PGA (Program Global Area) depend on the
Oracle-server options installed. This area consists of the following components:
• stack-space: the memory that holds the session's variables, arrays, and so
on.
• session-information: unless using the multithreaded server, the instance
stores its session-information in the PGA. (In a multithreaded server, the
session-information goes in the SGA.)
• private SQL-area: an area in the PGA which holds information such as
bind-variables and runtime-buffers.
• sorting area: an area in the PGA which holds information on sorts, hash-
joins, etc.
12.4. Configuration
3. What is a 'tuple'?
a. An attribute attached to a record.
b. Another name for the key linking different tables in a database.
c. A row or record in a database table.
d. Another name for a table in an RDBMS.
14. Referring to the following tables, what type of relationship exists between
the Product table and the Manufacturer table?
PRODUCT
=======
Product ID
Product Description
Manufacturer ID
MANUFACTURER
============
Manufacturer ID
Manufacturer Name
15. You are writing a database application to run on your DBMS. You do not
want your users to be able to view the underlying table structures. At the
same time you want to allow certain update operations. Referring to the
scenario above, which database construct should you use?
a. Cursor table
b. Table filter
c. Dynamic procedure
d. View
e. Summary table
a. OS requirement
b. User analysis
c. Performance monitoring
d. Data dictionary specification
e. System requirement
17. You have been asked to construct a query in the company's RDBMS. You
have deployed a Right Outer Join operation.
Referring to the scenario above, what will happen to the final results when
there is NO match between the tables?
18. Which phase of the data modeling process contains security review?
a. Structure
b. Design issue
c. Data source
20. Which one of the following capabilities do you expect to see in a majority
of RDBMS extensions to ANSI SQL-92?
22. For performance, you denormalize your database design and create some
redundant columns.
Referring to the scenario above, what RDBMS construct can you use to
automatically prevent the repeated columns from getting out of sync?
a. Cursors
b. Constraints
c. Views
d. Stored procedures
e. Trigger