5.1
Introduction
SQL is an acronym for Structured Query Language and is the name of the most
important tool for defining and manipulating relational databases. The development of SQL began in the mid-1970s at the IBM San Jose Research Laboratory.
The success of an experimental IBM database system (known as System R) that
incorporated SQL compelled a number of software manufacturers to join IBM
in developing relational database systems that incorporated SQL. In 1982, the
American National Standards Institute (ANSI) initiated the development of a
standard for a query language for relational database systems and opted for SQL
as its prototype. The resulting ANSI standard, issued in 1986, was adopted as
an International Standard by the International Organization for Standardization
(ISO) in 1987.
In the late 1980s, embedded SQL was standardized by ANSI, and work on
expanding SQL continues. A much extended version of the original standard,
known as SQL92, was adopted by ISO/IEC at the end of 1992. To reflect current trends in the database field towards object-relational technology, a new
standard ISO/IEC 9075-1, known as SQL99, was published in July 1999. As
we shall see, SQL99 is a superset of SQL92. New features incorporated by this
standard include object-relational extensions (user-defined data types, reference
types, collections, large object support, table hierarchies), active database features (triggers), stored procedures and functions, on-line analytic processing
extensions, etc. More recently, in 2003, a new standard was issued. This new
edition of the standard includes a new chapter that deals with the interaction
between SQL and XML (which we discuss in Chapter 10), corrections to SQL99,
and several new features.
Our presentation concentrates initially on common SQL features, applicable
to a wide range of SQL implementations.
5.2
PATRONS
name          addr           city        zip    telno         date of birth
------------  -------------  ----------  -----  ------------  -------------
After inserting a first row, the next value of the tabular variable PATRONS
is the table:
PATRONS
name          addr           city        zip    telno         date of birth
------------  -------------  ----------  -----  ------------  -------------
Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78
A second insertion yields a new table as the value for the tabular variable:
PATRONS
name          addr           city        zip    telno         date of birth
------------  -------------  ----------  -----  ------------  -------------
Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78
Ron Scott     50 Cider Hill  Framingham  01160  608-663-0211  11/4/80
If the first patron moves to a new address, the first row is modified and the
tabular variable assumes a third value:
PATRONS
name          addr           city        zip    telno         date of birth
------------  -------------  ----------  -----  ------------  -------------
Ann Richards  77 Lake St.    Milton      02186  617-364-0606  02/15/78
Ron Scott     50 Cider Hill  Framingham  01160  608-663-0211  11/4/80
The values that the tabular variable PATRONS may assume are the actual
tables that have the name and the heading specified at the creation of the
tabular variable. In addition, we can specify several types of constraints that
any value of the tabular variable must satisfy.
Before it is possible to create tabular variables and form queries, it is necessary to create an empty database in which to work. In practice, this is generally
done at the level of the operating system, usually with a command that is provided by the vendor of the DBMS.
To start, we assume that we have created an empty database. In this section
we begin to discuss a part of the data definition component of SQL, namely, the
creation of tabular variables, or informally, the creation of database tables.
5.2.1
Table Creation
In a form that ignores certain details related to the
physical design of databases, the directive that creates a tabular variable is
create table, and it has the following syntax:
create table [schema.]table_name
[(<attr_def | table_constraint | table_ref_clause>
{, <attr_def | table_constraint | table_ref_clause>})],
where the attribute definition attr_def has the syntax:
attribute_name domain [default expr] [column_ref_clause] {column_constraint}
As a result of the execution of this directive, an initial amount of space
is reserved in secondary memory to accommodate future values of the tabular
variable, and the metadata are modified to reflect the addition of the new tabular
variable. Specialized SQL constructions, discussed later (insert, delete, and
update) can be used to modify the value of this variable.
Creation of tabular variables permits placing restrictions, called constraints,
on the contents of any value that the tabular variable may assume. The constraints that follow have a global character (which means that they apply to
the contents of a table in its entirety) and apply to any value that the tabular
variable may assume.
Definition 5.2.2 A primary key constraint has the form
[constraint constraint name] primary key(list of attributes)
when the primary key consists of the attributes of the list.
Alternate keys of tables can be specified using unique constraints. The syntax
of this type of constraint is:
[constraint constraint name] unique(list of attributes)
This indicates that no two rows of a table that is a value of the tabular variable
may have the same values for the attributes specified in the list.
A tuple constraint whose condition C is a Boolean
combination of conditions involving only components of tuples and constants is
denoted by:
[constraint constraint name] check(C)
When a constraint involves more than one attribute it is considered a table
constraint ; otherwise, it is a column constraint. Referential integrity can be
imposed by using the column constraint references in the definition of an
attribute. To prevent certain components of tuples from assuming a null value
we can impose the column constraint not null.
Example 5.2.3 To create the tabular variable INSTRUCTORS of the college
database we use the following create table directive:
create table INSTRUCTORS(empno varchar(11) not null,
name varchar(35),
rank varchar(25),
roomno integer,
telno varchar(4), primary key(empno));
A script that creates all tabular variables of the college database is contained
in Appendix A.
Example 5.2.4 To express that the primary key of the table GRADES consists
of the attributes stno, cno, sem, and year we can say that this table satisfies the primary
key constraint:
constraint pkg primary key (stno, cno, sem, year)
Example 5.2.5 For the table EMPHIST, introduced in Example 3.3.5, we could
introduce the tuple conditions:
constraint pos_sal check(salary > 0)
and
constraint suf_sal check(position != 'Programmer' or salary > 65000).
They express that the salary must be a positive number and that
somebody who is a programmer must be paid more than 65000 dollars, respectively.
Thus, the creation of the table EMPHIST can be achieved by:
create table EMPHIST(empno integer not null references PERSINFO(empno),
position varchar2(30),
dept varchar2(20),
appt_date date,
term_date date,
salary float,
check(position != 'Programmer' or salary > 65000),
constraint pos_sal check(salary > 0));
A script that creates the tables PERSINFO, EMPHIST, and REPORTING is contained in Appendix C.
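The effect of the two check constraints can be tried out with a small script. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than ORACLE, keeps just three of the EMPHIST columns, omits the reference to PERSINFO, and the helper try_insert and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table EMPHIST(
        empno    integer not null,
        position varchar(30),
        salary   float,
        constraint suf_sal check(position != 'Programmer' or salary > 65000),
        constraint pos_sal check(salary > 0))
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "insert into EMPHIST(empno, position, salary) values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((1, "Programmer", 70000))   # satisfies both constraints
underpaid = try_insert((2, "Programmer", 50000))  # violates suf_sal
negative = try_insert((3, "Clerk", -10))          # violates pos_sal
stored = conn.execute("select count(*) from EMPHIST").fetchone()[0]
print(accepted, underpaid, negative, stored)
```

Only the first row is stored; the other two insertions are rejected as a whole, which is the behavior the constraints are meant to guarantee for every value of the tabular variable.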
Example 5.2.6 In the directives enclosed below we state that stno is both a
foreign key for ADVISING and, also, its primary key. In addition, empno is a
foreign key for this table (being the primary key for the table INSTRUCTORS).
create table ADVISING(stno varchar2(10) not null
references STUDENTS(stno),
empno varchar2(11)
references INSTRUCTORS(empno),
primary key(stno));
create table GRADES(stno varchar2(10)
not null references STUDENTS(stno),
empno varchar2(11)
not null references INSTRUCTORS(empno),
cno varchar2(5)
not null references COURSES(cno),
sem varchar2(6) not null,
year smallint not null,
grade integer,
primary key(stno,cno,sem,year),
check (grade <= 100));
The definition of the tabular variable GRADES specifies referential integrity constraints for each of the attributes stno, empno, and cno. In addition, it designates
the set of attributes stno, cno, sem, year as the primary key of GRADES and, also,
imposes the constraint grade <= 100.
To remove the tabular variable T we use the construct
drop table T
Rows can be inserted in a table individually, as we show below, or as they
are produced by a select phrase (as we shall see later). To insert a row in a
table T whose heading is A1 . . . An we write in SQL a directive of the form:
insert into T (A1 , . . . , An )
values (a1 , . . . , an );
For example, to insert the row
(1011, 'Edwards P. David', '10 Red Rd.', 'Newton', 'MA', '02159')
into the table STUDENTS we write:
insert into STUDENTS(stno,name,addr,city,state,zip)
values (1011,'Edwards P. David','10 Red Rd.','Newton','MA','02159');
It is possible to insert tuples in the database starting from text files by using
a special utility of ORACLE known as the SQL*Loader. Details are provided
in Appendix D.
To delete a row specified by a certain condition we can use the construct
delete. For example, to remove the row of the table STUDENTS that corresponds to the student having student number 1011 we write:
delete from STUDENTS
where stno = 1011;
If you wish to examine the headings of the tables you created you can issue,
for example, the SQL Plus directive
describe INSTRUCTORS;
Type
------------
VARCHAR2(11)
VARCHAR2(35)
VARCHAR2(25)
NUMBER(38)
VARCHAR2(4)
The directive alter table is used for modifying the structure of an existing
table. Columns may be added or dropped, the names of the columns or their
data types can be modified, etc. A simplified syntax of this directive is:
alter table table name modification specification
In turn, the modification specification depends on the particular change we need
to impose on the table. Examples of such modification specifications include
add column name column type,
drop column name,
modify column name column type,
rename column name to new column name,
as well as many other choices.
Example 5.2.7 To add a new year column to the table ADVISING we use the
directive:
alter table advising add year varchar2(4);
The entries of the new column year will have initially null values.
Column types can be modified using the modify option. For instance, to
increase the maximum length of the values of stno to 12 characters we write:
alter table advising modify stno varchar(12);
5.3
We saw that referential integrity can be imposed in SQL using the column
constraint references. An alternative method is to impose the table constraint
foreign key. Its syntax is:
A second tabular variable, STORES, records the stores that a retailer has in
the covered territory, and is created by
create table STORES (storeno integer not null,
address varchar(40) not null,
city varchar(40),
state char(2),
tel char(12),
primary key(storeno),
foreign key(city,state) references CITIES(city,state)
on delete cascade);
insert into CITIES(city, state) values('Boston','MA');
insert into CITIES(city, state) values('Springfield','MA');
insert into CITIES(city, state) values('Providence','RI');
insert into CITIES(city, state) values('Hartford','CT');
insert into CITIES(city, state) values('Bayonne','NJ');
and
STORENO ADDR             CITY       ST TEL
------- ---------------- ---------- -- ------------
      1 125 Harvard St.  Boston     MA 617-287-0991
      2 50 Storrow Drive Boston     MA 617-566-7629
      3 85 Manton Av.    Providence RI 401-453-1234
      4 40 West Street   Hartford   CT 860-232-4484
      5 5 Finley Av.     Bayonne    NJ 908-221-0094
      6 10 Linton Plaza  Hartford   CT 860-660-2220
      7 30 Stilson Rd.   Providence RI 401-861-5249
Since the referential integrity was imposed between the tabular variables
CITIES and STORES we need to insert the tuples of CITIES before we can insert
the tuples of STORES. Otherwise, the cities mentioned in the values of STORES
can not reference a city in a value of CITIES and the insertion in STORES will
be rejected.
The presence of on delete cascade means that if a row is removed from
the table CITIES, then the rows of STORES that correspond to that city are also
removed. For example, if the company closes its business in Hartford and we execute
delete from CITIES where
city = 'Hartford' and state = 'CT';
then the rows of STORES that describe the Hartford stores (stores 4 and 6) are removed as well.
If the clause on delete cascade is absent, then the deletion of a row from
CITIES is impossible unless we delete first the rows of STORES that correspond
to the city that is removed from CITIES.
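The interplay between the foreign key and on delete cascade can be observed with a short experiment. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module instead of ORACLE, the definition of CITIES is a guess (a two-column table with primary key (city, state)), only three of the stores are loaded, and the Albany row is a hypothetical example of a store in an unlisted city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
pragma foreign_keys = on;  -- SQLite enforces foreign keys only on request
create table CITIES(city varchar(40), state char(2),
                    primary key(city, state));  -- assumed definition
create table STORES(storeno integer not null,
                    address varchar(40) not null,
                    city varchar(40),
                    state char(2),
                    tel char(12),
                    primary key(storeno),
                    foreign key(city, state) references CITIES(city, state)
                        on delete cascade);
insert into CITIES(city, state) values('Boston', 'MA');
insert into CITIES(city, state) values('Hartford', 'CT');
insert into STORES values(1, '125 Harvard St.', 'Boston', 'MA', '617-287-0991');
insert into STORES values(4, '40 West Street', 'Hartford', 'CT', '860-232-4484');
insert into STORES values(6, '10 Linton Plaza', 'Hartford', 'CT', '860-660-2220');
""")

# A store in a city that is absent from CITIES is rejected.
try:
    conn.execute("insert into STORES values(9, '1 Elm St.', 'Albany', 'NY', null)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting Hartford from CITIES also deletes the Hartford stores.
conn.execute("delete from CITIES where city = 'Hartford' and state = 'CT'")
remaining = [row[0] for row in
             conn.execute("select storeno from STORES order by storeno")]
print(orphan_rejected, remaining)
```

After the deletion only store 1 is left, and no explicit delete was issued against STORES.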
5.4
SQL makes use of a collection of domains that, in general, varies from one
implementation to another. Not all domains of the standard exist in every
implementation, and not all domains of implementations exist in the standard.
Basic domains supported by virtually all implementations of SQL can be
classified as string domains, numerical domains, and special domains.
5.4.1
String Domains
5.4.2
Numeric Domains
The SQL standard prescribes two kinds of numeric domains: exact numeric data
types (numeric, decimal, integer, and smallint) and approximate numeric
data types (float, double precision, and real). Their respective syntax is:
numeric [(p[, s])]
decimal [(p[, s])]
integer
smallint
float [(p)]
double precision
real
Here, p stands for precision and s stands for scale (both of which are nonnegative integers). The precision parameter refers to the total number of digits,
while the scale indicates the number of digits to the right of the decimal point.
The difference between numeric and decimal is that in the former case, p is
the exact total number of digits, while in the latter case, p is a lower bound:
an implementation may provide a precision larger than p.
The domains smallint and integer have a number of digits dependent on
the implementation; however, the precision of integer is required to be equal
to or larger than the precision of smallint.
The float domain includes approximate representations of real numbers having precision at least p. Also, real and double precision have implementation-dependent precision, where the precision of double precision is never smaller
than the one of real.
5.4.3
Special Domains
Specific DBMSs have their own domains. For instance, ORACLE has the long
domain that contains strings of characters of variable length that may be as
5.4.4
The domain long (also denoted by long varchar) represents variable-length strings of characters with no more than 65,535 characters. At most
one attribute may have this domain in any table.
The number domain in ORACLE can be used in several forms as specified
by the following syntax:
number [(p[, s])],
where p is the precision and s is the scale.
The maximum precision of number is 38. The scale can vary between
-84 and 127. If the scale is negative, the number is rounded to the
specified number of places to the left of the decimal point.
The following cases may occur when we insert a value in a column whose
domain is number:
Data          Domain        Stored as
------------  ------------  ------------------------
1,234,567.89  number        1234567.89
1,234,567.89  number(9)     1234568
1,234,567.89  number(9,2)   1234567.89
1,234,567.89  number(9,1)   1234567.9
1,234,567.89  number(6)     error: exceeds precision
1,234,567.89  number(10,1)  1234567.9
1,234,567.89  number(7,-2)  1234600
1,234,567.89  number(7,2)   error: exceeds precision
If s > p, then s specifies the maximum number of valid digits after the
decimal point. For instance, number(4,5) requires a zero as the first digit after
the decimal point and rounds the digits after the fifth decimal digit. The
number 0.012358 is stored as 0.01236.
Numbers may also be entered in exponential form, that is, including
an exponent preceded by E. For example, 1234567 can be represented as
1.234567E+6, that is, as 1.234567 × 10^6.
Floating point domains are supported as float, float(*), and float(b),
where b is the binary precision, that is, the number of significant binary
digits. The domains float and float(*) are equivalent, and they consist
of floating point numbers that can be represented by 126 binary digits (or,
equivalently, by about 38 decimal digits).
To provide compatibility with other systems, ORACLE supports such
domains as decimal, integer, smallint, real, and double precision.
However, their internal representation is defined by the format of the
number domain.
5.5
SELECT Phrases
Queries must be written based on the names and headings of the tabular variables and not on the tables that represent their values at any given moment.
This is similar to writing programs. A program should work for all legal inputs
and not just the ones on which it was tested. In both cases, it is important to
focus on the abstract structure and not on specific examples. The way we write
SQL constructs must be directed only by the logic of the query and not by the
content of a particular database instance. Just because the query generated the
right answer for a particular instance of the database does not mean that it is
correct.
The main retrieval construction is the select phrase. Consider a query that
we solved previously using relational algebra. Recall that in Example 4.1.25 we
found the names of all instructors who have taught any student who lives in
Brookline. The solution involved using product, selection, and projection:
T1 := (STUDENTS × GRADES × INSTRUCTORS)
T2 := T1 where STUDENTS.stno = GRADES.stno and
GRADES.empno = INSTRUCTORS.empno and
STUDENTS.city = 'Brookline'
ANS := T2 [INSTRUCTORS.name].
In SQL the same problem can be resolved using a single select phrase as in:
select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
where STUDENTS.stno = GRADES.stno and
GRADES.empno = INSTRUCTORS.empno and
STUDENTS.city = 'Brookline';
We can conceptualize the execution of this typical select using the operations of relational algebra as follows:
1. The execution begins by performing the product of the tables listed after
the reserved word from. In our case, this involves computing the product
STUDENTS × GRADES × INSTRUCTORS
2. The selection specified after the reserved word where is executed next,
if the where part is present (we shall see that this may or may not be
present in a select.) In our case, this amounts to retaining that part of
the table product that satisfies the condition:
STUDENTS.stno = GRADES.stno and GRADES.empno = INSTRUCTORS.empno
and STUDENTS.city = 'Brookline'
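The conceptual evaluation order (product first, then selection, and finally the projection on the attributes listed after select) can be imitated directly and compared with what a DBMS returns. The sketch below is illustrative only: it uses Python with SQLite as a stand-in for the DBMS, the tables carry just the attributes the query needs, and the two-row sample data are hypothetical.

```python
import sqlite3
from itertools import product

# Hypothetical sample rows, reduced to the attributes used by the query.
students = [("1011", "Edwards P. David", "Newton"),   # (stno, name, city)
            ("2661", "Mixon Leatha", "Brookline")]
grades = [("1011", "019"), ("2661", "023")]           # (stno, empno)
instructors = [("019", "Evans Robert"), ("023", "Exxon George")]  # (empno, name)

# Step 1: the product of the tables listed after "from".
t1 = list(product(students, grades, instructors))
# Step 2: the selection given by the "where" condition.
t2 = [(s, g, i) for (s, g, i) in t1
      if s[0] == g[0] and g[1] == i[0] and s[2] == "Brookline"]
# Step 3: the projection on the attribute listed after "select".
ans = [i[1] for (s, g, i) in t2]

# The same query, evaluated by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS(stno, name, city)")
conn.execute("create table GRADES(stno, empno)")
conn.execute("create table INSTRUCTORS(empno, name)")
conn.executemany("insert into STUDENTS values (?, ?, ?)", students)
conn.executemany("insert into GRADES values (?, ?)", grades)
conn.executemany("insert into INSTRUCTORS values (?, ?)", instructors)
sql_ans = [row[0] for row in conn.execute("""
    select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
    where STUDENTS.stno = GRADES.stno and
          GRADES.empno = INSTRUCTORS.empno and
          STUDENTS.city = 'Brookline'""")]
print(len(t1), ans, sql_ans)
```

A real DBMS is free to evaluate the query in a different and far cheaper order; the three steps describe only what the result must be.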
The select construct used above requires the table name for the table involved in the retrieval and the list of attributes that we need to extract.
In general, if we need to compute the projection of a table T on a set of
attributes A1 . . . An of the heading of T , we use the construct:
select A1 , . . . , An from T ;
Example 5.5.2 To find out the states where the students originate we project
the table STUDENTS on the attribute state. This is done by
select state from STUDENTS;
The value MA is repeated 7 times because there are seven students who live
in Massachusetts.
Duplicate values can be eliminated from a query by using the option distinct
as in
select distinct state from STUDENTS;
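The difference between the two projections can be checked on the nine students of the college database. This is a sketch that assumes SQLite (through Python's sqlite3 module) in place of ORACLE and keeps only the stno and state attributes of STUDENTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS(stno varchar(10), state char(2))")
conn.executemany("insert into STUDENTS values (?, ?)", [
    ("1011", "MA"), ("2415", "MA"), ("2661", "MA"),
    ("2890", "MA"), ("3442", "NH"), ("3566", "MA"),
    ("4022", "MA"), ("5544", "MA"), ("5571", "RI"),
])

# Plain projection: one value per row of STUDENTS, duplicates included.
all_states = [row[0] for row in conn.execute("select state from STUDENTS")]
# Projection with distinct: each state is listed once.
distinct_states = sorted(
    row[0] for row in conn.execute("select distinct state from STUDENTS"))
print(len(all_states), all_states.count("MA"), distinct_states)
```

The first select returns nine values, seven of which are MA; the second returns the three states only.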
5.6
The where clause allows us to extract tuples that satisfy certain conditions; in
other words, using the where clause we can perform selections.
Example 5.6.1 To find students who live in Boston we write:
select stno, name, addr, city, state, zip
from STUDENTS
where city = 'Boston';
If we want to extract all columns of a table instance, we can use the wildcard character, *, instead of listing all columns. Thus, we can write the equivalent select:
select * from STUDENTS
where city = 'Boston';
Example 5.6.3 To retrieve the grade records obtained in cs110 during the
Spring of 2000 we can write in SQL:
select * from GRADES
where cno = 'cs110' and sem = 'SPRING'
and year = 2000;
EMPNO CNO   SEM    YEAR GRADE
----- ----- ------ ---- -----
023   cs110 SPRING 2000    75
023   cs110 SPRING 2000    60
EMPNO
-----
019
019
019
019
023
023
ADDR
---------------
15 Pleasant Dr.
1 Main Rd.
Example 5.6.6 Suppose the computer science course numbers were carefully
assigned so that all fundamental programming courses have a 1 as their second
digit. Then the following select construct lists all fundamental programming
courses.
CNAME                     CR CAP
------------------------- -- ---
Introduction to Computing  4 120
Computer Programming       4 100
Data Structures            3  60
Software Engineering       3  40
Using the reserved word between, we can ensure that certain values are
limited to prescribed intervals (including the endpoints of these intervals).
Example 5.6.7 To find the students who obtained some grade between 65 and
85 in 2003, we apply the following query:
select distinct stno from GRADES
where year = 2003 and
grade between 65 and 85;
Example 5.6.8 A select construct, similar to the one used in Example 5.6.7,
can be used to retrieve the students who have some grade that does not satisfy
the previous condition, that is, the students who have some grade not between
65 and 85:
select distinct stno from GRADES where year = 2003
and grade not between 65 and 85;
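Two points are easy to overlook here: between includes both endpoints, and the same student may be returned by both queries because each one asks only for some grade with the stated property. A minimal sketch, assuming SQLite through Python's sqlite3 module and a hypothetical set of grade records reduced to three attributes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno varchar(10), year smallint, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?)", [
    ("1011", 2003, 65),   # lower endpoint, counts as "between"
    ("2415", 2003, 85),   # upper endpoint, counts as "between"
    ("2661", 2003, 90),
    ("3566", 2003, 64),   # student 3566 has one grade outside the interval
    ("3566", 2003, 70),   # and one grade inside it
    ("4022", 2002, 75),   # wrong year, ignored by both queries
])

in_range = [row[0] for row in conn.execute("""
    select distinct stno from GRADES
    where year = 2003 and grade between 65 and 85
    order by stno""")]
out_of_range = [row[0] for row in conn.execute("""
    select distinct stno from GRADES
    where year = 2003 and grade not between 65 and 85
    order by stno""")]
print(in_range, out_of_range)
```

Student 3566 appears in both answers, so the second query does not compute the complement of the first.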
3566
4022
5571
On the other hand, we can test for the negation of a condition using not. To
list the names of students who live outside those two cities, we write:
select name from STUDENTS
where not(city in ('Boston','Brookline'));
Student
Student
Student
Student
Student
Student
McLane Sandy
Novak Roland
Pierce Richard
Prior Lorraine
Rawlings Jerry
Lewis Jerry
5.7
Example 5.7.1 To determine the student numbers of students who took cs210
we write:
select stno from GRADES
where cno = 'cs210';
To find the students who took both cs210 and cs240 we use intersect
to link the two select phrases into a compound select:
select stno from grades where cno = 'cs210'
intersect
select stno from grades where cno = 'cs240';
This gives:
STNO
----
1011
3566
4022
5571
STNO
----
1011
2415
2661
3566
4022
5544
5571
If we wish to retain all values in the result, then we need to use union all
to link the select phrases as in:
select stno from grades where cno = 'cs210'
union all
select stno from grades where cno = 'cs240';
The result now contains all values retrieved by the individual selects:
STNO
----
1011
2661
3566
5571
4022
3566
5571
2415
5544
1011
4022
The reverse difference allows us to find students who took cs240 but did not
take cs210:
select stno from grades where cno = 'cs240'
minus
select stno from grades where cno = 'cs210';
Now we obtain:
STNO
----
2415
5544
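The compound selects of this section can be replayed on the student numbers that appear in the results above. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module, where the difference operator is spelled except rather than ORACLE's minus; GRADES is reduced to the attributes stno and cno; and the helper compound is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno varchar(10), cno varchar(5))")
takers = {
    "cs210": ["1011", "2661", "3566", "5571", "4022"],
    "cs240": ["3566", "5571", "2415", "5544", "1011", "4022"],
}
for cno, stnos in takers.items():
    conn.executemany("insert into GRADES values (?, ?)",
                     [(stno, cno) for stno in stnos])

def compound(first, op, second):
    """Link two selects on GRADES with a set operator; return sorted stnos."""
    sql = ("select stno from GRADES where cno = ? "
           f"{op} select stno from GRADES where cno = ?")
    return sorted(row[0] for row in conn.execute(sql, (first, second)))

both = compound("cs210", "intersect", "cs240")
either = compound("cs210", "union", "cs240")
with_duplicates = compound("cs210", "union all", "cs240")
only_cs240 = compound("cs240", "except", "cs210")  # "minus" in ORACLE
print(both, either, len(with_duplicates), only_cs240)
```

The operators union, intersect, and except (minus) eliminate duplicates; union all is the only one that keeps them, which is why it returns eleven values instead of seven.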
5.8
A select phrase that lists several distinct table names after the reserved word
from computes the product of these tables.
Example 5.8.1 To examine all possible pairs of students/instructors we could
write the following select:
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS, INSTRUCTORS;
Since our database is in a state that contains nine students and five instructors,
this will result in 45 rows retrieved:
NAME              NAME
----------------  ----------------
Edwards P. David  Evans Robert
Grogan A. Mary    Evans Robert
Mixon Leatha      Evans Robert
.
.
.
Pierce Richard    Will Samuel
Prior Lorraine    Will Samuel
Rawlings Jerry    Will Samuel
Lewis Jerry       Will Samuel
Observe that the tables are not linked by any where condition; as expected
in the definition of the product, all combinations of rows are considered. After computing the product, a projection eliminates all attributes except STUDENTS.name and INSTRUCTORS.name.
Also, note that we use qualified attributes as required by the definition of
table product (see Definition 4.1.7).
The result produced by the query shown in Example 5.8.1 does not differentiate between the attributes STUDENTS.name and INSTRUCTORS.name and
this may confuse the user. Therefore, it is preferable to rename the columns of
the result using the option as:
select STUDENTS.name as stname, INSTRUCTORS.name as instname
from STUDENTS, INSTRUCTORS;
STNAME            INSTNAME
----------------  ----------------
Edwards P. David  Evans Robert
Grogan A. Mary    Evans Robert
Mixon Leatha      Evans Robert
.
.
.
Pierce Richard    Will Samuel
Prior Lorraine    Will Samuel
Rawlings Jerry    Will Samuel
Lewis Jerry       Will Samuel
SQL allows for computations of products of several copies of the same table
through the creation of aliases; the solution proceeds using the logic discussed
in Example 4.1.18. To create an alias S of a table named T we write the name
of the alias after the name of the table in the list of tables, making sure that at
least one space (and no comma) exists between the name of the table and its
alias. For example, in the select phrase of Example 5.8.2 we create the alias I
by writing
INSTRUCTORS I
= S2.city and
= S3.city and
< S2.stno and
< S3.stno
5.9
Join in SQL
Earlier versions of SQL (at the level of SQL-1) dealt with the join operation
indirectly, using operations like product, selection and projection, which are
already available in SQL. The blueprint of this treatment of the join operation
was outlined in Section 4.2.
Example 5.9.1 The query considered in Example 4.2.2, in which we seek
the names of instructors who have taught any four-credit
course, is solved in SQL by writing:
select distinct INSTRUCTORS.name
from COURSES, GRADES, INSTRUCTORS
where COURSES.cr = 4
and COURSES.cno = GRADES.cno
and GRADES.empno = INSTRUCTORS.empno;
Example 5.9.2 To list all pairs of student names and course names such that
the student takes the course, the relational algebra solution would require that
we join the tables STUDENTS, GRADES, and COURSES. In SQL we write:
select distinct STUDENTS.name, COURSES.cname
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
GRADES.cno = COURSES.cno;
SQL dialects that conform to the SQL-2 standard (e.g., SQL Plus of Oracle
9i and 10g, and Microsoft SQL Server) allow the use of the constructions inner join and on. For example, the query discussed in Example 5.9.1 has the
alternate solution:
select distinct INSTRUCTORS.name
from INSTRUCTORS, COURSES INNER JOIN GRADES
on COURSES.cno = GRADES.cno
where INSTRUCTORS.empno = GRADES.empno
and COURSES.cr = 4;
This query should be viewed as computing the natural join of COURSES and
GRADES based on the equality of the attributes they share (as specified by the
on clause). Then, the join of INSTRUCTORS with the result of the previous join is
computed using the simulation by product and selection method.
In SQL Plus, queries involving natural joins among tables whose shared attributes
are identically named can be further simplified by applying the using clause, which
lists the attributes involved in the joining.
Example 5.9.3 To retrieve the names of instructors who taught cs110 we can
execute in SQL Plus the query:
select distinct INSTRUCTORS.name
from INSTRUCTORS inner join GRADES
using(empno)
where cno = 'cs110';
The inner join can be used for joins that involve more than two tables.
or, equivalently,
select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
from GRADES inner join ADVISING
using(stno,empno)
inner join STUDENTS
using(stno)
inner join INSTRUCTORS
using(empno)
which is equivalent to
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS, INSTRUCTORS;
We saw that when joining two tables not all tuples are joinable; tuples that
belong to one table and are not joinable with any tuple of the other table leave no
trace in the join, a situation that is often inconvenient. As we saw in Section 4.3,
the outer join operation and its variants, the left outer join and the right outer
join can rectify this situation.
Let us assume that the tabular variables STUDENTS and INSTRUCTORS
contain the tuples shown in Figure 5.1.
The tabular variable ADVISING has the same content as the one shown in
Figure 3.1.
Example 5.9.7 Oracle's own syntax for left outer join is to designate the component that may be null by (+), as in
select STUDENTS.name, ADVISING.empno from STUDENTS, ADVISING
where STUDENTS.stno = ADVISING.stno(+);
This is equivalent to using the operator left outer join as specified by SQL-2:
select STUDENTS.name, ADVISING.empno
from STUDENTS left outer join ADVISING
on STUDENTS.stno = ADVISING.stno;
Either phrase will return:
name              empno
----------------  -----
Edwards P. David  019
Grogan A. Mary    019
Mixon Leatha      023
McLane Sandy      023
Novak Roland      056
Pierce Richard    126
Prior Lorraine    234
Rawlings Jerry    023
Lewis Jerry       234
Davis Richard
Chu Martin
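The result above can be reproduced with the standard left outer join syntax. The sketch below assumes SQLite through Python's sqlite3 module (which accepts the standard operator but not ORACLE's (+) notation), keeps only the stno and name attributes of STUDENTS, and takes the content of ADVISING from the result table shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS(stno varchar(10) primary key, name varchar(35))")
conn.execute("create table ADVISING(stno varchar(10) primary key, empno varchar(11))")
conn.executemany("insert into STUDENTS values (?, ?)", [
    ("1011", "Edwards P. David"), ("2415", "Grogan A. Mary"),
    ("2661", "Mixon Leatha"), ("2890", "McLane Sandy"),
    ("3442", "Novak Roland"), ("3566", "Pierce Richard"),
    ("4022", "Prior Lorraine"), ("5544", "Rawlings Jerry"),
    ("5571", "Lewis Jerry"), ("6410", "Davis Richard"),
    ("7209", "Chu Martin"),
])
conn.executemany("insert into ADVISING values (?, ?)", [
    ("1011", "019"), ("2415", "019"), ("2661", "023"),
    ("2890", "023"), ("3442", "056"), ("3566", "126"),
    ("4022", "234"), ("5544", "023"), ("5571", "234"),
])

rows = conn.execute("""
    select STUDENTS.name, ADVISING.empno
    from STUDENTS left outer join ADVISING
      on STUDENTS.stno = ADVISING.stno
    order by STUDENTS.stno""").fetchall()

# Students without an advisor survive the join with a null empno.
unadvised = [name for name, empno in rows if empno is None]
print(len(rows), unadvised)
```

An inner join on the same condition would return only nine rows; the two students without an advisor would leave no trace in it.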
The computation of the right outer join is similar. We can use either Oracle's
syntax as in
select ADVISING.stno, INSTRUCTORS.name from ADVISING, INSTRUCTORS
where ADVISING.empno(+) = INSTRUCTORS.empno;
STUDENTS
stno  name              addr              city        state  zip
----  ----------------  ----------------  ----------  -----  -----
1011  Edwards P. David  10 Red Rd.        Newton      MA     02159
2415  Grogan A. Mary    8 Walnut St.      Malden      MA     02148
2661  Mixon Leatha      100 School St.    Brookline   MA     02146
2890  McLane Sandy      30 Cass Rd.       Boston      MA     02122
3442  Novak Roland      42 Beacon St.     Nashua      NH     03060
3566  Pierce Richard    70 Park St.       Brookline   MA     02146
4022  Prior Lorraine    8 Beacon St.      Boston      MA     02125
5544  Rawlings Jerry    15 Pleasant Dr.   Boston      MA     02115
5571  Lewis Jerry       1 Main Rd         Providence  RI     02904
6410  Davis Richard     45 Algonquin Rd.  Natick      MA     01760
7209  Chu Martin        90 Rye Dr.        Ayer        MA     01290

INSTRUCTORS
empno  name              rank          roomno  telno
-----  ----------------  ------------  ------  -----
019    Evans Robert      Professor         82  7122
023    Exxon George      Professor         90  9101
056    Sawyer Kathy      Assoc. Prof.      91  5110
126    Davis William     Assoc. Prof.      72  5411
234    Will Samuel       Assist.Prof.      90  7024
323    Campbell Kenneth  Professor        102  7077
stno  name
----  ----------------
1011  Evans Robert
2415  Evans Robert
2661  Exxon George
2890  Exxon George
5544  Exxon George
3442  Sawyer Kathy
3566  Davis William
4022  Will Samuel
5571  Will Samuel
      Campbell Kenneth
Finally, the outer join itself can be computed using the operator full outer join:
select STUDENTS.name, INSTRUCTORS.name
from students full outer join advising
using(stno)
full outer join instructors
using(empno);
5.10
Subqueries are select phrases that return sets rather than tables. Their main
use is in conditions that involve sets. As we shall see, they are useful in implementing difference and division
in SQL. Syntactically, a subquery is written by placing a select phrase
between a pair of parentheses. For example,
(select empno from INSTRUCTORS where rank = 'Professor');
We refer to the first select as the calling select, or the main select or the outer
select; the select of the subquery is the inner select.
As we saw in the introductory example, membership can be tested using in.
Here is another example.
Example 5.10.1 Let us find the names of students who took cs310. We determine the student numbers of those students using a subquery. Then, in the
main select, we retrieve those students whose student number is in this set.
This can be accomplished using the query:
select name from STUDENTS where
stno in (select stno from GRADES
where cno = 'cs310');
If oper is one of the operators =, !=, <, >, <= or >=, then we can use
conditions of the form
v oper any (select ...)
or
v oper all (select ...)
in comparisons that involve some elements of the set computed by the subquery
(select ...) or all elements of the same set, respectively. Here != stands for
inequality.
Example 5.10.3 To find the names of the courses taken by the student whose
student number is 1011, we can use the following query:
select cname from COURSES where
cno = any (select cno from GRADES where stno= 1011);
The construct = any is synonymous with in, and the same query could be
written as:
select cname from COURSES
where cno in (select cno from GRADES where stno= 1011);
Also, instead of = any we could use = some, and so, we have a third way of
writing the same query:
select cname from COURSES where
cno = some (select cno from GRADES where stno= 1011);
Example 5.10.4 Let us find the students who obtained the highest grade in
cs110. Although there are methods that we explain later that yield much simpler
solutions for this type of query, for the moment we want to illustrate the oper all
condition. We operate on two copies of GRADES. The copy used in the inner
select is intended for computing the grades obtained in cs110:
select stno from GRADES where cno = 'cs110'
and grade >= all(select grade from GRADES
where cno = 'cs110');
Example 5.10.5 Let us find the students who obtained a grade at least as high as
every grade given by a certain instructor, say Prof. Will. Using an all subquery
we can write:
select stno from GRADES
where grade >= all(select grade from GRADES
where empno in (select empno from INSTRUCTORS
where name like 'Will%'));
If we alter this query and replace the instructor with Prof. Davis, who teaches
no courses, then the set of grades computed by the all subquery of
select stno from GRADES
where grade >= all(select grade from GRADES
where empno in (select empno from INSTRUCTORS
where name like 'Davis%'));
is empty. Therefore, every grade satisfies the inequality, and we obtain all
student numbers for students who took any course!
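The empty-set behavior can be checked with a small sketch. SQLite does not accept the all quantifier, so the sketch below rewrites "grade >= all (S)" as "no element of S exceeds grade", which is equivalent when no nulls are involved. The helper name at_least_all, the table layout, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES (stno integer, empno text, grade integer);
    insert into GRADES values (1, 'e1', 70), (2, 'e1', 90), (3, 'e2', 80);
""")

def at_least_all(empno):
    """Students whose grade is >= every grade given by the instructor empno."""
    # "grade >= all (S)" rewritten as: there is no grade in S larger than it.
    sql = """
        select stno from GRADES G
        where not exists (select * from GRADES
                          where empno = ? and grade > G.grade)
        order by stno
    """
    return [r[0] for r in conn.execute(sql, (empno,))]

top_e1 = at_least_all("e1")    # e1 gave the grades 70 and 90
top_none = at_least_all("e9")  # e9 gave no grades: the inner set is empty
print(top_e1, top_none)
```

With an instructor who gave no grades the condition holds vacuously, so every row of GRADES qualifies.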
5.11
Parametrized subqueries
Example 5.11.1 Suppose that we need to retrieve the course numbers of courses
taken by the student whose student number is STUDENTS.stno. Ignore (for the
moment) the origin of this piece of data. Then, the retrieval is done by the
select construct:
select cno from GRADES
where stno = STUDENTS.stno;
Next, we transform this select into a subquery. The student number STUDENTS.stno is provided by the outer select of the following construct:
select name from STUDENTS where 'cs310' in
(select cno from GRADES
where stno = STUDENTS.stno);
Observe that this provides an alternate solution to the query discussed in Example 5.10.1. Namely, we use a subquery to compute the courses taken by each
student. Then, we test if cs310 is one of these courses. We use the qualified attribute STUDENTS.stno inside the subquery to differentiate between this input
parameter and the attribute stno of the table GRADES.
Sets of tuples produced by subqueries can be tested for emptiness using the
exists condition. Namely, the condition
exists (select ... from ...)
is true if the set returned by the subquery is not empty; similarly,
not exists (select ... from ...)
is true if the set returned by the subquery is empty.
Example 5.11.2 Let us give yet another solution to the query we solved in
Example 5.10.1. This time, to find the names of students who took cs310 we
determine the student numbers of those students for whom their set of grades
in cs310 is not empty. This can be done as follows:
select name from STUDENTS where
exists (select * from GRADES where
stno = STUDENTS.stno and
cno = 'cs310');
Example 5.11.3 To find instructors who never taught cs110, we search for
instructors for whom there is no GRADES record involving cs110 and these
instructors. This can be done by
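A query of this kind can be sketched with not exists. The version below is an illustration run through Python's sqlite3 module on hypothetical rows; the instructor names A, B, C and the table layout are assumptions, not the instance used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table INSTRUCTORS (empno text primary key, name text);
    create table GRADES (stno integer, empno text, cno text);
    insert into INSTRUCTORS values ('019', 'A'), ('023', 'B'), ('056', 'C');
    insert into GRADES values (1, '019', 'cs110'), (2, '023', 'cs240');
""")

# An instructor qualifies when no GRADES row pairs that instructor with cs110.
rows = conn.execute("""
    select name from INSTRUCTORS
    where not exists (select * from GRADES
                      where empno = INSTRUCTORS.empno and cno = 'cs110')
    order by name
""").fetchall()
never_cs110 = [r[0] for r in rows]
print(never_cs110)
```

Instructor C, who has no GRADES rows at all, is returned as well, since the subquery is empty for that instructor.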
If both the main query and the subquery deal with the same table and the
subquery requires input parameters from the outer query, then we use an alias
of the table in the outer query.
Example 5.11.4 Let us find the student numbers of students whose advisor
is advising at least one other student. The information is contained in the
ADVISING table, and the following select construct uses both ADVISING (in
the subquery) and its alias A in the main query:
select distinct stno from ADVISING A
where exists (select * from ADVISING where
empno = A.empno and stno != A.stno);
Subqueries can be used in the list that follows from in exactly the same
manner that tables are used. This is shown in the next example:
Example 5.11.5 To find the pairs of names of students and instructors such
that the student took some course with the instructor we could write:
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from STUDENTS, INSTRUCTORS,
(select stno, empno from GRADES) PN
where STUDENTS.stno = PN.stno and
INSTRUCTORS.empno = PN.empno;
The difference of the tables T and S can be computed by looking for each
tuple of T for which there is no matching tuple in S. This can be done by:
select * from T where
not exists (select * from S where
A1 = T.A1 and ... and An = T.An)
Example 5.11.6 Courses offered by the continuing education program but not
by the regular program can be found by writing:
select * from CED_COURSES where
not exists (select * from COURSES where
cno = CED_COURSES.cno)
which takes advantage of the fact that cno is a key for both COURSES and
CED_COURSES.
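As a quick check of this technique, the sketch below computes the difference once with not exists and once with SQLite's except operator and compares the two answers. The course rows are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table COURSES (cno text primary key, cname text);
    create table CED_COURSES (cno text primary key, cname text);
    insert into COURSES values ('cs110', 'Intro'), ('cs210', 'Architecture');
    insert into CED_COURSES values ('cs110', 'Intro'), ('cs105', 'Literacy');
""")

# Difference expressed with a parametrized subquery, matching on the key cno.
via_not_exists = conn.execute("""
    select cno from CED_COURSES
    where not exists (select * from COURSES
                      where cno = CED_COURSES.cno)
    order by cno
""").fetchall()

# The same difference using the set operator (named minus in Oracle).
via_except = conn.execute("""
    select cno from CED_COURSES
    except
    select cno from COURSES
    order by cno
""").fetchall()
print(via_not_exists, via_except)
```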
5.12
SQL does not have a division operation. However, as we saw in Examples 4.1.27
and 4.2.3, we can perform division using product, projection, and difference. Of
course, we could apply the prescription offered by relational algebra. This type
of solution is discussed in the next example.
Example 5.12.1 The solution envisioned here is
select cno from grades
minus
select GI.cno from (select grades.cno,
instructors.empno
from grades, instructors
where rank = 'Professor') GI
where (GI.cno, GI.empno) not in (select cno, empno from grades)
Here, the subquery GI computes all pairs of courses and instructor numbers of
full professors using the product of the
tables GRADES and INSTRUCTORS. Then, the query
select GI.cno from (select grades.cno,
instructors.empno
from grades, instructors
where rank = 'Professor') GI
where (GI.cno,GI.empno) not in (select cno, empno from grades)
extracts the courses that are part of the pairs of the previous table that do not
appear in the GRADES table, that is, the courses for which there exists a full
professor who did not teach these courses. These are the courses that we need
to exclude from the answer. Thus, the query presented at the beginning of this
example yields the solution of the problem:
CNO
-----
cs110
every course whose first digit of the course number is 1. The formulation that
is better suited to SQL implementation is: Find names of instructors for whom
there is no 100 level course that they have not taught. This is solved by the
following select construct:
select name from INSTRUCTORS where
not exists (select * from COURSES
where cno like 'cs1__' and
not exists (select * from GRADES where
empno = INSTRUCTORS.empno
and cno = COURSES.cno));
The answer that results from our usual database instance is:
NAME
------------
Evans Robert
Exxon George
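The double not exists formulation of division can be exercised on a small instance. The sketch below uses Python's sqlite3 module with hypothetical rows: instructor A taught both 100-level courses, while instructor B missed cs120.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table INSTRUCTORS (empno text primary key, name text);
    create table COURSES (cno text primary key);
    create table GRADES (stno integer, empno text, cno text);
    insert into INSTRUCTORS values ('e1', 'A'), ('e2', 'B');
    insert into COURSES values ('cs110'), ('cs120'), ('cs210');
    insert into GRADES values (1, 'e1', 'cs110'), (2, 'e1', 'cs120'),
                              (3, 'e2', 'cs110'), (4, 'e2', 'cs210');
""")

# "There is no 100-level course that the instructor has not taught."
rows = conn.execute("""
    select name from INSTRUCTORS where
    not exists (select * from COURSES
                where cno like 'cs1__' and
                not exists (select * from GRADES where
                            empno = INSTRUCTORS.empno
                            and cno = COURSES.cno))
    order by name
""").fetchall()
taught_all = [r[0] for r in rows]
print(taught_all)
```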
5.13
Between Chapter 4 and the current chapter, we have shown that SQL is capable
of performing all operations of relational algebra. This fact is known as the
relational completeness of SQL. As we shall see in subsequent chapters, the
capabilities of SQL go well beyond the standard definition of relational algebra.
5.14
5.14.1
Numerical Functions
Among the numerical functions, abs, sin, cos, power, sqrt, etc. have quite obvious
definitions. For example, sqrt computes the square root of its argument, while
power(x, y) computes x^y.
This returns:
PTID DIST
---- ----------
a    0
b    1
c    2
d    1
e    1.41421356
f    2.23606798
g    2
h    2.23606798
i    2.82842712
j    3
k    3.16227766
l    3.60555128
We need to convert the angles to radians before sin is applied. This will return:
SIN30      SIN45      SIN60
---------- ---------- ----------
.5         .707106781 .866025404
Microsoft SQL Server has a simpler way of performing this type of computation, in that it does not require the fictitious table.
Example 5.14.3 In SQL Server we can simply write:
select sin(30*3.14159265359/180) as sin30,
sin(45*3.14159265359/180) as sin45,
sin(60*3.14159265359/180) as sin60;
5.14.2
String Functions
String functions can be used to transform strings, extract parts of strings, etc.
The functions upper and lower convert strings to uppercase and lowercase characters, respectively.
Example 5.14.4 To print names of students in capital characters and course
titles in small letters we can write:
select distinct upper(STUDENTS.name) as STNAME,
lower(COURSES.cname) as course
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
GRADES.cno = COURSES.cno;
data structures
introduction to computing
computer architecture
introduction to computing
These functions are particularly useful for performing string comparisons when
ignoring case. Thus,
upper('stephany') like 'STE%'
is true.
Example 5.14.5 The string function replace substitutes every occurrence of
its second argument in the value(s) specified by its first argument, by its third
argument. In the select written below the string 'Computer' is replaced by the
string 'Comp.':
select replace(cname, 'Computer', 'Comp.') from COURSES;
Example 5.14.6 The function concat computes the concatenation of two strings
that form its arguments. Its effect is identical to the concatenation operator ||
that we discussed in Example 5.6.11. The phrase below prints the state and zip
code of each student as a single string:
select name, addr, concat(state,zip) as state_zip from STUDENTS;
This returns:
NAME             ADDR            STATE_ZIP
---------------- --------------- ---------
Edwards P. David 10 Red Rd.      MA02159
Grogan A. Mary   Walnut St.      MA02148
Mixon Leatha     100 School St.  MA02146
McLane Sandy     30 Cass Rd.     MA02122
Novak Roland     42 Beacon St.   NH03060
Pierce Richard   70 Park St.     MA02146
Prior Lorraine   8 Beacon St.    MA02125
Rawlings Jerry   15 Pleasant Dr. MA02115
Lewis Jerry      1 Main Rd       RI02904
will return:
SUB
---
rac
yields:
SUBST
-----
racle
which is the string that begins with the second character of Oracle and ends
with the last character of this string.
Since the second argument of the function call in
select substr('Oracle',-4,3) from dual
is negative, the starting position of the substring is the 4th character counted
from the end (that is, the character a) and thus, the query returns:
SUB
---
acl
The functions lpad and rpad can be used to enhance presentation of results
of queries. The syntax of lpad is:
lpad(s, integer [, string])
The effect is to pad s on the left with spaces to bring the total length of the
string to the length specified by the second argument of the function. If the
third argument is present, then this string is repeated on the left to fill up the
padded string.
The function rpad has a similar syntax; however, the padding is done at the
right of s.
Example 5.14.8 To print a list of all employees and their salaries (using the
tabular variables EMPHIST and PERSINFO) we can use the query:
ANN_SAL
-------
$150000
$120000
$120000
$100000
$$70000
$$70000
$$90000
$$75000
$$70000
5.14.3
Date functions
SQL Plus contains a class of functions that apply to the DATE type: extract,
months_between, etc.
Example 5.14.9 The function extract computes a part of a date value. Its
first argument gives the desired date part; the second argument is the date
value. For instance, to obtain the year part of the appt_date attribute of the
table EMPHIST we write:
select empno, extract(year from appt_date) as start_y
from emphist;
This returns:
EMPNO      START_Y
---------- ----------
1000       1999
1005       1999
1010       2000
1015       1999
1020       1999
1025       2000
1030       2000
1035       2000
1040       2000
EMPNO      START_M
---------- ----------
1000       10
1005       10
1010       1
1015       10
1020       11
1025       3
1030       1
1035       2
1040       3
EMPNO      BONUS
---------- ----------
1000       10430.7253
1005       8262.69438
1010       7652.27254
1015       6804.93348
1020       4733.05642
1025       4155.51299
1030       5688.95627
1035       4550.04006
1040       4194.59488
5.15
Aggregate functions are those functions that operate on sets of values. Typical
examples include: sum, avg, max, min, and count.
The first four functions operate on columns of tables and ignore null values.
The count returns the number of elements of the set that is its argument.
Example 5.15.1 The following select construct determines the largest grade
obtained by the student whose student number is 1011. The function max is
applied to the set of grades of the student whose number is 1011 and returns
the largest value in this set:
select max(grade) as highgr from GRADES
where stno = 1011;
For instance, sum(A) returns the sum of all values of the selected nonnull
A-components of the tuples. Similarly, avg(A) returns the average value of the
same sequence. The expressions max(A) and min(A) yield the largest and the
smallest values in the set of A-components of the tuples selected by a query,
respectively.
The functions sum and avg apply to attributes whose domains are numerical
(such as integer or float); max and min apply to every kind of attribute.
If we wish to discard duplicate values from the sequences of values before
applying these functions, we need to use the word distinct. For instance,
sum(distinct A) considers only the distinct nonnull values that occur in the
sequence of components.
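The treatment of nulls and of distinct can be seen on a four-row column. The sketch below uses Python's sqlite3 module; the table T and its values are hypothetical, and count(*) is included only to contrast it with count(a), which skips the null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (a integer)")
# Two duplicate values, one null, one larger value.
conn.executemany("insert into T values (?)", [(10,), (10,), (None,), (40,)])

row = conn.execute("""
    select sum(a), avg(a), min(a), max(a),
           count(a), count(*),
           sum(distinct a), avg(distinct a)
    from T
""").fetchone()
print(row)
```

The null is ignored by sum, avg, min, max and count(a), so the average is 60/3; with distinct, the duplicate 10 is counted once, giving a sum of 50 and an average of 25.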
Example 5.15.2 We mentioned that the built-in functions max and min apply
to string domains as well as to numerical domains. We use this feature of these
functions to determine the first and the last student in alphabetical order:
Next, we show a select construct where the same functions are applied to
a numerical domain:
select min(grade) as lowgr,
max(grade) as highgr from GRADES
where stno = 1011;
If we use distinct in the query
select avg(distinct grade) as avggr from GRADES
where stno = 1011
then the average grade is lower, indicating a preponderance of the higher grades
for this student:
AVGGR
-----
68.33
Since no records exist for any grades given during that semester in cs110, we
obtain the answer:
COUNT(CNO)
----------
0
Observe that this table has a system-supplied column name COUNT(cno). This
happens because we did not provide a name using as.
Let us determine how many students have ever registered for any course. We
have to retrieve this result from GRADES, and we must use distinct to avoid
counting the same student several times (if the student took several courses):
select count(distinct stno) as nost
from GRADES;
Finally, let us determine the names of instructors who are teaching more
than one subject. For every instructor, we determine in a subquery the number
of courses taught. Then, we retain those instructors who taught more than one
course:
select name from INSTRUCTORS where
1 < any (select count(distinct cno) from GRADES
where empno = INSTRUCTORS.empno);
5.16
Sorting Results
Data obtained from a select construct may be sorted on one or several columns
using the order by clause. This clause also gives the user the possibility of
opting for an ascending or descending sorting order on each of the columns. By
default, the ascending order is chosen.
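The effect of the default and of an explicit desc can be sketched on a few rows. The example below runs through Python's sqlite3 module; the four GRADES rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES (stno integer, cno text, grade integer);
    insert into GRADES values (2, 'cs110', 70), (1, 'cs110', 80),
                              (1, 'cs210', 95), (2, 'cs210', 90);
""")

# No direction given: ascending order is used.
asc_grades = [r[0] for r in conn.execute(
    "select grade from GRADES order by grade")]

# Two sort columns: stno ascending (default), then grade descending.
by_student = conn.execute(
    "select stno, grade from GRADES order by stno, grade desc").fetchall()
print(asc_grades, by_student)
```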
Example 5.16.1 Suppose that we need to sort the GRADES tuples on the
student number. For each student, we sort the grades in descending order. This
can be done with the query:
select * from GRADES
order by stno, grade desc;
EMPNO CNO   SEM    YEAR GRADE
----- ----- ------ ---- -----
019   cs210 FALL   2003 90
056   cs240 SPRING 2004 90
023   cs110 SPRING 2003 75
019   cs110 FALL   2002 40
019   cs240 SPRING 2003 100
234   cs310 SPRING 2004 100
019   cs110 FALL   2002 80
019   cs210 FALL   2003 70
234   cs410 SPRING 2003 60
019   cs240 SPRING 2003 100
019   cs110 FALL   2002 95
019   cs210 FALL   2003 90
056   cs240 SPRING 2004 80
234   cs310 SPRING 2004 75
019   cs210 SPRING 2004 70
023   cs110 SPRING 2003 60
019   cs110 FALL   2002 100
056   cs240 SPRING 2004 70
019   cs210 SPRING 2004 85
234   cs410 SPRING 2003 80
019   cs240 SPRING 2003 50
Instead of using the name of the columns one could use their ordinal position
in the select phrase.
Example 5.16.2 An equivalent form of the query from Example 5.16.1 is
select stno, empno, cno, sem, year, grade
from GRADES
order by 1, 6 desc;
EMPNO CNO   SEM    YEAR GRADE
----- ----- ------ ---- -----
019   cs110 FALL   2002 40
019   cs110 FALL   2002 80
019   cs110 FALL   2002 95
019   cs110 FALL   2002 100
023   cs110 SPRING 2003 75
023   cs110 SPRING 2003 60
019   cs210 FALL   2003 90
019   cs210 FALL   2003 90
019   cs210 SPRING 2004 70
019   cs210 SPRING 2004 85
019   cs210 FALL   2003 70
234   cs310 SPRING 2004 100
234   cs310 SPRING 2004 75
234   cs410 SPRING 2003 60
234   cs410 SPRING 2003 80
019   cs240 SPRING 2003 100
056   cs240 SPRING 2004 80
019   cs240 SPRING 2003 50
056   cs240 SPRING 2004 70
056   cs240 SPRING 2004 90
019   cs240 SPRING 2003 100
stno empno cno   sem    year grade
---- ----- ----- ------ ---- -----
1011 019   cs110 FALL   2002 40
2661 019   cs110 FALL   2002 80
3566 019   cs110 FALL   2002 95
5544 019   cs110 FALL   2002 100
1011 023   cs110 SPRING 2003 75
4022 023   cs110 SPRING 2003 60
1011 019   cs210 FALL   2003 90
3566 019   cs210 FALL   2003 90
4022 019   cs210 SPRING 2004 70
5571 019   cs210 SPRING 2004 85
2661 019   cs210 FALL   2003 70
3566 019   cs240 SPRING 2003 100
5571 019   cs240 SPRING 2003 50
1011 056   cs240 SPRING 2004 90
4022 056   cs240 SPRING 2004 80
5544 056   cs240 SPRING 2004 70
2415 019   cs240 SPRING 2003 100
2661 234   cs310 SPRING 2004 100
4022 234   cs310 SPRING 2004 75
3442 234   cs410 SPRING 2003 60
5571 234   cs410 SPRING 2003 80
5.17
The group by clause serves to group together tuples of tables based on the
common value of an attribute or of a group of attributes. Suppose, for instance,
that we wish to partition the table GRADES into groups based on the course
number. This can be done by using a construct like
select ... from GRADES group by cno
Conceptually, we operate on the table shown in Figure 5.2. The reader should
imagine that the table has been divided into five groups, each corresponding
to one course. In the previous select, we left open the target list following
select. Once a table has been partitioned into groups (using group by), the
select construct that we use must return one or more atomic pieces of data for
every group. The term atomic, in this context, refers to simple pieces of data
(numbers, strings, etc.). By contrast, a set of values is not an atomic piece of
data. For instance, the number of students enrolled in each course can be listed
by:
select cno, count(stno) as totenr from GRADES
group by cno
empno cno   sem    year grade
----- ----- ------ ---- -----
019   cs110 FALL   2002 40
019   cs110 FALL   2002 80
019   cs110 FALL   2002 95
019   cs110 FALL   2002 100
023   cs110 SPRING 2003 75
023   cs110 SPRING 2003 60
019   cs210 FALL   2003 90
019   cs210 FALL   2003 90
019   cs210 FALL   2003 70
019   cs210 SPRING 2004 85
019   cs210 SPRING 2004 70
019   cs240 SPRING 2003 100
019   cs240 SPRING 2003 50
019   cs240 SPRING 2003 100
056   cs240 SPRING 2004 70
056   cs240 SPRING 2004 90
056   cs240 SPRING 2004 80
234   cs310 SPRING 2004 100
234   cs310 SPRING 2004 75
234   cs410 SPRING 2003 60
234   cs410 SPRING 2003 80
6
5
6
2
2
because more than one student is enrolled in a course, and therefore the entries
of the result under the attribute stno would be sets of values rather than simple
values. SQL enforces the atomicity of the data generated by a select with
group by by demanding that any component of the target list of such a select
must be either one of the grouping attributes or a built-in function.
Example 5.17.1 Grouping can be done on more than one attribute. Suppose
that now we are interested not in the total enrollment but, rather, in the enrollment numbers for each offering of the courses, that is, in the numbers during
every semester of every year. This can be done using the select construction:
select cno, sem, year, count(stno) as enrol
from GRADES
group by cno, year, sem
order by cno, sem, year;
CNO
-----
cs110
cs110
cs210
cs210
cs240
cs240
cs310
cs410
Example 5.17.2 The next select construct determines the average grade and
the number of courses taken by every student and sorts the results in ascending
order on the student number:
select stno, avg(grade) as average,
count(cno) as ncourses
from GRADES
group by stno
order by stno;
Grouping can be applied in combination with selection. In such cases, selection is applied first and the resulting rows are grouped.
Example 5.17.3 The select construct that follows determines the average
grade in cs110 during successive offerings of this course:
select sem, year, avg(grade) from GRADES
where cno = 'cs110'
group by sem, year
order by year, sem
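That selection precedes grouping can be seen in a small sketch run through Python's sqlite3 module. The rows are hypothetical; the cs210 grade of 100 falls in the same semester as two cs110 grades and would raise that group's average if it were not filtered out first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES (stno integer, cno text, sem text,
                         year integer, grade integer);
    insert into GRADES values
        (1, 'cs110', 'FALL', 2002, 40), (2, 'cs110', 'FALL', 2002, 80),
        (3, 'cs110', 'SPRING', 2003, 70), (1, 'cs210', 'FALL', 2002, 100);
""")

# The where clause discards the cs210 row before the groups are formed.
rows = conn.execute("""
    select sem, year, avg(grade) from GRADES
    where cno = 'cs110'
    group by sem, year
    order by year, sem
""").fetchall()
print(rows)
```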
As expected, this will return the same result as the query discussed in Example 5.12.3.
5.17.1
The function decode is typically used with four arguments and has the syntax:
decode(value, search_value, result, default_value)
The value returned by this function is:
\[
\mathrm{decode}(x, s, r, d) =
\begin{cases}
r & \text{if } x = s \\
d & \text{otherwise.}
\end{cases}
\]
Example 5.17.6 A course is defined as introductory if its first digit is one.
Using the decode function we can print a list of students and the courses they
took followed by an indication of their status using the query:
select stno,cno,
decode(substr(cno,3,1),'1','Introductory course','Advanced course')
from grades;
Note that the first digit of the course number is the third character of the cno
value; this digit is extracted by the function substr previously discussed. The
query yields the following result:
STNO       CNO   DECODE(SUBSTR(CNO,3
---------- ----- -------------------
1011       cs110 Introductory course
1011       cs110 Introductory course
1011       cs210 Advanced course
1011       cs240 Advanced course
2415       cs240 Advanced course
2661       cs110 Introductory course
2661       cs210 Advanced course
2661       cs310 Advanced course
3442       cs410 Advanced course
3566       cs110 Introductory course
3566       cs210 Advanced course
3566       cs240 Advanced course
4022       cs110 Introductory course
4022       cs210 Advanced course
4022       cs240 Advanced course
4022       cs310 Advanced course
5544       cs110 Introductory course
5544       cs240 Advanced course
5571       cs210 Advanced course
5571       cs240 Advanced course
5571       cs410 Advanced course
Example 5.17.7 The following variant of the previous query will print First
year course, Second year course, etc., depending on the first digit of the course
number:
select stno,cno,
decode(substr(cno,3,1),'1','First year course',
'2','Second year course',
'3','Third year course',
'4','Fourth year course',
'Special course')
from grades;
CNO   DECODE(SUBSTR(CNO,
----- ------------------
cs110 First year course
cs110 First year course
cs210 Second year course
cs240 Second year course
cs240 Second year course
cs110 First year course
cs210 Second year course
cs310 Third year course
cs410 Fourth year course
cs110 First year course
cs210 Second year course
cs240 Second year course
cs110 First year course
cs210 Second year course
cs240 Second year course
cs310 Third year course
cs110 First year course
cs240 Second year course
cs210 Second year course
cs240 Second year course
cs410 Fourth year course
end
In the first case the function returns the result that corresponds to the search
value that matches the first argument; in the second case, case returns the
result that corresponds to the first condition that is satisfied.
Example 5.17.8 Using case we can give an alternate solution to the query
solved in Example 5.17.7:
select stno,cno,
case substr(cno,3,1)
when '1' then 'First year course'
when '2' then 'Second year course'
when '3' then 'Third year course'
when '4' then 'Fourth year course'
else 'Special course'
end
from grades;
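The case expression is also available outside Oracle. As an illustration, the sketch below runs the same kind of query through Python's sqlite3 module on three hypothetical rows, one of which (cs999) falls through to the else branch; the column alias level is an assumption added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES (stno integer, cno text);
    insert into GRADES values (1, 'cs110'), (2, 'cs240'), (3, 'cs999');
""")

# The third character of cno is the first digit of the course number.
rows = conn.execute("""
    select stno, cno,
           case substr(cno, 3, 1)
               when '1' then 'First year course'
               when '2' then 'Second year course'
               when '3' then 'Third year course'
               when '4' then 'Fourth year course'
               else 'Special course'
           end as level
    from GRADES
    order by stno
""").fetchall()
print(rows)
```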
Example 5.17.9 Suppose that the minimal passing grade is 60 for the first
and second year courses and 70 for the third and fourth year courses. We wish
to print a report that prints Passed or Failed depending on the grade and
level of the course. This can be done with the following query:
select stno,cno, grade,
case when (substr(cno,3,1) in ('1','2') and grade >= 60) or
(substr(cno,3,1) in ('3','4') and grade >= 70)
then 'Passed'
else 'Failed'
end
from grades
CNO   GRADE      CASEWH
----- ---------- ------
cs110 40         Failed
cs110 80         Passed
cs110 95         Passed
cs110 100        Passed
cs110 75         Passed
cs110 60         Passed
cs240 100        Passed
cs240 50         Failed
cs240 100        Passed
cs410 60         Failed
cs410 80         Passed
cs210 90         Passed
cs210 70         Passed
cs210 90         Passed
cs210 85         Passed
cs210 70         Passed
cs240 70         Passed
cs240 90         Passed
cs240 80         Passed
cs310 100        Passed
cs310 75         Passed
5.17.2
For analyzing complex data, we often wish to partition data into blocks and then
calculate subtotals for these blocks. For example, we may wish to analyze sales
data by geographical region, so we want to calculate values for New England, the
Midwest, the South, etc. Such analyses are facilitated by ORACLE's rollup
extension of group by.
Example 5.17.10 Suppose that we need to print a report summarizing the
number of grades given in every course by every instructor. We wish to print
subtotals for every course and then a general total for all courses. This can be
done in SQL using three subqueries (each containing a group by clause) as
follows:
select cno,empno,count(grade)
from grades
group by cno,empno
union
select cno, '', count(grade)
from grades
group by cno
union
select '', '', count(grade)
from grades;
EMPNO COUNT(GRADE)
----- ------------
019   4
023   2
      6
019   5
      5
019   3
056   3
      6
234   2
      2
234   2
      2
      21
It is clear that the execution of this query entails three scans of the table
GRADES followed by the computation of the unions. The result is sorted because
of the use of the union operation.
In SQL Plus we can replace the cumbersome query used in Example 5.17.10
by:
select cno,empno,count(grade)
from grades
group by rollup(cno,empno);
which produces exactly the same result. Note that after the number of grades
for the first two groups are reported in the first two detail rows a blank is printed
for the empno of the third row; this is the rollup way of indicating that this
row contains the subtotal number of grades for the course cs110. A new detail
row follows for cs210 and, since this course is taught only by the employee 019,
the next row contains a subtotal for this course, etc. Finally, the last row, with
blank for the first two columns is the total number of grades for all courses.
We conclude that the rollup extension of group by generates subtotals in
increasing order of aggregation until all expressions in the group by clause are
rolled up.
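For systems without rollup, the same levels of aggregation can be produced by hand. SQLite, for instance, has no rollup extension, so the sketch below (run through Python's sqlite3 module on four hypothetical rows) emulates rollup(cno, empno) by grouping on each prefix of the attribute list, (cno, empno), (cno), and the empty list, and marks rolled-up columns with null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES (cno text, empno text, grade integer);
    insert into GRADES values ('cs110', '019', 40), ('cs110', '019', 80),
                              ('cs110', '023', 75), ('cs210', '019', 90);
""")

# One select per prefix of (cno, empno); null stands for a rolled-up column.
rows = conn.execute("""
    select cno, empno, count(grade) from GRADES group by cno, empno
    union all
    select cno, null, count(grade) from GRADES group by cno
    union all
    select null, null, count(grade) from GRADES
""").fetchall()
rollup = set(rows)
print(sorted(rows, key=str))
```

The result holds three detail rows, one subtotal per course, and one grand total.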
Example 5.17.11 The next example uses three grouping attributes cno, empno,
stno:
select cno,empno,stno,count(grade)
from grades
group by rollup(cno,empno,stno)
EMPNO
----------019
019
019
019
019
023
023
023
019
019
019
019
019
019
019
019
019
STNO
COUNT(GRADE)
---------- -----------1011
1
2661
1
3566
1
5544
1
4
1011
1
4022
1
2
6
1011
1
2661
1
3566
1
4022
1
5571
1
5
5
2415
1
3566
1
5571
1
019
056
056
056
056
1011
4022
5544
234
234
234
2661
4022
234
234
234
3442
5571
3
1
1
1
3
6
1
1
2
2
1
1
2
2
21
The order in which attributes are rolled up influences the result of the query
as the next example shows:
Example 5.17.12 Suppose that we invert the grouping attributes cno and
empno as in
select empno,cno, count(grade)
from grades
group by rollup(empno,cno);
CNO   COUNT(GRADE)
----- ------------
cs110 4
cs210 5
cs240 3
      12
cs110 2
      2
cs240 3
      3
cs310 2
cs410 2
      4
      21
Note that this time the subtotals are computed for every employee, and then,
for all employees.
Partial rollups, that is, rollups that involve only a subset of the grouping
attributes, are always possible as shown in the next example.
Example 5.17.13 Suppose that we need to count the number of times a student takes a course and the number of course offerings a student took. This can
be achieved by:
CNO   COUNT(GRADE)
----- ------------
cs110 2
cs210 1
cs240 1
      4
cs240 1
      1
cs110 1
cs210 1
cs310 1
      3
cs410 1
      1
cs110 1
cs210 1
cs240 1
      3
cs110 1
cs210 1
cs240 1
cs310 1
      4
cs110 1
cs240 1
      2
cs210 1
cs240 1
cs410 1
      3
This shows that the student whose number is 1011 took four course offerings
and repeated cs110. Note that for a partial rollup no general total is produced.
The rollup extension is especially useful when there exists a natural order
on the attributes of a table, as is in the next example.
Example 5.17.14 Suppose that we have the table SALES that contains records
of sales in a chain of department stores that is present in several regions of the
country: the North East (NE), South East (SE), and Midwest (MW).
REGION ST CITY          STORENO SALESVOL
------ -- ------------- ------- --------
NE     NY New York City 55      1000
NE     NY New York City 67      800
NE     NY Syracuse      90      600
NE     MA Worcester     41      1000
NE     MA Boston        83      750
SE     FL Miami         62      450
SE     FL Miami         74      900
SE     GA Atlanta       60      500
SE     GA Atlanta       52      1100
SE     GA Augusta       95      300
MW     OH Athens        48      590
MW     KS Topeka        33      860
MW     KS Lawrence      72      300
MW     KS Lawrence      09      700
MW     KS Wichita       38      900
ST SUM(SALESVOL)
-- -------------
KS 2760
OH 590
   3350
MA 1750
NY 2400
   4150
FL 1350
GA 1900
   3250
   10750
ST
-KS
KS
KS
CITY
SUM(SALESVOL)
--------------- ------------Lawrence
1000
Topeka
860
Wichita
900
MW
MW
MW
MW
NE
NE
NE
NE
NE
NE
NE
SE
SE
SE
SE
SE
SE
Boston
Worcester
New York City
Syracuse
FL Miami
FL
GA Atlanta
GA Augusta
GA
2760
590
590
3350
750
1000
1750
1800
600
2400
4150
1350
1350
1600
300
1900
3250
Another useful extension of group by is cube. The rollup extension summarizes at increasing levels of aggregation from left to right; in contrast, cube
summarizes at all possible levels of aggregation.
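The difference between the two extensions can be made concrete by enumerating the groupings that cube considers: one for every subset of the grouping attributes, rather than one for every prefix. The sketch below is plain Python, not SQL; the function name cube_counts and the four (cno, empno) rows are hypothetical, and the sketch counts rows, which agrees with count(grade) only when no grade is null.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (cno, empno) pairs, one per grade record.
records = [("cs110", "019"), ("cs110", "019"), ("cs110", "023"), ("cs210", "019")]

def cube_counts(rows, n_attrs):
    """Count rows for every subset of the attributes; None marks a
    column that has been aggregated away."""
    result = Counter()
    for k in range(n_attrs + 1):
        for kept in combinations(range(n_attrs), k):
            for row in rows:
                key = tuple(row[i] if i in kept else None
                            for i in range(n_attrs))
                result[key] += 1
    return result

cube = cube_counts(records, 2)
for key in sorted(cube, key=str):
    print(key, cube[key])
```

Compared with a rollup on (cno, empno), the result additionally contains the per-employee totals (None, '019') and (None, '023').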
Example 5.17.15 A full aggregation can be achieved by using cube as in:
select cno,empno,count(grade)
from grades
group by cube(cno,empno);
EMPNO COUNT(GRADE)
----- ------------
019   4
023   2
      6
019   5
      5
019   3
056   3
      6
234   2
      2
234   2
      2
019   12
023   2
056   3
234   4
      21
will result in
EMPNO CNO   COUNT(GRADE)
----- ----- ------------
019   cs110 4
019   cs210 5
019   cs240 3
019         12
023   cs110 2
023         2
056   cs240 3
056         3
234   cs310 2
234   cs410 2
234         4
      cs110 6
      cs210 5
      cs240 6
      cs310 2
      cs410 2
            21
The totals computed by either of these cubes are shown in Figure 5.4.
Partial cube aggregations include group by clauses of the form
group by $A_1, \ldots, A_k$, cube($B_1, \ldots, B_\ell$)
and compute total values of an aggregate function for all groups that can be
obtained for values of $A_1, \ldots, A_k$ and all combinations of values of $B_1, \ldots, B_\ell$.
Example 5.17.16 The partial cube aggregation:
select cno,empno,stno,count(grade) from grades
group by cno,cube(empno,stno)
EMPNO
----------019
019
019
019
019
023
STNO
COUNT(GRADE)
---------- -----------1011
1
2661
1
3566
1
5544
1
4
1011
1
[Figure 5.4: totals computed by cube, per course (cno) and per employee (empno)]
023
023
4022
1011
2661
3566
4022
5544
019
019
019
019
019
019
1011
2661
3566
4022
5571
1011
2661
3566
4022
5571
019
019
019
019
056
056
056
056
2415
3566
5571
1011
4022
5544
1011
2415
3566
4022
5544
5571
234
234
234
2661
4022
2661
4022
234
234
234
53 rows selected.
3442
5571
3442
5571
1
2
2
1
1
1
1
6
1
1
1
1
1
5
1
1
1
1
1
5
1
1
1
3
1
1
1
3
1
1
1
1
1
1
6
1
1
2
1
1
2
1
1
2
1
1
2
EMPNO NOGR       C          E
----- ---------- ---------- ----------
019   4          0          0
023   2          0          0
      6          0          1
019   5          0          0
      5          0          1
019   3          0          0
056   3          0          0
      6          0          1
234   2          0          0
      2          0          1
234   2          0          0
      2          0          1
019   12         1          0
023   2          1          0
056   3          1          0
234   4          1          0
      21         1          1
17 rows selected.
In turn, we can use the grouping values and the having clause to retain only
certain summary rows as in
select cno, empno, count(grade) as nogr,
grouping(cno) as c, grouping(empno) as e
from grades
group by cube(cno,empno)
having grouping(cno) = 1 or grouping(empno) = 1
NOGR C E
---- - -
5    0 1
6    0 1
2    0 1
2    0 1
12   1 0
2    1 0
3    1 0
4    1 0
21   1 1
5.18
The function rank that we use in this query computes for each row a numerical rank starting from the content of the window.
The analytical clause used in the previous example indicates that the rows
retrieved by the query are partitioned based on the value of the number of
credits (noc) and, then in each group the rows are ordered according to the
values of the gpa attribute.
In general, the computation of the analytical clause is done after the computation of the from, where, group by, and having clauses.
Analytic functions are classified as shown in the table below:
USAGE
Calculating ranks, percentiles and n-tiles
Cumulative and moving averages
Calculating shares
Finding a value in a row located a
specified number of rows from the current row
Linear regression and other statistics
5.18.1
Ranking Functions
SQL Plus contains the ranking functions rank() and dense_rank() that can
be used to rank tuples in an order determined by certain attributes or expressions. Both functions generate ranks in either ascending or descending order,
but dense_rank() does not leave gaps in rank numbers when a tie occurs. The
default order is, as usual, ascending order.
Example 5.18.1 To rank the grade records based on the grade obtained in
any course we may write:
select stno, grade,
rank() over (order by grade)
from grades;
75  8
75  8
80  10
80  10
80  10
85  13
90  14
90  14
90  14
95  17
100 18
100 18
100 18
100 18
where the highest ranking is attributed to the grade record that involves the
lowest grade. To reverse the ranking we write:
select stno, grade,
rank() over (order by grade desc)
from grades;
which yields:
STNO            GRADE RANK
---------- ---------- ----
5544              100    1
3566              100    1
2415              100    1
2661              100    1
3566               95    5
1011               90    6
1011               90    6
3566               90    6
5571               85    9
2661               80   10
5571               80   10
4022               80   10
1011               75   13
4022               75   13
2661               70   15
5544               70   15
4022               70   15
4022               60   18
3442               60   18
5571               50   20
1011               40   21
Note that the first four grade records are tied for first place; therefore,
the record that follows the tied records has rank 5. With dense_rank(), all
four tied records still have rank 1, but the record that follows has rank 2.
This can be achieved by writing:
select stno, grade,
dense_rank() over (order by grade desc) as den_rank
from grades;
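The difference between the two ranking functions can be checked on a small scale. The following is a minimal sketch, not the book's own setup: it assumes Python's standard sqlite3 module with an SQLite build that supports window functions (3.25 or later) instead of SQL Plus, and it loads only the 21 grade values that appear in the descending ranking shown above (the stno column is omitted). The function name rank_and_dense_rank is hypothetical.

```python
import sqlite3

# Grade values copied from the descending ranking output above (stno omitted).
GRADES = [100, 100, 100, 100, 95, 90, 90, 90, 85, 80, 80, 80,
          75, 75, 70, 70, 70, 60, 60, 50, 40]


def rank_and_dense_rank(grades):
    """Return (grade, rank, dense_rank) rows, highest grade first."""
    con = sqlite3.connect(":memory:")  # throwaway in-memory database
    con.execute("create table grades (grade integer)")
    con.executemany("insert into grades values (?)", [(g,) for g in grades])
    rows = con.execute(
        """
        select grade,
               rank()       over (order by grade desc) as rnk,
               dense_rank() over (order by grade desc) as den_rank
        from grades
        order by grade desc
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for grade, rnk, den in rank_and_dense_rank(GRADES):
        print(grade, rnk, den)
```

In this sketch the four grades of 100 receive rank 1 under both functions, while the grade 95 receives rank 5 from rank() and rank 2 from dense_rank().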
3
3
3
2
1
1
2
2
2
3
4
4
If we wish to rank the students based on the number of courses and, then,
at an equal number of courses, to rank them in the order of the grade point
average, we could write the following query:
select STUDENTS.name, GA.noc as no_of_c, GA.gpa as gpa,
rank() over (partition by GA.noc
order by GA.gpa desc) as rank
from (select stno,
count(distinct cno) as noc,
avg(grade) as gpa
from GRADES
group by stno) GA, STUDENTS
where STUDENTS.stno = GA.stno
The partition by option establishes groups of equal GA.noc value, and then it
ranks the records in each such group using the order by clause. The result of
this query is:
NAME                NO_OF_C        GPA       RANK
---------------- ---------- ---------- ----------
Grogan A. Mary            1        100          1
Novak Roland              1         60          2
Rawlings Jerry            2         85          1
Pierce Richard            3         95          1
Mixon Leatha              3      83.33          2
Edwards P. David          3      73.75          3
Lewis Jerry               3      71.66          4
Prior Lorraine            4      71.25          1
8 rows selected.
In general, the expression in the partition by clause divides the set of rows
that results from the query in groups and the rank() function operates within
these groups; in other words, rank() is reset when the defining expression of the
group changes. The order by clause attached to the rank specifies the ranking
criterion and the order of the rows in each group.
5.18.2
Top-n Queries
Top-n queries ask for the n largest or smallest values of a column. Such queries
are solved in ORACLE using the pseudo-attribute ROWNUM, which assigns a value
starting with 1 to each of the rows returned by a subquery. Thus, a top-n query
in SQL Plus requires the following elements:
1. a subquery containing the order by clause that ensures that the rows
retrieved by the subquery are placed in the proper order;
2. the main query that includes the ROWNUM pseudo-attribute and may include
a where clause to specify the number of returned rows.
Example 5.18.3 To retrieve the top three students in the order of their grade
point averages we write:
select ROWNUM as rank, name, avgg from
(select STUDENTS.stno, STUDENTS.name, avg(grade) as avgg
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
group by STUDENTS.stno, STUDENTS.name
order by avg(grade) desc)
where ROWNUM <= 3
NAME               AVGG
--------------- -------
Grogan A. Mary      100
Pierce Richard       95
Rawlings Jerry       85
will yield:
RANK NAME               AVGG
---- --------------- -------
   1 Novak Roland         60
   2 Prior Lorraine    71.25
   3 Lewis Jerry       71.67
Example 5.18.4 Ties between rows may eliminate rows that we would expect
to see in results of our queries. The next query
NAME                    NOC
---------------- ----------
Prior Lorraine            4
Edwards P. David          3
Mixon Leatha              3
Pierce Richard            3
Lewis Jerry               3
Rawlings Jerry            2
Grogan A. Mary            1
Novak Roland              1
To retrieve the first four students among the ones who took the largest
number of courses we write:
select ROWNUM as rank, name, noc
from (select STUDENTS.stno, STUDENTS.name,
count(distinct cno) as noc
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno
group by STUDENTS.stno, STUDENTS.name
order by count(distinct cno) desc)
where ROWNUM <= 4
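Because ROWNUM simply numbers the rows in the order delivered by the subquery, a cut at ROWNUM <= 4 keeps only some of the students who are tied at three courses. One way to keep all tied rows is to filter on rank() instead. The sketch below is an illustration under stated assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later) rather than ORACLE, the table name counts and the function top_n_with_ties are hypothetical, and the rows are the (name, noc) pairs from the listing above.

```python
import sqlite3

# (name, noc) pairs copied from the course-count listing above.
COURSE_COUNTS = [
    ("Prior Lorraine", 4), ("Edwards P. David", 3), ("Mixon Leatha", 3),
    ("Pierce Richard", 3), ("Lewis Jerry", 3), ("Rawlings Jerry", 2),
    ("Grogan A. Mary", 1), ("Novak Roland", 1),
]


def top_n_with_ties(rows, n):
    """Return (name, noc, c_rank) for every row whose rank is at most n."""
    con = sqlite3.connect(":memory:")
    con.execute("create table counts (name text, noc integer)")
    con.executemany("insert into counts values (?, ?)", rows)
    result = con.execute(
        """
        select name, noc, c_rank
        from (select name, noc,
                     rank() over (order by noc desc) as c_rank
              from counts)
        where c_rank <= ?
        order by c_rank, name
        """,
        (n,),
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in top_n_with_ties(COURSE_COUNTS, 4):
        print(row)
```

With n = 4 this sketch returns five rows: the student with four courses (rank 1) and all four students tied at three courses (rank 2), so no tied row is cut off arbitrarily.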
NAME           C_RANK
-------------- ------
Grogan A. Mary      7
Pierce Richard      4
Rawlings Jerry      6
5.18.3
Windowing functions are used in SQL Plus to compute cumulative, moving, and
other aggregate functions applied to a set of tuples called a window. The size
and shape of the window are always defined relative to a row in a block; this
reference row is called the current row.
Aggregate functions that can be used include sum, avg, min, max, statistical functions (discussed in Section 5.19), as well as two special functions,
first_value and last_value, that return the first and last values in a window.
Example 5.18.6 To compute the evolution of the grade average for each student as he or she advances towards graduation, we can write a query that returns
the cumulative average for each student for the sequence of semesters when the
student is active:
select stno, year, sem,
avg(grade) over (partition by stno
order by year, sem desc
rows unbounded preceding) as ag
from grades
order by stno, year, sem desc;
      YEAR SEM            AG
---------- ------ ----------
      2003 SPRING         60
      2002 FALL           95
      2003 SPRING       97.5
      2003 FALL           95
      2003 SPRING         60
      2004 SPRING         65
      2004 SPRING         70
      2004 SPRING      71.25
      2002 FALL          100
      2004 SPRING         85
      2003 SPRING         50
      2003 SPRING         65
      2004 SPRING      71.67
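The words unbounded preceding mean that the window over which we compute
the grade average starts at the first row of the student's partition and extends
through the current row; that is, it contains the current row and all rows that
involve the same student and precede it.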
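The behavior of such a cumulative window can be traced on a few rows. The following is a minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) in place of SQL Plus; the student number 9001 and its three grades are made-up illustrative values, not data from the college database, and the function name cumulative_averages is hypothetical.

```python
import sqlite3

# Hypothetical grade history for one made-up student; not data from the text.
SAMPLE_ROWS = [
    ("9001", 2002, "FALL", 95),
    ("9001", 2003, "SPRING", 100),
    ("9001", 2003, "FALL", 90),
]


def cumulative_averages(rows):
    """Return (stno, year, sem, ag), where ag is the running average per student."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table grades (stno text, year integer, sem text, grade integer)"
    )
    con.executemany("insert into grades values (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        select stno, year, sem,
               avg(grade) over (partition by stno
                                order by year, sem desc
                                rows unbounded preceding) as ag
        from grades
        order by stno, year, sem desc
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in cumulative_averages(SAMPLE_ROWS):
        print(row)
```

For these three rows the running averages are 95, then (95 + 100)/2 = 97.5, then (95 + 100 + 90)/3 = 95. Note that ordering by sem desc places SPRING before FALL within a year only because of the alphabetical order of the two strings.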
The syntax of the windowing functions is:
aggregate function (value expression | *)
over ([partition by value expression {, value expression}]
order by value expression [collate clause]
[asc | desc] [nulls first | nulls last]
{, value expression [collate clause]
[asc | desc] [nulls first | nulls last]}
[rows | range]
[[unbounded preceding | value expression preceding] |
between [unbounded preceding | value expression preceding]
and [current row | value expression following]])
5.19
Statistics in SQL
5.19.1
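Population and sample variance can be computed using the functions var_pop
and var_samp, respectively. Both functions take an attribute as argument and
are applied to the values that remain after the null values are discarded. If the
sequence of values of an attribute A is (x_1, ..., x_n), then the population variance is:
\[ \mathrm{var\_pop}(A) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n^2}, \]
and the sample variance is:
\[ \mathrm{var\_samp}(A) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}, \]
where \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \). As is shown in statistics, the sample variance is an
unbiased estimator of the theoretical variance.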
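The two closed forms can be checked numerically. The sketch below is an illustration in plain Python, not a description of how a DBMS evaluates these functions; the names var_pop and var_samp are reused here only as Python helper names, and returning None for an empty input is an assumption of the sketch. The grade list is the set of 21 grade values shown in the ranking output of Section 5.18.1.

```python
from statistics import pvariance, variance

# The 21 grade values listed in the ranking output of Section 5.18.1.
GRADES = [100, 100, 100, 100, 95, 90, 90, 90, 85, 80, 80, 80,
          75, 75, 70, 70, 70, 60, 60, 50, 40]


def var_pop(values):
    """Population variance: (n*sum(x^2) - (sum x)^2) / n^2, nulls (None) ignored."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    if n == 0:
        return None  # assumption of this sketch: no values, no result
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / n ** 2


def var_samp(values):
    """Sample variance: (n*sum(x^2) - (sum x)^2) / (n*(n-1)), nulls ignored."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    if n < 2:
        return None  # mirrors the null result for a one-value sample
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


if __name__ == "__main__":
    print(round(var_pop(GRADES), 6))      # closed form
    print(round(pvariance(GRADES), 6))    # library cross-check
    print(round(var_samp(GRADES), 6), round(variance(GRADES), 6))
```

For these values the sum is 1660 and the sum of squares is 137000, so the population variance is 121400/441, which rounds to 275.283447. For a two-value group such as (100, 70) the population variance is 225 and the sample variance is 450.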
Example 5.19.1 To determine the population variance for the grade population of each student we group the records of GRADES on the student number
and then compute the population variance for each group. This is done by the
following select phrase:
select stno, var_pop(grade)
from GRADES
group by stno;
which returns:
STNO       VAR_POP(GRADE)
---------- --------------
1011               417.18
2415                    0
2661               155.55
3442                    0
3566                16.66
4022                54.68
5544                  225
5571               238.88
which yields:
STNO       VAR_SAMP(GRADE)
---------- ---------------
1011                556.25
2415
2661                233.33
3442
3566                    25
4022                 72.91
5544                   450
5571                358.33
8 rows selected.
To compute the population variance of the grade over the entire GRADES table we
write:
select var_pop(grade)
from GRADES;
which gives:
VAR_POP(GRADE)
--------------
275.283447
A similar select
select var_samp(grade)
from GRADES;
If the set of values of the sample contains one value, then the function
var_samp returns a null value. This is the case in the query:
select var_samp(grade)
from GRADES
where stno = 1011 and cno = 'cs110'
and year = 1999;
which yields:
VAR_SAMP(GRADE)
---------------

returns:
VARIANCE(GRADE)
---------------
0
The population standard deviation and the sample standard deviation, which are
the square roots of the population and the sample variance, respectively, can be
computed using the functions stddev_pop and stddev_samp, respectively.
Example 5.19.2 To compute the population standard deviation of the set of
values of the grade for each student we write:
which generates:
STNO       STDDEV_SAMP(GRADE)
---------- ------------------
1011                    23.58
2415
2661                    15.27
3442
3566                        5
4022                     8.53
5544                    21.21
5571                    18.92
8 rows selected.
The population and the sample covariances between the values that appear
under the attributes T.A and S.B are computed using the functions covar_pop
and covar_samp, respectively, as in the following select phrases:
select covar_pop(T.A,S.B) from T,S where T.C = S.D;
select covar_samp(T.A,S.B) from T,S where T.C = S.D;
Example 5.19.3 The table SSTUDY, whose creation is described in
Appendix B, records the number of hours slept during three successive nights
by a group of students. To determine the population covariance between the
average number of hours slept and the grade point average of the students we
write:
5.19.2
Linear Regression
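It is possible to show that the minimum of
\[ E = \sum_{i=1}^{n} (y_i - a x_i - b)^2 \]
is achieved when:
\[ a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad
   b = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}. \]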
Thus, we obtain the regression line y = ax + b, where a is the slope and b is
the intercept. These numbers are computed by the functions regr_slope and
regr_intercept, respectively. Both take as arguments the averages of the x-sequence and the y-sequence. The quality of the regression line obtained can be
evaluated using the goodness-of-fit function regr_r2, which takes the same arguments as
the functions mentioned above.
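The formulas for the slope and the intercept can be exercised directly. The sketch below is plain Python under stated assumptions: the function names regression_line and r_squared are hypothetical, the data points are made up, and the goodness of fit is computed as one minus the ratio of the residual sum of squares to the total sum of squares, which is the usual definition and is assumed here rather than taken from the text.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sx * sxy) / denom
    return a, b


def r_squared(xs, ys):
    """Goodness of fit: 1 - (residual sum of squares / total sum of squares)."""
    a, b = regression_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        return 1.0  # convention of this sketch: constant y is fitted exactly
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    hours = [1, 2, 3]    # made-up x values
    grades = [2, 2, 5]   # made-up y values
    print(regression_line(hours, grades), r_squared(hours, grades))
```

For the made-up points (1, 2), (2, 2), (3, 5) the formulas give a = (3*21 - 6*9)/(3*14 - 36) = 1.5 and b = (9*14 - 6*21)/6 = 0, with a goodness of fit of 0.75.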
Example 5.19.5 To compute the regression parameters for the sequences of
average grades and the sequence of average hours of nightly sleep for all students
we write:
select regr_count(g.avggrade, s.avghours) as rc,
regr_avgx(g.avggrade, s.avghours) as avgx,
regr_avgy(g.avggrade, s.avghours) as avgy,
regr_slope(g.avggrade, s.avghours) as slope,
regr_intercept(g.avggrade, s.avghours) as interc,
regr_r2(g.avggrade, s.avghours) as gof
from (select stno, avg(grade) as avggrade
from GRADES
group by stno) g,
(select stno, avg(no_hours) as avghours
from SSTUDY
group by stno) s
where g.stno = s.stno;
[Figure 5.5: drawing of the graph G of Example 5.20.1]
5.20
Graphs represent binary relations on sets, in the sense of the following definition.
A graph is defined as a pair of sets G = (V, E), where V is the set of vertices of
G and E ⊆ V × V is the set of edges of G. Clearly, E is a binary relation on V.
If (u, v) ∈ E, we say that u is the origin of the edge (u, v) and v is the destination
of the same edge. A graph can be drawn by representing the vertices by points
and the edges by arrows. Namely, if (u, v) is an edge, we draw an arrow that begins
at u and ends at v.
Example 5.20.1 Consider the graph G = (V, E), where V = {0, 1, 2, 3, 4, 5, 6}
and E = {(0, 1), (0, 3), (1, 2), (2, 5), (2, 6), (3, 4), (3, 6), (4, 5), (5, 6)}. This graph
is drawn in Figure 5.5.
Graphs can be represented by tables that have the heading origin destination. Each edge (u, v) corresponds to a pair in the table. Clearly, for any graph
the corresponding table contains the same information as the graph.
Example 5.20.2 The graph of Example 5.20.1 is represented by the table:
GRAPH
origin  destination
0       1
0       3
1       2
2       5
2       6
3       4
3       6
4       5
5       6
length of the path. A path that begins and ends in the same vertex is a cycle
or a loop. If a graph has no cycles, then we say that the graph is acyclic. Note
that the graph defined in Example 5.20.1 is acyclic.
We write (u, v) ∈ E⁺ if there exists a path of length at least 1 that has u as
its origin and v as its destination. The relation E⁺ is the transitive closure of the
relation E.
Example 5.20.3 The transitive closure of the relation E defined by the graph
of Example 5.20.1 consists of the following pairs:
(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6),
(1, 2), (1, 5), (1, 6), (2, 5), (2, 6), (3, 4),
(3, 5), (3, 6), (4, 5), (4, 6), (5, 6)
as can be easily seen by inspecting Figure 5.5.
Of course, the transitive closure E⁺ of a relation E ⊆ V × V is itself a relation
on the set V and, therefore, it can also be represented as a table. Namely, the
tabular representation of E⁺ is:
GRAPHPLUS
origin  destination
0       1
0       2
0       3
0       4
0       5
0       6
1       2
1       5
1       6
2       5
2       6
3       4
3       5
3       6
4       5
4       6
5       6
In Chapter 8 we discuss an algorithm that can be used to compute the transitive
closure for arbitrary graphs (with or without loops).
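Independently of the algorithm of Chapter 8, the transitive closure of a small graph can be computed with a recursive query. The following is a minimal sketch under stated assumptions: it uses Python's sqlite3 module and SQLite's with recursive construct rather than SQL Plus, and the function name transitive_closure is hypothetical. The edge list is the relation E of Example 5.20.1.

```python
import sqlite3

# The edge relation E of Example 5.20.1.
EDGES = [(0, 1), (0, 3), (1, 2), (2, 5), (2, 6),
         (3, 4), (3, 6), (4, 5), (5, 6)]


def transitive_closure(edges):
    """Return the sorted pairs (u, v) such that a path of length >= 1 joins u to v."""
    con = sqlite3.connect(":memory:")
    con.execute("create table graph (origin integer, destination integer)")
    con.executemany("insert into graph values (?, ?)", edges)
    pairs = con.execute(
        """
        with recursive graphplus(origin, destination) as (
            select origin, destination from graph
            union            -- union (not union all) removes duplicates,
                             -- so the recursion also stops on cyclic graphs
            select p.origin, g.destination
            from graphplus p join graph g on p.destination = g.origin
        )
        select origin, destination from graphplus
        order by origin, destination
        """
    ).fetchall()
    con.close()
    return pairs


if __name__ == "__main__":
    for pair in transitive_closure(EDGES):
        print(pair)
```

Every edge of E is a path of length 1, so each of the nine edges, including (5, 6), belongs to the result; for this graph the sketch produces 17 pairs in total.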
If a data set has a hierarchical structure, then it can be described by a rooted
tree, that is, by a special acyclic graph G = (V, E) that has a distinguished
vertex v0 called root such that for every other vertex v of the graph there is
a unique path that joins v0 to v. It is not difficult to show that for any two
distinct vertices u, v of a rooted tree there exists at most one path that joins u
to v. If such a path exists then we say that v is a descendant of u.
Example 5.20.5 The option connect by of SQL Plus can be used to find the
descendants of a vertex in a rooted tree. Consider, for example the rooted tree
shown in Figure 5.6. The table that represents this tree is created by the SQL
script included in Appendix F and has the form:
[Figure 5.6: drawing of the rooted tree of Example 5.20.5]
TREE
origin  destination
0       1
0       2
0       3
1       4
1       5
2       6
2       7
2       8
3       9
3       10
7       11
7       12
To retrieve all the descendants of a vertex (in this case, of vertex 2) we write:
select distinct destination as DESCENDANTS from tree
start with origin = 2
connect by origin = prior destination;
This returns:
DESCENDANTS
-----------
6
7
8
11
12
On the other hand, to retrieve the ancestors of a vertex, that is all vertices
that occur between the root of the tree and a vertex we write:
select distinct origin as ANCESTORS from tree
start with destination = 12
connect by destination = prior origin;
ANCESTORS
----------
0
2
7
The reserved word prior can be used on either side of the equality sign. For
example, the last query of Example 5.20.5 can be written as:
select distinct origin ANCESTORS from tree
start with destination = 12
connect by prior origin = destination;
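The connect by option is specific to ORACLE. As a hedged illustration of the same traversals in a system without it, the sketch below uses Python's sqlite3 module and a recursive common table expression; this is a swapped-in technique, not the mechanism described in the text. The function names descendants and ancestors and the column name lvl (a counter of edges from the starting vertex) are hypothetical. The edge list is the table TREE of Example 5.20.5.

```python
import sqlite3

# The table TREE of Example 5.20.5 as (origin, destination) pairs.
TREE = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6), (2, 7),
        (2, 8), (3, 9), (3, 10), (7, 11), (7, 12)]


def _load(edges):
    con = sqlite3.connect(":memory:")
    con.execute("create table tree (origin integer, destination integer)")
    con.executemany("insert into tree values (?, ?)", edges)
    return con


def descendants(edges, vertex):
    """Return (lvl, destination) for every descendant of vertex."""
    con = _load(edges)
    rows = con.execute(
        """
        with recursive walk(lvl, destination) as (
            select 1, destination from tree where origin = ?
            union all
            select w.lvl + 1, t.destination
            from walk w join tree t on t.origin = w.destination
        )
        select lvl, destination from walk order by lvl, destination
        """,
        (vertex,),
    ).fetchall()
    con.close()
    return rows


def ancestors(edges, vertex):
    """Return (lvl, origin) for every ancestor of vertex, nearest first."""
    con = _load(edges)
    rows = con.execute(
        """
        with recursive walk(lvl, origin) as (
            select 1, origin from tree where destination = ?
            union all
            select w.lvl + 1, t.origin
            from walk w join tree t on t.destination = w.origin
        )
        select lvl, origin from walk order by lvl
        """,
        (vertex,),
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(descendants(TREE, 2))
    print(ancestors(TREE, 12))
```

For vertex 2 the sketch finds the descendants 6, 7, 8 at distance 1 and 11, 12 at distance 2; for vertex 12 it finds the ancestors 7, 2, 0 at distances 1, 2, 3. Because union all is used, the sketch assumes an acyclic table such as TREE.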
The pseudo-attribute LEVEL can be used to indicate the length of the path
that begins at the starting vertex of the query and ends with the vertex currently
retrieved.
Example 5.20.6 The following query adds the pseudo-attribute LEVEL to the
query of Example 5.20.5 that retrieves the descendants of the vertex 2:
select distinct level, destination as DESCENDANTS from tree
start with origin = 2
connect by origin = prior destination
Observe that the immediate descendants are at level 1 and the next level of
descendants at level 2.
If we retrieve the ancestor of a node as in
select distinct level, origin as ANCESTORS from tree
start with destination = 12
connect by destination = prior origin;
the values of LEVEL reflect the distance (in number of edges) between the vertex
and its various ancestors:
     LEVEL  ANCESTORS
---------- ----------
         1          7
         2          2
         3          0
Example 5.20.7 Combining the string function lpad and the pseudo-attribute
LEVEL allows us to display the entire tree using indentations. The query:
select level, lpad('*', 2 * level - 1)||destination as vertex from tree
start with origin = 0
connect by prior destination = origin;
VERTEX
---------
*1
  *4
  *5
*2
  *6
  *7
    *11
    *12
  *8
*3
  *9
  *10
A description of a tree that shows the paths used to reach its vertices
can also be obtained using two pseudo-attributes,
CONNECT_BY_ISLEAF and SYS_CONNECT_BY_PATH. CONNECT_BY_ISLEAF returns 1 if
the current vertex (in our case, the destination of the edge) is a leaf, and 0
otherwise. For every edge of the path that joins the starting vertex to the current
node the pseudo-attribute SYS_CONNECT_BY_PATH computes a string specified by
its first argument; entries between successive edges are separated by the string
specified by its second argument.
Example 5.20.8 The query:
select level,destination,
CONNECT_BY_ISLEAF "IsLeaf?",
SYS_CONNECT_BY_PATH('('||origin||','||destination||')','+') "Path"
from tree
start with origin = 0
connect by prior destination = origin
order by level
will return:
LEVEL DE IsLeaf? Path
    1  1       0 +(0,1)
    1  2       0 +(0,2)
    1  3       0 +(0,3)
    2  4       1 +(0,1)+(1,4)
    2  7       0 +(0,2)+(2,7)
    2  6       1 +(0,2)+(2,6)
    2 10       1 +(0,3)+(3,10)
    2  9       1 +(0,3)+(3,9)
    2  8       1 +(0,2)+(2,8)
    2  5       1 +(0,1)+(1,5)
    3 11       1 +(0,2)+(2,7)+(7,11)
    3 12       1 +(0,2)+(2,7)+(7,12)
5.21
Updates in SQL
There are three constructs in SQL that allow us to update the tables of a
relational database: update, insert, and delete.
The update construct modifies components of tuples. It applies to all tuples
of the specified table unless limited by a where clause.
Example 5.21.1 Recall the table EMPHIST introduced in Example 3.3.5. A
script to create and populate the tables discussed in that example is contained
in the script ced.sql that is available in Appendix C.
To give all current employees a 10% raise, we apply the following update
phrase:
update EMPHIST
set salary = 1.1* salary
where term_date is null;
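The effect of the where clause can be observed on a toy table. The sketch below is an illustration only: it assumes Python's sqlite3 module, a simplified EMPHIST with just three columns (of which empno and the two sample rows are hypothetical, while salary and term_date come from the update phrase above), and a hypothetical function name raise_current_salaries.

```python
import sqlite3


def raise_current_salaries(rows):
    """Apply the 10% raise to rows with a null term_date; return (empno, salary)."""
    con = sqlite3.connect(":memory:")
    # Simplified, hypothetical layout; the real EMPHIST is defined in ced.sql.
    con.execute("create table emphist (empno text, salary real, term_date text)")
    con.executemany("insert into emphist values (?, ?, ?)", rows)
    con.execute(
        "update emphist set salary = 1.1 * salary where term_date is null"
    )
    result = con.execute(
        "select empno, salary from emphist order by empno"
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [("001", 1000.0, None),            # current employee
              ("002", 2000.0, "2001-06-30")]    # terminated employee
    print(raise_current_salaries(sample))
```

Only the row whose term_date is null is modified; without the where clause the raise would be applied to every tuple of the table.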
we can load this table using data from the existing table GRADES using the
construct:
insert into ASSIGN(empno, cno, sem, year)
select distinct empno, cno, sem, year
from GRADES;
cno    sem     year
cs110  Fall    2001
cs210  Fall    2002
cs210  Spring  2003
cs240  Spring  2002
cs110  Spring  2002
cs240  Spring  2003
cs310  Spring  2003
cs410  Spring  2002
If the components of the tuple to be inserted into a table violate the declaration of the table (e.g., a null value for a not null attribute, or a character
string for a numerical attribute), the DBMS should reject the insertion.
Likewise, the delete construct deletes rows of tables.
Example 5.21.4 To delete the rows of the table ASSIGN that correspond to
courses taught by the instructor whose employee number is 234, we write:
delete from ASSIGN where empno = 234;
The directive:
delete from GRADES where grade is null;
Example 5.21.5 The following delete eliminates all rows of the table ASSIGN:
delete from ASSIGN;
5.22
Access Rights
The grant operation assigns access rights to users. To delegate access rights to
other users, a user must own these rights. The set of access rights includes
select, insert, update, and delete and refers to the right of executing each
of these operations on a table. Further, update can be restricted to specific
columns.
All these access rights are granted to the creator of a table automatically.
The creator, in turn, may grant access rights to other users or to all users
(designated in SQL as public). The SQL standard envisions a mechanism that
can limit the excessive proliferation of access rights. Namely, a user may receive
the select right with or without the right to grant this right to others by his
own action.
Example 5.22.1 Suppose that the user alex owns the table COURSES and
intends to grant the select right on this table to the user whose name is peter.
The user alex can accomplish this by
grant select on COURSES to peter
Now, peter has the right to query the table COURSES but he may not propagate
this right to the user ellie. In order for this to happen, alex would have to
use the directive:
grant select on COURSES to peter
with grant option
Example 5.22.2 If peter owns the table STUDENTS, then he may delegate
the right to query the table and the right to update the columns addr, city and
zip to ellie using the directive:
grant select, update(addr, city, zip) on
STUDENTS to ellie
5.23
Views in SQL
Views are virtual tabular variables. This means that in SQL a view is referenced
for retrieval purposes in exactly the same way a tabular variable is referenced.
The only difference is that a view does not have a physical existence. It exists
only as a definition in the database catalog. We refer to real tabular variables
(that is, the tabular variables that have a physical existence in the database) as
base tabular variables.
Views are supported in both SQLPlus and in Transact SQL but not in the
current version (4.1) of MySQL.
To illustrate the notion of view, let us consider the following example.
Example 5.23.1 Suppose that we write:
create view STC as
select STUDENTS.name, GRADES.cno
from STUDENTS, GRADES
where STUDENTS.stno = GRADES.stno;
The select construct contained by this create view retrieves all pairs (s, c) of
student names and course numbers such that the student whose name is s has
registered for the course c.
When this directive is executed by SQL, no data retrieval takes place. The
database system simply stores this definition in its catalog. The definition of the
view STC becomes a persistent object, that is, an object that exists after our
interaction with the DBMS has ceased. From a conceptual point of view, the
user treats STC exactly like any other tabular variable. Suppose, for instance
that we wish to retrieve the names of students who took cs110. In this case it
is sufficient to write the query:
In reality, SQL combines this select phrase with the query just shown and
executes the modified query:
select
The previous example shows that views in SQL play a role similar to the role
played by macros in programming languages.
Views are important for data security. A user who needs to have access only
to the list of names of students and the courses they are taking needs to be aware
only of the existence of STC. If the user is authorized to use only select constructs, then the user can ignore whether STC is a table or a view. Confidential
data (such as grades obtained in specific courses) can be completely protected
in this manner. Also, the queries that this limited-access user may write are
simpler and easier to understand. No space is wasted with the view STC, and
the view remains current always, reflecting the contents of the tabular variables
STUDENTS and GRADES.
SQL treats views exactly as it treats the tabular variables as far as retrieval
is concerned. We can also delegate the select privilege to a view in exactly
the same way as we did for a tabular variable. For instance, if the user george
created the view STC, then he can give the select right to vanda by writing:
grant select on STC to vanda;
The purpose of this view is to ensure the privacy of students. Any user who has
access only to this view can retrieve the student number and name of a student,
but not the address of the student.
There is a fundamental difference between the views introduced in Examples 5.23.1 and 5.23.2, and this refers to the ways in which these two views
behave with respect to updates.
Suppose that the user wishes to insert the pair (7799, Jane Jones) in the
view SNA. The user may ignore entirely the fact that SNA is not a base tabular
variable, because the effect of this insertion on the base tabular variable
is unequivocally determined: the system inserts in the tabular variable
STUDENTS the tuple (7799, Jane Jones, null, null, null, null). On the other hand,
we cannot insert a tuple in a meaningful way in the view STC introduced in
Example 5.23.1. Indeed, if we attempt to insert a pair (s, c) in STC, then we have
to define the effect of this insertion on the base tabular variables. This is clearly
impossible: we do not know what the student number is, what the identification
of the instructor is, etc. SQL forbids users to update views based on more than
one table (as STC is). Even if such updates would have an unambiguous effect
on the base tabular variables, this rule rejects any such update. Only some views
based on exactly one tabular variable can be updated. It is the responsibility
of the database administrator to grant to the user the right to update a view
only if that view can be updated.
If a view can be updated, then its behavior is somewhat different from the
base tabular variable on which the view is built. An update made to a view
may cause one or several tuples to vanish from the view, whenever we retrieve
the tuples of the view.
Example 5.23.3 Consider the view UPPERGR defined by:
create view UPPERGR as
select * from GRADES where grade > 75;
If we wish to examine the tuples that satisfy the definition of the view we use
the construction:
select * from UPPERGR;
EMPNO       CNO   SEM          YEAR      GRADE
----------- ----- ------ ---------- ----------
019         cs110 FALL         1999         80
019         cs110 FALL         1999         95
019         cs110 FALL         1999        100
019         cs240 SPRING       2000        100
019         cs240 SPRING       2000        100
234         cs410 SPRING       2000         80
019         cs210 FALL         2000         90
019         cs210 FALL         2000         90
019         cs210 SPRING       2001         85
056         cs240 SPRING       2001         90
056         cs240 SPRING       2001         80
234         cs310 SPRING       2001        100
makes the first row disappear, since it no longer satisfies the definition of the
view. Indeed, if we use again the same query on UPPERGR, we obtain:
STNO       EMPNO       CNO   SEM          YEAR      GRADE
---------- ----------- ----- ------ ---------- ----------
3566       019         cs110 FALL         1999         95
5544       019         cs110 FALL         1999        100
           019         cs240 SPRING       2000        100
           019         cs240 SPRING       2000        100
           234         cs410 SPRING       2000         80
           019         cs210 FALL         2000         90
           019         cs210 FALL         2000         90
           019         cs210 SPRING       2001         85
           056         cs240 SPRING       2001         90
           056         cs240 SPRING       2001         80
           234         cs310 SPRING       2001        100
The standard syntax of create view allows us to use the clause with check
option. When this clause is used, every insertion and update done through the
view is verified to make sure that a tuple inserted through the view actually
appears in the view and an update of a row in the view does not cause the row
to vanish from the view.
The syntax of create view is:
create view view as
subselect
[with check option]
A view V can be dropped from a database by using the construct
drop view V;
If we drop a tabular variable from the database, then all views based on that
table are automatically dropped; if we drop a view, then all other views that
use the view that we drop are also dropped.
Views are useful instruments in implementing generalizations. Suppose that
we began the construction of the college database from the existing tabular
variables UNDERGRADUATES and GRADUATES that modelled sets of entities
having the same name, where
heading (UNDERGRADUATES ) = stno name addr city state zip major
heading (GRADUATES ) = stno name addr city state zip qualdate
Then, the tabular variable STUDENTS could have been obtained as a view
built from the previous two base tabular variables by
create view STUDENTS as
select stno, name, addr, city, state, zip
from UNDERGRADUATES
union
USER_CATALOG
TABLE_NAME   TABLE_TYPE
STUDENTS     TABLE
INSTRUCTORS  TABLE
COURSES      TABLE
GRADES       TABLE
ADVISING     TABLE
5.24
The catalog of ORACLE is a very large tabular variable that can be accessed
through several views defined on this table.
In ORACLE a list of the tables owned by the current user is contained in
the view USER_CATALOG, also accessible through its synonym CAT. The content of this
view is shown in Figure 5.24.
Information that describes space allocation and statistical properties can be
found in the view named USER_TABLES, also named TABS. A description of
the attributes of tabular variables and of their domains can be found in the view
USER_TAB_COLUMNS, also accessible as COLS. For example, the query:
select table_name,column_name,data_type from COLS;
TABLE_NAME   COLUMN_NAME  DATA_TYPE
             STNO         CHAR
             EMPNO        CHAR
             CNO          CHAR
             CNAME        CHAR
             CR           NUMBER
             STNO         CHAR
             EMPNO        CHAR
             CNO          CHAR
             SEM          CHAR
             YEAR         NUMBER
             GRADE        NUMBER
             EMPNO        CHAR
             NAME         CHAR
             RANK         CHAR
             ROOMNO       NUMBER
             TELNO        CHAR
             STNO         CHAR
             NAME         CHAR
STUDENTS     ADDR         CHAR
STUDENTS     CITY         CHAR
STUDENTS     STATE        CHAR
STUDENTS     ZIP          CHAR
A more complete list of objects that belong to the current user can be found
in the view USER_OBJECTS, which lists all objects created by the user, including
those mentioned in USER_CATALOG, as well as other useful information (such
as the date of creation, the last time when the object was affected by a data
definition statement, the status of the object, etc.).
The definition of views can be accessed through the USER_VIEWS catalog view.
Example 5.24.1 The meta-view (view about views) USER_VIEWS has the
structure described below:
Name                            Null?    Type
------------------------------- -------- --------------
VIEW_NAME                       NOT NULL VARCHAR2(30)
TEXT_LENGTH                              NUMBER
TEXT                                     LONG
TYPE_TEXT_LENGTH                         NUMBER
TYPE_TEXT                                VARCHAR2(4000)
OID_TEXT_LENGTH                          NUMBER
OID_TEXT                                 VARCHAR2(4000)
VIEW_TYPE_OWNER                          VARCHAR2(30)
VIEW_TYPE                                VARCHAR2(30)
SUPERVIEW_NAME                           VARCHAR2(30)
The last six attributes are important for object views discussed in Chapter 7.
To extract the definition of the view UPPERGR defined above we write:
select text from user_views where view_name = 'UPPERGR';
5.25
Exercises
24. Find the names of students who have failed all their courses (failing is
defined as a grade less than 60).
25. Find the names of students who do not have an advisor.
26. Find the names of instructors who taught every semester when a student
from Rhode Island was enrolled.
27. Find course names of courses taken by every student advised by Prof.
Evans.
28. Find names of students who took every course taught by an instructor
who is advising at least two students.
29. Find names of instructors who teach every student they advise.
30. Find names of students who are taking every course taught by their advisor.
31. Find course numbers of courses taken by every student who lives in Rhode
Island.
32. Find the student numbers of students who took at least two courses.
33. Find the course names of courses in which at least three students were
enrolled.
34. Find the names of instructors who advise at least two students.
35. List all students by name, along with their grade averages.
36. Find student numbers of students for whom the difference between the
highest and the lowest grade is less than 20.
37. Print a report that contains for each course (cno), the number of students
who took the course, the highest, the lowest, and the average grade in the
course.
38. Find the average grade of students who took cs110 at any time. Then,
find students whose grades in cs110 were above the average.
39. Identify those queries that require division among the queries 3 to 34 and
solve those queries using the group by option of SQL.
40. Create views on the college database as specified:
(a) A view that contains the names of the instructors, the courses (cnos)
that they teach, and the average grade in these courses.
(b) A view that shows the names and offices of the instructors.
(c) A view that contains the courses (cnos) , the number of students who
took the courses, the average grade in these courses, and the highest
grade.
(d) A view that contains the names of instructors and the names of the
students that they advise.
(e) A view that shows the data about the students in Massachusetts.
41. Print the contents of the views created in Exercise 40.
42. Determine which of the views created in Exercise 40 can be updated.
43. Using the views created in Exercise 40(a) and 40(c) create a view that
lists the instructors and the total number of students they teach.
44. Solve the following queries:
(a) list names of instructors and the number of courses they taught;
(b) list instructors in the order of the number of courses they taught;
(c) list the top three instructors in the order of the number of courses
they taught.
45. Let GRAPH be the table introduced in Example 5.20.2. The degree of a
vertex is the number of edges incident to that vertex.
(a) write an SQL query that yields a list of vertices of a graph arranged
in the decreasing order of their degrees;
(b) list the top 5 vertices of a graph in increasing order of their degrees.
46. For each instructor, list the sequence of the numbers of courses that the
instructor taught during each of the semesters that he or she was active.
47. List the top three instructors in the order of the number of students that
they advise.
5.26
Bibliographical Comments