Sie sind auf Seite 1von 17

Overview

Implementing the relational model


SQL
Queries

Relational
Database Systems 1

Data manipulation language (next lecture)

Wolf-Tilo Balke
Joachim Selke

Data definition language (next lecture)

SELECT
INSERT
UPDATE
DELETE
CREATE TABLE
ALTER TABLE
DROP TABLE

Institut fr Informationssysteme
Technische Universitt Braunschweig
www.ifis.cs.tu-bs.de

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Exercise 6.1

Exercise 6.1
Give relational algebra expressions for the following
queries in natural language:
a) Create a list of all hotels that includes the total number
of rooms for each hotel.
hotelName, count roomNo(
hotelNo, hotelNamecount(roomNo) (Hotel Room))

A hotel database:

Hotel(hotelNo, hotelName, city)


Room(roomNo, hotelNo Hotel, type, price)
Booking(hotelNo Hotel/Room, guestNo Guest,
dateFrom, dateTo, roomNo Room)
Guest(guestNo, guestName, guestAddress)
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

b) Whats the average room price at the Balke Inn?


average price(average(price) (
hotelName = Balke Inn(Hotel) Room))
3

Exercise 6.1

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Exercise 6.2

c) How many different people made a reservation for a


room whose price is higher than the average room
price at the respective hotel?
AvgPrice hotelNo, average price(
hotelNoaverage(price) (Hotel Room))
ExpensiveRooms
hotelNo, roomNo(price > average price(Room AvgPrice))
RichGuests = guestNo(Booking ExpensiveRooms)

Describe (using natural language) the relations


that would be produced by the following
tuple relational calculus expressions:
a) { H.hotelName | hotel(H) H.city = Braunschweig }
List the names of all hotels in Braunschweig!

b) { H.hotelName | Hotel(H) R (
Room(R) H.hotelNo = R.hotelNo R.price > 50)}
List the names of all hotels offering a room that
costs more than 50!

Result = count(guestNo)RichGuests
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Of course, this does not work for hotels without any rooms.
However, lets assume that there is no such hotel ...

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Exercise 6.2

Exercise 6.2
d) { H.hotelName, G.guestName, B1.dateFrom, B2.dateFrom |

c) { H.hotelName | Hotel(H) B G (

Hotel(H) Guest(G) Booking(B1) Booking(B2)


H.hotelNo = B1.hotelNo G.guestNo = B1.guestNo
B2.hotelNo = B1.hotelNo B2.guestNo = B1.guestNo
B2.dateFrom B1.dateFrom}

Booking(B) Guest(G) H.hotelNo = B.hotelNo


B.guestNo = G.guestNo
G.guestName = Wolf-Tilo Balke)}
List the names of all hotels in that Wolf-Tilo Balke
has or had a reservation!

List the names of all hotels and guests that


have or had different reservations at the same hotel;
for each hotel and guest, also list all possible pairs
of starting dates for the corresponding reservations.

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Exercise 6.3

b) List all single rooms with a price below 20 per night.


RA: hotelName, city, roomNo(
type = single price < 20 (Room Hotel))

a) List all hotels.

TRC: { h.hotelName, h.city, r.roomNo |


Hotel(h) Room(r) r.type = single
r.price < 20 h.hotelNo = r.hotelNo }

RA: hotelName(Hotel)
TRC: { h.hotelName | Hotel(h) }

DRC: { h2, h3, r1 | r4, h1 (Room(r1, h1, single, r4)


r4 < 20 Hotel(h1, h2, h3)) }

DRC: { h2 | h1, h3 Hotel(h1, h2, h3) }

Exercise 6.3

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

10

Exercise 6.3

c) List the price and type of all rooms at the Balke Inn.
RA: roomNo, type, price(Room hotelName = Balke Inn(Hotel))
TRC: { r.roomNo, r.roomType, r.price | Room(r)
h (Hotel(h) h.hotelName = Balke Inn
h.hotelNo = r.hotelNo) }
DRC: { r1, r3, r4 | h1, h3 (Room(r1, h1, r3, r4)
Hotel(h1, Balke Inn, h3)) }

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Exercise 6.3

Generate the relational algebra, tuple relational


calculus, and domain relational calculus
expressions for the following queries:

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Hint: (hotelNo, guestNo, dateFrom) is the primary key of


Booking, which makes each result line a pair of reservations

11

d) List the details (guestNo, guestName, and guestAddress)


of all guests currently staying at the Balke Inn.

RA: Guest dateFrom TODAY dateTo TODAY


hotelName = Balke Inn(Hotel Booking)
TRC: { g | Guest(g) b, h (Booking(b) Hotel(h)
b.guestNo = g.guestNo b.hotelNo = h.hotelNo
b.dateFrom TODAY b.dateTo TODAY
h.hotelName = Balke Inn) }

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

12

Exercise 6.3

Exercise 6.3

DRC: { g1, g2, g3 | Guest(g1, g2, g3) h1, h3, b3, b4, b5 (
Hotel(h1, Balke Inn, h3)
Booking(h1, g1, b3, b4, b5) b3 TODAY
b4 TODAY) }
e) List the details of all rooms at the Balke Inn,
including the name of the guest currently staying
in the room, if the room is occupied.
RA: roomNo, type, price, guestName(Room
(hotelName = Balke Inn(Hotel)
dateFrom TODAY dateTo TODAY(Booking)
Guest))
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

TRC:
BalkeRoom(r) Room(r) h (Hotel(h)
h.hotelName = Balke Inn h.hotelNo = r.hotelNo)
CurrentBooking(g, r) Guest(g) b (Booking(b)
b.hotelNo = r.hotelNo b.roomNo = r.roomNo
b.dateFrom TODAY b.dateTo TODAY
b.guestNo = g.guestNo)

{ r.roomNo, r.type, r.price, g.guestName | BR(r)


(CB(g, r) (g CB(g, r) g = (null, null, null, null))) }

13

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Exercise 6.3

7.1 From Theory to Practice

DRC:
BalkeRoom(r1, r2, r3, r4) Room(r1, r2, r3, r4)
h3 Hotel(r2, Balke Inn, h3)
CurrentBooking(g2, r1, r2) g1, g3 (Guest(g1, g2, g3)
b3, b4 (Booking(r2, g1, b3, b4, r1)

b3 TODAY b4 TODAY))

{ r1 r3, r4, g2 | r2 BR(r1, r2, r3, r4)

(CB(g2, r1, r2) (g2 CB(g2, r1, r2) g2 = null)) }


Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

In the early 1970s, the relational model


became a hot topic database research
Based on set theory
A relation is a subset of the
Cartesian product over a list of domains

Early query interfaces for the relational model:


Relational algebra
Tuple relational calculus (SQUARE, SEQUEL)
Domain relational calculus (QBE)

Question: How to build a working database


management system using this theory?
15

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.1 From Theory to Practice

16

7.1 From Theory to Practice


The challenge of the System R project was to
create a working prototype system

System R was the first working prototype of


a relational database system (starting 1973)

Theory is good
But developers were willing to sacrifice theoretical beauty
and clarity for the sake of usability and performance

Most design decisions taken during the


development of System R substantially influenced
the design of subsequent systems

Vocabulary change

Questions

How to store and represent data?


How to query for data?
How to manipulate data?
How do you do all this with good performance?
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

14

17

Mathematical terms were too unfamiliar for most people


Table = relation
Row = tuple
Column = attribute
Data type, domain = domain
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

18

7.1 From Theory to Practice

7.1 From Theory to Practice

Design decisions:
During the development of System R, two major
and very controversial decisions had been made

Duplicates
In a relation, there cannot be any duplicate tuples
Also, query results cannot contain duplicates

Allow duplicate tuples


Allow NULL values

Remember: The relational algebra and relational calculi


all had implicit duplicate elimination

Those decisions are still subject to discussions

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

19

7.1 From Theory to Practice

Decision: Dont eliminate duplicates in results


What about the tables?

You want to query for name and birth year


of all students of TU Braunschweig
The result returns roughly 13,000 tuples
Probably there are some duplicates
Its 1973, and your computer has 16 kilobytes of
main memory and a very slow external storage device
To eliminate duplicates, you need to store the result,
sort it, and scan for adjacent duplicate lines

Again: Ensuring that no duplicates end up in the tables


requires some work
Engineers also concluded that there is actually no need in
enforcing the no-duplicate policy
If the user wants duplicates and is willing to deal with all the
arising problems then thats fine

Decision: Allow duplicates in tables


As a result, the theory underlying relational databases
shifted from set theory to multi-set theory

System R engineers concluded that this effort


is not worth the effect
Duplicate elimination in result sets happens only on-request

Straightforward, only notation is more complicated


21

7.1 From Theory to Practice

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

22

7.1 From Theory to Practice

Sometimes, an attribute value is not known or


an attribute does not apply for an entity

Possible solution:
For each domain, define a value indicating that
data is not available, not known, not applicable,

Example: What value should the attribute


universityDegree take for the entity Heinz Mller,
if Heinz Mller does not have any degree?
Example: You regularly observe the weather and
store temperature, wind strength, and
air pressure every hour and then
your barometer breaks... What now?

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

20

7.1 From Theory to Practice

Practical considerations

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

For example, use none for Heinz Mllers degree,


use 1 for missing pressure data, ...
Problem:
You need such a special value for each domain or use case
You need special failure handling for queries, e.g.
compute average of all pressure values that are not 1

23

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

24

7.1 From Theory to Practice

7.1 From Theory to Practice

Again, system designers chose the simplest solution


(regarding implementation): NULL values

Another tricky problem:


How should users query the DB?
Classical answer:

NULL is a special value which is usable in any domain


and represents that data is just there

Relational algebra and relational calculi


Problem: More and more non-expert users

There are many interpretations of what NULL actually means

Systems have some default rules how to deal with


NULL values

More natural query interfaces:

Aggregation functions usually ignore rows with NULL values


(which is good in most, but not all cases)
Three-valued logic
However, creates some strange anomalies
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

QBE (query by example)


SEQUEL (structured English query language)
SQL: the current standard; derived from SEQUEL
25

7.2 SQUARE & SEQUEL

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

26

7.2 SQUARE & SEQUEL

The design of relational query languages

Current paradigms at the time:

Donald D. Chamberlin and Raymond F. Boyce


worked on this task
Both of IBM Research in San Jose, California
Main concern: Querying relational databases
is too difficult with current paradigms

Relational algebra
Requires users to define how and in which order data
should be retrieved
The specific choice of a sequence of operations has an
enormous influence on the systems performance

Relational calculi (tuple, domain)


Provide declarative access to data, which is good
Just state what you want and not how to get it
Relational calculi are quite complex:
many variables and quantifiers

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

27

7.2 SQUARE & SEQUEL

SQUARE is a notation for (or interface to) TRC


No quantifiers, implicit notation of variables
Adds additional functionality needed in practice
(grouping, aggregating, among others)
Solves safety problem by introducing the
closed world assumption

Specifying queries as relational expressions


Based directly on tuple relational calculus
Main observations:
Most database queries are rather simple
Complex queries are rarely needed
Quantification confuses people
Under the closed-world assumption,
any TRC expression with quantifiers can be replaced
by a join of quantifier-free expressions
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

28

7.2 SQUARE & SEQUEL

Chamberlin and Boyces first result was a


query language called SQUARE

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

29

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

30

7.2 SQUARE & SEQUEL

7.2 SQUARE & SEQUEL

Retrieve the names of all female students

Get a list of all exam results better than 2.0


along with the according student name

TRC: { t.name | students(t) t.sex = f }


SQUARE: namestudentssex (f)
Conditions
What part of the
result tuples
should be returned?

Attributes with conditions


The range relation of
the result tuples

Get all exam results better than 2.0 in course 101

TRC:
{ t1.name, t2.result | students(t1) exams(t2)
t1.matNr = t2.matNr t2.result < 2.0 }
SQUARE:
matNr examsresult (<2.0)
name result students matNr

TRC:
{ t.result | exams(t) t.crsNr = 101 t.result < 2.0}

Join of two SQUARE queries

Also, , , and can be used


to combine SQUARE queries.

SQUARE: resultexamscrsNr, result (101, <2.0)


Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

31

7.2 SQUARE & SEQUEL

32

7.2 SQUARE & SEQUEL


In 1974, Chamberlin & Boyce proposed SEQUEL

Also, SQUARE is relationally complete

Structured English Query Language


Based on SQUARE

You do not need explicit quantifiers


Everything you need can be done using
conditions and query combining

Guiding principle:

However, SQUARE was not well received

Use natural English keywords to structure queries


Supports fluent vocalization and notation

Syntax was difficult to read and parse, especially


when using text console devices:
name result students matNr
crsNr result (102, <2.0)

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

matNr exams

SQUAREs syntax is too mathematical and artificial


Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

33

7.2 SQUARE & SEQUEL

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

34

7.2 SQUARE & SEQUEL


Get all exam results better than 2.0 for course 101

Fundamental keywords

SQUARE:

SELECT: What attributes should be retrieved?


FROM: What relations are involved?
WHERE: What conditions should hold?

resultexamscrsNr result (101, <

2.0)

SEQUEL:

SELECT result FROM exams


WHERE crsNr = 101 AND result < 2.0

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

35

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

36

7.2 SQUARE & SEQUEL

7.2 SQUARE & SEQUEL

Get a list of all exam results better than 2.0,


along with the according student names

IBM integrated SEQUEL into System R


It proved to be a huge success

SQUARE:
name result

students matNr matNr result examsresult (< 2.0)

SEQUEL:
SELECT name, result
FROM students, exams
WHERE students.matNr = exams.matNr
AND result < 2.0
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Structured query language

Patented in 1985 by IBM

37

38

There are three major classes of DB operations:

Since then, SQL has been adopted by all(?)


relational database management systems
This created a need for standardization:

Defining relations, attributes, domains, constraints, ...


Data definition language (DDL)

Adding, deleting and modifying tuples


Data manipulation language (DML)

1986: SQL-86 (ANSI standard, ISO standard)


SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008
The official pronunciation is es queue el

Asking queries
Often part of the DML

SQL covers all these classes


In this lecture, we will use IBM DB2s SQL dialect

However, most database vendors treat the


standard as some kind of recommendation

DB2 is used in our SQL lab


Similar notation in other RDBMS
(at least for the part of SQL we teach you in this lecture)

More on this later (next lecture)


39

7.3 Overview of SQL

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

40

7.3 Overview of SQL


Environment
BigCompany database server

According to the SQL standard, relations and other


database objects exist in an environment
Think environment = RDBMS

Catalog
humanResources

Each environment consists of catalogs


Think catalog = database

Schema
people

Each catalog consists of a set of schemas


Think schema = group of tables (and other stuff)

Schema taxes
...

Catalog
production
Schema products
...

Table staff

A schema is a collection of database objects


(tables, domains, constraints, ...)

Table
has Office

Schema
training
...

Each database object belongs to exactly one schema


Schemas are used to group related database objects
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.3 Overview of SQL

7.2 SQUARE & SEQUEL

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Unfortunately, the name SEQUEL already


has been registered as a trademark
by the Hawker Siddeley aircraft company
Name has been changed to SQL (spoken: Sequel)

41

Schema testing
...

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

42

7.3 Overview of SQL

7.3 Overview of SQL

When working with the environment,


users connect to a single catalog and
have access to all database objects in this catalog

After connecting to a catalog, database objects can be


referenced using their qualified name
Qualified name = schemaname.objectname

However, when working only with objects from a


single schema, using unqualified names would be nice

However, accessing/combining data objects from different


catalogs usually is not possible
Thus, typically, catalogs are the maximum scope
over that SQL queries can be issued
In fact, the SQL standard defines an additional layer
in the hierarchy on top of catalogs

Unqualified name = objectname

One schema always is defined to be the default schema


SQL implicitly treats objectname as defaultschema.objectname
The default schema can be set to schemaname as follows:
SET SCHEMA schemaname
Initially, after login: default schema = current user name

Clusters are used to group related catalogs


According to the standard, they provide the maximum scope
However, hardly any vendor supports clusters
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Remember to change the default schema accordingly!


43

7.4 SQL Queries

SQL queries are statements that


retrieve information from a DBMS

Simple SELECT
Joins
Set operations
Aggregation and grouping
Subqueries
Writing good SQL code

Simplest case: From a single table,


return all rows matching some given criterion
SQL queries may return multi-sets (bags) of rows
Duplicates are allowed by default
(but can be eliminated on request)
Even just a single value is a row set
(one row with one column)
However, often its just called a result set

45

7.4 SQL Queries

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

46

7.4 Enforcing Sets in SQL

Basic structure of SQL queries:


SELECT <attribute list>
FROM <table list>
WHERE <condition>

SQL performs duplicate elimination


of rows in result set
May be expensive (due to sorting)
DISTINCT keyword is used

Attribute list: Attributes to be returned (projection)


Table list: All tables involved
Condition: A Boolean expression that
intensionally defines the result set
If no condition is provided, it is implicitly replaced by TRUE

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

44

7.4 SQL Queries

SQL queries

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

47

Example:
SELECT DISTINCT name FROM staff
Returns all different names of staff members,
without duplicates

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

48

7.4 Attribute Names

7.4 Attribute Names


Attribute names are qualified or unqualified

The SELECT keyword is sometimes a bit


confusing, since it actually denotes a projection
To return all attributes under consideration,
the wildcard * may be used
Examples:

Unqualified: Just the attribute name


Only possible, if attribute name is unique among
the tables given in the FROM clause

Qualified: tablename.attributename

SELECT * FROM
Return all attributes of the tables in the FROM clause
SELECT movies.* FROM movies, persons WHERE ...
Return all attributes of the movies table
49

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.4 Attribute Names

The result table of an SQL query has no name


But tables can be given alias names to simplify queries
(also called tuple variables or just aliases)
Indicated by the AS keyword

Example:

Example:

SELECT title, genre


FROM movies AS m, genres AS g
WHERE m.id = g.id

SELECT persons.personName AS name


FROM persons, WHERE name = Smith AND ...
personName is now called name in the result schema
name = Smith means persons.personName = Smith

The AS keyword is optional:


51

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.4 Basic Select

FROM movies m, genres g

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

52

7.4 Conditions
,

ALL
expression

AS

DISTINCT

Usually, SQL queries return exactly those tuples


matching a given search condition

column name

*
table name

Indicated by the WHERE keyword


The condition is a logical expression which can be
applied to each row and may have one of three values
TRUE, FALSE, and NULL

attribute names
,

table names

table name

AS
alias name

( query )

WHERE

50

Table names can be referenced in the same way


as attribute names (qualified or unqualified)
However, renaming works slightly different

Remember the renaming operator from


relational algebra
SQL uses the AS keyword for renaming
The new names can also be used in the WHERE clause

FROM

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.4 Table Names

The the result set/table consists of


the attributes given in the SELECT clause
However, result attributes can be renamed

query block
SELECT

Necessary if tables share identical attribute names


If tables in the FROM clause share identical attribute names
and also identical table names, we need even more
qualification:
schemaname.tablename.attributename

search condition

GROUP BY

Again, NULL might mean unknown, does not apply,


does not matter, ...

column name

,
HAVING

search condition
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

53

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

54

7.4 Conditions

7.4 Conditions
Why TRUE, FALSE, and NULL?

Search conditions are conjunctions of predicates

SQL uses so-called ternary (three-valued) logic


When a predicate cannot be evaluated because it
contains some NULL value, the result will be NULL

Each predicate evaluates to TRUE, FALSE, or NULL

Example: powerStrength > 10 evalutes to NULL


iff powerStrength is NULL
NULL = NULL also evaluates to NULL

AND

OR
search condition

predicate

Handling of NULL by the operators AND and OR:

NOT
(

search condition

AND

TRUE

FALSE

NULL

OR

TRUE

FALSE

NULL

NOT

NULL

TRUE

TRUE

FALSE

NULL

TRUE

TRUE

TRUE

TRUE

TRUE

FALSE

FALSE FALSE FALSE FALSE

FALSE

TRUE

FALSE

NULL

FALSE

TRUE

NULL

TRUE

NULL

NULL

NULL

NULL
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

NULL

55

expression

expression

Predicates usually contain expressions

BETWEEN expression AND expression

Column names or constants


Additionally, SQL provides some special expressions

NOT
expression

IS

NULL
NOT

expression

NOT

,
IN

NOT
EXISTS

LIKE expression

expression

56

7.4 Conditions

comparison

expression

NULL

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.4 Conditions
predicate

FALSE

expression

ESCAPE

expression

Functions
CASE expressions
CAST expressions
Scalar subqueries

( query )

( query )

expression

comparison

SOME
ANY
ALL

( query )

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

57

7.4 Conditions

58

7.4 Conditions
Simple comparisons:

Expressions can be combined using


expression operators

Valid comparison operators are


=, <, <=, >=, and >
<> (meaning: not equal)

Arithmetic operators:
+, -, *, and / with the usual semantics

Data types of expressions need to be


compatible (if not, CAST has to be used)
Character values are usually compared lexicographically
(while ignoring case)
Examples:

age + 2
price * quantity

String concatenation ||: (also written as CONCAT )


Combines two strings into one
firstName || || lastname || (aka || alias || )
Hello CONCAT World Hello World

powerStrength > 10
name = Magneto
Magneto <= Professor X

Parenthesis:
Used to modify the evaluation order

expression

(price + 10) * 20
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

59

comparison

expression

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

60

7.4 Conditions

7.4 Conditions

BETWEEN predicate:

IS NULL predicate:

X BETWEEN Y AND Z is a shortcut for


Y <= X AND X =< Z
Note that you cannot reverse the order of Z and Y

The only way to check if a value is NULL or not


Returns either TRUE or FALSE
Examples:

X BETWEEN Y AND Z is different from


X BETWEEN Z AND Y
The expression can never be true if Y > Z

realName IS NOT NULL


powerStrength IS NULL
NULL IS NULL

Examples:
year BETWEEN 2000 AND 2008
score BETWEEN targetScore - 10 AND targetScore + 10
expression

expression

BETWEEN expression AND expression

IS

NULL

NOT

NOT

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

61

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.4 Conditions
match
expression

62

7.4 Conditions

pattern

LIKE expression
NOT

ESCAPE

LIKE predicate:

Examples:

escape
expression

address LIKE %City%


Manhattan FALSE
Gotham City Prison TRUE

The predicate is for matching character strings to patterns


Match expression must be a character string
Pattern expression is a (usually constant) string

name LIKE M%_t_


Magneto TRUE
Matt TRUE
Mtt FALSE

May not contain column names

Escape expression is just a single character


During evaluation, the match expression is compared to
the pattern expression with following additions

status LIKE _/_% ESCAPE /

_ in the pattern expression represents any single character


% represents any number of arbitrary characters
The escape character prevents the special semantics of _ and %

Most modern database nowadays also support more powerful


regular expressions (introduced in SQL-99)

1_inPrison TRUE
1inPrison FALSE
%_% TRUE
%%% FALSE
match
expression

pattern

LIKE expression
NOT

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

63

7.4 Conditions

ESCAPE

escape
expression

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

64

7.4 Conditions

IN predicate:

EXISTS predicate:

Evaluates to true if the value of the test expression is


within a given set of values
Particularly useful when used with a subquery (later)
Examples:

Evaluates to TRUE if a given subquery


returns at least one result row
Always returns either TRUE or FALSE

Examples
EXISTS (SELECT * FROM heroes)

name IN (Magneto, Batman, Dr. Doom)


name IN (SELECT title FROM movies)

EXISTS may also be used to express semi-joins

Those people having a film named after them


,
expression

IN
NOT

expression

EXISTS

( query )

( query )

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

65

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

66

7.4 Conditions

7.5 Joins

SOME/ANY and ALL:

Also, SQL can do joins of multiple tables


Traditionally, this is performed by simply stating
multiple tables in the FROM clause

Compares an expression to each value provided by


a subquery
TRUE if

Result contains all possible combinations of all


rows of all tables such that the search condition holds
If there is no WHERE clause, its a Cartesian product

SOME/ANY: At least one comparison returns TRUE


ALL: All comparisons return TRUE

Examples:
result <= ALL (SELECT result FROM results)
TRUE if the current result is the smallest one

result < SOME (SELECT result FROM results)


TRUE if the current result is not the largest one
expression

comparison

SOME
ANY
ALL

( query )
67

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

68

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.5 Joins

7.5 Joins

Example:

Explicit joins are specified in the FROM clause


SELECT * FROM table1 JOIN table2 ON <join condition>
WHERE <some other condition>

SELECT * FROM hero, hasAlias


hero hasAlias

Often, attributes in joined tables have the same names,


so qualified attributes are needed
INNER JOIN and JOIN are equivalent

SELECT * FROM hero, hasAlias


WHERE hero.id = hasAlias.hero_id
hero id = hero_id hasAlias

Besides this common implicit notation of joins,


SQL also supports explicit joins

Explicit joins improve readability of your SQL code!


INNER JOIN
JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN

joined table

Inner joins
Left/right/full outer joins

table name

69

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

table name

ON

condition

70

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.5 Joins

7.5 Joins
Left outer join: List students and their exam results

Natural join: List students and their exam results

lastName, crsNr, result students exams


SELECT lastName, crsNr, result
FROM students AS s LEFT JOIN exams AS e ON s.matNr = e.MatNr

lastName, crsNr, result students exams


SELECT lastName, crsNr, result
FROM students AS s JOIN exams AS e ON s.matNr = e.matNr
students

students

exams

exams

matNr

firstName

lastName

sex

matNr

crsNr

result

matNr

firstName

lastName

sex

matNr

crsNr

result

1005

Clark

Kent

9876

100

3.7

1005

Clark

Kent

9876

100

3.7

2832

Louise

Lane

2832

102

2.0

2832

Louise

Lane

2832

102

2.0

4512

Lex

Luther

1005

101

4.0

4512

Lex

Luther

1005

101

4.0

5119

Charles

Xavier

1005

100

1.3

5119

Charles

Xavier

1005

100

1.3

lastName, crsNr, result students exams

lastName, crsNr, result students exams

lastName

crsNr

result

lastName

crsNr

result

Kent

100

1.3

Kent

100

1.3

Kent

101

4.0

Kent

101

4.0

Lane

102

2.0

Lane

102

2.0

Luther

NULL

NULL

Xavier

NULL

NULL

We lost Lex Luther and Charles Xavier


because they didnt take any exams!
Also information on student 9876 disappears

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

71

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

72

7.5 Joins

7.5 Joins
Full outer join:

Right outer join:


lastName, crsNr, result students exams

lastName, crsNr, result students exams

SELECT lastName, crsNr, result


FROM students s RIGHT JOIN exams e ON s.matNr = e.MatNr
students

SELECT lastName, crsNr, result


FROM students s FULL JOIN exams e ON s.matNr = e.MatNr
students

exams

exams

matNr

firstName

lastName

sex

matNr

crsNr

result

matNr

firstName

lastName

sex

matNr

crsNr

result

1005

Clark

Kent

9876

100

3.7

1005

Clark

Kent

9876

100

3.7

2832

Louise

Lane

2832

102

2.0

2832

Louise

Lane

2832

102

2.0

4512

Lex

Luther

1005

101

4.0

4512

Lex

Luther

1005

101

4.0

5119

Charles

Xavier

1005

100

1.3

5119

Charles

Xavier

1005

100

1.3

lastName, crsNr, result students exams

lastName

crsNr

result

lastName

crsNr

result

Kent

100

1.3

Kent

100

1.3

Kent

101

4.0

Kent

101

4.0

Lane

102

2.0

Lane

102

2.0

Luther

NULL

NULL

NULL

100

3.7

NULL

100

3.7

Xavier

NULLRelationalNULL
Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

73

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.6 Set Operators

( SELECT course, result FROM exams WHERE course= 100


EXCEPT
SELECT course, result FROM exams WHERE result IS NULL)
UNION
literal table
VALUES (100, 2.3), (100, 1.7)

By default, set operators eliminate duplicates unless


the ALL modifier is used
Sets need to be union-compatible to use set operators
query
query-block

literal table

UNION
INTERSECT
EXCEPT

query-block

ALL

query

literal table

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

student

course

result

9876

100

3.7

2832

102

2.0

course

result

1005

101

4.0

100

3.7

1005

100

NULL

100

2.3

6676

102

4.3

100

1.7

3412

NULL

NULL

75

7.7 Column Function

76

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.7 Column Function


Column functions in the SQL standard:

Column functions are used to perform


statistical computations

MIN, MAX, AVG, COUNT, SUM:


Each of these are applied to some other expression
NULL values are ignored
Function columns in result set just get their column number as name

Similar to aggregate function in relational algebra


Column functions are expressions
They compute a scalar value for a set of values

If DISTINCT is specified, duplicates are eliminated in advance


By default, duplicates are not eliminated (ALL)

COUNT may also be applied to *

Examples:

Simply counts the number of rows

Typically, there are many more column functions available in your


RDBMS (e.g. in DB2: CORRELATION, STDDEV, VARIANCE, ...)

Compute the average score over all exams


Count the number of exams each student has taken
Retrieve the best student
...

column
function
expression

function
name

COUNT(*)
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

exams

Row definition must match (data types)

query

query block

Example:

Set union : UNION


Set intersection : INTERSECT
Set difference : EXCEPT

query

74

7.6 Set Operators

SQL also supports the common set operators

lastName, crsNr, result students exams

77

ALL
expression

DISTINCT

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

78

7.7 Column Function

7.7 Grouping

Examples:

Similar to aggregation in relation algebra,


SQL supports grouping

SELECT COUNT(*) FROM heroes


Returns the number of rows of the heroes table

GROUP BY <column names>


Creates a group for each combination of
distinct values within the provided columns
A query containing GROUP BY can access
non-group-by-attributes only by column functions

SELECT COUNT(name), COUNT(DISTINCT name)


FROM heroes
Returns the number of rows in the heroes table for that
name is not null and the number of non-null unique names

SELECT MIN(strength), MAX(strength),


AVG(strength) FROM powers
Returns the minimal, maximal, and average power strength
in the powers table
79

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.7 Grouping

80

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.7 Grouping

Examples:

Examples:

SELECT course, AVG(result), COUNT(*), COUNT(result)


FROM exams GROUP BY course
For each course, list the average result,
the number of results, and the number of non-null results

SELECT course, AVG(result), COUNT(*)


FROM exams WHERE course IS NOT NULL
GROUP BY course
The where clause is evaluated before the groups are formed!

exams

exams

student

course

Result

9876

100

3.7

2832

102

2.0

1005

101

4.0

1005

100

NULL

6676

102

4.3

3412

NULL

NULL

course

100

3,7

101

4.0

102

3.15

NULL

NULL

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

course

Result

9876

100

3.7

2832

102

2.0

1005

101

4.0

1005

100

NULL

6676

102

4.3

3412

NULL

NULL

81

7.7 Grouping

course

100

3.7

101

4.0

102

3.15

82

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.7 Grouping

Additionally, there may be restrictions on


the groups themselves

Examples:

HAVING <condition>
The condition may involve group properties
Only those groups are created that fulfill the
HAVING condition
A query may have a WHERE and a HAVING clause
Also, it is possible to have HAVING without GROUP BY
Then, the whole table is treated as a single group

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

student

83

SELECT course, AVG(result), COUNT(*)


FROM exams WHERE course <> 100
GROUP BY course HAVING count(*) > 1
exams
student

course

Result

9876

100

3.7

2832

102

2.0

1005

101

4.0

1005

100

NULL

6676

102

4.3

3412

NULL

NULL

course

102

3.15

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

84

7.8 Ordering

7.8 Ordering

As last step in the processing pipeline,


(unordered) result sets may be converted into lists

ORDER BY may order ascending or descending


Default: ascending

Impose an order on the rows


This concludes the SELECT statement
ORDER BY keyword

Ordering on multiple columns possible


Columns used for ordering are
references by their name
Example

Please note:
Ordering completely breaks with set calculus/algebra
Result after ordering is a list, not a (multi-)set!

SELECT * FROM results


ORDER BY student, crsNo DESC
Returns all results ordered by student id (ascending)
If student ids are identical, we sort in descending order
by course number

SELECT statement
query

,
ODER BY

ASC

column name
integer

DESC

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

85

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

7.8 Ordering

7.8 Evaluation Order of SQL

When working with result lists, often only the


top-k results are of interest (e.g. k = 10)
How can we limit the number of result rows?

SQL queries are evaluated in this sequence:


5.
1.
2.
3.
4.
6.
7.

Since SQL:2008, the SQL standard offers the


FETCH FIRST clause (supported e.g. in DB2)
Example:
SELECT name, salary FROM salaries
ORDER BY salary FETCH FIRST 10 ROWS ONLY
FETCH FIRST can also be used without ORDER BY
Get a quick impression of the result set
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

87

7.9 Subqueries

SELECT <attribute list>


FROM <table list>
[WHERE <condition>]
[GROUP BY <attribute list>]
[HAVING <condition>]
[UNION/INTERSECT/EXCEPT <query>]
[ORDER BY <attribute list>]
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

88

7.9 Subqueries

In SQL, you may embed a query block within


a query block (so called subquery, or nested query)
Subqueries are written in parenthesis
Literal subqueries can be used as expressions

Subqueries may either be correlated


or uncorrelated
If the WHERE clause of the inner query uses an
attributes within a table declared in the outer query,
the two queries are correlated

Return just one single value

Subqueries may create the test set for the IN-condition


Subqueries may create the test set for the
EXISTS-condition
Each subquery within the table list creates
a temporary source table

The inner query needs to be re-evaluated for every tuple


in the outer query
This is rather inefficient, so avoid correlated subqueries
whenever possible!

Otherwise, the queries are uncorrelated


The inner queries needs to be evaluated just once

Called inline view


Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

86

89

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

90

7.9 Subqueries

7.9 Subqueries
Expressions:

HasAlias hero_id

Hero

HasPower

Power

SELECT hero.* FROM hero, hasPower p


WHERE hero.id = p.hero_id
AND powerStrenth =
(SELECT MAX(powerStrength) FROM hasPower)

aliasName

id

realName

hero_id

power_id

id

name

Select all those heroes having powers with maximal strength


Uncorrelated! Subquery is evaluated only once

IN-condition:

powerStrength

SELECT * FROM hero WHERE id IN


(SELECT hero_id FROM hasAlias
WHERE aliasName LIKE Super%)
Select all those heroes having an alias starting with Super
Uncorrelated

description

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

91

7.9 Subqueries

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

92

7.10 Writing Good SQL Code

EXISTS-condition:

What is good SQL code?

SELECT * FROM hero h


WHERE EXISTS
(SELECT * FROM hasAlias a WHERE h.id = a.hero_id)

Easy to read
Easy to write
Easy to understand!

Select heroes having at least one alias


Correlated! Subquery is evaluated for each tuple in hero

Inline view
SELECT h.realName, a.aliasName
FROM hasAlias a,
(SELECT * FROM heroes WHERE realName LIKE A%) h
WHERE h.id < 100 AND a.hero_id = h.id

There is no official SQL style guide,


but here are some general hints

Get realNamealias pairs for all heroes with a real name staring
with A and an id smaller than 100
Uncorrelated
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

93

7.10 Writing Good SQL Code

SELECT MOVIE_TITLE
FROM MOVIES
WHERE MOVIE_YEAR = 2009

GOOD

SELECT movie_title
FROM movies
WHERE movie_year = 2009

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

94

7.10 Writing Good SQL Code

1. Write SQL keywords in uppercase,


names in lowercase!

BAD

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

2. Use proper qualification!

BAD

SELECT imdbraw.movies.movie_title,
imdbraw.movies.movie_year
FROM imdbraw.movies
WHERE imdbraw.movies.movie_year = 2009

SET SCHEMA imdbraw

GOOD
95

SELECT movie_title, movie_year


FROM movies
WHERE movie_year = 2009
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

96

7.10 Writing Good SQL Code

7.10 Writing Good SQL Code

3. Use aliases to keep your code short


and the result clear!

BAD

SELECT movie_title, movie_year


FROM movies, genres
WHERE movies.movie_id = genres.movie_id
AND genres.genre = Action

GOOD

SELECT movie_title title, movie_year year


FROM movies m, genres g
WHERE m.movie_id = g.movie_id
AND g.genre = Action
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

4. Separate joins from conditions!

BAD

GOOD
97

7.10 Writing Good SQL Code

GOOD

BAD

SELECT movie_title title, movie_year year


FROM movies m
JOIN genres g ON m.movie_id = g.movie_id
JOIN actors a ON g.movie_id = a.movie_id
WHERE g.genre = Action
AND a.person_name LIKE %Schwarzenegger%
99

7.10 Writing Good SQL Code

7. Avoid subqueries (use joins if possible)!

BAD

GOOD

WITH recent_actors (person_id) AS


(SELECT DISTINCT person_id
FROM actors a
JOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007)
SELECT DISTINCT person_name name
FROM directors d
WHERE d.person_id IN (SELECT * FROM recent_actors)

SELECT DISTINCT d.person_name name


FROM directors d
JOIN actors a ON d.person_id = a.person_id
JOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

98

6. Extract uncorrelated subqueries!

SELECT movie_title title, movie_year year


FROM movies m, genres g, actors a
WHERE m.movie_id = g.movie_id
AND g.genre = Action
AND m.movie_id = a.movie_id
AND a.person_name LIKE %Schwarzenegger%

Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

SELECT movie_title title, movie_year year


FROM movies m
JOIN genres g ON m.movie_id = g.movie_id
JOIN actors a ON g.movie_id = a.movie_id
WHERE g.genre = Action
AND a.person_name LIKE %Schwarzenegger%

7.10 Writing Good SQL Code

5. Use proper indentation!

BAD

SELECT movie_title title, movie_year year


FROM movies m, genres g, actors a
WHERE m.movie_id = g.movie_id
AND g.genre = Action
AND m.movie_id = a.movie_id
AND a.person_name LIKE %Schwarzenegger%

101

GOOD

SELECT DISTINCT person_name name


FROM directors d
WHERE d.person_id IN
(SELECT DISTINCT person_id
FROM actors a
JOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007)

WITH recent_actors (person_id) AS


(SELECT DISTINCT person_id
FROM actors a
JOIN movies m ON a.movie_id = m.movie_id
WHERE movie_year >= 2007)
SELECT DISTINCT person_name name
FROM directors d
WHERE d.person_id IN (SELECT * FROM recent_actors)
Relational Database Systems 1 Wolf-Tilo Balke Institut fr Informationssysteme TU Braunschweig

100

Das könnte Ihnen auch gefallen