
Implementation Techniques for Relational Database Systems

Assignment 2

Q1. Create a selection query whose chosen plan is a file scan.


Query :
explain analyse select * from course;
Query Plan :
Seq Scan on course (cost=0.00..4.00 rows=200 width=35) (actual time=0.008..0.021 rows=200 loops=1)
Planning Time: 0.035 ms
Execution Time: 0.077 ms
Explanation :
This query has no WHERE clause, so every tuple of the relation “course” must be returned and every page of the table has to be read. A sequential scan is preferred over an index scan because an index scan needs several I/O operations for each row (look up the entry in the index, then fetch the row from the heap). A sequential scan needs at most one I/O per row, and usually far less, because a disk block (page) holds several rows and all of them are fetched by a single I/O. With no predicate to narrow the search, an index cannot reduce the number of rows read, so using one would only add work.
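The trade-off above can be put into numbers with PostgreSQL's simplified cost formula for a sequential scan without a filter: pages × seq_page_cost + rows × cpu_tuple_cost. The Python sketch below assumes the default settings (seq_page_cost = 1.0, cpu_tuple_cost = 0.01). It also assumes that course occupies 2 pages, which is what the reported total cost of 4.00 for 200 rows implies under those defaults. The index-scan count is a deliberately naive worst case that follows the reasoning above (one index page plus one heap page per row, no caching). It is not PostgreSQL's actual index cost model.

```python
import math

# Assumed: PostgreSQL's default planner cost constants are unchanged.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01


def seq_scan_cost(pages: int, rows: int) -> float:
    """Simplified sequential-scan estimate without a filter:
    read every page once and process every tuple once."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def seq_scan_page_reads(rows: int, rows_per_page: int) -> int:
    """A sequential scan reads each page once; one read returns many rows."""
    return math.ceil(rows / rows_per_page)


def naive_index_scan_page_reads(rows: int) -> int:
    """Naive worst case for fetching every row through an index:
    one index page plus one heap page per row, with no caching."""
    return 2 * rows


if __name__ == "__main__":
    # Assumption: 'course' has 200 rows stored in 2 pages (100 rows per page).
    print("estimated seq scan cost:", seq_scan_cost(pages=2, rows=200))
    print("seq scan page reads:", seq_scan_page_reads(200, 100))
    print("naive index scan page reads:", naive_index_scan_page_reads(200))
```

Under these assumptions the estimate reproduces the plan's total cost of 4.00. The sequential scan reads 2 pages, while the naive index scan would touch 400.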

Q2. Create a selection query whose chosen plan uses a bit-map index scan. You can
create indices on appropriate relation attributes to create such a case.
Query :
create index i1 on takes(year);
explain analyse select course_id from takes where year=2010;
drop index i1;
Query Plan :
Bitmap Heap Scan on takes (cost=60...339.27 rows=3194 width=4) (actual time=0.937..1.056 rows=3194 loops=1)
  Recheck Cond: (year = '2010'::numeric)
  Heap Blocks: exact=220
  -> Bitmap Index Scan on i1 (cost=0.00..61.06 rows=3194 width=0) (actual time=0.900..0.972 rows=3194 loops=1)
        Index Cond: (year = '2010'::numeric)
Planning Time: 0.552 ms
Execution Time: 2.364 ms

Explanation :
First, the index i1 is created on the ‘year’ attribute of the ‘takes’ relation. The query fetches the ‘course_id’ of every ‘takes’ tuple with ‘year’ = 2010. The Bitmap Index Scan on i1 first collects the locations of all matching tuples into a bitmap. The Bitmap Heap Scan then visits the heap pages in physical order and reads each page that contains a match only once (Heap Blocks: exact=220). Pages that contain no tuple with ‘year’ = 2010 are skipped entirely, which makes this faster than a sequential scan, since a sequential scan reads every page of ‘takes’.
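The two phases described above can be sketched in Python. This is a toy model, not PostgreSQL's executor. The heap is a list of pages, and each page is a list of (course_id, year) tuples. The index i1 is modelled as a dict from year to tuple identifiers (page, offset), and all data are hypothetical. The bitmap in this sketch is always exact, so no recheck is performed. PostgreSQL only has to re-evaluate the Recheck Cond when a bitmap page becomes lossy and remembers the whole page instead of individual tuples.

```python
from collections import defaultdict

Heap = list[list[tuple[str, int]]]   # page -> list of (course_id, year)
TID = tuple[int, int]                # (page number, offset within page)


def build_index(heap: Heap) -> dict[int, list[TID]]:
    """Toy secondary index on 'year': year -> tuple identifiers."""
    index: dict[int, list[TID]] = defaultdict(list)
    for page_no, page in enumerate(heap):
        for offset, (_, year) in enumerate(page):
            index[year].append((page_no, offset))
    return index


def bitmap_index_scan(index, key) -> dict[int, set[int]]:
    """Phase 1: collect every matching TID into a page -> offsets bitmap."""
    bitmap: dict[int, set[int]] = defaultdict(set)
    for page_no, offset in index.get(key, []):
        bitmap[page_no].add(offset)
    return bitmap


def bitmap_heap_scan(heap: Heap, bitmap) -> tuple[list[tuple[str, int]], int]:
    """Phase 2: visit only the pages in the bitmap, in physical order and
    each page once; return the matching tuples and the number of pages read."""
    rows = []
    for page_no in sorted(bitmap):
        page = heap[page_no]              # one page read per bitmap page
        for offset in sorted(bitmap[page_no]):
            rows.append(page[offset])
    return rows, len(bitmap)


if __name__ == "__main__":
    # Hypothetical 'takes' heap with three pages.
    heap = [
        [("CS-101", 2010), ("BIO-101", 2009)],
        [("HIS-351", 2009), ("MU-199", 2009)],
        [("CS-190", 2010), ("CS-315", 2010)],
    ]
    i1 = build_index(heap)
    rows, pages_read = bitmap_heap_scan(heap, bitmap_index_scan(i1, 2010))
    print([course_id for course_id, _ in rows], "pages read:", pages_read)
```

In this toy heap, page 1 holds no 2010 tuples and is never read. Page 2 is read once even though it holds two matches. A plain index scan would instead fetch a heap page once per matching index entry.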

Q3. Create a query where PostgreSQL chooses a (plain) index nested loops join.
(NOTE: the nested loops operator has 2 children. The first child is the outer input,
and it may have an index scan or anything else, that is irrelevant. The second child
must have an index scan or bitmap index scan, using an attribute from the first
child.)
Query:
explain analyse
select * from student
where student.id in
(select instructor.id from instructor
where dept_name='Comp. Sci.');

Query Plan :
Nested Loop (cost=0.29..61.29 rows=2 width=24) (actual time=0.043..0.041 rows=2 loops=1)
  -> Seq Scan on instructor (cost=0.00..1.43 rows=2 width=5) (actual time=0.040..0.044 rows=2 loops=1)
        Filter: ((dept_name)::text = 'Comp. Sci.'::text)
        Rows Removed by Filter: 48
  -> Index Scan using student_pkey on student (cost=0.28..8.29 rows=1 width=24) (actual time=0.019..0.026 rows=1 loops=2)
        Index Cond: ((id)::text = (instructor.id)::text)
Planning Time: 1.644 ms
Execution Time: 0.133 ms

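Explanation:
The planner uses a sequential scan on the outer input, the ‘instructor’ relation, and applies the filter dept_name = 'Comp. Sci.' (48 rows removed, 2 kept). For each of the 2 remaining instructor tuples, it probes the inner input, the ‘student’ relation, through an index scan on student_pkey. The ‘id’ of the current outer tuple is used as the index condition, which is why the inner scan shows loops=2. This is an index nested loops join: tuples of ‘instructor’ and ‘student’ with the same ‘id’ are matched without ever scanning ‘student’ in full.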
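The join loop can be sketched in Python as a toy model. Relations are lists of dicts, the student_pkey index is modelled as a dict keyed on id, and all rows (including the ids shared by an instructor and a student) are hypothetical. Because instructor.id is the primary key of instructor, each outer id occurs at most once. A plain join therefore returns the same student rows as the IN subquery, which is presumably why the planner could use an ordinary nested loop here.

```python
from collections import defaultdict


def build_index(rows, key):
    """Toy index: key value -> list of rows (stands in for student_pkey)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop_join(outer_rows, outer_key, inner_index):
    """For each outer row, probe the inner index with the outer row's key
    value (the 'Index Cond') and emit one joined pair per match."""
    result = []
    for outer in outer_rows:
        for inner in inner_index.get(outer[outer_key], []):
            result.append((outer, inner))
    return result


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    instructor = [
        {"id": "10101", "dept_name": "Comp. Sci."},
        {"id": "12121", "dept_name": "Finance"},
        {"id": "45565", "dept_name": "Comp. Sci."},
    ]
    student = [
        {"id": "10101", "name": "Alice"},
        {"id": "45565", "name": "Bob"},
        {"id": "00128", "name": "Zhang"},
    ]
    # Outer input: sequential scan of instructor with the dept_name filter.
    outer = [r for r in instructor if r["dept_name"] == "Comp. Sci."]
    student_pkey = build_index(student, "id")
    for ins, stu in index_nested_loop_join(outer, "id", student_pkey):
        print(ins["id"], stu["name"])
```

The inner relation is never scanned in full. Each outer row costs one index lookup, which corresponds to the loops=2 reported for the index scan in the plan.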

Q4. Create an index as below, and see the time taken: create index i1 on takes(id,
semester, year); Similarly, see how long it takes to drop the above index using: drop
index i1;
Query:
create index i1 on takes(id, semester, year);
Time: 174 msec
Query: drop index i1;
Time: 39 msec

Q5. Create a table takes2 with the same schema as takes but no primary keys or
foreign keys. Find how long it takes to execute the query insert into takes2 select *
from takes. Also, find the query plan for the above insert statement.
Query :
create table takes2
(ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4,0),
grade varchar(2)
);
insert into takes2 select * from takes;

Query Time:
Execution Time :
Create Table: 310.532 ms
Insert Into Table: 115.902 ms

Query Plan:
Insert on takes2 (cost=0.00..462.00 rows=30000 width=24) (actual time=79.204..67.778 rows=0 loops=1)
  -> Seq Scan on takes (cost=0.00..462.00 rows=30000 width=24) (actual time=0.042..5.872 rows=30000 loops=1)
Planning Time: 0.247 ms
Execution Time: 79.433 ms

Q6. Next, drop the table takes2 (and, as a result, its rows), and create it again, but
this time with a primary key. Run the insert again and measure how long it takes to
run. Give its query plan, and explain why the time taken is different this time.
Query:
drop table takes2;
create table takes2
(ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4,0),
grade varchar(2),
primary key (ID, course_id, sec_id, semester, year)
);
insert into takes2 select * from takes;

Query Time:
Execution Time :
Create Table: 203.424 ms
Insert Into Table: 400.252 ms

Query Plan :
Insert on takes2 (cost=0.00..660.00 rows=30000 width=24) (actual time=360.545..360.545 rows=0 loops=1)
  -> Seq Scan on takes (cost=0.00..660.00 rows=30000 width=24) (actual time=0.034..6.926 rows=30000 loops=1)
Planning Time: 0.198 ms
Execution Time: 390.447 ms

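Explanation:
This insert takes longer than the one in Q5 (400.252 ms instead of 115.902 ms; 390.447 ms instead of 79.433 ms in EXPLAIN ANALYSE) because takes2 now has a primary key. PostgreSQL enforces a primary key through a unique B-tree index (takes2_pkey). For every inserted tuple it must therefore do more than append the tuple to the heap: it also inserts an entry into that index and checks that no tuple with the same key (ID, course_id, sec_id, semester, year) already exists. Doing this 30000 times accounts for the extra time. The plan does not show the index maintenance as a separate node, but its cost is included in the actual time of the Insert node.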
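The extra work per tuple can be sketched in Python. This is a simplified model, not PostgreSQL's storage code. The heap is a list, and a sorted Python list searched with bisect stands in for the B-tree that enforces the primary key (a real B-tree insertion is logarithmic, whereas list.insert is linear). The order of steps is also simplified: the sketch checks for a duplicate before appending to the heap. The UniqueViolation exception and the example rows are hypothetical.

```python
import bisect

PK_COLUMNS = ("id", "course_id", "sec_id", "semester", "year")


class UniqueViolation(Exception):
    """Hypothetical error raised when a primary key value already exists."""


def insert_without_key(heap: list, rows) -> None:
    """Q5 case: each row is simply appended to the heap."""
    for row in rows:
        heap.append(row)


def insert_with_primary_key(heap: list, pk_index: list, rows) -> None:
    """Q6 case: each row also needs an index search, a uniqueness check
    and an index insertion before it is appended to the heap."""
    for row in rows:
        key = tuple(row[c] for c in PK_COLUMNS)
        pos = bisect.bisect_left(pk_index, key)        # search the ordered index
        if pos < len(pk_index) and pk_index[pos] == key:
            raise UniqueViolation(key)                 # primary key check fails
        pk_index.insert(pos, key)                      # keep the index ordered
        heap.append(row)


if __name__ == "__main__":
    rows = [
        {"id": "00128", "course_id": "CS-101", "sec_id": "1",
         "semester": "Fall", "year": 2009, "grade": "A"},
        {"id": "00128", "course_id": "CS-347", "sec_id": "1",
         "semester": "Fall", "year": 2009, "grade": "A-"},
    ]
    heap, pk_index = [], []
    insert_with_primary_key(heap, pk_index, rows)
    try:
        insert_with_primary_key(heap, pk_index, rows[:1])   # duplicate key
    except UniqueViolation as err:
        print("rejected:", err)
```

The insert without a key does a single append per row. The insert with a key adds a search and an ordered insertion into a second structure for every row, which is the extra work measured in Q6.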

Q7. Consider the following nested subquery: select count(*) from course c where
not exists (select * from takes t where t.course_id=c.course_id).
What plan is chosen by PostgreSQL? Report the plan and actual execution
costs. Explain what is happening.
Query:
explain analyse select count(*) from course c where not exists (select * from
takes t where t.course_id = c.course_id);

Query Plan:
Aggregate (cost=254.37..254.38 rows=1 width=8) (actual time=325.593..325.593 rows=1 loops=1)
  -> Nested Loop Anti Join (cost=0.29..254.08 rows=115 width=0) (actual time=1.401..319.440 rows=115 loops=1)
        -> Seq Scan on course c (cost=0.00..4.00 rows=200 width=4) (actual time=0.005..0.093 rows=200 loops=1)
        -> Index Only Scan using takes_pkey on takes t (cost=0.29..237.10 rows=353 width=4) (actual time=1.595..1.595 rows=0 loops=200)
              Index Cond: (course_id = (c.course_id)::text)
              Heap Fetches: 85
Planning Time: 0.241 ms
Execution Time: 322.55 ms

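Explanation:
The query counts the courses that have not been taken by any student. PostgreSQL turns the NOT EXISTS subquery into a Nested Loop Anti Join. The outer input is a sequential scan of ‘course’ (200 rows). For each course, the inner input probes ‘takes’ with an Index Only Scan on takes_pkey, looking for a tuple with the same course_id. An anti join emits an outer tuple only when the inner side finds no matching tuple, and it can stop probing at the first match. In this run 115 of the 200 courses have no match, and the Aggregate node counts them. Almost all of the time is spent in the 200 index probes (about 1.6 ms each, about 319 ms in total). This is presumably because course_id is not the leading column of takes_pkey, whose key is (ID, course_id, sec_id, semester, year), so a probe cannot descend directly to the matching entries and has to scan a large part of the index instead.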
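The anti join and its probe cost can be sketched in Python as a toy model. The takes_pkey index is modelled as a list of (ID, course_id) entries ordered by ID first, and all data are hypothetical. Because the entries are not ordered by course_id, a probe has to walk them. The sketch counts the entries read to show why courses with no match are the expensive ones.

```python
def probe(index_entries, course_id):
    """Look for one entry with this course_id. The entries are sorted by ID
    first, so matches are scattered: the walk stops at the first match,
    but a course with no match reads every entry."""
    reads = 0
    for _id, entry_course in index_entries:
        reads += 1
        if entry_course == course_id:
            return True, reads
    return False, reads


def nested_loop_anti_join(courses, index_entries):
    """Emit each outer course for which the inner probe finds nothing,
    as NOT EXISTS requires; also return the total index entries read."""
    result, total_reads = [], 0
    for course_id in courses:
        found, reads = probe(index_entries, course_id)
        total_reads += reads
        if not found:
            result.append(course_id)
    return result, total_reads


if __name__ == "__main__":
    courses = ["CS-101", "CS-319", "BIO-399"]                    # hypothetical
    takes_pkey = [("00128", "CS-101"), ("12345", "CS-101"), ("12345", "CS-190")]
    unmatched, reads = nested_loop_anti_join(courses, takes_pkey)
    print("count(*) =", len(unmatched), "entries read:", reads)
```

The matched course stops after one entry, while each unmatched course reads the whole index. This is the pattern behind the high per-loop time of the inner scan in the plan.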

Q8. As above, but with the query select count(*) from course c where c.course_id
not in (select course_id from takes t)
Query :
explain analyse select count(*) from course c where c.course_id not in (select
course_id from takes t);
Query Plan:
Aggregate (cost=546.55..459.98 rows=1 width=8) (actual time=180.255..180.255 rows=1 loops=1)
  -> Seq Scan on course c (cost=565.00..569.60 rows=100 width=0) (actual time=156.240..156.345 rows=115 loops=1)
        Filter: (NOT (hashed SubPlan 1))
        Rows Removed by Filter: 85
        SubPlan 1
          -> Seq Scan on takes t (cost=0.00..510.00 rows=30000 width=4) (actual time=45.965..85.851 rows=30000 loops=1)
Planning Time: 306.054 ms
Execution Time: 165.786 ms

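Explanation:
As the plan shows, PostgreSQL does not turn NOT IN into an anti join here. It evaluates the subquery only once, as SubPlan 1: a sequential scan reads all 30000 ‘course_id’ values from ‘takes’ and loads them into an in-memory hash table ("hashed SubPlan 1"). It then sequentially scans the outer relation ‘course’ and probes the hash table for each of its 200 tuples. The filter NOT (hashed SubPlan 1) keeps the 115 courses whose ‘course_id’ is not found and removes the other 85. The Aggregate node counts the remaining tuples, so the result is the number of courses that exist in ‘course’ but not in ‘takes’, the same answer as in Q7. Because the hash table is built only once, this plan avoids the 200 index probes of Q7 and runs faster here (165.786 ms versus 322.55 ms).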
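The hashed SubPlan strategy can be sketched in Python as a toy model with hypothetical data. The subquery result is read once into a set, and each outer row is then checked with one hash lookup. The sketch also follows SQL's rule for NULLs in NOT IN. The expression x NOT IN (subquery) is true only when the subquery is empty, or when x is non-NULL, absent from the result, and the result contains no NULL. Otherwise it is false or unknown and the row is filtered out. In this schema, takes.course_id is part of the primary key and cannot be NULL, so that rule does not change the answer here.

```python
def not_in_filter(outer_values, subquery_values):
    """Keep outer values for which 'x NOT IN (subquery)' is TRUE.
    The subquery result is materialised and hashed exactly once."""
    values = list(subquery_values)          # run the SubPlan once
    if not values:
        return list(outer_values)           # NOT IN over an empty set is TRUE
    has_null = any(v is None for v in values)
    hashed = {v for v in values if v is not None}
    kept = []
    for x in outer_values:                  # sequential scan of the outer relation
        if x is None or x in hashed or has_null:
            continue                        # FALSE or UNKNOWN: row is filtered out
        kept.append(x)
    return kept


if __name__ == "__main__":
    courses = ["CS-101", "CS-319", "BIO-399"]          # hypothetical
    takes_course_ids = ["CS-101", "CS-101", "CS-190"]
    print("count(*) =", len(not_in_filter(courses, takes_course_ids)))
```

Building the set costs one pass over the subquery result. After that, each outer row needs only a single lookup, in contrast to the repeated index probes of the anti join in Q7.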
