DB2

The Art of Query Optimization

Copyright 2006 - IBM Corporation - Systems and Technology Group

Query Processing Overview

SELECTING
Table scan
Table scan or probe via bitmap or RRN list
Index scan or probe

JOINING
Index scan or probe
Hash table probe
Sorted list probe

GROUPING
Index scan or probe
Hash table scan

ORDERING
Index scan or probe
Sorted list scan


Selection Processing and Optimization

Selection Optimization

The query optimizer's goal is to eliminate I/O as soon as possible
Local selection usually occurs before any other process
Indexes and column stats are used for statistics
A good indexing strategy provides the optimizer and DB engine with multiple methods and strategies
A poor indexing strategy eliminates methods and strategies, causing more I/O
Different methods and strategies may be chosen based on the expected selectivity of the query

Selection implementation methods
Table scan
Table scan or probe via bitmap or RRN list
Index scan or probe

Join Processing and Optimization

Joins Common Terms

Term                     Meaning
Join Position            Position in which table is being joined.
Join Dial                Same as Join Position.
Join Order               The order of all of the tables used to process the join. (Dial 1, Dial 2, Dial 3, Dial 4)
Average Duplicates       Average number of rows for each distinct value. Statistic derived from an index or SQE column stat.
Join Fan-out / Fan-in    The number of join combinations that can be expected for each join value. The join increases or decreases the number of rows returned.

Join Types and Execution Flow

Cascade
...FROM TABLEA A, TABLEB B, TABLEC C, TABLED D
WHERE A.KEY=D.KEY and B.KEY=C.KEY and C.KEY=A.KEY...
Join order: B,C,A,D
Table B is Dial 1, Table C is Dial 2, Table A is Dial 3, Table D is Dial 4

Star or 1 to many
...FROM TABLEA A, TABLEB B, TABLEC C, TABLED D
WHERE B.KEY1=A.KEY and B.KEY2=C.KEY and B.KEY3=D.KEY...
Join order: B,C,A,D
Table B is Dial 1, Table C is Dial 2, Table A is Dial 3, Table D is Dial 4

Join Order
Assume average duplicates of 1 for headers and 3 for details...
SELECT *
FROM ORDER_HEADERS A,
ORDER_DETAILS B
WHERE A.ORDER_KEY = B.ORDER_KEY
AND CUSTOMER_NO = 112358

What is the total cost of selecting from, and joining to, each dial?

Join Order 1 (A, B)
Headers is Dial 1, Details is Dial 2
Join fan out is 1 to 3

Join Order 2 (B, A)
Details is Dial 1, Headers is Dial 2
Join fan out is 1 to 1

Join Support for SQL


Joins act like "expensive" selection
Must perform I/O to test, instead of logically eliminating data

For left outer and exception join, tables are joined from left to right
Right outer joins are implemented by conversion into left outer joins
Index only access not available for CQE

For inner join, the optimizer is not biased toward using the specified join order
Must use QAQQINI file settings to influence the join order chosen by the optimizer
FORCE_JOIN_ORDER set to *YES, *SQL, *PRIMARY can force the join order
Not recommended

Multiple join types supported for a single query


Join implementation methods
Nested loop via index
Nested loop via hash table
Nested loop via sorted list

Nested Loop Join via Index

SELECT *
FROM TABLE_1,
TABLE_2,
TABLE_3
WHERE FKEY1 = PKEY3
AND FKEY2 = PKEY3

Step 1: Select row and build key (Table 2)
Step 2: Position into index (Index 3)
Step 3: Random read row from table (Table 3)
Repeat Steps 2 - 3 until key not found
Step 4: Build key and position into index (Index 1)
Step 5: Random read row from table (Table 1)
Repeat Steps 4 - 5 until key not found

What type of access is used for Table 3 and Table 1...?
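
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the database engine's implementation: a dictionary of row positions stands in for the radix index, list positions stand in for relative record numbers, and the extra columns T1, T2, and T3 are hypothetical sample data.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to a list of row positions (a stand-in for a radix index)."""
    index = defaultdict(list)
    for rrn, row in enumerate(rows):
        index[row[key]].append(rrn)
    return index


def nested_loop_join(table_2, table_3, index_3, table_1, index_1):
    """Join Table 2 to Table 3 and then to Table 1, one key probe at a time."""
    for row_2 in table_2:                                  # Step 1: select row, build key
        for rrn_3 in index_3.get(row_2["FKEY2"], []):      # Step 2: position into Index 3
            row_3 = table_3[rrn_3]                         # Step 3: random read from Table 3
            for rrn_1 in index_1.get(row_3["PKEY3"], []):  # Step 4: position into Index 1
                row_1 = table_1[rrn_1]                     # Step 5: random read from Table 1
                yield {**row_1, **row_2, **row_3}


# Hypothetical sample rows.
table_1 = [{"FKEY1": 1, "T1": "a"}, {"FKEY1": 2, "T1": "b"}, {"FKEY1": 1, "T1": "c"}]
table_2 = [{"FKEY2": 1, "T2": "x"}, {"FKEY2": 3, "T2": "y"}]
table_3 = [{"PKEY3": 1, "T3": "p"}, {"PKEY3": 2, "T3": "q"}]

index_3 = build_index(table_3, "PKEY3")
index_1 = build_index(table_1, "FKEY1")

result = list(nested_loop_join(table_2, table_3, index_3, table_1, index_1))
for row in result:
    print(row)
```

Each inner loop ends when the index has no more entries for the key, which corresponds to the "until key not found" condition on the slide.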

Nested Loop Join via Index

Creating a Temporary Index for the Join Criteria


CQE
SQE (V5R4)

If an index over the join columns of the secondary table does not exist, one is created.
Advantages:
Since local selection might be performed ahead of the join (during index creation), the temporary index generally is smaller, so there are fewer index pages to be faulted in
Table access is tuned automatically (SQE)

Disadvantages:
Creating a temporary index is very CPU and resource intensive
If the index is built using host variable selection, then the query is not reusable

Join via Multiple Key Column Probe

Join is processed using index probe on equal selection predicates just like local selection
SELECT EMPLOYEE.NAME, EMPLOYEE.TITLE
FROM EMPLOYEE,
SALES
WHERE EMPLOYEE.EMPNUM = 112358
AND EMPLOYEE.EMPNUM = SALES.EMPNUM <---- join predicate
AND SALES.STORE = 13 <---- non-join predicate
AND SALES.DATE = '2004/03/01' <---- non-join predicate

Index with three keys: STORE, DATE, and EMPNUM (in any order) can be used to satisfy both the join and non-join selection predicates on the table SALES in one query step.

Nested Loop Join via Hash

Part 1: Select rows and build hash tables
Only data that meets selection criteria
Table 2: keyed selection via index (IX 2), then hashing algorithm, then Hash Table
Table 3: skip sequential selection via EVI and bitmap (EVI 3, Bitmap), then hashing algorithm, then Hash Table

Part 2: Select Table 1 rows and join to hash tables
Table 1: hashing algorithm, then probe hash tables, then Result
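
The two parts of the diagram can be sketched as a build phase and a probe phase. This is a simplified illustration under stated assumptions: Python dictionaries stand in for the engine's hash tables, a filter function stands in for the index or EVI based selection, and the column names K2, K3, V1, V2, and V3 are hypothetical.

```python
from collections import defaultdict


def build_hash_table(rows, key, keep=lambda row: True):
    """Part 1: hash only the rows that meet the selection criteria."""
    table = defaultdict(list)
    for row in rows:
        if keep(row):
            table[row[key]].append(row)
    return table


def hash_join(table_1, probes):
    """Part 2: read Table 1 and probe each hash table in turn.

    probes is a list of (column, hash_table) pairs.
    """
    for row in table_1:
        combined = [row]
        for column, hash_table in probes:
            matches = hash_table.get(row[column], [])
            combined = [{**left, **right} for left in combined for right in matches]
            if not combined:
                break  # no match in this hash table, so the row is dropped
        yield from combined


# Hypothetical sample rows.
table_1 = [{"K2": 1, "K3": 10, "V1": "a"},
           {"K2": 2, "K3": 10, "V1": "b"},
           {"K2": 1, "K3": 20, "V1": "c"}]
table_2 = [{"K2": 1, "V2": "x"}, {"K2": 2, "V2": "y"}]
table_3 = [{"K3": 10, "V3": "p"}, {"K3": 20, "V3": "q"}]

hash_2 = build_hash_table(table_2, "K2", keep=lambda r: r["V2"] == "x")
hash_3 = build_hash_table(table_3, "K3")

result = list(hash_join(table_1, [("K2", hash_2), ("K3", hash_3)]))
for row in result:
    print(row)
```

Because selection is applied while the hash tables are built, rows that fail local selection never reach the probe phase.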


Join Optimization

The main optimization strategy for a join query is the reordering of the tables
This minimizes the join fanout and that in turn minimizes I/Os
Reordering of tables is allowed only on inner joins
Left Outer or Exception joins cannot be reordered
The DB2 UDB for iSeries optimizer uses a greedy join algorithm to determine the most efficient table order
Both CQE and SQE have additional join order optimizations
Largest primary
Largest secondary
Star schema join
Query rewrite strategies can minimize the effects of a poor join order

Join Optimization Tips

Join query tuning
At a minimum, make sure there are radix indexes built over all the join columns
Consider indexes built over both join columns and selection columns; this allows for multi-key joins
Create EVIs over foreign key columns to take advantage of index ANDing via dynamic bitmaps and RRN lists
Star schema join
Look-ahead predicate generation (LPG)
Indexes are used to provide the optimizer statistics, as well as implementation methods

Transitive closure over join predicates
The CQE query optimizer does not do transitive closure over join predicates
The query optimizer does transitive closure over selection predicates
For example:
given: A.JCOL1 = B.JCOL1 and A.JCOL1 = 123
the query optimizer does determine that B.JCOL1 = 123
Duplicating the join predicates over all tables might help CQE
For example:
given: A.JCOL1 = B.JCOL1 and B.JCOL1 = C.JCOL1
the query optimizer does not determine that A.JCOL1 = C.JCOL1
Thus, code all combinations (A=B, B=C, A=C)

The SQE query optimizer does provide transitive closure over join predicates
This is a form of query rewrite
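
The "code all combinations" advice can be derived mechanically. The sketch below is a hypothetical helper for generating the predicates by hand when writing a query for CQE; it is not how either optimizer is implemented, and the function name close_join_predicates is an assumption made for illustration.

```python
from itertools import combinations


def close_join_predicates(predicates):
    """Return every column pair implied equal by the given equality predicates."""
    groups = []  # each group is a set of columns known to be equal to one another
    for left, right in predicates:
        touching = [g for g in groups if left in g or right in g]
        merged = {left, right}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return sorted(pair for g in groups for pair in combinations(sorted(g), 2))


given = [("A.JCOL1", "B.JCOL1"), ("B.JCOL1", "C.JCOL1")]
closed = close_join_predicates(given)

# Print the full set of join predicates in the form used on the slide.
print(" AND ".join(f"{left} = {right}" for left, right in closed))
```

For the two predicates given on the slide, the helper adds the missing A.JCOL1 = C.JCOL1 comparison.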

Join Optimization Summary

Nested loop join via index
Multi-key join: probe index with both selection and join columns
Temporary index creation (CQE)
Typically small table to large table
Parallelism (SQE only)

Nested loop join via hash
No radix index required or available
Typically large table to small table
Not used inside a UNION (CQE)
Parallelism

Nested loop join via sorted list
SQE only
Used when join condition is an inequality and an index does not exist
Hash join is only possible with equal condition

Mixture of join methods may be used in the same SQL request


Grouping Processing and Optimization

Group-By Optimization

The query optimizer chooses between index grouping and hash grouping to aggregate data
Indexes and column stats are used for grouping statistics (number of groups, number of rows per group)
Keys over grouping column(s)
Average duplicates statistics
Determine the hash table size and number of hash points

Query attributes which may affect which method is used
Index Group by
First I/O
ALWCPYDTA(*NO), ALWCPYDTA(*YES)

Hash Group by
All I/O
ALWCPYDTA(*OPTIMIZE)


Index Group-by - Min / Max Skipping

CREATE INDEX X1 ON SALES
(STATE, SALES) <---- ascending
:
SELECT STATE, MIN(SALES)
FROM SALES
WHERE STATE IN ( 'ARIZONA', 'CALIFORNIA')
GROUP BY STATE

Index probe on ARIZONA and the minimum or first value for SALES by traversing the radix tree in ascending order, then on to California

STATE        SALES    CUSTOMER
Alabama      110.00   Jones
Alabama      150.00   Smith
Alabama      375.00   Doe
Alaska        10.00   Johnson
Alaska        55.00   Smith
Alaska       120.00   Alexander
Alaska       400.00   Lee
Arizona       50.00   White
Arizona       80.00   Doe
Arizona      210.00   Brown
Arizona      360.00   Jacobson
Arizona      540.00   Milligan
Arkansas       5.00   Weatherby
Arkansas      25.00   Smith
Arkansas      90.00   Pippen
California    30.00   Lee
California    75.00   Wayne

Index Group-by - Min / Max Skipping

CREATE INDEX X2 ON SALES
(STATE, SALES DESC) <---- descending (CQE)
:
SELECT STATE, MAX(SALES)
FROM SALES
WHERE STATE IN ( 'ARIZONA', 'CALIFORNIA')
GROUP BY STATE

Index probe on ARIZONA and the maximum or first value for SALES by traversing the radix tree in descending order, then on to California

STATE        SALES    CUSTOMER
Alabama      375.00   Jones
Alabama      150.00   Smith
Alabama      110.00   Doe
Alaska       400.00   Johnson
Alaska       120.00   Smith
Alaska        55.00   Alexander
Alaska        10.00   Lee
Arizona      540.00   White
Arizona      360.00   Doe
Arizona      210.00   Brown
Arizona       80.00   Jacobson
Arizona       50.00   Milligan
Arkansas      90.00   Weatherby
Arkansas      25.00   Smith
Arkansas       5.00   Pippen
California    75.00   Lee
California    30.00   Wayne

Grouping Optimization

Columns in equal predicates of WHERE clause can be implicitly added to or eliminated from GROUP BY
GROUP BY columns allowed to "move around"
Allows index to be used for both selection and grouping
Unique indexes used to guarantee one result record
GROUP BY can be ignored allowing more optimization possibilities
Only if columns in equal predicates compose a unique key


Ordering Processing and Optimization

Ordering Optimization

The query optimizer chooses between index ordering and sort


Optimizer costs the use of each method and picks the fastest
Query attributes affect which method is used
Index Order by
First I/O
ALWCPYDTA(*NO), ALWCPYDTA(*YES)

Sort
All I/O
ALWCPYDTA(*OPTIMIZE)

Columns in equal predicates of WHERE clause can be implicitly added to or eliminated from ORDER BY
Allows index to be used for both selection and ordering
Unique indexes used to guarantee one result record
ORDER BY can be ignored allowing more optimization possibilities
Only if columns in equal predicates compose a unique key

Putting it all together

The Big Picture

Query plan is executed from left to right, bottom to top
[Figure: query plan diagram with a "Start here" callout]

A Closer Look

If the join result is "true", then proceed up the tree


Lab 5 - Putting it all together

Look-ahead Predicate Generation (LPG)

A strategy to generate local selection predicates for one table, from one or more other tables
Limited, directed use by CQE
Naturally considered by SQE (as of V5R3)

Using the (generated) local selection predicates, more options are available for data access and data processing
Minimizes the effects of a poor join order
Opportunity for additional indexing
Can have a very positive effect on query performance!
An example of query rewrite

Select
from small_table s,
big_table b
where (s.store = 'Store 1'
or s.store = 'Store 2'
or s.store = 'Store 3'
or s.store = 'Store 4')
and s.storekey = b.storekey

Small_Table: local selection (Store) provided by query
Big_Table: no local selection provided by query
Join condition: Storekey = Storekey
LPG: generate list of distinct key values based on the join column (Storekeys 00001, 00002, 00003, 00004)
Generated local selection provided by optimizer
[Figure: Small_Table and Big_Table joined on Storekey, with the tables labeled Dial 1 and Dial 2]

Query Rewrite with LPG

Select
from small_table s,
big_table b
where (s.store = 'Store 1'
or s.store = 'Store 2'
or s.store = 'Store 3'
or s.store = 'Store 4')
and s.storekey = b.storekey
and b.storekey in (00001, 00002, 00003, 00004)
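
The rewrite above amounts to two steps: collect the distinct join keys that survive local selection on the small table, then use that list as generated local selection on the big table. The sketch below illustrates the idea only. The table contents, the amount column, and the function names are hypothetical, and a list scan stands in for whatever access method the optimizer would choose.

```python
def look_ahead_keys(small_table, predicate, join_column):
    """Distinct join-key values from small-table rows that pass local selection."""
    return sorted({row[join_column] for row in small_table if predicate(row)})


def join_with_lpg(small_table, big_table, predicate, join_column):
    """Join the tables, filtering the big table with the generated key list first."""
    keys = set(look_ahead_keys(small_table, predicate, join_column))
    # Generated local selection: big-table rows outside the key list are never joined.
    candidates = [row for row in big_table if row[join_column] in keys]
    selected = [row for row in small_table if predicate(row)]
    joined = [{**s, **b} for b in candidates for s in selected
              if s[join_column] == b[join_column]]
    return joined, len(candidates)


# Hypothetical data: six stores, and twelve big-table rows spread evenly across them.
small_table = [{"storekey": f"{n:05d}", "store": f"Store {n}"} for n in range(1, 7)]
big_table = [{"storekey": f"{n % 6 + 1:05d}", "amount": n} for n in range(12)]


def wanted(row):
    return row["store"] in ("Store 1", "Store 2", "Store 3", "Store 4")


generated = look_ahead_keys(small_table, wanted, "storekey")
joined, candidates_read = join_with_lpg(small_table, big_table, wanted, "storekey")

print("generated IN list:", generated)
print("big-table rows passed to the join:", candidates_read, "of", len(big_table))
```

The generated list matches the IN predicate in the rewritten query, and the join only sees the big-table rows that can possibly match.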

[Figure: two query plans for the LPG example, each ending in Join and Results. Node labels: Big 1, Small 2, Temp IX, Hash 1, Hash 2, Local select]

Star / Snowflake Schema Join via LPG

Assist in identifying a narrow range of rows in a large fact table
When the query specifies no local selection on the fact table

Enhancement for multidimensional queries using Star or Snowflake Schema database model(s)
Fact table(s) supported by Dimension table(s)
Selection on fact table is derived from local selection on some or all of the dimension tables (LPG - look-ahead predicate generation)
Select a narrow range of rows in the fact table by finding the intersection of some or all of the dimension tables' local selection and join keys

Leverages...
Specific join order (fact table as dial 1, joined to dimension table(s); CQE)
Hash join
Skip sequential or clustered I/O access method
EVIs and dynamic bitmaps
Symmetric Multiprocessing and database parallelism

ibm.com/servers/enable/site/education/abstracts/16fa_abs.html

Analyzing and Tuning Queries

The Process

Proactive and reactive analysis
There are really only two reactive methods to analyze and optimize a query:

Model the Environment
In this method you reduce the sizes of the tables so you can better experiment with different implementations
Reduces exposure to production data and workload
Does not accurately reflect the statistical information found in the normal test environment

Feedback
Determine what index you believe should be used for the query and then create that index to see if the Query Optimizer will choose the new index
Exposes production data and workloads to changes
Reflects actual behavior based upon production data

These methods are used as part of an iterative process


Art - What can be done...?
Change the request or SQL coding
Change the design or influence the implementation
Database design
Tuning "knobs" and indexes
Upgrade OS to obtain new features

Change the resource performance


Work Management
Additional or upgraded hardware
SMP and database parallelism

Change the response time expectations


Art - Environments
A few long running or complex requests
Dedicate all resources
SMP database parallelism
Highly tuned

Many quick, small or medium ad-hoc requests


Share resources (like OLTP)
Little or no SMP database parallelism
Unpredictable - no opportunity to tune many requests

Mixture
Separate environments
Separate systems or logical partitions

Art - Consider the entire request

SELECTING
Table scan
Table scan or probe via bitmap or RRN list
Index scan or probe

JOINING
Index scan or probe
Hash table probe
Sorted list probe

GROUPING
Index scan or probe
Hash table scan

ORDERING
Index scan or probe
Sorted list scan
