DB2 Query Optimization Class 3 - Art PDF

DB2
The Art of Query Optimization
Copyright 2006 - IBM Corporation - Systems and Technology Group
Query Processing Overview
SELECTING
Table scan
Table scan or probe via bitmap or RRN list
Index scan or probe
JOINING
Index scan or probe
Hash table probe
Sorted list probe
GROUPING
Index scan or probe
Hash table scan
ORDERING
Index scan or probe
Sorted list scan
Selection Processing and Optimization
Selection Optimization
The query optimizer's goal is to eliminate I/O as soon as possible
Local selection usually occurs before any other process
Indexes and column stats are used for statistics
A good indexing strategy provides the optimizer and DB engine with multiple methods and strategies
A poor indexing strategy eliminates methods and strategies, causing more I/O
Different methods and strategies may be chosen based on the expected selectivity of the query
Selection implementation methods
Table scan
Index scan or probe
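
The effect of expected selectivity on the choice between a table scan and an index probe can be sketched with a toy cost model. The cost formulas, the `rows_per_page` value and the function name are assumptions made for illustration only; they are not the optimizer's actual costing.

```python
def choose_access_method(table_rows, selectivity, rows_per_page=100):
    """Pick the cheaper method under a toy I/O model (illustrative assumption)."""
    matching_rows = table_rows * selectivity
    # Table scan: read every page of the table sequentially.
    table_scan_cost = table_rows / rows_per_page
    # Index probe: assume roughly one random read per matching row.
    index_probe_cost = matching_rows
    if index_probe_cost < table_scan_cost:
        return "index probe"
    return "table scan"


highly_selective = choose_access_method(1_000_000, 0.0001)
not_selective = choose_access_method(1_000_000, 0.5)
```

Under these assumptions a query that selects very few rows favors the index probe, while a query that selects half the table favors the scan.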
Join Processing and Optimization
Joins Common Terms

Term                    Meaning
Join Position           Position in which table is being joined.
Join Dial               Same as Join Position.
Join Order              The order of all of the tables used to process the join. (Dial1 Dial2 Dial3 Dial4)
Average Duplicates      Average number of rows for each distinct value. Statistic derived from an index or SQE column stat.
Join Fan-out / Fan-in   The number of join combinations that can be expected for each join value. The join increases or decreases the number of rows returned.
Join Types and Execution Flow

Cascade
Join order: B,C,A,D
Table B (Dial 1), Table C (Dial 2), Table A (Dial 3), Table D (Dial 4)
...FROM TABLEA A, TABLEB B, TABLEC C, TABLED D
WHERE A.KEY=D.KEY and B.KEY=C.KEY and C.KEY=A.KEY...

Star or 1 to many
Join order: B,C,A,D
Table B (Dial 1), Table C (Dial 2), Table A (Dial 3), Table D (Dial 4)
...FROM TABLEA A, TABLEB B, TABLEC C, TABLED D
WHERE B.KEY1=A.KEY and B.KEY2=C.KEY and B.KEY3=D.KEY...
Join Order
Assume average duplicates of 1 for headers and 3 for details...
SELECT *
FROM ORDER_HEADERS A,
ORDER_DETAILS B
WHERE A.ORDER_KEY = B.ORDER_KEY
AND CUSTOMER_NO = 112358
What is the total cost of selecting from, and joining to, each dial?
Join Order 1 (A, B): Headers (Dial 1) joined to Details (Dial 2). Join fan out is 1 to 3
Join Order 2 (B, A): Details (Dial 1) joined to Headers (Dial 2). Join fan out is 1 to 1
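
The question on this slide can be worked through with a small calculation. The table sizes and the number of orders for the customer are hypothetical, and the sketch assumes CUSTOMER_NO is a column of ORDER_HEADERS, so the local selection can only reduce rows when the headers are dial 1.

```python
def join_order_work(dial1_rows, fan_out):
    """Count the probes and joined rows produced by a two-dial join."""
    return {
        "dial1_rows": dial1_rows,           # rows selected from the first dial
        "probes": dial1_rows,               # one probe into dial 2 per dial-1 row
        "rows_joined": dial1_rows * fan_out,
    }


# Hypothetical sizes: 1,000 headers, 3 details each, 10 orders for the customer.
headers_total = 1000
headers_for_customer = 10
details_total = headers_total * 3

# Join Order 1 (A, B): local selection on headers runs first, fan-out 1 to 3.
order_1 = join_order_work(headers_for_customer, fan_out=3)
# Join Order 2 (B, A): no local selection on details, fan-out 1 to 1.
order_2 = join_order_work(details_total, fan_out=1)
```

With these numbers, order 1 needs 10 probes and joins 30 rows, while order 2 needs 3,000 probes before the customer test on the headers can discard anything.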
Join Support for SQL

Joins act like "expensive" selection
Must perform I/O to test, instead of logically eliminating data
For left outer and exception join, tables are joined from left to right
Right outer joins are implemented by conversion into left outer joins
Index only access not available for CQE
For inner join, optimizer not biased toward using specified join order
Must use QAQQINI file settings to influence join order chosen by optimizer
FORCE_JOIN_ORDER set to *YES, *SQL, *PRIMARY can force join order
Not recommended
Multiple join types supported for a single query

Join implementation methods
Nested loop via index
Nested loop via hash table
Nested loop via sorted list
Nested Loop Join via Index

Step 1: Select row and build key (Table 2)
Step 2: Position into index (Index 3)
Step 3: Random read row from table (Table 3)
Repeat Steps 2 - 3 until key not found
Step 4: Build key and position into index (Index 1)
Step 5: Random read row from table (Table 1)
Repeat Steps 4 - 5 until key not found
What type of access is used for Table 3 and Table 1...?
SELECT *
FROM TABLE_1,
TABLE_2,
TABLE_3
WHERE FKEY1 = PKEY3
AND FKEY2 = PKEY3
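
The select, position and random-read steps of a nested loop join via index can be sketched in a few lines. This is a minimal illustration, not the database engine's implementation: the index is modeled as a dictionary from key value to row positions, and the table and column names are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to the positions (RRNs) of the rows holding it."""
    index = defaultdict(list)
    for rrn, row in enumerate(rows):
        index[row[key]].append(rrn)
    return index


def nested_loop_index_join(primary, secondary, index, join_key):
    """Join each primary row to the secondary rows found through the index."""
    result = []
    for row in primary:                        # select row and build key
        key = row[join_key]
        for rrn in index.get(key, []):         # position into index
            result.append({**row, **secondary[rrn]})  # random read of the row
    return result


orders = [{"order_key": 1, "customer": "A"}, {"order_key": 2, "customer": "B"}]
details = [
    {"order_key": 1, "item": "x"},
    {"order_key": 1, "item": "y"},
    {"order_key": 3, "item": "z"},
]
detail_index = build_index(details, "order_key")
joined = nested_loop_index_join(orders, details, detail_index, "order_key")
```

Each primary row costs one index probe plus one random read per match, which is why the fan-out of the join drives the I/O.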
Nested Loop Join via Index
Creating a Temporary Index for the Join Criteria

CQE
SQE (V5R4)
If an index over the join columns of the secondary table does not exist, one is created.
Advantages:
Since local selection might be performed ahead of the join (during index creation), the temporary index generally is smaller, so there are fewer index pages to be faulted in
Table access is tuned automatically (SQE)
Disadvantages:
Creating a temporary index is very CPU and resource intensive
If index is built using host variable selection, then the query is not reusable
Join via Multiple Key Column Probe
Join is processed using index probe on equal selection predicates just like local selection
SELECT EMPLOYEE.NAME, EMPLOYEE.TITLE
FROM EMPLOYEE,
SALES
WHERE EMPLOYEE.EMPNUM = 112358
AND EMPLOYEE.EMPNUM = SALES.EMPNUM -- join predicate
AND SALES.STORE = 13 -- non-join predicate
AND SALES.DATE = '2004/03/01' -- non-join predicate
Index with three keys: STORE, DATE, and EMPNUM (in any order) can be used to satisfy both the join and non-join selection predicates on the table SALES in one query step.
Nested Loop Join via Hash

Part 1: Select rows and build hash tables
Only data that meets selection criteria
Table 2: keyed selection via index (IX 2), then hashing algorithm builds a hash table
Table 3: skip sequential selection via EVI and bitmap (EVI 3, bitmap), then hashing algorithm builds a hash table
Part 2: Select Table 1 rows and join to hash tables
Table 1: hashing algorithm, probe hash tables, result
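
The two parts of a hash join can be sketched as a build phase followed by a probe phase. This is a generic illustration under simplifying assumptions: a Python dictionary stands in for the hash table, and the table contents and the `build_filter` parameter are hypothetical.

```python
def hash_join(build_rows, probe_rows, key, build_filter=lambda row: True):
    """Build a hash table from the filtered rows, then probe it row by row."""
    # Part 1: select rows and build the hash table (only rows meeting selection).
    hash_table = {}
    for row in build_rows:
        if build_filter(row):
            hash_table.setdefault(row[key], []).append(row)
    # Part 2: read the other table and probe the hash table with each join key.
    result = []
    for row in probe_rows:
        for match in hash_table.get(row[key], []):
            result.append({**row, **match})
    return result


stores = [{"storekey": 1, "region": "N"}, {"storekey": 2, "region": "S"}]
sales = [
    {"storekey": 1, "amount": 10},
    {"storekey": 2, "amount": 20},
    {"storekey": 1, "amount": 5},
]
north_sales = hash_join(
    stores, sales, "storekey", build_filter=lambda row: row["region"] == "N"
)
```

Because the hash table holds only rows that passed local selection, probes for other keys find nothing and those rows drop out of the result.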
Join Optimization
Join Optimization
The main optimization strategy for a join query is the reordering of the tables
This minimizes the join fanout and that in turn minimizes I/Os
Reordering of tables is allowed only on inner joins
Left Outer or Exception joins cannot be reordered
The DB2 UDB for iSeries optimizer uses a greedy join algorithm to determine the most efficient table order
Both CQE and SQE have additional join order optimizations
Largest primary
Largest secondary
Star schema join
Query rewrite strategies can minimize the effects of a poor join order
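
The idea of a greedy join-ordering heuristic can be sketched as follows. This is a generic textbook-style illustration and not the DB2 optimizer's actual algorithm: it starts from the table with the fewest selected rows and repeatedly adds the joinable table with the smallest estimated fan-out. All names and estimates are hypothetical, and the join graph is assumed to be connected.

```python
def greedy_join_order(selected_rows, fan_out):
    """Return a join order built one table at a time by smallest fan-out.

    selected_rows: table -> estimated rows left after local selection
    fan_out: (from_table, to_table) -> estimated rows joined per source row
    """
    order = [min(selected_rows, key=selected_rows.get)]
    remaining = set(selected_rows) - set(order)
    while remaining:
        candidates = [
            (fan_out[(a, b)], b)
            for a in order
            for b in remaining
            if (a, b) in fan_out
        ]
        _, best = min(candidates)   # cheapest next dial given tables placed so far
        order.append(best)
        remaining.remove(best)
    return order


estimated_rows = {"A": 10, "B": 3000, "C": 500}
estimated_fan_out = {("A", "B"): 3, ("B", "A"): 1, ("A", "C"): 1, ("C", "A"): 2}
join_order = greedy_join_order(estimated_rows, estimated_fan_out)
```

A greedy choice is cheap to compute but only locally optimal, which is one reason additional strategies and query rewrites can still matter.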
Join Optimization Tips
Join query tuning
At a minimum, make sure there are radix indexes built over all the join columns
Might have indexes built over both join columns and selection columns; this allows for multi-key joins
Create EVIs over foreign key columns to take advantage of index ANDing via dynamic bitmaps and RRN lists
Star schema join
Look-ahead predicate generation (LPG)
Indexes are used to provide the optimizer statistics, as well as implementation methods
Join Optimization Tips
Transitive closure over join predicates
The CQE query optimizer does not do transitive closure over join predicates
The query optimizer does transitive closure over selection predicates
For example:
given: A.JCOL1 = B.JCOL1 and A.JCOL1 = 123
the query optimizer does determine that B.JCOL1 = 123
Duplicating the join predicates over all tables might help CQE
For example:
given: A.JCOL1 = B.JCOL1 and B.JCOL1 = C.JCOL1
the query optimizer does not determine that A.JCOL1 = C.JCOL1
Thus, code all combinations (A=B, B=C, A=C)
The SQE query optimizer does provide transitive closure over join predicates
This is a form of query rewrite
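
Transitive closure over equal join predicates can be sketched by grouping columns that are known to be equal and then listing every pair in each group. This is a minimal illustration of the concept, not the optimizer's implementation; the function name is hypothetical.

```python
from itertools import combinations


def close_join_predicates(equalities):
    """Given column pairs joined by '=', return every implied equal pair."""
    groups = []                       # each set holds columns known to be equal
    for left, right in equalities:
        merged = {left, right}
        untouched = []
        for group in groups:
            if group & merged:        # shares a column, so the groups are equal
                merged |= group
            else:
                untouched.append(group)
        groups = untouched + [merged]
    implied = set()
    for group in groups:
        implied.update(combinations(sorted(group), 2))
    return sorted(implied)


predicates = close_join_predicates(
    [("A.JCOL1", "B.JCOL1"), ("B.JCOL1", "C.JCOL1")]
)
```

For the example on the slide, the result contains A=B, B=C and the derived A=C, which is the full set the slide recommends coding by hand for CQE.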
Join Optimization Summary
Nested loop join via index
Multi-key join: probe index with both selection and join columns
Temporary index creation (CQE)
Typically small table to large table
Parallelism (SQE only)
Nested loop join via hash
No radix index required or available

Typically large table to small table
Not used inside a UNION (CQE)
Parallelism
Nested loop join via sorted list

SQE only
Used when join condition is an inequality and an index does not exist
Hash join is only possible with equal condition
Mixture of join methods may be used in the same SQL request
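
Why a sorted list can serve an inequality join where a hash table cannot is shown in the sketch below. It is a generic illustration, not the engine's implementation: the join condition is assumed to be left < right, and the function name and values are hypothetical.

```python
from bisect import bisect_right


def sorted_list_join_less_than(left_values, right_values):
    """Pair each left value with every right value strictly greater than it."""
    sorted_right = sorted(right_values)     # build the temporary sorted list once
    result = []
    for left in left_values:
        # Binary search positions at the first right value greater than left.
        start = bisect_right(sorted_right, left)
        result.extend((left, right) for right in sorted_right[start:])
    return result


pairs = sorted_list_join_less_than([2, 5], [1, 3, 5, 7])
```

A hash table can only answer "which rows have exactly this key", whereas the ordering of the list lets one probe find the start of a whole range of matches.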

Grouping Processing and Optimization
Group-By Optimization
The query optimizer chooses between index grouping and hash grouping to aggregate data
Indexes and column stats are used for grouping statistics
(number of groups, number of rows per group)
Keys over grouping column(s)
Average duplicates statistics
Determine the hash table size and number of hash points
Query attributes which may affect which method is used

Index Group by
First I/O
ALWCPYDTA(*NO), ALWCPYDTA(*YES)
Hash Group by
All I/O
ALWCPYDTA(*OPTIMIZE)
Index Group-by - Min / Max Skipping

CREATE INDEX X1 ON SALES
(STATE, SALES) <---- ascending
:
SELECT STATE, MIN(SALES)
FROM SALES
WHERE STATE IN ( 'ARIZONA', 'CALIFORNIA')
GROUP BY STATE
Index probe on ARIZONA and the minimum or first value for SALES by traversing the radix tree in ascending order, then on to California
STATE        SALES    CUSTOMER
Alabama      110.00   Jones
Alabama      150.00   Smith
Alabama      375.00   Doe
Alaska        10.00   Johnson
Alaska        55.00   Smith
Alaska       120.00   Alexander
Alaska       400.00   Lee
Arizona       50.00   White
Arizona       80.00   Doe
Arizona      210.00   Brown
Arizona      360.00   Jacobson
Arizona      540.00   Milligan
Arkansas       5.00   Weatherby
Arkansas      25.00   Smith
Arkansas      90.00   Pippen
California    30.00   Lee
California    75.00   Wayne
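
Min skipping can be sketched with a sorted list standing in for the radix index on (STATE, SALES). The key values come from the slide; the binary search and the function name are illustrative assumptions, and the state names use the mixed case shown in the data.

```python
from bisect import bisect_left

# (STATE, SALES) key pairs from the slide, kept in ascending key order.
index_x1 = sorted([
    ("Alabama", 110.00), ("Alabama", 150.00), ("Alabama", 375.00),
    ("Alaska", 10.00), ("Alaska", 55.00), ("Alaska", 120.00), ("Alaska", 400.00),
    ("Arizona", 50.00), ("Arizona", 80.00), ("Arizona", 210.00),
    ("Arizona", 360.00), ("Arizona", 540.00),
    ("Arkansas", 5.00), ("Arkansas", 25.00), ("Arkansas", 90.00),
    ("California", 30.00), ("California", 75.00),
])


def min_per_state(index, states):
    """Probe to the first key of each state; that key carries the minimum."""
    result = {}
    for state in states:
        pos = bisect_left(index, (state,))     # probe: first entry for the state
        if pos < len(index) and index[pos][0] == state:
            result[state] = index[pos][1]      # skip the state's remaining keys
    return result


minimums = min_per_state(index_x1, ["Arizona", "California"])
```

Only one key per requested state is read, so the other Arizona and California entries, and all the other states, are never touched.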
Index Group-by - Min / Max Skipping

CREATE INDEX X2 ON SALES
(STATE, SALES DESC) <---- descending (CQE)
:
SELECT STATE, MAX(SALES)
FROM SALES
WHERE STATE IN ( 'ARIZONA', 'CALIFORNIA')
GROUP BY STATE
Index probe on ARIZONA and the maximum or first value for SALES by traversing the radix tree in descending order, then on to California
STATE        SALES    CUSTOMER
Alabama      375.00   Jones
Alabama      150.00   Smith
Alabama      110.00   Doe
Alaska       400.00   Johnson
Alaska       120.00   Smith
Alaska        55.00   Alexander
Alaska        10.00   Lee
Arizona      540.00   White
Arizona      360.00   Doe
Arizona      210.00   Brown
Arizona       80.00   Jacobson
Arizona       50.00   Milligan
Arkansas      90.00   Weatherby
Arkansas      25.00   Smith
Arkansas       5.00   Pippen
California    75.00   Lee
California    30.00   Wayne
Grouping Optimization
Columns in equal predicates of WHERE clause can be implicitly added to or eliminated from GROUP BY
GROUP BY columns allowed to "move around"
Allows index to be used for both selection and grouping
Unique indexes used to guarantee one result record
GROUP BY can be ignored allowing more optimization possibilities
Only if columns in equal predicates compose a unique key
Ordering Processing and Optimization
Ordering Optimization
The query optimizer chooses between index ordering and sort

Optimizer costs the use of each method and picks the fastest
Query attributes affect which method is used
Index Order by
First I/O
ALWCPYDTA(*NO), ALWCPYDTA(*YES)
Sort
All I/O
ALWCPYDTA(*OPTIMIZE)
Ordering Optimization
Columns in equal predicates of WHERE clause can be implicitly added to or eliminated from ORDER BY
Allows index to be used for both selection and ordering
Unique indexes used to guarantee one result record
ORDER BY can be ignored allowing more optimization possibilities
Only if columns in equal predicates compose a unique key
Putting it all together
The Big Picture
Query plan is executed from left to right, bottom to top
Start here
A Closer Look
If the join result is "true", then proceed up the tree
Lab 5 - Putting it all together
Look-ahead Predicate Generation
Look-ahead Predicate Generation (LPG)
A strategy to generate local selection predicates for one table, from one or more other tables
Limited, directed use by CQE
Naturally considered by SQE (as of V5R3)
Using the (generated) local selection predicates, more options are available for data access and data processing
Minimizes the effects of a poor join order
Opportunity for additional indexing
Can have a very positive effect on query performance!
An example of query rewrite

Select
from small_table s,
big_table b
where (s.store = 'Store 1'
or
s.store = 'Store 2'
or
s.store = 'Store 3'
or
s.store = 'Store 4')
and s.storekey = b.storekey
Small_Table: local selection provided by query (Store)
Big_Table: no local selection provided by query
Join condition: Storekey = Storekey
The two tables are joined as Dial 1 and Dial 2
LPG: generate list of distinct key values based on the join column
Storekeys: 00001, 00002, 00003, 00004
Generated local selection provided by optimizer

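Query Rewrite with LPG
Original query:
Select
from small_table s,
big_table b
where (s.store = 'Store 1'
or s.store = 'Store 2'
or s.store = 'Store 3'
or s.store = 'Store 4')
and s.storekey = b.storekey
Rewritten with LPG:
Select
from small_table s,
big_table b
where (s.store = 'Store 1'
or s.store = 'Store 2'
or s.store = 'Store 3'
or s.store = 'Store 4')
and s.storekey = b.storekey
and b.storekey in (00001, 00002, 00003, 00004)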
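
The LPG rewrite can be sketched as two steps: collect the distinct join keys of the small-table rows that pass local selection, then apply that key list as local selection on the big table. The table contents below are hypothetical and the sketch is only an illustration of the idea, not the optimizer's implementation.

```python
small_table = [
    {"store": "Store 1", "storekey": "00001"},
    {"store": "Store 2", "storekey": "00002"},
    {"store": "Store 3", "storekey": "00003"},
    {"store": "Store 4", "storekey": "00004"},
    {"store": "Store 5", "storekey": "00005"},
]
big_table = [
    {"storekey": "00001", "amount": 10},
    {"storekey": "00005", "amount": 99},
    {"storekey": "00003", "amount": 30},
]

wanted_stores = {"Store 1", "Store 2", "Store 3", "Store 4"}

# Look ahead: distinct join-key values of the small-table rows that qualify.
generated_keys = sorted(
    {row["storekey"] for row in small_table if row["store"] in wanted_stores}
)

# Generated predicate: b.storekey IN (generated_keys), applied before the join.
key_set = set(generated_keys)
big_selected = [row for row in big_table if row["storekey"] in key_set]
```

The big table now has local selection of its own, so it can be probed through an index on storekey even when it is the first dial.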
[Figure: two alternative plans that join the big and small tables to produce Results. Diagram labels: Big 1, Small 2, Temp IX, Hash 1, Hash 2, Local select]
Star / Snowflake Schema Join via LPG
Assist in identifying a narrow range of rows in a large fact table
When the query specifies no local selection on the fact table
Enhancement for multidimensional queries using Star or Snowflake Schema database model(s)
Fact table(s) supported by Dimension table(s)
Selection on fact table is derived from local selection on some or all of the dimension tables (LPG - look-ahead predicate generation)
Select a narrow range of rows in the fact table by finding the intersection of some or all of the dimension tables' local selection and join keys
Leverages...
Specific join order (fact table as dial 1, joined to dimension table(s) CQE)
Hash join
Skip sequential or clustered I/O access method
EVIs and dynamic bitmaps
Symmetric Multiprocessing and database parallelism
ibm.com/servers/enable/site/education/abstracts/16fa_abs.html
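
Finding the intersection of several dimension selections can be sketched with one bitmap per dimension, where bit n stands for fact-table row n, followed by a bitwise AND. Python integers stand in for the dynamic bitmaps; the fact rows and key sets are hypothetical.

```python
fact_table = [
    {"store": 1, "product": 10, "amount": 5},
    {"store": 1, "product": 30, "amount": 7},
    {"store": 2, "product": 10, "amount": 9},
    {"store": 1, "product": 20, "amount": 4},
]


def rrn_bitmap(rows, column, wanted_keys):
    """Set bit n when fact row n carries one of the wanted foreign keys."""
    bitmap = 0
    for rrn, row in enumerate(rows):
        if row[column] in wanted_keys:
            bitmap |= 1 << rrn
    return bitmap


# Key lists as they might be generated from local selection on each dimension.
store_bits = rrn_bitmap(fact_table, "store", {1})
product_bits = rrn_bitmap(fact_table, "product", {10, 20})

combined = store_bits & product_bits          # index ANDing of the two bitmaps
selected_rrns = [rrn for rrn in range(len(fact_table)) if (combined >> rrn) & 1]
```

Only the rows whose bits survive the AND need to be read from the fact table, which is what makes skip sequential access effective here.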
Analyzing and Tuning Queries
The Process
Proactive and reactive analysis
There are really only two reactive methods to analyze and optimize a query:
Model the Environment
In this method you reduce the sizes of the tables so you can better experiment with different implementations
Reduces exposure to production data and workload
Does not accurately reflect the statistical information found from the normal test environment
Feedback
Determine what index you believe should be used for the query and then create that index to see if the Query Optimizer will choose the new index
Exposes production data and workloads to changes
Reflects actual behavior based upon production data
These methods are used as part of an iterative process
Art
What can be done...?
Change the request or SQL coding
Change the design or influence the implementation
Database design
Tuning "knobs" and indexes
Upgrade OS to obtain new features
Change the resource performance

Work Management
Additional or upgraded hardware
SMP and database parallelism
Change the response time expectations
Art
Environments
A few long running or complex requests
Dedicate all resources
SMP database parallelism
Highly tuned
Many quick, small or medium ad-hoc requests

Share resources (like OLTP)
Little or no SMP database parallelism
Unpredictable - no opportunity to tune many requests
Mixture
Separate environments
Separate systems or logical partitions
Art - Consider the entire request
SELECTING
Table scan
Index scan or probe
JOINING
Index scan or probe
Hash table probe
Sorted list probe
GROUPING
Index scan or probe
Hash table scan
ORDERING
Index scan or probe
Sorted list scan
