
DB2 Optimizer: Beyond the Basics

Workshop
Chair(s):

Zoran Kulina, Malcolm Singh

Date:

Tuesday, November 4th (Afternoon)

Room:

Primrose

Level:

Intermediate

Theme:

Information Management Technologies

Pre-requisites:

Basic knowledge of SQL and relational database concepts is desired. Some
familiarity with the DB2 optimizer is beneficial, although not required, as
optimizer fundamentals will be reviewed.

Format:

Speakers

Description:

What is DB2 optimizer?


The DB2 optimizer is the query compiler component responsible for selecting an
optimal access plan for SQL statements. An access plan specifies the order of
operations required to execute a statement. We can loosely compare the optimizer
to a GPS unit. A GPS device uses information in its database to calculate the fastest
path from point A to point B, evaluating thousands of combinations to determine the
best route. Similarly, the optimizer uses a cost-based model and table statistics to
generate the execution path that consumes the least amount of system resources for
the given query (which usually also means the fastest execution time). The optimizer
has come a long way since the first release of DB2 UDB. It has become a complex
component that offers many options for influencing the selection of SQL access
plans. The goal of this workshop was to discuss optimizer concepts and present
techniques for influencing its behaviour in order to improve SQL performance.
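For illustration, here is a minimal sketch of how an access plan can be inspected
with DB2's explain facility. The query against the EMPLOYEE and DEPARTMENT tables of
the SAMPLE database is illustrative, and the explain tables are assumed to have
already been created.

    -- Capture the access plan chosen for a sample query.
    EXPLAIN PLAN FOR
      SELECT e.empno, e.lastname, d.deptname
      FROM   employee e
      JOIN   department d ON d.deptno = e.workdept
      WHERE  d.location = 'TORONTO';

    -- Format the captured plan from the command line; the output shows the
    -- chosen operators (scans, joins, sorts) and their estimated cost in
    -- timerons, DB2's internal cost unit.
    -- db2exfmt -d SAMPLE -1 -o plan.txt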
Why study the optimizer?
Query execution times and resource usage weigh heavily on overall database
server performance. As the demand for data increases, so does the pressure to keep
database performance at acceptable levels. Allocating additional computing
resources or procuring more powerful hardware to satisfy growing application
needs is not always an available or effective course of action. It is no longer
sufficient to write SQL statements that just get the job done. Developers must be
able to code efficient queries. Likewise, database administrators are expected to be
familiar with the mechanics of data access and are often called upon to tune poorly
performing SQL. Understanding how the optimizer works, how to influence its
behaviour, and how queries access data empowers application developers and
database administrators to write better SQL and improve query performance.
This is especially important in data warehousing environments that run
resource-intensive OLAP queries.
What was this workshop about?
This workshop was a continuation of DB2 LUW Optimizer: A Beginner's
Guide, first presented at CASCON 2010. Whereas the 2010 workshop was
delivered in a hands-on format, this one was presented as a speaker-and-demo
session. The workshop objective was to go beyond the basics and provide
participants with more in-depth knowledge of optimizer concepts. It was also
intended to serve as a stepping stone for further study of advanced optimizer
techniques, which will be covered by a future instalment of this workshop. The
workshop briefly revisited the optimizer basics initially explored in the Beginner's
Guide before introducing additional concepts. Topics covered included join operators
and methods, column group statistics, and filter factor computations. Optimizer
techniques for data partitioning methods (DPF, multi-dimensional clustering tables
and range-partitioned tables) were outside the workshop scope; a future instalment
of this workshop is intended to cover data partitioning and parallelism. As the LUW
acronym implies, the focus of this workshop was DB2 for midrange platforms (Linux,
UNIX, Windows). The workshop covered the following topics:
1. Introduction to the workshop database
Understanding the optimizer and access plan optimization is easier when walking
through the process firsthand. With this in mind, we used a sample database during
the presentation to demonstrate some of the optimizer concepts. In this section, we
briefly introduced the workshop audience to the sample database. The database
was similar to the one used in the first instalment of this workshop; however, it was
slightly modified to facilitate the demonstration of column group statistics. We also
discussed the format and scope of the presentation.
2. Optimizer fundamentals (refresher)
Although this year's workshop primarily focused on intermediate and advanced
topics, any discussion of the optimizer has to mention its central concepts. To this
end, we briefly reviewed the optimizer fundamentals that were covered in the
Beginner's Guide workshop in 2010. First, we described the optimizer cost model
and the notion of a timeron. We explained that when choosing the optimal access
plan, the optimizer considers the I/O and CPU cost of each alternative. Next, we
talked about the phases of the optimizer and the trade-off between compilation
time and the degree of optimization. Finally, we touched upon table and index
statistics (cardinality and frequency distributions), and why their accuracy is critical
to calculating the cost of an access plan. We also briefly mentioned indexes; missing
indexes, along with outdated catalog statistics, are the most common causes of
sub-optimal SQL.
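As a brief, hedged illustration of these fundamentals (the schema and table names
are illustrative), the sketch below sets the optimization class for a session and
refreshes catalog statistics from the DB2 command line processor:

    -- Trade compilation time against the degree of optimization by choosing
    -- an optimization class (0-9; class 5 is the common default).
    SET CURRENT QUERY OPTIMIZATION = 5;

    -- Refresh table and index statistics, including frequency distributions,
    -- so the cost model works with accurate cardinalities.
    RUNSTATS ON TABLE db2inst1.employee
      WITH DISTRIBUTION AND DETAILED INDEXES ALL;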
3. Join operators and methods
The join is one of the pivotal concepts of query access plan optimization, as almost
any query in a data warehouse environment references more than one table. Depending
on the attributes of the join predicates and the join cost estimated from catalog
statistics, the optimizer chooses one of the following join methods: nested-loop,
merge, or hash join. In this unit we talked about these join methods, predicates, and
operators. We discussed the pros and cons of each join method, and the impact of
various predicate types. Moreover, we examined the strategies used by the
optimizer for selecting optimal joins (greedy and dynamic-programming join
enumeration). We highlighted the potential performance penalties resulting from
choosing a sub-optimal join method. Although optimizer profiles are outside the
scope of this presentation, we briefly outlined how it is possible to override the
optimizer's join selection.
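To make the join methods concrete, the hedged sketch below shows a two-table join and
a statement-level optimization guideline that requests a hash join. The tables and
correlation names are illustrative, and guideline processing must be enabled in the
environment for the override to take effect.

    -- In db2exfmt output the chosen join method appears as NLJOIN
    -- (nested-loop join), MSJOIN (merge join) or HSJOIN (hash join).
    SELECT e.lastname, d.deptname
    FROM   employee e
    JOIN   department d ON d.deptno = e.workdept;

    -- An optimization guideline embedded as a comment can request a
    -- particular method, here a hash join between the two table references.
    SELECT e.lastname, d.deptname
    FROM   employee e
    JOIN   department d ON d.deptno = e.workdept
    /* <OPTGUIDELINES>
         <HSJOIN>
           <ACCESS TABLE='E'/>
           <ACCESS TABLE='D'/>
         </HSJOIN>
       </OPTGUIDELINES> */;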
4. Column group statistics
The optimizer relies on cardinality estimates to choose an optimal access plan.
Unless instructed otherwise, the optimizer treats all predicates as independent of
each other even when they are statistically correlated. This can lead the optimizer
to underestimate cardinality and select a sub-optimal plan. Column group statistics
enable the optimizer to consider statistical correlation between predicates. The
optimizer uses multi-column statistics to determine the combined cardinality and
then adjusts the estimate to account for correlation between columns. With this in
mind, we discussed the purpose and benefits of column group statistics. We
described why statistics collected on a group of related columns allow the optimizer
to compute more accurate cardinality estimates for queries that reference those
columns. We also explored when and how to collect column group statistics using
the runstats utility and indexes.
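A short, hedged example of collecting a column group statistic with runstats follows;
the table and column names are hypothetical. A composite index on the same columns
also provides the combined cardinality (FULLKEYCARD) to the optimizer.

    -- Collect single-column statistics plus a statistic on the
    -- (city, province) column group, so the optimizer can account for the
    -- correlation between the two columns when both appear in predicates.
    RUNSTATS ON TABLE db2inst1.customer
      ON ALL COLUMNS AND COLUMNS ((city, province))
      WITH DISTRIBUTION AND INDEXES ALL;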


5. Filter factor
Filter factor is defined as a fractional number between 0 and 1 that represents the
estimated proportion of rows in a table for which a predicate is true. We can also
think of the filter factor as the percentage of rows allowed to filter through a given
predicate, as estimated by the optimizer. In this unit we discussed how the optimizer
computes filter factors and how they relate to cardinality estimates. We further
examined the negative impact of inaccurate filter factor calculations on access plan
selection. We also explained how to use the selectivity override technique to
override the optimizer's filter factor computations.
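As a hedged illustration, the sketch below works through the default estimate for an
equality predicate and then applies the selectivity override. The table, column, and
parameter marker are assumptions, and the SELECTIVITY clause typically requires the
DB2_SELECTIVITY registry variable to be set.

    -- Default estimate for an equality predicate without distribution
    -- statistics: filter factor ~ 1 / COLCARD. With 200 distinct values in
    -- STATUS, the estimate is 1/200 = 0.005 of the table's rows.

    -- The SELECTIVITY clause overrides the computed filter factor for a
    -- predicate; here we assert that only 0.1% of the rows qualify.
    SELECT o.order_id, o.order_date
    FROM   orders o
    WHERE  o.status = ? SELECTIVITY 0.001;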

Agenda:

1. Introduction to the workshop database


2. Optimizer fundamentals (refresher)
3. Join operators and methods
4. Column group statistics
5. Filter factor

Workshop Speaker(s)

Malcolm Singh
Institution

IBM Canada Ltd.

Bio

Malcolm Singh is a Software Development Analyst at the IBM Canada Lab. He
works in the Information Management division within the IBM Software Group.
Malcolm started his career at IBM as an intern working on DB2 for Linux,
UNIX, and Windows (DB2 LUW). After graduating he returned to IBM to
continue working on DB2 LUW, where he gained extensive knowledge of both
DB2 LUW and database theory. His focus is now directed towards IBM's data
warehousing solutions, which include the IBM PureData Systems (PureData
System for Operational Analytics and PureData System for Analytics).

Topic
Zoran Kulina
Institution

IBM Canada Ltd.

Bio

Zoran Kulina is a technical consultant in the field of database software
development and data center management. Zoran has an extensive background
in relational database technology (predominantly DB2 on UNIX), both as a
software engineer and as a database administrator. Prior to taking up his current
assignment at IBM Global Services, Zoran spent over a decade at the IBM
Software Group in multiple development and technical support roles. Zoran is
a former member of IBM's DB2 development and support teams. He has also
participated in the development, integration, and delivery of large database
applications in the finance industry and the public sector. Zoran's current focus is
on distributed data warehousing systems, with emphasis on database
performance tuning, disaster recovery planning, and high-availability
enablement.

Topic
