Workshop
Chair(s):
Date:
Room: Primrose
Level: Intermediate
Theme:
Pre-requisites:
Format:
Speakers
Description:
before introducing additional concepts. Topics covered included join operators and
methods, column group statistics, and filter factor computations. Optimizer
techniques for data partitioning methods, namely the Database Partitioning Feature (DPF),
multidimensional clustering tables, and range-partitioned tables, were outside the
workshop scope. A future instalment
of this workshop is intended to cover data partitioning and parallelism. As the LUW
acronym implies, the focus of this workshop was DB2 for midrange platforms (Linux,
Unix, Windows). The workshop covered the following topics:
1. Introduction to the workshop database
Understanding the optimizer and access plan optimization is easier when walking
through the process firsthand. With this in mind, we used a sample database during
the presentation to demonstrate some of the optimizer concepts. In this section, we
briefly introduced the workshop audience to the sample database. The database
was similar to the one used in the first instalment of this workshop; however, it was
slightly modified to facilitate the demonstration of column group statistics. We also
discussed the format and scope of the presentation.
2. Optimizer fundamentals (refresher)
Although this year's workshop primarily focused on intermediate and advanced
topics, any discussion of the optimizer has to mention its central concepts. To this
end, we briefly reviewed the optimizer fundamentals that were covered in the
Beginner's Guide workshop in 2010. First, we described the optimizer cost model
and the notion of the timeron, the abstract unit in which the optimizer expresses
estimated cost. We explained that when choosing the optimal access plan, the
optimizer considers the I/O and CPU cost of each alternative. Next, we talked about
the phases of the optimizer and the trade-off between compilation time and the
degree of optimization. Finally, we touched upon table and index statistics
(cardinality and frequency distributions), and why their accuracy is critical to
calculating the cost of an access plan. We also briefly mentioned indexes: missing
indexes and outdated catalog statistics are the most common causes of sub-optimal
SQL.
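The idea of a cost-based choice can be illustrated with a toy model. The sketch below is hypothetical and assumes just two cost components, pages read (I/O) and rows processed (CPU), combined with invented weights into one abstract unit standing in for the timeron. DB2's actual cost model is internal and considerably more detailed; the names PlanEstimate, cheapest, and the weight constants are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical weights. DB2's real cost model is internal and far more detailed;
# these constants only show how I/O and CPU estimates can be folded into one
# abstract cost unit that plays the role of the timeron.
IO_COST_PER_PAGE = 1.0
CPU_COST_PER_ROW = 0.01


@dataclass
class PlanEstimate:
    name: str
    pages_read: int       # estimated I/O, derived from catalog statistics
    rows_processed: int   # estimated CPU work, also derived from statistics

    def cost(self) -> float:
        """Total estimated cost in abstract units (I/O cost plus CPU cost)."""
        return (self.pages_read * IO_COST_PER_PAGE
                + self.rows_processed * CPU_COST_PER_ROW)


def cheapest(plans):
    """Cost-based choice: return the alternative with the lowest estimated cost."""
    return min(plans, key=PlanEstimate.cost)


plans = [
    PlanEstimate("table scan", pages_read=10_000, rows_processed=1_000_000),
    PlanEstimate("index scan", pages_read=1_200, rows_processed=50_000),
]
print(cheapest(plans).name)  # prints "index scan"
```

Every input to cost() is itself an estimate derived from statistics, so stale or missing statistics distort the comparison itself; this is why statistics accuracy matters so much to plan selection.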
3. Join operators and methods
Joins are central to query access plan optimization, as almost any
query in a data warehouse environment references more than one table. Depending
on the attributes of join predicates and the join cost as estimated based on catalog
statistics, the optimizer chooses one of the following join methods: nested-loop join,
merge join, or hash join. In this unit, we talked about these join methods, predicates and
operators. We discussed the pros and cons of each join method, and the impact of
various predicate types. Moreover, we examined the strategies the optimizer uses
to enumerate and select join orders (greedy and dynamic programming join
enumeration). We highlighted the potential performance penalties resulting from
choosing a sub-optimal join method. Although optimizer profiles are outside the
scope of this presentation, we briefly outlined how to override the
optimizer's join selection.
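The three join methods discussed in this unit can be contrasted with a toy in-memory sketch. It is a minimal illustration, assuming both inputs fit in memory as lists of tuples and the join predicate is a single equality on one column; it is not how DB2 implements these operators. The function names and sample tables are hypothetical. In general, a nested-loop join can evaluate any join predicate, while merge and hash joins rely on equality predicates, which is one reason predicate types influence the optimizer's choice.

```python
from collections import defaultdict
from operator import itemgetter

# Toy in-memory joins on one equality predicate: okey(outer_row) == ikey(inner_row).
# Rows are plain tuples; okey/ikey extract the join column from a row.


def nested_loop_join(outer, inner, okey, ikey):
    """For each outer row, scan the entire inner input.
    Work grows with len(outer) * len(inner), but any predicate could be used."""
    return [(o, i) for o in outer for i in inner if okey(o) == ikey(i)]


def merge_join(outer, inner, okey, ikey):
    """Sort both inputs on the join key, then advance two cursors in step."""
    outer = sorted(outer, key=okey)
    inner = sorted(inner, key=ikey)
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        ko, ki = okey(outer[i]), ikey(inner[j])
        if ko < ki:
            i += 1
        elif ko > ki:
            j += 1
        else:
            # Find the run of inner rows that share this key...
            j_end = j
            while j_end < len(inner) and ikey(inner[j_end]) == ko:
                j_end += 1
            # ...and pair it with every outer row that has the same key.
            while i < len(outer) and okey(outer[i]) == ko:
                result.extend((outer[i], r) for r in inner[j:j_end])
                i += 1
            j = j_end
    return result


def hash_join(outer, inner, okey, ikey):
    """Build a hash table on the inner input, then probe it once per outer row."""
    buckets = defaultdict(list)
    for r in inner:
        buckets[ikey(r)].append(r)
    # dict.get does not create entries, so unmatched keys simply yield nothing.
    return [(o, r) for o in outer for r in buckets.get(okey(o), [])]


customers = [(1, "Ann"), (2, "Bo"), (3, "Cy")]   # (cust_id, name)
orders = [(10, 1), (11, 1), (12, 3), (13, 4)]    # (order_id, cust_id)

for join in (nested_loop_join, merge_join, hash_join):
    print(join.__name__, sorted(join(customers, orders, itemgetter(0), itemgetter(1))))
```

All three functions return the same rows; what differs is the work done to find them (repeated inner scans, sorting, or building a hash table), which is what the optimizer weighs when it estimates join cost.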
4. Column group statistics
The optimizer relies on cardinality estimates to choose an optimal access plan.
By default, the optimizer treats all predicates as independent of each
other, even when the columns they reference are statistically correlated. This can lead the optimizer to
underestimate cardinality and select a sub-optimal plan. Column group statistics
enable the optimizer to consider statistical correlation between predicates. The
optimizer uses multi-column statistics to determine the combined cardinality and
then adjusts the estimate to account for correlation between columns. With this
in mind, we discussed the purpose and benefits of column group statistics. We
described why statistics collected on a group of related columns allow the optimizer
to more accurately compute cardinality estimates for queries that reference those
columns. We also explored when and how to collect column group statistics using
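The effect of the independence assumption can be shown with a small worked example. It is a simplified sketch under stated assumptions: equality predicates on two correlated columns, uniformly distributed values, and invented statistics for a hypothetical CUSTOMER table with CITY and PROVINCE columns. The equality filter factor for one column is taken as 1 divided by its number of distinct values; with column group statistics, the combined predicate instead uses 1 divided by the number of distinct (CITY, PROVINCE) pairs. The DB2 optimizer's actual computation also draws on other statistics, such as frequency distributions.

```python
import math

# Invented statistics for a hypothetical CUSTOMER table with correlated
# CITY and PROVINCE columns (each city lies in exactly one province).
CARD = 1_000_000                          # rows in the table
COLCARD = {"city": 500, "province": 10}   # distinct values per column
COLGROUP_CARD = 500                       # distinct (city, province) pairs


def independent_estimate(card, colcards):
    """Per-column equality filter factor is 1/colcard; multiply them (independence)."""
    return card / math.prod(colcards)


def column_group_estimate(card, group_card):
    """Treat the column combination as one unit: filter factor is 1/group_card."""
    return card / group_card


# WHERE city = ? AND province = ?
naive = independent_estimate(CARD, COLCARD.values())
grouped = column_group_estimate(CARD, COLGROUP_CARD)
print(naive, grouped)  # prints 200.0 2000.0
```

Because knowing the city already determines the province, the PROVINCE predicate filters nothing extra, yet the independent estimate divides by its 10 distinct values anyway. The resulting ten-fold underestimate is the kind of error that can make a plan suited only to small inputs look cheaper than it is.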
Agenda:
Workshop Speaker(s)
Malcolm Singh
Institution
Bio
Topic
Zoran Kulina
Institution
Bio
Topic