Sie sind auf Seite 1von 69

IBM STG Technical Conference

System and Technology Group

DB2 UDB for iSeries: V5R4 Overview


v

www-03.ibm.com/servers/eserver/iseries/db2/

Simona Pacchiarini Simona_Pacchiarini@it.ibm.com


based on material by

Mike Cain - DB2 for i5/OS Center of Competency Doug Mack - DB2 for i5/OS Product Marketing Kent Milligan - DB2 for i5/OS Technology Team

2007 IBM Corporation 2006 IBM Corporation

DB2 for i5/OS Strategic Initiatives


Openness - Industry Standard Support Accomodate ISVs Portability/Compatibility Flexibility Continued LEADERSHIP in database technologies Consistency across DB2 family Shared R & D across IBM Labs Continued Leveraging of iSeries Strengths Availability Scalability Usability - Total Cost of Ownership Application Flexibility
The industry trend toward offthe-shelf software results in a move to SQL ...
100% 80% 60% 40% 20% 0% 1988

iSeries Customers using SQL

1992

1996

2000

2004

DB2 for i5/OS Microsoft SQL Server 2000 Oracle 10g 0 10 20 30 40 50 60 70 80 90 100

2003 Core Items

SQL 2003 Core Standard 100% Complete with V5R4!!


2006 IBM Corporation 2

This chart shows the DB2 UDB for iSeries strategic directions. As you progress through this course, you will be able to relate the V5R4 enhancements back to these strategic initiatives. SQL is the industry standard for database access and programming. While the heritage of application development on the iSeries platform has been to use native interfaces, more and more application developers are using SQL as a programming language as the result of the DB2 UDB for iSeries strategic investment in SQL. You can see the trend toward greater use of SQL in the upper graph on this chart. SQL is the strategic database interface for DB2 UDB for iSeries to accommodate all the porting of third-party applications and tools, a. Almost all of which third-party solutions utilize SQL-based interfaces for their data access. As you can see on the lower right-hand graph, DB2 UDB for iSeries actually is the first DBMS to fully comply with the Core level of the SQL 2003 Standard. The beauty of the IBM DB2 UDB family is that we can all share from this, and DB2 UDB for iSeries will continue to leverage the technology leadership position of IBM and maintain close compatibility with the other DB2 UDB family members. Additionally, DB2 UDB for iSeries will continue to leverage and build on the strengths the iSeries. Those strengths include reliability, /availability, total cost of ownership, scalability, and usability.

2003 ANS/ISO Standard Limits


Database Constructs Length of an identifier (in bytes) Length of CHARACTER datatype (in bytes) Decimal precision of NUMERIC datatype Decimal precision of DECIMAL datatype Decimal precision of INTEGER datatype Decimal precision of SMALLINT datatype Binary precision of FLOAT datatype Binary precision of REAL datatype Binary precision of DOUBLE PRECISION datatype Columns in a table Values in an INSERT statement Set clauses in an UPDATE statement Length of a row Columns in a UNIQUE constraint Length of a UNIQUE constraint Length of foreign key column list Columns in a GROUP BY clause Length of GROUP BY column list Sort specifications in ORDER BY clause Length of ORDER BY column list Columns in a referential integrity constraint Tables referenced in a SQL statement Cursors simultaneously open Items in a SELECT list FIPS 18 240 15 15 9 4 20 20 30 100 100 20 2000 6 120 120 6 120 6 120 6 15 10 100 No limit
OPEN_CURSORS

Oracle 10g 30 2000 38 38 38 38 126 63 126 1000 1000 1000 2 gigabytes 32


.5 DB_BLOCK_SIZE .5 DB_BLOCK_SIZE

Microsoft SQL Server 2005 128 8000 38 38 10 5 53 24 53 1024 1024 1024 8 megabytes 16 900 900 Limited by length 8060 Limited by length 8060 16 256 2 gig 4096

DB2 for i5/OS V5R4 128 32766 63 63 10 5 53 24 53 8000 8000 8000 3.5 gigabytes 120 32K 32K 120 32K 10000 32K 120 1000 No limit 8000

DB2 for LUW V9 128 254 31 31 10 5 53 24 53 1012 1012 1012 2 gigabytes 64 8192 8192 1012 32677 1012 32677 64 No limit No limit 1012

DB2 for z/OS V9 128 255 31 31 10 5 53 21 or 24 53 750 750 750 2 gigabytes 64 2000 2000 750 4000 750 4000 64 225 No limit 750

255
DB_BLOCK_SIZE

255
DB_BLOCK_SIZE

32

1000

Largest limit
2006 IBM Corporation 3

What makes DB2 for i5/OS unique?


"Heritage" Objects Library Physical File
Member

"SQL" Objects Schema (Collection) Table Index View


Create Table (SQL) Create DB Object Access DB Object Access Data with an SQL Interface (SQL, ODBC, JDBC, OpenQueryFile)

Logical File
Create Physical File (Data Descr. Specifications) Access Data with Native HLL I/O

DB2
64 Bit Power4 RISC Single Level Storage
IOP IOP IOP IOP IOP IOP IOP

Cust #1

Cust #2

Cust #3

Cust #4

Cust #5

Cust #6

Cust #7

2006 IBM Corporation

DB2 for i5/OS has many programming and operational interfaces available for application developers, DBAs, and operations/support personnel. However, the database engine is still the same regardless of the interface being used. With this architecture, you actually have MORE flexibility in how you use the database and what skills are available in the marketplace to help develop or support. DDS (Data Description Specifications) provide one option to define database tables and indexes. RPG is a common programming language in the System i world. These interfaces, commonly referred to as native interfaces, were in place before SQL became an industry standard and continue to be used heavily in a System i environment. DB2 for i5/OS also supports SQL interfaces, and you can create tables and indexes through SQL, generally considered the open standard for database development. DB2 for i5/OS also supports many different high level languages (e.g., Cobol, JAVA), 4GL and CASE tools, and web or client/server based interfaces (JDBC, ODBC, Microsofts .NET framework) to allow a tremendous amount of flexibility while still leveraging the underlying advantages of the single RDBMS!

DB2 for i5/OS as the database server of choice


I5/OS JDBC CLI* Embedded DRDA Native/DDM i5/OS

AIX JDBC DRDA iSeries Toolbox JDBC DB2 Connect DB2 Information Integrator iSeries Toolbox JDBC DB2 Connect iSeries ODBC iSeries Access DB2 Connect DB2 Information Integrator

Linux JDBC ODBC DRDA Windows JDBC ODBC OLE DB .NET DRDA
* i5/OS PASE only supports CLI

DB2 for I5/OS

2006 IBM Corporation

Any interface allowing SQL can be used to interface DB2 for i5/OS

Etherogenous Data Integration: WebSphere Federation Server

Enterprise Warehouse

DataJoiner FedSVR
Operational Data

Local Data Mart

Heterogeneous Database Access and Data Replication External Data read, write, update, insert, delete White Paper: http://www-1.ibm.com/servers/enable/site/education/ibo/record.html?hetdata Web site: http://www-306.ibm.com/software/data/integration/
2006 IBM Corporation 6

Data Integration is a MAJOR issue with companies many customers nowadays have data spread across a variety of different format sources. We can provide them with an IBM solution that will make our data mgmt friends happy, and that is WebSphere (DB2) Federation Server. With this product, I could write an RPG application that accesses data in SQLServer or Oracle.i.e., thus deploying the great i5/OS power also towards data originated on other sources/platforms.

What does DB2 i5/OS have that NO ONE ELSE has ?


Self Configuring
No Installation required No Tablespace, data partitioning, management of buffers No Data Hot Spots Automatic leveraging of Virtualization Engine Technologies
QUERY SQL M E M O R Y

Self Healing
Systems Managed Access Path Protection Self Managed Database Logging Auto Index Rebalancing Database Locking Issues
IOP IOP
Cust #1 Cust #2

Single Level Storage


IOP IOP IOP IOP IOP

Cust #3

Cust #4

Cust #5

Cust #6

Cust #7

Self Tuning
Auto Tuner, Sophisticated Cost Based Query Optimizer Automatic Index Advisor, Auto Stats Collection

Self Protecting
Object Based Kernel in i5/OS Refer to Virus got you down?
http://skyviewpartners.com/java-skyviewp/pdf/Virus-Got-You-Down.pdf

"DB2 UDB for iSeries: Autonomic Computing FAQs SystemSales and PartnerInfo DB2 UDB for iSeries Sales Kit
2006 IBM Corporation 7

So what makes DB2 UDB running in i5/OS so unique? The key benefit of this DB2 family member is its ability to leverage the self-managing, or autonomic computing features of i5/OS. Because DB2 is built into i5/OS, it takes advantage of architectural benefits of System i, including Single Level Store, the object based implementation of i5/OS, and strong I/O architecture through the use of I/O processors. It is for these reasons that historically the AS/400 could tout no DBA required. While this is really still true in many cases, even in the most demanding situation, the amount of DBA effort required is 1/3rd of other database management systems. The most demanding situations that will require DBA skills for DB2 for i5/OS includes heavy SQL based

Ease of Database Management


Typical DBA Tasks
Manage DASD Space Allocation Review Table Space Allocations & Extents Review & Balance Indexes Application Rebinding Maintain Database Integrity Update Database Statistics Synchronized OS & DB User Security Reload Data for Hardware & Software Upgrades Load Data into Data Base Build and Manage DB Backup & Recovery Create & Review Indexes for Tables Performance Analysis & Tuning (DB & System) Create and Maintain DB Schema Automated DB Performance Profiling Advanced DB Performance Analysis & Tuning Data Replication & Consolidation

DB2 UDB for iSeries


Automated Automated Automated Automated Automated Automated Automated Automated Parallel Data Import/Export iSeries Navigator iSeries Navigator iSeries Navigator iSeries Navigator & 3rd Party Tools (ERwin, PowerDesigner, Embarcedero,etc.) insure/SQL iSeries Navigator - insure/SQL Multiple IBM & 3rd Party Products

2006 IBM Corporation

Availability of DBA skills on the marketplace is a subject that requires some further investigation. For instance, keep in mind that (a) many typical DBA tasks required in other database products are not required on DB2 for i5/OS, (b) DBAs have many skills that are transferable to DB2 for i5/OS, and (c) the DBA community has evolved because of the need for this specific skill on other database management systems. As noted, many of the skills of a DBA are common across all relational databases. This includes knowledge of SQL, relational data modeling, indexing, and query optimization. With the exception of query optimization, learning these skills on DB2 for i5/OS is a matter of learning new user interfaces. Online tutorials exist for this via the DB2 for i5/OS home page: ibm.com/iseries/db2. Query optimization on DB2 for i5/OS will require some education for the DBA . IBM has developed a four day class called DB2 for i5/OS SQL Performance Tuning and Monitoring Workshop. For details, refer to: ibm.com/iseries/service/igs/db2performance.html. Lastly, it is important to note that many of the tasks to manage other databases are not required with DB2 for i5/OS. These tasks may include creation and monitoring of tablespaces, logs, and cache area disk usage, re-balancing of indexes, running utilities to gather statistics, and partitioning of data. The System i architecture and sophisticated cost-based optimizer of DB2 for i5/OS automates these processes.

DB2 for i5/OS V5R4 Enhancements


Application Flexibility & Portability Free Format RPG & SQL Enhanced SQL Standards support Scalar FullSelect Standard Flagger Improved DB2 Family Compatibility Recursive Expressions OLAP Support RowNumber & Rank ANS Timestamp Format TDES Encryption 2 MB SQL Statements & 32K keys OnDemand & Availability Cross-Reference Files Robustness Automatic Journaling Enhancements SMAPP for EVIs Parallel Rollback Unlock Performance SQL Query Engine enhancements LIKE & LOB Support Sensitive Cursors Autonomic Indexes Faster SQL Procedural Language Enhanced MQT Optimization Faster XML Extenders Usability iSeries Navigator Enhancements DB2 On Demand Performance Center SQE Plan Cache Analyzer DB2 Health Center Resource Governor Governor Exit Point DB2 Content Manager

2006 IBM Corporation

In V5R4, DB2 for i5/OS continues its position as a leader in ease of use and integration. V5R4 contains many new capabilities that continue to make DB2 UDB for iSeries a database that is simple to use for programmers, analysts, and administrators. iSeries Navigator delivers the most simplification with the addition of the DB2 On Demand Performance Center that will dramatically simplify SQL performance analysis for database analysts and administrators. iSeries Programmers will also be able to lead a simpler life with a richer SQL toolbox. SQL programmers will be able to solve a wider range of business problems with additions such as Recursive SQL, Triple DES encryption, and the integration of SQL with Free Format RPG syntax. DB2 for i5/OS, already proven to be a leader in data warehousing through recent benchmarks, is enhancing the SQL capabilities it has to tackle the problems that affect businesses today. With the addition of OLAP functions, Recursive SQL and improved Materialized Query Table support, the database can now take on an even broader range of workloads. Performance and scalability are improved on a number of fronts boosted by the SQL Query Engine (SQE) handling more DB2 requests and its Autonomic Index capability. SQL Procedures, functions and triggers will receive a significant performance boost with V5R4.

DB2 for i5/OS: footprints

http://www-03.ibm.com/servers/eserver/iseries/db2/sqe.html http://www-03.ibm.com/servers/enable/site/education/abstracts/4aea_abs.html
2006 IBM Corporation 10

V5R1 Database Architecture


ODBC/ /JDBC JDBC/ /ADO ADO/ /DRDA DRDA/ /XDA XDA ODBC Network Network

HostServer Server Host Static Native Native (RecordI/O) I/O) (Record


Compiled embedded statements

CLI/ /JDBC JDBC CLI Dynamic


Prepare every time

Extended Dynamic
Prepare once and then reference

SQL SQL Optimizer Optimizer

The optimizer and database engine are separated at different layers of the operating system

SLIC

DB2UDB UDB DB2 (DataStorage Storage& &Management) Management) (Data

2006 IBM Corporation

11

To fully understand the implementation of query management and processing in DB2 UDB for iSeries on i5/OS V5R2 and subsequent releases, it is important to see how the queries were implemented in releases of i5/OS previous to V5R2. The figure above shows a high-level overview of the architecture of DB2 UDB for iSeries before i5/OS V5R2. The optimizer and database engine are implemented at different layers of the operating system. The interaction between the optimizer and the database engine occurs across the Machine Interface (MI).

V5R2, V5R3, and V5R4 Database Architecture


ODBC/ /JDBC JDBC/ /ADO ADO.NET .NET/ /DRDA DRDA/ /XDA XDA ODBC Network Network

HostServer Server Host Static Native Native (RecordI/O) I/O) (Record


Compiled embedded statements

CLI/ /JDBC JDBC CLI Dynamic


Prepare every time

Extended Dynamic
Prepare once and then reference

SQL SQL Optimizer

The optimizer and database engine merged to form the SQL Query Engine and much of the work was moved to SLIC

SLIC SLIC

DB2 UDB (Data Storage & Management)

2006 IBM Corporation

12

This figure shows an overview of the DB2 for i5/OS architecture on i5/OS V5R3 and V5R4, and where each SQE component fits. Some components have been move at a lower level, below the Machine Interface.

V5R2, V5R3 and V5R4 Database Architecture

Optimizer
Query Dispatcher CQE Optimizer SQE Optimizer

DB2 UDB (Data Storage & Management)


SLIC SQE Optimizer CQE Database Engine SQE Primitives SQE Statistics Manager

The optimizer and database engine merged to form the SQL Query Engine and much of the work was moved to SLIC

2006 IBM Corporation

13

The functional separation of each SQE component is clearly evident. In line with design objectives, this division of responsibility enables IBM to more easily deliver functional enhancements to the individual components of SQE, as and when required. Notice that most of the SQE Optimizer components are implemented below the MI. This translates into enhanced performance efficiency. A new component, the Query Dispatcher, has been introduced. This is in charge of dispatching a query to the traditional CQE optimizer or to the new one, SQE, according to the clauses and conditions of the request.

The Query Dispatcher


Determines which engine will optimize and process the query request
Only SQL queries are considered for the SQL Query Engine OPNQRYF, QUERY, Query API do NOT use the SQL Query Engine

Initial step for all query optimization that occurs on the system The use of SQE is being phased in over several releases
functionalities may depend on level of fixes (DB Group PTFs) installed

V5R2

V5R2+

V5R3

V5R4

SQL Dispatched to SQE

2006 IBM Corporation

14

The function of the Dispatcher is to route the query request to either CQE or SQE, depending on the attributes of the query. All queries are processed by the Dispatcher and you cannot bypass it. Currently, the Dispatcher will route an SQL statement to CQE if it find that the statement references or contains any of the following: INSERT WITH VALUES statement or the target of an INSERT with subselect statement NLSS or CCSID translation between columns Lateral correlation Logical files Datalink columns Tables with Read Triggers User-defined table functions Read-only queries with more than 1000 dataspaces or updateable queries with more than 256 dataspaces. DB2 Multisystem tables non-SQL queries, for example the QQQQry API, Query/400, or OPNQRYF

The Query Dispatcher


V5R2

SQE

Read only SQL (subSELECT of an INSERT supported) Single Table (i.e. no joins) AND, OR, IN predicates (no LIKE support) Joins SMP requested

V5R3
All of the above plus VIEWs UNIONs SubQueries INSERT, UPDATE, DELETE Star Schema Join (STAR_JOIN without QAQQINI support)

2006 IBM Corporation

15

Goal is to use SQE if at all possible. Conditions where SQE can be used varies according to the OS level.

The Query Dispatcher


V5R4
All of the above plus LIKE predicates LOB columns

SQE

ALWCPYDTA(*NO) & SENSITIVE Cursors

Remaining SQE restrictions:


User Defined Table Functions (UDTFs) Translation (UPPER, LOWER, etc.) Alternate sort sequences Logical File references Derived key and Select/Omit Logical Files on the table queried Non-SQL interfaces

2006 IBM Corporation

16

See previous chart more queries allowed into SQE as of i5/OS V5R4.

The Query Dispatcher


Back up to CQE if any of the following are encountered:
Select/omit logical file Logical file over multiple members Derived key (s)
Native logical files that perform some intermediate mapping of the fields referenced in the key. Common ones are renaming fields, adding a translate or only selecting a subset of the columns. Specifying an Alternate Collating Sequence (ACS) on a field used for a key will also make a derived key (an implied map occurs within the index).

Sort Sequence (NLSS) specified for index or logical file


Probably the trickiest one to detect for users. The index is built while an NLSS table is specified in the query environment. Can effect SQL create indexes.

Cost to back up and revert to CQE adds about 15% to the total optimization time QAQQINI option to ignore S/O and derived keyed logical files
IGNORE_DERIVED_INDEXES *YES

2006 IBM Corporation

17

The Dispatcher also has the built-in capability to re-route an SQL query to CQE that was initially routed to SQE. Unless the IGNORE_DERIVED_INDEX option with a parameter value of *YES is specified, a query will typically be reverted back to CQE from SQE whenever the Optimizer processes table objects that have any of the following logical files or indexes defined: Logical files with the SELECT/OMIT DDS keyword specified Non-standard indexes or derived keys, for example logical files specifying the DDS keywords RENAME or Alternate Collating Sequence (ACS) on any field referenced in the key Sort Sequence NLSS specified for the index or logical file As new functionality is added in the future, the Dispatcher will route more queries to SQE and increasingly fewer to CQE. It is possible to implement a QAQQINI option to allow the optimizer to IGNORE logical files with select/omit keyword, hence allowing the query to be processed by SQE even when such a logical file exists.

SQE - Ignore Derived Index


SQL SQL Request Request

QAQQINI
IGNORE_DERIVED_INDEX IGNORE_DERIVED_INDEX = = *YES *YES

SQE? SQE?
NO YES!

HLL HLL Native Native I/O I/O Physical Physical File File

Index Index

Keyed Keyed Logical Logical File File

Keyed Keyed S/O S/O Logical Logical File File

Derived Index Present Ignored

2006 IBM Corporation

18

Setting to *YES the IGNORE_DERIVED_INDEX option of the QAQQINI file will allow the SQE optimizer to ignore the derived
index and process the query. The resulting query plan will be created without any regard to the existence of the derived index(s). The index types that are ignored include: Keyed logical files defined with select or omit criteria and with the DYNSLT keyword omitted Keyed logical files built over multiple physical file members (V5R2 restriction, not a restriction for V5R3) Keyed logical files where one or more keys reference an intermediate derivation in the DDS. Exceptions to this are: 1. when the intermediate definition is defining the field in the DDS so that shows up in the logical's format and 2. RENAME of a field (these two exceptions do not make the key derived) Keyed logical files with K *NONE specified. Keyed logical files with Alternate Collating Sequence (ACS) specified SQL indexes created when the sort sequence active at the time of creation requires a weighting (translation) of the key to occur. This is true when any of several non-US language IDs are specified. It also occurs if language ID shared weight is specified, even for language US.

What are Statistics, and why are they important?


Table and column statistics describe the data to the DBMS Query optimizers use statistics to understand the data If the correct statistics are collected and available, the cost based optimizer will more accurately estimate the number of rows to process Better estimates will allow for better query optimization and the selection of the best query plan

2006 IBM Corporation

19

Biggest issue with poor plans after upgrading is no stats and no indexes. Statistical information is constructed of single columns of a table and stored internally as part of the table. By default, this information is collected automatically by the system, but you can manually control the collection of statistics. Unlike indexes, however, statistics are not maintained immediately as data in the tables change.

Statistics
All query optimizers rely upon statistics to make plan decisions
DB2 for i5/OS has always relied upon indexes as its source for column statistics Other databases rely upon manual stats collection for their source

SQE offers a new, hybrid approach


In addition to indexes, single column statistics can be gathered and used Automatically collected for cases where indexes do not already exist Column statistics stored with table object Column statistics do not have any maintenance overhead when row is inserted, updated or deleted Automatically refreshed as they become stale

2006 IBM Corporation

20

On many platforms, statistics collection is a manual process that is the responsibility of the database administrator. With iSeries servers, the database statistics collection process is handled automatically, and only rarely is it necessary to update statistics manually. The Statistics Manager does not actually run or optimize the query. It controls the access to the metadata and other information that is required to optimize the query. It uses this information to answer questions posed by the query optimizer. The answers can either be derived from table header information, from existing indexes, or from single-column statistics. The Statistics Manager must always provide an answer to the questions from the Optimizer. It uses the best method available to provide the answers. For example, it may use a single-column statistic or perform a key range estimate over an index. Along with the answer, the Statistics Manager returns a confidence level to the optimizer that the optimizer may use to provide greater latitude for sizing algorithms. If the Statistics Manager provides a low confidence in the number of groups that are estimated for a grouping request, then the optimizer may increase the size of the temporary hash table allocated.

SQE Automatic Statistics Considerations


Column statistics are collected after query optimization (reactive) When an index is not available to answer a stats question, the query might run significantly longer on the first execution
The SQE Optimizer may interpret the same set of default statistics differently than CQE, resulting in a different plan Plans are re-optimized on-demand as the new stats become available

This warm-up effect can be prevented by proactively creating indexes


Columns most at risk are those with no indexes or those that are not a part of the leading keys of the index Local selection columns Joining columns Grouping columns Ordering columns

2006 IBM Corporation

21

What are Indexes, and why are they important?


Indexes logically organize and reference rows in a table Indexes describe the data to the DBMS If the correct indexes are available, the cost based optimizer will have more options available for implementation
Based on the range and cardinality of the data Based on the selectivity of the query

More options should allow for better overall query performance Indexes are maintained as the underlying data changes
Column statistics are not maintained, only refreshed

2006 IBM Corporation

22

An SQL index is a subset of the data in the columns of a table that are logically arranged in either ascending or descending order. Each index contains a separate arrangement. These arrangements are used for ordering (ORDER BY clause), grouping (GROUP BY clause), and joining. An SQL index is a keyed logical file. The index is used by the system for faster data retrieval. Creating an index is optional. You can create any number of indexes. You can create or drop an index at any time. The index is automatically maintained by the system. If you are trying to decide whether to use statistics or indexes to provide information to the Statistics Manager, keep the following differences in mind. One major difference between indexes and column statistics is that indexes are permanent objects that are updated when changes to the underlying table occur, while column statistics are not. If your data is constantly changing, the Statistics Manager may need to rely on stale column statistics. However, maintaining an index after each change to the table might take up more system resources than refreshing the stale column statistics after a group of changes to the table have occurred. Another difference is the effect that the existence of new indexes or column statistics has on the Optimizer. When new indexes become available, the Optimizer will consider them for implementation. If they are candidates, the Optimizer will re-optimize the query and try to find a better implementation. However, this is not true for column statistics. When new or refreshed column statistics are available, the Statistics Manager will interrogate immediately. Reoptimization will occur only if the answers are significantly different from the ones that were given before these refreshed statistics. This means that it is possible to use statistics that are refreshed without causing a reoptimization of an access plan. When trying to determine the selectivity of predicates, the Statistics Manager considers column statistics and indexes as resources for its answers in the following order: Try to use a multi-column keyed index when ANDed or ORed predicates reference multiple columns If there is no perfect index that contains all of the columns in the predicates, it will try to find a combination of indexes that can be used. For single column questions, it will use available column statistics If the answer derived from the column statistics shows a selectivity of less than 2%, indexes are used to verify this answer Accessing column statistics to answer questions is faster than trying to obtain these answers from indexes. Column statistics can only be used by SQE. For CQE, all statistics are retrieved from indexes. Finally, column statistics can be used only for query optimization. They cannot be used for the actual implementation of a query, whereas indexes can be used for both.

SQE Plan Cache


Incorporates Self-Managing Technology
Cache is automatically maintained to keep most active queries available for reuse Plans are optimized on-demand as new stats or indexes become available Foundation for a self-learning query optimizer to interrogate the plans to make wiser costing decisions

Caches all access plans optimized by the SQE Optimizer


Allows more reuse of existing plans regardless of interface for identical SQL statements Works in conjunction with the System Wide Statement Cache and the SQL programs, packages and service programs

Cache is cleared during an IPL New interface to observe and analyze SQE plan information

2006 IBM Corporation

23

The Plan Cache is a repository that contains the access plans for queries that were optimized by SQE. Access plans generated by CQE are not stored in the Plan Cache; instead, they are stored in SQL Packages, the system-wide statement cache, and job cache). The purposes of the Plan Cache are to: Facilitate the reuse of a query access plan when the same query is re-executed Store runtime information for subsequent use in future query optimizations Once an access plan is created, it is available for use by all users and all queries, regardless of where the query originates. Furthermore, when an access plan is tuned, when creating an index for example, all queries can benefit from this updated access plan. This eliminates the need to reoptimize the query, resulting in greater efficiency. There is a separate plan cache for each IASP for the system, so varying the IASP will also cause the plan cache to be cleared.

SQE Plan Cache Example


SQL PGM-A

Plan Cache
Statement 1
SQE Plan Cache SQE stores all query plans in a centralized system wide plan cache. Along with the plans, Plan Ymeta SQE keeps data and runtime information for the plans (i.e. queries). The plan cache is Plan Z considered temporary storage.

Plan X

SQL PKG-1 Statement 3 Statement 4

Statement 2

SQL PKG-2 Statement 3 Statement 5

2006 IBM Corporation

24

As shown in this chart, the Plan Cache is interrogated each time a query is executed in order to determine if a valid access plan exists that satisfies the requirements of the query. If a valid access plan is found, it is used to implement the query. Otherwise a new access plan is created and stored in the Plan Cache for future use. The Plan Cache is automatically updated with new query access plans as they are created, or is updated for an existing plan (the next time the query is run) when new statistics or indexes become available. The Plan Cache is also automatically updated by the database with runtime information as the queries are run. Each plan cache entry contains the original query, the optimized query access plan and cumulative runtime information gathered during the runs of the query. In addition, several instances of query runtime objects are stored with a plan cache entry. These runtime objects are the real executables and temporary storage containers (hash tables, sorts, temporary indexes, and so on) used to run the query. All systems are currently configured with the same size Plan Cache, regardless of the server size or the hardware configuration. When the Plan Cache exceeds its designated size, a background task is automatically scheduled to remove plans from the Plan Cache. Access plans are deleted based upon the age of the access plan, how frequently it is being used and how much cumulative resources (CPU/IO) were consumed by the runs of the query. The total number of access plans stored in the Plan Cache depends largely upon the complexity of the SQL statements that are being executed. In certain test environments, there have been typically between 10,000 to 20,000 unique access plans stored in the Plan Cache. The Plan Cache is cleared when a system Initial Program Load (IPL) is performed. Multiple access plans can be maintained for a single SQL statement. Although the SQL statement itself is the primary hash key to the Plan Cache, different environmental settings can cause different access plans to be stored in the Plan Cache. Examples of these environmental settings include: Different SMP Degree settings for the same query Different library lists specified for the query tables Different settings for the job's share of available memory in the current pool Different ALWCPYDTA settings Currently, the Plan Cache can maintain a maximum of 3 different access plans for the same SQL statement. As new access plans are created for the same SQL statement, older access plans are discarded to make room for the new access plans. There are, however, certain conditions that can cause an existing access plan to be invalidated. Examples of these include: Specifying REOPTIMIZE_ACCESS_PLAN(*YES) or (*FORCE) in the QAQQINI table or in Run SQL Scripts Deleting or recreating the table that the access plan refers to Deleting an index that is used by the access plan

SQE Plan Cache


Better centralized management of plan information:
Plans are stored in a compressed mode Plans stored independent of job information for better sharing of plans Access is optimized to minimize contention on plan entries across the system Out of date plans are cycled from the cache as more space is needed Multiple plans can be maintained for identical SQL statements (library list or environmental changes)

Repository of information that is used for feedback and automatic column stats generation Enabling auto stats collection causes the Statistics Manager to interrogate the Plan Cache looking for plans where stats would have been helpful

2006 IBM Corporation

25

Specific details about the current implementation of the Plan Caches: 1. The plan cache is initially created with a maximum size of 256 million bytes (not 256 Meg). This will contain the original query that was optimized as well as the optimized query plan. There are other objects stored with the query that do not get rolled into this total 2. All systems are currently configured for this same size plan cache regardless of the size of the hardware. 3. When the plan cache exceeds its size, then a background task takes charge to remove old plans. During this window while the background task is processing, the plan cache may exceed it allocated size. 4. Then plan cache can hold many different plans. The total number will depend upon the complexity of the SQL statements that are being run. On the systems that we have been currently monitoring (i.e. single table queries) the plan cache has typically been storing around 6,000 unique plans. 5. The size of the plan varies, but in its compressed mode it is still significantly smaller then the size of the plans that CQE still keeps around. 6. Multiple plans can be maintained for a single SQL statement. The SQL statement is the primary hash key for the plan cache so identical SQL statements will all be stored together, but different environmental setting can cause us to keep different plans in the cache that match the specific environment. Examples of this would include: 1. Two plans where the SMP Degree settings are different 2. Two plans where the library list has changed for the tables 3. Two plans with different settings for this jobs share of memory available in the current pool 4. Two plans with different ALWCPYDTA settings (etc)

Query Optimization Feedback


Visual Explain Indexes Advised SQE Plan Cache SQE Plan Cache Snapshots Detailed DB Monitor Data SQL request Summarized DB Monitor Data Debug Job Log Messages

Query Optimization
Print SQL Information Messages

2006 IBM Corporation

26

Not much user interaction can be accomplished for either the validation or execution of the query. The user can have a real effect during the query optimization. Will focus on query optimization phase. The DB2 UDB for AS/400 optimizer is "cost based" with full query rewrite capability Stats provided by the DB are used to come up with the best cost or best plan (least costly, less time) Up to V5R4 we essentially had 4 different sources of information to be used for query optimization:

SQE Autonomic Indexes


New V5R4 feature SQE optimizer watches query requests and learns Autonomic capability to create an index based on:
Observing n number of queries Assessing the benefits of creating and using an index Optimizers own index advice

Indexes are:
Temporary Maintained while any cursor is open
Maintenance is delayed when all cursors are closed

Shared between queries and jobs Primarily created on smaller tables Not used for statistics

Allows additional queries to use SQE (ex. Sensitive cursors) Allows DB2 for i5/OS to tune itself
2006 IBM Corporation 27

SQL Query Engine

Using V5R4 i5/OS means using the new DB2 for i5/OS SQL Query Engine Preparing for the OS upgrade will provide the best experience In other words getting the most out of DB2 for i5/OS enhancements

2006 IBM Corporation

28

SQE can perform significantly better than CQE, more and more queries can take advantage of it.

Database Parallelism within i5/OS


Two types of parallelism: Parallel I/O
Free and always available Enabled system wide via sysval QQRYDEGREE (*IO) or (*OPTIMIZE) Enabled for single job via CHGQRYA command or QAQQINI PARALLEL_DEGREE keyword

Parallel CPU
Fee, requires 5722SS1 Opt.26 DB2 UDB Symmetric Multiprocessing Enabled system wide via sysval QQRYDEGREE (*OPTIMIZE) Enabled for single job via CHGQRYA command or QAQQINI PARALLEL_DEGREE keyword

2006 IBM Corporation

29

There are two types of parallel processing available. The first is a parallel I/O that is available at no charge. The second is DB2 UDB Symmetric Multiprocessing, a feature that you can purchase. You can turn parallel processing on and off. Even though parallelism has been enabled for a server or given job, the individual queries that run in a job might not actually use a parallel method. This might be because of functional restrictions, or the optimizer might choose a non-parallel method because it runs faster. Because queries being processed with parallel access methods aggressively use main storage, CPU, and disk resources, the number of queries that use parallel processing should be limited and controlled. The QQRYDEGREE system value ca be used to control parallel processing for a server. Query parallel processing can also be cotrolled at the job level using the DEGREE parameter of the Change Query Attributes (CHGQRYA) command or in the QAQQINI file, or using the SET_CURRENT_DEGREE SQL statement.

DB2 for i5/OS introduced parallel database functionality, also known as SMP (Symmetric Multi-Processing) capability in V3R1 (1995), and has enhanced those functions significantly since then. The DB2 for i5/OS SMP feature is an Operating System add-on that can significantly improve performance of database tasks by breaking the processing across multiple CPUs. By doing this, SMP optimizes the available resources to complete the database task in a cost-effective manner. Many other databases use the notion of data partitioning to address scalability

DB2 Symmetric Multiprocessing


Parallel Methods
Parallel table scan Parallel index scan Parallel hash join Parallel hash group by Parallel index build Parallel data load Parallel index ANDing/ORing of dynamic bit maps Parallel index maintenance Parallel Encoded Vector Index Parallel I/O
Cust #1 Cust #2 Cust #3 Cust #4 Cust #5 Cust #6 Cust #7 IOP IOP

QUERY SQL

M E M O R Y

Single Level Storage


IOP IOP IOP IOP IOP

Parallel recovery Parallel Encoded Vector Indexing Parallel Reorganize

2006 IBM Corporation

30

Parallelism within the database is available across a broad range of tasksbut ONLY if the DB2 Symmetric Multi-processing feature of OS/400 is installed and activated (via system value or user profile. Again, this is a huge performance benefit in servers with multiple CPUs configured in the server or partition, because of the ability of the database to spread tasks simultaneously across these processors for any given query/database load/index build/re-org,etc.

Indexing strategy

http://www-03.ibm.com/servers/enable/site/education/abstracts/indxng_abs.html http://www-03.ibm.com/servers/enable/site/education/abstracts/16fa_abs.html
2006 IBM Corporation 31

DB2 for i5/OS


Two types of indexing technologies are supported
Radix Index Encoded Vector Index

Each type of index has specific uses and advantages Respective indexing technologies complement each other Indexes can be used for statistics and implementation Indexes can provide RRNs and/or data Indexes are scanned or probed
Probe can only occur on contiguous, leading key columns Scan can occur on any key column Probe and scan can be used together
2006 IBM Corporation 32

Using Indexes - Probe v Scan


Index Key Columns (ITEM_NO, COLOR, SIZE) Probe (key positioning) with leading, n contiguous key columns 1 1+2 1+2+3 Scan (test) with any other key columns 2 3 2+3

ITEM_NO 001 001 003 004

COLOR BLUE RED BLACK GREEN

SIZE SMALL LARGE SMALL MEDIUM

...WHERE COLOR = 'BLACK AND ITEM_NO = 003 ...WHERE SIZE = 'MEDIUM' ...WHERE ITEM_NO = 001 AND SIZE = LARGE
2006 IBM Corporation 33

The optimizer is able to perform radix index probe over any combination of the primary key columns, plus one additional secondary key column. Therefore it is important that the first secondary key column be the most selective secondary key column. The optimizer will use radix index scan with any of the remaining secondary key columns. While radix index scan is not as fast as radix index probe it can still reduce the number of keys selected. Hence, secondary key columns that are fairly selective should be included. It is up to the user to determine the true selectivity of any secondary key columns and to determine whether those key columns should be included when creating the index. When building the index the primary key columns should be the leftmost key columns followed by any of the secondary key columns the user chooses and they should be prioritized by selectivity. Note: After creating the suggested index and executing the query again, it is possible that the query optimizer will choose not to use the suggested index. The CQE optimizer when suggesting indexes only considers the selection criteria and does not include join, ordering, and grouping criteria. The SQE optimizer includes selection, join, ordering, and grouping criteria when suggesting indexes. You can access index advisor information in many different ways. These include: The index advisor interface in iSeries Navigator SQL performance monitor Show statements Visual Explain interface Querying the Database monitor view 3020 - Index advised.

Radix Index
Index tree structure Key values are compressed
Common patterns are stored once Unique portion stored in leaf pages Positive impact on size and depth of the index tree

Algorithm used to find values


Binary search Modified to fit the data structure

Maintenance
Index data is automatically spread across all available disk units Tree is automatically rebalanced to maintain an efficient structure

Temporary indexes
Considered a temporary data structure to assist the DB engine Maintained temporary indexes available in SQE V5R4

2006 IBM Corporation

34

An SQL index (or keyed sequence access path) is a permanent object that is created over a table and used by the optimizer to provide a sequenced view of the data for a scan or probe operation. The rows in the tables are sequenced in the index based upon the key columns specified on the creation of the object. When the key columns are matched up by the optimizer to a query, it gives the optimizer the ability to use the radix index to help satisfy any selection, ordering, grouping or join requirements. Typically the use of an index operation will also include a Table Probe operation to provide access to any columns needed to satisfy the query that cannot be found as index keys. If all of the columns necessary to satisfy the query request for a table can be found as keys of an index, then the Table Probe is not required and the query uses Index Only Access. Avoiding the Table Probe can be an important savings for a query. The I/O associated with a Table Probe is typically the more expensive synchronous random I/O.

Radix Index
Database Table 001 002 003 004 005 ARKANSAS MISSISSIPPI MISSOURI IOWA ARIZONA

ROOT ROOT

Test Test Node Node

MISS MISS

AR AR IZONA IZONA 005 005 KANSAS KANSAS 001 001

IOWA IOWA 004 004

ISSIPPI ISSIPPI 002 002

OURI OURI 003 003

ADVANTAGES: Very fast access to a single key value Also fast for small, selected range of key values (low cardinality) Provides order

DISADVANTAGES: Table rows retrieved in order of key values (not physical order) which equates to random I/Os No way to predict which physical index pages are next when traversing the index for large number of key values

2006 IBM Corporation

35

This is a visual representation of a Binary radix Index structure.

Index Probe Example


Given an index on table EMPLOYEE keyed on STATE... SELECT * FROM EMPLOYEE WHERE STATE = IOWA
EMPLOYEE Index
STATE RRN

EMPLOYEE Table
RRN STATE

Perform Perform a a probe probe into into the the range range using using the the local local selection selection value(s) value(s)

... IOWA (004) IOWA (007) IOWA (010) IOWA (017) KANSAS (011) MISSISSIPPI (002) MISSISSIPPI (013) MISSOURI (003) ...

001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017

ARKANSAS MISSISSIPPI MISSOURI IOWA ARIZONA MONTANA IOWA NEBRASKA NEBRASKA IOWA KANSAS WISCONSIN MISSISSIPPI WISCONSIN WISCONSIN ARKANSAS IOWA
36

2006 IBM Corporation

Table scan (arrival sequence processing) is very efficient on the AS/400 due to the independent I/O subsystem, large memory and single level storage and automatic data stripping. SQLxxxx messages are used to illustrate feedback from the optimization. SQLxxxx messages are used for PRTSQLINF and allow the meaningful 1st level text to be shown in the spool file and the message text to be translatable MRI.

Encoded Vector Index (EVI)


Index for delivering fast data access in analytical and reporting environments
Advanced technology from IBM Research Used to produce dynamic bitmaps and RRN lists Fast access to statistics to improve query optimizer decision making

Not a tree structure Can only be created through an SQL interface or iSeries Navigator GUI
CREATE ENCODED VECTOR INDEX SchemaName/IndexName ON SchemaName/TableName (ColumnName) WITH n DISTINCT VALUES;

2006 IBM Corporation

37

An encoded vector index is a permanent object that provides access to a table by assigning codes to distinct key values and then representing those values in a vector. See next chart for a visual representation of an EVI Even though an encoded vector index is used to represent the values stored in a table, the index itself cannot be used to directly gain access to the table. Instead, the encoded vector index can only be used to generate either a temporary row number list or a temporary row number bitmap. These temporary objects can then be used in conjunction with a Table Probe to specify the rows in the table that the query needs to process. The main difference with the Table Probe associated with an encoded vector index (versus a radix index) is that the paging associated with the table can be asynchronous. The I/O can now be scheduled more efficiently to take advantage of groups of selected rows. Large portions of the table can be skipped over where no rows are selected. EVI can be created via SQL or iNav GUI.

Encoded Vector Index (EVI)


Symbol Table Key Value Code Arizona Arkansas Wisconsin Wyoming 49 50 7 252 30111 83000 340 2760 1 2 First Row 1 5 Last Row 80005 99760 Count 5000 7300 Vector 1 17 5 9 2 7 49 49 5

RRN 1 2 3 4 5 6 7 8 9

Symbol table contains information for each distinct key value - Each key value is assigned a unique code (key compression) - Code is 1, 2, or 4 bytes depending on number of distinct key values Rather then a bit array for each distinct key value, use one array of codes
2006 IBM Corporation 38

An encoded vector index is a permanent object that provides access to a table by assigning codes to distinct key values and then representing those values in a vector. The size of the vector will match the number of rows in the underlying table. Each vector entry represents the table row number in the same position. The codes generated to represent the distinct key values can be 1, 2 or 4 bytes in length, depending upon the number of distinct values that need to be represented. Because of their compact size and relative simplicity, the EVI can be used to process large amounts of data very efficiently.

Bitmap / RRN List Example


Given an EVI on table EMPLOYEE keyed on STATE... ...WHERE STATE = ILLINOIS
Set bits in bitmap or return RRN list

Vector Symbol Table

Bitmap

Key
Binary search symbol table for key(s) and code(s)

Code

ARIZONA ARKANSAS CALIFORNIA COLORADO ILLINOIS IOWA KANSAS MISSISSIPPI ...

1 2 3 4 5 6 7 8 ...

Scan vector for code(s)

1 17 5 9 2 7 49 49 5 . . . . . . .

0 0 1 0 0 0 0 0 1 . . . . . . .

Row 1 2 3 4 5 6 7 8 9 ...

2006 IBM Corporation

39

Picture used to describe EVI used to create a dynamic bitmap

Index ANDing / ORing Example


State Workdept

EVI

EVI

SELECT * FROM EMPLOYEE WHERE STATE = MINNESOTA' AND WORKDEPT IN ( 'B01', C01, 'E01')
Final RRN list

Intermediate RRN list

Intermediate RRN list

3 5 10 15 1000 AND 1005 (Merge) 1007 3001 3050


State

3 7 10 27 1000 1010 2035 3001 4100


Workdepts

3 10 1000 3001

Represents Represents all all the the local selection selection

2006 IBM Corporation

40

If the optimizer decides to use an EVI to process the query, the database engine uses the vector to build the dynamic bitmap (or a list of selected row ids) that contains one bit for each row in the table, the bit being turned on for each selected row. Like a bitmap index, these intermediate dynamic bitmaps (or lists) can be AND'ed and OR'ed together to satisfy an ad hoc query.

This picture is used to describe EVIs generated RRN list ANDed to produce a final RRN list that represents all the local selection.

Index ANDing / ORing Example 1


State Workdept

Radix
Intermediate Bitmap

EVI

SELECT * FROM EMPLOYEE WHERE STATE = IOWA' OR WORKDEPT IN ( 'B01', C01, 'E01')
Final Bitmap

Intermediate Bitmap

State

0 0 0 1 0 0 0 0 0 . . . 1 . . .

OR
(Merge)

Workdepts

0 0 1 0 0 0 0 0 1 . . . 1 . . .

0 0 1 1 0 0 0 0 1 . . . 1 . . .

Represents Represents all all the the local selection selection

2006 IBM Corporation

41

Zigzag for ANDing is when the starting (or continuing) point for one bitmap/rrnlist comes from the other rrnlist/bitmap. So it essentially leap frogs through the bitmaps/rrn lists. Consequently ANDing of multiple 'not so selective' indexes can be combined to generate a pretty fast plan. Zigzag for ORing is not nearly as interesting. Its a merge, so the OR processing walks both sides, taking the next (ordered) RRN from the source bitmaps/rrn list. Therefore each side is walked and a merge is done on the 'ordered' rrns to get the outputted ordered. The merge is done so that the subsequent table probes are still skip

DB2 for i5/OS


cardinality The number of elements in a set.

High cardinality = large distinct number of values Low cardinality = small distinct number of values

In general
A radix index is best when accessing a small set of rows and the key cardinality is high An encoded vector index is best when accessing a set of rows and the key cardinality is low Understanding the data and query are key
2006 IBM Corporation 42

Indexing Advice from the Optimizer


Both CQE and SQE provide index creation advice CQE
Basic advice Radix index only Based on table scan and local selection columns only Temporary index creation information also provides insight CQE Visual Explain will try and tie pieces together to advice a better index

SQE
Robust advice Radix and EVI indexes Based on all parts of the query Multiple indexes can be advised for the same query Some limitations

2006 IBM Corporation

43

SQE index advice was enhanced in V5R3 (with latest DB PTFs) and V5R4

Index Advised System wide


New V5R4 feature System wide index advice
Data is placed into a DB2 table (QSYS2/SYSIXADV) Autonomic No overhead

CQE and SQE support


CQE only provides basic advice based on local selection predicates SQE provides complex advice based on all parts of the query
Not complete, but much better

GUI interface via iSeries Navigator


Advice for System, or Schema, or Table

System only adds (summary) rows, user must manage the data
Options to clear or prune

Can create indexes directly from GUI


Additional indexing analysis might be required to determine the optimal index

2006 IBM Corporation

44

Autonomic Index Creation


Optimizer can have the DB Engine create a temporary index Both full and sparse indexes can be created Temporary indexes are not used for statistics Temporary indexes are maintained CQE
Temporary indexes are not reused and not shared Usually a bottleneck in query performance Can impact overall system performance Can increase the amount of temporary storage used

SQE
New feature in V5R4 Temporary indexes are reused and shared Creation is based on watching the query requests over time Creation is based on optimizers own index advice Temporary index maintenance is delayed when all associated cursors are closed

2006 IBM Corporation

45

A cooler SQE enhancement in V5R4 is extra intelligence in the SQE query optimizer to improve performance with the automatic creation of indexes known as autonomic indexes. The optimizer sits in the background and looks for frequently executed queries that could benefit from an index being created. If such a query is identified, then an Autonomic Index is created by DB2 UDB. This Autonomic Index (AI) is created as a temporary index that is maintained by DB2 UDB as the underlying table data is changed similar to a permanent SQL index. The AI, however, will disappear once the system is shut down or restarted. i5 shops that make heavy use of commitment control will also be able to recover faster with the parallel rollback option and the ability to customize commitment control disk behavior to get the best performance for their business. ============ Autonomic Indexes can be disabled with CACHE_RESULTS & ALLOW_TEMPORARY_INDEXES INI options Autonomic Index is not used for stats & can disappear if all the owning plans are removed from the Plan Cache. Can be viewed with Index Evaluator.

Why create indexes?


The goals of creating indexes are: 1. Provide the optimizer the statistics needed to understand the data, based on the query 2. Provide the optimizer implementation choices, based on the selectivity of the query

Accurate statistics means accurate costing Accurate costing means optimal query plan Optimal query plans means best performance
2006 IBM Corporation 46

Indexing Strategy Case Study Results


select distinct returnflag from part_orders order by returnflag

36.37 secs V 0.07 secs

2006 IBM Corporation

47

Providing the Optimizer with the correct index can greatly improve performance i.e. attain lower response times

Index Evaluator (Show Indexes)

Meta data for perm and temp indexes

2006 IBM Corporation

48

It is possible to check what indexes are actually in plan on any table using the iNav GUI.

Materialized Query Tables & Look Ahead Predicate Generation

http://www-03.ibm.com/servers/enable/site/education/abstracts/438a_abs.html http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=/rzajq/rzajqlpg.htm
2006 IBM Corporation 49

Materialized Query Tables (MQTs) Overview


I

V5R3 DB2 UDB for iSeries Enhancement


Also known as "materialized views" and "automatic summary tables" Ability to create MQTs at GA Ability to implicitly recognize and use MQTs via PTF April 29, 2005 MQTs are not automatically maintained by the system User maintained only

I I

V5R4 removes some restrictions on use (i.e. better matching query MQT) MQTs contain:
Query definition Results of the query

SQE optimizer can rewrite the user's query to use the MQT
Query uses MQT's pre-aggregated, pre-calculated, pre-joined data Can significantly increase query performance Can significantly decrease resource utilization

I I I

Default behavior is: do not use MQTs QAQQINI file options to enable/disable support More information at: ibm.com/servers/enable/site/education/ibo/view.html?wp
2006 IBM Corporation 50

Materialized Query Tables

T1

T2

T3

uery No Q ite r Rew


SQL Queries

Base Tables

DB2 Query Optimizer


Query Rewrit e
M1

MQT

2006 IBM Corporation

51

MQT Creation Syntax


CREATE TABLE - MQT name - AS - grouping columns - aggregate/summary columns FROM - table(s) WHERE - selection columns GROUP BY - grouping columns ORDER BY - ordering columns) (SELECT DATA INITIALLY IMMEDIATE DATA INITIALLY DEFERRED REFRESH DEFERRED REFRESH IMMEDIATE ENABLE QUERY OPTIMIZATION DISABLE QUERY OPTIMIZATION MAINTAINED BY USER MAINTAINED BY SYSTEM
Query that defines and populates MQT

Data is inserted into the MQT when it is created Data is not inserted into the MQT when it is created Data in the MQT can be refreshed at any time using the REFRESH TABLE statement Not currently supported

MQT can be used for query optimization MQT cannot be used for query optimization but the MQT can still be queried directly MQT is maintained by the user with INSERT, DELETE, UPDATE, or REFRESH TABLE Not currently supported
2006 IBM Corporation 52

MQT Creation Syntax


ALTER TABLE - table name ADD MATERIALIZED QUERY (SELECT - grouping columns - aggregate/summary columns FROM - table(s) WHERE - selection columns GROUP BY - grouping columns ORDER BY - ordering columns) DATA INITIALLY IMMEDIATE DATA INITIALLY DEFERRED REFRESH DEFERRED REFRESH IMMEDIATE ENABLE QUERY OPTIMIZATION DISABLE QUERY OPTIMIZATION MAINTAINED BY USER MAINTAINED BY SYSTEM
Table is altered to be recognized as MQT Query that defines and populates MQT

Data is inserted into the MQT when it is created Data is not inserted into the MQT when it is created Data in the MQT can be refreshed at any time using the REFRESH TABLE statement Not currently supported

MQT can be used for query optimization MQT cannot be used for query optimization but the MQT can still be queried directly MQT is maintained by the user with INSERT, DELETE, UPDATE, or REFRESH TABLE Not currently supported
2006 IBM Corporation 53

DB2 Family & MQTs Stage 2 V5R4


MQT Example:
CREATE TABLE Example_MQT AS (SELECT Geography, Region, Year, Month, SUM(Revenue) AS Total_Revenue, SUM(Quantity) AS Total_Quantity, COUNT(*) AS Rows_per_Group FROM Example_Table GROUP BY Geography,Region,Year,Month) DATA INITIALLY IMMEDIATE REFRESH DEFERRED ENABLE QUERY OPTIMIZATION MAINTAINED BY USER

Optimizer could use the MQT instead of fully executing the following query:
SELECT Geography, Year, SUM(Revenue) AS Total_Revenue, SUM(Quantity) AS Total_Quantity, FROM Example_Table WHERE Year IN (2004, 2005) GROUP BY Geography, Year; white paper: ibm.com/servers/enable/site/education/abstracts/438a_abs.html

User responsible for keeping MQT data current and activating optimizer MQT awareness with QAQQINI options

2006 IBM Corporation

54

Materialized Query Tables (MQTs) are another feature that only can be used by SQE. Just like Instead of Triggers, MQTs were introduced after the GA of V5R3 via PTF and the capabilities have been enhanced in V5R4. A materialized query table (MQT) is a DB2 table that contains the results of a query, along with the querys definition. Similar to autonomic indexes, they can be used by SQE to automatically improve query performance. When an SQL query is run, the SQE query optimizer checks to see if there are any MQTs on the system that either the match the query being run or partially match. When the query optimizer does find a match or partial match, it will rewrite the query and just use the contents of the MQT as a short cut to speed up the delivery of the query results. MQTs are most beneficial in data warehousing and business intelligence environments since the data does not change as frequently as a transactional system.

Look-ahead Predicate Generation (LPG)


A strategy to generate local selection predicates for one table, from one or more other tables Using the (generated) local selection predicates, more options are available for data access and data processing Minimizes the effects of a poor join order Opportunity for additional indexing Can a have a very positive affect on query performance! An example of query rewrite

2006 IBM Corporation

55

A special type of transitive closure called look ahead predicate generation (LPG) may be costed for joins. In this case, the optimizer attempts to minimize the random I/O costs of a join by pre-applying the results of the query to a large fact table. LPG will typically be used with a class of queries referred to as star join queries, however it can possibly be used with any join query. Look at the following query: SELECT * FROM EMPLOYEE,EMP_ACT WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO AND EMPLOYEE.EMPNO ='000010' The optimizer may decide to internally modify the query to be: WITH HT AS (SELECT * FROM EMPLOYEE WHERE EMPLOYEE.EMPNO='000010') SELECT * FROM HT, EMP_ACT WHERE HT.EMPNO = EMP_ACT.EMPNO AND EMP_ACT.EMPNO IN (SELECT DISTINCT EMPNO FROM HT) The optimizer places the results of the "subquery" into a temporary hash table. The hash table of the subquery can be applied in one of two methods against the EMP_ACT (fact) table:

Look-ahead Predicate Generation (LPG)


Select from small_table s, big_table b where (s.store = 'Store 1' or s.store = 'Store 2' or s.store = 'Store 3' or s.store = 'Store 4') and s.storekey = b.storekey

Generated local selection provided by optimizer

LPG

Generate list of distinct key values based on the join column

Big_Table
Storekey

Small_Table
Storekey

Join Condition

Store

Storekeys 00001 00002 00003 00004

Dial 2

Dial 1

No local selection provided by query

Local selection provided by query

2006 IBM Corporation

56

Star / Snowflake Schema Join via LPG


Final Final bitmap bitmap or or RRN RRN list list represents represents local local selection selection

Item_Dim keylist
Itemkey

Final Bitmap or RRN list

3 Bitmaps or RRN lists ANDed together Dynamic Bitmap or RRN list

Fact_Table Store_Dim keylist


Storekey Itemkey Itemkey Storekey Datekey Storekey

EVI 1
Dynamic Bitmap or RRN list

EVI 2
Dynamic Bitmap or RRN list

EVI 3
Datekey

Date_Dim keylist
Datekey

Sales Quantity

Keylist Keylist used used as as local local selection selection (LPG) (LPG)

Select store, item, date, weather, sum(sales), sum(quantity) from item_dim, store_dim, date_dim, fact_table where itemkey in (...keylist...) and storekey in (...keylist...) and datekey in (...keylist...) and itemkey=itemkey, storekey=storekey, datekey=datekey group by store, item, date, weather

2006 IBM Corporation

57

CQE: Current maximum for key list is 30,000 keys or 5MB, whichever is smaller. If the keys are not represented in the DLIST then the benefits of the index ANDing and skip sequential access method on the fact table will be minimized. Those keys will not be used for local selection and bitmap optimization.

Ease of Use & Management Improvements

2006 IBM Corporation

58

iSeries Navigator Enhancements


Health Center Import/Export Table actions
CLRPFM command CPYF command INZPFM command

EDTRBDAP support Show current SQL


Statement name Program or package name Open information

Delimiter option on Creates

2006 IBM Corporation

59

As mentioned earlier, the iSeries Navigator V5R4 enhancements contain some of the most notable DB2 enhancements. Ignoring the DB2 OnDemand Performance Center & Health Center, most of the enhancements listed on this chart are pretty minor. With many of these enhancements, just adding a graphical interface for system commands such as Copy File (CPYF). Onto the more significant enhancements ============ Delimiter option on creates allows the user to control the naming mode that iNav uses on Creates since the SQL & System naming mode have differences in object ownership and authorities.

Enhanced Index Advice


System-Wide Index Advisor - Always Active Advice can be viewed at database, schema or table level

Index Advice, also enhanced (SQE Only) SQE Optimizer will also examine Join, Grouping, Ordering criteria Available on V5R3 with latest DB Group PTF

2006 IBM Corporation

60

The Database component of iSeries Navigator is the major driver of DB2 UDB simplification in V5R4 with the addition of new tools and streamlining of existing tools. This new combination of tools is known as the DB2 OnDemand Performance Center. The system-wide Index Advisor is one of the new tools in V5R4. Index advice from the DB2 UDB query optimizer is not a new capability in V5R4. Feedback on suggested indexes in past releases could be found in optimizer debug messages within a joblog or a database monitor collection. However, this feedback was only available if someone manually started a database monitor or turned on debug messaging. Even if this was done, an analyst had to deal with the complexities of extracting the advised index details of various joblogs and database monitor files. The new Index Advisor eliminates these hurdles with a click of mouse. No user intervention such as starting a database monitor is needed since the query optimizer automatically logs index advice for all queries to a repository in V5R4. Just right click on your Database name in the iSeries Navigator tree and select the Index Advisor task and youre given the output displayed here. Even better is the fact that index advice provided in V5R4 by the SQL Query Engine (SQE) query optimizer is more intelligent and complete. The Index Advice given in past releases only focused on filtering criteria of a query and ignored the join, grouping and ordering criteria. Again, this design required manual intervention from an analyst to take the DB2 index advice and then review the associated query to supplement the DB2 index advice. Look closely at the Index Advisor output and you will also see that the optimizer also provides advice on the type of index to create. Some queries may benefit from a traditional radix index while others would benefit from an encoded vector index structure. =============== SQE still does not advise indexes for IN & OR predicates

SystemWide Advisor Repository - QSYS2/SYSIXADV Index Advice can be done at Database, Schema & Table level

Live DB2 Performance Analysis via SQL Plan Cache

Always On - no database monitor overhead


2006 IBM Corporation 61

Another tool that is always up and running like the Index Advisor is the SQL Plan Cache toolset. The SQL Plan Cache is an internal repository used by SQE to store the access plans and associated statistics for SQL statements that are currently running or recently have been executed on your server. The big benefit with this Plan Cache analysis toolset is that for the first time you can perform detailed SQL performance analysis without the overhead of running a database monitor. A detailed database monitor collection can eat up disk space in a hurry and noticeably slow your disk response times. Gone are the days when a user calls about poor performance and your only response for an SQL-based application was starting a database monitor and having the user rerun the report or application. Now, when a performance issue is raised on V5R4, you can just jump right in and analyze the contents of the SQL Plan Cache to see if a long-running SQL statement is the cause of the performance problems. Using the SQL Plan Cache tool is as simple as rightclicking on the SQL Plan Cache Snapshots component and selecting the SQL Plan Cache -> Show Statements task. This action will produce the screen show here. One feature of this Plan Cache viewer not to miss is the statement filters that are available on the left hand-side of the dialog. These filters allow you to intelligently subset the SQL statements that you want to analyze as opposed to scrolling through a random list of thousands of SQL statements. For instance, you can pick the Top 10 statements that are executed most often or take the most time. You can also choose to just analyze the statements currently running or the SQL statements run by a specific user. Furthermore, once you have selected an SQL statement to analyze you can seamlessly launch Visual Explain for that statement from the same window. You will need to remember that the SQL Plan Cache is fixed in size, so some SQL statements may get purged out of the cache if they have not been recently executed. In these situations, the database monitor tool will have to be used. If you want to archive the contents of the cache into a permanent object, you can create an SQL Plan Cache Snapshot. Effectively this operation takes the contents of the SQL plan cache and converts it into a database monitor output file. Once the snapshot is created, you can run reports to analyze the contents just like a database monitor collection.

SQE Plan Cache


Plan Cache is cleared out by IPL, so a snapshot is a nice way of saving the contents of the Plan Cache for further analysis.

Exit point (QIBM_QWC_PWRDWNSYS ) can be used to call a program each time PWRDWNSYS and ENDSBS commands are issued. The program is called before the system actually powers down. iSeries customers could automate capturing a plan cache snapshot prior to each IPL by setting up this exit point and having exit program call the stored procedure QSYS2/DUMP_PLAN_CACHE. This way, they do not have to worry about doing it manually. Set it up once and it does it every time.

2006 IBM Corporation

62

Visual Explain & DB Monitors


Detailed information on the highlighted step

Graphical representation of how the SQL statement was executed

2006 IBM Corporation

63

Advanced Database Monitor Filtering

2006 IBM Corporation

64

Database monitors have existed on DB2 UDB for iSeries for many releases, but the usability of these monitors takes a big step forward in V5R4 with addition of both pre- and post-filters. These filters are very similar to the filters that were available with the SQL Plan Cache viewer. Database monitor filters are important for a couple of reasons. First, the filters can minimize the overhead and disk space consumed by a database monitor collection. As a case in point, a customer recently ran a database monitor collection on all jobs on there system for 15 minutes. An SQL-based ERP application that was running on the system at the time resulted in almost 3 million rows of detailed monitor data. Applying one of the filters available in V5R4 (see Fig 3) such as only collecting monitor data for a certain user or only for SQL statements referencing a specified table could have significantly reduced the amount of data that has to be collected by DB2 UDB. These filters are made available when you start a new SQL Performance Monitor collection via the graphical interface or the STRDBMON CL command.

Database Monitor Analysis Simplified


Dashboard Summary

Drill-Thru Analysis

2006 IBM Corporation

65

The filters discussed on the previous chart eliminate the number of rows in a database monitor collection, but there are still hundreds of fields in a single database monitor row and many monitor row types that need to be sorted thru to identify and fix performance problems. This maze of data made it a challenge to quickly look at a database monitor collection and determine if there were any issues unless you were a DB2 performance expert. Thus, a dashboard summary was added in V5R4 to address this issue. On the first analysis of a database monitor collection, DB2 for iSeries will automatically summarize some of the key performance indicators in the collection and then present the results in the summary window depicted at the top of this chart. This summary will allow an analyst to quickly scroll through some high-level DB2 performance indicators to determine if more detailed analysis is needed. V5R4 also addresses another issue with the database monitor analysis reports by providing drill-through analysis. The reports could be used to find an SQL statement with performance issues. However, if more detailed research of this statement was needed with a tool such as Visual Explain, the user had to exit out of the tool and then manually copy that SQL statement for Visual Explain analysis. As you can see with the bottom figure, an analyst can easily just rightclick to drill into more detail with Visual Explain. This will be a great enhancement for improving the efficiency of your database analysts and administrators.

Database Monitor Analysis Simplified


Monitor Comparison Utility

2006 IBM Corporation

66

Often times the performance of an application can change over time as the server configuration changes and new versions are installed. Unfortunately neither IBM or its customers are perfect when it comes to making changes sometimes have an unexpected negative impact on performance. With the new monitor comparison utility, users can utilize database monitor collections to help determine what change is causing performance issues. This technique requires that a customer collects and saves database monitor collections when a critical DB2 workload or report is performing and running well on their server. Once performance of this workload or report changes, a new database monitor collection can be compared with the good performance monitor collection to help identify the differences causing performance problems. This can be a long-running process with large database monitor collections, so plan accordingly.

Additional Management Enhancements


DB2 Health Center
Monitor database metrics Check which objects are nearing DB2 limits

Index Evaluator (first available on V5R3)


Native last-used date added in V5R4

MQT Evaluator OnDemand SQL Analysis enhanced Show Current SQL


Statement name Program or package name Open Information
2006 IBM Corporation 67

One final iSeries Navigator addition to highlight is the DB2 Health Center. The Health Center allows you to monitor and perform a checkup on how your well database objects are doing compared to different DB2 UDB limits such as the maximum number of rows allowed in a table or maximum size of an index object. In addition, you have the ability to change the thresholds on when to mark a DB2 object as being close to a limit and the ability to save these checkups to a permanent file to enable historical comparisons. This feature is launched by right-clicking on your named database and selecting the Health Center task. The Index & MQT Evaluators allow you to determine how valuable an index or MQT is by allowing you to check how often its used by the DB2 Query Optimizer. This can be helpful for administrators and analysts trying to cleanup unused DB2 objects on their system. To access these evaluators just right-click on the table object and select the appropriate Show task. If you want to determine if an SQL statement is behind a slow-moving application that an end user is complaining about, than iSeries Navigator Show Current SQL is a great tool to use. The tool has been available since V5R2, but in V5R4 the tool returns more attributes about the SQL request such as program name to help you more quickly identify the issue. This tool can be accessed by rightclicking on the Databases icon in the iSeries Navigator tree.

Additional Education
Lecture on DB2 for i5/OS Overview of V5R4 DB2 for i5/OS new features and functions Review "why System i for BI" Introduction to query optimization and execution Review indexing technology and index strategies Review statistics technology and statistics strategies Review SQL interfaces and best practices for performance Review SQL performance monitoring and analysis tools Relatore MIKE CAIN, Rochester DB2 for i5/OS Competency Center assistente Simona Pacchiarini, System i PreSales DATA: 5 giugno CODICE CORSO: OLB04IT DURATA: 2 giorni LINGUA: inglese con affiancamento PREZZO : euro 490,00 + IVA a partecipante SEDE: IBM Segrate (prenotata aula Volta) Iscrizioni alla pagina http://www-05.ibm.com/services/learning/it/ta-iris.nsf/(ExtCourseNr)/OLB04IT Lab with DB2 for i5/OS Brief review of query optimizer and analysis tools Use V5R4 iSeries Navigator monitoring and analysis tools Work through exercises to understand and tune SQL requests Relatore MIKE CAIN, Rochester DB2 for i5/OS Competency Center assistente Simona Pacchiarini, System i PreSales DATA: 7 giugno CODICE CORSO: OLB05IT DURATA: 1 giorno LINGUA: inglese con affiancamento PREZZO : euro 300,00 + IVA a partecipante SEDE: Milano, Via Frigia 27 Iscrizioni alla pagina http://www-05.ibm.com/services/learning/it/ta-iris.nsf/(ExtCourseNr)/OLB05IT
2006 IBM Corporation 68

Additional Information
DB2 for i5/OS home page - ibm.com/iseries/db2 DB2 Navigator lab, downloadable
ftp://www.redbooks.ibm.com/redbooks/2006_System_i5_Forum_Session_files/2006DBL01/

Education Resources - Classroom & Online


http://www.iseries.ibm.com/db2/gettingstarted.html http://ibm.com/servers/enable/site/education/ibo/view.html?oc#db2

DB2 for i5/OS Publications


White Papers: ibm.com/servers/enable/site/education/ibo/view.html?wp#db2 Online Manuals: http://www.iseries.ibm.com/db2/books.htm Porting Help: http://ibm.com/servers/enable/site/db2/porting.html DB2 for i5/OS Redbooks (http://ibm.com/redbooks)
Stored Procedures, Triggers, and UDFs on DB2 UDB for iSeries (SG24-6503) Preparing for the V5R2 SQL Query Engine (SG24-6598) Modernizing iSeries Application Data Access (SG24-6393) SQL Performance Diagnosis on DB2 UDB for iSeries (SG24-6654) The Ins and Outs of XML and DB2 for i5/OS (SG24-7258) DB2 UDB for iSeries Administration: The Graphical Way on V5R3 (SG24-6092)

2006 IBM Corporation

69

Das könnte Ihnen auch gefallen