Sie sind auf Seite 1von 11

Copyright IBM Corporation 2012 Trademarks

Improve index analysis with DB2 10.1 Page 1 of 11


Improve index analysis with DB2 10.1
Taking advantage of usage lists
Paul Turpin (paul.turpin@aciworldwide.com)
Senior Software Engineer
ACI Worldwide
01 November 2012
One of the most challenging parts of index design is verifying that IBM DB2 is using a
particular index in a way that was intended. Explain output for an SQL operation can show
whether an index is being utilized, but it can be extremely difficult and time consuming to
generate Explain data for every SQL statement found in an application. Not to mention the
fact that looking for index usage in the volumes of Explain data that such an endeavor might
produce would be extremely tedious. Furthermore, many applications use dynamic SQL that is
generated on-the-fly, which means that one or more SQL statements coded in an application
can have hundreds or even thousands of variations. And each variation can result in a different
data access path, with some utilizing available indexes and others relying solely on table scans.
This article is designed to introduce you to an object and a set of new table functions that can
be used to analyze index usage in DB2 10.1 for Linux, UNIX, and Windows databases.
Why use an index?
Indexes serve two major purposes. They provide a logical ordering of the rows in a table, and
they speed up data access by providing a fast, efficient method for locating specific rows of data.
When properly defined, indexes can also guarantee uniqueness of records stored in a table. But,
this functionality does not come without a cost, as additional overhead is needed to maintain
indexes when INSERT, UPDATE, and DELETE operations are performed. And in some environments,
such as On-Line Transaction Processing (OLTP) environments, this extra overhead can reduce
performance to unacceptable levels. Thus, there is a delicate balance between using indexes to
speed up data access and the impact to performance that is introduced by having indexes. And it
is critical, particularly in OLTP environments, to ensure that any indexes that exist are being used
effectively.
Monitoring index usage with db2pd
The DB2 Problem Determination tool (db2pd) is a handy diagnostic tool that can be used to obtain
specific information about a DB2 environment. Unlike other monitoring tools, db2pd attaches to
DB2 shared memory sets to obtain information; because it doesn't go through the DB2 engine,
developerWorks ibm.com/developerWorks/
Improve index analysis with DB2 10.1 Page 2 of 11
db2pd does not have to compete for database resources, making it very lightweight and efficient.
And since db2pd works directly with memory, it can retrieve data quickly in a very non-intrusive
manner.
More than 50 commands and options can be used to control the type of information that db2pd
collects, and one option in particular, -tcbstats, can be used to tell db2pd to capture only
information about index usage. For example, to use db2pd to obtain information about how indexes
defined for a database named SAMPLE are being utilized, you would execute a command as
shown in Listing 1.
Listing 1. Use db2p command
db2pd -db sample -tcbstats index
Analyzing the output produced by this command can be difficult, because indexes are referenced
by their index identifier (IID) rather than by name. The format of the output produced is also
challenging to work with; there is a technique for parsing -tcbstats output to extract index usage
data, however, the process is a bit convoluted and can easily be broken if the db2pd output format
changes.
The technique consists of parsing the -tcbstats output to extract a subset of index usage
information, and loading the information retrieved into a table that has been assigned a name
like INDEX_USAGE. Explain information is then generated for a number of SQL queries, and the
INDEX_USAGE table is later joined with the SYSCAT.TABLES and SYSCAT.INDEXES system
catalog views and one or more Explain tables to determine which SQL statements are utilizing
which indexes. Unfortunately, it's easy to miss important SQL statements with this technique.
Another issue is that, because data retrieved by db2pd is stored in various DB2 memory sets, the
data goes away when the database being examined is deactivated. Consequently, index usage
information that has been collected over days or weeks can disappear in seconds. The information
about index usage that is returned by db2pd has changed over time, and with DB2 10.1, index
usage information is only collected for DML operations.
Monitoring index usage with administrative table functions
In DB2 9.7, improvements were made to the ways in which information about index
usage information can be obtained. For example, a LASTUSED column was added to the
SYSCAT.INDEXES system catalog view. And because this information is stored in a table, as
opposed to DB2 memory sets, the information is retained long after a database is deactivated.
More importantly, this information is maintained by a background daemon process named
db2lused, and is updated every fifteen minutes. The MON_GET_INDEX() table function was also
added, greatly simplifying the extraction of index usage information.
This function returns information about indexes that have been used since the last database
activation. In fact, the same information that is returned by the db2pd -tcbstats index command
can be obtained using the MON_GET_INDEX() table function. A query that uses this function to obtain
information about how indexes being used looks similar to what is shown in Listing 2.
ibm.com/developerWorks/ developerWorks
Improve index analysis with DB2 10.1 Page 3 of 11
Listing 2. Query to obtain information using indexes
SELECT SUBSTR(S.INDSCHEMA, 1, 10) AS indschema,
SUBSTR(S.TABNAME, 1, 30) AS table_name,
SUBSTR(S.INDNAME, 1, 30) AS index_name,
T.INDEX_SCANS,
T.INDEX_ONLY_SCANS
FROM TABLE(MON_GET_INDEX('','', -2)) AS t, SYSCAT.INDEXES AS s
WHERE T.TABSCHEMA = S.TABSCHEMA
AND T.TABNAME = S.TABNAME
AND T.IID = S.IID
AND S.INDSCHEMA NOT LIKE 'SYS%'
ORDER BY S.TABNAME, S.INDNAME DESC
This particular query will show the number of index scans where the results of a query were
satisfied by using an index to locate data in a table, and index-only scans (where the results of a
query were satisfied by accessing the index only) for each index in a database, sorted by table
name, and then by index name. This information is quite valuable, but with this information alone, it
can be difficult to determine what SQL statement(s) actually caused an index to be used.
Monitoring index usage with usage lists
In DB2 10.1, a new database object known as a usage list was introduced to help with table and
index usage monitoring. As the name implies, a usage list is a database object that is used to
record information about DML statements that reference a particular table or index. Each entry
in a usage list contains information about the number of times a particular DML operation was
performed, as well as aggregate statistics that indicate how each statement affected the table or
index during execution. Usage lists are created by executing the CREATE USAGE LIST command;
the basic syntax for this command is shown in Listing 3.
Listing 3. CREATE USAGE LIST basic syntax
CREATE USAGE LIST [ULName]
FOR [TABLE | INDEX] [ObjectName]
<LIST SIZE [100 | NumEntries]>
<WHEN FULL [WRAP | DEACTIVATE]>
<[INACTIVE | ACTIVE] ON START DATABASE>
Where:
ULName identifies the name that is to be assigned to the usage list being created.
ObjectName identifies by name, the existing table or index the usage list is to be created for.
NumEntries identifies the maximum number of entries that can be stored in the usage list
being created.
Note: Parameters or options shown in normal brackets ([ ]) are required; parameters/options
shown in angle brackets (< >) are not.
Once a usage list has been created and activated, two monitor table functions that were also
introduced in DB2 10.1 - MON_GET_TABLE_USAGE_LIST() and MON_GET_INDEX_USAGE_LIST() - can be
used to obtain and format any usage information collected.
developerWorks ibm.com/developerWorks/
Improve index analysis with DB2 10.1 Page 4 of 11
An example of how to use a usage list to monitor index usage
To demonstrate how usage lists can be used to monitor index usage, look at some example
scenarios that were developed using the PRODUCT table that is part of the SAMPLE database
that ships with DB2 for Linux, UNIX, and Windows. This database can be constructed by executing
the command db2sampl.
If you examine the physical design of the SAMPLE database carefully, you will discover that an
index named PK_PRODUCT is the only REGULAR index that has been defined on the PRODUCT
table. Any of the DB2 commands/SQL statements shown in Listing 4 can be used to determine
what columns were used to define the PK_PRODUCT index.
Listing 4. Sample DB2 commands/SQL statements
DESCRIBE INDEXES FOR TABLE product SHOW DETAIL
SELECT SUBSTR(INDNAME, 1, 20) AS index_name, COLNAMES
FROM SYSCAT.INDEXES
WHERE TABNAME = 'PRODUCT'
db2look -d SAMPLE -e -tw PRODUCT -o PRODUCT.DDL
To create an index usage list named PK_PRODUCT_USE for the PK_PRODUCT index, simply
execute the SQL statement shown in Listing 5.
Listing 5. Sample SQL statement for index usage
CREATE USAGE LIST pk_product_use FOR INDEX PK_PRODUCT
LIST SIZE 500
WHEN FULL WRAP
This particular statement sets the maximum number of entries allowed in the usage list being
created to 500, and tells DB2 that when the usage list is full, it is to wrap and overwrite earlier
entries that were stored in the list.
Before a usage list can be used to record information about DML statements that reference a table
or index, it must first be activated. If the ACTIVE ON START DATABASE clause was specified with the
CREATE USAGE LIST statement that was used to create it, a usage list will be activated automatically
when the database the list is stored in is activated or the first time a connection to the database is
established. Otherwise, a usage list can be activated by executing the SET USAGE LIST statement.
The statement shown in Listing 6 can be used to activate the PK_PRODUCT_USE usage list that
was created previously.
Listing 6. Activate PK_PRODUCT_USE list
SET USAGE LIST pk_product_use STATE ACTIVE
Beginning with DB2 10.1, the ALTER SQL statement can be used to change the definition of a
usage list that was created previously, for example, the statement shown in Listing 7 will change
the number of entries allowed in the specified usage list to 1,000.
ibm.com/developerWorks/ developerWorks
Improve index analysis with DB2 10.1 Page 5 of 11
Listing 7. Use ALTER SQL statement
ALTER USAGE LIST pk_product_use LIST SIZE 1000
On the other hand, the statement shown in Listing 8 will change the behavior of the usage list such
that when the list becomes full, DB2 will deactivate it, thereby preventing data that has already
been collected from being overwritten.
Listing 8. Statement changing behavior of usage list
ALTER USAGE LIST pk_product_use WHEN FULL DEACTIVATE
To obtain a list of all usage lists that have been defined for a particular database, including those
that have been assigned a state of Released, which indicates that memory associated with the
usage list has been released, query the SYSCAT.USAGELISTS system catalog table with a SELECT
statement that looks similar to Listing 9.
Listing 9. Using the SELECT statement
SELECT * from SYSCAT.USAGELISTS
To determine the status of one or more usage lists that are not in the Released state, construct a
query that calls the MON_GET_USAGE_LIST_STATUS() monitor function instead, as shown in Listing
10.
Listing 10. Monitor function
SELECT * FROM TABLE(MON_GET_USAGE_LIST_STATUS(NULL, NULL, -2))
Once a usage list for an index has been activated, it will begin collecting information about DML
statements that access the index for which it was defined. And, as was mentioned previously,
information stored in a usage list for an index can be obtained by constructing a query that utilizes
the MON_GET_INDEX_USAGE_LIST() table function. For example, the query shown in Listing 11 could
be used to obtain information stored in the PK_PRODUCT_USE usage list you created previously.
Listing 11. Obtain information in PK_PRODUCT_USE usage list
SELECT NUM_REFERENCES, EXECUTABLE_ID
FROM TABLE(MON_GET_INDEX_USAGE_LIST(NULL, 'PK_PRODUCT_USE', -2))
The second parameter passed to the MON_GET_INDEX_USAGE_LIST() table function is the name
of a specific usage list; if you specify the NULL value for this parameter, the function will return
information for all usage lists that have been defined and are active. Thus, if you have multiple
usage lists defined, and one or more are active, and you want to obtain information from all of
them simultaneously, then you can do so by executing a query that looks similar to Listing 12.
Listing 12. Obtain information simultaneously
SELECT
SUBSTR(USAGELISTNAME,1,20) AS usagelistname,
SUBSTR(INDNAME,1,20) AS indname, NUM_REFERENCES, EXECUTABLE_ID
FROM TABLE(MON_GET_INDEX_USAGE_LIST(NULL, NULL, -2))
developerWorks ibm.com/developerWorks/
Improve index analysis with DB2 10.1 Page 6 of 11
There is additional information that can be returned by the MON_GET_INDEX_USAGE_LIST() table
function, such as the number of logical index reads and the number of physical index reads, but
when trying to determine whether or not a particular index is being utilized, this information is not
needed.
Notice that in both examples shown, EXECUTABLE_ID values were retrieved. Every entry
in the package cache is assigned a unique EXECUTABLE_ID. Therefore, by examining the
EXECUTABLE_ID it is possible to determine which SQL statements are associated with
the use of a particular index. Joining the results of the MON_GET_INDEX_USAGE_LIST() and
MON_GET_PKG_CACHE_STMT() table functions on the EXECUTABLE_ID column is a powerful way
to determine which SQL statements are causing a particular index to be utilized by the DB2
Optimizer.
Going back to the PRODUCT table in the SAMPLE database, if the SQL statements shown in
Listing 13 are executed, the DB2 Optimizer should elect to use the PK_PRODUCT index to deliver
optimum query performance.
Listing 13. Use PK_PRODUCT index
SELECT pid FROM PRODUCT WHERE pid = '100-201-01';
SELECT pid FROM PRODUCT WHERE pid = '100-201-01';
SELECT pid FROM PRODUCT WHERE pid = '100-103-01';
Assuming a usage list has been created and activated for the PK_PRODUCT index, the following
query, which joins the MON_GET_PKG_CACHE_STMT() and MON_GET_INDEX_USAGE_LIST() table
functions on the EXECUTABLE_ID column, should produce a report that verifies that the SQL
statements just executed are indeed using the PK_PRODUCT index, as shown in Listing 14.
Listing 14. Query joining table functions
SELECT
SUBSTR(i.INDNAME,1,14) AS index_name,
SUBSTR(t.STMT_TEXT,1,50) AS sql_statment, t.NUM_EXECUTIONS
FROM TABLE (MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2 )) AS t,
TABLE(MON_GET_INDEX_USAGE_LIST(NULL, 'PK_PRODUCT', -2)) AS i
WHERE t.EXECUTABLE_ID = i.EXECUTABLE_ID
ORDER BY NUM_EXECUTIONS DESC;
And if that is the case, the output produced should look similar to Listing 15.
Listing 15. Example output
INDEX_NAME SQL_STATEMENT NUM_EXECUTIONS
-------------- -------------------------------------------------- --------------
PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-201-01' 2
PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-103-01' 1
In this example, the results are sorted so that the most frequently used statements are presented
first. To limit the results to just the top 20 statements, add the FETCH FIRST 20 ROWS ONLY clause to
the query used.
ibm.com/developerWorks/ developerWorks
Improve index analysis with DB2 10.1 Page 7 of 11
There are two things to keep in mind when using the Package Cache in this manner. First, you
need to make sure that the value of the pckcachesz database configuration parameter has
been set high enough to store all the SQL statements that you plan to test. In this example, the
pckcachesz configuration parameter was set to 10000, because only a few simple SQL statements
were being evaluated. In your environment, it may be necessary to set this parameter much
higher.
Second, during testing, make sure that a dynamic flush package cache operation is NOT
performed. Otherwise, you run the risk of losing the SQL statement text data you were hoping to
capture.
Previously, it was mentioned that a usage list is associated with a single table or index. However,
often it is important to look at index usage for all indexes that have been defined for a given table.
There is a small amount of overhead associated with managing multiple usage lists, but having
multiple usage lists available is much more efficient, in terms of person-hours required, if you want
to examine at all the index usage at one time. Therefore, it is always better to create a usage list
for each index for a given table.
Suppose three additional indexes were created for the PRODUCT table in the SAMPLE database,
using the SQL statements shown in Listing 16.
Listing 16. Add three indexes
CREATE INDEX product_gi1 ON PRODUCT (name ASC);
CREATE INDEX product_gi2 ON PRODUCT (promostart ASC, promoend ASC);
CREATE INDEX product_gi3 ON PRODUCT (price ASC);
The system catalog for the SAMPLE database can help us generate the CREATE USAGE LIST and
ACTIVATE USAGE LIST statements that are needed to create and activate usage lists for each of
these new indexes. For instance, the queries shown in Listing 17 could be used to create the
necessary DDL.
Listing 17. Queries to create DDL
SELECT
'CREATE USAGE LIST '
CONCAT INDNAME
CONCAT ' FOR INDEX '
CONCAT RTRIM(INDSCHEMA) CONCAT '.'
CONCAT INDNAME
CONCAT ' LIST SIZE 500 WHEN FULL WRAP ;'
FROM SYSCAT.INDEXES i
WHERE TABNAME = 'PRODUCT'
AND INDSCHEMA NOT LIKE 'SYS%' i
AND INDEXTYPE = 'REG'
ORDER BY INDNAME;
SELECT
'SET USAGE LIST '
CONCAT INDNAME
CONCAT ' STATE ACTIVE;'
FROM SYSCAT.INDEXES i
WHERE TABNAME = 'PRODUCT'
developerWorks ibm.com/developerWorks/
Improve index analysis with DB2 10.1 Page 8 of 11
AND INDSCHEMA NOT LIKE 'SYS%'
AND INDEXTYPE = 'REG'
ORDER BY INDNAME;
When these two queries are executed, they generate DDL that can be used to both create and
activate four usage lists - one for each index that has been defined for the PRODUCT table.
Similar queries can be used to drop the usage lists as well.
Once the appropriate usage lists have been created and activated, a test designed to execute all
of the SQL statements against the table that you want to gather index usage information for should
be developed and run. For example, the SQL statements (in addition to the previous three) shown
in Listing 18 could be used to evaluate how the new indexes that were defined for the PRODUCT
table earlier are being utilized.
Listing 18. An example of using SQL statements to evaluate the new indexes
SELECT pid FROM product
WHERE name = 'SNOW SHOVEL, BASIC 22 INCH';
SELECT pid FROM product
WHERE name = 'SNOW SHOVEL, SUPER DELUXE 26 INCH';
SELECT pid, name FROM product WHERE price = 19.99;
SELECT pid, name FROM product WHERE price = 49.99;
SELECT pid, name, price FROM product WHERE promostart = '11/19/2004';
SELECT pid, name, price FROM product
WHERE promostart = '12/22/2005' AND promoend = '02/22/2006';
Once these SQL statements are executed, the query, as shown in Listing 19, can be used to
determine which, if any, of the new indexes were chosen by the DB2 Optimizer.
Listing 19. Query to determine which new indexes were chosen by the DB2
Optimizer
SELECT
SUBSTR(i.INDNAME,1,12) AS index_name,
SUBSTR(t.STMT_TEXT,1,59) AS sql_statment, t.NUM_EXECUTIONS
FROM TABLE (MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2 )) AS t,
TABLE(MON_GET_INDEX_USAGE_LIST(NULL, NULL, -2)) AS i
WHERE t.EXECUTABLE_ID = i.EXECUTABLE_ID
ORDER BY NUM_EXECUTIONS DESC
And when this query is executed, the output produced should look similar to Listing 20.
Listing 20. Output of the executed query
INDEX_NAME SQL_STATEMENT NUM_EXECUTIONS
------------ ----------------------------------------------------------- --------------
PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-201-01' 2
PRODUCT_GI2 SELECT pid, name, price FROM product WHERE promostart = '11 1
PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-103-01' 1
PRODUCT_GI1 SELECT pid FROM product WHERE name = 'SNOW SHOVEL, BASIC 22 1
PRODUCT_GI3 SELECT pid, name FROM product WHERE price = 49.99 1
PRODUCT_GI1 SELECT pid FROM product WHERE name = 'SNOW SHOVEL, SUPER DE 1
PRODUCT_GI2 SELECT pid, name, price FROM product WHERE promostart = '12 1
PRODUCT_GI3 SELECT pid, name FROM product WHERE price = 19.99 1
ibm.com/developerWorks/ developerWorks
Improve index analysis with DB2 10.1 Page 9 of 11
Because the second parameter of the MON_GET_INDEX_USAGE_LIST() table function is set to NULL,
the function will return information for all active index usage lists. Assuming all SQL statements
that query a given table have been executed, the results produced by this query should show how
all the indexes available are being utilized. In this case, all of the queries used were designed
such that the DB2 Optimizer would utilize every index available. However, in a real-world situation,
testing would most likely reveal unused indexes or indexes that are being used inefficiently.
Conclusion
As DB2 has evolved over the years, it has become easier to get index usage information. And with
DB2 10.1, tracking down the exact SQL statements that are using a particular index has gotten
much easier. By using index usage lists and the new MON_GET_INDEX_USAGE_LIST() table function
combined with the older MON_GET_PKG_CACHE_STMT() table function, it is now possible to quickly
evaluate whether the DB2 Optimizer is taking advantage of indexes that have been defined.
developerWorks ibm.com/developerWorks/
Improve index analysis with DB2 10.1 Page 10 of 11
Resources
Learn
Visit the developerWorks Information Management zone to find more resources for DB2
developers and administrators.
Stay current with developerWorks technical events and webcasts focused on a variety of
IBM products and IT industry topics.
Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and
tools as well as IT industry trends.
Follow developerWorks on Twitter.
Watch developerWorks on-demand demos ranging from product installation and setup demos
for beginners, to advanced functionality for experienced developers.
Get products and technologies
Build your next development project with IBM trial software, available for download directly
from developerWorks.
Evaluate IBM products in the way that suits you best: Download a product trial, try a product
online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox
learning how to implement Service Oriented Architecture efficiently.
Discuss
Participate in the discussion forum for this content.
ibm.com/developerWorks/ developerWorks
Improve index analysis with DB2 10.1 Page 11 of 11
About the author
Paul Turpin
Paul Turpin is a Senior Software Engineer at ACI Worldwide. He has been working
with DB2 for over 17 years and has been a DB2 DBA for more than 15 years. Paul
holds the IBM Certified Database Administrator - DB2 10.1 for Linux, UNIX, and
Windows certification, as well as the IBM Certified Database Administrator - DB2 10
for z/OS certification. He serves on the Board of Directors for the International DB2
Users Group (IDUG) and has been an IBM Champion since 2008.
Copyright IBM Corporation 2012
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)

Das könnte Ihnen auch gefallen