International Oracle Users Group Live 2004

Technical Session #549:

Understanding Indexes
Tim Gorman Principal - SageLogix, Inc.
Email: tim@sagelogix.com
www.SageLogix.Com

Agenda
B*Tree index architecture
Issues with B*Tree indexes in Oracle
  Real issues
    Sparsely-populated indexes
    Contention on INSERT
    Uneven data distribution
    Low data cardinality (a.k.a. selectivity)
  Imagined or mythical issues
    Indexes needing rebalancing
    ???
Some interesting options for indexing
  Function-based, descending, compression

B*Tree index architecture


Oracle implements balanced B*Tree indexes as its primary indexing method
  Tree structure consisting of root, branch, and leaf nodes
    each node is one database block in Oracle
  root node is the top-level branch node
    starting point for searches into the index
  branch nodes are connectors
    point to other branch nodes or to leaf nodes
  leaf nodes contain data values and pointers to rows

B*Tree index architecture


Branch entries contain:
  Max data value piece
  DBA (data block address) of next-level branch or leaf
Leaf entries contain:
  Data value
  ROWID of table row

[Diagram: index PK_X, with the root block pointing to branch blocks, branch blocks pointing to leaf blocks, and leaf entries pointing to rows in table X]

B*Tree index architecture


Index read I/O is sequential in nature
  Oracle wait-event for the single-block I/O requests used during indexed access: db file sequential read
    Used with RANGE, UNIQUE, and FULL scans
  FAST FULL index-scans are an exception
    Uses sequential multiblock I/O similar to FULL table-scans
  Conventional wisdom that tables and indexes must be separated into different tablespaces to enable some form of parallel I/O is a myth
    Think of how a treasure hunt is performed: each clue must be read before the next location is known
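A rough way to compare the two I/O patterns on a running instance is to look at the system-wide wait events. This is a minimal sketch, assuming SELECT access to V$SYSTEM_EVENT; 'db file scattered read' is the event Oracle normally records for multiblock reads, and the figures are cumulative since instance startup.

```sql
-- Single-block reads (typical of indexed access) versus multiblock reads
-- (typical of FULL table-scans and FAST FULL index-scans).
SELECT event,
       total_waits,
       time_waited          -- hundredths of a second
  FROM v$system_event
 WHERE event IN ('db file sequential read', 'db file scattered read')
 ORDER BY event;
```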

B*Tree index architecture


When indexes are built on populated tables using CREATE INDEX or ALTER INDEX xxx REBUILD
  1. Table is scanned and column values are sorted
  2. Leaf blocks of index are populated first
  3. Then supporting branch levels are built to support those leaves
All blocks are filled up to the PCTFREE threshold
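As a sketch of how the PCTFREE threshold can be set at build time, reusing the T1 table and I1 index from the demonstration later in this deck (the PCTFREE values are illustrative only). For an index, PCTFREE is applied only when the index is created or rebuilt, not during later inserts.

```sql
-- Leave 10% of each block free at build time (10 is the default).
CREATE INDEX i1 ON t1 (c1) PCTFREE 10 TABLESPACE tools;

-- Rebuild with the blocks packed completely full; reasonable only when
-- few or no further insertions are expected between existing values.
ALTER INDEX i1 REBUILD PCTFREE 0;
```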

B*Tree index architecture


When indexes are built transactionally using conventional-path INSERT or UPDATE statements
  How the indexes grow is dependent upon two possible design decisions
  Optimize either for:
    random data values
  Or:
    sequentially ascending data values

B*Tree index architecture


If optimizing for random data values
  As blocks fill, split them in half, move upper half to new block, leave lower half behind
  Leaves behind half-full blocks
  Allows back-filling as lower data values occur
If optimizing for sequential data values
  As blocks fill, simply overflow into a new block
  Leaves behind full blocks
  No need to worry about back-filling as lower data values could never occur

B*Tree index architecture


Overflow:
  Grows like this [diagram]
  Overflowing uses space more efficiently in a heap (non-sorted) data structure
  Also uses space more efficiently when optimizing for sequential data values
Split:
  Grows like this [diagram]
  Splitting anticipates back-filling in a sorted data structure
  Therefore uses space more efficiently when optimizing for random data values

B*Tree index architecture


If the last index entry inserted is the highest data value in the block
  Then data will overflow into the next block on the same level
If the last index entry inserted is not the highest data value in the block
  Then the current block will split, with the lower set of values remaining in place and the higher set of values moving to the next block on the same level

B*Tree index architecture


If the branch block above cannot accommodate another child
  Then this algorithm is applied recursively first
If this algorithm iterates recursively all the way back to the root block, and the root becomes full
  Then the root splits/overflows to generate a new branch level (BLEVEL)
  Root block stays in place, generates two new blocks at a new 1st BLEVEL below
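The current number of branch levels can be read from the data dictionary. A minimal sketch, assuming statistics have already been gathered (the columns are empty otherwise) and reusing the T1 table from the demonstration later in this deck:

```sql
-- BLEVEL = number of branch levels between root and leaves;
-- 0 means the root block is also the only leaf block.
SELECT index_name,
       blevel,
       leaf_blocks,
       num_rows
  FROM user_indexes
 WHERE table_name = 'T1';
```

Tracking BLEVEL over time shows when root splits have actually added a level.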

B*Tree index architecture


Balanced B*Tree indexes grow balanced by design, automatically
  New levels are added to the tree structure above, dynamically, as needed
Indexes do not shrink automatically when deletions occur
  Instead, empty index entries are left in place
    For possible reuse by newly inserted data
    ROWID (pointer to data) portion is set to NULL
  Deletions may cause an index to become sparse and thus less efficient over time

Issues with B*Tree indexes

B*Tree indexes are optimized for:
  Both random and sequential data values!
  Even distribution of distinct data values
  High data cardinality or selectivity of values

As a result, issues with B*Tree indexes are:
  Block contention when inserting sequential data
  Uneven distribution of data values
    Popular and unpopular data values
  Low data cardinality
  Sparseness
    Resulting from data deletions

Sparsely-populated indexes
Indexes never become unbalanced over time
  If so, then why does performance sometimes deteriorate over time?
Instead, indexes can become sparsely populated due to:
  Deletions of row data
  Unfortunate patterns of inserted random data
Sparseness simply means that more I/O is required to perform the same work

Sparsely-populated indexes
How is this condition detected?
  ANALYZE INDEX VALIDATE STRUCTURE command
    Populates the session-private view named INDEX_STATS with only one row
  Column PCT_USED is the average percentage of space utilized in the blocks belonging to the index
    Derived from the ratio of the values of the columns USED_SPACE and BTREE_SPACE
    Expect PCT_USED to be 90 by default
    Lesser values may indicate sparseness developing
    Best to watch values over time

Sparsely-populated indexes
How is this condition detected?
  If PCT_USED is much less than 90
  Non-zero values in the column DEL_LF_ROWS
    Cause of sparseness is probably row deletions if the value in the column DEL_LF_ROWS is a large percentage of the value in the column LF_ROWS
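The check described above can be sketched as follows, using the I1 index from the demonstration later in this deck as a stand-in for the index under suspicion. INDEX_STATS holds a single row, visible only to the session that ran the ANALYZE; the PCT_DELETED alias is introduced here for illustration.

```sql
-- Populates INDEX_STATS for this session (note: locks the index while it runs).
ANALYZE INDEX i1 VALIDATE STRUCTURE;

SELECT name,
       height,
       lf_rows,
       del_lf_rows,
       -- deleted leaf entries as a percentage of all leaf entries
       ROUND(100 * del_lf_rows / NULLIF(lf_rows, 0), 1) AS pct_deleted,
       used_space,
       btree_space,
       pct_used
  FROM index_stats;
```

Saving these rows into a history table after each run makes it possible to watch the values over time, as recommended.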

Sparsely-populated indexes

Rebuild or coalesce are two possible solutions for index sparseness
  Don't guess!!!
    Make certain using ANALYZE INDEX VALIDATE STRUCTURE
    Be aware that the ANALYZE command locks the index, however
To rebuild an index:
  Use ALTER INDEX REBUILD
To coalesce space within an index:
  Use ALTER INDEX COALESCE

Sparsely-populated indexes
ALTER INDEX REBUILD is the most commonly used solution for sparsely-populated indexes
  Unlike CREATE INDEX, uses the existing index as the source, which is faster because:
    Index is usually smaller than the table (less I/O)
    Index is already sorted (some sorting still needed)
  Other features of ALTER INDEX REBUILD include:
    Parallel execution (throw more resources at task)
    Direct-path operations (no rollback/undo generated)
    NOLOGGING (no redo generated)
    COMPUTE STATISTICS (better CBO stats cheaply)
    ONLINE (allows full usage of index during task)
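A sketch of a rebuild that combines these options, again using I1 as a stand-in. The degree of 4 is arbitrary, and whether every option is available, or can be combined with the others, depends on the Oracle version and edition.

```sql
ALTER INDEX i1 REBUILD
  ONLINE                -- index stays usable during the rebuild
  PARALLEL 4            -- illustrative degree of parallelism
  NOLOGGING             -- minimal redo; take a backup afterwards
  COMPUTE STATISTICS;   -- gather CBO statistics as part of the rebuild

-- PARALLEL and NOLOGGING remain set as attributes of the index after
-- the rebuild, so reset them if they were only wanted for this task.
ALTER INDEX i1 NOPARALLEL;
ALTER INDEX i1 LOGGING;
```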

Sparsely-populated indexes
ALTER INDEX COALESCE is an alternative to rebuilding
  Merges unused space within indexes to free blocks for reuse
Slower than ALTER INDEX REBUILD because it is a transactional operation, not a bulk operation
  No direct-path, parallel, nologging, etc.
But COALESCE is implicitly an ONLINE operation
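A minimal sketch of coalescing and then re-checking the result, with I1 as a stand-in index name:

```sql
-- Merge the contents of adjacent, sparsely-filled leaf blocks.
ALTER INDEX i1 COALESCE;

-- Re-validate to see whether leaf block count and space usage improved.
ANALYZE INDEX i1 VALIDATE STRUCTURE;

SELECT lf_blks, lf_rows, del_lf_rows, pct_used
  FROM index_stats;
```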

Contention on INSERT
Multiple concurrently-executing INSERTs into an index on column(s) with sequential data values can bottleneck on buffer busy waits (a.k.a. block-level contention)
  Pseudo-randomizing such data with REVERSE key indexes relieves this performance problem
    Instead of many processes attempting to insert index entries into the right-most leaf block
    Insertions are evenly scattered over all of the available leaf blocks

Contention on INSERT
[ CREATE | ALTER ] INDEX REVERSE
  Pseudo-randomizes non-random data
    By flipping or reversing the physical order of the data value during storage
  Causes sequential data values to become more random by simply reversing data
    123456 becomes 654321
    123457 becomes 754321, etc...
  Index data values are transparently converted (flipped) and unconverted (unflipped) upon insert and retrieval, respectively
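A sketch of the syntax, assuming the T1 table from the demonstration later in this deck; the index name T1_C2_RIX is hypothetical. Oracle reverses the bytes of the stored key rather than its decimal digits, so the digit-flipping above is best read as an illustration of the effect.

```sql
-- Create a reverse-key index on a sequentially-populated column.
CREATE INDEX t1_c2_rix ON t1 (c2) REVERSE;

-- An existing index can be converted in either direction by rebuilding it.
ALTER INDEX t1_c2_rix REBUILD NOREVERSE;
ALTER INDEX t1_c2_rix REBUILD REVERSE;

-- The dictionary shows which indexes are reverse-key
-- (typically reported as NORMAL/REV).
SELECT index_name, index_type
  FROM user_indexes
 WHERE index_name = 'T1_C2_RIX';
```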

Contention on INSERT
Adverse impacts of using REVERSE indexes:
  only equivalence operations will use the index
    =, !=, <>, IN, and NOT IN
  range-scans will not use the index
    >, >=, <, <=, LIKE, BETWEEN
But, is this really a problem?
  Depends on how the sequential data values are used:
    Sequence numbers or surrogate system-generated keys are usually sought using equivalence operations
      No problem here
    But timestamps are often range-scanned
      There is likely a problem with these situations

Uneven distribution of data values


Basic rule of thumb for using indexes:
  Use indexes when selecting unpopular data
  Otherwise, use FULL table-scan
    Or a different index? Or partitioning? Or ...
By default, the Oracle cost-based optimizer (CBO) assumes even data distribution
  Only LOVAL, HIVAL, and DISTINCT_KEYS gathered during ANALYZE or DBMS_STATS
Real life often invalidates this assumption
  Selectively gathering column-level statistics provides the CBO with information on which data values are popular and which are unpopular

Uneven distribution of data values


There are two ways of gathering column-level CBO statistics
  ANALYZE TABLE ... FOR ALL [ INDEXED ] COLUMNS
  DBMS_STATS.GATHER_TABLE_STATS with the METHOD_OPT parameter
Populates data dictionary views DBA_TAB_HISTOGRAMS and DBA_PART_HISTOGRAMS with rows representing buckets of data values
  All buckets assumed to have the same number of rows
  Each bucket is defined by its highest value
  By default, only one row populated
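With DBMS_STATS, column-level statistics are typically requested through the METHOD_OPT parameter of GATHER_TABLE_STATS. The following is a sketch against the T1 table used in the demonstration later in this deck; the bucket count of 254 is simply the maximum in releases of this era, not a recommendation.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'T1',
    method_opt => 'FOR COLUMNS c1 SIZE 254',  -- histogram on C1 only
    cascade    => TRUE);                      -- also gather index statistics
END;
/

-- One row per bucket endpoint; USER_TAB_HISTOGRAMS is the
-- current-schema counterpart of DBA_TAB_HISTOGRAMS.
SELECT column_name,
       endpoint_number,
       endpoint_value
  FROM user_tab_histograms
 WHERE table_name  = 'T1'
   AND column_name = 'C1'
 ORDER BY endpoint_number;
```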

Uneven distribution of data values


Please don't gather column-level statistics unless the problem is proven
  Gathering column-level statistics is not a good choice as a default operation
  Perform SQL tuning with SQL Trace/TKPROF or STATSPACK to provide proof of a problem
If the CBO fails to choose an index
  it could be because it has detected low cardinality on average
If the CBO chooses an index
  it could perform poorly if the data value is overly popular

Illustrating the importance of data


SQL> create table t1
     ( c1 varchar2(30),
       c2 number,
       c3 number
     ) tablespace tools;

Table created.

SQL> begin
       for i in 1..100000 loop
         insert into t1
         values(to_char(mod(i,187)), i, mod(i,187));
       end loop;
     end;
     /

PL/SQL procedure successfully completed.

Illustrating the importance of data


SQL> create index i1 on t1(c1) tablespace tools;

Index created.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> set autotrace on
SQL> select c1 from t1 where c1 = '10000';

no rows selected

Execution Plan
----------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=535 Bytes=1605)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=37 Card=535 Bytes=1605)

Illustrating the importance of data


SQL> select num_rows, blocks
     from user_tables
     where table_name = 'T1';

     NUM_ROWS    BLOCKS
------------- ---------
      100,000       232

SQL> select num_rows, distinct_keys,
            avg_leaf_blocks_per_key, avg_data_blocks_per_key
     from user_indexes where index_name = 'I1';

                    Avg Leaf   Avg Data
         Distinct     Blocks     Blocks
Nbr Rows     Keys    Per Key    Per Key
-------- -------- ---------- ----------
 100,000      187          1        231

Illustrating the importance of data


SQL> truncate table t1;

Table truncated.

SQL> begin
       for i in 1..100000 loop
         insert into t1
         values(to_char(round(i/187,0)), i,
                round(i/187,0));
       end loop;
     end;
     /

PL/SQL procedure successfully completed.

Illustrating the importance of data


SQL> create index i1 on t1(c1) tablespace tools;

Index created.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> set autotrace on
SQL> select c1 from t1 where c1 = '10000';

no rows selected

Execution Plan
----------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=187 Bytes=561)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=187 Bytes=561)
   2    1     INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE) (Cost=1 Card=187)

Illustrating the importance of data


SQL> select num_rows, blocks
     from user_tables
     where table_name = 'T1';

     NUM_ROWS    BLOCKS
------------- ---------
       100000       242

SQL> select num_rows, distinct_keys,
            avg_leaf_blocks_per_key, avg_data_blocks_per_key
     from user_indexes where index_name = 'I1';

                    Avg Leaf   Avg Data
         Distinct     Blocks     Blocks
Nbr Rows     Keys    Per Key    Per Key
-------- -------- ---------- ----------
  100000      536          1          1
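The difference between the two runs also shows up in the index's clustering factor, a related dictionary statistic not shown on these slides. A sketch, assuming the same T1 and I1 objects with statistics gathered:

```sql
-- A CLUSTERING_FACTOR close to the table's BLOCKS means rows with the same
-- key sit together (the second run); a value close to NUM_ROWS means they
-- are scattered across the table (the first run).
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
  FROM user_indexes i
  JOIN user_tables  t ON t.table_name = i.table_name
 WHERE i.index_name = 'I1';
```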

Low data cardinality


B*Tree indexes work best with high cardinality
  Think of the example of a telephone book
    Would you use the standard indexing method to find all occurrences of a name comprising 25% of the book?
    Or, would it be better to just scan through the book?

Low data cardinality


Bitmap indexes are extremely compact
  That is their primary advantage
  No real magic involved
  Less I/O is faster, plain and simple
Still, no matter how compact they are
  a FULL table scan is still likely faster than a single bitmap index on low cardinality data
  However, merging two or more bitmap indexes brings out the real power of the mechanism
    Scanning of merged bitmaps using fast low-level bitmasking operations
    Sifts through huge numbers of rows quickly
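A sketch of the kind of query where merging pays off. The ORDERS table, its STATUS and REGION columns, and the index names are hypothetical, and bitmap indexes may not be available in every edition.

```sql
-- Two low-cardinality columns, each with its own bitmap index.
CREATE BITMAP INDEX orders_status_bix ON orders (status);
CREATE BITMAP INDEX orders_region_bix ON orders (region);

-- With both predicates present, the CBO may combine the two bitmaps
-- (a BITMAP AND step in the execution plan) before touching any rows.
SELECT COUNT(*)
  FROM orders
 WHERE status = 'SHIPPED'
   AND region = 'WEST';
```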

Low data cardinality


Bitmap indexes are designed for quickly scanning low cardinality data
  each database block is comprised of two bitmap segments
  each bitmap segment is comprised of a bitmap, a ROWID list, and a list of distinct data values
    ROWID list is a forward-compressed list of ROWIDs, positioned according to bits in bitmap
    Bitmap is organized into columns for distinct data values and rows of bits representing rows
  More distinct data values means less room for rows within each data value

[Diagram: a database block holding two bitmap segments, each with a ROWID list, a distinct values list, and a bitmap]

Low data cardinality


Bitmap indexes are extremely slow during DML
  INSERT, UPDATE, and DELETE
For each change, entries within the bitmap segments must be:
  Expanded/decoded
  Manipulated
  Re-encoded
For multiple concurrently-executing DML changes
  Slower individual changes cause queueing
  Only two bitmap segments per block increases contention
Because of this, bitmap indexes are most feasible on partitioned tables in non-transactional applications
  Data changes should be performed in bulk using some variation on the technique of EXCHANGE PARTITION
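One variation on the EXCHANGE PARTITION technique, as a sketch only. The SALES table, the P_2004_04 partition, and the SALES_STAGE staging table are hypothetical names; the staging table is assumed to have the same columns as SALES and bitmap indexes matching its local indexes.

```sql
-- 1. Bulk-load the new data into SALES_STAGE (not shown).
-- 2. Build the bitmap indexes on SALES_STAGE once, after the load (not shown).
-- 3. Swap the loaded table into place as a dictionary operation,
--    so no row-by-row bitmap maintenance occurs on SALES itself.
ALTER TABLE sales
  EXCHANGE PARTITION p_2004_04 WITH TABLE sales_stage
  INCLUDING INDEXES
  WITHOUT VALIDATION;
```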

Function-based indexes
Indexes can be based on functions and expressions
  create index xxx_ix1 on xxx(upper(c1));
  create index xxx_ix2 on xxx(upper(c1)||yadda_yadda(c2));

Function-based indexes
User-defined functions are usable along with standard built-in functions
  But functions and expressions must be deterministic
    repeatable: given the same inputs, the function or expression must always provide the same results
Restrictions for using user-defined functions
  user-defined functions cannot access tables or package variables
    PRAGMA RESTRICT_REFERENCES(<function-name>, RNDS, WNDS, RNPS, WNPS)
    CREATE FUNCTION DETERMINISTIC
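A sketch of a user-defined function that meets these restrictions and an index built on it. The function NORMALIZE_NAME and the index XXX_IX3 are hypothetical; the XXX table and C1 column are the placeholders used earlier. The SUBSTR wrapper is an assumption added to keep the index key short, since a function returning VARCHAR2 is otherwise treated as returning the maximum length.

```sql
-- Depends only on its input: reads no tables and no package state.
CREATE OR REPLACE FUNCTION normalize_name (p_name IN VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC
IS
BEGIN
  RETURN UPPER(TRIM(p_name));
END normalize_name;
/

CREATE INDEX xxx_ix3 ON xxx (SUBSTR(normalize_name(c1), 1, 30));

-- The query must use the same expression for the index to be considered.
SELECT *
  FROM xxx
 WHERE SUBSTR(normalize_name(c1), 1, 30) = 'SMITH';
```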

Function-based indexes
Function-based indexes must have statistics gathered before they can be used
System permissions
  [ GLOBAL ] QUERY REWRITE needed to CREATE or ALTER REBUILD
Parameter QUERY_REWRITE_ENABLED = TRUE to use
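The three prerequisites can be sketched as follows. SCOTT is a hypothetical account that owns the placeholder XXX table, and the GRANT is assumed to be run by a suitably privileged user.

```sql
-- 1. Privilege needed to create or rebuild the index in one's own schema.
GRANT QUERY REWRITE TO scott;

-- 2. Allow the optimizer to use function-based indexes in this session.
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- 3. Gather statistics on the table and, via CASCADE, on its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'XXX',
    cascade => TRUE);
END;
/
```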

Descending indexes
Implemented as a form of function-based indexes
  CREATE INDEX DESC
  data values in index leaf blocks are sorted in descending order
  must have statistics before it will be utilized
  not usable with rule-based optimizer
  system permission QUERY REWRITE or GLOBAL QUERY REWRITE needed to CREATE or ALTER REBUILD
  parameter QUERY_REWRITE_ENABLED not necessary to use
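A sketch of a descending index on the T1 table from the earlier demonstration; the index name T1_C2_DESC is hypothetical. The dictionary queries show how the function-based implementation surfaces.

```sql
CREATE INDEX t1_c2_desc ON t1 (c2 DESC);

-- Typically reported with a FUNCTION-BASED index type.
SELECT index_name, index_type
  FROM user_indexes
 WHERE index_name = 'T1_C2_DESC';

-- The indexed column appears as a system-generated name marked DESC ...
SELECT column_name, descend
  FROM user_ind_columns
 WHERE index_name = 'T1_C2_DESC';

-- ... and the underlying expression is recorded here.
SELECT column_expression
  FROM user_ind_expressions
 WHERE index_name = 'T1_C2_DESC';
```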

Index compression
Since Oracle8i, compression has been available on B*Tree indexes:
  Reduces repeated storage of prefix column data values
  Index compression [ COMPRESS [ #-prefix-cols ] | NOCOMPRESS ]
    UNIQUE: default #-prefix-cols is (#-cols)-1
    NONUNIQUE: default #-prefix-cols is #-cols
  Same syntax for IOT in USING clause as for index compression
  DML is supported on compressed indexes, but it becomes much slower
  Index compression is best used for read-mostly or read-only situations
  Mixing of compressed and uncompressed index partitions is possible
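A sketch of compressing the leading column of a two-column index on the T1 table from the earlier demonstration; the index name T1_C1_C2_CIX is hypothetical. The OPT_CMPR_COUNT and OPT_CMPR_PCTSAVE columns of INDEX_STATS are assumed to be present, which is the case only in more recent releases.

```sql
-- Store each distinct value of the leading column C1 once per leaf block.
CREATE INDEX t1_c1_c2_cix ON t1 (c1, c2) COMPRESS 1;

-- Ask Oracle what prefix length it would suggest and the expected saving.
ANALYZE INDEX t1_c1_c2_cix VALIDATE STRUCTURE;

SELECT opt_cmpr_count,      -- suggested number of prefix columns
       opt_cmpr_pctsave     -- estimated percentage of space saved
  FROM index_stats;

-- Compression can be changed or removed by rebuilding.
ALTER INDEX t1_c1_c2_cix REBUILD NOCOMPRESS;
```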

Solutions for B*Tree indexes

Don't assume that there is anything wrong with a B*Tree index
  Prove it first!!!
To address sparse index structures
  ALTER INDEX [ REBUILD | COALESCE ]
To address block contention on INSERTs
  ALTER INDEX REVERSE
To address uneven data distribution
  Gather column-level statistics or histograms
To address low cardinality
  Use bitmap indexes or no indexes at all
Additional groovy stuff
  Function-based, descending, and compressed indexes

Technical Session #549

Slides, paper, and scripts downloadable from http://www.SageLogix.com and http://www.EvDBT.com/papers.htm
Email: tim@sagelogix.com