Index Tuning

Index Tuning
AOBD07/08
H. Galhardas
Index
An index is a data structure that supports efficient access to data
Condition on attribute value

AOBD2007/08
index
Set of Records
Matching records
(search key)
H. Galhardas
Performance Issues
Type of Query Index Data Structure Organization of data on disk Index Overhead Data Distribution Covering
AOBD2007/08
H. Galhardas
Types of Queries
1.
Point Query
SELECT balance FROM accounts WHERE number = 1023;
3.
Range Query
SELECT number FROM accounts WHERE balance > 10000;
2.
Multipoint Query
SELECT balance FROM accounts WHERE branchnum = 100;
4.
Prefix Match Query

SELECT * FROM employees WHERE name = Jensen and firstname = Carl and age < 30;
AOBD2007/08
H. Galhardas
Types of Queries
5.
Extremal Query
SELECT * FROM accounts WHERE balance = max(select balance from accounts)
7.
Grouping Query
SELECT branchnum, avg(balance) FROM accounts GROUP BY branchnum;
8.
Join Query
SELECT distinct branch.adresse FROM accounts, branch WHERE accounts.branchnum = branch.number and accounts.balance > 10000;
5.
Ordering Query
SELECT * FROM accounts ORDER BY balance;
AOBD2007/08
H. Galhardas
Search Keys

A (search) key is a sequence of attributes.

create index i1 on accounts(branchnum, balance);
Types of keys
Sequential: the value of the key is monotonic with the insertion order (e.g., counter or timestamp) Non sequential: the value of the key is unrelated to the insertion order (e.g., social security number)
AOBD2007/08
H. Galhardas
Data Structures
Most index data structures can be viewed as trees. In general, the root of this tree will always be in main memory, while the leaves will be located on disk.
The performance of a data structure depends on the number of nodes in the average path from the root to the leaf. Data structure with high fan-out (maximum number of children of an internal node) are thus preferred.
AOBD2007/08
H. Galhardas
B+-Tree
A B+-Tree is a balanced tree whose leaves contain a sequence of key-pointer pairs.

96 75 83 107
33 48 69
75 80 81
83 92 95
96 98 103
107 110 120
AOBD2007/08
H. Galhardas
B+-Tree Performance
Key length influences fanout

Choose small key when creating an index Key compression
Prefix compression (Oracle 8, MySQL): only store that part of the key that is needed to distinguish it from its neighbors: Smi, Smo, Smy for Smith, Smoot, Smythe. Front compression (Oracle 5): adjacent keys have their front portion factored out: Smi, (2)o, (2)y. There are problems with this approach:

Processor overhead for maintenance Locking Smoot requires locking Smith too.
AOBD2007/08
H. Galhardas
Hash Index
A hash index stores key-value pairs based on a pseudo-randomizing function called a hash function.
Hashed key
values
key
2341
Hash function
0 1
R1 R5 R3 R6 R9
R14 R17 R21
R25
The length of these chains impacts performance
AOBD2007/08
H. Galhardas
Hash Index Performance

The best for answering point queries, provided there are no overflow chains Good for multipoint queries
Useless for range, prefix or extremal queries
Must be reorganized (drop/add or use reorganize function) if there is a significant amount of overflow chaining
Avoiding overflow may require underutilize the hash space
Size of hash structure is not related to the size of a key, because hash functions return keys to locations or page identifiers
Hash functions take longer to execute on a long key
AOBD2007/08
H. Galhardas
Clustered / Non clustered index
Clustered index (primary index)
Non-clustered index (secondary index)
A clustered index on attribute X co-locates records whose X values are near to one another.
A non clustered index does not constrain table organization. There might be several non-clustered indexes per table.
Records
AOBD2007/08
Records
H. Galhardas
Dense / Sparse Index
Sparse index
Dense index
Pointers are associated to pages
Pointers are associated to records Non clustered indexes are dense
P1
AOBD2007/08
P2
Pi
H. Galhardas
record record record
Index Implementations in some major DBMS
SQL Server

Oracle
B+-Tree data structure Clustered indexes are sparse Indexes maintained as updates/insertions/deletes are performed B+-Tree data structure, spatial extender for R-tree Clustered indexes are dense Explicit command for index reorganization
B+-tree, hash, bitmap, spatial extender for R-Tree No clustered index until 10g
DB2
Index organized table (unique/clustered) Clusters used when creating tables.
MySQL
B+-Tree, R-Tree (geometry and pairs of integers) Indexes maintained as updates/insertions/deletes are performed
AOBD2007/08
H. Galhardas
Index Tuning Knobs
Index data structure Search key Size of key Clustered/Non-clustered/No index Covering
AOBD2007/08
H. Galhardas
Types of Queries
1.
Point Query
SELECT balance FROM accounts WHERE number = 1023;
3.
Range Query
SELECT number FROM accounts WHERE balance > 10000;
2.
Multipoint Query
SELECT balance FROM accounts WHERE branchnum = 100;
4.
Prefix Match Query

SELECT * FROM employees WHERE name = Jensen and firstname = Carl and age < 30;
AOBD2007/08
H. Galhardas
Types of Queries
5.
Extremal Query
SELECT * FROM accounts WHERE balance = max(select balance from accounts)
7.
Grouping Query
SELECT branchnum, avg(balance) FROM accounts GROUP BY branchnum;
8.
Join Query
SELECT distinct branch.adresse FROM accounts, branch WHERE accounts.branchnum = branch.number and accounts.balance > 10000;
5.
Ordering Query
SELECT * FROM accounts ORDER BY balance;
AOBD2007/08
H. Galhardas
Benefits of a clustered index

1.

2.
3.
4.
A sparse clustered index stores fewer pointers than a dense index. This might save up to one level in the B-tree index. Nb pointers dense index = nb pointers sparse index * nb records per page If records small compared to pages, there will be many records per page, so sparse has one level less than dense A clustered index is good for multipoint queries White pages in a paper telephone book A clustered index based on a B-Tree supports range, prefix, extremal and ordering queries well A clustered index (on attribute X) can reduce lock contention: Retrieval of records or update operations using an equality, a prefix match or a range condition based on X will access and lock only a few consecutive pages of data
AOBD2007/08
H. Galhardas
Evaluation of Clustered Index
clustered 1
Throughput ratio
nonclustered
no index
0.8 0.6 0.4 0.2 0 SQLServer Oracle DB2
Multipoint query that returns 100 records out of 1000000. Cold buffer Clustered index is twice as fast as nonclustered index and orders of magnitude faster than a scan.
AOBD2007/08
H. Galhardas
19
Inconvenient of a clustered index
Benefits can diminish if there is a large number of overflow data pages
Accessing those pages will usually entail a disk seek
Overflow pages can result from two kinds of updates:
Inserts may cause data pages to overflow Record replacements that increase the size of a record (e.g., the replacement of a NULL values by a long string)
AOBD2007/08
H. Galhardas
Evaluation of clustered indexes with insertions (1)
SQLServer
Throughput (queries/sec)
No maintenance Maintenance 0 20 40 60 80 100
% Increase in Table Size
Index is created with fillfactor = 100. Insertions cause page splits and extra I/O for each query Maintenance consists in dropping and recreating the index With maintenance performance is constant while performance degrades significantly if no maintenance is performed.
21
AOBD2007/08
H. Galhardas

DB2
50
40 30 20 10 0 0 20 40
No maintenance Maintenance
60 80 100 % Increase in Table Size
Index is created with pctfree =0 Insertions cause records to be appended at the end of the table Each query thus traverses the index structure and scans the tail of the table. Performances degrade slowly when no maintenance is performed.
AOBD2007/08
H. Galhardas
Oracle
No maintenance

100
20
40 60 80 % Increase in Table Size
In Oracle, clustered index are approximated by an index defined on a clustered table No automatic physical reorganization Index defined with pctfree = 0 Overflow pages cause performance degradation
AOBD2007/08
H. Galhardas
23
Redundant tables
Because there is only one clustered index per table, it might be a good idea to replicate a table in order to use a clustered index on two different attributes
Yellow and white pages in a paper telephone book Works well if low insertion/update rate
AOBD2007/08
H. Galhardas
Covering Index - definition

A nonclustering index can eliminate the need to access the underlying table through covering (composite index) For the query: Select name from employee where department = marketing Good covering index would be on (department, name)

Index on (name, department) less useful. Index on department alone moderately useful.
Inconvenients (of composite indexes):

Tend to have a large key size Update to one of the attributes causes index to be modified
H. Galhardas
25
AOBD2007/08
Covering Index - impact
70 60 50 40 30 20 10 0 SQLServer covering - not ordered non clustering clustering covering
Covering index performs better than clustering index when first attributes of index are in the where clause and last attributes in the select. When attributes are not in order then performance is much worse.
AOBD2007/08
H. Galhardas
26
Benefits of non-clustered indexes

1.
A dense index can eliminate the need to access the underlying table through covering.
It might be worth creating several indexes to increase the likelihood that the optimizer can find a covering index
2.
A non-clustered index is good if each query retrieves significantly fewer records than there are pages in the table.
Point queries always useful Multipoint queries useful if: number of distinct key values > c * number of records per page Where c is the number of pages retrieved in each prefetch
H. Galhardas
AOBD2007/08
Table Scan Can Sometimes Win
scan non clustering

0 5 10 15 % of selected records 20 25
IBM DB2 v7.1 on Windows 2000 Range Query If a query retrieves 10% of the records or more, scanning is often better than using a non-clustering non-covering index.
AOBD2007/08
H. Galhardas
28
Joins, Foreign Keys and Indexes

R |X| R.A = S.B S can be executed through: (NR = nb tuples R, NS = nb tuples S, BR = nb pages R, BS = nb pages S) Nested Loop (NL)
Cost I/O: NR *BS + BR, can be reduced to BR + BS if the smaller relation fits entirely in memory Cost: BR + c * NR, c: cost of traversing index and fetching all
matching s tuples for one tuple or r
Indexed Nested Loop (INL)
Hash Join (HJ)
Cost I/O: 3(BR + BS), can be reduced to BR + BS if the entire build input can be kept in main memory
H. Galhardas
AOBD2007/08
Choice of join algorithms (1)
If there is no index present, the system will choose to use a hash join If there is an index present:
An INL based on an index on S.B works better than a HJ if the nb of distinct values in S.B is almost equal to the nb of rows of S
Common case because most joins are FK joins
The same, regardless the nb of distinct values of S.B, if the index covers the join
The only accesses to S data occur within the index
The same, regardless the nb of distinct values of S.B, if S is clustered based on B
All S rows having equal B values will be colocated
Otherwise, the hash join may be better
AOBD2007/08
H. Galhardas
Choice of join algorithms (2)

An index may be particularly useful in two other situations: In a non-equi-join (R.A > S.B), an index (using a Btree) on the join attribute avoids a full table scan in the NL. To support the FK constraint when R.A is a subset of R.B, so A is a FK in R; B is a PK in S
An index in S speeds up insertions on R: for every record inserted in R, check the foreign constraint on S Similarly, an index on R.A speeds up deletions in S
AOBD2007/08
H. Galhardas
Index on Small Tables
Tuning manuals suggest to avoid indexes on small tables (containing fewer than 200 records) This number depends on the size of records compared with the size of the index key
If all data from a relation fits in one page (or in a single disk track and can be read into memory through a single physical read by prefetching)
an index page adds at least an I/O
If each record fits in a page, then 200 records may require 200 disk accesses or more.
an index helps performance
If many inserts execute on a table with a small index, the index itself may become a concurrency control bottleneck
Lock conflicts near the root
AOBD2007/08
H. Galhardas
Index on Small Tables and Updates

If transactions update a single record, without an index, each transaction scans through many records before it locks the relevant record, thus reducing update concurrency

18 16 14 12 10 8 6 4 2 0 no index index
Throughput (updates/sec)
Small table: 100 records Two concurrent processes perform updates (each process works for 10ms before it commits) No index: the table is scanned for each update. No concurrent updates. A clustered index allow to take advantage of row locking.
AOBD2007/08
H. Galhardas
Table organization and index selection: basic rules (1)

1. 2.
Use a hash index for point queries only. Use a Btree if multipoint queries or range queries are used Use clustering
if your queries need all or most of the fields of each records returned, but the records are too large for a composite index on all fields if multipoint or range queries are asked
3. 4.
Use a dense index to cover critical queries Dont use an index if the time lost when inserting and updating overwhelms the time saved when querying
H. Galhardas
AOBD2007/08
Table organization and index selection: basic rules (2)
Use key compression
If you are using a B-tree Compressing the key will reduce the number of levels in the tree The system is disk-bound but not CPU-bound Updates are relatively rare
AOBD2007/08
H. Galhardas

Index Tuning

Hochgeladen von

Dokumentinformationen

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Index Tuning

Hochgeladen von

Copyright:

Verfügbare Formate

Index Tuning

Condition on attribute value

Prefix Match Query

A (search) key is a sequence of attributes.

A B+-Tree is a balanced tree whose leaves contain a sequence of key-pointer pairs.

107 110 120

Key length influences fanout

Choose small key when creating an index Key compression

R14 R17 R21

The length of these chains impacts performance

Hash Index Performance

Useless for range, prefix or extremal queries

Avoiding overflow may require underutilize the hash space

Hash functions take longer to execute on a long key

Clustered / Non clustered index

Clustered index (primary index)

Non-clustered index (secondary index)

Dense / Sparse Index

Pointers are associated to pages

Pointers are associated to records Non clustered indexes are dense

record record record

Index Implementations in some major DBMS

Index organized table (unique/clustered) Clusters used when creating tables.

Index Tuning Knobs

Prefix Match Query

Benefits of a clustered index

Evaluation of Clustered Index

0.8 0.6 0.4 0.2 0 SQLServer Oracle DB2

Inconvenient of a clustered index

Benefits can diminish if there is a large number of overflow data pages

Accessing those pages will usually entail a disk seek

Overflow pages can result from two kinds of updates:

Evaluation of clustered indexes with insertions (1)

No maintenance Maintenance 0 20 40 60 80 100

% Increase in Table Size

Evaluation of clustered indexes with insertions (2)

Evaluation of clustered indexes with insertions (3)

40 60 80 % Increase in Table Size

Covering Index - definition

Inconvenients (of composite indexes):

Covering Index - impact

70 60 50 40 30 20 10 0 SQLServer covering - not ordered non clustering clustering covering

Benefits of non-clustered indexes

Table Scan Can Sometimes Win

scan non clustering

Joins, Foreign Keys and Indexes

Indexed Nested Loop (INL)

Hash Join (HJ)

Choice of join algorithms (1)

Common case because most joins are FK joins

The only accesses to S data occur within the index

The same, regardless the nb of distinct values of S.B, if S is clustered based on B

All S rows having equal B values will be colocated

Otherwise, the hash join may be better

Choice of join algorithms (2)

Index on Small Tables

an index page adds at least an I/O

an index helps performance

Lock conflicts near the root

Index on Small Tables and Updates

Table organization and index selection: basic rules (1)

Table organization and index selection: basic rules (2)

Use key compression

Das könnte Ihnen auch gefallen