Sie sind auf Seite 1von 28

Understanding Histograms in the Oracle Database

Robert Gaydos R Wave Solutions


Robert.Gaydos@rwavesolutions.com

Copyright R Wave Solutions 2008 All rights reserved

Addenda
What is a Histogram? Types of Histograms How to create Histograms How to identify existing Histograms Rules and tips when creating Histograms Q&A

Copyright R Wave Solutions 2008 All rights reserved

What is a Histogram?
A histogram holds the data distribution of values within a column of a table.

Number of occurrences for a specific value/range Used by CBO to optimize a query. By default DBMS_STATS does not collect histogram stats.
Min/Max Value and # of distinct values NOTE: DBMS_STAT is used to delete histogram data.

It is collected by using DBMS_STATS.

Copyright R Wave Solutions 2008 All rights reserved

Types of Histograms
Two types of Histograms

frequency histograms height-balanced histograms Type of histogram is stored in the HISTOGRAM column of the *tab_col_statistics views (USER/DBA) Value = (HEIGHT BALANCED, FREQUENCY, or NONE)

Copyright R Wave Solutions 2008 All rights reserved

Histogram and Buckets


When Histograms are created the number of buckets can be specified. It is this number that controls the type of histogram created. # Buckets = # of Rows of information. When Distinct Values <= Buckets

Then Frequency Histogram is created Else Height-Balanced Histogram is created

Copyright R Wave Solutions 2008 All rights reserved

Frequency Histograms
Each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of occurrences of that single value.

Copyright R Wave Solutions 2008 All rights reserved

Frequency Histograms Example


Data A B C B C C C C C B D E Sorted A B B B C C C C C C D E

Results Bucket 1 A = 1 Bucket 2 B = 3 Bucket 3 C = 6 Bucket 4 D = 1 Bucket 5 E = 1

Copyright R Wave Solutions 2008 All rights reserved

Height-Balanced Histograms
In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same number of rows. The useful information that the histogram provides is where in the range of values the endpoints fall.

Copyright R Wave Solutions 2008 All rights reserved

Height-Balanced Example 1
Data A B C B C C C C C B D E Sorted A B B B C C C C C C D E

Results EPN 0 1 2 3

EPV B C C E

Buckets = 4
Copyright R Wave Solutions 2008 All rights reserved

Height-Balanced
Data Sorted A A B Notice that B C C B Crosses Buckets. B B C C C C This is howC C C C database knows C C which values are B C D D popular E E

The DB only stores 0-B Example 1 2-C 3-E

Results EPN 0 1 2 3

EPV B C C E

Buckets = 4
Copyright R Wave Solutions 2008 All rights reserved

How to Create a Histogram


Created by using DBMS_STATS.GATHER_TABLE_STATS METHOD_OPT => 'FOR COLUMNS SIZE <# of buckets> <Column Name> execute dbms_stats.gather_table_stats (ownname => 'oe', tabname => 'inventories', METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');
Copyright R Wave Solutions 2008 All rights reserved

Identify Existing Histograms


In 10g

HISTOGRAM column of the *tab_col_statistics views


(USER/DBA)

In 9 select owner, table_name, column_name, count(*) buckets from dba_histograms where endpoint_number not in (0,1) group by owner, table_name, column_name
Copyright R Wave Solutions 2008 All rights reserved

Rules On When To Create Histograms


First there are no rules of thumb that are always correct. When creating using dbms_stats, the use of the DEGREE (for parallel) can skew the value of density Running dbms_stats can drop histograms

Use method_opt==>'REPEAT

Copyright R Wave Solutions 2008 All rights reserved

Rules On When To Create Histograms


Index maintenance overhead is performed during update, insert and delete operations. Histogram maintenance overhead is performed during the analyze/dbms_stats process. Histograms are NOT just for indexed columns.

Adding a histogram to an un-indexed column that is used in a where clause can improve performance.

Copyright R Wave Solutions 2008 All rights reserved

Histogram Example
select sum(t1.d2*t2.d3*t3.d3) from t1, t2, t3 where t1.fk1 = t2.pk1 and t3.pk1 = t2.fk1 and t3.d2 = 35 and t1.d3 = 0; << This column has 2 values in it In this example column T1.D3 has 2 values and is our most selective criterion, but without a histogram CBO assumes an even distribution of 2 distinct values (density = 0.5).
Copyright R Wave Solutions 2008 All rights reserved

SQL*Plan Without Histogram


Rows ------1 2088 601 601 1000 4 62500 Row Source Operation --------------------------------SORT AGGREGATE HASH JOIN TABLE ACCESS BY INDEX ROWID INDEX RANGE SCAN OBJ#(57612) HASH JOIN TABLE ACCESS FULL OBJ#(57604) filter("T1"."D3"=0) TABLE ACCESS FULL OBJ#(57605)

Copyright R Wave Solutions 2008 All rights reserved

Timing Without Histogram


call count ------- -----Parse 1 Execute 1 Fetch 2 ------- -----total 4 cpu elapsed disk query current -------- ---------- ---------- ---------- ---------0.00 0.00 0 0 0 0.00 0.00 0 0 0 1.04 4.91 19013 20231 0 -------- ---------- ---------- ---------- ---------1.04 4.91 19013 20231 0 rows ---------0 0 1 ---------1

Query = 20231

Copyright R Wave Solutions 2008 All rights reserved

Use DBMS_STATS to Add Histogram


execute dbms_stats.gather_table_stats(NULL, 'T1', estimate_percent=>null, method_opt=>'FOR COLUMNS SIZE AUTO d3');

Copyright R Wave Solutions 2008 All rights reserved

Timing With Histogram


call count ------- -----Parse 1 Execute 1 Fetch 2 ------- -----total 4 cpu elapsed disk query current -------- ---------- ---------- ---------- ---------0.00 0.00 0 0 0 0.01 0.00 0 0 0 0.56 2.84 9841 10666 0 -------- ---------- ---------- ---------- ---------0.57 2.84 9841 10666 0 rows ---------0 0 1 ---------1

Now Query = 10666 It was Query = 20231

the IO
Copyright R Wave Solutions 2008 All rights reserved

Plan With Histogram


Rows ------Row Source Operation ---------------------------------------

1 SORT AGGREGATE 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID OBJ#(57606) 601 INDEX RANGE SCAN OBJ#(57612) 1000 TABLE ACCESS BY INDEX ROWID OBJ#(57605) 1005 NESTED LOOPS 4 TABLE ACCESS FULL OBJ#(57604) filter("T1"."D3"=0) 1000 INDEX RANGE SCAN OBJ#(57609) access("T1"."FK1"="T2"."PK1")
Copyright R Wave Solutions 2008 All rights reserved

Histograms Opportunities
Any column used in a where clause with skewed data Columns that are not queried all the time Reduced overhead for insert, update, delete

Copyright R Wave Solutions 2008 All rights reserved

When Not To Create A Histogram


No rule of thumb is ever perfect. Do not create Histograms

Evenly distributed data in the column Columns that are not queried

Do not create them on every column of every table. PK of a table

Copyright R Wave Solutions 2008 All rights reserved

Creating / Maintaining Histograms


Table maintenance procedures are important.

Accidentally deleting histogram data is possible. Histogram data is ONLY Collected/Maintained when stats are collected (DBMS_STATS). Changing data in the table does not change data stored in the histogram.

Copyright R Wave Solutions 2008 All rights reserved

Maintaining Histograms
Always check if histogram exists on table before DBMS_STATS is run.

Do not forget to note the # of buckets. Create tables to hold information.

Use METHOD_OPT FOR ALL COLUMNS REPEAT to prevent deletion of histograms data. Distribution ratio change -> Recollect
Copyright R Wave Solutions 2008 All rights reserved

Creating Histograms
What is the best way to start out creating Histograms? Three popular ways to create.

Salt and Pepper - Create them as you need them. Little Hammer DBMS_STATS using METHOD_OPT AUTO
Should NOT be used after fresh restarted database

Big Hammer DBMS_STATS using SKEWONLY Over Kill Hammer - FOR ALL COLUMNS

Copyright R Wave Solutions 2008 All rights reserved

Other Comments
It is the Density of a column that the optimizer considers when accessing a table. Changing the number of buckets changes the density. More Buckets DOES NOT mean better density. Using degree != 1 can affect density.
Copyright R Wave Solutions 2008 All rights reserved

Review
What is a Histogram? Two types of Histograms in the database How to create Histograms Data is collected by running DBMS_STATS. Different methods to create histograms Bucket size affects density of column. Density of the column affects CBO.
Copyright R Wave Solutions 2008 All rights reserved

Thank You for Your Time

Copyright R Wave Solutions 2008 All rights reserved

Das könnte Ihnen auch gefallen