Session Objectives
Examine some of the statistics-gathering options and their impact.
Focus on actual experience. Learn why to choose various options when gathering statistics.
DBMS_STATS Procedures
Package contains over 40 procedures, including:
Deleting existing statistics for a table, schema, or database
Setting statistics to desired values
Exporting and importing statistics
Gathering statistics for a schema or entire database
Monitoring tables for changes
We will focus on:
DBMS_STATS.GATHER_TABLE_STATS
DBMS_STATS.GATHER_INDEX_STATS
These are the starting points for gathering statistics so the optimizer can make informed decisions.
DBMS_STATS.GATHER_TABLE_STATS
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   cascade          BOOLEAN  DEFAULT FALSE,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT FALSE);
DBMS_STATS.GATHER_INDEX_STATS
DBMS_STATS.GATHER_INDEX_STATS (
   ownname          VARCHAR2,
   indname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   no_invalidate    BOOLEAN  DEFAULT FALSE);
Of these, estimate_percent, block_sample, method_opt, and cascade are the parameters that address statistics accuracy and gathering performance.
estimate_percent
The percentage of rows to estimate
NULL means compute
Use DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the best sample size for good statistics
block_sample
Determines whether or not to use random block sampling instead of random row sampling. "Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated."
method_opt
Determines whether to collect histograms to help in dealing with skewed data. FOR ALL COLUMNS, FOR ALL INDEXED COLUMNS, or a COLUMN list determine which columns, and SIZE determines how many histogram buckets.
Use SKEWONLY for SIZE to have Oracle determine the columns on which to collect histograms based on their data distribution.
Use AUTO for SIZE to have Oracle determine the columns on which to collect histograms based on data distribution and workload.
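As a rough illustration of how these method_opt forms are written, the block below reuses the TSUTTON.FILE_HISTORY test table from this session. The bucket count of 100 and the choice of PREF as the explicitly listed column are arbitrary values picked for the example, not settings taken from the tests. Each call replaces the statistics gathered by the one before it, so in practice only one form would be used at a time.

```sql
BEGIN
  -- No histograms: one bucket per column (the default in the signature).
  dbms_stats.gather_table_stats(
    ownname    => 'TSUTTON',
    tabname    => 'FILE_HISTORY',
    method_opt => 'FOR ALL COLUMNS SIZE 1');

  -- Histogram on one named column with an explicit bucket count
  -- (PREF and 100 are illustrative assumptions).
  dbms_stats.gather_table_stats(
    ownname    => 'TSUTTON',
    tabname    => 'FILE_HISTORY',
    method_opt => 'FOR COLUMNS PREF SIZE 100');

  -- Let Oracle pick histogram columns from data distribution only.
  dbms_stats.gather_table_stats(
    ownname    => 'TSUTTON',
    tabname    => 'FILE_HISTORY',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');

  -- Let Oracle pick histogram columns from data distribution and workload.
  dbms_stats.gather_table_stats(
    ownname    => 'TSUTTON',
    tabname    => 'FILE_HISTORY',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
```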
cascade
Determines whether to gather statistics on indexes as well.
Our Approach
Try different values for these parameters and look at the impact on accuracy of statistics and performance of the statistics gathering job itself.
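[Test schema slides: the table and index definitions did not survive extraction. The recoverable index listing is:]
Uniqueness  Index Name                  Column Name
NONUNIQUE   TSUTTON.FILEH_PREFIX_STATE
UNIQUE      TSUTTON.PK_FILE_HISTORY
NONUNIQUE   TSUTTON.PROPC_LOOKUPID      LOOKUPID
NONUNIQUE   TSUTTON.PROPC_PROPSTYLE     PROPSTYLE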
tabcol.sql:
select table_name, column_name, data_type, num_distinct, sample_size,
       to_char(last_analyzed, ' HH24:MI:SS') last_analyzed,
       num_buckets buckets
from dba_tab_columns
where table_name in ('FILE_HISTORY','PROP_CAT')
order by table_name, column_id;
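The index listings shown in this session (uniqueness, index name, column name, distinct keys, sample size) appear to come from a companion query against the index views. That script is not preserved here, so the following is only a sketch of what such a query might look like; the owner filter is an assumption added for the example.

```sql
-- Hypothetical companion to tabcol.sql: one row per indexed column,
-- with the index-level statistics repeated on each row.
SELECT i.table_name,
       i.uniqueness,
       i.index_name,
       c.column_name,
       i.distinct_keys,
       i.sample_size
FROM   dba_indexes i
JOIN   dba_ind_columns c
       ON  c.index_owner = i.owner
       AND c.index_name  = i.index_name
WHERE  i.table_owner = 'TSUTTON'   -- assumed owner of the test tables
AND    i.table_name IN ('FILE_HISTORY', 'PROP_CAT')
ORDER  BY i.table_name, i.index_name, c.column_position;
```

Because distinct_keys and sample_size are stored per index, a two-column index shows the same values on both of its rows, which matches the repeated figures in the listings.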
Uniqueness  Index Name          Column Name  Distinct Keys  Sample Size
NONUNIQUE   FILEH_FTYPE_STATE   FILE_TYPE    12             1266
                                STATE_NO     12             1266
NONUNIQUE   FILEH_PREFIX_STATE  PREF         65,638         1053
                                STATE_NO     65,638         1053
UNIQUE      PK_FILE_HISTORY     FILE_ID      1,952,701      1347
DBA_INDEXES tells us we have 1,937,490 distinct keys for the index on FNAME, pretty close to the actual value of 1,951,673.
But the column statistics show only 178 distinct values for FNAME, so the optimizer concludes that a full table scan will be more efficient. We're also told there are only 478 distinct values for PREF, when we know there are 65,345.
COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
-----------  ---------  ------------  -----------
FILE_ID      NUMBER          1951673        93010
FNAME        VARCHAR2          11133        93010
STATE_NO     NUMBER                5        93010
FILE_TYPE    NUMBER                7        93010
PREF         VARCHAR2          23744        93010
78692 distinct values for FNAME! After examining every row in the table!
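For reference, the ANALYZE runs that DBMS_STATS is compared against in this session (a default estimate, a 5% estimate, and a full compute) would typically be issued with statements like the ones below. The exact commands used are not preserved in these slides, so this is a sketch of the standard syntax rather than a record of what was run.

```sql
-- Default estimate: Oracle chooses a small sample on its own.
ANALYZE TABLE tsutton.file_history ESTIMATE STATISTICS;

-- Estimate from a 5% sample of the rows.
ANALYZE TABLE tsutton.file_history ESTIMATE STATISTICS SAMPLE 5 PERCENT;

-- Full compute: every row is examined.
ANALYZE TABLE tsutton.file_history COMPUTE STATISTICS;
```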
Is DBMS_STATS Better?
Let's start with a quick run, a 1% estimate and cascade=true, so the indexes are analyzed also.
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY',estimate_percent=>1,cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:41.70
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT',estimate_percent=>1,cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:44.29
1% estimate, cascade=true
Gathering statistics took 3:26.
Analyze estimate statistics took 23 seconds. Analyze estimate 5% took 3:11.
Stats are very accurate for FNAME.
Stats are off for PREF (6,522 vs. 65,345 actual).
5% estimate, cascade=true
Let's try 5%:
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY',estimate_percent=>5,cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:23.52
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT',estimate_percent=>5,cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:03:08.75

TABLE_NAME      COLUMN_NAME     DATA_TYPE  NUM_DISTINCT SAMPLE_SIZE
--------------- --------------- ---------- ------------ -----------
FILE_HISTORY    FILE_ID         NUMBER          1955420       97771
                FNAME           VARCHAR2        1955420       97771
                PREF            VARCHAR2          24222       72377
A 5% estimate took 4:32.
PREF now shows 24,222 distinct values (actual is 65,345).
A 5% estimate using ANALYZE took 3:11, but only showed 11,133 distinct values for FNAME.
Full Compute
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY',estimate_percent=>null,cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:09:35.13
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT',estimate_percent=>null,cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:29:09.46

TABLE_NAME      COLUMN_NAME     DATA_TYPE  NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED
--------------- --------------- ---------- ------------ ----------- -------------
FILE_HISTORY    FILE_ID         NUMBER          1951673     1951673 14:19:30
                FNAME           VARCHAR2        1951673     1951673 14:19:30
                PREF            VARCHAR2          65345     1448208 14:19:30
PROP_CAT        LOOKUPID        VARCHAR2          40903    11486321 14:50:19
                EXTID           VARCHAR2       11486321    11486321 14:50:19
                PROPSTYLE       VARCHAR2          48936    11486321 14:50:19
Total time taken for a DBMS_STATS compute was slightly longer than for an ANALYZE compute (38:45 vs. 36:54).
The stats are right on!
There's another interesting behavior:
[Index statistics listing for the FILE_HISTORY and PROP_CAT indexes (FILEH_PREFIX_STATE, PK_FILE_HISTORY, PK_PROP_CAT, PROPC_LOOKUPID, PROPC_PROPSTYLE); the values did not survive extraction.]
block_sample = true
If >20 rows per block, visiting 5% of rows could visit every block. So, is block_sample=true faster?
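The reasoning above can be checked with a little arithmetic. Assuming rows are sampled independently, a block holding r rows contains at least one sampled row with probability 1 - 0.95^r under a 5% row sample. The sketch below evaluates that for 20 rows per block, looks up the average rows per block for the test tables, and then shows how block sampling would be requested. The 20% sample size in the last call is an illustrative assumption, as is the premise that the tables already have statistics in DBA_TABLES.

```sql
-- Chance that a 20-row block is touched by a 5% row sample (roughly 0.64).
-- The value climbs toward 1 as the number of rows per block grows.
SELECT ROUND(1 - POWER(0.95, 20), 2) AS p_block_visited
FROM   dual;

-- Average rows per block, from whatever statistics are currently stored.
SELECT table_name,
       num_rows,
       blocks,
       ROUND(num_rows / NULLIF(blocks, 0)) AS rows_per_block
FROM   dba_tables
WHERE  owner = 'TSUTTON'
AND    table_name IN ('FILE_HISTORY', 'PROP_CAT');

-- Sample whole blocks instead of individual rows.
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'TSUTTON',
    tabname          => 'FILE_HISTORY',
    estimate_percent => 20,     -- assumed sample size for the example
    block_sample     => TRUE,
    cascade          => TRUE);
END;
/
```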
cascade = false
People have suggested a small estimate for tables and a compute for indexes. Let's test a 1% estimate for tables, compute for indexes.
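The slide with the actual commands for this test is not preserved, so the following is a minimal sketch of one way to do it: gather table statistics at 1% with cascade turned off, then call GATHER_INDEX_STATS for each index with estimate_percent set to NULL to request a compute. Looping over DBA_INDEXES is a convenience chosen for the example; the indexes could equally be named one by one.

```sql
BEGIN
  -- Table and column statistics from a 1% sample; indexes are left alone.
  dbms_stats.gather_table_stats(
    ownname          => 'TSUTTON',
    tabname          => 'FILE_HISTORY',
    estimate_percent => 1,
    cascade          => FALSE);

  -- Request a compute (estimate_percent => NULL) for every index on the table.
  FOR ix IN (SELECT index_name
             FROM   dba_indexes
             WHERE  owner       = 'TSUTTON'
             AND    table_owner = 'TSUTTON'
             AND    table_name  = 'FILE_HISTORY')
  LOOP
    dbms_stats.gather_index_stats(
      ownname          => 'TSUTTON',
      indname          => ix.index_name,
      estimate_percent => NULL);
  END LOOP;
END;
/
```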
But the sample sizes are still not 100%:
Table         Uniqueness  Index Name          Column Name  Distinct Keys  Sample Size
------------  ----------  ------------------  -----------  -------------  -----------
FILE_HISTORY  NONUNIQUE   FILEH_FNAME         FNAME            1,961,200       127775
              NONUNIQUE   FILEH_FTYPE_STATE   FILE_TYPE               13       467998
                                              STATE_NO                13       467998
              NONUNIQUE   FILEH_PREFIX_STATE  PREF                15,076       417975
                                              STATE_NO            15,076       417975
              UNIQUE      PK_FILE_HISTORY     FILE_ID          1,951,673      1951673
PROP_CAT      [rows for PK_PROP_CAT, PROPC_LOOKUPID, and PROPC_PROPSTYLE; values not recoverable]
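[Comparison table; the column headings were lost in extraction, so the labels below are inferred from the surrounding slides.]
Options                  block_sample  Elapsed  NUM_DISTINCT (three columns, headings not recovered)
20%, cascade             False         9:21     1950750   48271   29108
20%, cascade             True          9:13     1885260   40653   21897
50%, cascade             False         20:24    1950472   60804   39708
null (compute), cascade  False         38:45    1951673   65345   48936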
Other Options
What about all that Auto stuff? Two commonly referenced auto options:
AUTO_SAMPLE_SIZE for estimate_percent
method_opt=>'FOR ALL COLUMNS SIZE AUTO'
DBMS_STATS.AUTO_SAMPLE_SIZE
This option tells Oracle to choose the proper sample size for estimate_percent. Let's test it.
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>dbms_stats.auto_sample_size,
    cascade=>true);
end;
/
Elapsed: 00:08:19.19
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'PROP_CAT',
    estimate_percent=>dbms_stats.auto_sample_size,
    cascade=>true);
end;
/
Elapsed: 00:22:55.99

TABLE_NAME      COLUMN_NAME     DATA_TYPE  NUM_DISTINCT SAMPLE_SIZE
--------------- --------------- ---------- ------------ -----------
FILE_HISTORY    FILE_ID         NUMBER          1969341        5841
                FNAME           VARCHAR2        1951673     1951673
                STATE_NO        NUMBER                6     1951673
                FILE_TYPE       NUMBER                7     1951673
                PREF            VARCHAR2          65345     1448208
Took 31:15. 7 minutes less than compute, cascade. 10 minutes longer than 50%, cascade, but not much more accurate. Looks like it sampled nearly every row in the table anyway.
method_opt
We tested different histogram options. First, FOR ALL INDEXED COLUMNS, which seems to be the most commonly used choice. Note that:
You're unlikely to need histograms on all your indexed columns.
You may well want histograms on some nonindexed columns.
But it's a good start for a test.
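The commands and output for this test are not preserved, so the sketch below shows one way such a run could be issued and then checked. The 5% sample size is an assumption for the example; with SIZE omitted from method_opt, Oracle applies its default bucket count.

```sql
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'TSUTTON',
    tabname          => 'FILE_HISTORY',
    estimate_percent => 5,                        -- assumed sample size
    method_opt       => 'FOR ALL INDEXED COLUMNS',
    cascade          => TRUE);
END;
/

-- Count histogram endpoints per column. A column without a histogram
-- normally shows just two endpoints (its low and high values).
SELECT column_name,
       COUNT(*) AS endpoints
FROM   dba_tab_histograms
WHERE  owner      = 'TSUTTON'
AND    table_name = 'FILE_HISTORY'
GROUP  BY column_name
ORDER  BY column_name;
```

The BUCKETS column returned by tabcol.sql gives a similar picture from DBA_TAB_COLUMNS.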
Bug #3929552
Metalink Note 284917.1: DBMS_STATS WITH SKEWONLY GENERATES HISTOGRAMS FOR UNIQUE KEY COLUMN. The subject of the note says it all. So SKEWONLY isn't very useful for the affected versions.
dbms_stats has taken the workload into account and created histograms on the columns we queried (though it's again creating a histogram on FNAME).
10g - AUTO_SAMPLE_SIZE
Let's see how 10g does with using DBMS_STATS.AUTO_SAMPLE_SIZE for estimate_percent.
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON',
    tabname=>'FILE_HISTORY',
    estimate_percent=>dbms_stats.auto_sample_size,
    cascade=>true);
end;
/
Elapsed: 00:02:15.95

TABLE_NAME      COLUMN_NAME     DATA_TYPE  NUM_DISTINCT SAMPLE_SIZE
--------------- --------------- ---------- ------------ -----------
FILE_HISTORY    FILE_ID         NUMBER          1951112        5764
                FNAME           VARCHAR2        1951112       57166
                STATE_NO        NUMBER                2        5764
                FILE_TYPE       NUMBER                7        5764
                PREF            VARCHAR2           2200        4246
Statistics gathering took 6:17 (an enormous improvement over the 31:15 in 9i).
Accuracy may be lacking:
PREF shows 2,200 distinct values (actual is 65,345).
Results aren't as accurate as a 5% sample in 9i (which took about 1/3 less time).
Summary
From our testing, it appears that:
The sweet spot for balancing performance of the gathering process and accuracy of statistics lies between 5 and 20%.
Gathering statistics separately for indexes is faster than the cascade=true option while gathering table statistics.
block_sample=true doesn't appear to appreciably speed up statistics gathering, while at the same time delivering somewhat less accurate statistics.
Using AUTO_SAMPLE_SIZE takes nearly as long as COMPUTE in Oracle 9i. It is much faster in 10gR1, but the accuracy of the statistics is not as good as with an estimate_percent of 5, which gathers the statistics faster.
Using the SKEWONLY option to size histograms is inadvisable in both Oracle 9i and 10gR1.
Using the AUTO option to choose and size histograms is inadvisable in Oracle 9i, but seems to work better in 10gR1 (though it still has issues).
In Conclusion
These results should help you decide what options you want to use in gathering statistics on your databases. Hopefully you'll test different options on your data. It is clear that DBMS_STATS provides greater power, accuracy, and flexibility than ANALYZE, and it is improving with each version.
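When trying different options on your own data, it can help to save the current statistics first so they can be put back afterwards. The sketch below uses the export and import procedures mentioned at the start of the session; the statistics table name SAVED_STATS and the identifier BEFORE_TEST are hypothetical names chosen for the example.

```sql
BEGIN
  -- One-time setup: a user table to hold saved statistics (hypothetical name).
  dbms_stats.create_stat_table(
    ownname => 'TSUTTON',
    stattab => 'SAVED_STATS');

  -- Save the current table, column, and index statistics under a label.
  dbms_stats.export_table_stats(
    ownname => 'TSUTTON',
    tabname => 'FILE_HISTORY',
    stattab => 'SAVED_STATS',
    statid  => 'BEFORE_TEST',
    cascade => TRUE);
END;
/

-- ... run the statistics-gathering experiments here ...

BEGIN
  -- Put the saved statistics back.
  dbms_stats.import_table_stats(
    ownname => 'TSUTTON',
    tabname => 'FILE_HISTORY',
    stattab => 'SAVED_STATS',
    statid  => 'BEFORE_TEST',
    cascade => TRUE);
END;
/
```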
Contact Information
Terry Sutton
Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA 94111