
What's Up with dbms_stats?

Terry Sutton, Database Specialists, Inc.


www.dbspecialists.com

Session Objectives
Examine some of the statistics-gathering options and their impact.
Focus on actual experience. Learn why to choose various options when gathering statistics.

Statistics are Important!


As the optimizer gets more sophisticated in each version of Oracle, the importance of accurate statistics increases.

DBMS_STATS Procedures
Package contains over 40 procedures, including:
Deleting existing statistics for a table, schema, or database
Setting statistics to desired values
Exporting and importing statistics
Gathering statistics for a schema or entire database
Monitoring tables for changes

DBMS_STATS Procedures
We will focus on:
DBMS_STATS.GATHER_TABLE_STATS
DBMS_STATS.GATHER_INDEX_STATS
The starting points for getting statistics so the optimizer can make informed decisions

DBMS_STATS.GATHER_TABLE_STATS
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   cascade          BOOLEAN  DEFAULT FALSE,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT FALSE);

DBMS_STATS.GATHER_INDEX_STATS
DBMS_STATS.GATHER_INDEX_STATS (
   ownname          VARCHAR2,
   indname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT NULL,
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   degree           NUMBER   DEFAULT NULL,
   granularity      VARCHAR2 DEFAULT 'DEFAULT',
   no_invalidate    BOOLEAN  DEFAULT FALSE);

We're Going to Test:


estimate_percent
block_sample
method_opt
cascade

These are the parameters which address statistics accuracy and performance.

estimate_percent
The percentage of rows to sample. NULL means compute (read every row). Use DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the best sample size for good statistics.

block_sample
Determines whether or not to use random block sampling instead of random row sampling. "Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated."
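That caveat can be sketched outside Oracle. The following is a conceptual Python simulation, not Oracle code: a column whose values are clustered on disk is sampled by row and by block, and the block sample sees far fewer distinct values because the rows within a block are correlated.

```python
import random

random.seed(42)

# Hypothetical table: 10,000 rows, 20 rows per block, with a low-cardinality
# column loaded in sorted order -- i.e. its values are clustered on disk.
ROWS, ROWS_PER_BLOCK = 10_000, 20
table = sorted(random.randrange(10) for _ in range(ROWS))  # 10 distinct values
blocks = [table[i:i + ROWS_PER_BLOCK] for i in range(0, ROWS, ROWS_PER_BLOCK)]

def row_sample_ndv(pct):
    """Random row sampling: pick individual rows from anywhere in the table."""
    return len(set(random.sample(table, int(ROWS * pct))))

def block_sample_ndv(pct):
    """Random block sampling: pick whole blocks; a clustered block holds
    near-identical values, so the sampled rows are correlated."""
    chosen = random.sample(blocks, max(1, int(len(blocks) * pct)))
    return len({v for b in chosen for v in b})

# With a 1% sample, row sampling almost surely sees all 10 values, while
# block sampling reads only ~5 blocks of near-identical rows.
print("row sample NDV:  ", row_sample_ndv(0.01))
print("block sample NDV:", block_sample_ndv(0.01))
```

With randomly distributed data the two approaches converge; the gap appears only when the on-disk order is correlated with the column values, which is exactly the caveat the documentation raises.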


method_opt
Determines whether to collect histograms to help deal with skewed data. FOR ALL COLUMNS, FOR ALL INDEXED COLUMNS, or a column list determines which columns are examined, and SIZE determines how many histogram buckets are used.


method_opt
Use SKEWONLY for SIZE to have Oracle determine the columns on which to collect histograms based on their data distribution. Use AUTO for SIZE to have Oracle determine the columns based on both data distribution and workload.
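As a rough illustration of what bucket sizing buys you, here is a Python sketch (not Oracle's actual algorithm) of a height-balanced histogram: sort the rows and record every (n/buckets)-th value as a bucket endpoint. A value that repeats across endpoints is "popular", the signature of skew that SKEWONLY looks for; all-distinct endpoints suggest a column that does not need a histogram.

```python
def endpoints(values, buckets):
    """Bucket endpoints of a height-balanced histogram: each bucket
    covers the same number of sorted rows."""
    data = sorted(values)
    n = len(data)
    return [data[(i * n) // buckets - 1] for i in range(1, buckets + 1)]

skewed = [5] * 90 + list(range(10))  # one value dominates the column
unique = list(range(100))            # e.g. a primary key: no skew at all

print(endpoints(skewed, 10))  # -> [5, 5, 5, 5, 5, 5, 5, 5, 5, 9]: skew detected
print(endpoints(unique, 10))  # all endpoints distinct: a histogram adds nothing
```

The repeated endpoint value is how a height-balanced histogram flags a popular value; a unique column can never produce one, which is why histograms on primary keys (seen later in these tests) are pure overhead.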


cascade
Determines whether to gather statistics on indexes as well.


Our Approach
Try different values for these parameters and look at the impact on accuracy of statistics and performance of the statistics gathering job itself.


Our Data (Table 1)


FILE_HISTORY  [1,951,673 rows, 28,507 blocks, 223MB]

Column Name   Null?     Type           Distinct Values
------------  --------  -------------  ---------------
FILE_ID       NOT NULL  NUMBER                 1951673
FNAME         NOT NULL  VARCHAR2(240)          1951673
STATE_NO                NUMBER                       6
FILE_TYPE     NOT NULL  NUMBER                       7
PREF                    VARCHAR2(100)            65345
CREATE_DATE   NOT NULL  DATE
TRACK_ID      NOT NULL  NUMBER
SECTOR_ID     NOT NULL  NUMBER
TEAMS                   NUMBER
BYTE_SIZE               NUMBER
START_DATE              DATE
END_DATE                DATE
LAST_UPDATE             DATE
CONTAINERS              NUMBER


Our Data (Table 1) Indexes


Table         Unique?    Index Name                  Column Name
------------  ---------  --------------------------  -----------
FILE_HISTORY  NONUNIQUE  TSUTTON.FILEH_FNAME         FNAME
              NONUNIQUE  TSUTTON.FILEH_FTYPE_STATE   FILE_TYPE
                                                     STATE_NO
              NONUNIQUE  TSUTTON.FILEH_PREFIX_STATE  PREF
                                                     STATE_NO
              UNIQUE     TSUTTON.PK_FILE_HISTORY     FILE_ID


Our Data (Table 2)


PROP_CAT  [11,486,321 rows, 117,705 blocks, 920MB]

Column Name  Null?     Type           Distinct Values
-----------  --------  -------------  ---------------
LINENUM      NOT NULL  NUMBER(38)            11486321
LOOKUPID               VARCHAR2(64)             40903
EXTID                  VARCHAR2(20)          11486321
SOLD         NOT NULL  NUMBER(38)                   1
CATEGORY               VARCHAR2(6)
NOTES                  VARCHAR2(255)
DETAILS                VARCHAR2(255)
PROPSTYLE              VARCHAR2(20)             48936


Our Data (Table 2) Indexes


Table     Unique?    Index Name               Column Name
--------  ---------  -----------------------  -----------
PROP_CAT  NONUNIQUE  TSUTTON.PK_PROP_CAT      EXTID
                                              SOLD
          NONUNIQUE  TSUTTON.PROPC_LOOKUPID   LOOKUPID
          NONUNIQUE  TSUTTON.PROPC_PROPSTYLE  PROPSTYLE


To Find What Values the Statistics Gathering Obtained:


index.sql:
select ind.table_name, ind.uniqueness, col.index_name, col.column_name,
       ind.distinct_keys, ind.sample_size
  from dba_ind_columns col, dba_indexes ind
 where ind.table_owner = 'TSUTTON'
   and ind.table_name in ('FILE_HISTORY','PROP_CAT')
   and col.index_owner = ind.owner
   and col.index_name = ind.index_name
   and col.table_owner = ind.table_owner
   and col.table_name = ind.table_name
 order by col.table_name, col.index_name, col.column_position;


tabcol.sql:
select table_name, column_name, data_type, num_distinct, sample_size,
       to_char(last_analyzed, 'HH24:MI:SS') last_analyzed,
       num_buckets buckets
  from dba_tab_columns
 where table_name in ('FILE_HISTORY','PROP_CAT')
 order by table_name, column_id;


The Old Days: ANALYZE


A Quick and Dirty attempt:
SQL> analyze table file_history estimate statistics;
Table analyzed.
Elapsed: 00:00:08.26

SQL> analyze table prop_cat estimate statistics;
Table analyzed.
Elapsed: 00:00:14.76


The Old Days: ANALYZE


But someone complains that their query is taking too long:
SQL> SELECT FILE_ID, FNAME, TRACK_ID, SECTOR_ID
     FROM file_history WHERE FNAME = 'SOMETHING';

no rows selected
Elapsed: 00:00:08.66


The Old Days: ANALYZE


So we try it with autotrace on:
Execution Plan
----------------------------------------------------------
   0  SELECT STATEMENT Optimizer=CHOOSE (Cost=2743 Card=10944 Bytes=678528)
   1    0  TABLE ACCESS (FULL) OF 'FILE_HISTORY' (Cost=2743 Card=10944 Bytes=678528)

Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
  28519  consistent gets
  28507  physical reads
      0  redo size
    465  bytes sent via SQL*Net to client
    460  bytes received via SQL*Net from client
      1  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      0  rows processed

The Old Days: ANALYZE


It's not using our index! Let's look at the index statistics:
Table         Uniqueness  Index Name          Column Name  Distinct Keys  Sample Size
------------  ----------  ------------------  -----------  -------------  -----------
FILE_HISTORY  NONUNIQUE   FILEH_FNAME         FNAME            1,937,490         1106
              NONUNIQUE   FILEH_FTYPE_STATE   FILE_TYPE               12         1266
                                              STATE_NO                12         1266
              NONUNIQUE   FILEH_PREFIX_STATE  PREF                65,638         1053
                                              STATE_NO            65,638         1053
              UNIQUE      PK_FILE_HISTORY     FILE_ID          1,952,701         1347

DBA_INDEXES tells us we have 1,937,490 distinct keys for the index on FNAME, pretty close to the actual value of 1,951,673.

The Old Days: ANALYZE


But when we look at DBA_TAB_COLUMNS
TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1948094         1034
              FNAME        VARCHAR2            178         1034
              STATE_NO     NUMBER                1         1034
              FILE_TYPE    NUMBER                7         1034
              PREF         VARCHAR2            478         1034

Only 178 distinct values for FNAME. The optimizer concludes that a full table scan will be more efficient. We're also told there are only 478 distinct values for PREF, when we know there are 65,345.


The Old Days: ANALYZE


Let's try a larger sample. The first test sampled only about 0.05% of the rows. Let's try 5% of the rows.
SQL> analyze table file_history estimate statistics sample 5 percent;
Table analyzed.
Elapsed: 00:00:36.21

SQL> analyze table prop_cat estimate statistics sample 5 percent;
Table analyzed.
Elapsed: 00:02:35.11

The Old Days: ANALYZE


We try our query:
SQL> SELECT FILE_ID, FNAME, TRACK_ID, SECTOR_ID
     FROM file_history WHERE FNAME = 'SOMETHING';

no rows selected
Elapsed: 00:00:00.54

Execution Plan
----------------------------------------------------------
   0  SELECT STATEMENT Optimizer=CHOOSE (Cost=54 Card=110 Bytes=6820)
   1    0  TABLE ACCESS (BY INDEX ROWID) OF 'FILE_HISTORY' (Cost=54 Card=110 Bytes=6820)
   2    1    INDEX (RANGE SCAN) OF 'FILEH_FNAME' (NON-UNIQUE) (Cost=3 Card=110)


The Old Days: ANALYZE


Our query, continued:
Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
      3  consistent gets
      0  physical reads
      0  redo size
    465  bytes sent via SQL*Net to client
    460  bytes received via SQL*Net from client
      1  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      0  rows processed

Much better! Let's look at the stats.



The Old Days: ANALYZE


Table         Uniqueness  Index Name          Column Name  Distinct Keys  Sample Size
------------  ----------  ------------------  -----------  -------------  -----------
FILE_HISTORY  NONUNIQUE   FILEH_FNAME         FNAME            1,926,580       101179
              NONUNIQUE   FILEH_FTYPE_STATE   FILE_TYPE                8       102128
                                              STATE_NO                 8       102128
              NONUNIQUE   FILEH_PREFIX_STATE  PREF                74,935        98709
                                              STATE_NO            74,935        98709
              UNIQUE      PK_FILE_HISTORY     FILE_ID          1,952,701       101025

TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1951673        93010
              FNAME        VARCHAR2          11133        93010
              STATE_NO     NUMBER                5        93010
              FILE_TYPE    NUMBER                7        93010
              PREF         VARCHAR2          23744        93010


The Old Days: ANALYZE


11,133 distinct values for FNAME, much better than 178. 23,744 distinct values for PREF, much better than 478. And our query is efficient!
But can we do better?


The Old Days: ANALYZE


How about a full compute statistics?
SQL> analyze table file_history compute statistics;
Table analyzed.
Elapsed: 00:07:38.32

SQL> analyze table prop_cat compute statistics;
Table analyzed.
Elapsed: 00:29:15.29


The Old Days: ANALYZE


And the statistics:
TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1951673      1951673
              FNAME        VARCHAR2          78692      1951673
              STATE_NO     NUMBER                6      1951673
              FILE_TYPE    NUMBER                7      1951673
              PREF         VARCHAR2          65345      1951673

78,692 distinct values for FNAME, after examining every row in the table!


Is DBMS_STATS Better?
Let's start with a quick run: a 1% estimate and cascade=>true, so the indexes are analyzed also.
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON',
     tabname=>'FILE_HISTORY', estimate_percent=>1, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:41.70

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON',
     tabname=>'PROP_CAT', estimate_percent=>1, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:44.29

And the Statistics:


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1937700        19377
              FNAME        VARCHAR2        1937700        19377
              STATE_NO     NUMBER                3        19377
              FILE_TYPE    NUMBER                7        19377
              PREF         VARCHAR2           6522        14342


1% estimate, cascade=true
Gathering statistics took 3:26. ANALYZE estimate statistics took 23 seconds; ANALYZE estimate 5% took 3:11. Stats are very accurate for FNAME, but off for PREF (6522 vs. 65,345 actual).


5% estimate, cascade=true
Let's try 5%:
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON',
     tabname=>'FILE_HISTORY', estimate_percent=>5, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:01:23.52

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON',
     tabname=>'PROP_CAT', estimate_percent=>5, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:03:08.75


5% estimate, cascade=true
TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1955420        97771
              FNAME        VARCHAR2        1955420        97771
              PREF         VARCHAR2          24222        72377


5% estimate, cascade=true
A 5% estimate took 4:32. PREF now shows 24,222 distinct values (actual is 65,345). A 5% estimate using ANALYZE took 3:11 but showed only 11,133 distinct values for FNAME.


Full Compute
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON',
     tabname=>'FILE_HISTORY', estimate_percent=>null, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:09:35.13

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON',
     tabname=>'PROP_CAT', estimate_percent=>null, cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:29:09.46


Full Compute
TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  LAST_ANALYZED
------------  -----------  ---------  ------------  -----------  -------------
FILE_HISTORY  FILE_ID      NUMBER          1951673      1951673       14:19:30
              FNAME        VARCHAR2        1951673      1951673       14:19:30
              PREF         VARCHAR2          65345      1448208       14:19:30
PROP_CAT      LOOKUPID     VARCHAR2          40903     11486321       14:50:19
              EXTID        VARCHAR2       11486321     11486321       14:50:19
              PROPSTYLE    VARCHAR2          48936     11486321       14:50:19


Full Compute
Total time taken for a DBMS_STATS compute was slightly longer than for an ANALYZE compute (38:45 vs. 36:54), but the stats are right on! There's another interesting behavior, though.


Full Compute Indexes


Table         Uniqueness  Index Name          Column Name  Distinct Keys  Sample Size
------------  ----------  ------------------  -----------  -------------  -----------
FILE_HISTORY  NONUNIQUE   FILEH_FNAME         FNAME            2,019,679       131585
              NONUNIQUE   FILEH_FTYPE_STATE   FILE_TYPE               14       456182
                                              STATE_NO                14       456182
              NONUNIQUE   FILEH_PREFIX_STATE  PREF                16,365       428990
                                              STATE_NO            16,365       428990
              UNIQUE      PK_FILE_HISTORY     FILE_ID          1,951,673      1951673
PROP_CAT      NONUNIQUE   PK_PROP_CAT         EXTID           10,995,615       373959
                                              SOLD            10,995,615       373959
              NONUNIQUE   PROPC_LOOKUPID      LOOKUPID             2,678       469772
              NONUNIQUE   PROPC_PROPSTYLE     PROPSTYLE            3,434       504580


Full Compute Indexes


We're doing a full compute with cascade, yet the sample size for the indexes ranges from 3% to 100%!


block_sample = true
If there are more than 20 rows per block, visiting 5% of the rows could mean visiting nearly every block. So, is block_sample=true faster?
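The arithmetic behind that claim: if each block holds r rows and rows are sampled independently with probability p, a given block escapes the sample only with probability (1-p)^r. A quick check (the 68 rows/block figure follows from FILE_HISTORY's 1,951,673 rows in 28,507 blocks):

```python
def frac_blocks_touched(p, rows_per_block):
    """Expected fraction of blocks visited by an independent p-fraction
    row sample, if every block holds rows_per_block rows."""
    return 1 - (1 - p) ** rows_per_block

print(frac_blocks_touched(0.05, 20))  # ~0.64: 5% of rows touches ~64% of blocks
print(frac_blocks_touched(0.05, 68))  # ~0.97: at FILE_HISTORY's row density
```

So a 5% row sample already reads most of the table's blocks, which is why block sampling looks attractive on paper.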


block_sample = true, estimate = 5%, cascade = true


Results are shown in the summary table. Took 4:11 (block_sample=false took 4:32). It doesn't seem meaningfully more efficient, and it's less accurate.


cascade = false
People have suggested using a small estimate for the table and a compute for the indexes. Let's test a 1% estimate for tables with a compute for indexes.


Estimate = 1%, cascade = false


estimate = 1% for the table, compute for the indexes. Took 2:59 (vs. 3:26 for cascade=true) and is just as accurate as cascade=true. Be careful: if your indexes change, you'll need to change your DBMS_STATS jobs.


cascade = false
BUT the sample sizes are still not 100%
Table         Uniqueness  Index Name          Column Name  Distinct Keys  Sample Size
------------  ----------  ------------------  -----------  -------------  -----------
FILE_HISTORY  NONUNIQUE   FILEH_FNAME         FNAME            1,961,200       127775
              NONUNIQUE   FILEH_FTYPE_STATE   FILE_TYPE               13       467998
                                              STATE_NO                13       467998
              NONUNIQUE   FILEH_PREFIX_STATE  PREF                15,076       417975
                                              STATE_NO            15,076       417975
              UNIQUE      PK_FILE_HISTORY     FILE_ID          1,951,673      1951673
PROP_CAT      NONUNIQUE   PK_PROP_CAT         EXTID           11,224,990       381760
                                              SOLD            11,224,990       381760
              NONUNIQUE   PROPC_LOOKUPID      LOOKUPID             2,666       459455
              NONUNIQUE   PROPC_PROPSTYLE     PROPSTYLE            3,063       486558


Summary of Tests So Far


estimate_percent            block_sample  Elapsed  # Distinct  # Distinct  # Distinct
                                          Time     FNAME       PREF        PROPSTYLE
                                                   (1951673)   (65345)     (48936)
--------------------------  ------------  -------  ----------  ----------  ----------
1%, cascade                 False         3:26     1937700     6522        16984
1% table, 20% indexes       False         2:44     1950100     6835        16932
1% table, compute indexes   False         2:59     1941600     6797        16917
5%, cascade                 False         4:32     1955420     24222       20095
5%, cascade                 True          4:11     2024540     15927       10092
5% table, 20% indexes       False         4:00     1956180     24188       20084
10%, cascade                False         6:04     1962840     36172       23547
20%, cascade                False         9:21     1950750     48271       29108
20%, cascade                True          9:13     1885260     40653       21897
50%, cascade                False         20:24    1950472     60804       39708
null (compute), cascade     False         38:45    1951673     65345       48936


Other Options
What about all that Auto stuff? Two commonly referenced auto options:
AUTO_SAMPLE_SIZE for estimate_percent
method_opt=>'FOR ALL COLUMNS SIZE AUTO'


DBMS_STATS.AUTO_SAMPLE_SIZE
This option tells Oracle to choose the proper sample size for estimate_percent. Let's test it.


DBMS_STATS.AUTO_SAMPLE_SIZE
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY',
    estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);
end;
/
Elapsed: 00:08:19.19

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT',
    estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);
end;
/
Elapsed: 00:22:55.99


DBMS_STATS.AUTO_SAMPLE_SIZE
TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1969341         5841
              FNAME        VARCHAR2        1951673      1951673
              STATE_NO     NUMBER                6      1951673
              FILE_TYPE    NUMBER                7      1951673
              PREF         VARCHAR2          65345      1448208

Took 31:15: 7 minutes less than compute with cascade, but 10 minutes longer than 50% with cascade and not much more accurate. It looks like it sampled nearly every row in the table anyway.


method_opt
We tested different histogram options. First, FOR ALL INDEXED COLUMNS, which seems to be the most commonly used choice. Note that:
You're unlikely to need histograms on all of your indexed columns. You may well want histograms on some non-indexed columns. But it's a good start for a test.


method_opt: 'for all indexed columns'


begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10,
    method_opt=>'for all indexed columns size 30', cascade=>true);
end;
/
Elapsed: 00:01:35.63

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10,
    method_opt=>'for all indexed columns size 30', cascade=>true);
end;
/
Elapsed: 00:06:01.14

method_opt: 'for all indexed columns'


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  BUCKETS
------------  -----------  ---------  ------------  -----------  -------
FILE_HISTORY  FILE_ID      NUMBER          1950360       195036       30
              FNAME        VARCHAR2          18725       195036       10
              STATE_NO     NUMBER                5       195036        4
              FILE_TYPE    NUMBER                7       195036        6
              PREF         VARCHAR2          36039       144909       14
              CREATE_DATE  DATE
              TRACK_ID     NUMBER
              SECTOR_ID    NUMBER
              TEAMS        NUMBER
              BYTE_SIZE    NUMBER
              START_DATE   DATE
              END_DATE     DATE
              LAST_UPDATE  DATE
              CONTAINERS   NUMBER


method_opt: 'for all indexed columns'


Took 7:37 (compared to 6:04 for the same sample size without histograms). Statistics aren't gathered for non-indexed columns.


method_opt: 'for all columns size skewonly'


Rather than deciding the maximum number of buckets ourselves, let's let Oracle decide.
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10,
    method_opt=>'for all columns size skewonly', cascade=>true);
end;
/
Elapsed: 00:02:27.02

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10,
    method_opt=>'for all columns size skewonly', cascade=>true);
end;
/
Elapsed: 00:12:57.15


method_opt: 'for all columns size skewonly'


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  BUCKETS
------------  -----------  ---------  ------------  -----------  -------
FILE_HISTORY  FILE_ID      NUMBER          1951550       195155      200
              FNAME        VARCHAR2          18908       195155       39
              STATE_NO     NUMBER                5       195155        4
              FILE_TYPE    NUMBER                7       195155        6
              PREF         VARCHAR2          35913       144624       93
              CREATE_DATE  DATE             489501       195155      200
              TRACK_ID     NUMBER                9       195155        8
              SECTOR_ID    NUMBER                6       195155        5
              TEAMS        NUMBER              971       160436       46
              BYTE_SIZE    NUMBER                0                     1
              START_DATE   DATE                  0                     1
              END_DATE     DATE                  0                     1
              LAST_UPDATE  DATE             525993       195155      200
              CONTAINERS   NUMBER              971       160483       46


method_opt: 'for all columns size skewonly'


We get statistics on all columns. It took 15:24 (twice as long as 'for all indexed columns'). And 200 buckets for the primary key???


Bug #3929552
Metalink Note 284917.1: "DBMS_STATS WITH SKEWONLY GENERATES HISTOGRAMS FOR UNIQUE KEY COLUMN." The subject of the note says it all. So SKEWONLY isn't very useful in the affected versions.


method_opt: 'for all columns size auto'


Let's see if the AUTO option for SIZE does any better.
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10,
    method_opt=>'for all columns size auto', cascade=>true);
end;
/
Elapsed: 00:01:40.49

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10,
    method_opt=>'for all columns size auto', cascade=>true);
end;
/
Elapsed: 00:03:53.78


method_opt: 'for all columns size auto'


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  BUCKETS
------------  -----------  ---------  ------------  -----------  -------
FILE_HISTORY  FILE_ID      NUMBER          1946580       194658        1
              FNAME        VARCHAR2          18905       194658       38
              STATE_NO     NUMBER                4       194658        1
              FILE_TYPE    NUMBER                7       194658        1
              PREF         VARCHAR2          36144       144197        1


method_opt: 'for all columns size auto'


Took only 5:34 to gather statistics (the best yet for a 10% estimate), but the statistics leave something to be desired: 18,905 values for FNAME, and 38 buckets for FNAME (there is NO skew in this column).


Collecting Histograms with DBMS_STATS


Not as efficient as advertised in Oracle 9i.


Is 10g Any Better?


Performance and accuracy of statistics are reasonable in 9i for the basic choices, but there are serious deficiencies with AUTO_SAMPLE_SIZE and histogram collection. Are these issues resolved in 10g Release 1?


10g: 'for all columns size skewonly'


First we'll try the SKEWONLY option for SIZE in histogram collection.
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10,
    method_opt=>'for all columns size skewonly', cascade=>true);
end;
/
Elapsed: 00:02:45.05

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10,
    method_opt=>'for all columns size skewonly', cascade=>true);
end;
/
Elapsed: 00:12:28.17


10g: 'for all columns size skewonly'


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  BUCKETS
------------  -----------  ---------  ------------  -----------  -------
FILE_HISTORY  FILE_ID      NUMBER          1947030       194703      200
              FNAME        VARCHAR2          26167       194703      200
              STATE_NO     NUMBER                6       194703        6
              FILE_TYPE    NUMBER                7       194703        7
              PREF         VARCHAR2          45092       144582      200


10g: 'for all columns size skewonly'


Collection took 15:13 (about the same as in 9i), and the results are just as bad: it is still collecting histograms on the primary key and FNAME, and reports 26,167 distinct values of FNAME.


10g: 'for all columns size auto'


Let's try the AUTO option for SIZE (and start our demo).
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10,
    method_opt=>'for all columns size auto', cascade=>true);
end;
/
Elapsed: 00:03:22.81

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10,
    method_opt=>'for all columns size auto', cascade=>true);
end;
/
Elapsed: 00:08:15.00

10g: 'for all columns size auto'


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  BUCKETS
------------  -----------  ---------  ------------  -----------  -------
FILE_HISTORY  FILE_ID      NUMBER          1947700       194770        1
              FNAME        VARCHAR2        1947700       194770        1
              STATE_NO     NUMBER                4       194770        1
              FILE_TYPE    NUMBER                7       194770        1
              PREF         VARCHAR2          36073       144920        1


10g: 'for all columns size auto'


Statistics gathering took 11:38 (twice as long as in 9i). The statistics are reasonable, and there are no histograms on unique columns. So the AUTO option for SIZE works in Oracle 10g, sort of.


10g: 'for all columns size auto'


SQL> select STATE_NO, COUNT(*) from FILE_HISTORY group by STATE_NO;

  STATE_NO   COUNT(*)
---------- ----------
         0         95
        20        569
        30    1950957
        40         39
       999          4
      9999          9

6 rows selected.

Seems like a candidate for a histogram.
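To see why, compare the optimizer's no-histogram estimate with reality. Without a histogram, an equality predicate on STATE_NO is estimated at num_rows/num_distinct; a frequency histogram records the per-value counts instead. A quick Python check (illustrative only) using the distribution above:

```python
# STATE_NO distribution from the query above (value -> row count).
counts = {0: 95, 20: 569, 30: 1950957, 40: 39, 999: 4, 9999: 9}
total = sum(counts.values())  # 1,951,673 rows, matching FILE_HISTORY

# No histogram: assume uniformity, i.e. total / NDV rows for any value.
uniform_estimate = total // len(counts)
print("uniform estimate:", uniform_estimate)      # 325278 for every value

# With a frequency histogram, the per-value counts are known exactly:
print("actual for STATE_NO = 40:", counts[40])    # 39 rows
print("actual for STATE_NO = 30:", counts[30])    # 1950957 rows
```

The uniform guess is off by roughly four orders of magnitude in one direction for STATE_NO = 40 and by a factor of six in the other for STATE_NO = 30, which is exactly the situation where a histogram can change the plan.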


10g: 'for all columns size auto'


SQL> select file_type, count(*) from file_history group by file_type;

 FILE_TYPE   COUNT(*)
---------- ----------
         1     670950
         2      83799
         3      58925
         4      48241
         5     777258
         6      62681
         8     249819

7 rows selected.

Also a candidate for a histogram?


10g: 'for all columns size auto'


Let's try some queries:
select count(*) from file_history where file_type = 5;
select count(*) from file_history where file_type = 4;
select count(*) from file_history where fname = 'SOMETHING';


10g: 'for all columns size auto'


And gather stats again:
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10,
    method_opt=>'for all columns size auto', cascade=>true);
end;
/


10g: 'for all columns size auto'


TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE  BUCKETS
------------  -----------  ---------  ------------  -----------  -------
FILE_HISTORY  FILE_ID      NUMBER          1952650       195265        1
              FNAME        VARCHAR2          26179       195265      254
              STATE_NO     NUMBER                5       195265        1
              FILE_TYPE    NUMBER                7       195265        7
              PREF         VARCHAR2          36154       144848        1

dbms_stats has taken the workload into account and created histograms on the columns we queried (though it's again creating a histogram on FNAME).


10g: AUTO_SAMPLE_SIZE
Let's see how 10g does using DBMS_STATS.AUTO_SAMPLE_SIZE for estimate_percent.
begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'FILE_HISTORY',
    estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);
end;
/
Elapsed: 00:02:15.95

begin
  dbms_stats.gather_table_stats(
    ownname=>'TSUTTON', tabname=>'PROP_CAT',
    estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);
end;
/
Elapsed: 00:04:01.00


10g: AUTO_SAMPLE_SIZE
TABLE_NAME    COLUMN_NAME  DATA_TYPE  NUM_DISTINCT  SAMPLE_SIZE
------------  -----------  ---------  ------------  -----------
FILE_HISTORY  FILE_ID      NUMBER          1951112         5764
              FNAME        VARCHAR2        1951112        57166
              STATE_NO     NUMBER                2         5764
              FILE_TYPE    NUMBER                7         5764
              PREF         VARCHAR2           2200         4246


10g: AUTO_SAMPLE_SIZE
Statistics gathering took 6:17 (an enormous improvement over the 31:15 in 9i), but accuracy may be lacking: PREF shows 2200 distinct values (actual is 65,345), and the results aren't as accurate as a 5% sample in 9i, which took a third less time.


Summary
From our testing, it appears that:
The sweet spot for balancing performance of the gathering process against accuracy of statistics lies between 5% and 20%.
Gathering statistics separately for indexes is faster than using the cascade=true option while gathering table statistics.
block_sample=true doesn't appear to appreciably speed up statistics gathering, while delivering somewhat less accurate statistics.

Summary
Using AUTO_SAMPLE_SIZE takes nearly as long as COMPUTE in Oracle 9i. It is much faster in 10gR1, but the accuracy of the statistics is not as good as with an estimate_percent of 5, which gathers the statistics faster.


Summary
Using the SKEWONLY option to size histograms is inadvisable in both Oracle 9i and 10g Release 1. Using the AUTO option to choose and size histograms is inadvisable in Oracle 9i, but seems to work better in 10gR1 (though it still has issues).


In Conclusion
These results should help you decide which options you want to use when gathering statistics on your databases. Hopefully you'll test different options on your own data. It is clear that DBMS_STATS provides greater power, accuracy, and flexibility than ANALYZE, and it is improving with each version.


The White Paper


A companion white paper to this presentation is available for free download from our company's website at: www.dbspecialists.com/presentations.html


Resources from Database Specialists


The Specialist newsletter: www.dbspecialists.com/specialist.html
Database Rx: dbrx.dbspecialists.com/guest
Provides secure, automated monitoring, alert notification, and analysis of your Oracle databases


Contact Information
Terry Sutton
Database Specialists, Inc. 388 Market Street, Suite 400 San Francisco, CA 94111

Tel: 415/344-0500 Email: tsutton@dbspecialists.com Web: www.dbspecialists.com

