Sometimes while analyzing SQL for tuning you might come across scenarios where you feel that the CBO is not producing the optimal plan as it should, even though you have tried all the possible combinations of tuning the database and the SQL queries. In that scenario, two options are left to the performance engineer: force a particular plan (for example, with hints), or analyze how the optimizer costs its plans.
The first option seems pretty simple, but it will affect the performance of those queries under different data-load conditions. Suppose you have tuned a query for a high-volume data condition; it may then perform poorly for low volumes, or vice versa. This type of fix is therefore recommended only when you are left with no other option.
The second option is to take a dump of the optimizer's plan evaluation and find out how it arrives at the plan, and what can be changed or added to make it pick a more efficient and desirable plan. That requires some detailed analysis.
This paper focuses on the second option: how the CBO calculates the cost of SQL plans, so that you can make changes that lead the optimizer to pick a different plan. You will then be able to find out which factors affect the CBO's cost calculations, and which of them can be changed to make sure the optimizer picks the correct plan.
PREREQUISITES
One of the most important prerequisites for this process is that you are using the CBO, and for the CBO to calculate the cost of your plan it is essential that all your tables and indexes are well analyzed, so that the dictionary views contain accurate information.
You just don’t see any of the plans which “lost out”. Unless you activate the 10053 event trace,
that is. There you see all the access plans the CBO evaluated and the costs assigned to them.
Event 10053 details the choices made by the CBO in evaluating the execution path for a query.
It externalizes most of the information that the optimizer uses in generating a plan for a query.
HOW TO SET EVENT 10053
Unlike other events, where higher levels mean more detail, the 10053 event trace at level 2 produces less detail than the trace at level 1. Like sql_trace (a.k.a. the 10046 event trace), the 10053 event trace is written to user_dump_dest. It can be enabled for the current session with
Alter session set events '10053 trace name context forever, level 1';
and disabled again with
Alter session set events '10053 trace name context off';
The trace is only generated if the query is parsed by the cost based optimizer. Note that this entails two conditions: the query must be (hard) parsed, and it must be parsed by the CBO. If the session for which the 10053 trace has been enabled is executing only SQL that is already parsed and being reused, no trace is produced. Likewise, if the SQL statement is parsed by the rule based optimizer, the trace output will consist of only the SQL query and none of the other information.
TRACE CONTENTS
Query
This part of the trace contains the SQL statement you are tracing. If the query is parsed by the rule based optimizer, the trace file ends right here. The rule based optimizer will be used in the following cases:
• The optimizer goal is set to RULE (via the OPTIMIZER_MODE parameter, a session setting, or a RULE hint).
• The optimizer goal is CHOOSE and none of the tables referenced by the query has statistics.
Terminology Used
In this section of the trace the optimizer lists all the init.ora parameters that have an influence on the access plan. The list changes from Oracle version to version; the one shown here is for 9.2, and the individual parameters are described in the Oracle documentation.
Now the optimizer uses this information to evaluate access plans. First the CBO looks at the different possibilities and costs of accessing each of the tables in the SQL by itself, taking into consideration all applicable predicates except join predicates.
Generally the optimizer considers the table scan, index unique scan, index range scan, index and-equal, and index fast full scan methods.
SINGLE TABLE ACCESS PATH
Column: OPO_OMRPLA Col#: 3 Table: OP_CURPLAN_TRANSIMPREQS Alias: OCT
    NDV: 24732 NULLS: 0 DENS: 4.0433e-05 LO: 2098117 HI: 2191648
    NO HISTOGRAM: #BKT: 1 #VAL: 2
TABLE: OP_CURPLAN_TRANSIMPREQS ORIG CDN: 24736 ROUNDED CDN: 1 CMPTD CDN: 1
Access path: tsc Resc: 10 Resp: 10
Access path: index (iff)
    Index: OP_CURPLAN_TRANSIMPREQS_PK
    TABLE: OP_CURPLAN_TRANSIMPREQS
    RSC_CPU: 0 RSC_IO: 14
    IX_SEL: 0.0000e+00 TB_SEL: 1.0000e+00
Access path: iff Resc: 14 Resp: 14
Skip scan: ss-sel 0 andv 5
    ss cost 5
    index io scan cost 0
Access path: index (index-only)
    Index: OP_CURPLAN_TRANSIMPREQS_PK
    TABLE: OP_CURPLAN_TRANSIMPREQS
    RSC_CPU: 0 RSC_IO: 2
    IX_SEL: 4.0433e-05 TB_SEL: 4.0433e-05
BEST_CST: 2.00 PATH: 4 Degree: 1

SINGLE TABLE ACCESS PATH
TABLE: OP_TRANSIMP_MOVES ORIG CDN: 1126964 ROUNDED CDN: 1126964 CMPTD CDN: 1126964
Access path: tsc Resc: 17407 Resp: 17407
BEST_CST: 17407.00 PATH: 2 Degree: 1

SINGLE TABLE ACCESS PATH
TABLE: OP_TRANSIMP_REQS ORIG CDN: 509576 ROUNDED CDN: 509576 CMPTD CDN: 509576
Access path: tsc Resc: 6743 Resp: 6743
BEST_CST: 6743.00 PATH: 2 Degree: 1

Table: OP_TRANSIMP_MOVES
Join index: 31106
I will describe the way the CBO has calculated the cost of the single table access paths for OP_CURPLAN_TRANSIMPREQS.
Keep in mind that the k factor thus established is only used in the CBO's estimate of the cost of a full table scan or index fast full scan. The actual I/O cost of a full table scan depends on factors other than db_file_multiblock_read_count, such as proper extent planning and management, and whether, and how many, data blocks of the table are already present in the buffer pool.
PREDICATES AND FILTER FACTORS
In order to understand the index access cost calculations it is necessary to discuss filter factors
and their relationship to the query’s predicates. A filter factor is a number between 0 and 1 and,
in a nutshell, is a measure for the selectivity of a predicate, or, in mathematical terms, the
probability that a particular row will match a predicate or set of predicates. If a column has 10
distinct values in a table and a query is looking for all rows where the column is equal to one of
the values, you intuitively expect that it will return 1/10 of all rows, presuming an equal
distribution. That is exactly the filter factor of a single column for an equal predicate:
FF = 1/NDV = density
Both statistics, NDV (a.k.a. num_distinct) and density, are in dba_tab_columns, but the optimizer uses the value of density in most of its calculations. This has ramifications, as we will see.
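As a quick illustration of the FF = 1/NDV relationship, here is a small sketch using the NDV and table cardinality shown in the trace excerpt above for OP_CURPLAN_TRANSIMPREQS (the helper names are mine, not Oracle's):

```python
# Equality filter factor and the resulting cardinality estimate,
# using the statistics from the 10053 trace (NDV: 24732, ORIG CDN: 24736).

def equality_filter_factor(ndv: int) -> float:
    """FF = 1/NDV = density (no histogram, uniform distribution assumed)."""
    return 1.0 / ndv

def estimated_cardinality(table_rows: int, ff: float) -> int:
    """Row-source estimate: table cardinality scaled by the filter factor."""
    return round(table_rows * ff)

ff = equality_filter_factor(24732)
print(f"{ff:.4e}")                       # ~4.0433e-05, the DENS in the trace
print(estimated_cardinality(24736, ff))  # 1, the ROUNDED CDN in the trace
```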
Here is the relationship between predicates and the resulting filter factor:
In the absence of histogram data, all filter factors are derived from the density of the columns
involved, or from fixed values if no statistics have been gathered at all. Histograms complicate
the filter factor calculations and go beyond the scope of this paper. In spite of that, we’ll take a
brief look at what changes when histograms are calculated on a column and correct the record
on a couple of myths along the way. Histograms are gathered when a size (number of buckets) greater than one is specified in the analyze; Oracle then builds one of two kinds of histogram:
• Value-Based Histograms
The number of buckets is equal to the number of distinct values and the “endpoint” of each
bucket is the number of rows in the table with that value. Oracle builds a value based histogram
if the size in the analyze for a column is larger than the number of distinct values for the column.
• Height-Based Histograms
Height-based histograms place approximately the same number of values into each bucket, so
that the endpoints of the bucket are determined by how many values are in that bucket. Oracle
builds a height based histogram if the size in the analyze for a column is smaller than the
number of distinct values for the column.
A commonly held belief is that histograms are useless, i.e. have no effect on the access plan, if
bind variables are used since the value is not known at parse time and the CBO – histograms
are only ever used by the cost based optimizer – can not determine from the histogram if it
should use an available index or not. While the latter is true, the gathering of histograms still can
change the access plan. Why and how?
Because
a) The optimizer uses the density in its filter factor calculation, not NDV.
b) The density is calculated differently for columns with histograms, not simply as 1/NDV
If the density changes, the costs of plan segments and the cardinality estimates of row sources
change and hence the entire plan may change. I have successfully exploited that aspect of
histograms in tuning. Another popular myth is that there is no point in gathering histograms on
non-indexed columns, likely born from the assumption that the only role of a histogram is to let
the optimizer decide between a tablescan and an index access. However, the CBO uses filter
factors, derived from column densities, to calculate the costs of different access plans, and
ultimately to choose an access plan; and filter factors are used in two places in this calculation of access plan costs:
· in the calculation of the index access costs, and
· in the calculation of the cardinalities of the row sources.
In the latter calculation, the filter factors of predicates on non-indexed columns do get used.
What is more, the row source cardinality has ultimately the more decisive effect as it guides the
composition of the overall access plan. In my experience, the cause for a poor access plan is
more often the incorrect estimate of the cardinality of a row source than the incorrect estimate of
an index access cost.
Having discussed filter factors, we are now ready to look at the other part of the single table
access path evaluation – the calculation of the cost of accessing the needed rows via an index.
The formula used to calculate the cost of an index fast full scan has the same shape as that of a full table scan:
Cost = (blevel + leaf_blocks) / DB_FILE_MULTIBLOCK_READ_COUNT
For a unique index access, the cost is simply
Cost = blevel + 1
which is what the trace shows for OP_CURPLAN_TRANSIMPREQS_PK: RSC_IO = blevel + 1 = 2.
· Leaf_blocks contributes to all but the unique index access cost. Index compression, where
appropriate, reduces the number of leaf blocks, lowering the index access costs and can
therefore result in the CBO using an index where before it did not.
· Except for a unique index access, the height of the index (blevel) contributes negligibly to
the cost.
· The clustering factor affects only an index range scan, but then heavily, given that it is orders of magnitude bigger than LEAF_BLOCKS.
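The cost shapes just discussed can be sketched as follows. This is the classic I/O costing form; the helper names and the ceiling-based rounding are my assumptions, not output taken from the trace:

```python
import math

def unique_index_cost(blevel: int) -> int:
    # Unique index access: blevel + 1
    return blevel + 1

def index_fast_full_scan_cost(blevel: int, leaf_blocks: int, mbrc: int) -> int:
    # Same shape as the full table scan cost: blocks / multiblock read count
    return math.ceil((blevel + leaf_blocks) / mbrc)

def index_range_scan_cost(blevel: int, leaf_blocks: int,
                          clustering_factor: int,
                          ix_sel: float, tb_sel: float) -> int:
    # The clustering-factor term dominates because clustering_factor is
    # orders of magnitude bigger than leaf_blocks.
    return math.ceil(blevel + leaf_blocks * ix_sel + clustering_factor * tb_sel)

print(unique_index_cost(1))  # 2, matching RSC_IO: 2 in the trace above
```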
Remember that the rule based optimizer parses statements where none of the tables have
statistics. What if only one, or a few but not all, tables, indexes, or columns have no statistics?
There are different claims for what Oracle does in that case. The most popular is that Oracle
uses the rule based optimizer for that table. But parsing is not a mix and match exercise – a
statement is either parsed entirely by CBO or entirely by RBO. If at least one table in the query
has statistics (and the optimizer goal is not rule) then the cost based optimizer does parse the query.
Another claim is that Oracle will dynamically, at runtime, estimate statistics on the objects
without statistics. I have not seen any evidence of that.
Let us examine what the 10053 trace shows if the statistics on any of the tables are deleted, or if a particular table has not been analyzed:
To find the default column statistics, remember that column statistics are not listed under “BASE
STATISTICAL INFORMATION” but under “SINGLE TABLE ACCESS PATH”:
The defaults for NDV and DENS do not look like nice round defaults like the ones for index
statistics. Note also that the density is not 1/NDV. Checking the default column statistics for
differently sized tables confirms the suspicion that the column statistics defaults, like the table
statistics defaults, are not static, but are derived from the NBLKS value. Examining and plotting
the default column density of tables of different sizes in a scatter diagram against the tables’
number of blocks shows not only a correlation but clear functional dependency:
density = 0.7821 * nblks^-0.9992, or practically: density = 0.7821 / nblks
Note that again this is an empirically derived formula. For small values of NBLKS it does not
yield the exact same densities as observed in the 10053 trace. Note also that the equation of
the correlation function between density and NBLKS is different for different db_block_size
values. The actual formula is not really important, but the fact that there is a dependency of the
default column density on NBLKS and db_block_size is interesting.
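The fit above can be checked numerically. Note again that the constants 0.7821 and -0.9992 are the empirical values observed here for one db_block_size, not an official Oracle formula:

```python
def default_density(nblks: int) -> float:
    # Empirical fit: density = 0.7821 * nblks ** -0.9992
    return 0.7821 * nblks ** -0.9992

for nblks in (100, 1000, 10000):
    fitted = default_density(nblks)
    approx = 0.7821 / nblks          # the "practically" simplification
    print(f"{nblks:6d}  {fitted:.4e}  {approx:.4e}")
```

For realistic block counts the two forms agree to well under one percent, which is why the simplification is safe in practice.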
Similar to filter factors for range predicates with bind variables, the optimizer uses defaults for
missing/unknown statistics of “not analyzed” tables, indexes, or columns. These defaults range
from plain static values (index statistics and avg_row_size) to actual values (NBLKS) and, as we
have seen, some values (CDN and column densities) derived from NBLKS using complicated
formulas. The analysis of these default values was done on Oracle 8.1.7. It should surprise
nobody if some of the default values or calculations have changed, and will continue to change,
from Oracle release to release.
GENERAL PLANS
This concludes the single table costing part of the 10053 CBO trace. The next section in the
10053 event trace starts with the heading “GENERAL PLANS”. For all but the simplest SQL
statements, this section makes up the largest part of the trace. This is where the CBO looks at
the costs of all different orders and ways of joining the individual tables and comes up with the
best access plan.
The cost based optimizer has three join methods in its arsenal. These are the three join methods and their costing formulas:
1. NL – NESTED LOOP JOIN
Join cost = (cost of accessing outer table)
+ (cardinality of outer table) * (cost of accessing inner table)
2. SM – SORT MERGE JOIN
Join cost = (cost of accessing outer table + outer sort cost)
+ (cost of accessing inner table + inner sort cost)
3. HA – HASH JOIN
Join cost = (cost of accessing outer table)
+ (Cost of building hash table)
+ (Cost of accessing inner table)
We will look at the join costing of each method for our simple query.
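For the nested loop method, the formula can be checked directly against the numbers that appear in the trace below for join order [1] (outer cost 6745, outer cardinality 509658, inner full table scan cost 17407); the function name is mine:

```python
def nl_join_cost(outer_cost: int, outer_cdn: int, inner_access_cost: int) -> int:
    # NL join: read the outer row source once, probe the inner once per outer row
    return outer_cost + outer_cdn * inner_access_cost

# Full table scan as the inner access path, as in join order [1]:
print(nl_join_cost(6745, 509658, 17407))  # → 8871623551, as in the trace
```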
JOIN ORDER [N]
In the GENERAL PLANS section, the optimizer evaluates every possible permutation of the
tables in the query. An exception is tables in a correlated subquery where it is semantically
impossible to access the table(s) in the subquery before any tables of the outer query and thus
not all permutations constitute a valid plan. Apart from this situation, each permutation
evaluation in the trace is given a number and is then followed by the list of the tables – with their
aliases to distinguish them if a table occurs multiple times in the query.
The initial join order is chosen by ordering the tables in order of increasing computed cardinality.
In this simple case, the CBO is examining the cost of accessing first the
OP_CURPLAN_TRANSIMPREQS table and then joining the OP_TRANSIMP_REQS table and
finally joining the table OP_TRANSIMP_MOVES.
”Now Joining:” is always the beginning of the next batch of join evaluations, introducing the table
joining the fray – no pun intended.
Join order[1]: OP_CURPLAN_TRANSIMPREQS [OCT] OP_TRANSIMP_REQS [ITR] OP_TRANSIMP_MOVES [ITM]
Now joining: OP_TRANSIMP_REQS [ITR] *******
NL Join
Outer table: cost: 2 cdn: 1 rcz: 12 resp: 2
Inner table: OP_TRANSIMP_REQS
Access path: tsc Resc: 6743
Join: Resc: 6745 Resp: 6745
Join cardinality: 509658 = outer (1) * inner (509576) * sel (1.0000e+00) [flag=0]
Best NL cost: 6745 resp: 6745
Join result: cost: 6745 cdn: 509658 rcz: 37
Now joining: OP_TRANSIMP_MOVES [ITM] *******
NL Join
Outer table: cost: 6745 cdn: 509658 rcz: 37 resp: 6745
Inner table: OP_TRANSIMP_MOVES
Access path: tsc Resc: 17407
Join: Resc: 8871623551 Resp: 8871623551
Access path: index (scan)
Index: OP_TRANSIMP_MOVES_NU1
TABLE: OP_TRANSIMP_MOVES
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 5.3132e-07 TB_SEL: 2.8230e-13
Join: resc: 1535719 resp: 1535719
Access path: index (scan)
Index: OP_TRANSIMP_MOVES_PK
TABLE: OP_TRANSIMP_MOVES
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 5.3132e-07 TB_SEL: 2.8230e-13
Join: resc: 1535719 resp: 1535719
Join cardinality: 1 = outer (509658) * inner (1126964) * sel (1.0427e-12) [flag=0]
Using index (ndv = 509576 sel = 3.9807e-07)
Best NL cost: 1535719 resp: 1535719
SM Join
Outer table:
resc: 6745 cdn: 509658 rcz: 37 deg: 1 resp: 6745
Inner table: OP_TRANSIMP_MOVES
resc: 17407 cdn: 1126964 rcz: 12 deg: 1 resp: 17407
using join:1 distribution:2 #groups:1
SORT resource Sort statistics
Sort width: 2 Area size: 131072 Max Area size: 1257472 Degree: 1
Blocks to Sort: 3183 Row size: 51 Rows: 509658
Initial runs: 21 Merge passes: 5 IO Cost / pass: 4775
Total IO sort cost: 13529
Total CPU sort cost: 0
Total Temp space used: 53208000
SORT resource Sort statistics
Sort width: 2 Area size: 131072 Max Area size: 1257472 Degree: 1
Blocks to Sort: 3312 Row size: 24 Rows: 1126964
Initial runs: 22 Merge passes: 5 IO Cost / pass: 4968
Total IO sort cost: 14076
Total CPU sort cost: 0
Total Temp space used: 54322000
Merge join Cost: 51757 Resp: 51757
HA Join
Outer table:
resc: 6745 cdn: 509658 rcz: 37 deg: 1 resp: 6745
Inner table: OP_TRANSIMP_MOVES
resc: 17407 cdn: 1126964 rcz: 12 deg: 1 resp: 17407
using join:8 distribution:2 #groups:1
Hash join one ptn Resc: 1355 Deg: 1
hash_area: 60 (max=307) buildfrag: 3049 probefrag: 3302 ppasses: 1
Hash join Resc: 25507 Resp: 25507
Join result: cost: 25507 cdn: 1 rcz: 49
Best so far: TABLE#: 0 CST: 2 CDN: 1 BYTES: 12
Best so far: TABLE#: 1 CST: 6745 CDN: 509658 BYTES: 18857346
Best so far: TABLE#: 2 CST: 25507 CDN: 1 BYTES: 49
Join order[2]: OP_CURPLAN_TRANSIMPREQS [OCT] OP_TRANSIMP_MOVES [ITM] OP_TRANSIMP_REQS [ITR]
Now joining: OP_TRANSIMP_MOVES [ITM] *******
NL Join
Outer table: cost: 2 cdn: 1 rcz: 12 resp: 2
Inner table: OP_TRANSIMP_MOVES
Access path: tsc Resc: 17407
Join: Resc: 17409 Resp: 17409
Access path: index (scan)
Index: OP_TRANSIMP_MOVES_NU1
TABLE: OP_TRANSIMP_MOVES
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 5.3132e-07 TB_SEL: 5.3132e-07
Join: resc: 5 resp: 5
Access path: index (scan)
Index: OP_TRANSIMP_MOVES_PK
TABLE: OP_TRANSIMP_MOVES
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 5.3132e-07 TB_SEL: 5.3132e-07
Join: resc: 5 resp: 5
Join cardinality: 1 = outer (1) * inner (1126964) * sel (5.3132e-07) [flag=0]
Best NL cost: 5 resp: 5
SM Join
Outer table:
resc: 2 cdn: 1 rcz: 12 deg: 1 resp: 2
Inner table: OP_TRANSIMP_MOVES
resc: 17407 cdn: 1126964 rcz: 12 deg: 1 resp: 17407
using join:1 distribution:2 #groups:1
SORT resource Sort statistics
Sort width: 2 Area size: 131072 Max Area size: 1257472 Degree: 1
Blocks to Sort: 3312 Row size: 24 Rows: 1126964
Initial runs: 22 Merge passes: 5 IO Cost / pass: 4968
Total IO sort cost: 14076
Total CPU sort cost: 0
Total Temp space used: 54322000
Merge join Cost: 31485 Resp: 31485
HA Join
Outer table:
resc: 2 cdn: 1 rcz: 12 deg: 1 resp: 2
Inner table: OP_TRANSIMP_MOVES
resc: 17407 cdn: 1126964 rcz: 12 deg: 1 resp: 17407
using join:8 distribution:2 #groups:1
Hash join one ptn Resc: 12 Deg: 1
hash_area: 60 (max=307) buildfrag: 1 probefrag: 3302 ppasses: 1
Hash join Resc: 17421 Resp: 17421
Join result: cost: 5 cdn: 1 rcz: 24
Now joining: OP_TRANSIMP_REQS [ITR] *******
NL Join
Outer table: cost: 5 cdn: 1 rcz: 24 resp: 5
Inner table: OP_TRANSIMP_REQS
Access path: tsc Resc: 6743
Join: Resc: 6748 Resp: 6748
Access path: index (unique)
Index: OP_TRANSIMP_REQS_PK
TABLE: OP_TRANSIMP_REQS
RSC_CPU: 0 RSC_IO: 2
IX_SEL: 1.9624e-06 TB_SEL: 1.9624e-06
Join: resc: 7 resp: 7
Access path: index (eq-unique)
Index: OP_TRANSIMP_REQS_PK
TABLE: OP_TRANSIMP_REQS
RSC_CPU: 0 RSC_IO: 2
IX_SEL: 0.0000e+00 TB_SEL: 0.0000e+00
Join: resc: 7 resp: 7
Join cardinality: 1 = outer (1) * inner (509576) * sel (1.9624e-06) [flag=0]
Using index (ndv = 509576 sel = 3.9807e-07)
Best NL cost: 7 resp: 7
SM Join
Outer table:
resc: 5 cdn: 1 rcz: 24 deg: 1 resp: 5
Inner table: OP_TRANSIMP_REQS
resc: 6743 cdn: 509576 rcz: 25 deg: 1 resp: 6743
using join:1 distribution:2 #groups:1
SORT resource Sort statistics
Sort width: 2 Area size: 131072 Max Area size: 1257472 Degree: 1
Blocks to Sort: 1 Row size: 37 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
SORT resource Sort statistics
Sort width: 2 Area size: 131072 Max Area size: 1257472 Degree: 1
Blocks to Sort: 2371 Row size: 38 Rows: 509576
Initial runs: 16 Merge passes: 4 IO Cost / pass: 3557
Total IO sort cost: 8300
Total CPU sort cost: 0
Total Temp space used: 40936000
Merge join Cost: 15049 Resp: 15049
HA Join
Outer table:
resc: 5 cdn: 1 rcz: 24 deg: 1 resp: 5
Inner table: OP_TRANSIMP_REQS
resc: 6743 cdn: 509576 rcz: 25 deg: 1 resp: 6743
using join:8 distribution:2 #groups:1
Hash join one ptn Resc: 8 Deg: 1
hash_area: 60 (max=307) buildfrag: 1 probefrag: 2302 ppasses: 1
Hash join Resc: 6756 Resp: 6756
Join result: cost: 7 cdn: 1 rcz: 49
Best so far: TABLE#: 0 CST: 2 CDN: 1 BYTES: 12
Best so far: TABLE#: 2 CST: 5 CDN: 1 BYTES: 24
Best so far: TABLE#: 1 CST: 7 CDN: 1 BYTES: 49
Final:
CST: 7 CDN: 1 RSC: 7 RSP: 7 BYTES: 49
IO-RSC: 7 IO-RSP: 7 CPU-RSC: 0 CPU-RSP: 0
I am going to explain the second join order, as it is the best one, yielding the lowest cost. You can work through the first join order in a similar way.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=49)
1 0 NESTED LOOPS (Cost=4 Card=1 Bytes=49)
2 1 NESTED LOOPS (Cost=3 Card=1 Bytes=24)
3 2 INDEX (RANGE SCAN) OF 'OP_CURPLAN_TRANSIMPREQS_PK' (UNIQUE) (Cost=2 Card=1 Bytes=12)
4 2 TABLE ACCESS (BY INDEX ROWID) OF 'OP_TRANSIMP_MOVES' (Cost=2 Card=1 Bytes=12)
5 4 INDEX (RANGE SCAN) OF 'OP_TRANSIMP_MOVES_PK' (UNIQUE) (Cost=1 Card=1)
6 1 TABLE ACCESS (BY INDEX ROWID) OF 'OP_TRANSIMP_REQS' (Cost=2 Card=1 Bytes=25)
7 6 INDEX (UNIQUE SCAN) OF 'OP_TRANSIMP_REQS_PK' (UNIQUE)
The reason the cost comes out lower here is that with these settings the optimizer discounts the index access costs for the columns used in the where clause, whereas in the case below the cost is higher because index scans and table scans are given equal weightage, i.e. are considered equally costly. Also, in the case below the optimizer does not assume that index blocks are cached in preference to table blocks, so it does not expect to find those values in the cache.
If these parameters are set to 0/100, the following plan results (as in our case):
Alter session set OPTIMIZER_INDEX_CACHING = 0;
Alter session set OPTIMIZER_INDEX_COST_ADJ = 100;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=49)
1 0 NESTED LOOPS (Cost=7 Card=1 Bytes=49)
2 1 NESTED LOOPS (Cost=5 Card=1 Bytes=24)
3 2 INDEX (RANGE SCAN) OF 'OP_CURPLAN_TRANSIMPREQS_PK' (UNIQUE) (Cost=2 Card=1 Bytes=12)
4 2 TABLE ACCESS (BY INDEX ROWID) OF 'OP_TRANSIMP_MOVES' (Cost=3 Card=1 Bytes=12)
5 4 INDEX (RANGE SCAN) OF 'OP_TRANSIMP_MOVES_NU1' (NON-UNIQUE) (Cost=2 Card=1)
6 1 TABLE ACCESS (BY INDEX ROWID) OF 'OP_TRANSIMP_REQS' (Cost=2 Card=1 Bytes=25)
7 6 INDEX (UNIQUE SCAN) OF 'OP_TRANSIMP_REQS_PK' (UNIQUE) (Cost=1 Card=1)
This is the plan we get because our parameters are set to 0/100, the default settings. If these parameters are changed from 0/100 to 90/30, the cost drops from 7 to 4, a reduction of roughly 43%.
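The effect of OPTIMIZER_INDEX_COST_ADJ can be pictured as a simple percentage scaling of index access path costs. This is a simplified sketch; the exact rounding behaviour varies by Oracle release, and OPTIMIZER_INDEX_CACHING additionally discounts the index blocks of nested loop inner accesses by the assumed cache-hit percentage:

```python
def adjusted_index_cost(io_cost: int, optimizer_index_cost_adj: int = 100) -> int:
    # Index path I/O cost scaled by the parameter (a percentage), floored at 1
    return max(1, round(io_cost * optimizer_index_cost_adj / 100))

print(adjusted_index_cost(3, 100))  # default 100: cost unchanged → 3
print(adjusted_index_cost(3, 30))   # 30: the same index path looks much cheaper → 1
```

This is why lowering OPTIMIZER_INDEX_COST_ADJ biases the optimizer toward index access paths, as seen in the two plans above.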
The rest of the join is self-explanatory.
Join cardinality = (cardinality of outer) * (cardinality of inner) * (join selectivity)
Join cost = (cost of outer) + (cardinality of outer) * (cost of inner)
          = 2 + 1*3
          = 5
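Plugging the trace's numbers for join order [2] into these formulas confirms the arithmetic (rounding up to a minimum of one row is my reading of how the trace arrives at a join cardinality of 1):

```python
# Figures from join order [2]: outer CDN 1, inner CDN 1126964, sel 5.3132e-07
outer_cdn, inner_cdn, sel = 1, 1126964, 5.3132e-07

join_cardinality = max(1, round(outer_cdn * inner_cdn * sel))
join_cost = 2 + 1 * 3   # cost of outer + cardinality of outer * cost of inner

print(join_cardinality, join_cost)  # → 1 5
```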
I am skipping the portions for the hash and sort merge joins, as a lot of work needs to be done to reach a conclusion on how Oracle calculates the cost of sorting, hashing, and finally merging.
It joins 3 tables, enough to have some permutations of join orders to consider (6), but
not so many that one gets lost following the trail. With 4 tables, there would be 24
permutations, with 5 tables 120 permutations.
In general, there are n! (n factorial) possible permutations to join n tables, so the number
of permutations – and the cost and time to evaluate them – rises dramatically as the
number of tables in the SQL increases. The init.ora parameter
“optimizer_max_permutations” can be used to limit the number of permutations the
CBO will evaluate.
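The factorial growth mentioned above is easy to verify:

```python
import math

# Number of possible join orders for n tables grows as n!
for n in (3, 4, 5, 6, 7):
    print(n, math.factorial(n))   # 3→6, 4→24, 5→120, 6→720, 7→5040
```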
There are also plenty of predicates to give rise to many different base table access
considerations. But we are not going to go into those details. I only want to demonstrate
the join permutations of our case as follows:
Join orders [1] and [2] here are completely evaluated by the CBO. For the remaining join orders, as soon as a join order starts and the cost at any stage exceeds the best cost found so far (7 in our case, from join order [2]), the optimizer abandons that plan then and there and does not evaluate it any further. Here the cost of accessing the first table by itself is already more than 7 (a full scan of OP_TRANSIMP_REQS costs 6743 and of OP_TRANSIMP_MOVES 17407), so there is no point in carrying forward any join analysis on those join orders.
CONCLUSION
If you take one piece of advice from this paper, let it be this:
· Pay close attention to the cardinality values in an explain plan. Wrong estimates by
the CBO for the cardinalities can have a devastating effect on the choice of access
plan and the performance of the SQL statement. Assuming for the moment that the
estimate of cardinality for any of the tables in the plan above is incorrect and too low,
then that would invalidate the entire plan costs and perhaps make the optimizer
choose a different, and better, plan. Armed with the knowledge of how the CBO
arrived at this number you know what to change in order to help the optimizer make
a better assumption: the filter factor of the combined predicates on this table, i.e.
ultimately the densities and NDVs of the predicate columns. Here are some means
of doing that:
· Using histograms on some columns and experiment with the sizes (number of
buckets). Even if the histogram data itself is never used, the density for a column
with a histogram changes. Generally, collecting a value-based histogram for a
column significantly reduces its density, often by orders of magnitude, whereas
collecting a height-based histogram increases the density.
· Deleting the statistics of a column – now possible with the DBMS_STATS
procedure – an index or an entire table and let the optimizer use the default
densities.
· “Seeding” a table with rows that artificially either increase the cardinality of a
column (more than that of the table) and thus lower its density, or increase the
cardinality of the table without changing that of the column and thus raise the
column’s density.
· Using brute force and setting the density. This is possible as of Oracle 8.0 with
the advent of DBMS_STATS.SET_xxx_STATS. I recommend using
export_table_stats to export the statistics into a STATTAB table, modify the
value(s) there, and then import the statistics back into the dictionary. Of course
you’ll make two exports – one into a “work” statid and one into a “backup” statid
so that you can restore the original statistics in case the change does not have
the desired effect.
Submitted By:
Sumit Popli
Tata Consultancy Services
sumitp@delhi.tcs.co.in
popli_sumit@yahoo.com