
Join Cardinality Estimation Methods

Chinar Aliyev
chinaraliyev@gmail.com
As is known, the Query Optimizer tries to select the best plan for a query. It does this by generating all possible plans, estimating the cost of each of them, and selecting the cheapest plan as the optimal one. Estimating the cost of a plan is a complex process, but cost is directly proportional to the number of I/Os, and there is a functional dependence between the number of rows retrieved from the database and the number of I/Os. So the cost of a plan depends on the estimated number of rows retrieved in each step of the plan – the cardinality of the operation. Therefore the optimizer should accurately estimate the cardinality of each step in the execution plan. In this paper we are going to analyze how the Oracle optimizer calculates join selectivity and cardinality in different situations: how does the CBO calculate join selectivity when histograms are available (including the new types of histograms in 12c)? What factors does the estimation error depend on? And so on. In general, two main join cardinality estimation methods exist: histogram based and sampling based.
Thanks to Jonathan Lewis for writing the book "Cost Based Oracle Fundamentals". This book actually helped me to understand the optimizer's internals and to open the "black box". In 2007 Alberto Dell'Era did excellent work investigating join size estimation with histograms. However, some questions remain, such as the introduction of a "special cardinality" concept; in this paper we are going to review this matter as well.

For simplicity we are going to use single-column joins and columns containing no null values. Assume we have two tables t1 and t2 with corresponding join columns j1 and j2; the remaining columns are filter1 and filter2. Our queries are:
(Q0)
SELECT COUNT (*)
FROM t1, t2
WHERE t1.j1 = t2.j2
AND t1.filter1 ='value1'
AND t2.filter2 ='value2'

(Q1)
SELECT COUNT (*)
FROM t1, t2
WHERE t1.j1 = t2.j2;

(Q2)
SELECT COUNT (*)
FROM t1, t2;


Histogram Based Estimation

Understanding Join Selectivity and Cardinality

As you know, query Q2 is a Cartesian product. It means we get the join cardinality of the Cartesian product as:

   Card_cartesian = num_rows(t1) * num_rows(t2)

Here num_rows(ti) is the number of rows of the corresponding table. When we add the join condition to the query (Q1), we actually get some fraction of the Cartesian product; to identify this fraction, Join Selectivity has been introduced. Therefore we can write:

   Card_Q1 <= Card_cartesian

   Card_Q1 = Jsel * Card_cartesian = Jsel * num_rows(t1) * num_rows(t2)      (1)

   Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2))

Definition: Join selectivity is the ratio of the "pure" (natural) cardinality to the Cartesian product. I call Card_Q1 the "pure" cardinality because it does not contain any filter conditions.
Here Jsel is the Join Selectivity. This is our main formula. You should know that when the optimizer tries to estimate JC, the Join Cardinality, it first calculates Jsel. Therefore we can use the same Jsel and write the appropriate formula for query Q0 as:

   Card_Q0 = Jsel * Card(t1) * Card(t2)      (2)

Here Card(ti) is the final cardinality after applying the filter predicate to the corresponding table. In other words, Jsel is the same in both formulas (1) and (2), because Jsel does not depend on the filter columns unless the filter conditions include join columns. According to formula (1):

   Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2))      (3)

or

   Card_Q0 = Card_Q1 * Card(t1) * Card(t2) / (num_rows(t1) * num_rows(t2))      (4)

Based on this we have to find out the estimation mechanism of the expected cardinality Card_Q1. Now consider that there is no histogram of any type on the join columns ji of the tables ti.


In this case the optimizer assumes uniform distribution, and for such situations, as you already know, Jsel and Jcard are calculated as:

   Jsel = 1 / max(num_dist(j1), num_dist(j2))      (5)

   Jcard = Jsel * num_rows(t1) * num_rows(t2)
The question now is: where does formula (5) come from? How do we understand it?
According to (3), in order to calculate Jsel we first have to estimate the "pure" expected cardinality Card_Q1, and it depends only on the join columns. For table t1, based on uniform distribution, the number of rows per distinct value of the j1 column will be num_rows(t1)/num_dist(j1), and for table t2 it will be num_rows(t2)/num_dist(j2). Also, there will be min(num_dist(j1), num_dist(j2)) common distinct values. Therefore the expected "pure" cardinality is:

   Card_Q1 = min(num_dist(j1), num_dist(j2)) * (num_rows(t1)/num_dist(j1)) * (num_rows(t2)/num_dist(j2))      (6)

Then, according to formula (3), the Join Selectivity will be:

   Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2))
        = min(num_dist(j1), num_dist(j2)) / (num_dist(j1) * num_dist(j2))
        = 1 / max(num_dist(j1), num_dist(j2))

As can be seen, we have obtained formula (5). Without a histogram the optimizer is not aware of the data distribution: the data dictionary contains no "(distinct value, frequency)" pairs describing the column distribution. Because of this, in the uniform-distribution case the optimizer actually calculates an "average frequency" as num_rows(t1)/num_dist(j1). Based on this "average frequency" the optimizer calculates the "pure" expected cardinality and then the join selectivity. If a table column has a histogram, then (depending on its type) the optimizer will calculate the join selectivity based on the histogram. In this case the "(distinct value, frequency)" pairs are not formed from an "average frequency" but from the information given by the histogram.

Case 1. Both Join columns have frequency histograms
In this case both join columns have frequency histograms and our query (freq_freq.sql) is:
SELECT COUNT (*)
FROM t1, t2
WHERE t1.j1 = t2.j2 AND t1.f1 = 13;

The corresponding execution plan is:


"J1"="T2".num_rows from user_tables where table_name in (‘T1’. 0 ) prev_endpoint. tab t2. col col j1 j2 value frequency ep value frequency ep 0 40 40 0 100 100 1 40 80 2 40 140 2 80 160 3 120 260 3 100 260 4 20 280 © 2016 Chinar A.filter("T1".access("T1". --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 2272 | 2260 | |* 3 | TABLE ACCESS FULL| T1 | 1 | 40 | 40 | | 4 | TABLE ACCESS FULL| T2 | 1 | 1000 | 1000 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 . endpoint_number ep FROM (SELECT endpoint_number. but it has not been exactly estimated. Therefore information about columns and tables is as follows.’T2’).NVL (prev_endpoint. Aliyev Hotsos Symposium March 6-10 4 . And why? How did optimizer calculate cardinality of the join as 2272? If we enable SQL trace for the query then we will see oracle queries only histgrm$ dictionary table. tab_name num_rows T1 1000 T2 1000 (Freq_values1) SELECT endpoint_value COLUMN_VALUE. Select table_name. endpoint_number ."F1"=13) Estimation is good enough for this situation. 0) frequency. NVL (LAG (endpoint_number. endpoint_value FROM user_tab_histograms WHERE table_name = 'T1' AND column_name = 'J1') ORDER BY endpoint_number tab t1. 1) OVER (ORDER BY endpoint_number)."J2") 3 .

These data is spread between max(min_value(j1). so we get following table tab t1. Also we have to take equval values.min_value(j2)) and min(max_value(j1). So “(column value. Firstly we have to find common data for the join columns.0568 𝑛𝑢𝑚_𝑟𝑜𝑤𝑠(𝑡1)∗𝑛𝑢𝑚_𝑟𝑜𝑤𝑠(𝑡2) 1000∗1000 And eventually our cardinality will be according to the formula (2) © 2016 Chinar A. frequency)” pair gives us all opportunity to estimate cardinality of any kind of operations. Now we have to try to estimate pure cardinality 𝐶𝑎𝑟𝑑𝑄1 then we can find out 𝐽𝑠𝑒𝑙 according to formula (3). It means we are not interested in the data which column value greater than 10 for j2 column. col j1 j2 value frequency value frequency 0 40 0 100 2 80 2 40 3 100 3 120 4 160 4 20 5 60 5 40 6 260 6 100 8 120 8 40 9 60 9 20 Because of this expected pure cardinality 𝐶𝑎𝑟𝑑𝑄1 Will be 100*40+80*40+100*120+160*20+60*40+260*100+120*40+60*20=56800 and Join selectivity 𝐶𝑎𝑟𝑑𝑄1 56800 𝐽𝑠𝑒𝑙 = = = 0. col tab t2. Aliyev Hotsos Symposium March 6-10 5 .max_value(j2)). 4 160 420 5 40 320 5 60 480 6 100 420 6 260 740 8 40 460 7 80 820 9 20 480 8 120 940 10 20 500 9 60 1000 11 60 560 12 20 580 13 20 600 14 80 680 15 80 760 16 20 780 17 80 860 18 80 940 19 60 1000 Frequency histograms exactly express column distribution.

Another question was why we did not get exact cardinality – 2260? Although join selectivity by definition does not depend on filter columns and conditions. max/min value. At least it will require additional estimation algorithms.𝑗2)) 𝐽𝑠𝑒𝑙 = (7) 𝑛𝑢𝑚_𝑟𝑜𝑤𝑠(𝑡1)∗𝑛𝑢𝑚_𝑟𝑜𝑤𝑠(𝑡2) Here freq is corresponding frequency of the column value. distinct values after applying filter – in line 3 of execution plan.056800) Join Card . As result we got the following formula for join selectivity. but filtering actually influences this process. I think it is not an issue in general. --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 56800 | 56800 | | 3 | TABLE ACCESS FULL| T1 | 1 | 1000 | 1000 | | 4 | TABLE ACCESS FULL| T2 | 1 | 1000 | 1000 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 .000000 As we see same number as in above execution plan."J1"="T2".Rounded: 2272 Computed: 2272. So if we remove filter condition from above query we will get exact estimation.000000) * sel (0.(Case2 q1) where t1. © 2016 Chinar A.𝑗1)∗𝑓𝑟𝑒𝑞(𝑡2.j2. Join columns with height-balanced (equ-height) and frequency histograms Now assume one of the join column has height-balanced(HB) histogram and another has frequency(FQ) histogram (Height_Balanced_Frequency. ∑min _𝑚𝑎𝑥 𝑖=max _𝑚𝑖𝑛 𝑓𝑟𝑒𝑞(𝑡1. Optimizer does not consider join column value range.access("T1". Aliyev Hotsos Symposium March 6-10 6 .𝐶𝑎𝑟𝑑𝑄0 = 𝐽𝑠𝑒𝑙 ∗Card (𝑡1 )*Card (𝑡2 ) = 0.j1 = t2. spreads.sql) We are going to investiagte cardinality estimation of the two queries here select count(*) from t1.000000) * inner (1000."J2") It means optimizer calculates “average” join selectivity. Join Card: 2272. t2 --.0568 ∗ 40 ∗ 1000 = 2272 Also if we enable 10053 event then in trace file we see following lines regarding on join selectivity.000000 = outer (40. Case 2. then efficiency of whole estimation process could be harder. It is not easy to resolve.
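As a cross-check, formula (7) can be reproduced from the dictionary as well. The following query is only an illustrative sketch of mine (not something the optimizer runs); it assumes frequency histograms exist on T1.J1 and T2.J2 and mirrors the freq*freq summation over the matching values.

-- Sketch: formula (7) computed from two frequency histograms
WITH h1 AS (
  SELECT endpoint_value val,
         endpoint_number
           - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_number), 0) freq
    FROM user_tab_histograms
   WHERE table_name = 'T1' AND column_name = 'J1'),
h2 AS (
  SELECT endpoint_value val,
         endpoint_number
           - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_number), 0) freq
    FROM user_tab_histograms
   WHERE table_name = 'T2' AND column_name = 'J2')
SELECT SUM (h1.freq * h2.freq)                                AS card_q1,
       SUM (h1.freq * h2.freq) / (t1.num_rows * t2.num_rows)  AS j_sel
  FROM h1, h2, user_tables t1, user_tables t2
 WHERE h1.val = h2.val          -- only equal values inside the common range
   AND t1.table_name = 'T1'
   AND t2.table_name = 'T2'
 GROUP BY t1.num_rows, t2.num_rows;

For the data above it should return 56800 and 0.0568, matching the 10053 figures.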

Case 2. Join columns with height-balanced (equ-height) and frequency histograms

Now assume one of the join columns has a height-balanced (HB) histogram and the other has a frequency (FQ) histogram (Height_Balanced_Frequency.sql). We are going to investigate cardinality estimation for two queries:

select count(*) from t1, t2   -- (Case2 q1)
where t1.j1 = t2.j2;

select count(*) from t1, t2   -- (Case2 q2)
where t1.j1 = t2.j2 and t1.f1 = 11;

For column j1 there is a height-balanced (HB) histogram and for column j2 a frequency (FQ) histogram. The appropriate information from the user_tab_histograms dictionary view is shown in Table 3:

   tab t1, col j1 (HB)             tab t2, col j2 (FQ)
   column value  frequency  ep     column value  frequency  ep
   1             0          0      1             2          2
   9             1          1      7             2          4
   16            1          2      48            3          7
   24            1          3      64            4          11
   32            1          4
   40            1          5
   48            2          7
   56            1          8
   64            2          10
   72            2          12
   80            3          15

The frequency column for t1.j1 in Table 3 does not express the real frequency of the column values; it is actually the "frequency of the bucket". An HB histogram contains buckets that hold approximately the same number of rows, and we can also derive the number of distinct values per bucket. Within an HB bucket we could assume uniform distribution and estimate the size of each disjoint subset {value of FQ, bucket of HB}. Although this approach gave me some approximation of the join cardinality, it did not give the exact numbers that the optimizer calculates and reports in the 10053 trace file.

Alberto Dell'Era first investigated joins based on histograms in 2007 (Join Over Histograms). His approach was based on grouping values into three major categories:
- "populars matching populars"
- "populars not matching populars"
- "not popular subtables"
Each category is estimated separately, and the sum of the cardinalities of the groups gives the join cardinality. We have to find out what information we need to improve this approach.

My point of view on the matter is quite different:
- Our main data here is the t2.j2 column's data, because the frequency histogram gives us exact frequencies.
- We have to identify "(distinct value, frequency)" pairs for t1.j1 based on the HB histogram: we walk the t2.j2 histogram values and identify the second part of each "(distinct value, frequency)" pair from the HB histogram.
- Then it is easy to calculate the "pure" cardinality, so we can easily and more accurately estimate the join cardinality:

   Card_Q1 = SUM over t2.j2 values of Freq_based_t1(value = t2.j2) * Freq_based_t2(value = t2.j2)

- Then we can calculate the join selectivity and cardinality.

First we have to identify the common values. The maximum value of the frequency histogram is 64, so we have to ignore the HB histogram buckets with endpoint number greater than 10. When forming the "(value, frequency)" pairs based on the HB histogram we should not treat a single value located within a bucket as uniform, because the HB histogram actually gives us an "average" density – NewDensity (the density term was introduced to avoid estimation errors in the non-uniform case and has been improved by the new density mechanism) – for non-popular values, and a special approach for popular values.

So let us identify the "(value, frequency)" pairs based on the height-balanced histogram. The relevant statistics are:

   tab_name  num_rows (user_tables)     col_name  num_distinct
   T1        130                        T1.J1     30
   T2        11                         T2.J2     4

Number of buckets: num_buckets = 15 (max(ep) from Table 3).
Number of popular buckets: num_pop_buckets = 9 (sum(frequency) from Table 3 where frequency > 1).
Popular value count: pop_value_cnt = 4 (count(frequency) from Table 3 where frequency > 1).

   NewDensity = num_unpop_buckets / (unpop_ndv * num_buckets)
              = (num_buckets - num_pop_buckets) / ((NDV - pop_value_cnt) * num_buckets)
              = (15 - 9) / ((30 - 4) * 15) = 0.015384615 ≈ 0.015385      (8)

For popular values the selectivity is ep_frequency / num_buckets = ep_frequency / 15. So the "(value, frequency)" pairs based on the HB histogram will be:

   column value  popular  frequency     calculated as
   1             N        2.00005       130*0.015385 (num_rows*density)
   7             N        2.00005       130*0.015385 (num_rows*density)
   48            Y        17.33333333   130*2/15 (num_rows*frequency/num_buckets)
   64            Y        17.33333333   130*2/15 (num_rows*frequency/num_buckets)

We now have all the "(value, frequency)" pairs, so according to formula (7) we can calculate the join selectivity:

   tab t1, col j1              tab t2, col j2
   column value  frequency     column value  frequency    freq*freq
   1             2.00005       1             2            4.0001
   7             2.00005       7             2            4.0001
   48            17.33333333   48            3            52
   64            17.33333333   64            4            69.33333
                                             sum          129.3335

And finally Jsel = 129.3335 / (130*11) = 0.090443, so our "pure" cardinality is Card_Q1 = 129.3335. The execution plan of the query is as follows:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |    129 |    104 |
|  3 |    TABLE ACCESS FULL| T2   |      1 |     11 |     11 |
|  4 |    TABLE ACCESS FULL| T1   |      1 |    130 |    130 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

And the corresponding information from the 10053 trace file:

Join Card:  129.333333 = outer (130.000000) * inner (11.000000) * sel (0.090443)
Join Card - Rounded: 129 Computed: 129.333333

It means we were able to figure out the exact estimation mechanism in this case.
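The NewDensity of formula (8) can also be derived straight from the dictionary. The query below is a hedged sketch of mine (not the optimizer's own calculation); it assumes the height-balanced histogram on T1.J1 shown in Table 3 and treats every endpoint whose bucket gap is greater than 1 as popular.

-- Sketch: formula (8) for the assumed HB histogram on T1.J1
WITH h AS (
  SELECT endpoint_number,
         endpoint_number
           - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_number), 0) bkt_freq
    FROM user_tab_histograms
   WHERE table_name = 'T1' AND column_name = 'J1')
SELECT (MAX (endpoint_number)
          - SUM (CASE WHEN bkt_freq > 1 THEN bkt_freq ELSE 0 END))   -- unpopular buckets
       / ( (c.num_distinct
              - COUNT (CASE WHEN bkt_freq > 1 THEN 1 END))           -- unpopular NDV
           * MAX (endpoint_number) )                                 -- num_buckets
         AS new_density
  FROM h, user_tab_cols c
 WHERE c.table_name = 'T1' AND c.column_name = 'J1'
 GROUP BY c.num_distinct;

For the data in Table 3 it should return 0.015385, the value the non-popular values 1 and 7 were costed with (130 * 0.015385 = 2.00005).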

The execution plan of the second query (Case2 q2) is as follows:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |      5 |      7 |
|* 3 |    TABLE ACCESS FULL| T1   |      1 |      5 |      5 |
|  4 |    TABLE ACCESS FULL| T2   |      1 |     11 |     11 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")
   3 - filter("T1"."F1"=11)

According to our approach the join cardinality should be computed as Jsel*card(t1)*card(t2) = 0.090443*card(t1)*card(t2). From the optimizer trace file we see the following:

Join Card:  5.173333 = outer (11.000000) * inner (5.200000) * sel (0.090443)
Join Card - Rounded: 5 Computed: 5.173333

This actually confirms our approach. The execution plan shows the cardinality of the single table t1 as 5, which is correct because it must be rounded up, but during the join estimation process the optimizer uses the original (unrounded) values rather than the rounded ones.

Reviewing Alberto Dell'Era's complete formula (join_histogram_complete.sql)

We can list the column information from the dictionary as below:

   tab t1, col value           tab t2, col value
   column value  frequency     column value  frequency
   20            1             10            1
   40            1             30            2
   50            1             50            1
   60            1             60            4
   70            2             70            2
   80            2
   90            1
   99            1

We have to find the common values. As you can see, min(t1.value) = 20, so we must ignore t2.value = 10; also max(t2.value) = 70, so we have to ignore the t1 column values greater than 70. In addition, we do not have the value 40 in t2.value, therefore we have to delete it as well. Because of this we get the following table:

   tab t1, col j1              tab t2, col j2
   column value  frequency     column value  frequency
   20            1             30            2
   50            1             50            1
   60            1             60            4
   70            2             70            2

Here num_rows(t1) = 12, num_distinct(t1.value) = 8 and num_buckets(t1.value) = 6, so

   newdensity = num_unpop_buckets / (unpop_ndv * num_buckets) = (6 - 2) / ((8 - 1) * 6) = 0.095238095

and the appropriate column-value frequencies based on the HB histogram will be:

   t1.value  freq          calculated as
   30        1.142857143   num_rows*newdensity
   50        1.142857143   num_rows*newdensity
   60        1.142857143   num_rows*newdensity
   70        4             num_rows*freq/num_buckets

And finally the cardinality will be:

   t1.value                   t2.value
   column value  freq         column value  freq    freq*freq
   30            1.142857143  30            2       2.285714286
   50            1.142857143  50            1       1.142857143
   60            1.142857143  60            4       4.571428571
   70            4            70            2       8
                                            sum     16

The optimizer also estimated exactly 16, as we can see in the execution plan of the query:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |     16 |     13 |
|  3 |    TABLE ACCESS FULL| T1   |      1 |     12 |     12 |
|  4 |    TABLE ACCESS FULL| T2   |      1 |     14 |     14 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."VALUE"="T2"."VALUE")

Reviewing Alberto Dell'Era's essential case (join_histogram_essentials.sql)

This is quite an interesting case: first, because in Oracle 12c the optimizer calculates the join cardinality as 31 and not as 30, and second, because in this case the old and new densities are the same. There Alberto also introduced "Contribution 4: special cardinality", but it seems it is not necessary. Let us interpret the case. The corresponding information from user_tab_histograms is:

   tab t1, col value               tab t2, col value
   column value  frequency  ep     column value  frequency  ep
   10            2          2      10            2          2
   20            1          3      20            1          3
   30            2          5      50            3          6
   40            1          6      60            1          7
   50            1          7      70            4          11
   60            1          8
   70            2          10

And num_rows(t1) = 20, num_rows(t2) = 11, num_dist(t1.value) = 11, num_dist(t2.value) = 5,
Density(t1.value) = (10 - 6)/((11 - 3)*10) = 0.05.

The mechanism described above does not give us exactly the number the optimizer estimates, because in this case, to estimate the frequency of non-popular values, Oracle does not use the density; it uses the number of distinct values per bucket and the number of rows per distinct value instead. To prove this we can use join_histogram_essentials1.sql. There the t1 table is the same as in join_histogram_essentials.sql, and the column t2.value has only one value, 20, with frequency one:

   t1.value  freq  ep       t2.value  freq  ep
   10        2     2        20        1     1
   20        1     3
   30        2     5
   40        1     6
   50        1     7
   60        1     8
   70        2     10

In this case Oracle computes the join cardinality as 2, rounded up from 1.818182. We can see it in the trace file:

Join Card:  1.818182 = outer (20.000000) * inner (1.000000) * sel (0.090909)
Join Card - Rounded: 2 Computed: 1.818182

It looks like the estimation depends entirely on the t1.value column distribution. num_rows_bucket (the number of rows per bucket) is 20/10 = 2, and num_dist_bucket (the number of distinct values per bucket) is 11/10 = 1.1. Every bucket has 1.1 distinct values, and within a bucket every distinct value has 2/1.1 = 1.818182 rows. And this is our cardinality.

But if we increase the frequency of the t2.value, as in join_histogram_essentials2.sql, where (t2.value, frequency) = (20, 5) and the t1 table is the same as in the previous case:

   t1.value  freq  ep       t2.value  freq  ep
   10        2     2        20        5     5
   20        1     3
   30        2     5
   40        1     6
   50        1     7
   60        1     8
   70        2     10

the corresponding lines from the 10053 trace file are:

Join Card:  5.000000 = outer (20.000000) * inner (5.000000) * sel (0.050000)
Join Card - Rounded: 5 Computed: 5.000000

Tests show that in such cases the cardinality of the join is computed as the frequency of the t2.value. So the frequency used for a non-popular t1.value will be:

   Frequency (non-popular t1.value) = num_rows_bucket / num_dist_bucket   if frequency of t2.value = 1
                                    = 1                                   if frequency of t2.value > 1

or, equivalently, Cardinality = max(frequency of t2.value, number of rows per distinct value within the bucket).

The question is why. In such cases I think the optimizer tries to minimize estimation errors. Popular t1 values still get num_rows*frequency/num_buckets. Therefore, for the original essentials case:

   t1.value  frequency     calculated as
   10        4             num_rows*frequency/num_buckets
   20        1.818181818   num_rows_bucket/num_dist_bucket
   50        1             frequency of t2.value
   60        1.818181818   num_rows_bucket/num_dist_bucket
   70        4             num_rows*frequency/num_buckets

   t1.value                   t2.value
   column value  freq         column value  freq    freq*freq
   10            4            10            2       8
   20            1.818181818  20            1       1.818181818
   50            1            50            3       3
   60            1.818181818  60            1       1.818181818
   70            4            70            4       16
                                            sum     30.63636364

We get 30.64 ≈ 31 as the expected cardinality. The optimizer also estimated exactly 31, as we can see in the trace file and the execution plan:

Join Card:  31.000000 = outer (11.000000) * inner (20.000000) * sel (0.140909)
Join Card - Rounded: 31 Computed: 31.000000

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |     31 |     29 |
|  3 |    TABLE ACCESS FULL| T2   |      1 |     11 |     11 |
|  4 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."VALUE"="T2"."VALUE")
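The per-bucket quantities used in the essentials cases can be pulled from the dictionary with a short query. This is only my illustrative sketch under the stated assumptions (the height-balanced histogram on T1.VALUE of the essentials test case, with num_rows, num_distinct and num_buckets as stored by dbms_stats).

-- Sketch: rows per bucket, distinct values per bucket and rows per distinct
-- value within a bucket for the assumed HB histogram on T1.VALUE
SELECT t.num_rows / c.num_buckets                   AS num_rows_bucket,
       c.num_distinct / c.num_buckets               AS num_dist_bucket,
       (t.num_rows / c.num_buckets)
         / (c.num_distinct / c.num_buckets)         AS rows_per_distinct
  FROM user_tables t, user_tab_col_statistics c
 WHERE t.table_name = 'T1'
   AND c.table_name = 'T1'
   AND c.column_name = 'VALUE';

For the essentials data it should return 2, 1.1 and 1.818182, the last value being exactly the computed cardinality reported in the essentials1 trace.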

Case 3. Join columns with hybrid and frequency histograms

In this case we are going to analyze how the optimizer calculates join selectivity when hybrid and frequency histograms are available on the join columns (hybrid_freq.sql). Note that the query is the same as (Case2 q1). The corresponding information can be listed from the dictionary view with:

SELECT endpoint_value column_value,
       endpoint_number
         - NVL (LAG (endpoint_number, 1) OVER (ORDER BY endpoint_number), 0) frequency,
       endpoint_repeat_count,
       endpoint_number
  FROM (SELECT endpoint_number, endpoint_repeat_count, endpoint_value
          FROM user_tab_histograms
         WHERE table_name = 'T3' AND column_name = 'J3')
 ORDER BY endpoint_number;

   tab t1, col j1                              tab t2, col j2
   column value  frequency  endpoint_rep_cnt   column value  frequency
   0             6          6                  0             3
   2             9          7                  1             6
   4             8          5                  2             6
   6             8          5                  3             8
   7             7          7                  4             11
   9             10         5                  5             3
   10            6          6                  6             3
   11            3          3                  7             9
   12            7          7                  8             6
   13            4          4                  9             5
   14            5          5
   15            5          5
   16            5          5
   17            7          7
   19            10         5

As can be seen, the common column values lie between 0 and 9, so we are not interested in buckets that contain column values greater than or equal to 10. A hybrid histogram gives us more information for estimating a single table, and also the join selectivity, than a height-balanced histogram; in particular the endpoint repeat count column is used by the optimizer to estimate the endpoint values exactly. But how does the optimizer use this information to estimate the join? The principle of forming "(value, frequency)" pairs based on a hybrid histogram is the same as for a height-balanced histogram: it depends on the popularity of the value. If a value is popular then its frequency will be equal to the corresponding endpoint repeat count, otherwise it is calculated based on the density. Oracle considers a value popular when its endpoint repeat count is greater than or equal to the average bucket size.

If we enable dbms_stats trace when gathering the hybrid histogram, we get the following:

DBMS_STATS: SELECT SUBSTRB (DUMP (val, 16, 0, 64), 1, 240) ep,
       freq, cdn, ndv,
       (SUM (pop) OVER ()) popcnt,
       (SUM (pop * freq) OVER ()) popfreq,
       SUBSTRB (DUMP (MAX (val) OVER (), 16, 0, 64), 1, 240) maxval,
       SUBSTRB (DUMP (MIN (val) OVER (), 16, 0, 64), 1, 240) minval
  FROM (SELECT val, freq,
               (SUM (freq) OVER ()) cdn,
               (COUNT (*) OVER ()) ndv,
               (CASE WHEN freq > ((SUM (freq) OVER ()) / 15) THEN 1 ELSE 0 END) pop
          FROM (SELECT /*+ no_parallel(t) no_parallel_index(t) dbms_stats
                           cursor_sharing_exact use_weak_name_resl dynamic_sampling(0)
                           no_monitoring xmlindex_sel_idx_tbl no_substrb_pad */
                       "ID" val, COUNT ("ID") freq
                  FROM "SYS"."T1" t
                 WHERE "ID" IS NOT NULL
                 GROUP BY "ID"))
 ORDER BY val

DBMS_STATS: > cdn 100, popFreq 28, popCnt 4, bktSize 6, bktSzFrc .6
DBMS_STATS: Evaluating hybrid histogram: cht.count 15, mnb 15, ssize 100, min_ssize 2500,
            appr_ndv TRUE, ndv 20, selNdv 0, selFreq 0, pct 100, avg_bktsize 7,
            csr.hreq TRUE, normalize TRUE

The average bucket size is 7. Also, in our case the density is (cdn - popFreq)/((NDV - popCnt)*cdn) = (100 - 28)/((20 - 4)*100) = 0.045. If we enable the 10053 trace event we can clearly see the column and table statistics. Therefore the "(value, frequency)" pairs will be:

   t1.j1  popular  frequency  calculated as
   0      N        4.5        density*num_rows
   1      N        4.5        density*num_rows
   2      Y        7          endpoint_repeat_count
   3      N        4.5        density*num_rows
   4      N        4.5        density*num_rows
   5      N        4.5        density*num_rows
   6      N        4.5        density*num_rows
   7      Y        7          endpoint_repeat_count
   8      N        4.5        density*num_rows
   9      N        4.5        density*num_rows

And then the final cardinality:

   t1.j1               t2.j2
   value  frequency    value  frequency    freq*freq
   0      4.5          0      3            13.5
   1      4.5          1      6            27
   2      7            2      6            42
   3      4.5          3      8            36
   4      4.5          4      11           49.5
   5      4.5          5      3            13.5
   6      4.5          6      3            13.5
   7      7            7      9            63
   8      4.5          8      6            27
   9      4.5          9      5            22.5
                              sum          307.5
                              Join sel     0.05125

Let us now check the execution plan:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |    308 |    293 |
|  3 |    TABLE ACCESS FULL| T2   |      1 |     60 |     60 |
|  4 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

And the corresponding lines from the 10053 trace file:

Join Card:  307.500000 = outer (60.000000) * inner (100.000000) * sel (0.051250)
Join Card - Rounded: 308 Computed: 307.500000

Case 4. Both join columns with Top frequency histograms

In this case both join columns have top-frequency histograms (TopFrequency_hist.sql). We are going to use the same query as above, (Case2 q1). The corresponding column information from the 10053 trace is:

Table Stats::
  Table: T2  Alias: T2  #Rows: 201  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J2(NUMBER)  AvgLen: 4  NDV: 21  Nulls: 0  Density: 0.004975  Min: 1.000000  Max: 200.000000
  Histogram: Top-Freq  #Bkts: 192  UncompBkts: 192  EndPtVals: 12  ActualVal: yes
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 65  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J1(NUMBER)  AvgLen: 3  NDV: 14  Nulls: 0  Density: 0.015385  Min: 4.000000  Max: 100.000000
  Histogram: Top-Freq  #Bkts: 56  UncompBkts: 56  EndPtVals: 5  ActualVal: yes

   t1.j1  freq        t2.j2  freq
   4      10          1      14
   5      16          2      18
   6      17          3      18
   8      12          4      17
   100    1           5      15
                      6      19
                      7      19
                      8      22
                      9      17
                      10     18
                      11     13
                      200    2

By definition of the top-frequency histogram there are two types of buckets: Oracle placed the high frequency values into their own buckets, and the rest of the values of the table were effectively "placed" into another bucket. So we actually have "high frequency" and "low frequency" values. For the "high frequency" values we have exact frequencies; the "low frequency" values we can approach with a uniform distribution. In principle we have to gather the common values, which lie between 4 and 100: max(min(t1.j1), min(t2.j2)) = 4 and min(max(t1.j1), max(t2.j2)) = 100. First we build the high frequency pairs based on the common values; after identifying them, we use the exact frequency for popular values and the density for the non-popular ones. Therefore we can create the following table:

   common value  t2.j2 freq  t1.j1 freq  freq*freq
   4             17          10          170
   5             15          16          240
   6             19          17          323
   7             19          1.000025    19.000475
   8             22          12          264
   9             17          1.000025    17.000425
   10            18          1.000025    18.00045
   11            13          1.000025    13.000325
   100           1           1.000025    1.000025
                             sum         1065.0017

The frequency for each non-popular value of t2.j2 is num_rows(t2)*density(t2) = 201*0.004975 = 0.999975, and for t1.j1 it is num_rows(t1)*density(t1) = 65*0.015385 = 1.000025. The cardinality for each individual non-popular pair is therefore CardIndvPair = unpopular_freq(t1.j1) * unpopular_freq(t2.j2) = 0.999975 * 1.000025 ≈ 1. For the t1 table we have num_rows - popular_rows = 65 - 56 = 9 unpopular rows and ndv - popular_value_count = 14 - 5 = 9 unpopular distinct values; for the t2 table we likewise have 201 - 192 = 9 unpopular rows and 21 - 12 = 9 unpopular distinct values.

Test cases show that Oracle considers all low frequency (unpopular) values during the join when top frequency histograms are available, which means the cardinality for the "low frequency" values will be

   Card(Low frequency values) = max(unpopular_rows(t1.j1), unpopular_rows(t2.j2)) * CardIndvPair = 9

Therefore the final cardinality of our join will be CARD(high freq values) + CARD(low freq values) = 1065 + 9 = 1074. Let us see the execution plan:

-----------------------------------------------------------------
| Id | Operation           | Name | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |     1 |     6 |     4   (0)|
|  1 |  SORT AGGREGATE     |      |     1 |     6 |            |
|* 2 |   HASH JOIN         |      |  1074 |  6444 |     4   (0)|
|  3 |    TABLE ACCESS FULL| T1   |    65 |   195 |     2   (0)|
|  4 |    TABLE ACCESS FULL| T2   |   201 |   603 |     2   (0)|
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

And the trace file:

Join Card:  1074.000000 = outer (201.000000) * inner (65.000000) * sel (0.082204)
Join Card - Rounded: 1074 Computed: 1074.000000
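The size of the "low frequency" remainder used above can also be read from the dictionary. The query below is a hedged sketch of mine for the assumed column T1.J1; for a top-frequency histogram max(endpoint_number) is the number of rows captured by the histogram buckets.

-- Sketch: unpopular rows, unpopular NDV and the density-based frequency
-- for the assumed top-frequency histogram on T1.J1
SELECT t.num_rows - MAX (h.endpoint_number)   AS unpopular_rows,
       c.num_distinct - COUNT (*)             AS unpopular_ndv,
       t.num_rows * c.density                 AS unpopular_frequency
  FROM user_tab_histograms h, user_tab_col_statistics c, user_tables t
 WHERE h.table_name = 'T1' AND h.column_name = 'J1'
   AND c.table_name = 'T1' AND c.column_name = 'J1'
   AND t.table_name = 'T1'
 GROUP BY t.num_rows, c.num_distinct, c.density;

For the data above it should give 9, 9 and 1.000025, the inputs of the Card(Low frequency values) term.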

Case 5. Join columns with Top frequency and frequency histograms

Now consider that for the join columns there are a top frequency and a frequency histogram (TopFrequency_Frequency.sql). It is quite an interesting case. The column distributions from the dictionary (Freq_values1) are:

   t1.j1  freq        t2.j2  freq
   1      3           0      4
   2      3           1      7
   3      5           2      2
   4      5           4      3
   5      5
   6      4
   7      6
   8      4
   9      5
   25     1

In this case there is a frequency histogram on the column t2.j2 and we have the exact common values {1, 2, 4}. Because of this the frequency histogram should be our main source, and this case should have been similar to Case 3. But test cases show that the optimizer also considers all the values from the top frequency histogram which are between max(min(t1.j1), min(t2.j2)) and min(max(t1.j1), max(t2.j2)).

Table Stats::
  Table: T2  Alias: T2  #Rows: 16  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J2(NUMBER)  AvgLen: 3  NDV: 4  Nulls: 0  Density: 0.062500  Min: 0.000000  Max: 4.000000
  Histogram: Freq  #Bkts: 4  UncompBkts: 16  EndPtVals: 4  ActualVal: yes
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 42  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J1(NUMBER)  AvgLen: 3  NDV: 11  Nulls: 0  Density: 0.023810  Min: 1.000000  Max: 25.000000
  Histogram: Top-Freq  #Bkts: 41  UncompBkts: 41  EndPtVals: 10  ActualVal: yes

The considered values and their frequencies are:

   Considered value  j1.freq  j2.freq  freq*freq
   1                 3        7        21
   2                 3        2        6
   3                 5        1        5
   4                 5        3        15
                              sum      47

Here for the value 3 the j2 frequency is calculated as num_rows(t2)*density = 16*0.0625 = 1. And in the 10053 file:

Join Card:  47.000000 = outer (16.000000) * inner (42.000000) * sel (0.069940)
Join Card - Rounded: 47 Computed: 47.000000

Let us see another example, TopFrequency_Frequency2.sql:

   t1.j1  freq        t3.j3  freq
   1      3           0      4
   2      3           1      7
   3      5           2      2
   4      5           4      3
   5      5           10     2
   6      4
   7      6
   8      4
   9      5
   25     1

Table and column statistics:

Table Stats::
  Table: T3  Alias: T3  #Rows: 18  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J3(NUMBER)  AvgLen: 3  NDV: 5  Nulls: 0  Density: 0.055556  Min: 0.000000  Max: 10.000000
  Histogram: Freq  #Bkts: 5  UncompBkts: 18  EndPtVals: 5  ActualVal: yes
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 42  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J1(NUMBER)  AvgLen: 3  NDV: 11  Nulls: 0  Density: 0.023810  Min: 1.000000  Max: 25.000000
  Histogram: Top-Freq  #Bkts: 41  UncompBkts: 41  EndPtVals: 10  ActualVal: yes

The considered column values and their frequencies are:

   Considered value  j1.freq  calculated        j3.freq   calculated        freq*freq
   1                 3        freq              7         freq              21
   2                 3        freq              2         freq              6
   3                 5        freq              1.000008  num_rows*density  5.00004
   4                 5        freq              3         freq              15
   5                 5        freq              1.000008  num_rows*density  5.00004
   6                 4        freq              1.000008  num_rows*density  4.000032
   7                 6        freq              1.000008  num_rows*density  6.000048
   8                 4        freq              1.000008  num_rows*density  4.000032
   9                 5        freq              1.000008  num_rows*density  5.00004
   10                1.00002  num_rows*density  2         freq              2.00004
                                                          sum               73.000272

And from the trace file:

Join Card:  73.000000 = outer (18.000000) * inner (42.000000) * sel (0.096561)
Join Card - Rounded: 73 Computed: 73.000000

But if we compare the estimated cardinality with the actual values we see:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |     73 |     42 |
|  3 |    TABLE ACCESS FULL| T3   |      1 |     18 |     18 |
|  4 |    TABLE ACCESS FULL| T1   |      1 |     42 |     42 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T3"."J3")

As we can see there is a significant difference, 73 vs 42; the estimation error is quite big. That is why we said before that this is quite an interesting case. In my opinion, because the frequency histogram exactly describes the column distribution, its values should be the main source of the estimation process, similar to Case 3, so the optimizer should consider only the values from the frequency histogram. If we consider and walk only the values of the frequency histogram as the common values, we get the following table:

   common value  j1.freq  calculated        j3.freq  calculated  freq*freq
   1             3        freq              7        freq        21
   2             3        freq              2        freq        6
   4             5        freq              3        freq        15
   10            1.00002  num_rows*density  2        freq        2.00004
                                                     sum         44.00004

You can clearly see that such an estimate is very close to the actual number of rows.

Case 6. Join columns with Hybrid and Top frequency histograms

It is quite hard to interpret the estimation when one of the join columns has a top frequency histogram and the other a hybrid histogram (Hybrid_topfreq.sql). For example, here there is a hybrid histogram on t1.j1 and a top frequency histogram on t2.j2. The column information from the dictionary is:

   t1.j1 (hybrid)                       t2.j2 (top frequency)
   value  freq  ep_repeat_count         value  freq
   1      3     3                       1      5
   3      10    6                       2      3
   4      6     6                       3      4
   6      5     2                       4      5
   9      8     5                       5      4
   10     1     1                       6      3
   11     2     2                       7      1
   13     5     3                       26     1
                                        30     1

From the dbms_stats trace file (the same internal aggregation query as shown in Case 3, this time over "J1" of "T"."T1" with mnb 8) we get:

DBMS_STATS: > cdn 40, popFreq 12, popCnt 2, bktSize 5, bktSzFrc 0
DBMS_STATS: Evaluating hybrid histogram: cht.count 8, mnb 8, ssize 40, min_ssize 2500,
            appr_ndv TRUE, ndv 13, selNdv 0, selFreq 0, pct 100, avg_bktsize 5,
            csr.hreq TRUE, normalize TRUE

The high frequency common values are located between 1 and 7, and we have two popular values for the t1.j1 column: {3, 4}. The table and column statistics are:

Table Stats::
  Table: T2  Alias: T2  #Rows: 30  #Blks: 5  AvgRowLen: 3.00
  Column (#1): J2(NUMBER)  AvgLen: 3  NDV: 12  Nulls: 0  Density: 0.033333  Min: 1.000000  Max: 30.000000
  Histogram: Top-Freq  #Bkts: 27  UncompBkts: 27  EndPtVals: 9  ActualVal: yes
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 40  #Blks: 5  AvgRowLen: 3.00
  Column (#1): J1(NUMBER)  AvgLen: 3  NDV: 13  Nulls: 0  Density: 0.063636  Min: 1.000000  Max: 13.000000
  Histogram: Hybrid  #Bkts: 8  UncompBkts: 40  EndPtVals: 8  ActualVal: yes

Therefore the common values and their frequencies are:

   Common value  t1.freq   t2.freq  freq*freq
   1             2.54544   5        12.7272
   2             2.54544   3        7.63632
   3             6         4        24
   4             6         5        30
   5             2.54544   4        10.18176
   6             2.54544   3        7.63632
   7             2.54544   1        2.54544
                           sum      94.72704

Moreover, we have num_rows - top_freq_rows = 30 - 27 = 3 infrequent rows and NDV - top_freq_count = 12 - 9 = 3 unpopular distinct values on the t2 side. I have done several test cases, and I think the cardinality of the join in this case consists of two parts: high frequency values and low frequency (unpopular) values. Estimating the cardinality of the low frequency values differed between my test cases; in the current case I think it is based on the uniform distribution. For t1.j1 the "average frequency" is num_rows(t1)/NDV(j1) = 40/13 = 3.076923, and for each "low frequency" value we have num_rows(t1)*density(j1) = 2.54544 ≈ 3; with 3 low frequency (unpopular) rows the unpopular cardinality is 3*3 = 9, so the final cardinality will be CARD(popular rows) + CARD(unpopular rows) = 94.72704 + 9 = 103.72704.

Lines from the 10053 trace file:

Join Card:  103.727273 = outer (30.000000) * inner (40.000000) * sel (0.086439)
Join Card - Rounded: 104 Computed: 103.727273

And the execution plan:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |    104 |    101 |
|  3 |    TABLE ACCESS FULL| T2   |      1 |     30 |     30 |
|  4 |    TABLE ACCESS FULL| T1   |      1 |     40 |     40 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

The above test case was quite simple, because the popular values of the hybrid histogram are also located within the range of the high frequency values of the top frequency histogram. Let us see another example:

CREATE TABLE t1(j1 NUMBER);
INSERT INTO t1 VALUES(7);
INSERT INTO t1 VALUES(6);
INSERT INTO t1 VALUES(2);
-- ... further single-row inserts populate t1 with 20 rows in total ...
COMMIT;
EXECUTE dbms_stats.set_global_prefs('TRACE', to_char(512+128+2048+32768+4+8+16));
EXECUTE dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 8');

---Creating second table
CREATE TABLE t2(j2 NUMBER);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(3);
INSERT INTO t2 VALUES(1);
-- ... further single-row inserts populate t2 with 20 rows in total ...
COMMIT;
EXECUTE dbms_stats.gather_table_stats(null, 't2', method_opt=>'for all columns size 4');

Let us enable the 10053 trace event and explain the query:

ALTER SESSION SET EVENTS '10053 trace name context forever';
EXPLAIN PLAN FOR
SELECT COUNT ( * )
  FROM t1, t2
 WHERE t1.j1 = t2.j2;
SELECT * FROM table (DBMS_XPLAN.display);
ALTER SESSION SET EVENTS '10053 trace name context off';

The corresponding information from the dictionary:

   t1.j1 (hybrid)                       t2.j2 (top frequency)
   value  freq  ep_rep  ep_num          value  freq  ep_num
   1      1     1       1               1      3     3
   2      2     2       3               3      4     7
   4      3     1       6               4      7     14
   6      4     3       10              20     1     15
   7      4     4       14
   17     3     1       17
   18     1     1       18
   20     2     1       20

And the lines from the dbms_stats trace:

DBMS_STATS: > cdn 20, popFreq 7, popCnt 2, bktSize 2, bktSzFrc .4
DBMS_STATS: Evaluating hybrid histogram: cht.count 8, mnb 8, ssize 20, min_ssize 2500,
            appr_ndv TRUE, ndv 13, selNdv 0, selFreq 0, pct 100, avg_bktsize 3,
            csr.hreq TRUE, normalize TRUE
DBMS_STATS: Histogram gathering flags: 527
DBMS_STATS: Accepting histogram

So our average bucket size is 3 and we have two popular values, {6, 7}. These values are not part of the high frequency values in the top frequency histogram. The table and column statistics from the optimizer trace file:

Table Stats::
  Table: T2  Alias: T2  #Rows: 20  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J2(NUMBER)  AvgLen: 3  NDV: 8  Nulls: 0  Density: 0.062500  Min: 1.000000  Max: 20.000000
  Histogram: Top-Freq  #Bkts: 15  UncompBkts: 15  EndPtVals: 4  ActualVal: yes
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 20  #Blks: 1  AvgRowLen: 3.00
  Column (#1): J1(NUMBER)  AvgLen: 3  NDV: 13  Nulls: 0  Density: 0.059091  Min: 1.000000  Max: 20.000000
  Histogram: Hybrid  #Bkts: 8  UncompBkts: 20  EndPtVals: 8  ActualVal: yes

---Join Cardinality
SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
Join Card:  31.477273 = outer (20.000000) * inner (20.000000) * sel (0.078693)
Join Card - Rounded: 31 Computed: 31.477273

First let us calculate the cardinality for the high frequency values. If we try to find the t1-side frequencies of these values based on the hybrid histogram, we have to use the density:

   High freq value  j2.freq  j1.freq   freq*freq
   1                3        1.18182   3.54546
   3                4        1.18182   4.72728
   4                7        1.18182   8.27274
   20               1        1.18182   1.18182
                             sum       17.7273

So our cardinality for the high frequency values is 17.7273. Test cases show that in such situations the optimizer also tries to take advantage of the popular values; in my opinion the popular rows of the hybrid histogram play a role here. In our case the values 6 and 7 are popular and the popular frequency is 7 (the sum of their repeat counts). So the cardinality for the popular values (which fall into the low frequency part of the top frequency histogram) will be:

   popular_frequency * num_rows(t1) * density(j2) = 7 * 20 * 0.0625 = 8.75

Moreover, for every remaining "low frequency" value we have a frequency of 1.18182 ≈ 1, and we have 5 "low frequency" values (the unpopular rows of the j2 column), therefore their cardinality can be considered as 5. We also have num_rows(t1) - popular_rows(t1) = 20 - 15 = 5 unpopular rows. Eventually we can figure out the final cardinality:

   CARD = CARD(High frequency values) + CARD(Low frequency values) + CARD(Unpopular rows)
        = 17.7273 + 8.75 + 5 = 31.4773

As you can see, Oracle computed the final cardinality as 31.4773. And the execution plan:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |     31 |     26 |
|  3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
|  4 |    TABLE ACCESS FULL| T2   |      1 |     20 |     20 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

So it is the expected cardinality, although in general there can be estimation or approximation errors related to rounding.

Sampling Based Estimation

As we know, a new dynamic sampling feature has been introduced in Oracle Database 12c. Dynamic sampling level 11 is designed for operations like single table access, group by and joins, for which Oracle automatically defines the sample size and tries to estimate the cardinality of the operation. Let us look at the following example and try to understand the sampling mechanism in join size estimation.

CREATE TABLE t1 AS SELECT * FROM dba_users;
CREATE TABLE t2 AS SELECT * FROM dba_objects;
EXECUTE dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1');
EXECUTE dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1');

SELECT COUNT (*)
  FROM t1, t2
 WHERE t1.username = t2.owner;

Without a histogram and in default sampling mode the execution plan is:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |  92019 |  54942 |
|  3 |    TABLE ACCESS FULL| T1   |      1 |     42 |     42 |
|  4 |    TABLE ACCESS FULL| T2   |      1 |  92019 |  92019 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."USERNAME"="T2"."OWNER")

Without a histogram and with automatic sampling mode the execution plan is:

---------------------------------------------------------------
| Id | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|  0 | SELECT STATEMENT    |      |      1 |        |      1 |
|  1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |
|* 2 |   HASH JOIN         |      |      1 |  58728 |  54942 |
|  3 |    TABLE ACCESS FULL| T1   |      1 |     42 |     42 |
|  4 |    TABLE ACCESS FULL| T2   |      1 |  92019 |  92019 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."USERNAME"="T2"."OWNER")

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)

As we can see, without a histogram there is a significant difference between the actual and the estimated rows, but when automatic (adaptive) sampling is enabled the estimate is good enough. The question is: how did the optimizer actually arrive at the cardinality 58728? How did it calculate it? To explain it we can use the 10046 and 10053 trace events. In the SQL trace file we see the following lines:

SQL ID: 1bgh7fk6kqxg7 Plan Hash: 3696410285

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
         optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */
       SUM(C1)
  FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2#0") */ 1 AS C1
          FROM "T2" SAMPLE BLOCK(51.8135, 8) SEED(1) "T2#0", "T1" "T1#1"
         WHERE ("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery

call     count  cpu    elapsed  disk  query  current  rows
-------  -----  -----  -------  ----  -----  -------  ----
Parse        1   0.00     0.00     0      0        0     0
Execute      1   0.00     0.00     0      0        0     0
Fetch        1   0.06     0.05     0    879        0     1
-------  -----  -----  -------  ----  -----  -------  ----
total        3   0.06     0.05     0    879        0     1

Rows   Row Source Operation
-----  ---------------------------------------------------
    1  SORT AGGREGATE (cr=879 pr=0 pw=0 time=51540 us)
30429   HASH JOIN (cr=879 pr=0 pw=0 time=58582 us cost=220 size=1287306 card=47678)
   42    TABLE ACCESS FULL T1 (cr=3 pr=0 pw=0 time=203 us cost=2 size=378 card=42)
51770    TABLE ACCESS SAMPLE T2 (cr=876 pr=0 pw=0 time=35978 us cost=218 size=858204 card=47678)

During parsing Oracle executed this SQL statement and the result was used to estimate the size of the join. The statement used sampling (in an undocumented format) and actually read about 50 percent of the T2 table blocks. Sampling was not applied to the T1 table because its size is quite small compared to the second table, and 100% sampling of T1 does not consume a lot of time during parsing. It means Oracle first identifies the appropriate sample size based on the table size and then executes a specific SQL statement. So we get 30429 rows from a 51.8135 percent sample, therefore our estimated cardinality is 30429/51.8135*100 = 58727.94 ≈ 58728. Now let us check the optimizer trace file:

SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
Join Card:  92019.000000 = outer (42.000000) * inner (92019.000000) * sel (0.023810)
>> Join Card adjusted from 92019.000000 to 58727.970000 due to adaptive dynamic sampling, prelen=2
Adjusted Join Cards: adjRatio=0.638216 cardHjSmj=58727.970000 cardHjSmjNPF=58727.970000
                     cardNlj=58727.970000 cardNSQ=58727.970000 cardNSQ_na=92019.000000
Join Card - Rounded: 58728 Computed: 58727.970000

Let us see what happens if we increase the sizes of both tables (using repeated "insert into t select * from t"):

   table name  blocks  row nums  size mb
   T1          3186    172032    25
   T2          6158    368076    49

In this case Oracle completely ignores adaptive sampling and uses the uniform distribution to estimate the join size:

Table Stats::
  Table: T2  Alias: T2  #Rows: 368076  #Blks: 6158  AvgRowLen: 115.00
  Column (#1): OWNER(VARCHAR2)  AvgLen: 6  NDV: 31  Nulls: 0  Density: 0.032258
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 172032  #Blks: 3186  AvgRowLen: 127.00
  Column (#1): USERNAME(VARCHAR2)  AvgLen: 9  NDV: 42  Nulls: 0  Density: 0.023810

Join Card:  1507639296.000000 = outer (172032.000000) * inner (368076.000000) * sel (0.023810)

In addition, in the SQL trace file we find:

SQL ID: 0ck072zj5gf73 Plan Hash: 3774486692

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
         optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */
       SUM(C1)
  FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2#0") */ 1 AS C1
          FROM "T2" SAMPLE BLOCK(12.9912, 8) SEED(1) "T2#0", "T1" "T1#1"
         WHERE ("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery

call     count  cpu    elapsed  disk  query  current  rows
-------  -----  -----  -------  ----  -----  -------  ----
Parse        1   0.00     0.00     0      2        0     0
Execute      1   0.00     0.00     0      0        0     0
Fetch        1   1.70     1.91     0    885        0     0
-------  -----  -----  -------  ----  -----  -------  ----
total        3   1.70     1.91     0    887        0     0

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT AGGREGATE (cr=0 pr=0 pw=0 time=36 us)
4049738   HASH JOIN (cr=885 pr=0 pw=0 time=2696440 us cost=1835 size=5288231772 card=195860436)
  44649    TABLE ACCESS SAMPLE T2 (cr=761 pr=0 pw=0 time=28434 us cost=218 size=860706 card=47817)
   6468    TABLE ACCESS FULL T1 (cr=124 pr=0 pw=0 time=28902 us cost=866 size=1548288 card=172032)

It is obvious that Oracle stopped the execution of this SQL during parsing: the HASH JOIN operation was not completed, which we can confirm from the result of the statement (the fetch returned no rows) and from the row source statistics. The sizes of the tables are actually not big, so why did the optimizer ignore the sample and decide to continue with the previous approach? In my opinion there could be two factors: although the sample size is not small, in our case the sampling SQL took quite a long time during parsing (1.8 seconds elapsed), therefore Oracle stopped it.

I have added one filter predicate to the query:

SELECT COUNT (*)
  FROM t1, t2
 WHERE t1.username = t2.owner
   AND t2.object_type = 'TABLE';

In this case we get the following lines:

SQL ID: 8pu5v8h0ghy1z Plan Hash: 3252009800

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
         optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */
       SUM(C1)
  FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2") */ 1 AS C1
          FROM "T2" SAMPLE BLOCK(12.9912, 8) SEED(1) "T2"
         WHERE ("T2"."OBJECT_TYPE"='TABLE')) innerQuery

call     count  cpu    elapsed  disk  query  current  rows
-------  -----  -----  -------  ----  -----  -------  ----
Parse        1   0.00     0.00     0      2        0     0
Execute      1   0.00     0.00     0      0        0     0
Fetch        1   0.01     0.01     0    761        0     1
-------  -----  -----  -------  ----  -----  -------  ----
total        3   0.01     0.01     0    763        0     1

Rows   Row Source Operation
-----  ---------------------------------------------------
    1  SORT AGGREGATE (cr=761 pr=0 pw=0 time=14864 us)
  756   TABLE ACCESS SAMPLE T2 (cr=761 pr=0 pw=0 time=5969 us cost=219 size=21378 card=1018)

********************************************************************************

SQL ID: 9jv79m9u42jps Plan Hash: 3525519047

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
         optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
         OPT_ESTIMATE(@"innerQuery", TABLE, "T2#0", ROWS=5819.31) */
       SUM(C1)
  FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T1#1") */ 1 AS C1
          FROM "T1" SAMPLE BLOCK(25.1099, 8) SEED(1) "T1#1", "T2" "T2#0"
         WHERE ("T2#0"."OBJECT_TYPE"='TABLE') AND ("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery

call     count  cpu    elapsed  disk  query  current  rows
-------  -----  -----  -------  ----  -----  -------  ----
Parse        1   0.01     0.00     0      2        0     0
Execute      1   0.00     0.00     0      0        0     0
Fetch        1   0.74     1.02     0   6283        0     0
-------  -----  -----  -------  ----  -----  -------  ----
total        3   0.76     1.02     0   6285        0     0

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT AGGREGATE (cr=0 pr=0 pw=0 time=32 us)
1412128   HASH JOIN (cr=6283 pr=0 pw=0 time=1243755 us cost=1908 size=215466084 card=5985169)
   9880    TABLE ACCESS FULL T2 (cr=6167 pr=0 pw=0 time=20665 us cost=1674 size=87285 card=5819)
   6035    TABLE ACCESS SAMPLE T1 (cr=116 pr=0 pw=0 time=6069 us cost=218 size=907137 card=43197)

It means Oracle first tried to estimate the size of the T2 table, because it has a filter predicate and the optimizer thinks using ADS could be very efficient. However, in this case only the T2 table was estimated; if we had added a predicate like t2.owner = 'HR' then the optimizer would also have tried to estimate the T1 table cardinality. But the mechanism of estimating a subset of the join and then estimating the whole join was actually ignored in this case. We can easily see this fact from the trace file:

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T2  Alias: T2  #Rows: 368144  #Blks: 6158  AvgRowLen: 115.00
  Column (#6): OBJECT_TYPE(VARCHAR2)  AvgLen: 9  NDV: 47  Nulls: 0  Density: 0.021277
  Column (#1): OWNER(VARCHAR2)  AvgLen: 6  NDV: 31  Nulls: 0  Density: 0.032258
***********************
Table Stats::
  Table: T1  Alias: T1  #Rows: 172032  #Blks: 3186  AvgRowLen: 127.00
  Column (#1): USERNAME(VARCHAR2)  AvgLen: 9  NDV: 42  Nulls: 0  Density: 0.023810

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  ** Performing dynamic sampling initial checks. **
  ** Not using old style dynamic sampling since ADS is enabled.
  Table: T1  Alias: T1
    Card: Original: 172032.000000  Rounded: 172032  Computed: 172032.000000  Non Adjusted: 172032.000000

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T2[T2]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  ** Performing dynamic sampling initial checks. **
  ** Not using old style dynamic sampling since ADS is enabled.
  >> Single Tab Card adjusted from 7832.310000 to 5819.310000 due to adaptive dynamic sampling
  Table: T2  Alias: T2
    Card: Original: 368144.000000  Rounded: 5819  Computed: 5819.310000  Non Adjusted: 7832.310000
  Access Path: TableScan
    Cost: 866.262422  Resp: 866.262422  Degree: 0  Cost_io: 865.000000  Cost_cpu: 48493708
  Best:: AccessPath: TableScan
    Cost: 866.262422  Degree: 1  Resp: 866.262422

SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
Join Card:  23835893.760000 = outer (5819.310000) * inner (172032.000000) * sel (0.023810)
Join Card - Rounded: 23835894 Computed: 23835893.760000
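The adjusted single-table cardinality lines up with the sampled result shown earlier: the recursive statement counted 756 rows while sampling 12.9912 percent of T2's blocks, and 756/12.9912*100 = 5819.3. A hedged way to reproduce it manually (my own sketch, not the optimizer's internal call) is:

-- Sketch: block-sample T2 with the same percentage and seed as the DS_SVC
-- statement and scale the filtered count back up to a full-table estimate
SELECT COUNT (*) / 12.9912 * 100 AS estimated_card
  FROM t2 SAMPLE BLOCK (12.9912) SEED (1)
 WHERE object_type = 'TABLE';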

In the last case I increased both table sizes:

   table name  blocks   row nums   size mb
   T1          101950   5505024    800
   T2          196807   11780608   1600

In this case Oracle completely ignored ADS and used the statistics from the dictionary to estimate the table sizes and the join cardinality.

Summary

This paper has explained the mechanism the Oracle optimizer uses to calculate join selectivity and join cardinality. We learned that the optimizer first calculates the join selectivity based on the "pure" cardinality. To estimate the "pure" cardinality the optimizer identifies "(distinct value, frequency)" pairs for each column, based on the column distribution, and the column distribution is described by the histogram. A frequency histogram gives us the complete data distribution of a column. A top frequency histogram gives us enough information for the high frequency values, while for the less significant values we can assume a uniform distribution. Moreover, if there are hybrid histograms on the join columns in the dictionary, the optimizer can use the endpoint repeat counts to derive the frequencies. In addition, the optimizer has the chance to estimate join cardinality via sampling, although this process is influenced by time restrictions and by the size of the tables; as a result the optimizer can completely ignore adaptive dynamic sampling.

References
• Lewis, Jonathan. Cost-Based Oracle: Fundamentals. Apress, 2006
• Alberto Dell'Era. Join Over Histograms. 2007, http://www.adellera.it/investigations/join_over_histograms/JoinOverHistograms.pdf
• Chinar Aliyev. Automatic Sampling in Oracle 12c. 2014
• https://www.toadworld.com/platforms/oracle/w/wiki/11052.automaticadaptive-dynamic-sampling-in-oracle-12c-part-2
• https://www.toadworld.com/platforms/oracle/w/wiki/11036.automaticadaptive-dynamic-sampling-in-oracle-12c-part-3