Analytic Functions
- Cube, Rollup, Grouping Sets
- New aggregates: Inverse Distribution, FIRST/LAST, etc.
- Window functions: Rank, Moving, Cumulative
- Statistical functions: Correlation, Linear Regression, etc.
Old tools still have more modeling power than SQL. SQL Model enhances SQL with modeling power.
Case Study: Modeling with Excel. At the personal scale, Excel fits well.
Why Better?
- Inter-row calculation: treats relations as an N-dim array
- Symbolic references to cells and their ranges
- Multiple formulas over N-dim arrays
- Automatic formula ordering
- Recursive model solving
- Model is a relation and can be processed further in SQL
- Multiple arrays with different dimensionality in one query
- Automatic consolidation (models as views combine using SQL)
- Self-adjusting (as the database changes, there is no need to redefine the model)
Performance
- Parallel processing in partitioning and formulas
- Multiple self-joins with one data access structure
- Multiple UNIONs with one data access structure
Relation (excerpt):
  prod | time | s
  vcr  | 2001 | 9
  dvd  | 2001 | 0

Array (s by prod and time):
  prod | 1999 | 2000 | 2001
  vcr  |    1 |    5 |    9
  dvd  |    2 |    6 |    0
  tv   |    3 |    7 |    1
  pc   |    4 |    8 |    2
SELECT prod, time, s FROM sales
MODEL
  DIMENSION BY (prod, time) MEASURES (s)
  RULES UPSERT (
    s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,   -- sales in 2000 are 2x the previous year
    s['vcr', 2002] = s['vcr', 2001] + s['vcr', 2000], -- predict vcr sales in 2002
    s['dvd', 2002] = AVG(s)[CV(prod), time < 2001]    -- predict dvd sales in 2002
  )
After rule 1, s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2 (sales in 2000 are 2x the previous year):
  prod | 1999 | 2000 | 2001
  vcr  |    1 |    2 |    9
  dvd  |    2 |    4 |    0
  tv   |    3 |    6 |    1
  pc   |    4 |    8 |    2

Rule 2 adds s[vcr, 2002] = 9 + 2 = 11. Rule 3 adds s[dvd, 2002] = AVG(2, 4) = 3.

Return as Relation: SELECT prod, time, s FROM sales ... returns the final array as rows:
  prod | 1999 | 2000 | 2001 | 2002
  vcr  |    1 |    2 |    9 |   11
  dvd  |    2 |    4 |    0 |    3
  tv   |    3 |    6 |    1 |
  pc   |    4 |    8 |    2 |
- Data as N-dim arrays with DIMENSIONS & MEASURES
- Data can be PARTITION-ed: creates an array per partition
- Formulas defined over the arrays express a (business) model. Formulas:
  - use symbolic addressing with familiar array notation
  - can be ordered automatically based on dependencies between cells
  - can be recursive, with a convergence condition (recursive models)
  - can UPDATE or UPSERT cells
  - support most SQL functions, including aggregates
- The result can participate in further processing via joins, etc.
- Can define views containing Model computations
- Executed after joins, aggregation and window functions, before ORDER BY
- Can relate models of different dimensionality

Formula Options
- Single-cell references: s['vcr', 2002] = s['vcr', 2001] + s['vcr', 2000]
- Multi-cell reference on the right: s['vcr', 2002] = AVG(s)['vcr', t < 2002]
- Multi-cell reference on the left: s[p IN ('vcr', 'dvd'), t < 2002] = 1000
- Left-right correlation with CV(): s[ANY, t = 2002] = 2 * s[CV(p), CV(t) - 1]
- Subquery in a left-side dimension: s[p IN (SELECT prod FROM prod_tb), 2000] = 1000
- ORDER BY t: sets the order in which a formula evaluates its left-side cells
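Automatic formula ordering amounts to a dependency sort: a formula must run after every formula that writes a cell it reads. The Python sketch below illustrates the idea with the standard-library graphlib. It is not how Oracle orders rules internally. The (cells written, cells read) encoding of each rule is a hypothetical simplification. A cycle, such as the ledger formulas F1 to F3 later in the deck, cannot be ordered. That case corresponds to the error reported when ITERATE is absent.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical encoding: each rule maps to (cells it writes, cells it reads).
Rules = dict[str, tuple[set, set]]


def order_rules(rules: Rules) -> list[str]:
    """Order rules so each runs after every rule that writes a cell it reads."""
    graph = {
        name: {
            other
            for other, (writes, _) in rules.items()
            if other != name and writes & reads  # `other` produces an input of `name`
        }
        for name, (_, reads) in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


def is_cyclic(rules: Rules) -> bool:
    try:
        order_rules(rules)
    except CycleError:
        return True
    return False


prods = ["vcr", "dvd", "tv", "pc"]
# The three rules from the first example, listed in reverse dependency order.
example: Rules = {
    "r3": ({("dvd", 2002)}, {("dvd", 1999), ("dvd", 2000)}),
    "r2": ({("vcr", 2002)}, {("vcr", 2000), ("vcr", 2001)}),
    "r1": ({(p, 2000) for p in prods}, {(p, 1999) for p in prods}),
}
# The ledger formulas F1..F3 read each other's outputs.
ledger: Rules = {
    "F1": ({"interest"}, {"net"}),
    "F2": ({"net"}, {"salary", "interest", "tax"}),
    "F3": ({"tax"}, {"salary", "interest", "capital_gain"}),
}

print(order_rules(example)[0])  # r1
print(is_cyclic(ledger))        # True
```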
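KEEP NAV (missing cells stay NULL):
MODEL KEEP NAV
  PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
  RULES UPSERT (
    s['dvd', 2003] = s['dvd', 2002] + s['dvd', 2001],
    s['tv', 2003] = SUM(s)['tv', t BETWEEN 2001 AND 2002]
  )

  r    | p   | t    | s
  West | dvd | 2001 | 300.00
  West | tv  | 2002 | 500.00
  West | dvd | 2003 | (null)
  West | tv  | 2003 | 500.00
  West | vcr | 2001 | 200.00
  West | vcr | 2002 | 400.00

s[dvd, 2003] is null because s[dvd, 2002] does not exist, and NULL + 300.00 is NULL.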
IGNORE NAV (missing numeric cells are assumed to be 0):
MODEL IGNORE NAV
  PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
  RULES UPSERT (
    s['dvd', 2003] = s['dvd', 2002] + s['dvd', 2001],
    s['tv', 2003] = SUM(s)['tv', t BETWEEN 2001 AND 2002]
  )

  r    | p   | t    | s
  West | dvd | 2001 | 300.00
  West | tv  | 2002 | 500.00
  West | dvd | 2003 | 300.00
  West | tv  | 2003 | 500.00
  West | vcr | 2001 | 200.00
  West | vcr | 2002 | 400.00
UPDATE vs. UPSERT:
MODEL IGNORE NAV
  PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
  RULES UPDATE (
    UPDATE s['vcr', 2002] = s['vcr', 2002] * 1.2,
    UPSERT s['dvd', 2003] = s['dvd', 2002] + s['dvd', 2001]
  )

  Product | Time |
  dvd     | 2001 |
  dvd     | 2002 |
  vcr     | 2002 | updated
  dvd     | 2003 | upserted (new cell)
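The following Python sketch shows the UPDATE/UPSERT distinction. The slide does not show the cell values, so the numbers used here are hypothetical. The assign helper is also illustrative. UPDATE only changes an existing cell. UPSERT also creates a missing one, which is how dvd 2003 appears in the result.

```python
# Hypothetical sketch: cell values are made up; the slides do not show them.
def assign(cells, key, value, mode):
    """UPDATE only changes an existing cell; UPSERT also inserts a missing one."""
    if mode == "UPDATE" and key not in cells:
        return False  # UPDATE skips missing cells
    cells[key] = value
    return True


s = {("dvd", 2001): 100.0, ("dvd", 2002): 150.0, ("vcr", 2002): 250.0}

# UPDATE s[vcr, 2002] = s[vcr, 2002] * 1.2  -> existing cell, updated
assign(s, ("vcr", 2002), s[("vcr", 2002)] * 1.2, "UPDATE")
# UPSERT s[dvd, 2003] = s[dvd, 2002] + s[dvd, 2001]  -> new cell, inserted
assign(s, ("dvd", 2003), s[("dvd", 2002)] + s[("dvd", 2001)], "UPSERT")
# An UPDATE aimed at a missing cell leaves it absent:
created = assign(s, ("vcr", 2003), 1.0, "UPDATE")
print(sorted(s), created)
```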
Different dimensions: reference models. To relate models with different dimensions, represent each as an n-dimensional array: one main array, and the others as reference (lookup) arrays.
Sales table:
  c      | p   | t    | s
  USA    | dvd | 2001 | 300.00 $
  USA    | tv  | 2001 | 500.00 $
  Poland | vcr | 2001 | 200.00 zl
  France | vcr | 2001 | 100.00 fr

Conv table (converts currency to $):
  c      | ratio
  USA    | 1
  Poland | 0.24
  France | 0.12
SELECT c, p, t, s FROM sales
MODEL
  REFERENCE convert ON (SELECT c, ratio FROM conv)
    DIMENSION BY (c) MEASURES (ratio r)
  DIMENSION BY (c, p, t) MEASURES (s)
  RULES UPSERT (
    s[ANY, ANY, ANY] = r[CV(c)] * s[CV(c), CV(p), CV(t)]
  )
Converted values:
  c      | p   | t    | s
  USA    | dvd | 2001 | 300.00 $
  USA    | tv  | 2001 | 500.00 $
  Poland | vcr | 2001 |  48.00 $
  France | vcr | 2001 |  12.00 $
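The following Python sketch mirrors the reference-model lookup. The conv dict plays the role of the reference array r[c], and each sales row is multiplied by the ratio for its own country (CV(c)). The data structures are illustrative stand-ins, not Oracle objects.

```python
# Hypothetical sketch: the reference model as a lookup dict, applied to every sales row.
sales = [
    ("USA", "dvd", 2001, 300.00),
    ("USA", "tv", 2001, 500.00),
    ("Poland", "vcr", 2001, 200.00),
    ("France", "vcr", 2001, 100.00),
]
conv = {"USA": 1.0, "Poland": 0.24, "France": 0.12}  # r[c], converts to $


def convert(rows, ratio):
    # s[ANY, ANY, ANY] = r[CV(c)] * s[CV(c), CV(p), CV(t)]
    # Assumes every country has a ratio; a missing one would be NULL in SQL.
    return [(c, p, t, round(ratio[c] * s, 2)) for c, p, t, s in rows]


for row in convert(sales, conv):
    print(row)
```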
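Recursive Model Solving with UNTIL
A model can contain cyclic (recursive) formulas.
- Cyclic formulas are detected automatically and an error is reported, unless the cycles are intentional, which is indicated with the ITERATE option.
- Use the ITERATE clause to specify the number of iterations, or UNTIL to specify a convergence condition. Iteration stops when the UNTIL condition is true.

SELECT x, s FROM dual
MODEL
  DIMENSION BY (1 x) MEASURES (1024 s)
  RULES ITERATE (10000) UNTIL (PREVIOUS(s[1]) - s[1] <= 1) (
    s[1] = s[1] / 2
  )

  Iteration | s value
  1         | 1024
  2         | 512
  3         | 256
  4         | 128
  5         | 64
  6         | 32
  7         | 16
  8         | 8
  9         | 4
  10        | 2

In iteration 8, s[1] goes from 8 to 4, so PREVIOUS(s[1]) - s[1] = 4 and iteration continues. In iteration 10, s[1] goes from 2 to 1, the difference is 1, and the UNTIL condition stops the loop with s[1] = 1.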
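The following Python loop reproduces the ITERATE ... UNTIL behavior. PREVIOUS is taken to be the cell's value at the start of the current iteration, and UNTIL is tested after each iteration. This is a sketch of the semantics, not Oracle's engine. The function name halve_until is hypothetical.

```python
def halve_until(s: float = 1024.0, max_iterations: int = 10000) -> tuple[float, int]:
    """s[1] = s[1] / 2 with ITERATE (max_iterations) UNTIL (PREVIOUS(s[1]) - s[1] <= 1)."""
    iterations = 0
    while iterations < max_iterations:
        previous = s           # PREVIOUS(s[1]): value at the start of the iteration
        s = s / 2              # the single (self-referencing) rule
        iterations += 1
        if previous - s <= 1:  # UNTIL is tested after each iteration
            break
    return s, iterations


print(halve_until())  # (1.0, 10)
```

Lowering max_iterations shows ITERATE acting as a hard cap. With max_iterations=3, the loop stops at 128.0 before the UNTIL condition holds.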
Time Series Calculation (1)
Compute the ratio of current month sales of each product to sales one year ago, one quarter ago and one month ago. Assume a sales cube with product sales per year, quarter and month, and a time table mapping periods to the prior year, quarter and month.

Time table (maps t to y_ago, q_ago, m_ago):
  t       | y_ago   | q_ago   | m_ago
  1999m01 | 1998m01 | 1998m10 | 1998m12
  1999m02 | 1998m02 | 1998m11 | 1999m01

Sales cube (product sales per y, q, m):
  t       | product | sales
  1999m01 | vcr     | 100.00
  1999m02 | vcr     | 120.00
  1999q01 | vcr     | 360.00

CV carries values from the left side to the right side. Without MODEL, you need 3 outer joins and a regular join.
SELECT product, sales, r_y_ago, r_q_ago, r_m_ago
FROM sales_cube
MODEL
  REFERENCE r ON (SELECT * FROM time)
    DIMENSION BY (t) MEASURES (y_ago, q_ago, m_ago)
  PARTITION BY (product)
  DIMENSION BY (t)
  MEASURES (sales, 0 r_y_ago, 0 r_q_ago, 0 r_m_ago)
  RULES (
    r_y_ago[ANY] = sales[CV(t)] / sales[y_ago[CV(t)]],  -- year ago
    r_q_ago[ANY] = sales[CV(t)] / sales[q_ago[CV(t)]],  -- quarter ago
    r_m_ago[ANY] = sales[CV(t)] / sales[m_ago[CV(t)]]   -- month ago
  );
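The following Python sketch mimics the nested lookup sales[y_ago[CV(t)]] for the vcr partition. The reference model supplies the prior period, and the main model supplies the sales figure. The slides do not give 1998 sales, so those values are hypothetical, as is the ratio helper. A period missing from the time table, or a prior-period cell that does not exist, yields None, the stand-in for NULL.

```python
# Reference model: DIMENSION BY (t) MEASURES (y_ago, q_ago, m_ago), from the time table.
time_table = {
    "1999m01": {"y_ago": "1998m01", "q_ago": "1998m10", "m_ago": "1998m12"},
    "1999m02": {"y_ago": "1998m02", "q_ago": "1998m11", "m_ago": "1999m01"},
}
# Main model, vcr partition. The 1998 sales figures are hypothetical.
sales = {"1998m01": 80.0, "1998m02": 100.0, "1998m10": 50.0, "1998m11": 60.0,
         "1998m12": 200.0, "1999m01": 100.0, "1999m02": 120.0}


def ratio(t: str, lookup: str):
    """sales[CV(t)] / sales[lookup[CV(t)]], or None where SQL would give NULL."""
    ago = time_table.get(t, {}).get(lookup)  # NULL if t is not in the reference model
    if ago is None or ago not in sales:
        return None
    return sales[t] / sales[ago]


ratios = {t: {col: ratio(t, col) for col in ("y_ago", "q_ago", "m_ago")} for t in sales}
print(ratios["1999m01"])  # {'y_ago': 1.25, 'q_ago': 2.0, 'm_ago': 0.5}
```

Each ratio needs one lookup in the reference array and one in the main array. In ANSI SQL, each of those lookups becomes a join.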
Time Series Calculation (3)
Compute the ratio of current period sales of each product to sales a year ago, a quarter ago and a month ago. For each row, the reference model is used to find 3 other rows.

Sales cube with ratios, first row: t = 1999m01, product vcr, sales 100.00, r_y_ago = 0.050, r_q_ago = 0.280, r_m_ago = 0.830.
The remaining rows (sales 120.00, 360.00 and 370.00) receive ratios such as 0.160 and 0.970, and null where a ratio cannot be computed.
Recursive Model Solving: Ledger (1)
In my ledger, I have accounts: Net income, Interest, Taxes, etc.
- I want to have 30% of my Net income as Interest (F1).
- My Net income is Salary minus Interest, minus Tax (F2).
- Taxes are 38% of Gross (salary - interest) and 28% of Capital_gain (F3).

SELECT account, b FROM ledger
MODEL IGNORE NAV
  DIMENSION BY (account) MEASURES (balance b)
  RULES ITERATE (..) UNTIL .. (
    b['interest'] = b['net'] * 0.30,                                            -- F1
    b['net'] = b['salary'] - b['interest'] - b['tax'],                          -- F2
    b['tax'] = (b['salary'] - b['interest']) * 0.38 + b['capital_gain'] * 0.28  -- F3
  )
[Figure: formulas F1, F2 and F3 form a cycle among net, interest and tax]
Recursive Model Solving: Ledger (2)
In my ledger, I know Salary & Capital gains. What are my Net income, Interest expense & Taxes?

SELECT account, b FROM ledger
MODEL IGNORE NAV
  DIMENSION BY (account) MEASURES (balance b)
  RULES ITERATE (10000) UNTIL (ABS(b['net'] - PREVIOUS(b['net'])) < 0.01) (
    b['interest'] = b['net'] * 0.30,
    b['net'] = b['salary'] - b['interest'] - b['tax'],
    b['tax'] = (b['salary'] - b['interest']) * 0.38 + b['capital_gain'] * 0.28
  )

Input ledger:
  Account      | Balance
  salary       | 100,000
  capital_gain | 15,000
  net          | 0
  tax          | 0
  interest     | 0
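The following Python sketch iterates the three ledger formulas on the input ledger. It assumes the formulas are evaluated in the order written within each iteration, and it stops when net changes by less than 0.01, mirroring the UNTIL condition. The function solve_ledger is a hypothetical illustration, not Oracle's solver.

```python
def solve_ledger(salary: float, capital_gain: float,
                 max_iterations: int = 10000, tolerance: float = 0.01):
    """Iterate F1..F3 in written order until net changes by less than tolerance."""
    b = {"salary": salary, "capital_gain": capital_gain,
         "net": 0.0, "tax": 0.0, "interest": 0.0}
    for iteration in range(1, max_iterations + 1):
        previous_net = b["net"]                                # PREVIOUS(b[net])
        b["interest"] = b["net"] * 0.30                        # F1
        b["net"] = b["salary"] - b["interest"] - b["tax"]      # F2
        b["tax"] = ((b["salary"] - b["interest"]) * 0.38
                    + b["capital_gain"] * 0.28)                # F3
        if abs(b["net"] - previous_net) < tolerance:           # UNTIL
            break
    return b, iteration


ledger, n = solve_ledger(100_000, 15_000)
print(n, {k: round(v, 2) for k, v in ledger.items()})
```

Substituting F1 and F3 into F2 gives net x 1.186 = 0.62 x 100,000 - 0.28 x 15,000 = 57,800. The iteration should therefore settle near net = 48,735.24, interest = 14,620.57 and tax = 36,644.18, give or take a cent from the stopping tolerance.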
Financial Functions: NPV (1)
NPV: net present value of a series of periodic cash flows.

$$\mathrm{NPV} = \sum_{i=0}^{n} \frac{\mathrm{amount}_i}{(1 + \mathrm{rate})^{i}}$$

Cash_Flow table:
  year | i | prod | amount
  1999 | 0 | vcr  | -100.00
       | 1 | vcr  |   12.00
       | 2 | vcr  |   10.00
       | 3 | vcr  |   20.00
       | 0 | dvd  | -200.00
  2000 | 1 | dvd  |   22.00

Each npv cell adds the discounted amount to the previous npv:
  npv[1] = amount[1]/power(1+rate,1) + npv[1-1]
  ...
  npv[3] = amount[3]/power(1+rate,3) + npv[3-1]
In general, npv[i] = amount[i]/power(1+rate, i) + npv[i-1], which is written as a MODEL formula:
  npv[ANY] ORDER BY i = amount[CV(i)] / power(1+rate, CV(i)) + npv[CV(i) - 1]
Financial Functions: NPV (2)
Net present value of a series of periodic cash flows.

Cash_Flow table and npv for rate = 0.14 (excerpt):
  year | i | prod | amount  | npv
  1999 | 0 | vcr  | -100.00 | -100.00
  2000 | 1 | dvd  |   22.00 | -180.70

SELECT year, i, prod, amount, npv
FROM cash_flow
MODEL
  PARTITION BY (prod)
  DIMENSION BY (i)
  MEASURES (amount, 0 npv, year)
  RULES (
    npv[0] = amount[0],
    npv[i != 0] ORDER BY i = amount[CV()] / POWER(1.14, CV()) + npv[CV() - 1]
  )
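The following Python sketch computes the same recursion per partition: npv[0] = amount[0], then each later npv adds the discounted amount to the previous one, in i order. It assumes the vcr amounts for i = 1, 2, 3 are 12.00, 10.00 and 20.00, as read from the flattened Cash_Flow table. The function npv_series is hypothetical.

```python
def npv_series(amounts: list[float], rate: float) -> list[float]:
    """npv[0] = amount[0]; npv[i] = amount[i] / (1 + rate)**i + npv[i - 1], in i order."""
    npv: list[float] = []
    for i, amount in enumerate(amounts):  # ORDER BY i
        discounted = amount / (1 + rate) ** i
        npv.append(discounted if i == 0 else discounted + npv[i - 1])
    return npv


# One list per PARTITION BY (prod), indexed by i.
cash_flow = {"vcr": [-100.00, 12.00, 10.00, 20.00], "dvd": [-200.00, 22.00]}
npv = {prod: npv_series(amounts, 0.14) for prod, amounts in cash_flow.items()}
print({prod: [round(v, 2) for v in values] for prod, values in npv.items()})
# {'vcr': [-100.0, -89.47, -81.78, -68.28], 'dvd': [-200.0, -180.7]}
```

The dvd value at i = 1, -200.00 + 22.00 / 1.14 = -180.70, matches the npv shown on the slide. Processing i in ascending order is what guarantees that npv[i - 1] is ready when npv[i] needs it.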
ANSI Joins
The ANSI SQL version needs an outer join for each formula plus a join for each reference model: N formulas and M reference models require N + M joins. The time-series example needs 4 joins: sales_cube joined to time, plus three outer joins of sales_cube to itself (year ago, quarter ago, month ago).
Summary
- New facility for spreadsheet-like computations in SQL
- High performance:
  - replaces multiple joins and unions
  - scalable in size and parallel processing
  - powerful optimizations
- Collaborative analysis
- Move external processing such as spreadsheets into the RDBMS for manageability and consolidation
Next Steps
Demonstration at Oracle DEMOgrounds
- Exhibit hall, Booth 1326, Database Area
- Monday: 5:00 PM - 8:00, Tuesday: 10:30 - 1:00, 3:00 - 6:00, Wednesday: 11:00 - 4:30, Thursday: 10:30 - 2:00
Hands-on Lab
- Marriott Hotel, Golden Gate B1
- Lab section: Use Information from your Data Warehouse
- Lesson 1: Using the SQL Model clause
- Monday: 10:30 - 5:00, Tuesday: 8:30 - 12:30, 3:00 - 5:00, Wednesday: 8:30 - 4:30, Thursday: 8:30 - 2:30
Reminder: please complete the OracleWorld online session survey. Thank you.
Q & A