Data Warehousing concepts By Bibhu Datta Rout

Agenda
Introduction
Basic Concepts
Extraction, Transformation and Loading
Schema Modeling
SQL for Aggregation

Introduction
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It separates the analysis workload from the transactional workload and enables an organization to consolidate data from several sources.

Introduction
Who was the best customer last quarter?
How did the P/E of a stock move over the whole year?
Which department produced the maximum profit in the current financial year? (Variable allowance and bonus calculation)
Producing the control chart to check adherence to company process on the basis of the defect log mechanism.
The answer to all of the above: a managed data warehouse.

Introduction Architecture

Basic Concepts
End users are interested only in aggregate data rather than individual transactions.
So an effective logical and physical design is the first requirement.
Entity-Relationship (ER) modeling involves identifying the things of importance (entities), the properties of these things (attributes) and how they relate to one another (relationships).
Tools used for ER modeling include Oracle Warehouse Builder and Oracle Designer.

Basic Concepts
Three basic schemas used
Third Normal Form (3NF) schema
Star schema
Snowflake schema

Two basic table structures
Fact tables
Dimension tables (lookup tables)
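
As a rough illustration of a star schema, the sketch below pairs one fact table with two dimension tables and runs a typical aggregate query. The table and column names (sales_fact, product_dim, customer_dim) are hypothetical and are not taken from the slides.

```sql
-- Dimension (lookup) tables: one row per product / customer.
CREATE TABLE product_dim (
  prod_id   NUMBER PRIMARY KEY,
  prod_name VARCHAR2(30)
);

CREATE TABLE customer_dim (
  cust_id   NUMBER PRIMARY KEY,
  cust_name VARCHAR2(30)
);

-- Fact table: one row per sale, holding the measures (cost, revenue)
-- and a foreign key to each dimension.
CREATE TABLE sales_fact (
  prod_id   NUMBER REFERENCES product_dim (prod_id),
  cust_id   NUMBER REFERENCES customer_dim (cust_id),
  sale_date DATE,
  cost      NUMBER,
  revenue   NUMBER
);

-- Typical warehouse query: aggregate the facts, label them via a dimension.
SELECT p.prod_name,
       SUM(f.revenue - f.cost) AS profit
FROM   sales_fact f, product_dim p
WHERE  f.prod_id = p.prod_id
GROUP  BY p.prod_name;
```

A snowflake schema would normalize the dimension tables further (for example, a separate product category table referenced from product_dim).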

Extraction, Transformation and Loading (ETL)

[Figure: data flow Extract, Load, Transform, Store, Analyze (users), annotated with the supporting features: external tables, multitable insert, SQL merge, list partitioning, materialized views, bitmap join index, and memory (performance)]

Extraction, Transformation and Loading contd.

External tables
Multitable insert
SQL merge
Partitioning
Materialized view enhancements
Index enhancement
Memory manipulation

Extraction, Transformation and Loading contd.

External table
Access to data stored in flat files on the OS.
External data can be queried directly using SQL as if it were in the database.
No DML operation and no indexing is possible.

Database table
Access to data stored in the database.
SQL is used for data retrieval.
DML operations and indexing are possible.

Extraction, Transformation and Loading contd.
Advantages of external tables

Without external tables, data is first loaded into a temporary (staging) table and then into the main DB table.
External tables make the staging table unnecessary, which reduces the space required during ETL.

Extraction, Transformation and Loading contd.
Example contd.

2. Create a directory
create directory test_dir as '/ora/stage/bibhu/external';
grant read on directory test_dir to bibhu;
grant write on directory test_dir to bibhu;
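
With the directory in place, an external table can be defined over a flat file. The following is only a sketch: the file name cost_revenue.csv, the table name cost_revenue_ext and the column layout are assumptions for illustration and do not come from the slides.

```sql
-- Assumed: a comma-separated file cost_revenue.csv exists in test_dir,
-- with lines such as: 145698,537,88230,125,211
CREATE TABLE cost_revenue_ext (
  cr_id   NUMBER,
  prod_id NUMBER,
  cust_id NUMBER,
  cost    NUMBER,
  revenue NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY test_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('cost_revenue.csv')
)
REJECT LIMIT UNLIMITED;

-- The file can now be queried like a table, without a staging load.
SELECT prod_id, SUM(revenue - cost) AS profit
FROM   cost_revenue_ext
GROUP  BY prod_id;
```

The write grant on the directory matters here because the access driver writes its log and bad files to the default directory unless told otherwise.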

Multitable Insert Statement

Allows inserting records into more than one table with one statement

Advantages:
reduces load complexity
reads the source table only once

Three kinds:
Unconditional
Conditional ALL
Conditional FIRST

Unconditional insert
For each row returned by the subquery, each INTO clause will be executed without restriction

Example Unconditional insert
Split up my costs and revenues for each product sold into a separate cost table and revenue table

insert all
  into cost values (prod_id, cust_id, sysdate, cost)
  into revenue values (prod_id, cust_id, sysdate, revenue)
select a.cust_id, a.prod_id, a.cost, a.revenue
from cost_revenue a, product b, customer c
where a.cust_id = c.cust_id
and a.prod_id = b.prod_id;

Conditional ALL vs. Conditional FIRST

Insert ALL
Oracle checks all the WHEN clauses
every WHEN clause that evaluates to TRUE will be executed

Insert FIRST
only the first WHEN clause that evaluates to TRUE will be executed
evaluation of further WHEN clauses then stops

Example Conditional All Insert

-- Note: rows with order_total exactly 1000000 or 2000000 match no WHEN clause
INSERT ALL
  WHEN order_total < 1000000 THEN
    INTO small_orders
  WHEN order_total > 1000000 AND order_total < 2000000 THEN
    INTO medium_orders
  WHEN order_total > 2000000 THEN
    INTO large_orders
SELECT order_id, order_total, sales_rep_id, customer_id
FROM orders;

Example Conditional First Insert

INSERT FIRST
  WHEN ottl < 100000 THEN
    INTO small_orders VALUES (oid, ottl, sid, cid)
  WHEN ottl > 100000 AND ottl < 200000 THEN
    INTO medium_orders VALUES (oid, ottl, sid, cid)
  WHEN ottl > 290000 THEN
    INTO special_orders
  WHEN ottl > 200000 THEN
    INTO large_orders VALUES (oid, ottl, sid, cid)
SELECT o.order_id oid, o.customer_id cid, o.order_total ottl,
       o.sales_rep_id sid, c.credit_limit cl, c.cust_email cem
FROM orders o, customers c
WHERE o.customer_id = c.customer_id;

Restrictions Multitable Insert

You cannot perform a multitable insert into a remote table
Plan stability is not supported for multitable insert statements
The subquery of the multitable insert statement cannot use a sequence

SQL Merge Command

A single SQL statement to either insert into or update a table conditionally

Does the key value already exist in the table?
Yes => update row
No => insert row

Advantages:
Before Oracle9i, a number of DML statements or PL/SQL blocks were needed.
Overall loading performance is improved because it reduces the number of table scans.
Look into the text attached with this slide to get a practical use of this.

Example SQL Merge Command

merge into cost_revenue d
using cr_source s
on (d.inv_id = s.inv_id)
when matched then
  update set d.prod_id = s.prod_id,
             d.cust_id = s.cust_id,
             d.cost = s.cost,
             d.revenue = s.revenue
when not matched then
  insert (prod_id, cost, revenue, cust_id, inv_id)
  values (s.prod_id, s.cost, s.revenue, s.cust_id, s.inv_id);

SQL Merge Command

The merge command can be run in parallel according to normal parallel DML rules
You cannot update a column that has been referenced in the ON condition clause

Partitioning
What is partitioning?
Partitioning breaks up one large table into several more manageable pieces called partitions
Tables and indexes can be partitioned
Use it for large tables
Advantage: manageability and performance

[Figure: an application issues SQL against the Sales table, which is split into the partitions Jan, Feb and Mar]

Range partitioning

[Figure: rows dated JAN2004 to MAY2004 are mapped to the monthly partitions JAN2004, FEB2004, MAR2004, APR2004 and MAY2004]

Range partitioning Example

-- Each partition holds one month: its upper bound is the first day of the next month
CREATE TABLE sales_range
  (salesman_id NUMBER(5),
   salesman_name VARCHAR2(30),
   sales_amount NUMBER(10),
   sales_date DATE)
COMPRESS
PARTITION BY RANGE (sales_date)
  (PARTITION sales_jan2004 VALUES LESS THAN (TO_DATE('01/02/2004','DD/MM/YYYY')),
   PARTITION sales_feb2004 VALUES LESS THAN (TO_DATE('01/03/2004','DD/MM/YYYY')),
   PARTITION sales_mar2004 VALUES LESS THAN (TO_DATE('01/04/2004','DD/MM/YYYY')),
   PARTITION sales_apr2004 VALUES LESS THAN (TO_DATE('01/05/2004','DD/MM/YYYY')));

Hash Partitioning
Hash partitioning maps data to partitions using a hashing algorithm applied to the partitioning key

[Figure: key value -> HASH function -> PART1, PART2, PART3, PART4]
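
A hash-partitioned table with four partitions might be declared as below. This is a sketch only: the table name sales_hash and its columns are hypothetical, and the partition names simply mirror PART1 to PART4 from the figure.

```sql
-- Rows are distributed over the four partitions by hashing salesman_id.
CREATE TABLE sales_hash (
  salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_amount  NUMBER(10),
  sales_date    DATE
)
PARTITION BY HASH (salesman_id)
  (PARTITION part1,
   PARTITION part2,
   PARTITION part3,
   PARTITION part4);
```

Unlike range or list partitioning, the DBA does not decide which row lands in which partition; the hash function does. A power-of-two partition count is commonly recommended to keep the distribution even.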

Composite Partitioning

create table cost_revenue (
  nr number,
  logofftime date,
  logon_time date,
  user_id number,
  name_id number,
  value number
)
partition by range (user_id)
subpartition by hash (nr) subpartitions 4
 (partition p1 values less than (11),
  partition p2 values less than (21),
  partition p3 values less than (31),
  partition p4 values less than (41));

[Figure: RANGE (user_id) gives four partitions with upper values 10, 20, 30 and 40; HASH (nr) splits each of them into four subpartitions, PART1a to PART1d through PART4a to PART4d]

List Partitioning
gives precise control over which data maps to which partition
specify a list of discrete values for the partition column and assign a group of those values to individual partitions
each partition in a list partitioning scheme corresponds to a list of discrete values

Example List Partitioning

CREATE TABLE sales_list
  (salesman_id NUMBER(5),
   salesman_name VARCHAR2(30),
   sales_state VARCHAR2(20),
   sales_amount NUMBER(10),
   sales_date DATE)
PARTITION BY LIST (sales_state)
  (PARTITION sales_west VALUES ('California', 'Hawaii') COMPRESS,
   PARTITION sales_east VALUES ('New York', 'Virginia', 'Florida'),
   PARTITION sales_central VALUES ('Texas', 'Illinois'));

Comparison Range-List Partitioning

Range Partitioning
partitioning a table along a continuous column
most often, tables are range partitioned by time, so that each range partition contains the data for a given range of time values

List Partitioning
useful along a column with discrete values
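
The manageability and performance benefits can be illustrated with a few statements against the sales_range and sales_list tables from the examples. This is a sketch: the partition name sales_may2004 and the date literals are chosen for illustration, and whether the optimizer actually prunes partitions depends on the predicate and the execution plan.

```sql
-- Range: a predicate on the partitioning column lets the optimizer skip
-- partitions whose bounds cannot contain matching rows (partition pruning).
SELECT SUM(sales_amount)
FROM   sales_range
WHERE  sales_date >= TO_DATE('01/02/2004','DD/MM/YYYY')
AND    sales_date <  TO_DATE('01/03/2004','DD/MM/YYYY');

-- List: both values belong to the sales_west partition,
-- so only that partition needs to be read.
SELECT SUM(sales_amount)
FROM   sales_list
WHERE  sales_state IN ('California', 'Hawaii');

-- A partition can also be addressed by name.
SELECT COUNT(*)
FROM   sales_list PARTITION (sales_west);

-- Manageability: extend the range-partitioned table by one more month.
ALTER TABLE sales_range
  ADD PARTITION sales_may2004
  VALUES LESS THAN (TO_DATE('01/06/2004','DD/MM/YYYY'));
```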

Materialized View Enhancements

Two important changes:
Partition Change Tracking
Fast refresh for MVs with joins and aggregates

Fresh MV

Product
prod_id  prod_name
536      Truck
537      Bus
538      Car

Customer
cust_id  cust_name
88230    Daniels
88231    Smith
88232    Stevens

Cost_revenue
cr_id   prod_id  cost  revenue
145698  537      125   211
145699  538      241   125
145700  537      124   263
145701  538      147   365
145702  536      85    185
145703  537      200   230

MV_CR
cust_id  prod_id  profit
88230    536      120
88230    537      230
88230    538      -15
88231    536      248
88231    537      36
88231    538      150
88232    536      -96
88232    537      250

=> query rewrite

Stale MV

Product and Customer are unchanged.

Cost_revenue
cr_id   prod_id  cost  revenue
145698  537      125   211
145699  538      241   125
145700  537      124   263
145701  538      147   365
145702  536      85    185
145703  537      200   230
145704  538      180   169   (new record)

MV_CR (STALE)
cust_id  prod_id  profit
88230    536      120
88230    537      230
88230    538      -15
88231    536      248
88231    537      36
88231    538      150
88232    536      -96
88232    537      250

=> No query rewrite in 8i

Materialized View Refresh

Two ways of refreshing an MV:
complete refresh involves reading every record of the master tables to compute the results for the materialized view
fast refresh only applies the data that has changed in the master tables
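
The two refresh methods can be requested explicitly through the DBMS_MVIEW package. The sketch below assumes a materialized view named MV_CR exists, as in the figures; a fast refresh additionally requires that the view is fast refreshable (for example, that suitable materialized view logs exist).

```sql
-- Complete refresh ('C'): recompute MV_CR from the master tables.
BEGIN
  DBMS_MVIEW.REFRESH('MV_CR', 'C');
END;
/

-- Fast refresh ('F'): apply only the changes made since the last refresh.
BEGIN
  DBMS_MVIEW.REFRESH('MV_CR', 'F');
END;
/

-- Check which method was used last and whether the view is fresh.
SELECT mview_name, last_refresh_type, staleness
FROM   user_mviews
WHERE  mview_name = 'MV_CR';
```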

Partition Change Tracking

an addition to fast refresh in Oracle9i
allows tracking freshness at a finer grain than the entire materialized view
identifies which rows are affected by a certain detail table partition
at least one of the master tables needs to be partitioned

Stale MV

MV_CR
cust_id  prod_id  profit  status
88230    538      120     fresh
88230    537      230     fresh
88230    536      -15     stale
88231    536      248     stale
88231    537      36      fresh
88231    538      150     fresh
88232    536      -96     stale
88232    537      250     fresh

[Figure: Product and Customer as before; Cost_revenue (cr_id, prod_id, cust_id, revenue) is split into the partitions Part. 1, Part. 2 and Part. 3]

=> Query rewrite in 9i

Partition Change Tracking

PCT allows tracking of the records in the materialized view that correspond to a certain detail table partition
These records become stale when a partition is modified, while the other records remain fresh
=> the fast refresh process is much quicker because it only needs to refresh the stale records

Example Partition Change Tracking

MV: MV_CR
select mview_name, last_refresh_type, staleness from user_mviews;

mview_name  last_refresh_type  staleness
----------  -----------------  ---------
MV_CR       FAST               STALE

Example Partition Change Tracking

FAST REFRESH:
exec dbms_mview.refresh('MV_CR','F');
select mview_name, last_refresh_type, staleness from user_mviews;

mview_name  last_refresh_type  staleness
----------  -----------------  ---------
MV_CR       FAST_PCT           FRESH

Example Partition Change Tracking

REMARK: first run the script utlxmv.sql in $ORACLE_HOME/rdbms/admin to create mv_capabilities_table

exec dbms_mview.explain_mview('MV_CR');

select capability_name, possible, related_text, msgtxt from mv_capabilities_table;

CAPABILITY_NAME                POSSIBLE  RELATED_TEXT  MSGTXT
-----------------------------  --------  ------------  -----------------------------
PCT                            Y
REFRESH_COMPLETE               Y
REFRESH_FAST                   Y
REWRITE                        Y
PCT_TABLE                      Y         MARKET
PCT_TABLE                      Y         COST_REVENUE
REFRESH_FAST_AFTER_INSERT      Y
REFRESH_FAST_AFTER_ONETAB_DML  N         PROFIT        SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML     N
REFRESH_FAST_PCT               Y
REWRITE_FULL_TEXT_MATCH        Y
REWRITE_PARTIAL_TEXT_MATCH     Y
REWRITE_GENERAL                Y
REWRITE_PCT                    Y

Fast refresh of MV with joins and aggregates

Prior to Oracle9i, this was only possible if the master tables were modified through direct load
Now, fast refresh of MVs with joins and aggregates is also possible after conventional DML
New keyword for create materialized view log: SEQUENCE
This allows Oracle to track the sequence of DML operations on the base tables
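
A minimal sketch of how the SEQUENCE keyword fits in is shown below. It assumes the cost_revenue table (with cust_id, prod_id, cost, revenue) and the customer table (with cust_id, cust_name) from the figures; the view name mv_cr_sketch is hypothetical. The extra COUNT columns are included because fast refresh of an aggregate MV generally needs COUNT(*) and a COUNT(expr) alongside each SUM(expr).

```sql
-- Materialized view logs on both master tables, recording the order of DML.
CREATE MATERIALIZED VIEW LOG ON cost_revenue
  WITH SEQUENCE, ROWID (cust_id, prod_id, cost, revenue)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON customer
  WITH SEQUENCE, ROWID (cust_id, cust_name)
  INCLUDING NEW VALUES;

-- Join + aggregate MV declared as fast refreshable.
CREATE MATERIALIZED VIEW mv_cr_sketch
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT c.cust_id,
       cr.prod_id,
       SUM(cr.revenue - cr.cost)   AS profit,
       COUNT(cr.revenue - cr.cost) AS profit_cnt,  -- needed with SUM(expr)
       COUNT(*)                    AS row_cnt      -- needed for fast refresh
FROM   cost_revenue cr, customer c
WHERE  cr.cust_id = c.cust_id
GROUP  BY c.cust_id, cr.prod_id;
```

Running dbms_mview.explain_mview on such a view is a practical way to confirm which refresh capabilities Oracle actually grants it.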

Bitmap Join Index

B-tree index: each indexed value is stored with its rowid
[Figure: B-tree with node 5 and key ranges 1-3, 4-6, 7-8]

Bitmap index: each distinct value is stored with its own bitmap
M  0 0 1 0 1 1 0
F  1 1 0 1 0 0 1

Bitmap Join Index

an index on two (or more) tables
the value of a column in one table is stored with the associated rowids of the values in the other table
a BJI contains data from two or more tables
no need to access the dimension tables and calculate the join
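
A bitmap join index could be created as sketched below. It assumes the cost_revenue fact table and the customer dimension table from the earlier figures, and that customer.cust_id carries a primary key or unique constraint, which Oracle requires for the join column of the dimension table. The index name is hypothetical.

```sql
-- Index the fact table cost_revenue by a column of the dimension table.
CREATE BITMAP INDEX cr_cust_name_bjix
  ON cost_revenue (customer.cust_name)
  FROM cost_revenue, customer
  WHERE cost_revenue.cust_id = customer.cust_id;

-- A query of this shape may be answered from the index and the fact table,
-- without visiting customer to evaluate the join.
SELECT SUM(cr.revenue)
FROM   cost_revenue cr, customer c
WHERE  cr.cust_id = c.cust_id
AND    c.cust_name = 'Smith';
```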

Memory enhancements
1. SGA:
Dynamic Memory Management

2. PGA:
Automatic Memory Tuning

SGA: Dynamic Memory Management

Before Oracle9i, the size of the SGA could not be altered online.
Oracle9i introduces dynamic resizing of the buffer cache and the shared pool.

SGA Init.ora parameters

DB_CACHE_SIZE
replaces the deprecated parameter DB_BLOCK_BUFFERS
specifies the size of the buffer cache for the standard block size
9i supports multiple block sizes
multiple buffer caches, one for each corresponding block size: DB_nK_CACHE_SIZE

db_block_size = 8192
db_cache_size = 500M
db_16K_cache_size = 100M

SGA Init.ora parameters

SHARED_POOL_SIZE

DB_KEEP_CACHE_SIZE, DB_RECYCLE_CACHE_SIZE
replace the deprecated parameters BUFFER_POOL_KEEP and BUFFER_POOL_RECYCLE
remark: in Oracle8i, these memory areas were part of the default buffer pool; in Oracle9i they are separate memory areas that are not allocated out of the default buffer pool

SGA_MAX_SIZE
maximum amount of memory the SGA can allocate
when not specified => it equals the initial SGA size at start-up time => the SGA cannot grow
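
The online resizing described above might look as follows. The sizes are purely illustrative, and a resize can be expected to fail if the total would exceed SGA_MAX_SIZE, so the shared pool is shrunk first in this sketch.

```sql
-- Current settings of the relevant parameters.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('sga_max_size', 'db_cache_size', 'shared_pool_size');

-- Shrink the shared pool, then grow the buffer cache, with the instance running.
ALTER SYSTEM SET shared_pool_size = 150M;
ALTER SYSTEM SET db_cache_size = 600M;
```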

PGA: Automatic Memory Tuning

DBA task prior to Oracle9i: choosing the correct amount of memory each server process may allocate
sort_area_size
hash_area_size
bitmap_merge_area_size
create_bitmap_area_size

PGA: Automatic Memory Tuning

Automated SQL Execution Memory Management

PGA: Automatic Memory Tuning

What?
PGA memory is configured dynamically by Oracle
Oracle automatically adjusts the memory area sizes mentioned before

Advantage?
reduces the time and effort required to tune memory parameters
can compensate for low or high memory usage while controlling the maximum amount of memory the PGAs can use

PGA Init.ora parameters

WORKAREA_SIZE_POLICY:
AUTO: Oracle controls the size of each individual PGA work area
MANUAL: control is left to the user or DBA. The parameters (memory areas) need to be set manually. This may result in sub-optimal performance and poor PGA memory utilization.
MANUAL is the default unless PGA_AGGREGATE_TARGET is set
The parameter can be set at session or system level
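
Switching between the two policies could be sketched as below. PGA_AGGREGATE_TARGET is not mentioned on the slides; it is the Oracle9i parameter that sets the overall PGA memory target used in AUTO mode. The sizes are illustrative only.

```sql
-- System level: give Oracle an overall PGA target and let it size work areas.
ALTER SYSTEM SET pga_aggregate_target = 500M;
ALTER SYSTEM SET workarea_size_policy = AUTO;

-- Session level: fall back to manual sizing for one session,
-- for example a batch job with a known large sort.
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 10485760;  -- 10 MB, in bytes
```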

Any Questions?

Thank you
