
Oracle Data Warehousing

and
Business Intelligence
The Complete Story
Andreas Katsaris
Arisant, LLC

Copyright 2009 Arisant LLC.


Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.

The complete picture

Architecture diagram, from delivery layer down to sources:
Oracle BI Suite EE: Answers, Interactive Dashboards, Publisher, Delivers
Oracle BI Presentation Server
Oracle BI Server
Oracle BI Enterprise Data Warehouse
Oracle Data Integrator: E-LT Agent (Bulk E-LT), E-LT Metadata
Sources: SAP/R3, PeopleSoft, Oracle EBS, Siebel CRM, Other Sources

Agenda

Warehouse definition
Why Bother?
Life before Integrated warehousing and BI
Terminology

Warehouse design aspects and considerations


Methodology
Dimensional Modeling
Top Most powerful and underutilized database features

Oracle Business Intelligence (OBI)


OBI Components and descriptions

Oracle Data Integrator (ODI)


ETL Strategy
Architecture


Data Warehouse Mission


Gather, integrate and reconcile operational,
decision support, and external data
Provide meaningful, accessible, consistent, and
easy to understand business information to
enterprise users
Act as a single integrated source of data for
processing information

What is a Data Warehouse?


a relational database designed for query and
analysis
contains historical data derived from
transaction data, but it can include data
from other sources
enables an organization to consolidate data
from several sources

What is a Data Warehouse?


Components of a warehouse environment
extraction, transformation, and loading (ETL)
engine
online analytical processing (OLAP) solution
client analysis tools/Data Mining
other applications that manage the delivery of
data to enterprise users

Data Warehouse Methodology


Methodology
Iterative/Agile process, avoid waterfall big bang approach
Start with one Subject Area and one target user group
Scoping Phase
- Identify key players (business sponsors, stakeholders, users)
- Build and validate the business case
Show ROI benefits
- Support for the following types of functionality
- Activities that cannot be done at all today, but should be
- Activities that are accomplished today by manually synthesizing
data from reports, file extracts, and other sources
Evaluate candidate technologies for the ETL, data warehouse and frontend tools
Today we are confident that we can do all this with Oracle
technologies!

Data Warehouse Architectures


Data Warehouse Architecture (Basic)
Data Warehouse Architecture (with Staging
Area)
Data Warehouse Architecture (with Staging
Area and Data Marts)


Data Warehouse Architecture (Staging Area and Data Marts)

Diagram stages:
DATA: data sources (databases, flat files, spreadsheets, XML)
Extraction into the Staging Area
Transformation & Load into the Warehouse
Transformation & Load into the Data Mart
INFORMATION: Staging Area, Warehouse, Data Mart
BUS. INTELLIGENCE DELIVERY to end users: Managed Reporting, Notification, Analysis, Ad-Hoc Queries, Dashboards, Balanced Scorecards
KNOWLEDGE

Ultimate Goal
Timely, Accurate, Fast delivery of knowledge
Progression toward better performance:
3NF
Star Schema
Star Schema with Mviews (rewrite, refresh)
OLAP cubes (rewrite, refresh)

Most powerful and underutilized database features
Data Sourcing
ETL at database level
Maintenance
Performance


Data Sourcing
- Replication
- Transportable Tablespaces
- Streams
- CDC (Change Data Capture)
- External Tables (access to flat files)
- Data Pump
- SQL*Loader (access to flat files)


Replication
Snapshot logs on source system
Snapshot technology can be extended to
produce/preserve exact row changes
Resistance from source system owners
They usually complain about the performance impact and
the space requirements for snapshot logs

Transportable Tablespaces
Copying of datafiles (tablespaces in read-only mode)
Moves both table and index data, which avoids index rebuilds
Cross platform support
When to consider:
Lots of source system changes
Required tables can be placed in specific tablespace(s)
No timestamps so incremental extraction not possible
Other options not allowed (streams, replication, CDC)

Streams

routes published information to subscribed destinations


only changes to desired objects are captured
Architecture

Capture
Staging/Propagation
Consumption
Parallel processing of log files

Non-intrusive
No triggers or snapshot logs
uses redo log/archive log

Custom Transformations possible


Access via Oracle-supplied PL/SQL packages or Enterprise
Manager Console
GoldenGate acquisition


Change Data Capture


simplifies the process of identifying changed data since the
last extraction (incremental extraction)
architecture is based on publisher-subscriber model
synchronous (part of transaction)
change data is captured via triggers and stored inside the database
in change tables

asynchronous (not part of transaction)


lightweight Oracle Streams application
Changes extracted from the log files

Captured data made available to the target systems in a
controlled manner (subscription window), using database
views

External Tables
Files stored outside the database
Read-only in 9i; read/write in 10g (via CTAS)
Can be compressed and encrypted in 11g
DML and index creation not allowed
Can be read with SQL as if they were tables
May be joined with database tables
Filtering allowed with WHERE clause
Can be used to load staging area or act as a staging
area themselves
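
A minimal sketch of the external-table approach described above. The directory path, the file name sales.csv, and all table and column names (sales_ext, sales_stage, product_dim) are assumptions made for illustration; they do not come from the slides.

```sql
-- Directory object pointing at the folder that holds the flat file (path is an assumption)
CREATE OR REPLACE DIRECTORY stage_dir AS '/data/stage';

-- External table: only metadata lives in the database, the data stays in sales.csv
CREATE TABLE sales_ext (
  sale_id     NUMBER,
  product_key NUMBER,
  sale_date   DATE,
  amount      NUMBER(12,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY stage_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    ( sale_id,
      product_key,
      sale_date CHAR(10) DATE_FORMAT DATE MASK "YYYY-MM-DD",
      amount
    )
  )
  LOCATION ('sales.csv')
)
REJECT LIMIT UNLIMITED;

-- Filter and join the file like any other table while loading the staging area
-- (sales_stage and product_dim are assumed to exist already)
INSERT /*+ APPEND */ INTO sales_stage (sale_id, product_key, sale_date, amount)
SELECT e.sale_id, e.product_key, e.sale_date, e.amount
FROM   sales_ext e
JOIN   product_dim p ON p.product_key = e.product_key
WHERE  e.sale_date >= DATE '2009-01-01';
COMMIT;
```

Because the file is read at query time, replacing sales.csv with the next extract is enough to feed the next load; no reload of the external table itself is needed.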

Data Pump
Next generation export/import tools
Various interfaces
expdp / impdp
Web based GUI via Database Control
DBMS_DATAPUMP

Jobs (exports and loads) are interruptible and resumable


Parallelism
Fine-grain object selection
Allows data movement via db links!
Works in a pipelined fashion

Compressed and encrypted in 11g



ETL at the database level


Resumable Operations
PL/SQL
Direct Path Operations
Transformation and Loading functions
BULK operations


Resumable Operations
Instance level
init.ora: RESUMABLE_TIMEOUT = 3600

Session level
ALTER SESSION ENABLE RESUMABLE [TIMEOUT <secs>];
ALTER SESSION DISABLE RESUMABLE;

Used for:
DDL
CREATE INDEX, CTAS

DML
INSERT/UPDATE

Detection:
Look in DBA_RESUMABLE where STATUS='SUSPENDED'
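
The session-level flow above could look roughly like this. The NAME label is an arbitrary, hypothetical string, and the session is assumed to hold the RESUMABLE system privilege.

```sql
-- Let statements in this session suspend for up to one hour on space errors
ALTER SESSION ENABLE RESUMABLE TIMEOUT 3600 NAME 'nightly sales load';

-- ... run the CREATE INDEX / CTAS / INSERT / UPDATE here ...

-- From a DBA session: list suspended statements and the reason they are waiting
SELECT session_id, name, status, suspend_time, error_msg
FROM   dba_resumable
WHERE  status = 'SUSPENDED';

-- After the load, return to the normal fail-fast behavior
ALTER SESSION DISABLE RESUMABLE;
```

Once the space problem is fixed (for example by adding a datafile), the suspended statement continues on its own; if the timeout expires first, the statement fails with the original error.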

PL/SQL
Easy to write, flexible, fast deployment
Extraction/Loading using Direct Path
Operations
CTAS operations with NOLOGGING
INSERT SELECT with APPEND and PARALLEL hints
Ideal for incremental extractions (e.g. timestamp based)

Transformations/Loading
Pipelined functions, MERGE, Multitable INSERT, INSERT
FIRST, pivot, myriad of Analytical functions
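
Two of the transformation features named above, sketched under assumptions: every table, column, and sequence name below (product_dim, product_stage, product_dim_seq, sales_stage, sales_reject, sales_fact_load) is hypothetical.

```sql
-- MERGE: update existing dimension rows and insert new ones in one statement
MERGE INTO product_dim d
USING product_stage s
ON (d.product_code = s.product_code)
WHEN MATCHED THEN
  UPDATE SET d.product_name = s.product_name,
             d.category     = s.category
WHEN NOT MATCHED THEN
  INSERT (product_key, product_code, product_name, category)
  VALUES (product_dim_seq.NEXTVAL, s.product_code, s.product_name, s.category);

-- INSERT FIRST: each source row goes to the first target whose condition matches
INSERT FIRST
  WHEN amount IS NULL OR amount < 0 THEN
    INTO sales_reject (sale_id, sale_date, amount)
    VALUES (sale_id, sale_date, amount)
  ELSE
    INTO sales_fact_load (sale_id, sale_date, amount)
    VALUES (sale_id, sale_date, amount)
SELECT sale_id, sale_date, amount
FROM   sales_stage;

COMMIT;
```

Both statements read the source once, which is the main reason to prefer them over row-by-row PL/SQL loops for set-based loads.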

PL/SQL
BULK operations
BULK COLLECT
bulk binds with SELECT statements
fetch into user defined array or PL/SQL table

FORALL
bulk binds with INSERT, UPDATE, and DELETE
statements

RETURNING INTO
Retrieving DML Results into a Collection
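
A minimal sketch of BULK COLLECT combined with FORALL, assuming hypothetical tables sales_stage and sales_fact_load that share sale_id and amount columns. The batch size of 10000 is an arbitrary choice.

```sql
DECLARE
  CURSOR c_src IS
    SELECT sale_id, amount FROM sales_stage;

  TYPE t_id_tab  IS TABLE OF sales_stage.sale_id%TYPE;
  TYPE t_amt_tab IS TABLE OF sales_stage.amount%TYPE;

  l_ids  t_id_tab;
  l_amts t_amt_tab;
BEGIN
  OPEN c_src;
  LOOP
    -- Fetch up to 10000 rows per round trip instead of one row at a time
    FETCH c_src BULK COLLECT INTO l_ids, l_amts LIMIT 10000;
    EXIT WHEN l_ids.COUNT = 0;

    -- Send the whole batch to the SQL engine in a single call
    FORALL i IN 1 .. l_ids.COUNT
      UPDATE sales_fact_load
      SET    amount  = l_amts(i)
      WHERE  sale_id = l_ids(i);
  END LOOP;
  CLOSE c_src;
  COMMIT;
END;
/
```

The LIMIT clause keeps the collections from growing with the size of the source table, so session memory stays bounded.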

DML Error Logging


Rows causing errors are loaded in an error log table
Alternative to using PL/SQL with the SAVE EXCEPTIONS
clause
BEGIN
DBMS_ERRLOG.CREATE_ERROR_LOG('SALES_FACT');
END;
/
INSERT /*+ APPEND */
INTO sales_fact
SELECT *
FROM <source tables>
LOG ERRORS
REJECT LIMIT UNLIMITED
;

Maintenance
DML/DDL operations using:
Direct Path Operations (NOLOGGING, APPEND hint)
Parallelism
Partitioning
ILM (Oracle ILM tool)
Advanced Compression

Performance
Dimensional modeling/star schemas
Partitioning
Parallelism
Deferred/disabled constraints
Index Management (mark unusable/rebuild)
Advanced compression (test first)
Materialized Views
OLAP Cubes/Cube Organized Materialized Views
Data Caching

Star Schema
FACT table
skinny and large
Contains numeric measurements and FKs to dimensions
Bitmap indexes on FK columns
Consider partitioning and archiving aspects
Dimensions
Denormalized, small and wide
Contain business attributes relating to the measurements
Avoid snowflake if possible
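
As an illustration of the layout described above, here is a small star schema sketch. All table and column names are assumptions, not taken from the slides.

```sql
-- Dimensions: denormalized, few rows, many descriptive attributes
CREATE TABLE product_dim (
  product_key   NUMBER        PRIMARY KEY,
  product_code  VARCHAR2(20)  NOT NULL,
  product_name  VARCHAR2(100),
  category      VARCHAR2(50),   -- hierarchy levels kept in the same row,
  brand         VARCHAR2(50)    -- not snowflaked into separate tables
);

CREATE TABLE time_dim (
  time_key      NUMBER PRIMARY KEY,
  calendar_date DATE   NOT NULL,
  month_name    VARCHAR2(10),
  quarter_name  VARCHAR2(6),
  year_num      NUMBER(4)
);

-- Fact: narrow rows, many of them; only FKs and numeric measurements
CREATE TABLE sales_fact (
  time_key     NUMBER NOT NULL REFERENCES time_dim (time_key),
  product_key  NUMBER NOT NULL REFERENCES product_dim (product_key),
  quantity     NUMBER,
  amount       NUMBER(12,2)
);

-- Bitmap indexes on the FK columns of the fact table
CREATE BITMAP INDEX sales_fact_time_bix    ON sales_fact (time_key);
CREATE BITMAP INDEX sales_fact_product_bix ON sales_fact (product_key);
```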

Star Schema
Intuitive to end-users
Mirrors the way people think about the business
High Query Performance
Diagram: the SALES fact table at the center, joined to four dimensions
DIMENSION 1: REGION
DIMENSION 2: PRODUCT
DIMENSION 3: TIME
DIMENSION 4: ENTITY

Direct Path Operations


NOLOGGING option, APPEND hint
CTAS NOLOGGING
INSERT /*+ APPEND */ INTO ... SELECT ...
Takes an exclusive lock on the table (no concurrent load streams into the same table;
use the PARALLEL hint instead)
Enabled FKs prevent direct path (the hint is silently ignored)
Index maintenance is deferred to the end of the direct
path insert operation
CREATE INDEX ... NOLOGGING;
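
The direct path operations listed above might be combined as follows. The table names (sales_stage, sales_2009_q1, sales_hist) and the sale_date column are hypothetical.

```sql
-- CTAS is a direct path operation; NOLOGGING keeps redo generation to a minimum
CREATE TABLE sales_2009_q1 NOLOGGING AS
SELECT *
FROM   sales_stage
WHERE  sale_date >= DATE '2009-01-01'
AND    sale_date <  DATE '2009-04-01';

-- Direct path insert into an existing table (assumed to have no enabled FKs or triggers)
ALTER TABLE sales_hist NOLOGGING;

INSERT /*+ APPEND */ INTO sales_hist
SELECT *
FROM   sales_stage
WHERE  sale_date < DATE '2009-01-01';

-- The session cannot read the table again until it commits (ORA-12838 otherwise)
COMMIT;

-- Build the index after the load, also without full redo logging
CREATE INDEX sales_hist_date_ix ON sales_hist (sale_date) NOLOGGING;
```

Data loaded with NOLOGGING cannot be recovered from the redo stream, so a backup of the affected tablespaces after the load is usually part of the procedure.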

Parallelism
Queries requiring large table scans, joins, or
partitioned index scans
Creation of large indexes
Creation of large tables (including
materialized views)
Bulk inserts, updates, merges, and deletes
Understand the related init.ora parameters and set them
appropriately

Parallelism
Types:
Parallel Query
Parallel DDL
Parallel DML

Implementation:
via a hint
/*+ PARALLEL(<table name>,<degree of parallelism>) */

at the object definition level


CREATE/ALTER TABLE finance_trans PARALLEL 4;
CREATE/ALTER INDEX <index name> PARALLEL 2;
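
A short sketch of the three parallel execution types. FINANCE_TRANS and COMPANY_KEY appear elsewhere in the deck; the amount column and the index name finance_trans_ix9 are assumptions, and a degree of 4 is an arbitrary choice.

```sql
-- Parallel query: the hint applies to this statement only
SELECT /*+ PARALLEL(f, 4) */ f.company_key, SUM(f.amount) AS total_amount
FROM   finance_trans f
GROUP  BY f.company_key;

-- Parallel DDL: build the index with four slaves, then reset the attribute
-- so later queries are not parallelized by default
CREATE INDEX finance_trans_ix9 ON finance_trans (company_key) PARALLEL 4 NOLOGGING;
ALTER INDEX finance_trans_ix9 NOPARALLEL;

-- Parallel DML has to be enabled explicitly in the session
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(f, 4) */ finance_trans f
SET    f.amount = ROUND(f.amount, 2);
COMMIT;

ALTER SESSION DISABLE PARALLEL DML;
```

Without ALTER SESSION ENABLE PARALLEL DML the UPDATE still runs, but only its query portion can be parallelized.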

Partitioning
Goal
Rolling window operations
Increased performance

Types
Range Partitioning
Hash Partitioning
List Partitioning
Composite Partitioning

FACT tables are candidates for partitioning



Partitioning
Partition Pruning (a.k.a partition elimination)
evaluate WHERE clause to eliminate unneeded
partitions
Reduces unnecessary I/O
Improved query performance

Partition operations to facilitate warehouse loads


Load multiple partitions in parallel
Load stand alone table - EXCHANGE partition
Index maintenance at the partition level
Archive old data using EXCHANGE partition
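
The EXCHANGE PARTITION load and archive steps above can be sketched like this. The partition names and the standalone table names are hypothetical, and sales_fact is assumed to be range partitioned by month.

```sql
-- 1. Create a standalone table with the same shape as the fact table
CREATE TABLE sales_load_2009_06 NOLOGGING AS
SELECT * FROM sales_fact WHERE 1 = 0;

-- ... bulk load sales_load_2009_06 and build indexes that match
--     the local indexes of sales_fact ...

-- 2. Swap the loaded table into the empty target partition
--    (a data dictionary operation; no rows are moved)
ALTER TABLE sales_fact
  EXCHANGE PARTITION p_2009_06 WITH TABLE sales_load_2009_06
  INCLUDING INDEXES WITHOUT VALIDATION;

-- 3. Archive: swap the oldest partition out into an empty table, then drop it
CREATE TABLE sales_archive_2006_06 AS
SELECT * FROM sales_fact WHERE 1 = 0;

ALTER TABLE sales_fact
  EXCHANGE PARTITION p_2006_06 WITH TABLE sales_archive_2006_06;

ALTER TABLE sales_fact DROP PARTITION p_2006_06;
```

WITHOUT VALIDATION skips the check that every row belongs in the partition, so it should only be used when the load process already guarantees that. Constraints and indexes on the standalone table have to match those on the partitioned table for the exchange to succeed.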

Partitioning- 11g enhancements


Partition Advisor
New composite combinations
list/range, range/range, list/hash, list/list

Interval Partitioning
Automatic creation of range-based partitions

REF Partitioning
Partition detail table based on the master-table key

Virtual-Column Based Partitioning


Partition based on an expression

Interval Partitioning
Minimizes periodic partition maintenance (No need to
create new partitions)
Partition segments allocated as soon as new data
arrives
Local indexes are created and maintained as well
Requires at least one range partition
Range key value determines the range high point
Partitioning key can only be a single column, and either
DATE or NUMBER datatype
CREATE TABLE sales (...)
PARTITION BY RANGE (sales_date)
INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p1 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );
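
A runnable variant of the statement above, for experimentation. The table name sales_interval_demo and its columns are assumptions added so the example is complete.

```sql
CREATE TABLE sales_interval_demo (
  sale_id    NUMBER,
  sales_date DATE NOT NULL,
  amount     NUMBER(12,2)
)
PARTITION BY RANGE (sales_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p1 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );

-- No partition covers June 2009 yet; this insert creates one automatically
INSERT INTO sales_interval_demo VALUES (1, DATE '2009-06-15', 100);
COMMIT;

-- Automatically created partitions receive system-generated names
SELECT partition_name, high_value
FROM   user_tab_partitions
WHERE  table_name = 'SALES_INTERVAL_DEMO'
ORDER  BY partition_position;
```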

REF Partitioning

Related tables benefit from same partitioning strategy


Partition a table based on a column in another table
Partition key inherited via FK/PK relationship
Avoids having the partition key on the child table
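CREATE TABLE orders /* parent table */
( order_id   NUMBER NOT NULL, /* PK */
  order_date DATE   NOT NULL, /* partition key */
  /* other columns omitted */
  CONSTRAINT orders_pk PRIMARY KEY (order_id)
)
PARTITION BY RANGE (order_date)
( PARTITION p_max VALUES LESS THAN (MAXVALUE) );

CREATE TABLE order_items /* child table */
( order_id NUMBER NOT NULL, /* FK; must be NOT NULL for REF partitioning */
  /* other columns omitted */
  CONSTRAINT order_items_fk
    FOREIGN KEY (order_id) REFERENCES orders (order_id)
)
PARTITION BY REFERENCE (order_items_fk);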


Virtual-Column Based Partitioning


Add expression based virtual (meta data) columns
Virtual column used as the partition key
CREATE TABLE sales
( sales_num NUMBER(12),
  rep_name VARCHAR2(20),
  sales_code NUMBER(2) GENERATED ALWAYS AS
    (TO_NUMBER(SUBSTR(TO_CHAR(sales_num),1,2))) VIRTUAL
)
PARTITION BY LIST (sales_code)
( PARTITION p_10 VALUES (10),
  PARTITION p_other VALUES (DEFAULT) );


Advanced Compression
Table Compression
Table Scan Performance: 2x faster
Storage Savings: 2x smaller
DML Performance: 5% slower
CREATE TABLE SALES_FACT (...) COMPRESS FOR ALL OPERATIONS;

RMAN Compression
~40% faster than compressed backups in 10g
Slightly better compression ratio than in 10g
RMAN> CONFIGURE COMPRESSION ALGORITHM 'ZLIB';
RMAN> BACKUP AS COMPRESSED BACKUPSET DATABASE PLUS ARCHIVELOG;

Data Pump Compression


expdp hr FULL=y DUMPFILE=dpump_dir:full.dmp COMPRESSION=ALL

ILM Information Lifecycle Management


Why Bother
Compliance
Performance
Cost
Data Maintenance

5% - most active

35% - less active

60% - historical

ILM Information Lifecycle Management


Implementation of different tiers of storage
Consider Oracle ILM Assistant (free!)
Leverages Oracle Partitioning
Uses Lifecycle Definitions
Calculates storage costs & savings
Simulates the impact of partitioning on a table
Advises how to partition a table
Generates scripts to move data when required

SQL Query Result Cache


Caching of query results or PL/SQL function calls
DML/DDL against dependent database objects
invalidates cache
Candidate queries
access many, many rows
return few rows (small result set)
executed many times

SQL Query Result Cache


result_cache_mode init.ora parameter
AUTO (optimizer uses repetitive executions to determine if query will
be cached)
MANUAL (need to use the /*+ RESULT_CACHE */ hint in queries)
FORCE (All results are stored in cache)

result_cache_max_size init.ora parameter


default is dependent on other memory settings
(0.25% of memory_target or 0.5% of sga_target or 1% of
shared_pool_size)
0 disables result cache
never >75% of shared pool (built-in restriction)
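
A minimal sketch of MANUAL mode usage. It reuses the BUDGET_FACT and LEDGER_DIM tables that appear later in the deck; the query itself is only an illustration of a "many rows in, few rows out" candidate.

```sql
-- In MANUAL mode only hinted queries are cached
SELECT /*+ RESULT_CACHE */ d.ledger_name, SUM(f.base_amount) AS total_amount
FROM   budget_fact f
JOIN   ledger_dim d ON d.ledger_key = f.ledger_key
GROUP  BY d.ledger_name;

-- Cache statistics (blocks used, results found, invalidations, ...)
SELECT name, value
FROM   v$result_cache_statistics;

-- Quick check that the cache is enabled at all
SELECT DBMS_RESULT_CACHE.STATUS FROM dual;
```

Running the hinted query a second time should serve the result from the cache until DML or DDL against BUDGET_FACT or LEDGER_DIM invalidates it.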

Optimizing Querying
Star Queries
Materialized Views
Cube Organized Materialized Views
Index Merge
Bitmap Join Indexes

Star Queries
A join between a fact table and a number of dimension
tables
Require the existence of bitmap indexes on FK columns on
the FACT table
retrieve exactly the necessary rows from the fact table using
n merged bitmap indexes
join this result set to the dimension tables
Implementation:
star_transformation_enabled=true|false|temp_disable (system or
session modifiable)
hints for stubborn queries
/*+ STAR_TRANSFORMATION */
/*+ FACT */
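
A sketch of enabling star transformation for one session and hinting a query. The schema (sales_fact, time_dim, product_dim) and the filter values are hypothetical; bitmap indexes on the fact table's FK columns are assumed to exist.

```sql
ALTER SESSION SET star_transformation_enabled = TRUE;

SELECT /*+ STAR_TRANSFORMATION FACT(f) */
       t.year_num, p.category, SUM(f.amount) AS total_amount
FROM   sales_fact f
JOIN   time_dim t    ON t.time_key    = f.time_key
JOIN   product_dim p ON p.product_key = f.product_key
WHERE  t.year_num = 2009
AND    p.category = 'Hardware'
GROUP  BY t.year_num, p.category;
```

The hints are requests, not commands: the optimizer still applies the transformation only when it estimates a benefit, so the execution plan should be checked for BITMAP operations on the fact table's indexes.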

Materialized Views

Pre-aggregation of data
Changes to the underlying tables reflected in mview
Fast/Complete refresh
Refresh can be on demand or scheduled

Can be partitioned and indexed like a table


Query Rewrite
/*+ REWRITE_OR_ERROR */ hint new in 10g (ORA-30393)
DBMS_MVIEW.EXPLAIN_REWRITE() to see why rewrite failed
$ORACLE_HOME/rdbms/admin/utlxrw.sql for REWRITE_TABLE (explanation)

DBMS_ADVISOR package
APIs to recommend materialized views and indexes
DBMS_ADVISOR.TUNE_MVIEW()
API to tune existing materialized views and indexes
Tuned version in USER_TUNE_MVIEW
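
The pieces above (materialized view log, fast refresh, query rewrite, EXPLAIN_REWRITE) could fit together roughly as follows. The sales_fact table, its columns, and the materialized view name are assumptions; REWRITE_TABLE is assumed to have been created with utlxrw.sql.

```sql
-- Log of changes on the base table, needed for fast (incremental) refresh
CREATE MATERIALIZED VIEW LOG ON sales_fact
  WITH ROWID, SEQUENCE (product_key, amount)
  INCLUDING NEW VALUES;

-- Pre-aggregated summary; the COUNT columns are required for fast refresh of SUM
CREATE MATERIALIZED VIEW sales_by_product_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT product_key,
       SUM(amount)   AS total_amount,
       COUNT(amount) AS cnt_amount,
       COUNT(*)      AS cnt_rows
FROM   sales_fact
GROUP  BY product_key;

-- Incremental refresh after a load ('F' = fast)
BEGIN
  DBMS_MVIEW.REFRESH(list => 'SALES_BY_PRODUCT_MV', method => 'F');
END;
/

-- Ask why a given query would or would not be rewritten against the mview
BEGIN
  DBMS_MVIEW.EXPLAIN_REWRITE(
    query        => 'SELECT product_key, SUM(amount) FROM sales_fact GROUP BY product_key',
    mv           => 'SALES_BY_PRODUCT_MV',
    statement_id => 'chk1');
END;
/

SELECT message
FROM   rewrite_table
WHERE  statement_id = 'chk1';
```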

Materialized Views
Query rewrite makes mviews transparent to end-users
Diagram: a query (report, ETL code, etc.) issues a SELECT. With query rewrite
it is answered from the materialized view (created with ENABLE QUERY REWRITE);
without query rewrite it reads the base tables. INSERT/UPDATE/DELETE activity
on the base tables is recorded in snapshot logs and applied to the materialized
view according to its refresh mode.

Cubes and Cube Organized MViews


Query Options
Query star schema (fact+dimensions)
Build mviews on star schema
Too many combinations

Build cubes


Cubes and Cube Organized MViews


Stored in special db areas called Analytic
Workspaces (which are stored in BLOBs)
Manipulated via the Analytic Workspace Manager tool
11g- Best of both worlds
Rewrite and refresh features of regular MVs
Performance benefits of OLAP cubes
Expose OLAP cube as a relational object access via SQL
CUBE_TABLE function searches cube using SQL

Cubes and Cube Organized MViews


Query cube as if it was a relational object
SQL> explain plan for
  2  select * from table(cube_table('GLOBAL.PRICE_CUBE'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 3184667476
-------------------------------------------------------------------------------------
| Id | Operation               | Name       | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |            | 2000 |  195K |    29   (0)| 00:00:01 |
|  1 |  CUBE SCAN PARTIAL OUTER| PRICE_CUBE | 2000 |  195K |    29   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Refresh using DBMS_MVIEW.REFRESH



Index Merge
Merge two separate indexes
Avoid creating a new concatenated index
Implementation via a hint
/*+ index_join(table_name,index1, index2) */


Index Merge
SQL> select count(*) from FINANCE_TRANS where COMPANY_KEY between 1 and 1000 and CHRGE_CODE = 'BASIC';
Elapsed: 00:03:44.03
Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5147 Card=1 Bytes=11)
1    0   SORT (AGGREGATE)
2    1     TABLE ACCESS (BY INDEX ROWID) OF 'FINANCE_TRANS' (Cost=5147 Card=10004 Bytes=110044)
3    2       INDEX (RANGE SCAN) OF 'FINANCE_TRANS_IX2' (NON-UNIQUE) (Cost=51 Card=100035)


Index Merge
SQL> select /*+ index_join(FINANCE_TRANS, FINANCE_TRANS_IX2, TEMP1) */ count(*) from FINANCE_TRANS
where COMPANY_KEY between 1 and 1000 and CHRGE_CODE = 'BASIC';
Elapsed: 00:00:01.73
Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2037 Card=1 Bytes=11)
1    0   SORT (AGGREGATE)
2    1     VIEW OF 'index$_join$_001' (Cost=2037 Card=10004 Bytes=110044)
3    2       HASH JOIN
4    3         INDEX (RANGE SCAN) OF 'FINANCE_TRANS_IX2' (NON-UNIQUE) (Cost=1811 Card=10004 Bytes=110044)
5    3         INDEX (RANGE SCAN) OF 'TEMP1' (NON-UNIQUE) (Cost=1811 Card=10004 Bytes=110044)


Bitmap Join Indexes


An index built on a table using columns from one or more other tables!
Index contains the data to support a join query
Allows the query to retrieve the data from the index rather than
referencing the join tables
CREATE BITMAP INDEX temp_bixj1 ON BUDGET_FACT
(LEDGER_DIM.ledger_name)
FROM BUDGET_FACT, LEDGER_DIM
WHERE LEDGER_DIM.ledger_key = BUDGET_FACT.ledger_key
NOLOGGING
TABLESPACE INDEX_1;

Good for query performance, bad for DML operations


Consider drop/recreate

Bitmap Join Indexes


-- Without bitmap join index
SQL> SELECT D.ledger_name, SUM(F.base_amount) FROM BUDGET_FACT F, LEDGER_DIM D
WHERE D.ledger_key = F.ledger_key AND D.ledger_name = 'Internal Spending'
GROUP BY D.ledger_name;
Elapsed: 00:00:07.58
Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4667 Card=1 Bytes=29)
1    0   SORT (GROUP BY NOSORT) (Cost=4667 Card=1 Bytes=29)
2    1     HASH JOIN (Cost=4667 Card=1577907 Bytes=45759303)
3    2       VIEW OF 'index$_join$_002' (Cost=97 Card=6125 Bytes=128625)
4    3         HASH JOIN
5    4           INDEX (RANGE SCAN) OF 'LEDGER_DIM_IX4' (NON-UNIQUE) (Cost=6 Card=6125 Bytes=128625)
6    4           INDEX (FAST FULL SCAN) OF 'LEDGER_DIM_PK' (UNIQUE) (Cost=6 Card=6125 Bytes=128625)
7    2       TABLE ACCESS (FULL) OF 'BUDGET_FACT' (Cost=2631 Card=8092591 Bytes=64740728)


Bitmap Join Indexes


-- Add bitmap join index TEMP_BIXJ1
Elapsed: 00:00:00.72
Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1875 Card=1 Bytes=29)
1    0   SORT (GROUP BY NOSORT) (Cost=1875 Card=1 Bytes=29)
2    1     HASH JOIN (Cost=1875 Card=1577907 Bytes=45759303)
3    2       VIEW OF 'index$_join$_002' (Cost=97 Card=6125 Bytes=128625)
4    3         HASH JOIN
5    4           INDEX (RANGE SCAN) OF 'LEDGER_DIM_IX4' (NON-UNIQUE) (Cost=6 Card=6125 Bytes=128625)
6    4           INDEX (FAST FULL SCAN) OF 'LEDGER_DIM_PK' (UNIQUE) (Cost=6 Card=6125 Bytes=128625)
7    2       TABLE ACCESS (BY INDEX ROWID) OF 'BUDGET_FACT' (Cost=1712 Card=8092591 Bytes=64740728)
8    7         BITMAP CONVERSION (TO ROWIDS)
9    8           BITMAP INDEX (SINGLE VALUE) OF 'TEMP_BIXJ1'


Index Management
Why is it a big deal in a warehouse?
Index maintenance during DML

Benchmark loads with/without indexes present


Specific attention to bitmap index maintenance
Consider invalidate/rebuild or drop/recreate
Partition level maintenance provides granular control
Provide APIs (stored procs) to ETL application

Virtual indexes (undocumented feature)
Not intended for standalone usage
Part of the Tuning Pack's Virtual Index Wizard
Allows the CBO to evaluate a new index without having to create it!

Index monitoring (watch out for DBMS_STATS)



Index Management - Case Study


Loading in non partitioned table
1.2 billion existing records and 3 indexes
Load 100 million records
Last 30% of records require some lookups on first
70% of loaded records (1 of the indexes is needed)

Slow load performance


Indexes can be dropped to improve loading
Need to be recreated
Not available for other queries

Index Management - Case Study


No partitioning


Index Management - Case Study


The solution
Partition the table and indexes
Invalidate specific index partitions (mark
unusable)
No inline index maintenance while data is loading
Rebuild required index partition to help last part of the
load (granular control)

Rebuild remaining index partitions


Other partitions and related indexes available for
querying while loading new data
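
The solution steps above might translate into statements like these. The index names (sales_fact_ix1..ix3) and the partition name p6 are hypothetical, the indexes are assumed to be local and non-unique, and a degree of 4 is arbitrary.

```sql
-- 1. Stop inline index maintenance for the partition being loaded
ALTER INDEX sales_fact_ix1 MODIFY PARTITION p6 UNUSABLE;
ALTER INDEX sales_fact_ix2 MODIFY PARTITION p6 UNUSABLE;
ALTER INDEX sales_fact_ix3 MODIFY PARTITION p6 UNUSABLE;

-- Let DML and queries ignore unusable index partitions (already the default since 10g)
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- 2. ... load the first 70% of the records ...

-- 3. Rebuild only the index partition that the remaining lookups need
ALTER INDEX sales_fact_ix1 REBUILD PARTITION p6 NOLOGGING PARALLEL 4;

-- 4. ... load the remaining 30%; only sales_fact_ix1 is maintained inline ...

-- 5. Rebuild the index partitions that are still unusable
ALTER INDEX sales_fact_ix2 REBUILD PARTITION p6 NOLOGGING PARALLEL 4;
ALTER INDEX sales_fact_ix3 REBUILD PARTITION p6 NOLOGGING PARALLEL 4;
```

Wrapping steps 1, 3, and 5 in stored procedures gives the ETL application the granular control mentioned above without granting it ALTER privileges directly.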

Index Management - Case Study


Partitioned table and indexes

Partitions: P1, P2, P3, P4, P5, P6

Unusable partitioned indexes: no inline index maintenance
Usable partitioned indexes: rebuilt by the application using stored
procedures after a load has been completed

Constraint Management
Do I need foreign keys in a warehouse?
My code takes care of data integrity
Our data is clean

Trace to see what happens while loading


Consider /*+ append */ hint behavior
Consider some of these options
Deferred constraints
check that constraint is satisfied only at commit time
Useful when loading in no particular order

Disable/enable, validate/novalidate
Enable in parallel!
Metalink Note:124848.1
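
A sketch of the two options above, assuming a hypothetical sales_fact table with a product_key column referencing product_dim. The content of the Metalink note is not reproduced here; this is only the generally known pattern of enabling without validation and validating afterwards.

```sql
-- Deferred constraint: checked at COMMIT, so parents and children can load in any order
ALTER TABLE sales_fact ADD CONSTRAINT sales_fact_product_fk
  FOREIGN KEY (product_key) REFERENCES product_dim (product_key)
  DEFERRABLE INITIALLY DEFERRED;

-- Alternative for bulk loads: switch the constraint off while loading
ALTER TABLE sales_fact DISABLE CONSTRAINT sales_fact_product_fk;

-- ... direct path load ...

-- Re-enable for new rows without scanning the existing data
ALTER TABLE sales_fact ENABLE NOVALIDATE CONSTRAINT sales_fact_product_fk;

-- Validate the existing rows afterwards, using parallel query for the check
ALTER TABLE sales_fact PARALLEL 4;
ALTER TABLE sales_fact MODIFY CONSTRAINT sales_fact_product_fk VALIDATE;
ALTER TABLE sales_fact NOPARALLEL;
```

Splitting ENABLE NOVALIDATE from VALIDATE keeps the table available for DML during the validation scan, which is the main reason to prefer it over a single ENABLE VALIDATE.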


Statistics Management
Be careful what/when you analyze
Manage statistics on big tables
Do I really need to analyze all the tables?
Determine frequency/timing, estimate sample
Consider faking statistics
Consider providing APIs to ETL application
Ability to analyze before/after certain loads

Index monitoring
Watch out for dbms_stats.gather_table_stats and
dbms_stats.gather_index_stats behavior
These procedures mark analyzed indexes as USED!
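
The ideas above (sampling, faking statistics, index monitoring) could be scripted along these lines. The schema name DW, the table, partition, and index names, and all numeric values are assumptions chosen for illustration.

```sql
BEGIN
  -- Sample a single partition instead of analyzing the whole fact table;
  -- cascade => FALSE leaves the indexes alone
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'DW',
    tabname          => 'SALES_FACT',
    partname         => 'P6',
    estimate_percent => 5,
    granularity      => 'PARTITION',
    cascade          => FALSE,
    degree           => 4);

  -- "Fake" statistics for a staging table whose volume is known in advance
  DBMS_STATS.SET_TABLE_STATS(
    ownname => 'DW',
    tabname => 'SALES_STAGE',
    numrows => 100000000,
    numblks => 1500000);

  -- Keep later gather jobs from overwriting the values set above
  DBMS_STATS.LOCK_TABLE_STATS(ownname => 'DW', tabname => 'SALES_STAGE');
END;
/

-- Index monitoring: switch it on, then check the USED flag later
ALTER INDEX sales_fact_ix1 MONITORING USAGE;

SELECT index_name, monitoring, used
FROM   v$object_usage;
```

Given the caveat above, the USED flag is only trustworthy if no statistics were gathered on the index during the monitoring window.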

Oracle Business Intelligence EE


From this


To this


Gartner Magic Quadrant for BI Platforms


2008


2009

Traditional, non-Integrated Solution


Diagram components:
ETL
OLAP Engine (Analytic Apps)
ETL
Mining Engine (Analysis)
Reporting Engine
Portal

Non-Integrated Solution
Siloed reports, multiple islands of information
No knowledge sharing
Duplication of effort and reports
Security issues
Different formats and tools, local data copies
Access and distribution is problematic


Oracle's Approach
Integrated BI Database
BI, Data Mining and OLAP functions in the database

Integrated BI Tools
Common BI Technology Platform

Integrated Analytic Applications


Industry-specific pre-built Apps

Comprehensive DW and BI stack (source - Oracle)


Oracle BI Applications
Oracle Daily Business Intelligence
Oracle Corporate Performance Management
PeopleSoft Enterprise Performance Management
Oracle BI Suite Enterprise Edition
Oracle BI Suite Standard Edition
Oracle Real-Time Decisions
OWB/ODI
Oracle Partitioning
Oracle Data Mining
Oracle OLAP

Oracle BI Applications (source - Oracle)

Industries: Auto; Comms. & Media; Complex Mfg.; Consumer Sector; Energy; Financial Services; Insurance; High Tech; Life & Health Sciences; Public Sector; Travel & Trans

Sales Analytics: Pipeline Analysis, Triangulated Forecasting, Sales Team Effectiveness, Up-sell / Cross-sell, Discounting Analysis, Lead Conversion
Service & Contact Center Analytics: Churn Propensity, Customer Satisfaction, Resolution Rates, Service Rep Effectiveness, Service Cost Analysis, Service Trends
Marketing Analytics: Campaign Scorecard, Response Rates, Product Propensity, Loyalty and Attrition, Market Basket Analysis, Campaign ROI
Financial Analytics: Receivables / Payables Analysis, Customer Profitability, Product Profitability, Regulatory Compliance, Expense Management, Cash Flow Analysis
Supply Chain Analytics: Supplier Performance, Inventory Analysis, Procurement Cycle Times, Inventory Availability, Employee Expenses, BOM Analysis
Workforce Analytics: Employee Productivity, Compensation Analysis, Compliance Reporting, Workforce Profile, Turnover Trends, Return on Human Capital

Prebuilt adapters: Siebel, Oracle, SAP, PeopleSoft, Other Operational & Analytic Sources

OBI Apps

(source - Oracle)


OBI Suite Offerings


Enterprise Edition Plus (OBI EE Plus)
BI Server
BI Answers
BI Interactive Dashboards
BI Publisher (a.k.a. XML Publisher)
BI Delivers
BI Disconnected Analytics
MS Office Add-In
Hyperion
Source:
http://www.oracle.com/technology/products/bi/enterprise-edition.html

OBIEE Answers Ad Hoc Querying


OBIEE Interactive Dashboards


BI Publisher Pixel-Perfect Reporting


BI Server Overview
Integrates information from various sources
Native RDBMS support for Oracle, SQL Server, DB2 and Teradata

Sophisticated data access, aggregation, and calculation engine
Three separate layers: Physical, Business Model, Presentation
Mapping of physical data schemas (tables & joins) to a
logical business model
Security services
Represent multiple physical data sources as a single,
simplified data structure to end user tools
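
As a rough illustration of how the three layers relate, the sketch below models them as plain Python dictionaries. It is not the OBI repository format. The table names (`W_SALES_F`, `W_CUSTOMER_D`), the column names, the view name, and the `resolve` helper are all hypothetical.

```python
# Physical layer: tables and columns as they exist in the sources (hypothetical names).
PHYSICAL = {
    "W_SALES_F": ["CUSTOMER_KEY", "REVENUE_AMT"],
    "W_CUSTOMER_D": ["CUSTOMER_KEY", "CUST_NAME"],
}

# Business Model layer: logical column -> (aggregate, physical table, physical column).
BUSINESS_MODEL = {
    "Customer": (None, "W_CUSTOMER_D", "CUST_NAME"),
    "Revenue": ("SUM", "W_SALES_F", "REVENUE_AMT"),
}

# Presentation layer: named views that expose a subset of the business model.
PRESENTATION = {
    "Sales (end users)": ["Customer", "Revenue"],
}


def resolve(view, columns):
    """Translate presentation columns into the physical expressions behind them."""
    expressions = []
    for name in columns:
        if name not in PRESENTATION[view]:
            raise KeyError(f"{name!r} is not exposed in {view!r}")
        aggregate, table, column = BUSINESS_MODEL[name]
        if column not in PHYSICAL[table]:
            raise KeyError(f"{table}.{column} is not in the physical layer")
        expression = f"{table}.{column}"
        expressions.append(f"{aggregate}({expression})" if aggregate else expression)
    return expressions


print(resolve("Sales (end users)", ["Customer", "Revenue"]))
```

The end user only ever names `Customer` and `Revenue`; the physical table and column names stay hidden behind the business model.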

BI Server Overview Admin Tool


BI Server Physical Layer


Exact depiction of the target physical objects
Mapping of physical data schemas (tables, joins) to a logical business model


BI Server Business Model Layer


Layer of abstraction that sits on top of the physical data
Simplifies complex database structures
Firewall between users and the mechanics of the physical data access layer
Drill-Paths, e.g. Year->Qtr->Month->Week->Day
Derived Business Measures or Calculations, e.g. support for time-series analysis, moving averages, weighted averages, rolling sums, cumulative calculations, etc.
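
To make the derived measures concrete, here is a minimal sketch of what a moving average, rolling sum, cumulative sum, and weighted average compute. It only illustrates the arithmetic, not how the BI Server evaluates these measures. The function names and the sample monthly figures are made up for the example.

```python
from itertools import accumulate


def rolling_sum(values, window):
    """Trailing sum over `window` values; None until a full window is available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(values[i + 1 - window:i + 1]))
    return result


def moving_average(values, window):
    """Trailing average over `window` values; None until a full window is available."""
    return [None if s is None else s / window for s in rolling_sum(values, window)]


def cumulative(values):
    """Running total from the first value onward."""
    return list(accumulate(values))


def weighted_average(values, weights):
    """Average of `values` where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


monthly_sales = [100, 120, 90, 150]  # hypothetical figures
print(rolling_sum(monthly_sales, 2))
print(moving_average(monthly_sales, 3))
print(cumulative(monthly_sales))
print(weighted_average([10, 20], [1, 3]))
```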


BI Server Presentation Layer


Sits on top of Business Model
Can build many views into the Business Model:
simplified Presentation for end-users
complex Presentation for power-users/report writers (complex calculations, etc.)

Oracle Data Integrator (ODI)


Sunopsis Acquisition
Oracle friendly architecture (leverages RDBMS)
Data Movement and Transformation from
Multiple Sources to Heterogeneous Targets
Key Differentiators

Transformations leverage RDBMS
Declarative Design (automatically generates the Data
Flow whatever the sources and target DB)
Knowledge Modules


Oracle Data Integrator Architecture


Service Interfaces and Developer APIs

Design-Time
User Interfaces: Designer, Operator, Thin Client
Data Flow Generator: Knowledge Module Interpreter
Java design-time environment
Runs on any platform
Thin client for browsing Metadata

Runtime
Agent (Data Flow Conductor): Data Flow Generator, Runtime Session Interpreter
Java runtime environment
Runs on any platform
Orchestrates the execution of data flows

Knowledge Modules, Data Flow

Metadata Management
Master Repository, Work Repositories, Runtime Repositories
Metadata repository
Pluggable on many RDBMS
Ready for deployment
Modular and extensible metadata

Terminology

ETL/ELT projects are designed in the Designer tool
Transformations in ODI are defined in objects called Interfaces
Interfaces are stored in Projects
Interfaces are sequenced in a Package that is ultimately
compiled into a Scenario for production execution


Interface

An Interface defines:
Where the data is sent to (the Target)
Where the data is coming from (the Sources)
How the data is transformed from the Source format to the Target
format (the Mappings)
How the data is physically transferred (the path taken) from the
sources to the target (the data Flow)
Source and target are defined using Metadata imported from the
databases and other systems
Mappings are expressed in SQL
Flows are defined in Templates called Knowledge Modules (KMs)
An interface may have more than one source but only populates a
single target
To populate several targets, you need several interfaces
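
The parts of an Interface can be sketched as a small data structure. This is an illustration only: the `Interface` class and its `to_sql` method are hypothetical, and the code ODI actually generates depends on the Knowledge Modules chosen for the flow. The datastore and column names are borrowed from the ORDERS/LINES/SALES example used later in the deck, with made-up columns.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    target: str        # exactly one target datastore
    sources: list      # one or more source datastores
    mappings: dict     # target column -> SQL expression over the sources
    join: str = ""     # optional join condition between the sources

    def to_sql(self):
        """Render the mappings as a single INSERT ... SELECT statement."""
        columns = ", ".join(self.mappings)
        expressions = ", ".join(self.mappings.values())
        sql = (f"INSERT INTO {self.target} ({columns}) "
               f"SELECT {expressions} FROM {', '.join(self.sources)}")
        if self.join:
            sql += f" WHERE {self.join}"
        return sql


load_sales = Interface(
    name="Load SALES",
    target="SALES",
    sources=["ORDERS", "LINES"],
    mappings={"ORDER_ID": "ORDERS.ORDER_ID", "AMOUNT": "LINES.AMOUNT"},
    join="ORDERS.ORDER_ID = LINES.ORDER_ID",
)
print(load_sales.to_sql())
```

Because `target` is a single name, loading a second table means creating a second `Interface` object, which mirrors the one-target rule above.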

Staging Area
A separate, dedicated area in an RDBMS where ODI
creates its temporary objects and executes some of the
transformation rules
By default, ODI sets the staging area on the target data
server
- can be on the source, a third RDBMS, or the Sunopsis
Memory Engine
- cannot be placed on non-relational systems (flat files,
ESBs, etc.)


Metadata

ODI is strongly based on the relational paradigm
In ODI, data is handled through tabular structures
defined as datastores
Datastores are used for all types of real data
structures: database tables, flat files, XML files, JMS
messages, LDAP trees, etc.
The definition of these datastores (the metadata) will
be used in the tool to design the data integration
processes.
Defining the datastores is the starting point of any
data integration project

Two Methods for Reverse Engineering


Standard reverse-engineering
Uses JDBC connectivity features to retrieve metadata,
then writes it to the ODI repository
Customized reverse-engineering
Reads metadata from the application/database system
repository, then writes it to the ODI repository
Uses a technology-specific strategy, implemented in a
Reverse-engineering Knowledge Module (RKM)
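
The idea behind standard reverse-engineering (ask the database for its own metadata, then store it) can be sketched as follows. ODI does this through JDBC. As a stand-in, this example uses Python's `sqlite3` module and SQLite's `PRAGMA table_info`, and a plain dictionary plays the role of the ODI repository. The `reverse_engineer` function and the sample table are assumptions made for illustration.

```python
import sqlite3


def reverse_engineer(conn):
    """Read table and column metadata from the database catalog into a dict."""
    repository = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        repository[table] = [
            {"name": c[1], "type": c[2], "not_null": bool(c[3]), "pk": bool(c[5])}
            for c in columns
        ]
    return repository


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUST TEXT NOT NULL)")
datastores = reverse_engineer(conn)
for column in datastores["ORDERS"]:
    print(column)
```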

Knowledge Modules - KMs

Knowledge Modules are templates of code that define
integration patterns and their implementation
They are usually written to follow Data Integration best
practices, but can be adapted and modified for project
specific requirements
Example steps of a KM:
When loading data from a heterogeneous environment:
1. create a staging table
2. to load the data, use SQL*Loader
3. create the CONTROL file for SQL*Loader
4. when finished with the integration, remove the CONTROL file and
the staging table


A Knowledge Module is made of steps
Each step has a name and a template for the code to be generated
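
The "named steps plus code templates" structure can be sketched with Python's `string.Template`. This is not ODI's template language, and the step texts are illustrative placeholders rather than complete commands. The `LOAD_KM` list, the `generate` helper, and the parameter values are hypothetical. The control file step is placed before the SQL*Loader run because the loader needs it to exist.

```python
from string import Template

# A Knowledge Module as an ordered list of (step name, code template) pairs.
LOAD_KM = [
    ("Create staging table",
     Template("CREATE TABLE $staging AS SELECT * FROM $target WHERE 1 = 0")),
    ("Create control file",
     Template("write $ctl_file mapping $data_file to $staging")),
    ("Load data with SQL*Loader",
     Template("sqlldr control=$ctl_file")),  # illustrative, not a full command line
    ("Clean up",
     Template("remove $ctl_file; DROP TABLE $staging")),
]


def generate(km, **params):
    """Fill every step template with the parameters of one concrete load."""
    return [(name, template.substitute(params)) for name, template in km]


steps = generate(
    LOAD_KM,
    staging="STG_SALES",
    target="SALES",
    ctl_file="sales.ctl",
    data_file="sales.dat",
)
for name, code in steps:
    print(f"{name}: {code}")
```

The same `LOAD_KM` can be reused for any table by changing the parameters, which is the point of keeping the pattern separate from the specifics of one load.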


Knowledge Modules
Hot-Pluggable: Modular, Flexible, Extensible
Pluggable Knowledge Modules Architecture
Reverse: Engineer Metadata
Journalize: Read from CDC Source
Load: From Sources to Staging
Check: Constraints before Load
Integrate: Transform and Move to Targets
Service: Expose Data and Transformation Services

Diagram elements: Sources, CDC, Staging Tables, Target Tables, Error Tables, Services (WS)

Sample out-of-the-box Knowledge Modules
SAP/R3
Siebel
Log Miner
SQL Server Triggers
DB2 Journals
Oracle DBLink
DB2 Exp/Imp
JMS Queues
Oracle SQL*Loader
Check MS Excel
Check Sybase
TPump/Multiload
Type II SCD
Oracle Merge
Siebel EIM Schema
Oracle Web Services
DB2 Web Services

KM Types (used in Models and Interfaces)

KM Type: Description
LKM (Loading): Assembles data from source datastores to the staging area.
IKM (Integration): Uses a given strategy to populate the target datastore from the staging area.
CKM (Check): Checks data in a datastore or during an integration process.
RKM (Reverse-engineering): Retrieves the structure of a data model from a database. Only needed for customized reverse-engineering.
JKM (Journalizing): Sets up a system for Changed Data Capture to reduce the amount of data that needs to be processed.
SKM (Web Services): Defines the code that will be generated to create Data Web Services (exposing data as a web service).


Staging Area on Target-Example ETL

Source (Sybase): ORDERS, LINES
Source (File): CORRECTIONS
Target (Oracle), hosting the Staging Area: C$_0, C$_1, I$_SALES, SALES

1. Extract/Join/Transform: ORDERS, LINES -> C$_0
2. Extract/Transform: CORRECTIONS file -> C$_1
Join/Transform: C$_0, C$_1 -> I$_SALES
Transform & Integrate: I$_SALES -> SALES

Staging Area on Target-Example ETL

Source (Sybase): ORDERS, LINES
Source (File): CORRECTIONS
Target (Oracle), hosting the Staging Area: C$_0, C$_1, I$_SALES, SALES

LKM_1 (LKM SQL to Oracle): ORDERS, LINES -> C$_0
LKM_2 (LKM File to Oracle (SQLLDR)): CORRECTIONS file -> C$_1
IKM_1 (IKM Oracle Incremental Update): C$_0, C$_1 -> I$_SALES -> SALES
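
The flow in this example can be imitated end to end with a small script. It is a sketch under heavy assumptions: a single in-memory SQLite database stands in for Sybase, the flat file, and Oracle, so the loading steps are simulated with plain SQL; the column names and sample rows are invented; and the update-then-insert logic only approximates what an incremental update strategy does, not the code the Oracle KM actually generates.

```python
import sqlite3


def run_flow():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Stand-ins for the sources and the target (hypothetical columns and rows).
    cur.execute("CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUST TEXT)")
    cur.execute("CREATE TABLE LINES (ORDER_ID INTEGER, AMOUNT REAL)")
    cur.execute("CREATE TABLE CORRECTIONS (ORDER_ID INTEGER, DELTA REAL)")
    cur.execute("CREATE TABLE SALES (ORDER_ID INTEGER PRIMARY KEY, CUST TEXT, AMOUNT REAL)")
    cur.executemany("INSERT INTO ORDERS VALUES (?, ?)", [(1, "ACME"), (2, "GLOBEX")])
    cur.executemany("INSERT INTO LINES VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)])
    cur.execute("INSERT INTO CORRECTIONS VALUES (2, -2.5)")
    cur.execute("INSERT INTO SALES VALUES (1, 'ACME', 99.0)")  # stale row to be updated

    # LKM_1: extract, join, and transform ORDERS and LINES into C$_0.
    cur.execute('CREATE TABLE "C$_0" AS '
                'SELECT o.ORDER_ID, o.CUST, SUM(l.AMOUNT) AS AMOUNT '
                'FROM ORDERS o JOIN LINES l ON l.ORDER_ID = o.ORDER_ID '
                'GROUP BY o.ORDER_ID, o.CUST')
    # LKM_2: extract and transform the CORRECTIONS data into C$_1.
    cur.execute('CREATE TABLE "C$_1" AS '
                'SELECT ORDER_ID, SUM(DELTA) AS DELTA FROM CORRECTIONS GROUP BY ORDER_ID')
    # IKM_1, first part: join the loaded tables into the flow table I$_SALES.
    cur.execute('CREATE TABLE "I$_SALES" AS '
                'SELECT c0.ORDER_ID, c0.CUST, c0.AMOUNT + COALESCE(c1.DELTA, 0) AS AMOUNT '
                'FROM "C$_0" c0 LEFT JOIN "C$_1" c1 ON c1.ORDER_ID = c0.ORDER_ID')
    # IKM_1, second part: update rows that already exist, then insert the new ones.
    cur.execute('UPDATE SALES SET '
                'CUST = (SELECT i.CUST FROM "I$_SALES" i WHERE i.ORDER_ID = SALES.ORDER_ID), '
                'AMOUNT = (SELECT i.AMOUNT FROM "I$_SALES" i WHERE i.ORDER_ID = SALES.ORDER_ID) '
                'WHERE ORDER_ID IN (SELECT ORDER_ID FROM "I$_SALES")')
    updated = cur.rowcount
    cur.execute('INSERT INTO SALES SELECT * FROM "I$_SALES" '
                'WHERE ORDER_ID NOT IN (SELECT ORDER_ID FROM SALES)')
    inserted = cur.rowcount
    # Remove the temporary objects created in the staging area.
    for temp in ("C$_0", "C$_1", "I$_SALES"):
        cur.execute(f'DROP TABLE "{temp}"')
    rows = cur.execute("SELECT * FROM SALES ORDER BY ORDER_ID").fetchall()
    conn.close()
    return rows, {"updated": updated, "inserted": inserted}


rows, stats = run_flow()
print(rows)
print(stats)
```

All joins and transformations run as SQL inside the database that hosts the staging area, which is the E-LT idea: the tool generates statements and the RDBMS does the work.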

Requirements
To run an interface, you need at least the
following:
A target table
An Integration Knowledge Module
A Loading Knowledge Module if there is a remote
source
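
These minimum requirements can be expressed as a simple pre-flight check. The function below is a hypothetical helper written for this example, not part of ODI; the KM names passed to it come from the example flow above.

```python
def missing_requirements(target, ikm, lkm, has_remote_source):
    """Return the names of the required pieces that have not been supplied."""
    missing = []
    if not target:
        missing.append("target table")
    if not ikm:
        missing.append("Integration Knowledge Module")
    # A Loading KM is only needed when data must be moved from a remote source.
    if has_remote_source and not lkm:
        missing.append("Loading Knowledge Module")
    return missing


print(missing_requirements("SALES", "IKM Oracle Incremental Update", None, True))
print(missing_requirements("SALES", "IKM Oracle Incremental Update", None, False))
```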


Code Generation
When we ask ODI to Execute the transformations, ODI will
generate the necessary code for the execution (usually
SQL code)
The code is stored in the repository
The execution details are available in the Operator
Interface:
Statistics about the jobs (duration, number of records
processed, inserted, updated, deleted)
Actual code that was generated and executed by the
database
Error codes and error messages returned by the
databases if any

Overview: 6 steps to Production


1. Retrieve/Enrich metadata
2. Design transformations
3. Orchestrate data flows
4. Generate/Deploy data flows
5. Monitor executions
6. Analyze impact / data lineage

Development
Development Servers and Applications: CRM, Data Warehouse, Legacy, ERP, ESB, Files / XML
ODI Design-Time Environment: User Interfaces (Administrators, Designers), Design-time Repositories, Agent (Data Flow Conductor)

Production
Production Servers and Applications: CRM, Data Warehouse, Legacy, ERP, ESB, Files / XML
ODI Runtime Environment: User Interfaces (Operator, Metadata Navigator), Runtime Repository, Agent (Data Flow Conductor)

Q&A
Thank you for attending
If you have follow-up questions, I will be here for the rest of
the day or can be contacted by email: andreas.katsaris@arisant.com
