and
Business Intelligence
The Complete Story
Andreas Katsaris
Arisant, LLC
[Figure: Oracle Data Integrator (E-LT agent, bulk E-LT, metadata) loads SAP/R3, PeopleSoft, Oracle EBS, Siebel CRM and other sources into the Enterprise Data Warehouse, which feeds Oracle BI Interactive Dashboards, Publisher and Delivers]
Agenda
Warehouse definition
Why Bother?
Life before integrated warehousing and BI
Terminology
[Figure: data moves by extraction, transformation & load from the staging area into the warehouse and on into data marts (3NF, star schema, OLAP cubes with rewrite/refresh, each step giving better performance); end users receive information and knowledge through managed reporting, notification, analysis, ad-hoc queries, dashboards and balanced scorecards]
Ultimate goal: timely, accurate, fast delivery of knowledge
Copyright 2009 Arisant LLC.
Oracle, JD Edwards, and PeopleSoft are registered trademarks of Oracle and/or its affiliates.
Data Sourcing
- Replication
- Transportable Tablespaces
- Streams
- CDC (Change Data Capture)
- External Tables (access to flat files)
- Data Pump
- SQL*Loader (access to flat files)
Replication
Snapshot logs on the source system
Snapshot technology can be extended to produce/preserve exact row changes
Resistance from source system owners, who usually complain about:
the performance impact
the space required for snapshot logs
Transportable Tablespaces
Copies datafiles (the tablespaces must be in read-only mode)
Moves both table and index data, avoiding index rebuilds
Cross-platform support
When to consider:
Lots of source system changes
Required tables can be placed in specific tablespace(s)
No timestamps, so incremental extraction is not possible
Other options not allowed (Streams, replication, CDC)
Streams
Three stages: capture, staging/propagation, consumption
Parallel processing of log files
Non-intrusive
No triggers or snapshot logs
Uses the redo logs/archived logs
External Tables
Files stored outside the database
Read-only in 9i; writable in 10g (via CTAS)
Can be compressed and encrypted in 11g
DML and index creation not allowed
Can be read with SQL as if they were tables
May be joined with database tables
Filtering allowed with a WHERE clause
Can be used to load the staging area or act as a staging area themselves
Data Pump
Next generation export/import tools
Various interfaces
expdp / impdp
Web-based GUI via Database Control
DBMS_DATAPUMP
Resumable Operations
Instance level
init.ora: RESUMABLE_TIMEOUT = 3600
Session level
ALTER SESSION ENABLE RESUMABLE [TIMEOUT <secs>];
ALTER SESSION DISABLE RESUMABLE;
Used for:
DDL
CREATE INDEX, CTAS
DML
INSERT/UPDATE
Detection:
Look in DBA_RESUMABLE where STATUS='SUSPENDED'
PL/SQL
Easy to write, flexible, fast deployment
Extraction/loading using direct-path operations
CTAS operations with NOLOGGING
INSERT ... SELECT with APPEND and PARALLEL hints
Ideal for incremental extractions (e.g. timestamp-based)
Transformations/Loading
Pipelined functions, MERGE, multitable INSERT, INSERT FIRST, PIVOT, and a wide range of analytic functions
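The timestamp-based incremental extraction and MERGE-style load above can be sketched as follows. This is a minimal illustration, not PL/SQL: it assumes Python's built-in sqlite3 module standing in for the Oracle source and staging databases, hypothetical tables src_orders/stg_orders with a last_updated column stored as ISO-8601 text, and SQLite's INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24 or later) in place of Oracle's MERGE.

```python
import sqlite3

# Hypothetical table layout; SQLite stands in for Oracle here (assumption).
DDL = "CREATE TABLE {} (id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT)"


def incremental_extract(src, stg, last_extract_ts):
    """Copy rows changed after last_extract_ts and return the new high-water mark.

    Timestamps are ISO-8601 strings, so string comparison matches time order.
    """
    rows = src.execute(
        "SELECT id, amount, last_updated FROM src_orders "
        "WHERE last_updated > ? ORDER BY last_updated",
        (last_extract_ts,),
    ).fetchall()
    # MERGE-like upsert: update keys that already exist, insert the rest.
    stg.executemany(
        "INSERT INTO stg_orders (id, amount, last_updated) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount, "
        "last_updated = excluded.last_updated",
        rows,
    )
    stg.commit()
    # Keep the previous mark when nothing changed.
    return max((r[2] for r in rows), default=last_extract_ts)


if __name__ == "__main__":
    src, stg = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    src.execute(DDL.format("src_orders"))
    stg.execute(DDL.format("stg_orders"))
    src.execute("INSERT INTO src_orders VALUES (1, 10.0, '2009-01-01 10:00:00')")
    print("high-water mark:", incremental_extract(src, stg, "1900-01-01 00:00:00"))
```

Each run only touches rows newer than the stored mark, and the returned mark is persisted for the next run; rows updated at the source are merged rather than duplicated.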
PL/SQL
BULK operations
BULK COLLECT
bulk binds with SELECT statements
fetch into a user-defined array or PL/SQL table
FORALL
bulk binds with INSERT, UPDATE, and DELETE statements
RETURNING INTO
retrieves DML results into a collection
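The BULK COLLECT ... LIMIT / FORALL pattern (fetch an array of rows, then bind the whole array to one DML call) has a close analogue in Python's DB-API, sketched here with the standard sqlite3 module as a stand-in for Oracle. The orders/orders_stg tables and the batch size are hypothetical; in PL/SQL the same loop would use FETCH ... BULK COLLECT INTO ... LIMIT n followed by FORALL.

```python
import sqlite3


def copy_in_batches(src, tgt, batch_size=500):
    """Array-fetch rows from src and array-insert them into tgt; return batch count."""
    cur = src.execute("SELECT id, amount FROM orders ORDER BY id")
    batches = 0
    while True:
        rows = cur.fetchmany(batch_size)  # analogous to BULK COLLECT ... LIMIT
        if not rows:
            break
        # One call binds the whole array, analogous to FORALL ... INSERT.
        tgt.executemany("INSERT INTO orders_stg (id, amount) VALUES (?, ?)", rows)
        batches += 1
    tgt.commit()
    return batches


if __name__ == "__main__":
    src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    tgt.execute("CREATE TABLE orders_stg (id INTEGER, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 6)])
    print(copy_in_batches(src, tgt, batch_size=2), "batches")  # prints: 3 batches
```

The batch size bounds memory use per round trip, which is the same trade-off the LIMIT clause controls in PL/SQL.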
Maintenance
DML/DDL operations using:
Direct-path operations (NOLOGGING, APPEND hint)
Parallelism
Partitioning
ILM (Oracle ILM tool)
Advanced Compression
Performance
Dimensional modeling/star schemas
Partitioning
Parallelism
Deferred/disabled constraints
Index management (mark unusable/rebuild)
Advanced compression (test first)
Materialized Views
OLAP Cubes/Cube Organized Materialized Views
Data Caching
Star Schema
FACT table
Skinny and large
Contains numeric measurements and FKs to dimensions
Bitmap indexes on FK columns
Consider partitioning and archiving aspects
Dimensions
Denormalized, small and wide
Contain business attributes relating to the measurements
Avoid snowflaking if possible
Star Schema
Intuitive to end-users
Mirrors the way people think about the business
High Query Performance
[Figure: SALES fact table at the center of a star, joined to four dimensions: REGION, PRODUCT, TIME and ENTITY]
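A minimal sketch of the SALES star schema and a typical star query, using Python's sqlite3 module purely as a portable stand-in for Oracle. The column names, sample rows and filter values are hypothetical, and the ENTITY dimension is left out for brevity; the point is the shape: a narrow fact table of measures and FKs, joined to small, wide dimensions that carry the business attributes used for filtering and grouping.

```python
import sqlite3

SCHEMA = """
CREATE TABLE region_dim  (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, month TEXT);
-- Narrow fact table: one numeric measure plus one FK per dimension.
CREATE TABLE sales_fact (
    region_key  INTEGER REFERENCES region_dim,
    product_key INTEGER REFERENCES product_dim,
    time_key    INTEGER REFERENCES time_dim,
    amount      REAL
);
"""

# Filter on dimension attributes, aggregate the fact measure.
STAR_QUERY = """
SELECT r.region_name, SUM(f.amount)
FROM sales_fact f
JOIN region_dim r  ON r.region_key = f.region_key
JOIN product_dim p ON p.product_key = f.product_key
JOIN time_dim t    ON t.time_key = f.time_key
WHERE p.category = ? AND t.month = ?
GROUP BY r.region_name
ORDER BY r.region_name
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO region_dim VALUES (?, ?)", [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Hardware"), (2, "Software")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, "2009-01"), (2, "2009-02")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 100.0), (1, 1, 1, 50.0), (2, 1, 1, 70.0),
        (2, 2, 1, 999.0),  # Software: filtered out
        (1, 1, 2, 999.0),  # February: filtered out
    ])
    return conn


if __name__ == "__main__":
    print(build_demo().execute(STAR_QUERY, ("Hardware", "2009-01")).fetchall())
```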
Parallelism
Queries requiring large table scans, joins, or partitioned index scans
Creation of large indexes
Creation of large tables (including materialized views)
Bulk inserts, updates, merges, and deletes
Understand the related init.ora parameters and set them appropriately
Parallelism
Types:
Parallel Query
Parallel DDL
Parallel DML
Implementation:
via a hint
/*+ PARALLEL(<table name>,<degree of parallelism>) */
Partitioning
Goal
Rolling window operations
Increased performance
Types
Range Partitioning
Hash Partitioning
List Partitioning
Composite Partitioning
Partitioning
Partition pruning (a.k.a. partition elimination)
evaluates the WHERE clause to eliminate unneeded partitions
Reduces unnecessary I/O
Improved query performance
Interval Partitioning
Automatic creation of range-based partitions
REF Partitioning
Partition detail table based on the master-table key
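Partition pruning can be illustrated with a small model of a range-partitioned table. This is a sketch of the idea only, not of Oracle's optimizer: it assumes hypothetical monthly partitions, each defined (as with VALUES LESS THAN) by an exclusive upper bound, and returns the only partitions that a predicate such as sales_date BETWEEN :low AND :high needs to scan.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly range partitions: (name, exclusive upper bound), sorted by bound.
PARTITIONS = [
    ("P_2009_01", date(2009, 2, 1)),
    ("P_2009_02", date(2009, 3, 1)),
    ("P_2009_03", date(2009, 4, 1)),
    ("P_2009_04", date(2009, 5, 1)),
]


def prune(low, high):
    """Names of the partitions that can contain keys with low <= key <= high."""
    bounds = [bound for _, bound in PARTITIONS]
    # A key k belongs to partition i where bounds[i-1] <= k < bounds[i],
    # i.e. i == number of bounds <= k == bisect_right(bounds, k).
    first = bisect_right(bounds, low)
    last = bisect_right(bounds, high)
    return [name for name, _ in PARTITIONS[first:last + 1]]


if __name__ == "__main__":
    print(prune(date(2009, 2, 10), date(2009, 3, 5)))  # only 2 of the 4 partitions
```

Every partition outside the returned list is skipped entirely, which is where the I/O savings come from.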
Interval Partitioning
Minimizes periodic partition maintenance (no need to create new partitions in advance)
Partition segments are allocated as soon as new data arrives
Local indexes are created and maintained as well
Requires at least one range partition
The high value of the range partition(s) sets the transition point where interval partitions begin
Partitioning key can only be a single column of DATE or NUMBER datatype
CREATE TABLE sales (
  sales_id   NUMBER,  -- example columns
  sales_date DATE,
  amount     NUMBER )
PARTITION BY RANGE (sales_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p1 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );
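How the INTERVAL clause above maps a row to a partition can be sketched in Python. This is an illustrative model under stated assumptions, not Oracle code: the high bound of p1 (1 February 2006) is taken as the transition point, and each monthly interval partition is identified by its exclusive upper bound, computed as the transition point plus a whole number of months.

```python
from datetime import date

# High bound of range partition p1 in the example: TO_DATE('1-2-2006', 'DD-MM-YYYY').
TRANSITION = date(2006, 2, 1)


def add_months(d, n):
    """Add n months to a first-of-month date."""
    years, month0 = divmod(d.month - 1 + n, 12)
    return date(d.year + years, month0 + 1, d.day)


def partition_upper_bound(sales_date):
    """Exclusive upper bound of the partition that would hold sales_date."""
    if sales_date < TRANSITION:
        return TRANSITION  # falls into the range partition p1
    months = (sales_date.year - TRANSITION.year) * 12 + (sales_date.month - TRANSITION.month)
    # Monthly interval: bounds are TRANSITION + 1 month, + 2 months, and so on.
    return add_months(TRANSITION, months + 1)


if __name__ == "__main__":
    for d in (date(2005, 12, 31), date(2006, 2, 1), date(2006, 3, 15)):
        print(d, "->", partition_upper_bound(d))
```

A row dated 15 March 2006 therefore lands in the partition bounded by 1 April 2006, which the database creates the first time such a row arrives.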
REF Partitioning
Advanced Compression
Table Compression
Table Scan Performance: 2x faster
Storage Savings: 2x smaller
DML Performance: 5% slower
CREATE TABLE SALES_FACT (sale_id NUMBER, amount NUMBER) COMPRESS FOR ALL OPERATIONS;  -- example columns
RMAN Compression
~40% faster than compressed backups in 10g
Slightly better compression ratio than in 10g
RMAN> CONFIGURE COMPRESSION ALGORITHM 'zlib';
RMAN> backup as COMPRESSED BACKUPSET database archivelog all;
Advanced Compression
5% - most active
60% - historical
Optimizing Queries
Star Queries
Materialized Views
Cube Organized Materialized Views
Index Merge
Bitmap Join Indexes
Star Queries
A join between a fact table and a number of dimension tables
Requires bitmap indexes on the FK columns of the FACT table
Retrieves exactly the necessary rows from the fact table by merging n bitmap indexes
Joins this result set to the dimension tables
Implementation:
star_transformation_enabled = TRUE | FALSE | TEMP_DISABLE (modifiable at system or session level)
Hints for stubborn queries
/*+ STAR_TRANSFORMATION */
/*+ FACT(<fact table>) */
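The first phase of a star transformation (combine bitmap index entries within each dimension, then merge across dimensions) can be modeled with Python integers used as bitmaps. This is a conceptual sketch only: the fact table's FK columns, the key values and the dimension filters are hypothetical, and Oracle's real bitmap formats and plans are far more involved. Within one dimension the bitmaps of all qualifying keys are ORed; across dimensions the results are ANDed, and the surviving row positions are the only fact rows fetched before joining back to the dimensions.

```python
# Hypothetical FK columns of a small fact table (position i = fact row i).
FACT_REGION = [1, 2, 1, 3, 2, 1]
FACT_PRODUCT = [10, 10, 20, 10, 20, 10]


def bitmap_index(column):
    """Build one bitmap (a Python int, bit i = row i) per distinct key value."""
    bitmaps = {}
    for row, key in enumerate(column):
        bitmaps[key] = bitmaps.get(key, 0) | (1 << row)
    return bitmaps


def star_rows(dim_filters):
    """dim_filters: list of (bitmap index, set of keys that passed the dimension filter)."""
    if not dim_filters:
        raise ValueError("at least one dimension filter is required")
    result = None
    for index, keys in dim_filters:
        dim_bitmap = 0
        for key in keys:  # OR the bitmaps of all qualifying keys of this dimension
            dim_bitmap |= index.get(key, 0)
        # AND across dimensions
        result = dim_bitmap if result is None else result & dim_bitmap
    return [row for row in range(result.bit_length()) if result >> row & 1]


if __name__ == "__main__":
    region_ix, product_ix = bitmap_index(FACT_REGION), bitmap_index(FACT_PRODUCT)
    print(star_rows([(region_ix, {1, 2}), (product_ix, {10})]))
```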
Materialized Views
Pre-aggregation of data
Changes to the underlying tables are propagated to the mview by refresh
Fast/complete refresh
Refresh can be on demand or scheduled
DBMS_ADVISOR package
APIs to recommend materialized views and indexes (SQL Access Advisor)
DBMS_ADVISOR.TUNE_MVIEW()
API to tune a CREATE MATERIALIZED VIEW statement for fast refresh and query rewrite
Tuned version in USER_TUNE_MVIEW
Materialized Views
Query rewrite makes mviews transparent to end-users
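[Figure: queries (reports, ETL code, etc.) against the base tables are transparently rewritten to use a materialized view created with ENABLE QUERY REWRITE; without query rewrite they read the base tables. INSERT/UPDATE/DELETE activity on the base tables is recorded in snapshot logs, and the refresh mode determines how the mview is brought up to date; cubes can also be built from the base tables]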
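The difference between complete and fast refresh can be sketched for a SUM/COUNT-per-region summary. This is a conceptual model, not Oracle's refresh mechanism: the log format ('I' or 'D', region, amount), with an UPDATE recorded as a delete plus an insert, is a hypothetical stand-in for the snapshot (mview) log. A complete refresh recomputes from the base rows; a fast refresh applies only the logged deltas. Keeping a COUNT next to the SUM is what lets the fast path tell when a group has lost all its rows, which is also why fast-refreshable aggregate mviews in Oracle generally require COUNT columns.

```python
from collections import defaultdict


def complete_refresh(base_rows):
    """Recompute (SUM(amount), COUNT(*)) per region from all base rows."""
    summary = defaultdict(lambda: [0.0, 0])
    for region, amount in base_rows:
        summary[region][0] += amount
        summary[region][1] += 1
    return {region: tuple(v) for region, v in summary.items()}


def fast_refresh(summary, log):
    """Apply logged changes ('I' = insert, 'D' = delete) to an existing summary."""
    new = {region: list(v) for region, v in summary.items()}
    for op, region, amount in log:
        total, count = new.get(region, [0.0, 0])
        sign = 1 if op == "I" else -1
        new[region] = [total + sign * amount, count + sign]
        if new[region][1] == 0:
            del new[region]  # the group has no remaining base rows
    return {region: tuple(v) for region, v in new.items()}


if __name__ == "__main__":
    mv = complete_refresh([("East", 100.0), ("West", 50.0)])
    print(fast_refresh(mv, [("I", "East", 25.0), ("D", "West", 50.0)]))
```

Both paths must agree; the fast path just does work proportional to the log instead of the base tables.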
Index Merge
Merge two separate indexes
Avoid creating a new concatenated index
Implementation via a hint
/*+ index_join(table_name,index1, index2) */
Index Merge
SQL> select count(*) from FINANCE_TRANS where COMPANY_KEY between 1 and 1000 and CHRGE_CODE = 'BASIC';
Elapsed: 00:03:44.03
Execution Plan
----------------------------------------------------------
0      SORT (AGGREGATE)
SQL> select /*+ index_join(FINANCE_TRANS, FINANCE_TRANS_IX2, TEMP1) */ count(*) from FINANCE_TRANS
where COMPANY_KEY between 1 and 1000 and CHRGE_CODE = 'BASIC';
Elapsed: 00:00:01.73
Execution Plan
----------------------------------------------------------
0      SORT (AGGREGATE)
INDEX (RANGE SCAN) OF 'FINANCE_TRANS_IX2' (NON-UNIQUE) (Cost=1811 Card=10004 Bytes=110044)
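An index join answers a query from two indexes alone: each index supplies the ROWIDs matching its own predicate, and the ROWID lists are joined (Oracle typically uses a hash join for this), so the table itself is never visited. A minimal model with Python sets, assuming hypothetical index contents for COMPANY_KEY and CHRGE_CODE:

```python
# Hypothetical single-column indexes: indexed value -> set of ROWIDs.
COMPANY_KEY_IX = {5: {"r1", "r2", "r3"}, 2000: {"r4"}}
CHRGE_CODE_IX = {"BASIC": {"r2", "r3", "r4"}, "PREMIUM": {"r1"}}


def count_via_index_join(low, high, code):
    """COUNT(*) for COMPANY_KEY BETWEEN low AND high AND CHRGE_CODE = code."""
    # Range scan of the first index collects ROWIDs satisfying its predicate.
    by_company = set()
    for key, rowids in COMPANY_KEY_IX.items():
        if low <= key <= high:
            by_company |= rowids
    # Probe of the second index for the equality predicate.
    by_code = CHRGE_CODE_IX.get(code, set())
    # Joining on ROWID keeps rows that satisfy both predicates; no table access.
    return len(by_company & by_code)


if __name__ == "__main__":
    print(count_via_index_join(1, 1000, "BASIC"))
```

This is the effect a concatenated (COMPANY_KEY, CHRGE_CODE) index would give, obtained from two existing single-column indexes.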
Index Management
Why is it a big deal in a warehouse?
Index maintenance during DML
[Figure: partitions P1 to P6]
Constraint Management
Do I need foreign keys in a warehouse?
My code takes care of data integrity
Our data is clean
Disable/enable, validate/novalidate
Enable in parallel!
Metalink Note:124848.1
Statistics Management
Be careful what/when you analyze
Manage statistics on big tables
Do I really need to analyze all the tables?
Determine frequency/timing and estimate sample size
Consider faking statistics
Consider providing APIs to ETL application
Ability to analyze before/after certain loads
Index monitoring
Watch out for dbms_stats.gather_table_stats and dbms_stats.gather_index_stats behavior
These procedures mark the analyzed indexes as USED!
[Figure: from this (separate OLAP engine for analytic apps, mining engine for analysis, ETL, and reporting engine, tied together by a portal) to this (2009)]
Non-Integrated Solution
Siloed reports, multiple islands of information
No knowledge sharing
Duplication of effort and reports
Security issues
Different formats and tools, local data copies
Access and distribution are problematic
Oracle's Approach
Integrated BI Database
BI, Data Mining and OLAP functions in the database
Integrated BI Tools
Common BI Technology Platform
[Figure: Oracle Daily Business Intelligence, OWB/ODI, Oracle Corporate Performance Management, Oracle Partitioning, PeopleSoft Enterprise Performance Management, Oracle OLAP, Oracle BI Applications]
[Figure (source: Oracle): Oracle BI Applications (OBI Apps) for industries including auto, energy, financial services, insurance, high tech, life & health sciences, public sector, and travel & transportation. Prebuilt Sales, Service & Contact Center, Marketing, Financial, Supply Chain and Workforce Analytics cover metrics such as pipeline analysis, churn propensity, campaign scorecards, receivables/payables analysis, supplier performance, customer profitability, inventory analysis, market basket analysis, cash flow analysis and return on human capital. Prebuilt adapters: Siebel, Oracle, SAP, PeopleSoft, plus other operational & analytic sources]
BI Server Overview
Integrates information from various sources
Native RDBMS support for Oracle, SQL Server, DB2 and Teradata
[Figure: ODI architecture. Design-time user interfaces (Designer, Operator, thin client) use the knowledge module interpreter and data flow generator; at runtime the agent's session interpreter and data flow conductor execute the knowledge modules; metadata management spans the master repository, work repositories and runtime repositories]
Terminology
Interface
Staging Area
A separate, dedicated area in an RDBMS where ODI creates its temporary objects and executes some of the transformation rules
By default, ODI sets the staging area on the target data server
- can be on the source, on a third RDBMS, or in the Sunopsis Memory Engine
- cannot be placed on non-relational systems (flat files, ESBs, etc.)
Metadata
A Knowledge Module is made of steps
Each step has a name and a template for the code to be generated
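The idea that each KM step pairs a name with a code template, expanded from metadata at generation time, can be sketched with Python's string.Template. This is a hypothetical illustration: ODI's real KMs use their own substitution API rather than $-placeholders, and the step names, SQL and metadata keys below are invented for the example (the C$_ and I$_ prefixes mirror the work tables ODI creates; `$$` is string.Template's escape for a literal `$`).

```python
from string import Template

# Hypothetical integration KM: each step is (name, code template).
KM_STEPS = [
    ("Create flow table",
     Template("CREATE TABLE $stg.I$$_$target AS SELECT $columns FROM $stg.C$$_0 WHERE 1 = 0")),
    ("Load flow table",
     Template("INSERT INTO $stg.I$$_$target SELECT $columns FROM $stg.C$$_0")),
    ("Insert into target",
     Template("INSERT INTO $target ($columns) SELECT $columns FROM $stg.I$$_$target")),
]


def generate(steps, metadata):
    """Expand every step template with interface metadata; return (name, code) pairs."""
    return [(name, template.substitute(metadata)) for name, template in steps]


if __name__ == "__main__":
    meta = {"stg": "ODI_WORK", "target": "SALES", "columns": "ORDER_ID, AMOUNT"}
    for name, code in generate(KM_STEPS, meta):
        print(f"-- {name}\n{code};")
```

Because the templates only reference metadata, the same KM can be reused unchanged for any interface; only the metadata differs.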
Knowledge Modules
Hot-Pluggable: Modular, Flexible, Extensible
Pluggable Knowledge Modules Architecture
Reverse: engineer metadata
Journalize: read from CDC source
Load: from sources to staging
Check: constraints before load
Integrate: transform and move to targets
Service: expose data and transformation services
[Figure: KM roles in the data flow. Reverse and Journalize (CDC) act on the sources, Load moves data into staging tables, Check sends rejects to error tables, Integrate writes the target tables, and Service exposes web services. Sample KMs: Log Miner, SQL Server Triggers, DB2 Journals, Oracle DBLink, DB2 Exp/Imp, JMS Queues, Oracle SQL*Loader, Check MS Excel, Check Sybase, TPump/Multiload, Type II SCD, Oracle Merge, Siebel EIM Schema, Oracle Web Services, DB2 Web Services]
KM Types (used in models and interfaces)
KM Type: Description
LKM: Loading
IKM: Integration
CKM: Check
RKM: Reverse-engineering
JKM: Journalizing
SKM: Web Services
Staging Area
[Figure: Transform & Integrate. (1) LKM_1 extracts, joins and transforms ORDERS and LINES into C$_0; (2) LKM_2 extracts and transforms the CORRECTIONS file into C$_1; IKM_1 joins and transforms C$_0 and C$_1 into I$_SALES and integrates the result into the target table SALES]
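The staging-area flow (two loading tables, one flow table, one target) can be imitated end to end with Python's sqlite3 module. This is a hypothetical sketch, not code generated by ODI: the column names, the one-correction-per-order assumption and the CSV layout of the CORRECTIONS file are invented, and each step is labeled with the KM that performs it in the flow above.

```python
import csv
import io
import sqlite3


def run_interface(conn, corrections_csv):
    """Load ORDERS/LINES and a corrections file through C$/I$ work tables into SALES."""
    # LKM_1: extract, join and transform ORDERS + LINES into loading table C$_0.
    conn.execute(
        'CREATE TABLE "C$_0" AS SELECT o.order_id, SUM(l.amount) AS amount '
        "FROM orders o JOIN lines l ON l.order_id = o.order_id GROUP BY o.order_id"
    )
    # LKM_2: extract the CORRECTIONS flat file into loading table C$_1.
    conn.execute('CREATE TABLE "C$_1" (order_id INTEGER, adjustment REAL)')
    rows = [(int(r["order_id"]), float(r["adjustment"]))
            for r in csv.DictReader(io.StringIO(corrections_csv))]
    conn.executemany('INSERT INTO "C$_1" VALUES (?, ?)', rows)
    # IKM_1: join/transform the loading tables into flow table I$_SALES ...
    conn.execute(
        'CREATE TABLE "I$_SALES" AS SELECT c0.order_id, '
        "c0.amount + COALESCE(c1.adjustment, 0) AS amount "
        'FROM "C$_0" c0 LEFT JOIN "C$_1" c1 ON c1.order_id = c0.order_id'
    )
    # ... then integrate the flow table into the target and drop the work tables.
    conn.execute('INSERT INTO sales (order_id, amount) SELECT order_id, amount FROM "I$_SALES"')
    for work_table in ("C$_0", "C$_1", "I$_SALES"):
        conn.execute(f'DROP TABLE "{work_table}"')
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
        CREATE TABLE lines (order_id INTEGER, amount REAL);
        CREATE TABLE sales (order_id INTEGER, amount REAL);
        INSERT INTO orders VALUES (1), (2);
        INSERT INTO lines VALUES (1, 10.0), (1, 5.0), (2, 7.0);
    """)
    run_interface(conn, "order_id,adjustment\n2,-2.0\n")
    print(conn.execute("SELECT order_id, amount FROM sales ORDER BY order_id").fetchall())
```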
Requirements
To run an interface, you need at least the following:
A target table
An Integration Knowledge Module
A Loading Knowledge Module if there is a remote source
Code Generation
When we ask ODI to execute the transformations, ODI generates the necessary code for the execution (usually SQL code)
The code is stored in the repository
The execution details are available in the Operator interface:
Statistics about the jobs (duration, number of records processed, inserted, updated, deleted)
Actual code that was generated and executed by the database
Error codes and error messages returned by the databases, if any
[Figure: ODI in development and production. Designers use the design-time repositories to retrieve/enrich metadata, design transformations and orchestrate data flows across CRM, data warehouse, legacy, ERP, ESB and files/XML; in production the agent (data flow conductor) runs from the runtime repository, monitored through the Operator and Metadata Navigator user interfaces]
Q&A
Thank you for attending
If you have follow-up questions, I will be here for the rest of the day, or you can contact me by email at andreas.katsaris@arisant.com