Oracle ETL Architecture for Enterprise Data Warehousing and Business Intelligence Reporting
John Lucas - GHX

GLOBAL HEALTHCARE EXCHANGE, LLC (“GHX”) CONFIDENTIAL AND PROPRIETARY INFORMATION PROVIDED FOR REFERENCE PURPOSES ONLY. ANY USE OR REPRODUCTION MUST BE
EXPRESSLY PRE-APPROVED IN WRITING BY GHX.
Purpose:
• A Case Study – review how GHX
implemented an Enterprise BI
Architecture built around the Oracle
DBMS at its core
• Review aspects of Oracle DBMS that
allow for robust and near real-time
data refreshes
• Discuss key features of Oracle that
provide strong ETL functionality
Background:
• John Lucas, MBA
• Enterprise Business Intelligence
Architect at Global Healthcare
Exchange, Inc. (jlucas@ghx.com)
• 13+ years in Enterprise BI Architecture,
Modeling and Development
• Experience with a variety of product
platforms: ETL tools, Reporting
applications, DBMS and Data
Warehouse Appliances.
The Requirement
• A great, new business requirement was given:
• integrate multiple systems
• provide the basis for a Revenue Recognition
application for Accounting, Finance, Operations
and Sales
• Provide Business Intelligence:
• on the key business processes in the value chain
of the company
• on order lifecycles - from sales opportunity to
delivery and finally revenue recognition
The Requirement
• Would have to support key Financial, Sales,
Operations and Accounting activities
• Needed to support several Java UI
applications
• Not just reporting, so data quality was
paramount
• Required to support hourly user interaction –
so a robust and possibly near real-time data
integration solution would be needed
The Requirement
• Major source systems included Oracle
Financials and Siebel CRM systems – with
complicated data models
• Early deliverables would require sourcing
approx. 40 tables from each of the major
source systems
• Then quickly create the application,
integration and reporting layer tables – the
end design would require over 100 tables
The Situation
• Time was of the essence
• Technical challenges (geographically diverse
systems and firewalls)
• Complicated process and data modeling
requirements
The Situation
• Strong and experienced Business Objects and
Dashboarding environment – with numerous
internal and external users
• Relative green field in terms of Enterprise
Data Warehousing – a lot of opportunity
• Would need to utilize Agile development –
with quick and numerous releases to
production – (generally once a week)

The Evaluation
• Considered ETL with Informatica and Business
Objects
• Experienced with Oracle ETL from prior work
with a high-volume, Internet e-Commerce
company
• Decided that Oracle 10g native ETL provided
the functionality, flexibility, reliability and
speed that we required

The Game Changers
• Oracle functionality was key:
• Dbms_scheduler – a true job scheduler
• External tables – ability to source flat files
• Merge statements – simplified DML operations
• Sys.utl_file writes – ability to write flat files
• Analytic window functions – ability to de-duplicate
on key, keep running totals, etc.
• Clean, multiple schema design – logical separation
of base, integration and reporting layers
• Dbms_datapump – fast moves between 10g DBs
The Game Changers
• Oracle functionality was key:
• In concert, these elements allowed us to create an
architecture that could provide incremental updates
to nearly all 100 tables in the EDW data model –
every hour of the day
• Also used moving extract windows on the unloads
from source – usually several days
• Sometimes we would observe data issues that would
need to be resolved in the source systems
• Rolling extract windows allowed these to be
self-correcting in the EDW, rarely requiring manual fixes
The Design
• While racing I realized good BI Architecture is like a Triathlon
• you have to do well in all three disciplines –
(swim/bike/run) or (dbms/etl/reporting)
• Oracle database and native ETL -
• seamless integration of the first two components
• Allows for a robust architecture to effectively serve the
reporting layer as well

The Design
• Create separate schemas for:
• base source systems
• data integration
• targeted data marts and application layers
• Base layers mirror source systems – with
additional metadata columns
• A master proc would call all of the individual
schema sub-procs

The Design
• A final EDW master proc would call the
constituent schema procs
• A dbms_scheduler job would execute the master
proc once an hour
• Nearly all base, integration and reporting layer
tables use the merge statement – combining
insert/update functionality into a single DML
statement

Source-to-Base Process Flow
1. Source systems – Oracle 9i and 10g
2. Sys.Utl_File called in procs to write deltas to external directory
3. Formatted flat files are created (one for each table extracted)
4. Formatted flat files become external tables to DW base layer
5. Merge procedures upsert delta data into base tables
6. DW Base Layer target tables

One dbms_scheduler job calls the master proc, which initiates
the entire process of the hourly delta update
[Diagram: several base merge procs feed the int merge procs, which in turn feed the edw mart merge proc, etc.]
Some procedures execute full refreshes of tables; other
procs take a “days-back-delta” parameter, which is passed
from the master down to subsequent sub-procs
DBMS_SCHEDULER – where it all starts
• Create a DBMS_SCHEDULER job for each major
job execution time frequency
• hourly, daily, weekly, etc.
• Use master or “wrapper” procs to group multiple
procs to execute for a given scheduler frequency
• Determine whether you want the individual
procs to execute independently – by modifying
exception handling parameters
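The wrapper-proc idea above can be sketched as follows. This is a minimal illustration, not the GHX implementation: the proc names (master_hourly_mproc, base_merge_mproc, int_merge_mproc, mart_merge_mproc) and the etl_run_log table are hypothetical. Each step is wrapped in its own exception handler, so a failure is logged and the remaining steps still run.

```sql
-- Hypothetical log table for failed steps (name and columns are assumptions).
create table etl_run_log (
  run_ts    timestamp default systimestamp,
  step_name varchar2(64),
  err_msg   varchar2(4000)
);

create or replace procedure master_hourly_mproc (
  p_days_back in number default 3   -- "days-back-delta" handed to every sub-proc
) as
  -- Logs in its own transaction so the ETL transaction is left untouched.
  procedure log_error (p_step in varchar2, p_msg in varchar2) is
    pragma autonomous_transaction;
  begin
    insert into etl_run_log (step_name, err_msg) values (p_step, p_msg);
    commit;
  end log_error;

  -- Runs one sub-proc; an error is logged and swallowed so later steps still run.
  procedure run_step (p_step in varchar2) is
  begin
    execute immediate 'begin ' || p_step || '(:days_back); end;'
      using p_days_back;
  exception
    when others then
      log_error(p_step, substr(sqlerrm, 1, 4000));
  end run_step;
begin
  run_step('base_merge_mproc');   -- hypothetical base-layer proc
  run_step('int_merge_mproc');    -- hypothetical integration-layer proc
  run_step('mart_merge_mproc');   -- hypothetical reporting-layer proc
end master_hourly_mproc;
/
```

If a later step should not run after an earlier failure, the handler in run_step could re-raise the error with `raise;` instead of swallowing it.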

DBMS_SCHEDULER – where it all starts
begin
  -- Daily job: first run at 07:00 today, then once a day
  dbms_scheduler.create_job (
    job_name        => 'daily_rl_revrec_schedule',
    job_type        => 'plsql_block',
    job_action      => 'begin rl_revrec_owner.daily_schedule_sp; end;',
    start_date      => trunc(sysdate)+7/24,
    repeat_interval => 'freq=daily',
    end_date        => NULL,
    enabled         => TRUE,
    comments        => 'Daily execution of rl_revrec daily procedures.');
  -- Hourly job: first run at the top of the next hour, then every hour
  dbms_scheduler.create_job (
    job_name        => 'hourly_rl_revrec_schedule',
    job_type        => 'plsql_block',
    job_action      => 'begin rl_revrec_owner.master_rl_revrec_mproc; end;',
    start_date      => trunc(sysdate,'hh')+1/24,
    repeat_interval => 'freq=hourly',
    end_date        => NULL,
    enabled         => TRUE,
    comments        => 'EDW Master RevRec Job');
end;
/
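Once the jobs exist, their run history can also be checked from SQL. The query below is a sketch that assumes it is run as the job owner; it reads the standard USER_SCHEDULER_JOB_RUN_DETAILS dictionary view for the hourly job created above.

```sql
-- Most recent runs of the hourly job, newest first.
select job_name,
       status,               -- e.g. SUCCEEDED or FAILED
       actual_start_date,
       run_duration,
       error#
from   user_scheduler_job_run_details
where  job_name = 'HOURLY_RL_REVREC_SCHEDULE'   -- job names are stored in upper case
order  by log_date desc;
```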

DBMS_SCHEDULER – Enterprise Manager

SYS.UTL_FILE – extracting to flat files in ext. directories
CREATE OR REPLACE procedure edw_bl_exch_master_extract( p_days_back number default 3 ) as
  f           utl_file.file_type ;
  l_days_back number ;
begin
  -- Rolling extract window: fall back to 3 days when NULL is passed
  l_days_back := nvl(p_days_back,3) ;
  -- Writing to file ext_trx_aggr_tp_health.txt
  f := utl_file.fopen('ORA_EXT_EXCH','ext_trx_aggr_tp_health.txt','W',32767) ;
  -- One tab-delimited line per row inside the window
  for s in ( select
               ':trx_aggr_tp_health:'
               || chr(9) || sender_eid
               || chr(9) || to_char(last_partition_key,'yyyymmddhh24miss')
               as field_1
             from
               v_edw_trx_aggr_tp_health
             where last_partition_key >= trunc(sysdate) - l_days_back )
  loop
    utl_file.put_line(f,s.field_1) ;
  end loop ;
  utl_file.fclose(f) ;
exception
  when others then
    -- Do not leave the file handle open if the extract fails
    if utl_file.is_open(f) then
      utl_file.fclose(f) ;
    end if ;
    raise ;
end ;

Base External Tables – to source flat files into DW
CREATE TABLE EXT_TRX_AGGR_TP_HEALTH
( EXT_TABLE VARCHAR2(4000 BYTE),
  …
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ORA_EXT
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    BADFILE ORA_EXT_LOG:'ext_trx_aggr_tp_health.bad'
    DISCARDFILE ORA_EXT_LOG:'ext_trx_aggr_tp_health.dsc'
    LOGFILE ORA_EXT_LOG:'ext_trx_aggr_tp_health.log'
    SKIP 0
    FIELDS TERMINATED BY "\t"
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    ( ext_table char,
      …
    )
  )
  LOCATION (ORA_EXT:'ext_trx_aggr_tp_health.txt')
)
REJECT LIMIT UNLIMITED ;

Base Merge Procs – to merge external data into base
CREATE OR REPLACE procedure BL_EXCH_OWNER.trx_aggr_tp_health_mproc as
begin
  -- Upsert the delta rows exposed by the external table into the base table
  merge into trx_aggr_tp_health b
  using ext_trx_aggr_tp_health a
  on (    a.sender_eid   = b.sender_eid
      and a.receiver_eid = b.receiver_eid
      and a.account_num  = b.account_num
      and a.tran_type    = b.tran_type)
  when matched then update
    set
      b.edw_start_dt = b.edw_start_dt
  when not matched then
    insert (
      b.sender_eid
    )
    values (
      a.sender_eid
    );
  commit ;
end ;

Oracle Analytic Functions – Great for ETL
AVG *
CORR *
COVAR_POP *
COVAR_SAMP *
COUNT *
CUME_DIST
DENSE_RANK
FIRST
FIRST_VALUE *
LAG
LAST
LAST_VALUE *
LEAD
MAX *
MIN *
NTH_VALUE *
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
RANK
RATIO_TO_REPORT
REGR_ (Linear Regression) Functions *
ROW_NUMBER
STDDEV *
STDDEV_POP *
STDDEV_SAMP *
SUM *
VAR_POP *
VAR_SAMP *
VARIANCE *
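Two of the ETL uses mentioned earlier, de-duplication on key and running totals, can be sketched with ROW_NUMBER and SUM. The staging rows below are made-up sample data (stg_orders and its columns are hypothetical), built inline so the query runs on its own.

```sql
-- Hypothetical staging rows; order 101 arrives twice.
with stg_orders as (
  select 101 as order_id, 50 as amount, date '2010-01-01' as updated_at from dual union all
  select 101, 75, date '2010-01-02' from dual union all
  select 102, 20, date '2010-01-02' from dual
),
ranked as (
  -- Number the rows within each key, newest first.
  select s.*,
         row_number() over (partition by order_id
                            order by updated_at desc) as rn
  from   stg_orders s
)
select order_id,
       amount,
       updated_at,
       sum(amount) over (order by order_id) as running_total
from   ranked
where  rn = 1          -- keep only the latest row per key
order  by order_id;
```

With this sample data, order 101 is returned once with amount 75, and the running totals are 75 and 95.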

ETL Comparison
ETL Activity                    | Oracle ETL                          | Informatica / OTS
De-duplication on key           | Analytic window functions           | Sort / Rollup components
Sums, running totals            | Analytic window functions           | Sort / Rollup components
Flat file loads                 | External tables                     | DBMS api calls
Fast data moves dbms-to-dbms    | Data pump                           | DBMS api calls – unload to file / reload
Surrogate key support           | Sequences                           | Often requires custom code
Upsert as single DML operation  | Merge statements                    | Usually requires separate Insert/Updates
Temp data staging               | Database based                      | File based
Scheduler                       | Dbms_scheduler                      | Can cost extra $$
Near real time execution        | Available with scheduled procedures | Can cost extra $$
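The "Surrogate key support" and "Upsert as single DML operation" rows can be combined in one statement. The sketch below uses hypothetical objects (customer_dim, stg_customer, customer_dim_seq) that are not part of the GHX model: a sequence supplies the surrogate key only for rows the merge inserts.

```sql
-- Hypothetical dimension with a sequence-generated surrogate key.
create sequence customer_dim_seq;

create table customer_dim (
  customer_key   number        primary key,       -- surrogate key
  source_cust_id varchar2(30)  not null unique,   -- natural key from the source
  cust_name      varchar2(100)
);

create table stg_customer (
  source_cust_id varchar2(30),
  cust_name      varchar2(100)
);

-- Update known customers, insert new ones with a fresh surrogate key.
merge into customer_dim d
using stg_customer s
on (d.source_cust_id = s.source_cust_id)
when matched then update
  set d.cust_name = s.cust_name
when not matched then
  insert (customer_key, source_cust_id, cust_name)
  values (customer_dim_seq.nextval, s.source_cust_id, s.cust_name);
```

Keys generated this way are unique but not guaranteed to be gap-free, which is usually acceptable for surrogate keys.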

In Conclusion:
• Success!
• We successfully deployed
our EDW and Revenue
Recognition application
• We passed an external audit
– where our design was
reviewed and approved

In Conclusion:
• Tremendous value proposition
if you can maximize the use of
native Oracle ETL – can save
$100Ks ++
• Realize that there exists
tremendous ETL and data
integration functionality
provided in your existing Oracle
10g database
