
OWB Best Practices and Advanced Techniques

Mark Rittman, Rittman Mead Consulting


UKOUG Conference & Exhibition 2007
Who Am I?

• Oracle BI&DW Architecture and Development Specialist


• Co-Founder of Rittman Mead Consulting
• Oracle BI&DW Project Delivery Specialists
• 10+ years with Discoverer, OWB etc
• Oracle ACE Director, ACE of the Year 2005
• Writer for OTN and Oracle Magazine
• Longest-running Oracle blog
• http://www.rittmanmead.com/blog
• Chair of UKOUG BIRT SIG
• Co-Chair of ODTUG BI&DW SIG
• Second year of OU BI Masterclasses
• 18 countries visited in 2006-7
Rittman Mead Consulting

• Oracle BI&DW Project Specialists


• Consulting, Training, Support
• Works with you to ensure OWB project success
• Small, focused team
• OWB, Oracle BI, DW technical specialists
• Clients in the UK, Europe, USA
OWB Best Practices and Advanced Techniques

• Oracle Warehouse Builder is a full-lifecycle DW tool


• Data Modeling
• Data Mapping
• Metadata creation and management
• OLAP integration
• BI Tool metadata creation
• Data Quality
• It does everything...
... but how do you use it effectively?
... and what do the more advanced features do?
OWB Project Key Success Factors

• The project delivers on time


• The DW meets the users' requirements (even if they change)
• The data quality is acceptable
• The developers are motivated and enjoy working on the project
• The users accept the resulting data warehouse
• The performance of the ETL and queries is within limits
• The project stays within budget
• The ETL and data warehouse are sustainable
OWB Project Objectives

• A simple, flexible design


• A robust ETL process
• A design that meets user objectives
• A single repository that drives ETL and reporting
• Happy developers
• Happy users
• Happy budget owners
Ten OWB Best Practices (and Advanced Techniques)

1. Adopt a RAD (Agile) Methodology
2. Create an Effective Development Environment
3. Use a Top-Down ETL Design Approach
4. Use a Dimensional Design and ETL Templates
5. Use the Data Quality Option to Assess and Manage Data Quality
6. Promote Code Re-Use
7. Leverage Advanced OWB Features
8. Design Code Promotion Early
9. Welcome Evolving Requirements
10. Deliver Data and Reports to Users Frequently
1: Adopt a RAD Methodology

• An observation based on what works on OWB projects
• Develop software in small increments
• Multiple development cycles
• Emphasis on face-to-face communication
• Co-location (development centres)
• Test-first development
• Embrace rapidly changing requirements
• Follow Agile/Extreme Programming principles
• Communication, Feedback, Simplicity, Courage, Respect
2: Create an Effective Development Environment

• Ensure your environments are as expected
• Ensure your code works
• Reduce effort required to achieve this
• Create a repeatable process
• Use OMB*Plus code
• Script everything
• Script the environment setup
• Script code promotion
• Script diagnostics and reports
• Take the same approach to the ETL process
• Script everything
• Run pre-flight checks
• Log all ETL runs in a Control table

```tcl
# Excerpt from an environment setup script. Variables such as $v_dr,
# $v_rt and $v_mdp_alt are set earlier in the script (not shown on the slide).
puts "DR project: $v_project"
puts "MAKEOWB_DIR: $v_makeowb_dir"

set OMBCONTINUE_ON_ERROR on

OMBCONNECT $v_dr/$v_dr_pw@$v_host:$v_port:$v_sid
OMBCC '$currentProject'
OMBCONNECT RUNTIME '$v_rt' USE PASSWORD '$v_rt_pw'

foreach modName [list ODS STG DW] {
    #######################################################
    # Create Deployment Action Plan for each module
    #######################################################
    if { $v_mdp_alt == "Y" } {
        OMBCREATE TRANSIENT DEPLOYMENT_ACTION_PLAN 'DROP_MAPPING_DEPLOY_PLAN'
        OMBCREATE TRANSIENT DEPLOYMENT_ACTION_PLAN 'CREATE_MAPPING_DEPLOY_PLAN'
        set v_mdp_alt N
    }
    OMBCC '$modName'
    foreach mapName [OMBLIST MAPPINGS] {
        # Skip mappings whose names start with NR_
        if { [string range $mapName 0 2] != "NR_" } {
            incr v_ctr
            # ... (remainder of the loop body is not shown on the slide)
        }
    }
}
```
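The last two bullets (run pre-flight checks, log every ETL run in a control table) can be sketched in Python, with SQLite standing in for the warehouse database. This is a minimal illustration under stated assumptions: the `etl_control` table, its columns and the `run_etl` helper are hypothetical, and OWB's own runtime audit tables are not modelled.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Sequence, Tuple


def create_control_table(conn: sqlite3.Connection) -> None:
    # Hypothetical control table: one row per ETL run
    conn.execute(
        """CREATE TABLE IF NOT EXISTS etl_control (
               exec_id     INTEGER PRIMARY KEY AUTOINCREMENT,
               process     TEXT NOT NULL,
               started_at  TEXT NOT NULL,
               finished_at TEXT,
               status      TEXT NOT NULL
           )"""
    )


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def run_etl(
    conn: sqlite3.Connection,
    process: str,
    preflight_checks: Sequence[Callable[[], bool]],
    body: Callable[[int], None],
) -> Tuple[int, str]:
    """Log the run, run the pre-flight checks, then the ETL body; record the outcome."""
    cur = conn.execute(
        "INSERT INTO etl_control (process, started_at, status) VALUES (?, ?, 'RUNNING')",
        (process, _now()),
    )
    exec_id = cur.lastrowid
    try:
        # Stop before touching any data if a pre-flight check fails
        for check in preflight_checks:
            if not check():
                name = getattr(check, "__name__", repr(check))
                raise RuntimeError("pre-flight check failed: " + name)
        body(exec_id)  # the execution ID is passed through to the ETL code
        status = "SUCCESS"
    except Exception:
        # Failures are recorded rather than re-raised; the caller inspects the status
        status = "FAILED"
    conn.execute(
        "UPDATE etl_control SET finished_at = ?, status = ? WHERE exec_id = ?",
        (_now(), status, exec_id),
    )
    conn.commit()
    return exec_id, status


conn = sqlite3.connect(":memory:")
create_control_table(conn)
print(run_etl(conn, "LOAD_DIM_CUSTOMER", [lambda: True], lambda exec_id: None))  # (1, 'SUCCESS')
```

Recording the run before the checks means even a run that never starts loading leaves a FAILED row behind for diagnosis.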
3: Use a Top-Down ETL Design

• Use Process Flows


• Create master process, decompose to sub-processes
• Create as “stubs” that are placeholders
• Fill in detail from the top down
• Keep it simple
• Code for what’s needed now, not what might be needed
• Produce working software
• Ensure “stub” mappings and process flows execute right from the start
• Ensure your design and build process delivers working daily builds
• Create a framework
• Audit tables, Control tables
• Pass through execution/batch IDs
Start at Master Process Flow, Work Down

• Start at master process flow, gradually decompose through dims and facts
• Ensure each process flow executes from start (add stub code)
• Add an input parameter for the execution ID to each mapping, and pass it down from the start of the process flow
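The top-down approach can be illustrated with a small Python analogy: every leaf starts as a stub that runs but does no work, and the master flow passes one execution ID down to every sub-flow. The flow and mapping names are hypothetical, and real OWB process flows are built in the Process Flow editor rather than in code.

```python
from typing import Callable, List

Step = Callable[[int], None]  # a step receives the execution ID


def stub(name: str, log: List[str]) -> Step:
    """Placeholder step: does no real work yet, but runs and records that it ran."""
    def run(exec_id: int) -> None:
        log.append(f"{exec_id}:{name}")
    return run


def process_flow(steps: List[Step]) -> Step:
    """A (sub-)process flow runs its steps in order, passing the same execution ID down."""
    def run(exec_id: int) -> None:
        for step in steps:
            step(exec_id)
    return run


# Master flow decomposed into dimension and fact sub-flows; every leaf is still a stub
log: List[str] = []
dims = process_flow([stub("LOAD_DIM_CUSTOMER", log), stub("LOAD_DIM_PRODUCT", log)])
facts = process_flow([stub("LOAD_FACT_SALES", log)])
master = process_flow([dims, facts])
master(42)
print(log)  # ['42:LOAD_DIM_CUSTOMER', '42:LOAD_DIM_PRODUCT', '42:LOAD_FACT_SALES']
```

Because the whole tree executes from day one, replacing a stub with a real mapping is a local change, and the daily build keeps working.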
4: Use Data Modeling and ETL Templates

• Adopt a conformed dimensional design


• Kimball Bus Architecture, leverages OWB and Oracle DW features
• Select ROLAP or MOLAP storage based on performance need (and budget)
• Loading facts and dimensions is a fairly standard process
• Should be able to always perform the same steps
• Stage
• Conform (and add defaults)
• Identify new and changed
• Transform
• Load new
• Load updated
• Use templates and standard naming for each of these steps
Example Dimension Template
Start
• 100 - Stage data: Load data from source. Populates the _STG table. An optional filter should be added to the mapping.
• 200 - Conform data: If required, trim, capitalise, check for nulls and format the data. This mapping should be run row-based; all erroneous records will be rejected, but the mapping will continue. This may not be possible with extremely high volumes of data. This step may not be required if the data is being read from a trusted source. Populates the _CNF table.
• 250 - Populate error table with records not loaded during transform.
• 300 - Identify all the changed data: This step should compare the conformed data with the master data set. It will not be required if only changed data is being read, for example if CDC is being employed. Populates the _DLT table.
• 400 - Split data into new and changed records: Populates the _NEW and _UPD tables.
• 500 - Transform data: Apply any business rules to the data. These may be the same for new and changed records, or they may have different requirements. If they are the same, consider using Pluggable Mappings. Populates the _NEWn and _UPDn tables.
• 600 - Load new data: Populates the DIM_ table.
• 700 - Load changed data: This step should take care of any SCD requirements. Populates the DIM_ table.
End
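Steps 300 and 400 of the template can be sketched in Python as below, assuming rows are dictionaries, the master data is keyed by a single business key, and any attribute difference counts as a change. The function and column names are hypothetical; in an OWB mapping the same logic would normally be set-based (for example an outer join against the dimension) rather than row by row.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def identify_and_split(conformed: List[Row], master: Dict[object, Row], key: str
                       ) -> Tuple[List[Row], List[Row]]:
    """Steps 300 and 400: keep only new or changed rows, then split them.

    `conformed` plays the role of the _CNF table and `master` the current
    dimension keyed by business key; the two returned lists stand in for
    the _NEW and _UPD tables.
    """
    new_rows, upd_rows = [], []
    for row in conformed:
        current = master.get(row[key])
        if current is None:
            new_rows.append(row)          # business key not seen before
        elif any(current.get(col) != val for col, val in row.items()):
            upd_rows.append(row)          # known key, at least one attribute changed
        # unchanged rows are dropped: they are not part of the delta
    return new_rows, upd_rows


master = {
    "C1": {"cust_id": "C1", "city": "Leeds"},
    "C2": {"cust_id": "C2", "city": "York"},
}
conformed = [
    {"cust_id": "C1", "city": "Leeds"},   # unchanged
    {"cust_id": "C2", "city": "Hull"},    # changed
    {"cust_id": "C3", "city": "Bath"},    # new
]
new_rows, upd_rows = identify_and_split(conformed, master, "cust_id")
```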
5: Use OWB Data Quality Option

• Data quality issues cause 50% of ETL projects to be delayed


• Use OWB Data Quality Option to manage, assess and improve data quality
• Initially assess data quality using the Data Profiler
• Use this to derive data rules
• Use data rules to ensure data quality
• Either correct the data automatically
• Or use data rules within mappings
• Use Data Auditors to monitor data quality during ETL runs
• Optionally, run remedial code if quality is poor
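The profile, derive, audit cycle described above can be sketched in Python. This illustrates the idea only: `derive_domain_rule`, `audit`, the 5% cut-off and the 0.9 threshold are hypothetical, and this is not how the Data Profiler or Data Auditors are actually configured.

```python
from collections import Counter
from typing import Callable, Iterable


def derive_domain_rule(profiled: Iterable[str], min_share: float = 0.05) -> Callable[[str], bool]:
    """Derive a list-of-values rule: values seen in at least `min_share` of the
    profiled rows form the valid domain; rarer values are assumed to be errors."""
    values = list(profiled)
    counts = Counter(values)
    valid = {v for v, n in counts.items() if n / len(values) >= min_share}
    return lambda value: value in valid


def audit(rows: Iterable[str], rule: Callable[[str], bool], threshold: float) -> bool:
    """Auditor-style check: True if the share of rows passing `rule` meets `threshold`."""
    rows = list(rows)
    if not rows:
        return True
    passed = sum(1 for r in rows if rule(r))
    return passed / len(rows) >= threshold


# Profile a sample, derive the rule, then audit a new batch during an ETL run
sample = ["M"] * 48 + ["F"] * 50 + ["X"] * 2
is_valid_gender = derive_domain_rule(sample)
batch = ["M", "F", "F", "?", "M"]
if not audit(batch, is_valid_gender, threshold=0.9):
    print("data quality below threshold: run remedial code")
```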
Automatic Data Corrections

• See example at
http://www.oracle.com/technology/pub/articles/rittman-owb.html
6: Promote Code Re-Use

• Identify the ETL processes that are repeated throughout the build
• Try to design these upfront and re-use the code
• Avoid views and PL/SQL procedures
• Use Pluggable Mappings if possible
• Preserves the logic within OWB
• Graphical representation
• Metadata and lineage preserved
• Note: adding a pluggable mapping copies it into the mapping; you will need to re-synchronize if the pluggable mapping changes
7: Leverage OWB Advanced Features

• OWB comes with many ETL accelerators


• Dimension and cube loaders
• SCD2 handling
• Pivots, unpivots
• Match-Merge
• Key Lookup
• Splitters, Filters
• OLAP Integration
• Discoverer Integration
• Some are “free”, some require Enterprise ETL or Data Quality licenses
Type-2 Slowly Changing Dimension (SCD2) Handling

• Automatically handle the creation of historical records


• End-date old records, create new records
• Transparent to the developer: just select the trigger and start/end date columns
• Warning: only use from 10.2.0.3 upwards (earlier versions have bugs)
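To show what the operator does on the developer's behalf, here is a minimal Python sketch of type-2 handling for a dimension held in memory. The column names (`sk`, `bk`, `eff_from`, `eff_to`) are hypothetical, and overwriting non-trigger attributes on the current row is an assumption of this sketch rather than a statement about OWB's generated code.

```python
from datetime import date
from typing import List, Sequence


def apply_scd2(dim: List[dict], incoming: dict, triggers: Sequence[str], load_date: date) -> None:
    """Apply one incoming record (keyed by "bk") to a type-2 dimension.

    Dimension rows carry "sk" (surrogate key), "bk" (business key), the
    attribute columns, "eff_from" and "eff_to" (None marks the current row).
    """
    current = next(
        (r for r in dim if r["bk"] == incoming["bk"] and r["eff_to"] is None), None
    )
    if current is not None:
        if all(current[c] == incoming[c] for c in triggers):
            # No trigger column changed: overwrite the current row in place
            current.update(incoming)
            return
        current["eff_to"] = load_date  # end-date the old version
    next_sk = max((r["sk"] for r in dim), default=0) + 1
    dim.append({**incoming, "sk": next_sk, "eff_from": load_date, "eff_to": None})


dim: List[dict] = []
apply_scd2(dim, {"bk": "C1", "city": "Leeds", "phone": "111"}, ["city"], date(2007, 1, 1))
apply_scd2(dim, {"bk": "C1", "city": "Leeds", "phone": "222"}, ["city"], date(2007, 6, 1))
apply_scd2(dim, {"bk": "C1", "city": "York", "phone": "222"}, ["city"], date(2007, 12, 1))
```

Only the `city` change creates history here; the phone change is absorbed into the existing current row.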
Match-Merge Operator

• Merges two or more rowsets


• Uses fuzzy logic to perform match
• Standardized Edit Distance
• Jaro-Winkler
• etc.
• Single row output for each matched set of records
• XREF table for debugging
• Useful for householding, customer matching etc.
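The two match measures named above can be sketched in Python. The Levenshtein distance and the Jaro-Winkler formula (prefix weight 0.1, common prefix of up to four characters) are the textbook versions; scaling edit distance to a 0..1 similarity is one common normalisation and may not be exactly what OWB's Standardized Edit Distance computes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        # A character matches an unused equal character within the window
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved
    out_of_order, k = 0, 0
    for i, c in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            out_of_order += c != s2[k]
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to four characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

A match rule would then compare such a score against a threshold to decide whether two records belong to the same matched set before merging them.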
Key Lookup

• Automatic retrieval of lookup values


• Translates into (outer/inner) join in SQL mapping
• More descriptive, faster to develop
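As a rough Python analogy for the SQL the operator generates, a key lookup behaves like a left outer join (or an inner join) against a lookup table. The `key_lookup` function and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def key_lookup(rows: List[dict], lookup: Dict[Any, Any], key: str, result_col: str,
               default: Any = None, outer: bool = True) -> List[dict]:
    """Add `result_col` to each row by looking up row[key] in `lookup`.

    outer=True behaves like a left outer join (unmatched rows get `default`);
    outer=False behaves like an inner join (unmatched rows are dropped).
    """
    out = []
    for row in rows:
        if row[key] in lookup:
            out.append({**row, result_col: lookup[row[key]]})
        elif outer:
            out.append({**row, result_col: default})
    return out


countries = {"GB": "United Kingdom", "US": "United States"}
sales = [{"id": 1, "ctry": "GB"}, {"id": 2, "ctry": "FR"}]
outer_result = key_lookup(sales, countries, "ctry", "country_name", default="Unknown")
inner_result = key_lookup(sales, countries, "ctry", "country_name", outer=False)
```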
Fact and Dimension Loading

• Used for loading data into a dimension or cube


• Dimension operator is actually a pluggable mapping
• Populates surrogate key using dimension
• Handles SCD2 and SCD3 updates
• Merge of new and changed records
• Cube operator is also a pluggable mapping
• Looks up the surrogate keys for the fact table based on business keys
• No need to separately look up surrogate keys
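The surrogate-key handling can be illustrated with a toy Python model: the dimension assigns surrogate keys from a sequence, and the fact load swaps each business key for the matching surrogate key. The `Dimension` class, the `_sk` column suffix and the `-1` "unknown member" key are assumptions for illustration, not the operators' generated code.

```python
from itertools import count
from typing import Dict, List


class Dimension:
    """Toy dimension: assigns surrogate keys from a sequence, indexed by business key."""

    def __init__(self) -> None:
        self._seq = count(1)  # stands in for a database sequence
        self.sk_by_bk: Dict[str, int] = {}

    def load(self, business_key: str) -> int:
        # New business keys get the next surrogate key; existing ones keep theirs
        if business_key not in self.sk_by_bk:
            self.sk_by_bk[business_key] = next(self._seq)
        return self.sk_by_bk[business_key]


def load_fact(rows: List[dict], dims: Dict[str, Dimension], unknown_sk: int = -1) -> List[dict]:
    """Replace each business-key column named in `dims` with <column>_sk; pass measures through."""
    facts = []
    for row in rows:
        fact = {}
        for col, value in row.items():
            if col in dims:
                fact[col + "_sk"] = dims[col].sk_by_bk.get(value, unknown_sk)
            else:
                fact[col] = value
        facts.append(fact)
    return facts


customer, product = Dimension(), Dimension()
for bk in ["C1", "C2"]:
    customer.load(bk)
product.load("P9")
facts = load_fact(
    [{"customer": "C2", "product": "P9", "amount": 10.0},
     {"customer": "C7", "product": "P9", "amount": 5.0}],
    {"customer": customer, "product": product},
)
```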
Data Rules and Error Handling

• Tables that have Data Rules applied can use the Error Handling feature
• Additional columns are added to tables to handle errors
• Rows that violate a rule can be ignored, reported, or moved to an error table
• From 10.2.0.3, can use the DML Error Logging feature of Oracle Database 10.2
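The three error-handling choices can be sketched in Python as routing rows between a target and an error table. The rule names, the `err_rules` column and the action labels are hypothetical; they mirror the slide's wording, not OWB's actual error-table columns or settings.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]


def apply_data_rules(rows: List[dict], rules: Dict[str, Rule], action: str = "MOVE"
                     ) -> Tuple[List[dict], List[dict]]:
    """Return (target_rows, error_rows).

    action="IGNORE": load every row, record nothing
    action="REPORT": load every row, but also copy failing rows to the error list
    action="MOVE":   failing rows go only to the error list
    """
    if action not in ("IGNORE", "REPORT", "MOVE"):
        raise ValueError("unknown action: " + action)
    target, errors = [], []
    for row in rows:
        failed = [name for name, ok in rules.items() if not ok(row)]
        if not failed or action == "IGNORE":
            target.append(row)
            continue
        # Extra column naming the broken rules (hypothetical column name)
        errors.append({**row, "err_rules": ",".join(failed)})
        if action == "REPORT":
            target.append(row)
    return target, errors


rules = {
    "NAME_NOT_NULL": lambda r: r["name"] is not None,
    "QTY_POSITIVE": lambda r: r["qty"] > 0,
}
rows = [{"name": "A", "qty": 1}, {"name": None, "qty": 0}]
loaded, rejected = apply_data_rules(rows, rules)  # MOVE by default
```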
8: Design Code Promotion Early

• Consider how many environments you will need


• Development, Test, UAT, Production
• Consider whether you will need to deploy to more than one database
• Multiple Control Centers
• Single Repository, multiple environments
• Switch between deploy environments using Configurations
• Multiple Repositories, multiple environments
• Conceptually easier, import/export using MDL
• Use OMB*Plus scripting to automate
Deploying using a Single Repository

• Single repository for design metadata and dev environment


• Second repository created for production, only Control Center tables used

[Figure: Data warehouse development server (OWB Repository; Control Center Service (Java); Staging, ODS and Dimensional Layers) and data warehouse production server (Control Center Service (Java); Staging, ODS and Dimensional Layers), with a developer workstation running Design Center, Control Center Manager and Repository Browser]
OWB 10gR2 Configurations Feature

• Multi-Configuration (an Enterprise ETL feature) can help with managing multiple locations and Control Centers
• Create an additional Configuration
• Associated with the production Control Center
• Modules are associated with production locations when the configuration is active
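Conceptually, a configuration is a switch that rebinds the same module design to different locations and a different Control Center. A minimal Python sketch of that idea; the configuration, Control Center, host and service names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Location:
    host: str
    port: int
    service: str
    schema: str


@dataclass
class Configuration:
    control_center: str
    locations: Dict[str, Location]  # module name -> location it deploys to


# Hypothetical setup: one design, two configurations
CONFIGURATIONS = {
    "DEV": Configuration("DEV_CONTROL_CENTER", {"DW": Location("devhost", 1521, "devdw", "DW")}),
    "PROD": Configuration("PROD_CONTROL_CENTER", {"DW": Location("prodhost", 1521, "proddw", "DW")}),
}


def deployment_target(module: str, active: str) -> Tuple[str, str]:
    """Resolve which Control Center and location a module deploys to under the active configuration."""
    config = CONFIGURATIONS[active]
    loc = config.locations[module]
    return config.control_center, f"{loc.schema}@{loc.host}:{loc.port}/{loc.service}"


print(deployment_target("DW", "PROD"))  # ('PROD_CONTROL_CENTER', 'DW@prodhost:1521/proddw')
```

The module definition itself never changes; only the active configuration decides where it is deployed.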
Deploying using Multiple Repositories

• A separate copy of design and control center metadata in each environment


• Code promoted between environments using MDL exports and imports

[Figure: Data warehouse development and production servers, each with its own OWB Repository, Control Center Service (Java), and Staging, ODS and Dimensional Layers, with a developer workstation running Design Center, Control Center Manager and Repository Browser]
Scripting Multi-Repository Code Promotion

• Robust projects rely on build automation
• Quickly integrate changes
• Enable regression testing
• Allow creation of a daily build
• OMB*Plus is the OWB scripting language
• Based on Tcl
• Access to all repository features
• Create/amend objects
• Deploy objects and mappings
• Import/Export between repositories
• Combine with WSH or Unix shell scripts to automate the build process

```tcl
puts "connecting to trainx_repos repository"
OMBCONNECT trainx_repos/password@winxpvm:1521:ora10g

puts "exporting TRAINING_PROJECT project to MDL file"
# Backslashes are doubled: in Tcl an unescaped \t inside a word becomes a tab
OMBEXPORT MDL_FILE 'c:\\training_project.mdl' FROM PROJECT 'TRAINING_PROJECT' OUTPUT LOG TO 'c:\\training_project.log'
puts "repository exported ok"

OMBDISCONNECT
puts "disconnected from trainx_repos"

puts "connecting to trainx_repos_prod"
OMBCONNECT trainx_repos_prod/password@winxpvm:1521:ora10g

puts "importing TRAINING_PROJECT project from MDL file"
OMBIMPORT MDL_FILE 'c:\\training_project.mdl' USE UPDATE_MODE MATCH_BY NAMES OUTPUT LOG TO 'c:\\training_project.log'

puts "Changing context to the TRAINING_PROJECT project"
OMBCC 'TRAINING_PROJECT'
puts "Connecting to the DEFAULT_CONTROL_CENTER"
OMBCONNECT CONTROL_CENTER

OMBCOMMIT
```
9: Welcome Evolving Requirements

• RAD and Agile methodologies “welcome” evolving requirements
• Evolving requirements are in the nature of BI and data warehousing
• Need to ensure the project is not “brittle”
• Use metadata management features of OWB
• Impact Analysis
• Data Lineage
• Change Propagation
Managing Changes to OWB Metadata

• Metadata Dependency Manager


– Requires Enterprise ETL Option
– Interactively determine impact and lineage of a data item
– Interactively propagate metadata changes
• “Free” Metadata Management using the Repository Browser
Interactive Impact Analysis

• Right-click on any object, establish impact or lineage


– Impact displays dependent objects and mappings
– Lineage shows what objects were used to populate
• Note: ensure you “Expand All” on each object on the canvas
• Metadata Dependency Manager provides an alternative UI
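Impact and lineage are downstream and upstream traversals of the same dependency graph. A minimal Python sketch using breadth-first search; the object names and edges are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Index each (source, target) dependency in both directions."""
    down: Dict[str, Set[str]] = defaultdict(set)
    up: Dict[str, Set[str]] = defaultdict(set)
    for src, tgt in edges:
        down[src].add(tgt)
        up[tgt].add(src)
    return down, up


def reachable(start: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search: every node reachable from `start`, excluding start itself."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


edges = [
    ("SRC.CUSTOMERS", "MAP_STG_CUSTOMER"), ("MAP_STG_CUSTOMER", "CUSTOMER_STG"),
    ("CUSTOMER_STG", "MAP_DIM_CUSTOMER"), ("MAP_DIM_CUSTOMER", "DIM_CUSTOMER"),
]
down, up = build_graph(edges)
impact = reachable("CUSTOMER_STG", down)   # what depends on it
lineage = reachable("CUSTOMER_STG", up)    # what populates it
```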
Change Propagation

• Introduce changes to source objects, then propagate the change through dependencies
– Highlight column, select “Propagate Change”
– Change the required attributes, then propagate the change
• Gotchas:
– Ensure you select “Show Full Impact” before propagating
– Some transformations (PL/SQL mostly) block propagation
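A minimal Python sketch of the propagation gotcha, assuming a column-level dependency graph: a new data type flows downstream until it reaches an opaque node such as a PL/SQL transformation, which is left unchanged and reported. The node names and data types are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def propagate_type_change(start: str, downstream: Dict[str, List[str]], new_type: str,
                          types: Dict[str, str], opaque: Set[str]) -> Set[str]:
    """Push a new data type from `start` to every dependent column.

    Nodes in `opaque` (for example PL/SQL transformations, whose internals
    are not modelled) are not updated and stop propagation; they are
    returned so they can be reviewed by hand.
    """
    types[start] = new_type
    blocked: Set[str] = set()
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in opaque:
                blocked.add(nxt)
                continue
            types[nxt] = new_type
            queue.append(nxt)
    return blocked


downstream = {
    "SRC.AMOUNT": ["STG.AMOUNT"],
    "STG.AMOUNT": ["FN_CONVERT", "DW.AMOUNT_RAW"],
    "FN_CONVERT": ["DW.AMOUNT"],
}
types = {n: "NUMBER(10,2)" for n in ["SRC.AMOUNT", "STG.AMOUNT", "DW.AMOUNT_RAW", "DW.AMOUNT"]}
blocked = propagate_type_change("SRC.AMOUNT", downstream, "NUMBER(12,2)", types, {"FN_CONVERT"})
```

Here `DW.AMOUNT` keeps its old type because the only path to it runs through the opaque `FN_CONVERT`, which is why reviewing the full impact before propagating matters.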
10: Deliver Data and Reports Frequently

• Use the Discoverer Integration to rapidly deliver data and reports


• Generate the EUL from within OWB
• Folders, items, hierarchies
• Complex folders, Item Classes, Calculations
• Sanity-check the design with users
• Can they produce the reports they need?
• Do the figures look right?
• Is performance acceptable?
• Involve the users throughout the project
• Reduce risk
• Increase rate of acceptance
BI Metadata Creation

• Create OracleBI Discoverer EUL elements within OWB


• Business Areas
• Folders
• Drill paths (hierarchies)
• Item Classes
• Lists of values
• Alternate Sort Orders
• Registered Functions
• Create manually, or derive
• Turns dimensions into drill paths
• Keeps all BI elements in one repository
• Requires Enterprise ETL Option
• You still need to create the EUL itself using Discoverer Administrator
Review Prototype Reports with Users

• Establish whether projected deliverables are what they expect


• Can they produce the reports they require?
• Is performance acceptable?
• Do the figures look correct?
• Provides immediate feedback before too much time is invested in ETL
• Just in case you need to revise the design
Other Miscellaneous Best Practices

• Run the repository database in ARCHIVELOG mode


• It’s an OLTP application; you may need to recover to a point in time
• Apply patches as they become available
• Bugs are fixed (SCD2), functionality becomes available (DML Error Logging)
• Create a set of naming standards at the outset of the project
• Use them for OWB objects and for database objects
• For example: all mappings are prefixed with MAP_
• Extend the naming to Operators in Mappings, e.g. Joiner groups
• Define a security policy (FGAC vs. simple security on repository objects)
• Give each developer their own username, i.e. don’t all log on as OWB_REPOS
• Design and build each Mapping with performance in mind
• Beware of the cost of using Transformations in set-based mappings
• Don’t hide functionality in your Mappings
• Use Filter Operators to restrict data sets, not predicates in Join Operators
• Create modular, single-process mappings; debugging large mappings is hard
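Naming standards are easiest to enforce when checked automatically, for example against object names listed from the repository with `OMBLIST MAPPINGS`. A minimal Python sketch of such a check; only the `MAP_` prefix comes from the slide, and the other rules are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical project standard. Only MAP_ comes from the slide; DIM_ follows
# the dimension template's table naming, and PF_ is invented for illustration.
NAMING_RULES = {
    "MAPPING": re.compile(r"MAP_[A-Z0-9_]+"),
    "DIMENSION": re.compile(r"DIM_[A-Z0-9_]+"),
    "PROCESS_FLOW": re.compile(r"PF_[A-Z0-9_]+"),
}


def naming_violations(objects: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (object_type, name) pairs that break the naming standard.

    Object types without a rule are not checked.
    """
    return [
        (kind, name)
        for kind, name in objects
        if kind in NAMING_RULES and not NAMING_RULES[kind].fullmatch(name)
    ]


objects = [
    ("MAPPING", "MAP_LOAD_SALES"),
    ("MAPPING", "load_customers"),
    ("DIMENSION", "DIM_CUSTOMER"),
    ("TABLE", "ANYTHING_GOES"),
]
print(naming_violations(objects))  # [('MAPPING', 'load_customers')]
```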
Further Reading

• The Rittman Mead Blog


• http://www.rittmanmead.com/oracle-warehouse-builder
• The OWB Product Development Blog
• http://blogs.oracle.com/owb
• Oracle Technology Network
• http://otn.oracle.com