
Oracle Data Integration Developer Day

Introduction to Oracle Enterprise Data Quality (EDQ)


Oracle EMEA Data Integration Solutions (DIS)

Copyright 2011, Oracle and/or its affiliates. All rights reserved.

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.


Program Agenda

Oracle Data Integration Solutions (DIS)
Data Quality
Introducing Enterprise Data Quality (EDQ)
EDQP: EDQ for Product Data
EDQ AV: Address Validation
Oracle Data Integrator (ODI) Integration

Oracle Data Integration Solutions


Best-in-class Heterogeneous Platform for Data Integration

Custom Applications, Oracle Applications, MDM Applications, Business Intelligence, Activity Monitoring, SOA Platforms

Comprehensive Data Integration Solution
SOA Abstraction Layer: Process Manager, Service Bus, Data Services, Data Federation
Oracle Data Integrator: ELT/ETL, Data Transformation, Bulk Data Movement, Data Lineage
Oracle GoldenGate: Real-time Data, Log-based CDC, Bi-directional Replication, Data Verification
Enterprise Data Quality: Data Profiling, Data Parsing, Data Cleansing, Match and Merge

Storage: Data Warehouse/Data Mart, OLTP System, OLAP Cube, Flat Files, Web 2.0, Web and Event Services, SOA

Data Integration
Key Component of Oracle Fusion Middleware

Applications, Middleware, Database, Infrastructure & Management

Pervasive Data Integration


Data Integration: Integrated and Embedded with Software and Hardware
Applications, Middleware, Database, Infrastructure & Management

Data Integration in the Marketplace


Sorted by Vertical Markets (sample customers only)
Communications, Finance / Banking, Media, Services, Energy / Industrial, Insurance / Health, Retail, Other


Your Data is Changing


Companies (in one hour)
240 businesses will change addresses
150 business telephone numbers will change or be disconnected
112 directorship (CEO, CFO, etc.) changes will occur
20 corporations will fail
12 new businesses will open their doors
4 companies will change their name

Individuals (in one hour)
5,769 individuals in the US will change jobs
2,748 individuals will change address
515 individuals will get married
263 individuals will get divorced
186 individuals will declare personal bankruptcy

Products (in one year)
On average 20% duplicates in product data
90% of product introductions fail
Retailers lose 40 billion, or 3.5% of total sales, each year due to item information inefficiencies
60% error rate for all invoices generated
Global Data Sync will realize 30% lower IT costs

Master data changes at a rate of 2% per month
Compounded, a 2% monthly change is 27% per year, 61% in two years, and 104% in three years.

Source: D&B, US Census Bureau, US Department of Health and Human Services, Administrative Office of the US Courts, Bureau of Labor Statistics, Gartner, A.T. Kearney, GMA Invoice Accuracy Study
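
The compounding figures quoted on this slide can be checked with a few lines of Python. This is only a sketch: the function name `compounded_change` is hypothetical, and the 2% monthly rate is the figure quoted on the slide.

```python
def compounded_change(monthly_rate: float, months: int) -> float:
    """Cumulative change after compounding `monthly_rate` for `months` months."""
    return (1 + monthly_rate) ** months - 1

for years in (1, 2, 3):
    pct = compounded_change(0.02, 12 * years) * 100
    print(f"{years} year(s): {pct:.0f}%")
# 1 year(s): 27%
# 2 year(s): 61%
# 3 year(s): 104%
```

A change above 100% means that, on average, each record has changed more than once over the period.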

Two Facts about Data Quality

Clear view on data


The Data Quality challenge is an iceberg
The biggest DQ threats are the ones we do not see.
Data Profiling lowers the water line and gives a clear view of the quality issues

Known Data Issues
Risk manageable
Business rules tractable
Expectations clear
High business user involvement

Suspected and Unexpected Data Issues
Risk unmanageable
Business rules unknowable
Missed expectations
Little business user involvement

Two Facts about Data Quality

Data quality decays


Data value decays
Data is an asset whose value decays over time
Business events can make this worse
Quality is not a one-shot process but a constant effort within enterprise processes.
Data Quality needs to be pervasive and continuous.

Example of Data Quality Issues


A Simple Customer Table Sample

Name | Address | City | State | Zip | Phone | Email
Bob Williams | 36 Jones Avenue | Newton | MA | 02106 | 617 555 000 | bob.williams@yahoo.com
Robert Williams | 36 Jones Av. | (blank) | MA | 02106 | 617555000 | (blank)
Burkes, Mike and Ilda | 38 Jones av. | Nweton | MA | 02106 | 617-532-9550 | mburkes@gmail.com
Jason Bourne, Bourne & Cie. | 76 East 51st | Newton | MA | 617-536-5480 | 6175541329 | (blank)

Issues highlighted in the sample:
Non-standard formats
Matching records
Mis-fielded data
Multiple names
Mixed business and contact names
Typos
Missing data
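
Two of the issues in the sample, non-standard phone formats and missing data, are simple enough to illustrate in code. The sketch below is not EDQ functionality: the helper names `normalize_phone` and `missing_fields` and the field list are assumptions made for illustration, and the two records are taken from the sample table.

```python
import re

# Assumed set of required fields, taken from the sample table's columns.
REQUIRED_FIELDS = ("name", "address", "city", "state", "zip", "phone", "email")

def normalize_phone(raw: str) -> str:
    """Keep digits only, so '617 555 000' and '617555000' compare equal."""
    return re.sub(r"\D", "", raw or "")

def missing_fields(record: dict) -> list:
    """List required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]

bob = {"name": "Bob Williams", "address": "36 Jones Avenue", "city": "Newton",
       "state": "MA", "zip": "02106", "phone": "617 555 000",
       "email": "bob.williams@yahoo.com"}
robert = {"name": "Robert Williams", "address": "36 Jones Av.", "city": "",
          "state": "MA", "zip": "02106", "phone": "617555000", "email": ""}

print(normalize_phone(bob["phone"]) == normalize_phone(robert["phone"]))  # True
print(missing_fields(robert))  # ['city', 'email']
```

Once the phone numbers are normalized, the first two rows share a phone number and a zip code, which is one hint that they describe the same person.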

Data Quality Delivers Value Across All Industries


Financial Services
Single view of high quality customer data drives accurate customer insight and improved marketing effectiveness
Supports compliance and reporting KYC requirements

Retail
Harmonizes customer data from multiple channels to improve sales and marketing effectiveness
Enhances online product search for e-commerce

Telco
Improves customer insight for revenue optimization and targeted customer retention
Effective compliance and risk mitigation for next generation services

Energy & Utilities
Expands understanding of network assets and customer delivery points
Improves management of regulatory compliance and reporting requirements

Healthcare
Delivers a comprehensive view of the patient for care and billing
Manages patient, epidemiology, diagnosis and treatment data quality across systems and organizations

Government
Single view of citizen for better internal information sharing, service delivery, licensing, provision of child care, and fraud detection
Reduces costs through system rationalisation

Business Impact of Data Quality

With Bad Data
Reduced ROI
Increased project risk, time and cost
Expensive downstream consequences: wrong shipments, wrong invoices, incorrect parts
Only 30% of BI/DW implementations fully succeed. The top two reasons for failure? Budget constraints and data quality.

With Good Data
Increased ROI on existing systems
Increased agility
Increased efficiency
Increased customer satisfaction
Increased scalability

#1 reason CRM projects fail: Data Quality

Data integration and data quality are fundamental prerequisites for the successful implementation of enterprise applications, such as CRM, SCM, and ERP.


Introducing Enterprise Data Quality

Enterprise Data Quality: Profiling, Analysis, Parsing, Standardization, Match/Merge, Reporting, Case Management

EDQ Key Capabilities


Highly configurable to match specific business needs
Case management tools for tracking and web-based KPI reporting for increased productivity
End-to-end enterprise customer data management, intuitive for business users
Finely configurable to match the specific needs of the business
Entirely written in Java; open API
Multi-threaded client/server application

Oracle EDQ Differentiators

Integrated DQ Solution
Seamless integration of all core DQ capabilities: profiling, cleansing, matching & reporting, all on the same page with instant feedback
Engineered for business users
Integrated team collaboration and management

Modern Architecture
Easy to configure and integrate DQ Services
Scalable from project to enterprise deployment
Batch and real time operation
Modern, open architecture (Java, SOA, etc.)
Collaborative, multi-user project support

Ease of use
Agnostic of data-domain, vertical market & application
Brings DQ out of the back-office
Users can monitor what matters to them
Business context to gain understanding and consensus

Delivering and maintaining Data Quality

Profile: get to know your data, weak spots, problems, priorities
Improve: match, merge, standardize, enrich; get data right at a point in time
DQ Firewall: never pollute the reservoir again; safeguards, QA on spoke systems, QA on hub, QA on transport
Governance: process visibility & metrics, how well is the process working, where can it be improved

Data Integration Use Cases

Data Migration/Data Load to Data Warehouse


When loading data to staging area on DW
For analysis of data flowing across the ETL platform
When loading data to a Data Mart or Data Store

BI projects
For Data Governance once loaded on the target system


Data Consolidation: Move & Improve Data

Replicate Data (Batch/Real-Time) & Cleanse Along the Way
Oracle Data Integrator ETL/E-LT Process: Sources, then Parsing, Cleansing, Standardization, Matching, then Target

Solution
Oracle Data Integration Solutions for consolidating data into Data Hubs or consolidated Applications, and Oracle GoldenGate for real-time synchronization with transactional systems
Data Migrations (w/Downtime): multi-phased approach with Data Cleansing & Profiling
Application Migrations requiring heavy data transformations, including match and merge, or other data manipulation

Benefits
Near real time replication with deep transformations (App-to-App)
Fix the data before copying (i.e., put an end to garbage in, garbage out)

Data Quality & Governance


Profile, Cleanse and Govern Business Data

Solution
Profile business data to find the bad data and assess quality over time
Cleanse, match and merge data before it gets loaded into the Data Mart or Data Warehouse
Repair data during batch processing flows

Benefits
Stop the garbage-in-garbage-out cycle!
Improve the trust of data within business marts and data warehouses
Enable IT to deliver value to the lines of business during every data integration flow
Improve business efficiency by matching and de-duplicating redundant records

EDQ Product Architecture

All Java Server (Stateless)
Java Webstart Client Applications
Fully integrated with a single repository and UI
Batch and Real-time Execution
Connects to virtually any source/target of data
Platform Independent

Key Features
Fully Unicode Compliant
Comprehensive DQ Functionality with a single UI and Repository
Provided Extensions for Customer Data and Locales
Integrated Global Address Verification
Highly Extensible

UI Overview
Launchpad of an EDQ Server:

Configure DQ Processes
Review Match Results
Manage DQ Issues
Monitor Data Quality
Server and User Administration

Director
Collaborative Configuration Environment:

Match Review
Independent End User Review of Matches
Review Matching Records
Review Overview
Manual Decisions
Configurable Decision Workflow
Full Audit Trail and Comments
Match / No Match Decisions Remembered
Decision and Comment History

Issue Manager

Easy-to-use way of assigning review work


Links into any DQ Results
Supports email notifications
Simple issue tracking capabilities

Dashboard

Stakeholder Reporting of Data Quality KPIs in web browser UI


Supports results published from both Real-time and Batch processes
Graphical Trend Analysis
User-Configurable

Oracle Enterprise Data Quality - Audit

Validate data against business rules


Publish results to data quality
dashboard


Profiling: Understand the data first

Interactive exploration of data, identifying distribution and outlying values with drill-downs
Identify and quantify issues in data
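
As a rough illustration of what "identifying distribution and outlying values" means for a single column, the sketch below computes a small frequency profile. It is a generic Python example, not EDQ's profiler: the function name `profile_column` is hypothetical, and the sample values are the City column from the customer table shown earlier.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: row count, blanks, distinct values, frequencies."""
    cleaned = [(v or "").strip() for v in values]
    non_blank = [v for v in cleaned if v]
    return {
        "rows": len(cleaned),
        "blank": len(cleaned) - len(non_blank),
        "distinct": len(set(non_blank)),
        "frequencies": Counter(non_blank).most_common(),
    }

cities = ["Newton", "", "Nweton", "Newton"]
print(profile_column(cities))
# {'rows': 4, 'blank': 1, 'distinct': 2, 'frequencies': [('Newton', 2), ('Nweton', 1)]}
```

Low-frequency values such as 'Nweton' are the kind of outlier a reviewer would drill into.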

Transformation: Data Improvement

Fully configurable data transformation rules
Operates in both Batch and Real-Time
Full control over data updates
Original data always preserved (and all steps in between)
Source data may either be staged and processed or streamed through the process

Use profiling results to create your own data improvement rules
Use provided processors for common tasks such as address standardization

Oracle Enterprise Data Quality - Standardization

Standardize, Transform and Parse
Split names and name elements
Identify individuals and businesses
Derive additional attributes

Name: Dr Ellen Van Der Heijde
Title: Dr
First: Ellen
Last: Van Der Heijde
Gender: Female

Name: Mr RJ & Mrs FB MacDonald
Title: Mr
First: R
Middle: J
Last: MacDonald
Gender: Male
Title: Mrs
First: F
Middle: B
Last: MacDonald
Gender: Female

Name: Jalila Abdul-Alim (Do Not Call)
First: Jalila
Last: Abdul-Alim
Gender: Female
Note: Do Not Call
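
A much simplified version of this kind of name parsing can be sketched as follows. Everything here is an assumption made for illustration: the `TITLES` table, the `parse_name` function and its rules are hypothetical and far cruder than a real parser. In particular, the sketch does not split multi-person names such as "Mr RJ & Mrs FB MacDonald", and it only derives gender when the title implies it.

```python
# Hypothetical lookup: title -> gender implied by the title (None if not implied).
TITLES = {"Dr": None, "Mr": "Male", "Mrs": "Female"}

def parse_name(raw: str) -> dict:
    """Split 'Title First Last (Note)' into labelled parts."""
    raw = raw.strip()
    note = None
    # A trailing parenthesized remark is treated as a note, not as part of the name.
    if raw.endswith(")") and "(" in raw:
        raw, _, tail = raw.rpartition("(")
        note = tail[:-1].strip()
    tokens = raw.split()
    parsed = {}
    if tokens and tokens[0] in TITLES:
        parsed["title"] = tokens.pop(0)
        if TITLES[parsed["title"]]:
            parsed["gender"] = TITLES[parsed["title"]]
    if tokens:
        parsed["first"] = tokens[0]
        if len(tokens) > 1:
            parsed["last"] = " ".join(tokens[1:])
    if note:
        parsed["note"] = note
    return parsed

print(parse_name("Dr Ellen Van Der Heijde"))
# {'title': 'Dr', 'first': 'Ellen', 'last': 'Van Der Heijde'}
print(parse_name("Jalila Abdul-Alim (Do Not Call)"))
# {'first': 'Jalila', 'last': 'Abdul-Alim', 'note': 'Do Not Call'}
```

Deriving "Gender: Female" for the first and third examples would need a reference list of given names, which this sketch leaves out.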

Matching

Designed for business users

Flexible matching engine for any data with many comparison algorithms
Provided template match processors for individual, entity and address matching
Easy reuse of configured match processors
Fully configurable outputs (Links, Groups, Master and Slaves, Best Record)
Operates in both Batch and Real-Time

Oracle Enterprise Data Quality - Matching

Title: Mr
First: Robert
Last: Fulmar
Gender: Male
DoB: 12/05/1978
Phone: 555-120-1329
Address:
9405 Main St
Fairfax
Virginia
22030

First: Bob
Last: Fulmar
Gender: Male
Email: chem291_rjf@barker.edu

Title: Dr
First: R
Last: Fulmer
DoB: 01/01/1978
Email: chem291_rjf@barker.edu
Address:
9407 Main Street
Fairfax
VA
22031-4001

Title: Dr
First: Robert
Last: Fulmar
Gender: Male
DoB: 12/05/1978
Email: chem291_rjf@barker.edu
Phone: 555-120-1329
Address:
9407 Main St
Fairfax
VA
22031-4001

Match & Merge data from disparate sources


Create best record based on survivorship rules
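
To make "survivorship rules" concrete, here is a minimal sketch that merges the three source records shown above into one best record. The rules chosen (longest first name, most frequent last name, first non-empty value in source order for the rest) are assumptions picked for illustration, not the rules EDQ applies; title and address are left out because these simple rules would not reproduce the values shown on the slide for them.

```python
from collections import Counter

def first_non_empty(values):
    """First non-empty value in source order."""
    return next((v for v in values if v), None)

def longest(values):
    """Longest non-empty value, e.g. prefer 'Robert' over 'Bob' or 'R'."""
    return max((v for v in values if v), key=len, default=None)

def most_common(values):
    """Most frequent non-empty value."""
    counts = Counter(v for v in values if v)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical per-field survivorship rules.
RULES = {"first": longest, "last": most_common, "dob": first_non_empty,
         "email": first_non_empty, "phone": first_non_empty}

def best_record(records):
    """Apply each field's rule across all matched records."""
    return {field: rule([r.get(field) for r in records])
            for field, rule in RULES.items()}

records = [
    {"first": "Robert", "last": "Fulmar", "dob": "12/05/1978", "phone": "555-120-1329"},
    {"first": "Bob", "last": "Fulmar", "email": "chem291_rjf@barker.edu"},
    {"first": "R", "last": "Fulmer", "dob": "01/01/1978", "email": "chem291_rjf@barker.edu"},
]
print(best_record(records))
# {'first': 'Robert', 'last': 'Fulmar', 'dob': '12/05/1978',
#  'email': 'chem291_rjf@barker.edu', 'phone': '555-120-1329'}
```

Keeping one rule per field makes it easy to see, and to audit, why a particular value survived.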

Reporting

Highly flexible reporting interface
- Export any Results views automatically to database/file
- 1-click export of results to Excel from the Director client

Dashboard reporting provides stakeholder view of Data Quality KPIs with trend analysis

Example reports
Automatic Matches / Review Matches / Non-Matching Records
Match Group Size Report
Match Rule Report
Data Validity Report
Profiling Report
Etc.

Reporting - Immediate drilldown reporting in Director Results Browser

Reporting - Collect important results into Results Books

Reporting - Output Results Books
Export Results Books as part of an automated job
1-click Export of a Results Book to Excel


The Product Data Problem: Unstructured & Non-Standard

What is this?
10hp motor 115V Yoke mount
MOT-10,115V, 48YZ,YOKE
mtr, ac(115) 10 horsepower 115volts
This 10hp yoke mounted motor is rated for 115V with a 5 year warranty
10 Caballos, Motor, 115 Voltios
TEAO HP = 10.0 1725RPM 115V 48YZ YOKE MTR
Motor, TEAO, 1725 RPM, 48YZ, 15 Voltios, Montaje de Yugo, hp = 10

Item: Motor
Classification: 26101600
Power: 10 horsepower
Voltage: 115
Mounting: Yoke

Product data is much more variable and unpredictable than other data types
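
To show why such descriptions are hard to standardize, the sketch below pulls power and voltage out of a few of the variants with hand-written regular expressions. This is deliberately a rule-based approach, not the semantic recognition that EDQ for Product Data uses; the patterns and the `extract_attributes` function are hypothetical and cover only the spellings seen on this slide.

```python
import re

# Hypothetical patterns; real product data needs far more variants than these.
POWER = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:hp|horsepower|caballos)\b"  # e.g. "10hp", "10 Caballos"
    r"|\bhp\s*=\s*(\d+(?:\.\d+)?)",                    # e.g. "HP = 10.0"
    re.IGNORECASE,
)
VOLTAGE = re.compile(r"(\d+)\s*(?:v|volts|voltios)\b", re.IGNORECASE)

def extract_attributes(description: str) -> dict:
    """Return whichever of power (hp) and voltage the patterns can find."""
    attrs = {}
    power = POWER.search(description)
    if power:
        attrs["power_hp"] = float(power.group(1) or power.group(2))
    voltage = VOLTAGE.search(description)
    if voltage:
        attrs["voltage"] = int(voltage.group(1))
    return attrs

print(extract_attributes("10hp motor 115V Yoke mount"))
# {'power_hp': 10.0, 'voltage': 115}
print(extract_attributes("TEAO HP = 10.0 1725RPM 115V 48YZ YOKE MTR"))
# {'power_hp': 10.0, 'voltage': 115}
print(extract_attributes("MOT-10,115V, 48YZ,YOKE"))
# {'voltage': 115}
```

The last call misses the power rating because "MOT-10" encodes it in a part number, so each new spelling means another hand-written rule.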


Oracle EDQ for Product Data: Core Differentiation

Semantic-based
Semantic-based recognition targets meaning in context to handle extreme variability of product data
Identifies exceptions for rapid learning

Scalable
Enforce standards across thousands of product categories
Enforce standards across many systems & processes

Integrated Governance
Integrated dashboards and data remediation
Improved oversight and exception management

Maintained by Business User
IT does not have to write laborious, ever-changing programmatic rules
Improves process across all lines of business


Complexities of Address Verification

Addresses are incomplete


Items are misspelled
Abbreviations are incorrect
Incorrect postal codes assigned to locations
Items are in wrong fields: a city is really a street, for example
Non-address items are present
Address is not formatted correctly for the country
Character set is mixed or incorrect for country or language
Availability, reliability and completeness of reference data

2012 Oracle Corporation Proprietary and Confidential


EDQ Address Verification


Components: Siebel UCM or other App; EDQ Profile and Audit; EDQ Parse and Standardize; EDQ Match and Merge

EDQ Address Verification Server
Verify: Parse, Transliterate, Validate, Format
Geocode: Add latitude/longitude coordinates
Global Knowledge Repository Data Packs

Verify: Get the address correct
Worldwide address cleansing: over 240 countries, all populated countries on earth
The most advanced error-tolerant parsing algorithms

Geocode: Attach a location to a correct address
Generates a latitude/longitude coordinate for any address worldwide
Leverages the most comprehensive multi-source geographical reference data

EDQ AV: Loqate OEM

Unsurpassed Coverage & Performance
Breadth and depth: reference data for 240+ countries, all populated countries on earth
Multiple address and geocode data sources
Great throughput and scalability

Robust Parsing and Validation
Corrects misplaced data in wrong fields
Unhindered by extraneous non-address and junk data

Ease of Integration / Time To Market
Pre-integrated with EDQ platform
Quick and easy installation
Platform independent


Integrated Data Quality with ODI


Oracle Data Quality Runtime with Data Integrator
Oracle Data Integrator ETL/E-LT Process: Sources, then Parsing, Cleansing, Standardization, Matching, then Target

Best of breed quality
Proven, scalable DQ engine
Rich capabilities for cleansing, standardization, validation, match and merge
Extensible by customers

Out-of-box integration
ODI integrates with Quality functions via pre-built ODI OpenTool
Drag and drop graphical icon for inserting DQ flows into ODI

Next Generation Data Warehousing


EDQ for ODI bulk data transfer to Exadata

Data Quality as part of DW bulk loading
Optimize data loading strategy
Ensures loaded data is fit for purpose
Understand, improve, protect and control data as part of the integration process
Applies to both party (e.g. customer) and product data

Oracle GoldenGate: Non-Invasive Real-Time Transaction Feeds (tx1, tx2, tx3, tx4)
Oracle Data Integrator: Batch Feeds, Incremental Updates and in-DB transformations via ELT
Tables shown: EMP, DEPT, DIM, FACT
