
How to Improve IQ in the Boardroom: Framework for Managing Information Quality

October 21, 2008 Bratislava

Dr. Jan Mrazek President Adastra Group jan.mrazek@adastragrp.com

Lack of Data Quality: Effects on Credibility

- Wrong and misguided business decisions
- Failure to meet regulatory requirements
- Negative impact on customer relationships
- High cost of rework and fixes
- All in all: negative impact on the bottom line

Additional Data Quality Impacts


- Lack of confidence in the data and inability to act on it
- Much effort and many resources required to reconcile numbers
- Low adoption by users
- Inability to fully leverage previous investments
- Inability to compete in a timely fashion
- Difficulty justifying additional investments
- Difficulty reconciling results (to source, to GL, etc.)
- Instability of processes, which translates into:
  - More failures
  - Higher costs
  - Lower availability of information to users
  - Longer time to market when delivering solutions

Data Governance

[Diagram: Data Governance spans both the Technical and the Business sides of the organization]

- High demands on the Data Governance structure and Data Stewards
- Potential for them to become a bottleneck or simply be ineffective

Some real-life examples

- Sub-prime mortgages
- ABCPs
- CDOs

Case study: IQ Audit of an EE bank acquired by a large multinational bank


Business impact modeling on retail segment portfolio only:

Biz area | Issue area | Per-item cost | 5-year cost (excl. NPV calculation)
Marketing/Sales | Missing or invalid customer contact information (address, postal code, telephone) | EUR 960,000 per campaign | EUR 19.2 mil
Marketing/Sales | Missing single customer view, historical data, errors in target variables and predictors for data mining (cross-sell, up-sell, attrition management) | EUR 0.63-1.7 mil per campaign | EUR 23.3 mil
Collections | Missing or invalid customer contact information (address, postal code, telephone) | n/a | EUR 3.4 mil for PI loans only
Underwriting/Provisioning | Collateral data not linked to contract, not reevaluated | Inability to stress test | Est. several million EUR
Operational efficiency | Missing single customer view, missing historical data, repayment schedules, contact data, tax code validity, collateral data | n/a | EUR 1.65 mil for PI loans only
Operational efficiency | Missing data governance, business involvement / definitions / metadata | n/a | Est. 40% of IT budget

Practical steps: where to start and what to do

- Determine where you are: IQ Assessment
- Get used to Data Profiling
- Revisit your downstream ETL processes
- Automate Data and Business Logic Profiling
- Implement MDM Architecture in gradual stages
- Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Adastra IQ Assessment
- Thoroughly investigates current data processes
- Delivers an objective Information Quality scorecard based on a quantitative assessment of the organization's data and information environment
- Allows organizations to compare industry best practices against their company's IQ standards and implementations
- Allows organizations to assess improvements in quality over time
- Is the first step towards ongoing IQ improvements and the establishment of an IQ-conscious culture
- Optional: Business Impact Analyses

Adastra IQ Assessment - Milestones


Data Capture
- Data Entry Standards
- On-Line Entry Edits
- On-Line Validation (Timeliness)
- On-Line Validation (Process)
- Data History Availability
- External Data Sources
- Change Management Processes
- Policy and Procedures

Source Data
- Domain Analysis
- Business Rules Analysis
- Nulls / Blanks Analysis
- Uniqueness Analysis
- Relationships Analysis
- Pattern / Mask Analysis
- Non-Standard Data Analysis
- Change Management Processes
- Policy and Procedures
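Several of the source-data checks above (nulls/blanks analysis, uniqueness analysis, pattern/mask analysis) can be sketched in a few lines of Python. This is an illustrative stand-in, not Adastra's tooling, and the sample postal codes are made up:

```python
# Illustrative sketch of basic source-data profiling checks:
# null/blank analysis, uniqueness, and pattern-mask analysis.
import re
from collections import Counter

def mask(value: str) -> str:
    """Reduce a value to a pattern mask: digits -> 9, letters -> A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile_column(values):
    """Return a small profiling summary for one attribute."""
    total = len(values)
    blanks = sum(1 for v in values if v is None or str(v).strip() == "")
    non_blank = [str(v) for v in values if v is not None and str(v).strip() != ""]
    return {
        "total": total,
        "null_or_blank_pct": round(100.0 * blanks / total, 2) if total else 0.0,
        "distinct": len(set(non_blank)),
        "is_unique": len(set(non_blank)) == len(non_blank),
        "top_masks": Counter(mask(v) for v in non_blank).most_common(3),
    }

postal_codes = ["811 01", "82108", None, "811 01", "  ", "B-4000"]
summary = profile_column(postal_codes)
```

The mask frequencies immediately expose non-standard formats (here, three different postal-code patterns in one column), which is exactly what pattern/mask analysis is after.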

Data Profiling and Modeling
- Domain Analysis
- Object Definition Analysis
- Verification of Data Definitions
- Event Trigger Analysis
- Entity Integrity and Referential Integrity Analysis
- Change Management Processes
- Policy and Procedures

Data Extraction, Transformation & Loading
- Application of Business Rules
- Table Extract / Views Analysis
- Data Quality Validation of ETL
- ETL Tool Usage
- Source Data Management
- Data Auditing and Issue Resolution
- Data Integration
- Change Management Processes
- Policy & Procedures

IQ Processes
- Data Stewardship
- On-Going Data Validation Analysis
- Measurement of Data Quality Improvement
- Error Discovery Analysis
- Data Reconciliation Analysis
- Linkage of Quality and Reward
- Project Management Methodologies
- Change Management Processes
- Policy & Procedures

Assessment Scorecard Measures

Industry Leading (81-100)
Processes and procedures are aligned with industry best practices to improve and maintain information quality. All methodologies are clearly documented and enforced through data stewards, who are held accountable for information quality.

Top Tier (61-80)
Most processes and procedures are aligned with industry best practices to improve and maintain information quality. All methodologies are clearly documented and enforced through data stewards, who are held accountable for information quality.

Acceptable (41-60)
Most processes and procedures are aligned with industry best practices to improve and maintain information quality. All methodologies are clearly documented and enforced through respective owners. Accountability and responsibility for information quality are not firmly assigned.

Some Improvement Warranted (21-40)
Processes and procedures are in place to improve and maintain information quality. Methodologies are sparsely documented and enforced through respective owners. Accountability for information quality is unassigned.

Significant Improvement Needed (1-20)
Few processes and procedures are in place to improve and maintain information quality. Set methodologies to streamline these processes are undeveloped. Accountability for information quality is unassigned.
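As a minimal sketch, mapping a quantitative assessment score (1-100) onto these tiers is a simple lookup; the tier names and ranges come from the scorecard above:

```python
# Map an IQ assessment score (1-100) to its scorecard tier.
TIERS = [  # (lower bound of range, tier name)
    (81, "Industry Leading"),
    (61, "Top Tier"),
    (41, "Acceptable"),
    (21, "Some Improvement Warranted"),
    (1,  "Significant Improvement Needed"),
]

def tier_for(score: int) -> str:
    """Return the scorecard tier for a score between 1 and 100."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    for floor, name in TIERS:
        if score >= floor:
            return name
```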

Practical steps: where to start and what to do

- Determine where you are: IQ Assessment
- Get used to Data Profiling
- Revisit your downstream ETL processes
- Automate Data and Business Logic Profiling
- Implement MDM Architecture in gradual stages
- Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

DQ Tools
Many tools are on the market. It makes good sense to pay attention to the following aspects:
- Data Connectivity
- Data Profiling Features
- Validation and Measurement
- Reporting Features
- Metadata-Specific Capabilities
- Performance
- Security
- Infrastructure Compatibility
- Usability
- Completeness of Vision
- Total Cost of Ownership

Example of profiling output - Ataccama

[Screenshot: profiling output for attribute TAXCODE, scope all records - total inspected 4,163,217; percentage valid 92.49%; count of invalid 312,781; panels for Extreme Values, Frequency Analysis, and Data Quality Record Grading]
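The headline figures in such a profiling output (records inspected, percentage valid, count of invalid) are straightforward to compute. The sketch below uses a made-up TAXCODE validity rule (8-10 digits) purely for illustration; it is not Ataccama's actual rule:

```python
# Compute the summary figures shown in a profiling output:
# total inspected, count of invalid, percentage of valid.
import re

def taxcode_stats(values):
    """Placeholder validity rule: a TAXCODE is 8-10 digits."""
    valid = sum(1 for v in values if v and re.fullmatch(r"\d{8,10}", v))
    total = len(values)
    return {
        "total_inspected": total,
        "count_invalid": total - valid,
        "pct_valid": round(100.0 * valid / total, 2) if total else 0.0,
    }

stats = taxcode_stats(["12345678", "1234567890", "ABC", "", "987654321"])
```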

Practical steps: where to start and what to do

- Determine where you are: IQ Assessment
- Get used to Data Profiling
- Revisit your downstream ETL processes
- Automate Data and Business Logic Profiling
- Implement MDM Architecture in gradual stages
- Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

DQ ETL Components
- Adastra has developed a comprehensive set of ETL components designed to address DQ-related data processing within ETL workflows
- Our ETL component suite integrates with three major ETL vendors (Informatica, IBM, Ab Initio)
- The list of components includes:
  - Operational Reconciliation
  - Data Validation
  - Error Logging

Operational Reconciliation
- As a data extract is received, it is necessary to verify that the data is indeed valid for processing
- Our generic components perform the following verifications:
  - The extract follows the order of execution
  - The extract is not a repeat of a previous extract
  - The extract is complete
  - The extract belongs to the expected period
  - The extract truly contains the number of records produced by the source system
  - There were no transmission errors or inappropriate data manipulations performed on the extract

These components run immediately upon the landing of the extract, so that any serious data issues can be addressed in a timely manner and outside of the critical window.
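The reconciliation checks above can be sketched as follows. The field names and the MD5 control total are assumptions for illustration, not the actual component interface:

```python
# Sketch of extract reconciliation on landing: sequence, repeats,
# period, record count against the source control total, and checksum.
import hashlib

def reconcile_extract(extract, control, processed_ids):
    """Return a list of failed checks; an empty list means the extract is OK."""
    failures = []
    if extract["sequence_no"] != control["expected_sequence_no"]:
        failures.append("out of order")
    if extract["extract_id"] in processed_ids:
        failures.append("repeat of a previous extract")
    if extract["period"] != control["expected_period"]:
        failures.append("unexpected period")
    if len(extract["records"]) != control["source_record_count"]:
        failures.append("record count mismatch")
    payload = "\n".join(extract["records"]).encode()
    if hashlib.md5(payload).hexdigest() != control["md5"]:
        failures.append("transmission error (checksum mismatch)")
    return failures

records = ["r1", "r2", "r3"]
control = {
    "expected_sequence_no": 7,
    "expected_period": "2008-09",
    "source_record_count": 3,
    "md5": hashlib.md5("\n".join(records).encode()).hexdigest(),
}
extract = {"extract_id": "E-42", "sequence_no": 7,
           "period": "2008-09", "records": records}
```

Running the checks before any transformation keeps bad extracts out of the critical processing window.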

Data Validation
- Utilize a Data Profiling tool to generate Data Validation rules
- This saves time and guarantees proper flow and integrity of processes
- In the absence of this capability, we have generic components that are metadata-driven and leverage confirmed knowledge obtained from the Data Profiling process
- Alternatively, the same can be achieved with more sophisticated technology (e.g. Ataccama Data Quality Center) in a fully automated, controlled, and metadata-driven way outside of the standard ETL process
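A minimal sketch of the metadata-driven idea: validation rules confirmed during profiling live in a rule catalogue (here a plain dict) and are applied generically, rather than hard-coded into each ETL job. The attributes and rules shown are invented examples:

```python
# Metadata-driven validation: rules are data, not code in the ETL job.
import re

VALIDATION_RULES = {  # attribute -> (rule name, predicate)
    "postal_code": ("5 digits", lambda v: bool(re.fullmatch(r"\d{5}", v or ""))),
    "email":       ("contains @", lambda v: v is not None and "@" in v),
}

def validate_record(record):
    """Return (attribute, rule name) for every failed rule."""
    return [(attr, name)
            for attr, (name, check) in VALIDATION_RULES.items()
            if not check(record.get(attr))]

errors = validate_record({"postal_code": "8110", "email": "jan@example.com"})
```

Because the rules are data, they can be regenerated from profiling results without touching the ETL code.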

Error Logging
- In the event that any data quality issues are identified in the previous step, they are logged
- This is a fundamental step to support the following:
  - Documenting all data quality issues before any data cleansing is performed (e.g. applying a default)
  - Supporting measurement and reporting
  - Supporting feedback to the source systems for possible data corrections

The error logs are persisted in database tables.
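A sketch of persisting the logged issues to a database table; SQLite and the column names are placeholder choices, not the actual log schema:

```python
# Persist DQ issues to a table before any cleansing is applied,
# so measurement, reporting, and source-system feedback all have
# the original raw values to work from. SQLite used for brevity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dq_error_log (
    run_id TEXT, source_system TEXT, record_key TEXT,
    attribute TEXT, rule TEXT, raw_value TEXT,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def log_issue(run_id, source, key, attribute, rule, raw_value):
    conn.execute(
        "INSERT INTO dq_error_log (run_id, source_system, record_key, "
        "attribute, rule, raw_value) VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, source, key, attribute, rule, raw_value))

log_issue("R1", "billing", "CUST-17", "postal_code", "5 digits", "8110")
count = conn.execute("SELECT COUNT(*) FROM dq_error_log").fetchone()[0]
```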

Practical steps: where to start and what to do

- Determine where you are: IQ Assessment
- Get used to Data Profiling
- Revisit your downstream ETL processes
- Automate Data and Business Logic Profiling
- Implement MDM Architecture in gradual stages
- Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

DQ Management Cycle

DQ Reporting Architecture

[Diagram: data sources (RDBs and flat files; mainframe and midrange; packaged applications) feed a Data Quality Engine (cleanse, match, enrich) backed by a metrics repository and workbench; results are published as reports via web and a scorecard/monitor]

Drilling Down on DQ Dashboards


- Different users need to look at Data Quality at different levels
- As such, it is imperative to have the flexibility to serve those needs and to report Data Quality scores across a series of dimensions
- Each user group commonly has a single view or a limited view in terms of analytics; the drill-down capability spans all of them
- Exception Reporting is also available and will notify the respective user groups when specific events happen, such as Data Quality going below a specific threshold
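Threshold-based exception reporting can be sketched as below; the dimensions, thresholds, and scores are invented, and the "notification" is simply a returned message:

```python
# Exception reporting sketch: alert when a DQ score for a monitored
# dimension drops below its configured threshold.
THRESHOLDS = {"address_validity": 95.0, "taxcode_validity": 90.0}

def check_thresholds(scores):
    """Return alert messages for every dimension below its threshold."""
    return [f"DQ alert: {dim} at {scores[dim]:.2f}% "
            f"(threshold {THRESHOLDS[dim]:.1f}%)"
            for dim in THRESHOLDS
            if dim in scores and scores[dim] < THRESHOLDS[dim]]

# Taxcode validity at 92.49% clears its 90% threshold; address validity
# at 92.50% falls below its stricter 95% threshold and raises an alert.
alerts = check_thresholds({"address_validity": 92.5, "taxcode_validity": 92.49})
```

In a real deployment the alert would be routed to the responsible user group (e.g. by e-mail) rather than returned as a string.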

Data Quality Dashboard

- High-level overview of key data areas
- A management tool for the DQ Manager / Business Sponsor of the DQ Program
- Based on defined business rules
- Tracking against defined KPIs

Data Quality KPIs for one (business) data entity

- A management tool for the Data Steward / Business Owner of a given data entity
- Compound KPIs in selected categories
- User-defined KPIs, report structure, and target DQ levels
- Using pre-defined business rules

Detailed Analysis of a DQ KPI - Address Validity

- Analytical tool for Data Stewards / Technical Analysts
- Detailed breakdown of a given KPI
- Allows Data Stewards / Data Quality Managers to take action

Detailed Analysis of a DQ KPI - Address Consistency

Allows Data Stewards / Data Quality Managers to take action, e.g.:
- Apply validation on input
- Change an existing business process
- Correct a malfunctioning ETL/interface from one of the systems, etc.

Business Rules define Data Quality

- Ability to support complex, hierarchical business rules
- Configurable by users, no coding
- High performance: executes defined rules on tens of millions of records
- Ability to execute business rules in real time
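A hierarchical rule of the kind described (compound ALL/ANY rules nesting atomic checks, composable without new code for each combination) can be sketched as a small tree evaluator; the rule definitions here are illustrative:

```python
# Sketch of a hierarchical rule engine: compound ALL/ANY nodes
# nest atomic checks, so rules are composed as data, not code.
def evaluate(rule, record):
    """Recursively evaluate a rule tree against one record."""
    kind = rule["type"]
    if kind == "atom":
        return rule["check"](record)
    results = (evaluate(child, record) for child in rule["children"])
    return all(results) if kind == "ALL" else any(results)

# Example rule: a customer record needs a name AND (email OR phone).
valid_customer = {
    "type": "ALL",
    "children": [
        {"type": "atom", "check": lambda r: bool(r.get("name"))},
        {"type": "ANY", "children": [
            {"type": "atom", "check": lambda r: bool(r.get("email"))},
            {"type": "atom", "check": lambda r: bool(r.get("phone"))},
        ]},
    ],
}
```

Because `evaluate` touches one record at a time with no shared state, the same tree can be applied in batch over millions of rows or to a single transaction in real time.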

Practical steps: where to start and what to do

- Determine where you are: IQ Assessment
- Get used to Data Profiling
- Revisit your downstream ETL processes
- Automate Data and Business Logic Profiling
- Implement MDM Architecture in gradual stages
- Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

MDM Hub Architectural Approaches


There are four major architectural approaches to implementing an MDM Hub as part of the enterprise architecture:

- Consolidation (Central Master)
- Registry
- Coexistence
- Transaction Hub

They vary by:
- Amount of master data attributes stored in a central location
- Data latency between operational systems and the master repository
- Degree of synchronization of master data across the enterprise
- Impact on operational systems' data storage and functionality

Consolidation Approach

[Diagram: transaction systems (Accounting, Billing, CRM, ERP) connected through Data Integration (ETL, EAI, SOA) to the MDM Hub Solution (data quality front end - cleansing, standardization, identification, unification; dictionaries/etalons; metadata; Master Data Repository) and to the Operational Data Store, Enterprise Data Warehouse, and Data Marts]

- The master data is physically stored in a central repository
- It is cleansed, standardized, de-duplicated, and unified in batch mode
- The master data repository forms a golden record for all downstream systems
- The master data will be as current as the latest batch run
- The operational systems continue to maintain their own version of the master data
- With this approach, only the downstream processes benefit from the master data; this may include reporting, analytics, marketing campaigns, data mining, etc.
- Often the consolidation approach is implemented as an extension of the EDW environment
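The batch consolidation step (match, de-duplicate, unify into a golden record) can be sketched as follows; the match key (normalized name plus birth date) and the survivorship rule (freshest non-empty value wins) are simplifying assumptions for illustration:

```python
# Sketch of batch consolidation: group source records by a match key
# and merge them into one golden record, preferring the most recently
# updated non-empty value for each attribute.
def consolidate(records):
    """Return {match_key: golden_record} from raw source records."""
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        key = (rec["name"].strip().lower(), rec["birth_date"])
        merged = golden.setdefault(key, {})
        for field, value in rec.items():
            if value not in (None, ""):
                merged[field] = value  # later (fresher) records overwrite
    return golden

golden = consolidate([
    {"name": "Jan Novak", "birth_date": "1970-01-01",
     "email": "", "phone": "111", "updated": "2008-01-01"},
    {"name": "JAN NOVAK ", "birth_date": "1970-01-01",
     "email": "jan@example.com", "phone": None, "updated": "2008-06-01"},
])
```

Real matching uses probabilistic or rule-based identification rather than an exact key, but the merge-and-survive pattern is the same.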

Registry Approach

[Diagram: same hub and transaction-system layout as the consolidation approach]

- Only the master data identifiers are stored in the repository, together with their relationships and deduplication groups
- The rest of the master data is stored in its original location in the operational systems
- The MDM Hub maintains a set of rules for reconstructing and assembling the master data at runtime
- The master data retrieved is always up to date
- Performance may suffer for large amounts of master data accessed, due to runtime data federation
- The operational systems continue to maintain their own version of the master data
- With this approach, only the downstream processes benefit from the master data; this may include reporting, analytics, marketing campaigns, data mining, etc.
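The registry approach's runtime assembly can be sketched as below; the registry contents and the mocked source systems are invented for illustration:

```python
# Sketch of the registry approach: the hub stores only identifiers and
# cross-references; the full master record is assembled at request time
# by fetching attributes from the owning operational systems.
REGISTRY = {  # master id -> list of (system, local key)
    "M-1": [("accounting", "A-9"), ("billing", "B-4")],
}

SYSTEMS = {  # stand-ins for runtime calls to the operational systems
    "accounting": {"A-9": {"name": "Jan Novak", "segment": "retail"}},
    "billing":    {"B-4": {"address": "Francisciho 4, Bratislava"}},
}

def assemble(master_id):
    """Federate the master record from its source systems at runtime."""
    record = {"master_id": master_id}
    for system, local_key in REGISTRY[master_id]:
        record.update(SYSTEMS[system][local_key])  # always current data
    return record
```

The data is always current because it never leaves the source systems, but every lookup pays the cost of the runtime federation, which is why performance suffers at volume.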

Coexistence Approach

[Diagram: same hub and transaction-system layout as the consolidation approach]

- Consolidation often evolves into a coexistence approach
- The master data is physically stored in a central repository; it is cleansed, standardized, de-duplicated, and unified in batch mode; the master data will be as current as the latest batch run
- The master data repository forms a golden record for all downstream systems and some upstream systems
- The difference from the consolidation approach is that the data is published, and some of the operational systems may synchronize their data with the master data
- The master data is synchronized across multiple systems

Transaction Hub Approach

[Diagram: same hub and transaction-system layout as the consolidation approach]

- The master data is physically stored in a central repository; it is cleansed, standardized, de-duplicated, and unified in batch mode as well as at runtime
- The master data repository forms a golden record for all downstream systems and some upstream systems
- Some of the upstream systems give up maintenance of the master data to the MDM Hub; they directly access the transaction hub for all master data management

Practical steps: where to start and what to do

- Determine where you are: IQ Assessment
- Get used to Data Profiling
- Revisit your downstream ETL processes
- Automate Data and Business Logic Profiling
- Implement MDM Architecture in gradual stages
- Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Extended MDM & IQM Architecture

[Diagram: the MDM Hub extended into a combined MDM & IQM Hub Solution, with the same transaction systems (Accounting, Billing, CRM, ERP), Data Integration (ETL, EAI, SOA), dictionaries/etalons, metadata, Master Data Repository, and downstream Operational Data Store, Enterprise Data Warehouse, and Data Marts]

Thank You

CANADA

CZECH REPUBLIC

SLOVAKIA

GERMANY

BULGARIA

Adastra Corporation Le Parc Office Tower 8500 Leslie St. Markham Ontario, L3T 7M8 Canada info@adastracorp.com www.adastracorp.com

Adastra, s.r.o. Nile House Karolinská 654/2 Praha Czech Republic info@adastra.cz www.adastra.cz

Adastra, s.r.o. Francisciho 4 Bratislava Slovakia info@adastracorp.sk www.adastracorp.sk

Adastra GmbH Bockenheimer Landstrasse 17/19 Frankfurt a. Main Germany info@adastracorp.de www.adastracorp.de

Adastra Bulgaria EOOD 29 Panayot Volov str., 5th floor Sofia Bulgaria info@adastracorp.com www.adastracorp.com