
Feature

Achieving Data Warehouse Nirvana


The Critical Role of Information Controls

Christopher Reed is a solution consultant at Infogix Inc., where he leads solution consulting efforts. He works with Fortune 500 companies to assist in the creation of information control solutions throughout the enterprise. Before his work at Infogix, Reed was an architectural consultant at Unisys, where he consulted with customers on deploying mission-critical applications.

Yaping Wang, CISA, PMP, is a product consultant at Infogix, where she leads client service projects that provide assessment, advisory, implementation and other services in automated information control domains.

Angsuman Dutta is unit leader of the customer acquisition support team at Infogix. Since 2001, he has assisted numerous industry-leading enterprises in their implementation of automated information controls.

Would one buy a house when the stability of the foundation is uncertain? Would one make a payment if the accuracy of the bill is in question? If the answer is no, then why would any organization settle for making business decisions based on inaccurate and inconsistent data warehouse information? A number of studies1, 2, 3 show that much of the data warehouse information available to business users is not accurate, complete or timely. Despite significant investment in data warehouse technologies and efforts to ensure quality, the trustworthiness of data warehouse information at best remains questionable.4, 5 Current approaches to restore trust in data warehouse information are often heroic efforts of the individuals responsible for the data warehouse and include:
• Manual or semiautomated balancing, tracking and reconciliation to prove accuracy
• Ad hoc queries of data sources to support "audit needs"
• Extensive research and remediation to identify, diagnose and correct issues

These approaches provide short-term respites but are not sustainable in the long run. The increased labor cost for manual processes and the high processing cost for reruns when errors are identified late in the process increase ongoing operational costs. The cumbersome and costly processes for supporting audit needs also create organizational stress. Frequently, a large number of data warehouse projects are abandoned because of the high costs of efforts to ensure information quality.6

While standardized tools, such as those for extraction, transformation and loading (ETL) and data quality processes, solve part of the problem, there is an urgent need for adopting a systematic approach for establishing trust in data warehouse information. The proposed approach outlines a framework for ensuring the integrity of data warehouse information by using end-to-end information controls.

Root Causes of Information Quality Issues
While several factors can be attributed to information quality issues, the following are the major causes of information errors within data warehouses:
• Changes in the source systems—Changes in the source systems often require code changes in the ETL process. For example, the ETL process corresponding to the credit risk data warehouse in a particular financial institution has approximately 25 releases each quarter. Even with appropriate quality assurance processes, there is always room for error. The following list outlines the types of potential errors that can occur because of changes in the ETL processes:
– Extraction logic excludes certain types of data that were not tested.
– Transformation logic may aggregate two different types of data (e.g., car loan and boat loan) into a single category (e.g., car loan). In some cases, transformation logic may exclude certain types of data, resulting in incomplete records in the data warehouse.
– Similar issues are also observed with the loading process.
• Process failures—Current processes may fail due to system errors or transformation errors, resulting in incomplete data loading. System errors may include abends (abnormal terminations) due to the unavailability of the source system/extract or the incorrect format of the source information. Transformation errors may result from incorrect formats.
• Changes/updates in the reference data—Outdated, incomplete or incorrect reference data will lead to errors in the data warehouse information. For example, errors in the sales commission rate table may result in erroneous calculation of the commission amount (a simplified sketch of a reference-data check follows this list).
• Data quality issues with the source system—The source system's data may be incomplete or inconsistent. For example, a customer record in the source system may have a missing zip code, or another source system, related to sales, may use abbreviations of product names in its database. Incompleteness and inconsistency in source system data will lead to quality issues in the data warehouse.
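
To make the reference-data risk concrete, the following is a minimal sketch, written for this discussion rather than taken from any particular product, of a check that flags transactions whose commission rate is missing or stale before the commission amount is calculated and loaded. The table, field names and dates are hypothetical.

```python
from datetime import date

# Hypothetical reference data: commission rates with validity windows.
COMMISSION_RATES = {
    "car_loan": {"rate": "0.015",
                 "valid_from": date(2010, 1, 1), "valid_to": date(2010, 12, 31)},
}

def check_reference_data(transactions):
    """Flag transactions whose commission rate is missing from, or stale in,
    the reference table, before the commission amount is derived and loaded."""
    exceptions = []
    for txn in transactions:
        ref = COMMISSION_RATES.get(txn["product"])
        if ref is None:
            exceptions.append((txn["id"], "no commission rate on file"))
        elif not (ref["valid_from"] <= txn["sale_date"] <= ref["valid_to"]):
            exceptions.append((txn["id"], "commission rate outdated for sale date"))
    return exceptions

# A boat loan with no rate on file is flagged; the car loan passes.
sample = [
    {"id": "T1", "product": "car_loan", "sale_date": date(2010, 6, 1)},
    {"id": "T2", "product": "boat_loan", "sale_date": date(2010, 6, 1)},
]
print(check_reference_data(sample))  # [('T2', 'no commission rate on file')]
```

A check of this kind is most useful when it runs before the ETL job, so an outdated rate is corrected rather than propagated into the warehouse.
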
Current Approach and Cost of Quality
The current focus in most data warehouse initiatives is to use ETL tools to standardize the data transfer process and to use data quality solutions to detect and correct incomplete and inconsistent data. While these efforts result in significant improvements, data warehouse teams rely on a number of manual/semiautomated processes to balance and reconcile the data warehouse information with the source system information. Some of the techniques currently used by various organizations are:
• Developing an independent script that compares the record count and amount information from the source system with the record count and amount information from the data warehouse. These scripts are often executed either on an ad hoc basis or scheduled to run after the data load is complete (a simplified sketch of such a script follows this list).
• Creating a control table and then populating it with the totals from the data warehouse and the source system as part of the ETL process. Checks are performed after the completion of the load process.
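
The following is a minimal sketch of the first technique, an independent script that compares record counts and total amounts between a source extract and the loaded warehouse data. The file names and column name are hypothetical; a production script would more likely query the source system and the warehouse directly.

```python
import csv
from decimal import Decimal

def record_count_and_total(path, amount_column="amount"):
    """Return (record count, total amount) for a delimited extract."""
    count, total = 0, Decimal("0")
    with open(path, newline="") as extract:
        for row in csv.DictReader(extract):
            count += 1
            total += Decimal(row[amount_column])
    return count, total

# Hypothetical extracts: one taken from the source system, one from the
# warehouse after the load completes.
src_count, src_total = record_count_and_total("source_extract.csv")
dwh_count, dwh_total = record_count_and_total("warehouse_load.csv")

if (src_count, src_total) == (dwh_count, dwh_total):
    print("Balanced: counts and amounts match.")
else:
    print(f"OUT OF BALANCE: source {src_count}/{src_total}, "
          f"warehouse {dwh_count}/{dwh_total}")
```

As the article notes, the weakness of this approach is not the arithmetic but its placement: when the comparison is built into, or scheduled around, the same ETL process that produced the error, it inherits that process's blind spots.
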
While these methods are somewhat effective in detecting errors, they rely heavily on the ETL process, which is often the source of the error. More important, these approaches are not effective when the transactions from source systems are either split into several transactions or combined into a single transaction. Such scenarios require advanced logic for balancing the information between the source system and the data warehouse (a simplified sketch of such logic follows this paragraph). In addition, the inability to reconcile detail-level information using scripts or ETL processes does not allow users to pinpoint the exact issue, resulting in significant manual research and resolution efforts. Beyond the high operational costs related to research and reruns to ensure quality, the current approach erodes the morale of the data warehouse team and the confidence of the business users.
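
As an illustration of the kind of advanced balancing logic mentioned above, the sketch below rolls both sides up to a shared business key before comparing, so that one source transaction split into several warehouse rows (or several source rows merged into one) can still be reconciled, and any residual difference is pinned to a specific key. The field names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

def rollup(records, key_field, amount_field):
    """Aggregate amounts by a shared business key (e.g., the original
    transaction ID carried on every split or merged record)."""
    sums = defaultdict(Decimal)
    for rec in records:
        sums[rec[key_field]] += Decimal(rec[amount_field])
    return sums

def balance_by_key(source_records, warehouse_records):
    """Return the keys whose rolled-up amounts do not reconcile."""
    src = rollup(source_records, "txn_id", "amount")
    dwh = rollup(warehouse_records, "txn_id", "amount")
    return {key: (src.get(key), dwh.get(key))
            for key in src.keys() | dwh.keys()
            if src.get(key) != dwh.get(key)}

# T1 was split into two warehouse rows but still balances; T2 never arrived,
# so it is reported with the exact key that needs research.
source = [{"txn_id": "T1", "amount": "100.00"}, {"txn_id": "T2", "amount": "40.00"}]
warehouse = [{"txn_id": "T1", "amount": "60.00"}, {"txn_id": "T1", "amount": "40.00"}]
print(balance_by_key(source, warehouse))  # {'T2': (Decimal('40.00'), None)}
```
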
The problem of data quality is exacerbated when the data warehouse is used for storing and reporting financial information. In this scenario, internal audit requests evidence of controls' operation and documentation related to error resolutions when controls detect errors. Such requests are often met by querying a myriad of log files, e-mail chains and data warehouse tables. This increases the workload of the data warehouse resources and widens the rift between audit and data warehouse teams.

Current approaches are not scalable or sustainable. There is an urgent need to use automated information controls for verifying, balancing, reconciling and tracking the data warehouse information. Ideally, information controls should be independent of the underlying application and should have the ability to store an audit trail of the information transfer process and its validation results.

Three Pillars for Ensuring Data Warehouse Quality
Successful and cost-effective data warehouse quality initiatives in Fortune 500 organizations are founded on three critical pillars, as shown in figure 1.

Figure 1—Three Critical Pillars for Data Warehouse Quality

• Data quality (DQ) tools—Identify, correct and standardize incomplete and inconsistent source system information prior to loading to data warehouses. The focus of these solutions has primarily been on validating customer addresses and product names; they often do not address the quality issues of financial transactions.
• ETL tools—Extract and transform source system information and load it into the data warehouse. The primary focus has been standardizing and increasing the efficiency of the data transfer process.
• Information control (IC) solutions—Verify, balance, reconcile and track information as the source system data traverse the various points in the ETL process on the way to the data warehouse. The focus has been to independently ensure the accuracy, consistency and completeness of the information at both an aggregate and a transaction level. Information controls not only balance and reconcile the data before and after the load, but can also be expanded beyond the scope of the data warehouse to ensure that the data warehouse information is aligned with other critical applications such as the general ledger (GL). For example, although the same journal systems may feed both the data warehouse and the GL, manual adjustments in the GL may cause an out-of-sync condition that could be detected early if an automated information control is in place. In addition, automated information controls store the audit trail information about the control actions and the resolutions in case of exceptions (a simplified sketch follows figure 2). Figure 2 compares ETL and DQ tools with IC solutions from various aspects.

Figure 2—ETL Tools, DQ Tools and IC Solutions Comparison

Primary use
• Data quality tools: Provide an efficient means to profile and cleanse data warehouse information, either before or after loading.
• ETL tools: Provide an efficient means to extract information from source systems, transform it and load it into the data warehouse.
• Information control solutions: Provide an efficient means to balance, track and reconcile data warehouse information with upstream and downstream systems.

Benefits
• Data quality tools: Work with a number of data types, mostly with one data source at a time; extensively use master data lists; detect and clean data errors related to names, addresses and product names.
• ETL tools: Extract data from many sources and in many formats; include a wide range of transformation capabilities; efficiently load data to a number of databases.
• Information control solutions: Read data from a number of sources; provide end-to-end independent control to verify, balance, reconcile and track information.

Audit support
• Data quality tools: Point-in-time audit trail
• ETL tools: Limited audit trail
• Information control solutions: End-to-end detailed audit trail

Visibility support
• Data quality tools: Provide visibility to IT.
• ETL tools: Provide visibility to IT.
• Information control solutions: Provide visibility to business and audit.
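
Before turning to the specific controls, the sketch below illustrates the two properties highlighted above for IC solutions: the check runs independently of the ETL tooling, and every outcome, clean or not, is written to the control's own audit trail so that evidence for audit does not have to be reassembled from log files and e-mail chains. It is a minimal illustration for this discussion; the control name, totals and tolerance are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical audit-trail store, kept outside the ETL tool and the warehouse.
audit_db = sqlite3.connect("control_audit_trail.db")
audit_db.execute("""CREATE TABLE IF NOT EXISTS control_results (
    run_at TEXT, control_name TEXT, status TEXT, detail TEXT)""")

def run_balance_control(name, left_label, left_total, right_label, right_total,
                        tolerance=Decimal("0")):
    """Balance two independently obtained totals and log the outcome."""
    difference = abs(left_total - right_total)
    status = "PASS" if difference <= tolerance else "EXCEPTION"
    detail = (f"{left_label}={left_total} {right_label}={right_total} "
              f"difference={difference}")
    audit_db.execute("INSERT INTO control_results VALUES (?, ?, ?, ?)",
                     (datetime.now(timezone.utc).isoformat(), name, status, detail))
    audit_db.commit()
    return status

# Example: journal totals loaded to the warehouse vs. the general ledger, where
# a manual GL adjustment has put the two systems out of sync.
print(run_balance_control("DW-to-GL journal balance",
                          "warehouse", Decimal("1250000.00"),
                          "general_ledger", Decimal("1249400.00")))  # EXCEPTION
```
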

Information Controls Framework for Data Warehouse
The proposed framework recommends a minimum of six information controls to achieve the objectives of the data warehouse quality initiatives. The locations of the information controls are depicted in figure 3.

Figure 3—Locations of the Six Information Controls
[Diagram: controls X1 through X6 and data capture points 1 through 6 placed across the source applications (App #1, App #2), the staging/ETL layer, the data warehouse, the data mart and the general ledger.]

The six controls are:
1. Control X1, data warehouse to source system validation—Ensure that the data warehouse information can be balanced and reconciled with the source system. In addition to validating the number of records, controls should balance the total amount and the amounts at the record key level. The control should also be able to verify that the data being loaded to the data warehouse are not duplicates and are within set thresholds (e.g., a source file on average contains 1,000 records and has a total amount of US $2.5 million, with a tolerance of +/- 10 percent). A notification should be sent if the tolerance is violated.
2. Control X2, verification between feeds that the data are accurate and complete—Ensure that the related source feed information is consistent. For example, if one feed consists of credit card payment information and another feed consists of account credit information based on payment information, there needs to be a control to validate the consistency between these two feeds (i.e., validate that the payment information can be reconciled with the credit information).
3. Control X3, validation that the ETL process is accurate and complete—The control should monitor transactions and processes (e.g., source to ETL, data warehouse to data mart) and validate adherence to all process dependencies. Automated independent controls could also be used to automate ETL testing.
4. Control X4, verification within the data warehouse that information is consistent—Many data warehouses do not enforce referential integrity. Changes in the data update process by downstream applications can result in data discrepancies. Independent controls should be used to ensure referential integrity is maintained by reconciling the relevant information (a simplified sketch of such a check follows this list).
5. Control X5, assurance that the data balance with downstream applications or data marts—Ensure that the data warehouse information can be balanced and reconciled with the downstream processes.
6. Control X6, validation between parallel systems and the data warehouse—Data warehouse information can also reside in other systems. For example, loan information resides both in the GL and the credit risk data warehouse. It is important to reconcile the information in the parallel system with the data warehouse information. In the absence of such a control, the loan information in the financial reports generated from the GL system may become out of sync with the loan information used for estimating the capital requirements for Basel II.
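
For control X4, referenced above, the sketch below shows the simplest form of an independent referential integrity check: confirm that every foreign key in a fact extract resolves to a key in the corresponding dimension, since many warehouses do not enforce this in the database itself. The dimension, fact fields and keys are hypothetical; a production control would read them from the warehouse rather than from hard-coded lists.

```python
def find_orphans(fact_rows, dimension_keys, fk_field):
    """Return fact rows whose foreign key has no match in the dimension,
    i.e., the referential integrity the database is not enforcing."""
    valid_keys = set(dimension_keys)
    return [row for row in fact_rows if row[fk_field] not in valid_keys]

# Hypothetical customer dimension keys and a payment fact extract.
customer_keys = ["C100", "C101", "C102"]
payments = [
    {"payment_id": "P1", "customer_key": "C100", "amount": "75.00"},
    {"payment_id": "P2", "customer_key": "C999", "amount": "20.00"},  # orphan
]

for orphan in find_orphans(payments, customer_keys, "customer_key"):
    print(f"Referential integrity exception: {orphan['payment_id']} "
          f"references unknown customer key {orphan['customer_key']}")
```

The same key-level pattern, applied to two systems rather than a fact and a dimension, is what controls X2, X5 and X6 call for, for example reconciling loan balances by loan ID between the GL and the credit risk data warehouse.
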
Conclusion
With the accelerating changes in the source systems to support business needs, increasing reliance on data warehouse information for critical business operations and decisions, and an expanding (and ever-changing) array of regulations and compliance requirements, the use of automated information controls is no longer an option; it is the only way to ensure information accuracy within the data warehouse and across the enterprise. Successful organizations expand the scope of information controls beyond the data warehouse by developing a companywide program for ensuring enterprise information quality. With an appropriate selection of tools and frameworks for information controls, organizations can achieve the elusive goal of having higher-quality enterprise information assets.

Endnotes
1. English, Larry; Improving Data Warehouse and Business Information Quality, Wiley and Sons, USA, 2000
2. Eckerson, Wayne W.; Data Quality and the Bottom Line, TDWI Research Series, USA, 2001
3. Friedman, Ted; Data Quality "Firewall" Enhances Value of the Data Warehouse, Gartner Report, USA, 2004
4. Violino, Bob; "Do You Trust Your Information?," The Information Agenda, 23 October 2008
5. Computer Sciences Corp.; Technology Issues for Financial Executives, USA, 2007
6. Gupta, Sanjeev; "Why Do Data Warehouse Projects Fail?," Information Management, 16 July 2009