
Data Warehousing/Business Intelligence (DW/BI)

Testing - A Myth
Charanjit Singh
Software Engineer
Accenture Services Pvt. Ltd.
Mumbai, India
c.singh@accenture.com


Abstract: A DW (data warehouse) is a collection of data designed
to support management decision-making. The success of a
DW/BI (data warehousing/business intelligence) program lies in
meeting its key objective of ensuring data accuracy (DW
construction) and providing a single version of the truth through
flexibility in analysis/reporting (presentation), which requires
a specialized approach to testing a warehouse. The testing process is
extremely complicated and important and hence should be
treated as a project unto itself. It is therefore essential to have a
framework or methodology in place for testing a DW. This paper
covers the importance of DW/BI testing, testing methodologies,
a suggested framework, different types of DW testing and
common challenges faced in the DW/BI testing space.
Keywords: Data Warehouse; Business Intelligence; Testing;
DW; Framework; test execution; functional/non-functional
testing.
I. INTRODUCTION
A data warehouse is a collection of technologies aimed at
enabling the knowledge worker (executive, manager, and
analyst) to make better and faster decisions. It is expected to
have the right information in the right place at the right time
with the right cost in order to support the right decision. The
classic definition of a data warehouse is "a subject-oriented, integrated,
nonvolatile, and time-variant collection of data in support of
management's decisions" [1]. A DW tends to have these
distinguishing features [2]:
A data warehouse is organized around major subjects,
such as customer, supplier, product, and sales.
A data warehouse is usually constructed by integrating
multiple heterogeneous sources, such as relational
databases, flat files, and on-line transaction records.
Data are stored to provide information from a historical
perspective.
The success of a DW/BI program lies in meeting its key
objective of ensuring data accuracy (DW construction) and
providing a single version of the truth through flexibility in
analysis/reporting (presentation).
It is a common best practice for any DW/BI initiative to
define the level of data accuracy expected (also known as
tolerance level) from the DW; needless to say, this varies from
application to application.
Data passes through several processes of churning
across the various layers depicted in Fig. 1. Data
inconsistency and/or inaccuracy can originate in any of these
layers, resulting in an adverse impact on the program's primary
objective. Examples of data errors include [3]:
Large volumes of duplicate or incomplete data
extracted from source systems
Incorrect cleansing (e.g., use of incorrect codes)
Incorrect integration of data from multiple sources.
Incorrect mapping of dimensions in the cube

Figure 1. Data Consolidation in DW Architecture
The testing process is extremely complicated and important
and hence should be treated as a project unto itself.
When you consider the inputs, processes and outputs of a
source-to-target data warehouse, it becomes clear just how
much testing is required. It is not only the many individual
components but also their integration that needs to be tested. It
is hence essential to have a framework or methodology in
place for testing a DW.
II. WHY IS DATA WAREHOUSE TESTING IMPORTANT?

How sure are you that the information from your data
warehouse is correct, complete and accurate? You have recently
decided to implement a data mart, a data warehouse or a
business intelligence environment as a base for business-critical
decisions.
Your concerns:
How can I be sure there are no anomalies in the huge
amount of migrated data?
Are decisions based on correct data?
Is security sufficient on each user level?
Does it comply with my needs?

Testing these IT systems assures you, as the responsible business
or IT stakeholder, that these concerns do not become issues in
your BI solution [4].
As much as I would want you to believe that testing a data
warehouse is a wondrous and mysterious process, it is really not
that different from any other testing project. DW testing
should be done in parallel with the DW development life
cycle (a V-model approach) in every test area.
We specify six test areas within two domains: the Back
Room (IT-related, hidden from users) and the Front Room
(visible to users), as shown in Fig. 2:
Back Room processes are checked for data accuracy.
Even if data is accurate in the data warehouse, the
data presented and delivered to different users must
also be checked in the Front Room processes.
III. DATA WAREHOUSING TESTING V/S APPLICATION
TESTING
Why and how is testing for DW/BI different from testing
for other technologies? Part of the answer lies in the definition of
what constitutes DW/BI.
BI may be defined as "the result of in-depth analysis of
detailed business data; includes database and application
technologies as well as analysis practices" [5]. BI is a broad
category of application programs and technologies for
gathering, storing, analyzing and providing access to data to
help enterprise users make better business decisions.
Following is a set of reasons why testing a data
warehouse is a subject of specialization:
User-triggered vs. system-triggered: Most production/source
system testing involves processing individual transactions,
which are driven by input from the users. This contrasts
sharply with a warehouse, where most of the testing is
system-triggered through ETL (extraction, transformation
and loading) scripts.
Batch vs. online gratification: Unlike a transaction
system, where transactions are processed online, a
data warehouse is (most of the time) loaded in batch mode,
and most of the action happens in the back end.
Volume of test data and possible scenarios: The test
data in a transaction system is, even in the extreme scenario, a
very small sample of the overall production data,
whereas a data warehouse requires large volumes of test
data as one tries to cover the maximum possible
combinations and permutations of dimensions and
facts.
Test scenarios: Due to the large volume of data and the
complex transformation logic, one has to be creative in
designing test scenarios to gain a high level of
confidence.
Test data preparation: This is linked to the points above on
possible test scenarios and data volume. Given that a
data warehouse needs lots of both, the effort required
to prepare them is much greater.
Test script preparation: In the case of transaction systems,
users/business analysts typically test the output of the
system. However, in the case of a data warehouse, as most of
the action happens at the back end, most of the
DW data quality testing and extraction,
transformation and loading testing is done by
running separate stand-alone scripts. These scripts
compare pre-transformation data to post-transformation data
(a simplified sketch of such a comparison follows this list).
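
As an illustration of such a stand-alone script, the sketch below applies a documented transformation rule to the extracted (pre-transformation) rows and compares the result with what was actually loaded. It is a minimal example only: the table names, column names and concatenation rule are all hypothetical, and an in-memory SQLite database stands in for the staging and warehouse layers.

```python
import sqlite3

# A minimal pre- vs. post-transformation comparison, assuming a documented
# rule such as "full_name = upper(first_name) || ' ' || upper(last_name)".
# Table and column names are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE dw_customer  (id INTEGER, full_name TEXT);
    INSERT INTO stg_customer VALUES (1, 'ada', 'lovelace'), (2, 'alan', 'turing');
    INSERT INTO dw_customer  VALUES (1, 'ADA LOVELACE'), (2, 'ALAN TURING');
""")

# Expected result: apply the documented rule to the pre-transformation data.
expected = dict(conn.execute(
    "SELECT id, upper(first_name) || ' ' || upper(last_name) FROM stg_customer"))
# Actual result: what the ETL process loaded into the warehouse.
actual = dict(conn.execute("SELECT id, full_name FROM dw_customer"))

mismatches = {k: (expected.get(k), actual.get(k))
              for k in expected.keys() | actual.keys()
              if expected.get(k) != actual.get(k)}
print("PASS" if not mismatches else f"FAIL: {mismatches}")
```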



Figure 2. Testing Techniques in DW Architecture
IV. HOW TO TEST A DATA WAREHOUSE: A BRIEF
SYNOPSIS
A. Analyze Source Documentation
As with many other projects, when testing a data warehouse
implementation, there is typically a requirements document of
some sort. These documents can be useful for basic test
strategy development. Many times there are other documents,
known as source-to-target mappings, which provide much of
the detailed technical specifications.
B. Develop Strategy and test plans
As you analyze the various pieces of source documentation,
you'll want to start to develop your test strategy. I've found that
from a lifecycle and quality perspective it's often best to seek
an incremental testing approach when testing a data warehouse.
This essentially means that the development teams will deliver
small pieces of functionality to the test team earlier in the
process. The primary benefit of this approach is that it avoids
an overwhelming "big bang" type of delivery and enables early
defect detection and simplified debugging.
C. Test Development and Execution
Depending on the stability of the upstream requirements
and analysis process, it may or may not make sense to do test
development in advance of the test execution process. If the
situation is highly dynamic, any early tests developed may
largely become obsolete. In this situation, an integrated test
development and test execution process that occurs in real time
can usually yield better results. For example, a few data
warehouse test categories might be (illustrative checks for
some of these are sketched after the list):
Record counts
Duplicate checks
Referential data validity
Referential integrity
Error and Exception Logic
Incremental and historical processing
Control column values and default values
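
To make these categories concrete, the hedged sketch below expresses a few of them as SQL checks wrapped in a small Python driver. The table and column names (src_orders, dw_orders, dw_customer, load_date, record_source) are hypothetical placeholders that would normally come from the source-to-target mapping.

```python
# Illustrative checks for some of the categories above, expressed as SQL that
# should return zero rows (or a zero difference) when the check passes.
# Table and column names are hypothetical placeholders.
CHECKS = {
    # Record counts: source row count should match target row count.
    "record_count": """
        SELECT (SELECT COUNT(*) FROM src_orders) -
               (SELECT COUNT(*) FROM dw_orders) AS diff""",
    # Duplicate checks: no business key should appear more than once.
    "duplicates": """
        SELECT order_id FROM dw_orders
        GROUP BY order_id HAVING COUNT(*) > 1""",
    # Referential integrity: every fact row must join to a dimension row.
    "referential_integrity": """
        SELECT f.order_id FROM dw_orders f
        LEFT JOIN dw_customer d ON f.customer_key = d.customer_key
        WHERE d.customer_key IS NULL""",
    # Control columns and default values: these should never be NULL.
    "default_values": """
        SELECT order_id FROM dw_orders
        WHERE load_date IS NULL OR record_source IS NULL""",
}

def run_checks(conn):
    """Run each check against an open DB-API connection and report failures."""
    for name, sql in CHECKS.items():
        rows = conn.execute(sql).fetchall()
        failed = bool(rows) and rows != [(0,)]   # zero rows or zero diff = pass
        print(f"{name}: {'FAIL' if failed else 'PASS'}")
```

Writing each query so that a passing check returns no rows (or a zero difference) keeps the driver logic trivial and the checks easy to rerun after every load.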
V. DW/BI TESTING METHODOLOGY
A typical DW/BI project can be viewed as comprising two
broad components, as depicted in Fig. 3: constructing the DW
using extract, transform, load (ETL) technologies and
presenting the same for analysis purposes with online
analytical processing (OLAP) technologies [6].
The testing strategy for any data warehouse needs to be
comprehensive in terms of testing both of these technologies.
It is essential to define a framework that covers the extent,
scope and approach of DW/BI testing.

Figure 3. DW/BI Project Components
Another key data warehouse test strategy decision is the
analysis-based test approach versus the query-based test
approach.
The pure analysis-based approach would put test
analysts in the position of mentally calculating the expected
result by analyzing the target data and related specifications.
The query-based approach involves the same basic
analysis but goes further to codify the expected result in the
form of a SQL query. This offers the benefit of setting up a
future regression process with minimal effort.
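
A possible way to codify expected results, sketched below under the assumption that each test reduces to a scalar comparison: every test pairs a SQL query against the target with an expected value worked out by the analyst (or produced by an equivalent query against the source). The table names and expected figures are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class QueryTest:
    """One query-based test: run `actual_sql` against the warehouse and
    compare its scalar result with `expected` (a value worked out by the
    test analyst, or the result of a query against the source system)."""
    name: str
    actual_sql: str
    expected: object

# Illustrative regression suite; table names and expected values are hypothetical.
SUITE = [
    QueryTest("total_sales_2011",
              "SELECT SUM(sales_amount) FROM dw_sales WHERE year = 2011",
              1234567.89),
    QueryTest("distinct_customers",
              "SELECT COUNT(DISTINCT customer_key) FROM dw_sales",
              10432),
]

def run_suite(conn, suite=SUITE):
    """Run every query-based test against an open DB-API connection."""
    for test in suite:
        actual = conn.execute(test.actual_sql).fetchone()[0]
        status = "PASS" if actual == test.expected else f"FAIL (got {actual})"
        print(f"{test.name}: {status}")
```

Because the suite is just data plus a runner, it can be re-executed after every load, which is exactly the regression benefit noted above.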
Fig. 4 illustrates the various test stages in the DW test
execution process. It is essential that both the ETL and OLAP
processes go through all of these testing stages, except that the
integration testing is not relevant to OLAP. All the testing
stages need to be executed by the testing team except
user acceptance testing, which is performed by the
business users.
A. Suggested Framework
It is essential to define a framework that covers the
extent, scope and approach of DW/BI testing. Fig. 5 depicts a
suggested framework.
The framework in Fig. 5 comprises assets/job
aids that facilitate efficient planning and execution of DW/BI
testing, such as:
SQL queries against source and target databases
(varying)
SQL queries to compare data at each stage of
transformation (varying)
Custom-built, reusable test utilities (e.g., Excel
macros) to populate data from source systems and
reports, automate comparison and flag data errors.
Templates to track defects/test results.
Test artifacts: test strategy, test plan and test cases; a
common and largely reusable set of templates for these
documents can prove handy in gaining speed when
initiating testing for new functional
areas/reports/projects.



Figure 4. DW/BI Test Execution Process

Figure 5. DW/BI Testing Framework
VI. TYPES OF DW/BI TESTING
At a very high level, DW testing can be classified as below.
A. Functional testing:
As the name suggests, this deals with the functionality of the
application and mainly revolves around the data in the
warehouse. It includes testing of business rules, data
integration from multiple sources, business scenarios, etc.
Business requirement documents, design documents, data
models and use case documents are the basis for this.
The test phases in functional testing include
Unit Testing
System Testing
Integration Testing
User acceptance Testing.
B. Non-Functional testing:
Non-functional testing deals with aspects such as load,
performance of the processes and security of the data in the
warehouse.
The test phases include
Performance Testing
Security Testing.
C. ETL Testing
Extract, transform, load (ETL) is the process that enables
businesses to consolidate their data while moving it from place
to place (a deliberately minimal sketch of such a pipeline
follows the list below).
Extract - The process of reading data from a source
file.
Transform - The process of converting the extracted
data from its previous form into the form it needs to be
in so that it can be placed into another database.
Load - The process of writing the data into the target
database.
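
For context, the sketch below strings these three steps together into a deliberately minimal pipeline; the flat-file contents, table name and load-control column are invented for illustration, and an in-memory SQLite database stands in for the target.

```python
import csv, io, sqlite3

# Extract: read rows from a (here, in-memory) flat source file.
source_file = io.StringIO("order_id,amount\n1,10.50\n2,20.00\n")
rows = list(csv.DictReader(source_file))

# Transform: convert the extracted data into the form required by the target,
# e.g. cast the amount to a number and add a hypothetical load-control column.
transformed = [(int(r["order_id"]), float(r["amount"]), "2024-01-01")
               for r in rows]

# Load: write the transformed data into the target database table.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE dw_orders (order_id INTEGER, amount REAL, load_date TEXT)")
target.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", transformed)
print(target.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0], "rows loaded")
```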
Key Areas of ETL Testing
Constraint Testing: The objective is to validate unique
constraints, primary keys, foreign keys, indexes, and
relationships.
Source to Target Counts: The objective of the count
test scripts is to determine if the record counts in the
source match the record counts in the target.
Source to Target Data Validation: No ETL process is
smart enough to perform source to target field-to-field
validation. This piece of the testing cycle is the most
labor intensive and requires the most thorough analysis
of the data.
Field-to-Field Testing: Is a constant value being
populated during the ETL process? It should not be,
unless it is documented in the requirements
and subsequently documented in the test scripts.
Transformation and Business Rules: Tests to verify all
possible outcomes of the transformation rules, default
values and straight moves, as specified in the Business
Specification document.
Views: Views created on the tables should be tested to
ensure the attributes mentioned in the views are correct
and the data loaded in the target table matches what is
being reflected in the views.
Sampling: Sampling involves creating predictions
from a representative portion of the data that is to be
loaded into the target table; these predictions are then
matched against the actual results obtained from the data
loaded, for business analyst testing.
Performance: This is the most important aspect after data
validation. Performance testing should check whether the
ETL process completes within the load window (a simple
timing sketch follows this list).
Volume: Verify that the system can process the
maximum expected quantity of data for a given cycle
in the time expected.
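
As a simple illustration of the load-window check mentioned under Performance, the sketch below times an ETL run against an assumed window. The window length is an assumption, and run_etl_job is a hypothetical placeholder for however the real job is invoked (scheduler call, shell command, etc.).

```python
import time

LOAD_WINDOW_SECONDS = 4 * 60 * 60   # assumed 4-hour nightly load window

def run_etl_job():
    """Placeholder for invoking the real ETL job; here it just sleeps
    briefly to simulate work."""
    time.sleep(0.1)

start = time.monotonic()
run_etl_job()
elapsed = time.monotonic() - start

if elapsed <= LOAD_WINDOW_SECONDS:
    print(f"PASS: ETL completed in {elapsed:.1f}s, within the load window")
else:
    print(f"FAIL: ETL took {elapsed:.1f}s, exceeding the load window")
```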

D. OLAP Testing
Online analytical processing (OLAP) provides an advanced
analysis environment that enables decision making and
business modeling. OLAP tools are used for collecting,
managing, processing and presenting multidimensional data for
analysis and management purposes [7].
Key Areas of OLAP Testing

Data Cube Validation: The data cube should be tested
for correctness, completeness and validity. This
includes verifying the objects, joins, cardinality, SQL
parsing etc.
Look and Feel of the Report: This test is to ensure that
the reports conform to the standards. It is better to
pre-define a standard report format that can be used
as a basis for all the reports (if one is not already present).
Fonts, colors, charts, etc. must be displayed as per the
standards.
Data Analysis using Drill feature: The testing of this
feature ensures that the large amount of analytical data
that are available at a high level in the reports can be
dynamically analyzed using the drill functionality.
Report Level Security: This testing is mandatory to
ensure that there is no unauthorized access to any of
the reports. Access can be provided to users based on
their role, department, region etc.
Data Level Security: The same report can be viewed
by different users, with restrictions on the data each user
can see (a row-level filtering sketch follows this list).
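
The sketch below illustrates one way a data-level security check might be automated: issue the same report query on behalf of two users and assert that each sees only rows for their own region. The report table, the user-to-region mapping and the filtering function are all hypothetical; in practice the query would be run through the BI tool under each user's credentials rather than filtered in the test itself.

```python
import sqlite3

# Hypothetical report data and user-to-region security mapping.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_report (region TEXT, amount REAL);
    INSERT INTO sales_report VALUES ('EMEA', 100.0), ('APAC', 200.0), ('EMEA', 50.0);
""")
USER_REGION = {"alice": "EMEA", "bob": "APAC"}

def report_rows_for(user):
    """Simulate the row-level filter the BI layer should apply for `user`."""
    return conn.execute("SELECT region, amount FROM sales_report WHERE region = ?",
                        (USER_REGION[user],)).fetchall()

# Data-level security check: no user should see rows outside their region.
for user, region in USER_REGION.items():
    leaked = [r for r in report_rows_for(user) if r[0] != region]
    print(f"{user}: {'FAIL' if leaked else 'PASS'}")
```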
VII. CHALLENGES
There are many challenges to the development of the
specialized skills required for DW/BI testing:
Lack of awareness. As a general practice, testers plan
their careers in such a way that they specialize and
equip themselves with technical skills for the tools
involved in test execution (e.g., QTP) and/or test
management (e.g., Quality Center), with very little
endeavor to develop skills in the underlying
technology. A good understanding of ETL/OLAP tools
and technologies is an essential skill for DW/BI testing.
Absence of tools. The DW/BI marketplace is flooded
with many tools and vendors, each attempting to
replace the other in the three layers of DW/BI:
database, ETL and OLAP. However, there are no popular
ETL/OLAP testing tools on the market that offer
features for automated or functional testing.
Lack of a standard approach/methodology. While
standard methodologies exist for testing as a whole,
there seems to be no industry-wide view on the
suggested approach and/or methodology for DW/BI
testing. An ideal methodology should include a test
strategy, a test plan and test cases that cover thorough
testing of the various phases of data movement.
Creating test cases and test data that provide adequate
coverage of each of these phases is critical for
ensuring comprehensive quality assurance (QA) of
the DW.
VIII. SUMMARY
Testing for DW/BI is a niche skill that demands a good
blend of ETL/OLAP technical skills (or at the least a
good understanding of them) and thorough testing
skills.
Unlike for other technologies, there are no tools currently
available that can be used for DW/BI testing.
In the absence of such tools, it is essential to define and
develop a framework for DW/BI testing that
comprehensively covers the various layers and stages
of data transformation.
IT services firms need to encourage their workforces to
adopt this as a preferred skill and promote ways to
advance these skills.
Consolidation of ETL/OLAP tool vendors could
prove to be the beginning of the development of DW/BI
testing tools.
ACKNOWLEDGMENT
The author gratefully acknowledges the support from
Accenture's Data Warehouse/Business Intelligence Capability
and Testing Capability for knowledge support and productive
discussions.

REFERENCES

[1] W. H. Inmon, "Building the Data Warehouse," John Wiley & Sons, 2002.
[2] P. Ponniah, "Data Warehousing Guide for IT Professionals."
[3] R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite, "The Data Warehouse Lifecycle Toolkit," 2007.
[4] "CTG Data Warehouse Testing," 2004, unpublished.
[5] S. Adelman, J. Bischoff, J. Dyché, D. Hackney, and S. Ivoghli, "Impossible Data Warehouse Situations: Solutions from the Experts," p. 347, 2003.
[6] R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite, "The Data Warehouse Lifecycle Toolkit," p. 110, 2007.
[7] B. Devlin, "Data Warehouse: From Architecture to Implementation," Addison-Wesley, 1997.
