
An Introduction to Data Warehouse Testing

New Zealand Tester Magazine Wayne Yaddow

Data warehouse testing is on the increase, and qualified testers are in demand. The reasons are clear. Prominent among them are the increase in business mergers, data center migrations and compliance regulations, and management's increased focus on data and data-driven decision making related to business intelligence (BI) initiatives. The focus of data warehouse testing includes the ETL process, BI engines and the applications that rely on data warehouses.

Data-driven decisions must be accurate. In this context, testing data warehouse implementations has become of utmost significance. Organizational decisions depend heavily on the enterprise data in data warehouses, so that data must be of the highest quality. The complex business rules and transformation logic built into ETL processes demand diligent and thorough testing. This article addresses challenges for DWH testing such as voluminous data, heterogeneous sources, temporal inconsistency and estimation.

Preparing for the data warehouse testing process

A good understanding of data modeling and source-to-target data mappings helps equip the QA analyst with the information needed to develop an appropriate testing strategy. Hence, it is important that during the project's requirements analysis phase, the QA team works to understand the data warehouse implementation to the greatest extent possible. Because of the nature of data warehouse implementations, a data warehouse testing strategy will, in most cases, be a combination of several smaller strategies.

Different stages of the data warehouse implementation (source data profiling, data warehouse design, ETL development, data loading and transformations, etc.) require the testing team's participation and support. Unlike some traditional testing, test execution does not start at the end of the data warehouse implementation. Instead, test execution itself has multiple phases and is staggered throughout the life cycle of the data warehouse implementation.

While basic QA philosophies hold true when testing a data warehouse implementation, it is important for test teams to understand that testing a data warehouse is different from most other software testing. Since a data warehouse primarily deals with data, a major portion of the test effort is spent on planning, designing and executing tests that are data oriented. Such tests include writing SQL queries, validating that ETL sessions execute as expected, verifying that ETL tool and stored procedure exceptions are handled effectively, confirming that application performance meets the SLAs and, finally, ensuring that data integration points are working as expected. Planning and designing most of these test cases requires the test team to have experience in SQL and performance testing. It also helps if team members have experience in debugging performance bottlenecks.
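
Many of these data-oriented tests can be written as SQL queries that select the rows violating an expectation, so that a passing test returns zero rows. The sketch below assumes that convention; the table customer_dim, its columns and the check names are hypothetical, and an in-memory SQLite database stands in for the actual warehouse.

```python
import sqlite3

def run_checks(conn, checks):
    """Run each named SQL check; a check passes when its query returns no rows.

    Returns a dict mapping each check name to its number of violating rows.
    """
    results = {}
    for name, sql in checks.items():
        rows = conn.execute(sql).fetchall()
        results[name] = len(rows)
    return results

# Hypothetical target table, as if loaded by an ETL job.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (customer_id INTEGER, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?, ?)",
    [(1, "a@example.com", "NZ"), (2, None, "NZ"), (3, "c@example.com", "AU")],
)

checks = {
    # Each query selects the rows that violate the expectation.
    "email_not_null": "SELECT customer_id FROM customer_dim WHERE email IS NULL",
    "country_in_domain": "SELECT customer_id FROM customer_dim WHERE country NOT IN ('NZ', 'AU')",
}

results = run_checks(conn, checks)
for name, violations in results.items():
    status = "PASS" if violations == 0 else f"FAIL ({violations} rows)"
    print(f"{name}: {status}")
```

Keeping every check in the same "return the bad rows" shape makes the checks easy to collect, rerun after each load and report on uniformly.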
Data warehouse testing goals and related verification methods

Primary verification goals across all data warehouse project testing phases include:

Data completeness. Ensure that all expected data is loaded by each ETL procedure.
Data transformations. Ensure that all data to be transformed is transformed correctly according to business rules and design specifications.
Data quality. Ensure that the ETL process correctly rejects, substitutes default values for, corrects, ignores and reports invalid data.
Performance and scalability. Ensure that data loads and queries perform within expected time frames and that the technical architecture is scalable.
Integration testing. Ensure that the ETL process functions well with other upstream and downstream processes.
User-acceptance testing. Ensure that the data warehousing solution meets users' current expectations and anticipates their future expectations.
Regression testing. Ensure that existing functionality remains intact each time a new release of ETL code and data is completed.
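
The first two goals, completeness and transformations, are often verified by reconciling source and target directly. The following is a minimal sketch, assuming a hypothetical source table src_orders, a hypothetical target table dw_orders and a hypothetical business rule that amounts are converted from cents to dollars; SQLite stands in for the real databases, which in practice may live on separate servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER);
    CREATE TABLE dw_orders  (order_id INTEGER PRIMARY KEY, amount_dollars REAL);
    INSERT INTO src_orders VALUES (1, 1050), (2, 200), (3, 999);
    -- Simulated ETL output: order 3 was dropped, order 2 was transformed incorrectly.
    INSERT INTO dw_orders VALUES (1, 10.50), (2, 20.00);
""")

def missing_keys(conn):
    """Completeness: source keys that never arrived in the target."""
    sql = """
        SELECT s.order_id FROM src_orders s
        LEFT JOIN dw_orders t ON t.order_id = s.order_id
        WHERE t.order_id IS NULL
        ORDER BY s.order_id
    """
    return [row[0] for row in conn.execute(sql)]

def bad_transformations(conn):
    """Transformation: loaded rows that break the (hypothetical) rule
    amount_dollars = amount_cents / 100, compared with a small tolerance."""
    sql = """
        SELECT s.order_id FROM src_orders s
        JOIN dw_orders t ON t.order_id = s.order_id
        WHERE ABS(t.amount_dollars - s.amount_cents / 100.0) > 0.005
        ORDER BY s.order_id
    """
    return [row[0] for row in conn.execute(sql)]

src_count = conn.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]
print("row counts:", src_count, tgt_count)
print("missing keys:", missing_keys(conn))
print("bad transformations:", bad_transformations(conn))
```

Comparing keys rather than only row counts matters because a dropped row and a duplicated row can cancel out in the counts.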
Listed below are a few of the many reasons to thoroughly test the data warehouse and use a QA
process that is specific to data and ETL testing:

Source data is often huge in volume and comes from varied types of data repositories
The quality of source data cannot be assumed; it should be profiled and cleaned
Inconsistent and redundant data may exist in the sources
Many source data records may be rejected; ETL and stored procedure logs will contain messages that must be acted upon
Source field values may be missing where they should always be present
Source data history, business rules and audits of source data may not be available
Enterprise-wide data knowledge and business rules may not be available to verify data
Since data must often pass through multiple ETL phases before loading into the data warehouse, the extraction, transformation and loading components must be thoroughly tested to ensure that the variety of data behaves as expected within each phase
Heterogeneous sources of data (e.g., mainframe, spreadsheets, Unix files) will be updated asynchronously over time and then incrementally loaded
Transaction-level traceability will be difficult to attain in a data warehouse
The data warehouse will be a strategic enterprise resource and heavily relied upon
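
Because source data quality cannot be assumed, a simple column profile (row count, NULLs and distinct values per column) is a common first step. The sketch below assumes a hypothetical table src_customer and uses SQLite as a stand-in; dedicated profiling tools typically report much more, such as value patterns, minimum and maximum values and frequency distributions.

```python
import sqlite3

def profile_table(conn, table):
    """Return the row count and, per column, the number of NULLs and distinct non-NULL values.

    `table` is interpolated into the SQL, so it must come from a trusted list of names.
    """
    # PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    profile = {}
    for col in columns:
        nulls, distinct = conn.execute(
            f'SELECT SUM(CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END), '
            f'COUNT(DISTINCT "{col}") FROM {table}'
        ).fetchone()
        # SUM returns NULL (None) on an empty table, so default to 0.
        profile[col] = {"nulls": nulls or 0, "distinct": distinct}
    return total, profile

# Hypothetical source extract with one missing name and one missing region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?, ?)",
    [(1, "Ann", "North"), (2, "Bob", None), (3, "Bob", "North"), (4, None, "South")],
)

total, profile = profile_table(conn, "src_customer")
print("rows:", total)
for col, stats in profile.items():
    print(f"{col}: {stats['nulls']} nulls, {stats['distinct']} distinct")
```

A column that should always be populated but shows NULLs, or a key column whose distinct count is lower than the row count, is an early signal worth raising before ETL testing begins.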
Testing Phases To Be Considered For The Data Warehouse Test Strategy
Figure 1 shows a representative data warehouse implementation, from identification of source data (lower left) to report and portal reporting (upper left). In between, several typical phases of the end-to-end data warehouse development process are depicted, such as source extraction to staging, dimension data to the operational data store (ODS), fact data to the data warehouse, and report and portal functions that extract data for display and reporting. The graphic illustrates that all data load programs and the resulting data loads should be verified throughout the end-to-end QA process.

Figure 3.3 End to End Data Warehouse Process and Associated Testing

An end-to-end data warehouse test strategy is important for documenting the approach to testing the warehouse implementation process. The strategy typically contains a high-level description of the eventual testing workflow. It is used to verify that the data warehouse system meets its design specifications and other requirements.
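
One way to make the end-to-end strategy concrete is to reconcile row counts between consecutive stages of the flow described above (source extract, staging, ODS, warehouse). The following is a minimal sketch, assuming hypothetical table names for each stage and assuming that no stage is expected to filter rows; where a stage legitimately rejects rows, the expected difference would have to come from that stage's reject counts.

```python
import sqlite3

def reconcile_stages(conn, stages):
    """Compare row counts between consecutive stages.

    `stages` is an ordered list of (trusted) table names. Returns a list of
    (upstream, downstream, upstream_count, downstream_count) tuples for every
    hop where the counts differ.
    """
    counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in stages}
    mismatches = []
    for upstream, downstream in zip(stages, stages[1:]):
        if counts[upstream] != counts[downstream]:
            mismatches.append((upstream, downstream, counts[upstream], counts[downstream]))
    return mismatches

# Hypothetical stage tables; one row is lost between staging and the ODS.
conn = sqlite3.connect(":memory:")
for table, n in [("src_extract", 5), ("stg_customer", 5), ("ods_customer", 4), ("dw_customer", 4)]:
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(i,) for i in range(n)])

mismatches = reconcile_stages(conn, ["src_extract", "stg_customer", "ods_customer", "dw_customer"])
print(mismatches)
```

Checking each hop separately, rather than only comparing source with the final target, points testers directly at the load program where rows went missing.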

QA skills that are helpful for data warehouse testing

Understanding fundamental concepts of data warehousing and its place in an information management environment
Knowing the role of the testing process as part of data warehouse development
Developing data warehouse test strategies, test plans and test cases: what they are and how to develop them, specifically for data warehouses and decision support systems
Creating effective test cases and scenarios based on business and user requirements for the data warehouse
Participating in reviews of the data models, data mapping documents, ETL design and ETL coding, and providing feedback to designers and developers
Participating in the change management process and documenting relevant changes to decision support requirements

This book shares lessons learned during QA on many data warehouse implementation projects. For example:

Formal QA data track verifications should begin early in the ETL design and data load
process and continue through deployment and into production.

Given early access to the ETL development environment, testers can assess the quality of
early data loads and offer valuable feedback to development teams. Such early access can
dramatically aid preparations for formal testing and identify issues early.

Where projects use offshore or contract test teams, those teams may need more adequate and representative samples of data (production data, if possible) for test planning and test case design.

For all project stakeholders, data models, database design documents (LLDs), ETL design and source-to-target mapping documents need to be kept in sync until transition.

Data warehouse test automation (particularly for regression testing) and the associated tools are key to supporting increasingly important lifecycle models such as agile and iterative development.
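
For automated regression testing, one common approach is to save the result of a validation query from a known-good release as a baseline and compare each new release's result against it. The following is a minimal sketch, assuming the baseline is held as a list of row tuples (in practice it might be stored in a file or a baseline table); the data is hypothetical. Rows are compared as multisets, so row order does not matter but duplicate rows do.

```python
from collections import Counter

def diff_results(baseline, current):
    """Return (missing, unexpected): rows in the baseline but not in the current
    result, and rows in the current result but not in the baseline.
    Duplicates are counted, so an extra copy of a row shows up as unexpected."""
    base, cur = Counter(baseline), Counter(current)
    missing = sorted((base - cur).elements())
    unexpected = sorted((cur - base).elements())
    return missing, unexpected

# Hypothetical result of a "total by country" validation query, before and after a release.
baseline = [("NZ", 120), ("AU", 300), ("US", 45)]
current = [("AU", 300), ("NZ", 125), ("US", 45), ("US", 45)]

missing, unexpected = diff_results(baseline, current)
print("missing:", missing)
print("unexpected:", unexpected)
```

An empty result on both sides means the release reproduced the baseline exactly; any difference is either a regression or an intended change that requires a new, reviewed baseline.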

Planning for Data Warehouse Testing

Writing a test plan is key to the entire data warehouse testing effort. The plan will help test engineers validate and verify data requirements end to end (from source to target data warehouse). A primary purpose of a formal test program is to verify the data requirements stated in the:
Business requirements document
Data models for source and target schemas
Source-to-target mappings
ETL design documents

Just as requirements specifications are the "what" for ETL development, the test plan can serve as the "what" for the test process. The test plan describes what the QA staff will develop to verify that the data warehouse meets requirements. Properly constructed, the test plan is a contract between the QA team and all other project stakeholders.

In addition to the data requirements, the test planning effort should consider the following:
Configuration management system
Project schedule
Data quality verification process
Incident and error handling system
QA staff resources estimates and training needs
Testing environment budget and plan
Test tools
Test objectives
QA roles and responsibilities
Test deliverables
Test tasks
Defect reporting requirements
Entrance criteria that should be met before formal testing commences
Exit criteria that should be met before formal testing is completed

Planning Tests for Common Data Warehouse Issues

The following are some of the issues and defects that may be found during data warehouse testing. Planning in advance for how to identify these issues during testing is important.

Inadequate ETL and stored procedure design documentation to aid in test planning
Field values are null when specified as NOT NULL
Field constraints and SQL are not coded correctly for the ETL tool
Excessive ETL errors are discovered after entry to formal QA
Source data does not meet table mapping specifications (e.g., dirty data)
Source-to-target mappings are 1) often not reviewed before implementation, 2) in error, or 3) not consistently maintained throughout the development lifecycle
Data models are not adequately maintained during the development lifecycle
Duplicate field values are found in either source or target data when the mapping specifications define them as DISTINCT
ETL SQL and transformation errors lead to missing rows and invalid field values
Constraint violations exist in the source (these could often be found through data profiling)
Target data is incorrectly stored in nonstandard formats
Primary or foreign key values are incorrect for important relationship linkages
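
Several of these defects can be planned for with simple detection queries. The following is a minimal sketch, assuming hypothetical tables dim_product and fact_sales, covering three items from the list: NULLs in a column specified as NOT NULL, duplicates in a column specified as DISTINCT, and fact rows whose foreign key has no matching dimension row. The tables are deliberately created without constraints, as warehouse target tables often are, so that SQLite (the stand-in database) accepts the bad data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER, sku TEXT);
    CREATE TABLE fact_sales  (sale_id INTEGER, product_key INTEGER, qty INTEGER);
    INSERT INTO dim_product VALUES (1, 'A-1'), (2, 'B-2'), (3, 'B-2'), (4, NULL);
    INSERT INTO fact_sales  VALUES (10, 1, 5), (11, 2, 1), (12, 99, 3);
""")

# Rows where a column specified as NOT NULL is null.
null_skus = conn.execute(
    "SELECT product_key FROM dim_product WHERE sku IS NULL"
).fetchall()

# Values that appear more than once in a column specified as DISTINCT.
duplicate_skus = conn.execute(
    "SELECT sku, COUNT(*) FROM dim_product WHERE sku IS NOT NULL "
    "GROUP BY sku HAVING COUNT(*) > 1"
).fetchall()

# Fact rows whose foreign key has no matching dimension row (orphans).
orphans = conn.execute(
    "SELECT f.sale_id, f.product_key FROM fact_sales f "
    "LEFT JOIN dim_product d ON d.product_key = f.product_key "
    "WHERE d.product_key IS NULL"
).fetchall()

print("NULL skus:", null_skus)
print("duplicate skus:", duplicate_skus)
print("orphaned fact rows:", orphans)
```

The same queries can be run against source tables during profiling, which is often where such constraint violations are first found.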
A key element contributing to the success of the data warehouse solution is the ability of the test team to plan, design and execute a set of effective tests that help identify issues related to data inconsistency, data quality, data security, failures in the extract, transform and load (ETL) process, performance, accuracy of business flows, and fitness for use from an end-user perspective.

Overall, the primary focus of testing should be on the end-to-end ETL process. This includes validating the loading of all required rows, the correct execution of all transformations and the successful completion of cleansing operations. The team also needs to thoroughly test the SQL queries, stored procedures or other queries that produce aggregate or summary tables. In keeping with emerging trends, it is also important for the test team to design and execute a set of tests that are customer-experience-centric.
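
To verify queries or stored procedures that produce aggregate or summary tables, a common technique is to recompute the aggregate independently from the detail data and compare it with the stored summary. The following is a minimal sketch, assuming hypothetical tables fact_sales and agg_sales_by_region and a rounding tolerance of 0.005; SQLite stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (region TEXT, amount REAL);
    CREATE TABLE agg_sales_by_region (region TEXT, total_amount REAL);
    INSERT INTO fact_sales VALUES ('North', 100.0), ('North', 50.0), ('South', 75.0), ('East', 20.0);
    -- Simulated summary table: the South total is wrong and East is missing.
    INSERT INTO agg_sales_by_region VALUES ('North', 150.0), ('South', 70.0);
""")

# Recompute the totals from the detail rows and compare them with the stored
# summary in both directions. A FULL OUTER JOIN is emulated with two LEFT JOINs
# because SQLite versions before 3.39 do not support FULL OUTER JOIN.
sql = """
WITH expected AS (
    SELECT region, SUM(amount) AS total FROM fact_sales GROUP BY region
)
SELECT e.region, e.total, a.total_amount
FROM expected e LEFT JOIN agg_sales_by_region a ON a.region = e.region
WHERE a.total_amount IS NULL OR ABS(e.total - a.total_amount) > 0.005
UNION ALL
SELECT a.region, NULL, a.total_amount
FROM agg_sales_by_region a LEFT JOIN expected e ON e.region = a.region
WHERE e.region IS NULL
ORDER BY 1
"""
mismatches = conn.execute(sql).fetchall()
for region, recomputed, stored in mismatches:
    print(f"{region}: recomputed={recomputed} stored={stored}")
```

Each returned row is a defect candidate: a region whose stored total differs, a region missing from the summary (stored is None), or a summary region with no detail rows behind it (recomputed is None).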
