
Testing Data Warehouse Applications: the ETL Perspective

Abstract: This paper looks at the different strategies for testing a data warehouse application. It suggests various approaches that could be beneficial while testing the ETL process in a DW. A data warehouse is a critical business application, and defects in it result in business loss that cannot be accounted for. Here, we walk you through some of the basic phases and strategies to minimize defects.

Introduction: This is an era of global competition, and ignorance is one of the greatest threats to modern business. As such, organizations across the globe are relying on IT services for strategic decision-making. A data warehouse implementation is one such tool that comes to the rescue. Given the criticality of a DW application, a defect-free DW implementation is a dream come true for any organization. As QA and testing personnel, our role is to ensure this, thereby leading to maximized profits, better decisions and customer satisfaction. A bug in the system traced at a later stage not only increases the cost associated with rework, but also means that incorrect data may already have been used to make strategic decisions. Hence, pre-implementation defect detection should be ensured. In light of the above discussion, let us take a look at the various strategies involved in the testing cycle for a DW application.

The DW Testing Life Cycle: As with any other piece of software, a DW implementation undergoes the natural cycle of Unit testing, System testing, Regression testing, Integration testing and Acceptance testing. However, unlike other software, there are no off-the-shelf testing products available for a DW.

Unit testing: Traditionally this has been the task of the developer. This is white-box testing to ensure that the module or component is coded as per the agreed-upon design specifications. The developer should focus on the following:
a) That all inbound and outbound directory structures are created properly with appropriate permissions and sufficient disk space, and that all tables used during the ETL are present with the necessary privileges.
b) That the ETL routines give expected results:
   i. All transformation logic works as designed from source to target
   ii. Boundary conditions are satisfied, e.g. check date fields with leap-year dates
   iii. Surrogate keys have been generated properly
   iv. NULL values have been populated where expected
   v. Rejects have occurred where expected and the log for rejects is created with sufficient details
   vi. Error recovery methods work as expected
   vii. Auditing is done properly
c) That the data loaded into the target is complete:
   i. All source data that is expected to get loaded into the target actually gets loaded: compare counts between source and target and use data profiling tools
   ii. All fields are loaded with full contents, i.e. no data field is truncated while transforming
   iii. No duplicates are loaded
   iv. Aggregations take place in the target properly
   v. Data integrity constraints are properly taken care of

System testing: Generally the QA team owns this responsibility. For them the design document is the bible and the entire set of test cases is directly based upon it. Here we test the functionality of the application, and the testing is mostly black-box. The major challenge here is the preparation of test data. An intelligently designed input dataset can bring out the flaws in the application more quickly. Wherever possible, use production-like data. You may also use data generation tools or customized tools of your own to create test data. We must test for all possible combinations of input and specifically check the errors and exceptions. An unbiased approach is required to ensure maximum efficiency. Knowledge of the business process is an added advantage, since we must be able to interpret the results functionally and not just code-wise.
The QA team must test for:
i. Data completeness: match source to target counts
ii. Data aggregations: match aggregated data against staging tables and/or the ODS
iii. Granularity of data is as per specifications
iv. Error logs and audit tables are generated and populated properly
v. Notifications to IT and/or business are generated in the proper format

Regression testing: A DW application is not a one-time solution. It is possibly the best example of an incremental design, where requirements are enhanced and refined quite often based on business needs and feedback. In such a situation it is critical to test that the existing functionality of a DW application is not broken whenever an enhancement is made to it. Generally this is done by running all functional tests for existing code whenever a new piece of code is introduced. However, a better strategy could be to preserve earlier test input data and result sets and run the same inputs again. The new results can then be compared against the older ones to ensure proper functionality.

Integration testing: This is done to ensure that the application works from an end-to-end perspective. Here we must consider the compatibility of the DW application with upstream and downstream flows. We need to ensure data integrity across the flow. Our test strategy should include testing for:
i. Sequence of jobs to be executed, with job dependencies and scheduling
ii. Re-startability of jobs in case of failures
iii. Generation of error logs
iv. Cleanup scripts for the environment, including the database
This activity is a combined responsibility, and participation of experts from all related applications is a must in order to avoid misinterpretation of results.

Acceptance testing: This is the most critical part, because here the actual users validate your output datasets. They are the best judges to ensure that the application works as they expect. However, business users may not have proper ETL knowledge. Hence, the development and test teams should be ready to answer questions about the ETL process that relate to data population. The test team must have sufficient business knowledge to translate the results into business terms. The load windows, the refresh period for the DW and the views created should also be signed off by the users.

Performance testing: In addition to the above tests, a DW must necessarily go through another phase called performance testing. Any DW application is designed to be scalable and robust. Therefore, when it goes into the production environment, it should not cause performance problems. Here, we must test the system with a huge volume of data and ensure that the load window is met even under such volumes. This phase should involve the DBA team, an ETL expert and others who can review and validate your code for optimization.

Finally, a few words of caution to end with. Testing a DW application should be done with a sense of utmost responsibility. A bug in a DW traced at a later stage results in unpredictable losses, and the task is even more difficult in the absence of any single end-to-end testing tool. So the strategies for testing should be methodically developed, refined and streamlined. This is all the more true since the requirements of a DW are often dynamically changing. Under such circumstances, repeated discussions with the development team and users are of utmost importance to the test team. Another area of concern is test coverage. This has to be reviewed multiple times to ensure completeness of testing. Always remember, a DW tester must go the extra mile to ensure near-defect-free solutions.

Acronyms:
1. DW: Data Warehouse
2. QA: Quality Assurance
3. ETL: Extraction, Transformation and Loading
4. ODS: Operational Data Store

References:
1. Soumendra Mohanty, Data Warehousing
2. Jeff Theobald, Strategies for testing data warehouse applications, DW review Magazine, June 2007 issue
3. Ralph Kimball, The Data Warehouse Toolkit
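
Illustrative sketch (completeness checks): The completeness checks listed under Unit testing and System testing (matching source to target counts, no duplicates, no truncated fields, matching aggregations) could be automated along the following lines. This is only a minimal sketch, not a method prescribed by the paper: it assumes both source and target are reachable through one SQL connection, uses Python's built-in sqlite3 module as a stand-in for the real databases, and the table and column names (src_orders, tgt_orders, order_id, customer, amount) as well as both function names are hypothetical.

```python
import sqlite3


def run_completeness_checks(conn, source, target, key, text_col, amount_col):
    """Return a dict of check name -> bool (True means the check passed).

    Table and column names are interpolated into the SQL, so they must come
    from trusted test configuration, never from user input.
    """
    cur = conn.cursor()
    results = {}

    # 1. Row counts should match between source and target.
    src_count = cur.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_count = cur.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    results["counts_match"] = src_count == tgt_count

    # 2. No key value may appear more than once in the target.
    dupes = cur.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key} FROM {target} GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    results["no_duplicates"] = dupes == 0

    # 3. No text field may be shorter in the target than in the source.
    truncated = cur.execute(
        f"SELECT COUNT(*) FROM {source} s JOIN {target} t ON s.{key} = t.{key} "
        f"WHERE LENGTH(t.{text_col}) < LENGTH(s.{text_col})"
    ).fetchone()[0]
    results["no_truncation"] = truncated == 0

    # 4. A simple aggregate (the sum of an integer column) should agree.
    src_sum = cur.execute(
        f"SELECT COALESCE(SUM({amount_col}), 0) FROM {source}"
    ).fetchone()[0]
    tgt_sum = cur.execute(
        f"SELECT COALESCE(SUM({amount_col}), 0) FROM {target}"
    ).fetchone()[0]
    results["sums_match"] = src_sum == tgt_sum

    return results


def build_demo_db(faulty):
    """Create an in-memory database with a tiny source and target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE tgt_orders (order_id INTEGER, customer TEXT, amount INTEGER)"
    )
    rows = [(1, "Alexander", 100), (2, "Bo", 250), (3, "Catherine", 75)]
    conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)", rows)
    if faulty:
        # Simulate a bad load: a truncated name, a duplicated row, a missing row.
        rows = [(1, "Alexan", 100), (2, "Bo", 250), (2, "Bo", 250)]
    conn.executemany("INSERT INTO tgt_orders VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for faulty in (False, True):
        checks = run_completeness_checks(
            build_demo_db(faulty),
            "src_orders", "tgt_orders", "order_id", "customer", "amount",
        )
        print("faulty load" if faulty else "clean load", checks)
```

In the simulated faulty load the row counts still match, because one row is missing and another is duplicated; only the duplicate, truncation and aggregate checks fail. This is one reason to run the count comparison together with the other checks rather than on its own.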

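Illustrative sketch (regression comparison): The regression strategy described above, preserving earlier input data and result sets and comparing fresh results against them, might look roughly like this. It is a sketch under assumptions: the result sets are taken to be CSV exports with a header row, and the column names, sample data and function names are hypothetical.

```python
import csv
import io


def load_result_set(csv_text, key_fields):
    """Parse a CSV export into a dict keyed by the given key columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[k] for k in key_fields): row for row in reader}


def diff_result_sets(baseline, current):
    """Compare a preserved baseline result set with a freshly produced one."""
    missing = sorted(k for k in baseline if k not in current)
    unexpected = sorted(k for k in current if k not in baseline)
    changed = sorted(
        k for k in baseline if k in current and baseline[k] != current[k]
    )
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Hypothetical result set preserved from an earlier, accepted test run.
BASELINE_CSV = (
    "region,month,total\n"
    "EAST,2007-01,100\n"
    "WEST,2007-01,200\n"
)

# Hypothetical result set produced after an enhancement, from the same inputs.
CURRENT_CSV = (
    "region,month,total\n"
    "EAST,2007-01,100\n"
    "WEST,2007-01,210\n"
    "NORTH,2007-01,50\n"
)

if __name__ == "__main__":
    keys = ("region", "month")
    report = diff_result_sets(
        load_result_set(BASELINE_CSV, keys),
        load_result_set(CURRENT_CSV, keys),
    )
    print(report)
```

An empty report (all three lists empty) suggests the enhancement left the existing results untouched; any entry calls for a decision on whether the difference is an intended change or a regression.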