
Automated ETL Testing with Py.Test
Kevin A. Smith
Senior QA Automation Engineer
Cambia Health Solutions
Agenda
Overview
Testing Data Quality
Design for Automation & Testability
Python and Py.Test
Examples
Database Applications
Taking data out of an OnLine Transaction Processing
(OLTP) system and putting it into an OnLine Analytical
Processing (OLAP) system involves Extracting the data,
Transforming the data, and then Loading that data into
another database (ETL).
When Testing an ETL Application:
Extract
Transform
Compare
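The extract, transform, compare steps above can be sketched with in-memory rows standing in for the two databases. The row layout, the `transform` rule, and every name below are illustrative assumptions, not taken from the application described in these slides.

```python
# Hypothetical source (OLTP) rows: (member_id, first_name, last_name)
source_rows = [(1, "ada", "lovelace"), (2, "alan", "turing")]

# Hypothetical target (OLAP) rows as loaded by the ETL job: (member_id, full_name)
target_rows = [(1, "Ada Lovelace"), (2, "Alan Turing")]


def transform(row):
    """Re-implement the (assumed) business rule independently of the ETL code."""
    member_id, first, last = row
    return (member_id, f"{first.title()} {last.title()}")


def test_source_matches_target():
    # Extract both sides, then transform the source side in the test.
    expected = {row[0]: transform(row) for row in source_rows}
    actual = {row[0]: row for row in target_rows}
    # Compare: the transformed source must equal what was loaded.
    assert expected == actual
```

Keeping the test's `transform` separate from the ETL code is what makes it usable as an oracle: a shared implementation would share its bugs.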
Cambia EVR Application
Testing Data Quality
Data Completeness
Ensures that all expected data is loaded.
Data Integrity
Ensures that the ETL application rejects, substitutes default values
or corrects and reports invalid data.
Data Transformation
Ensures that all data is transformed correctly according to business
rules and/or design specifications.
Testing Techniques - Stare & Compare
Validate data transformations manually.
This step is usually required to bootstrap an ETL test automation project.
Testing Techniques - Golden Files
Use well-known test data and golden file comparison as a testing oracle.
This technique is very powerful for automated testing of printed output.
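A golden-file comparison might look like the sketch below, which uses the standard-library `difflib` module. The helper name `diff_against_golden`, the file name, and the sample content are assumptions made for illustration; a temporary file stands in for a golden file that would normally be kept under version control.

```python
import difflib
import tempfile
from pathlib import Path


def diff_against_golden(actual_text, golden_path):
    """Return unified-diff lines; an empty list means the output matches."""
    golden_text = Path(golden_path).read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            golden_text.splitlines(),
            actual_text.splitlines(),
            fromfile="golden",
            tofile="actual",
            lineterm="",
        )
    )


def test_report_matches_golden_file():
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "report.golden.txt"
        golden.write_text("claim,amount\n1001,25.00\n", encoding="utf-8")
        actual = "claim,amount\n1001,25.00\n"  # output of the run under test
        diff = diff_against_golden(actual, golden)
        # On failure the diff itself becomes the assertion message.
        assert diff == [], "\n".join(diff)
```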
Testing Techniques - Self-Verifying
Oracle in Test Scripts
model.
Necessary if you want to test any aspects of the ETL application running in production.
Design for Automation
Control
How well the application can be controlled from the test tools.
Visibility
How well intermediate data and results are visible to the test tools.
Design for Testability - Visibility
Test Tools - Rules of Thumb
1. Do not re-invent the wheel.
2. No test tool will do everything you need - customize
3. No one test tool will solve all of your test problems - build a tool box.
4. Do not expect your business experts or developers to
be able to create great tests, even with tools.
5. Do not use one-off technology for testing.
6. Do not use the built-in test module of your ETL
development tool.
Tool Requirements
Support Customization
Support Source to Target Data Mapping
Support Complex Logical Calculations
Support database connections
Support CSV and XML
Existing Tool
Customizable
Leverage Existing Knowledge
Multi-OS (AIX, Windows)
Python and Py.Test
Support for Oracle and Sybase databases with 3rd-party libraries: PyODBC, cx_Oracle
Native support for CSV files and XML
Strong support for containers (Tuple, List, Dict)
Easy learning curve for non-programmers
What is Py.Test?
Searches Disk for Tests
Sequences and Executes Tests
Captures Output
Captures Exceptions
Reports Results
Interfaces to Extend/Customize Behavior
Command Line Processing
Test Search/Sequencing/Selecting
Test Handling (Fixtures)
Reporting
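As a rough illustration of the search and execute steps: py.test collects functions whose names start with `test_` from files named `test_*.py` and treats a failing plain `assert` as a test failure. The file name, the `count_rows` helper, and the data below are hypothetical.

```python
# test_row_counts.py  (hypothetical file name)

def count_rows(rows):
    """Stand-in for a query that counts the rows in a table."""
    return len(rows)


def test_all_rows_loaded():
    source = [("A", 1), ("B", 2)]
    target = [("A", 1), ("B", 2)]
    # A plain assert is enough; no special assertion API is needed.
    assert count_rows(source) == count_rows(target)


def test_no_rows_rejected():
    rejected = []
    assert count_rows(rejected) == 0, "unexpected rejected rows"
```

Running `py.test` in the directory holding this file should collect and run both tests, and `-k all_rows` would select only the first by name.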
Database Support
import cx_Oracle

conn = cx_Oracle.connect(user_name, password, server_name)
crsr = conn.cursor()
query_string = <<<embedded sql statement>>>
crsr.execute(query_string)
results = {}
for row in crsr.fetchall():
    # key each row by its first two columns
    key = str(row[0]) + '_' + str(row[1])
    results[key] = {'source': row,
                    'target': ('Missing',)}
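The same cursor pattern can be tried without an Oracle server. The sketch below substitutes the standard-library `sqlite3` module for cx_Oracle, since both follow the Python DB-API; the table, columns, and values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real source system.
conn = sqlite3.connect(":memory:")
crsr = conn.cursor()
crsr.execute("CREATE TABLE claim (claim_id INTEGER, line_no INTEGER, amt REAL)")
crsr.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [(1001, 1, 25.0), (1001, 2, 40.0)],
)

results = {}
crsr.execute("SELECT claim_id, line_no, amt FROM claim ORDER BY claim_id, line_no")
for row in crsr.fetchall():
    # Composite key built from the first two columns, as on the slide.
    key = str(row[0]) + "_" + str(row[1])
    results[key] = {"source": row, "target": ("Missing",)}

crsr.close()
conn.close()
```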
CSV File Support
import csv

# csv_data yields each row as a list of strings
csv_data = csv.reader(open('data.csv', newline=''),
                      delimiter='|')
for row in csv_data:
    key = str(row[0]) + '_' + str(row[1])
    results[key] = {'source': row,
                    'target': ('Missing',)}
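One practical wrinkle: `csv.reader` returns every field as a string, while a database cursor returns typed values, so rows from the two sources will not compare equal until they are normalized. A minimal sketch, assuming a hypothetical three-column file (claim id, line number, amount) and using `io.StringIO` in place of a real file:

```python
import csv
import io

# io.StringIO stands in for open('data.csv', newline='').
csv_text = "1001|1|25.00\n1001|2|40.00\n"

# Hypothetical column types for this file: claim id, line number, amount.
column_types = (int, int, float)


def typed_row(row):
    """Convert a list of CSV strings into a tuple of typed values."""
    return tuple(convert(value) for convert, value in zip(column_types, row))


csv_rows = [typed_row(row) for row in csv.reader(io.StringIO(csv_text), delimiter="|")]
db_rows = [(1001, 1, 25.0), (1001, 2, 40.0)]  # what a database cursor might return
```

After conversion, `csv_rows` and `db_rows` can be compared directly with `==`.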
Row Comparison
for value in results.values():
    assert value['source'] == value['target']
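An assert inside the loop stops at the first differing row. One possible variation, sketched here with a hypothetical `find_mismatches` helper and made-up rows, collects every difference first and asserts once at the end:

```python
def find_mismatches(results):
    """Return {key: (source, target)} for every key whose rows differ."""
    return {
        key: (value["source"], value["target"])
        for key, value in results.items()
        if value["source"] != value["target"]
    }


def test_rows_match():
    results = {
        "1001_1": {"source": (1001, 1, 25.0), "target": (1001, 1, 25.0)},
        "1001_2": {"source": (1001, 2, 40.0), "target": (1001, 2, 40.0)},
    }
    mismatches = find_mismatches(results)
    # One assert at the end reports every differing key, not just the first.
    assert not mismatches, f"{len(mismatches)} row(s) differ: {sorted(mismatches)}"
```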
Test Patterns
Database Schema
Row Counting
Simple Source to Target Mapping
Complex Source to Target Mapping
Database Schema
table_names = ('OUTPUT_CD_TRNSLTN', 'OUTPUT_DRAG_DT',
               'OUTPUT_NTWK', 'OUTPUT_PH_NUM')

def test_dev_schema():
    """
    Test the development database.
    """
    schemas = []
    crsr = Database.get_cursor('DEV')
    for table in table_names:
        schemas.append(get_table_dict(crsr,
                       'dev', table, out_dir, base_dir))
    crsr.close()
    generic_schema_compare(schemas, 'Development')
Database Schema (cont'd)
def generic_schema_compare(results, title):
    """
    Generic table comparison test.
    """
    test_rslt = True
    for schema in results:
        if schema['source'] != schema['target']:
            schema['source'].show_diffs(schema['target'])
            test_rslt = False
    assert test_rslt, title + ' schema differences'
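The helpers used on these slides (`get_table_dict`, `show_diffs`) are not shown. As a rough, self-contained illustration of the same idea, the sketch below compares a table's column list against a baseline using SQLite's `PRAGMA table_info`. This is SQLite-specific and is not the approach the slides use for Oracle; the column names and baseline are hypothetical.

```python
import sqlite3


def get_columns(conn, table):
    """Return [(column_name, declared_type), ...] for a table (SQLite only)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info columns: cid, name, type, notnull, dflt_value, pk
    return [(row[1], row[2]) for row in rows]


def test_schema_matches_baseline():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OUTPUT_NTWK (NTWK_ID INTEGER, NTWK_NM TEXT)")
    baseline = [("NTWK_ID", "INTEGER"), ("NTWK_NM", "TEXT")]  # hypothetical baseline
    actual = get_columns(conn, "OUTPUT_NTWK")
    conn.close()
    assert actual == baseline, "schema differences in OUTPUT_NTWK"
```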
Row Counting
crsr.execute("""
    Select count(*)
    From FEP_PMT.FEP_CLM
    Where FDS_BAT_ID = :arg_1 and
          DISP_CD in ('1','2','9') and
          AMT_PAID < 0""",
    arg_1=fds_bat_id)
# count(*) always returns exactly one row
actual = crsr.fetchone()[0]
assert actual == 0, 'Negative claims found, invalid incoming data'
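Row counting is also a common completeness check between source and target. The sketch below uses `sqlite3` as a stand-in database; the tables `src_clm` and `tgt_clm`, their columns, and the `count_rows` helper are assumptions for illustration only.

```python
import sqlite3


def count_rows(conn, table, where="1=1", params=()):
    """Count rows; table and where come from test code, never from user input."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_clm (bat_id INTEGER, amt_paid REAL)")
conn.execute("CREATE TABLE tgt_clm (bat_id INTEGER, amt_paid REAL)")
rows = [(7, 10.0), (7, 20.0), (8, 5.0)]
conn.executemany("INSERT INTO src_clm VALUES (?, ?)", rows)
conn.executemany("INSERT INTO tgt_clm VALUES (?, ?)", rows)


def test_batch_row_counts_match():
    source = count_rows(conn, "src_clm", "bat_id = ?", (7,))
    target = count_rows(conn, "tgt_clm", "bat_id = ?", (7,))
    assert source == target, f"source has {source} rows, target has {target}"
```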
Complex Source to Target
test_result = True
for key, val in get_claim_lines.items():
    expected_contract_adj_amt = 0
    # calculate the expected contractual adjustment amount
    # walk the fields by field name
    for i in range(1, 6):
        # calculate the base name of this hag "row"
        hag_base_name = 'HAG' + str(i) + '_ADJ_'
        if val[hag_base_name + 'CDE'] == 'CO':
            if (val[hag_base_name + 'RSN1'] != '23' and
                    val[hag_base_name + 'RSN1'] != '171'):
                expected_contract_adj_amt += val[hag_base_name + 'AMT1']

            if (val[hag_base_name + 'RSN2'] != '23' and
                    val[hag_base_name + 'RSN2'] != '171'):
                expected_contract_adj_amt += val[hag_base_name + 'AMT2']

    # now compare the calculation to the amount retrieved from the table
    if round(val['CNTRCTL_ADJSTMT_AMT'], 4) != round(expected_contract_adj_amt, 4):
        print('Claim_trans_disp_line: ' + key + ' did not calculate correctly.')
        print('actual:', round(val['CNTRCTL_ADJSTMT_AMT'], 4),
              'Expected:', round(expected_contract_adj_amt, 4))
        print()
        test_result = False
assert test_result, 'Incorrect contractual adjustment calculations'
In-memory Data Representation
key = str(row[0]) + '_' + str(row[1])
results = {}  # create a dict to hold in-memory
              # tables of source and target data
results = {'key 1': {'source': row,
                     'target': row},
           'key 2': {'source': row,
                     'target': row,
                     'source 1': row,
                     'source 2': row},
           'key 3': {'source': row,
                     'target': ('Missing',)},
           'key 4': {'source': ('Added',),
                     'target': row}}
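One way such a structure could be filled in is sketched below: source rows are loaded first with a `('Missing',)` placeholder, then target rows either replace the placeholder or are recorded as `('Added',)`. The function names and sample rows are hypothetical.

```python
def make_key(row):
    """Composite key from the first two columns, as on the slide."""
    return str(row[0]) + "_" + str(row[1])


def build_results(source_rows, target_rows):
    """Merge source and target rows into one dict keyed by composite key."""
    results = {}
    for row in source_rows:
        results[make_key(row)] = {"source": row, "target": ("Missing",)}
    for row in target_rows:
        key = make_key(row)
        if key in results:
            results[key]["target"] = row
        else:
            # Row exists only in the target.
            results[key] = {"source": ("Added",), "target": row}
    return results


source = [(1001, 1, 25.0), (1001, 2, 40.0)]
target = [(1001, 1, 25.0), (1001, 3, 12.5)]
results = build_results(source, target)
```

Because the placeholders never equal a real row, a missing or added row fails the same equality check used for changed rows.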
Customized Test Output
Customizations
Shared Database Connection Pool
Database connection parameters, including obfuscated login
information
INI-file Processing
File directories for XML, CSV, baseline and output logging files.
Default values for command line options, such as logical database
name mapping
Command Line Option Processing
Batch ID
Database Names
Standard Test Routines
Source to Target Mapping
Database Schema Testing
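The INI-file and command-line pieces might fit together roughly as follows, using the standard-library `configparser` module. The section names, option names, server name, and the `resolve_database` helper are all hypothetical; the actual file layout is not shown in these slides.

```python
import configparser

# Hypothetical INI content; in practice this would be read from a file.
ini_text = """
[directories]
baseline = baseline
output = output

[databases]
DEV = dev_server_1
"""

config = configparser.ConfigParser()
config.read_string(ini_text)


def resolve_database(logical_name, override=None):
    """Map a logical database name to a server; a command-line override wins."""
    if override is not None:
        return override
    return config["databases"][logical_name]
```

Here the INI file supplies the default mapping and a command-line value, if given, takes precedence.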
Team
James Bass, UTi
William Buse, Cambia Health Solutions
Matthew Pierce, Cambia Health Solutions
Venkatesh Marada, Cambia Health Solutions
Kanthi Kondreddi, Cambia Health Solutions
Bhargavi Kanakamedala, Cambia Health Solutions
Tim Rilling, Cambia Health Solutions
Gordon Krenn, Cambia Health Solutions
Tim Peterson, Cambia Health Solutions
Upcoming Work
Detailed XML File Tests
Test Results Load Directly to Rally.
Golden-file Comparison with Definable Filtering
Golden File Comparison for PostScript
References
Python
http://www.python.org/
http://en.wikipedia.org/wiki/Python_(programming_language)
Py.Test
http://www.pytest.org/
Oracle Python Library
http://cx-oracle.sourceforge.net/html/
Python ODBC Library
https://code.google.com/p/pyodbc/
Companion paper
http://tinyurl.com/kofo3rv/