When data is moved using flat files between enterprises, or between organizations within an
enterprise, it is important to perform a set of file ingestion validations on the
inbound flat files before consuming the data in those files.
File name validation
Files are FTP'ed or copied over to a specific folder for processing. These files usually have a specific
naming convention so that the process consuming a file is able to understand its contents and
date. From a testing standpoint, the file name pattern needs to be validated to verify that it meets the
requirement.
Example: A government agency receives files from multiple vendors on a periodic basis. The arriving files
should follow a naming convention of 'CompanyCode_ContentType_DateTimestamp.csv'. However, the
files coming in from a specific vendor do not have the correct company code.
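A file name check of this kind can be sketched with a regular expression. The pattern below is an assumption for illustration: the allowed company codes, content types, and timestamp format would come from the actual naming requirement.

```python
import re

# Assumed shape for 'CompanyCode_ContentType_DateTimestamp.csv':
# an upper-case company code, an alphabetic content type, and a
# 14-digit timestamp (YYYYMMDDhhmmss). Adjust to the real convention.
FILE_NAME_PATTERN = re.compile(
    r"^(?P<company>[A-Z]{3,10})_"   # company code
    r"(?P<content>[A-Za-z]+)_"      # content type
    r"(?P<stamp>\d{14})\.csv$"      # date timestamp
)

def is_valid_file_name(name: str) -> bool:
    """Return True when an inbound file name matches the agreed convention."""
    return FILE_NAME_PATTERN.match(name) is not None
```

A test suite would feed this check both conforming names and the known bad patterns from each vendor.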
Example: A financial reporting company generates files with a header that contains the summary amount,
while the line items carry the detailed split. The sum of the amounts in the line items should match the
summary amount in the header.
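A header-versus-detail reconciliation like this can be sketched as follows; using Decimal (rather than float) avoids rounding surprises with currency amounts.

```python
from decimal import Decimal

def header_matches_details(header_total: str, detail_amounts: list[str]) -> bool:
    """Check that the summary amount in the file header equals the sum of
    the line-item amounts in the detail records."""
    return Decimal(header_total) == sum(Decimal(a) for a in detail_amounts)
```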
Example: A pharma company gets a set of files from a vendor on a daily basis. The process consuming
these files expects the complete set of files to be available before processing.
1. A file that was supposed to come yesterday was delayed. It came in sometime after today's file arrived.
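A completeness gate for the daily file set can be sketched as below; the per-feed naming scheme is an assumption for illustration.

```python
from datetime import date

def expected_file_set(day: date, feeds: list[str]) -> set[str]:
    """Build the set of file names expected for a given day. The
    '<feed>_<YYYYMMDD>.csv' convention here is an assumed example."""
    stamp = day.strftime("%Y%m%d")
    return {f"{feed}_{stamp}.csv" for feed in feeds}

def missing_files(expected: set[str], arrived: set[str]) -> set[str]:
    """Files that should have arrived but have not. Processing should wait
    (or alert) until this set is empty."""
    return expected - arrived
```

Running this check before each load would catch the delayed-file scenario above, since yesterday's set would still be incomplete when today's file arrives.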
ETL Validator comes with Component Test Case and File Watcher which can be used to test Flat
Files.
Flat File Component: The Flat File Component is part of the Component Test Case. It can be used to
define data type and data quality rules on the incoming flat file. The data in the flat file can also be
compared with data from the database.
File Watcher: Using File Watcher, test plans can be triggered automatically when a new file arrives
in a directory, so that the test cases on the file can be executed automatically before the file is
used further by the consuming process.
SFTP Connection: Makes it easy to compare and validate flat files located in a remote SFTP
location.
Example: Data for the comments column has more than 4000 characters in the inbound flat file while the
limit for the corresponding column in the database is only 2000 characters.
Example: Date of Birth is a required data element but some of the records are missing values in the
inbound flat file.
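The two examples above (an over-length comments field and a missing required Date of Birth) can be sketched as a record-level check; the column names and the 2000-character limit are taken from the examples, and the CSV layout is otherwise an assumption.

```python
import csv
import io

def validate_records(csv_text: str, max_comment_len: int = 2000) -> list[tuple[int, str]]:
    """Return (row_number, error) pairs for records that violate the
    assumed rules: 'comments' must fit the target column limit and
    'date_of_birth' is required."""
    errors = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        if len(row.get("comments", "")) > max_comment_len:
            errors.append((i, "comments exceeds column limit"))
        if not row.get("date_of_birth"):
            errors.append((i, "date_of_birth missing"))
    return errors
```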
ETL Validator provides the capability to specify data type checks on the flat file in the Flat File
Component. Based on the data types specified, ETL Validator automatically checks all the records
in the incoming flat file to find any invalid records.
Example: Values in the country_code column should have a valid country code from a Country Code
domain.
select distinct country_code from address
minus
select country_code from country
Example: Consider a file import process for a CRM application which imports contact lists for existing
Accounts. The contact lists are CSV files with a column holding the corresponding account_id. Let's assume
that the contact list can be loaded into a database table for the purpose of validation.
1. Count of contacts with a null or unspecified account_id (analogous to counting null dimension keys in a fact table):
SELECT count(*) FROM contacts where account_id is null
2. Invalid foreign key values in the contact list:
SELECT account_id FROM contacts
minus
SELECT s.account_id FROM accounts s, contacts c where s.account_id=c.account_id
ETL Validator supports defining data quality rules in the Flat File Component for automating data
quality testing without writing any database queries. Custom rules can be defined and added to the
Data Model template.
Example 1: Compare counts of non-null values between source and target for each
column based on the mapping.
Source Query (assuming the flat file data is loaded into a 'customer' table for validation):
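A per-column non-null count comparison of this kind can be sketched with SQLite; the staging table, target table, and column mapping below are assumptions for illustration.

```python
import sqlite3

# Minimal sketch: the flat file rows have been loaded into a 'customer'
# staging table (as the example assumes) and the target is 'customer_target'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, email TEXT)")
conn.execute("CREATE TABLE customer_target (name TEXT, email TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Ann", "a@x.com"), ("Bob", None)])
conn.executemany("INSERT INTO customer_target VALUES (?, ?)",
                 [("Ann", "a@x.com"), ("Bob", None)])

# Source column -> target column, per the mapping document.
mapping = {"name": "name", "email": "email"}
for src_col, tgt_col in mapping.items():
    # count(col) counts only non-null values, which is exactly the check.
    src = conn.execute(f"SELECT count({src_col}) FROM customer").fetchone()[0]
    tgt = conn.execute(f"SELECT count({tgt_col}) FROM customer_target").fetchone()[0]
    assert src == tgt, f"non-null count mismatch for column {src_col}"
```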
ETL Validator comes with Flat File Component and Data Profile Component as part
of Component Test Case for automating the comparison of flat file and target data. It takes care of
loading the flat file data into a table for running validations.
Data Profile Component: Automatically computes profile of the flat file data and target query
results - count, count distinct, nulls, avg, max, min, maxlength and minlength.
Component Test Case: Provides a visual test case builder that can be used to compare multiple
flat files and target.
Example: In a financial company, the interest earned on a savings account is
dependent on the daily balance in the account for the month. The daily balance for the month is part of an
inbound CSV file for the process that computes the interest.
1. Review the requirement and design for calculating the interest.
2. Implement the logic using your favorite programming language.
3. Compare your output with data in the target table.
Example: In a financial company, the interest earned on a savings account is dependent on the daily
balance in the account for the month.
1. Review the requirement for calculating the interest.
2. Setup test data in the flat file for various scenarios of daily account balance.
3. Compare the transformed data in the target table with the expected values for the test data.
ETL Validator comes with Component Test Case which can be used to test transformations
using the White Box approach or the Black Box approach.
Visual Test Case Builder: Component test case has a visual test case builder that makes it easy
to rebuild the transformation logic for testing purposes.
Workschema: ETL Validator's workschema stores the test data from source and target queries.
This makes it easy for the tester to implement transformations and compare using a Script
Component.
Benchmark Capability: Makes it easy to baseline the target table (expected data) and compare the
latest data with the baselined data.
The goal of performance testing is to validate that the process consuming the
inbound flat files is able to handle flat files with the expected data volumes and
inbound arrival frequency.
Example 1: The process ingesting the flat file might perform well when there are only a few
records in the file but perform poorly when there is a large number of rows.
Example 2: The flat file ingestion process may also perform poorly as the data volumes increase in the
target table.