
What are Flat Files?

Flat files are extensively used for exchanging data between enterprises or between organizations within an enterprise. Flat files come in two forms: delimited files, such as CSV (comma-separated) files, and fixed-width files.

What is Flat File Testing?


Flat File testing is the process of validating the quality of data in
the flat file as well as ensuring that the data in the flat file has
been consumed appropriately by the application or ETL
process.

Challenges in Flat File Testing


Testing of inbound flat files presents unique challenges because
the producer of a flat file is usually a different organization
within the enterprise or an external vendor. Consequently, there
might be differences in the format and content of the
files, since there is no easy way to enforce data type and
data quality constraints on the data in the flat files. Issues in flat
file data can cause failures in the consuming process. While
file processing requirements differ from project to
project, the focus of this use case is to list some of the
common checks that need to be performed for validating flat
files.

Flat File Testing Categories


FLAT FILE INGESTION TESTING

When data is moved using flat files between enterprises or between organizations within
an enterprise, it is important to perform a set of file ingestion validations on the
inbound flat files before consuming the data in those files.
File name validation
Files are FTP'ed or copied over to a specific folder for processing. These files usually follow a specific
naming convention so that the process consuming the file is able to identify the contents and the
date. From a testing standpoint, the file name pattern needs to be validated to verify that it meets the
requirement.

Example: A government agency gets files from multiple vendors on a periodic basis. The arriving files
should follow a naming convention of 'CompanyCode_ContentType_DateTimestamp.csv'. However, the
files coming in from a specific vendor do not have the correct company code.
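
Sample query to flag non-conforming file names (a minimal sketch, assuming the names of the arriving files are logged into a hypothetical file_audit table and that an Oracle-style REGEXP_LIKE condition is available; the pattern shown is illustrative only)
SELECT file_name FROM file_audit
WHERE NOT REGEXP_LIKE(file_name, '^[A-Za-z0-9]+_[A-Za-z0-9]+_[0-9]{14}\.csv$')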

Size and Format of the flat files


Although flat files are generally delimited or fixed width, it is common to have a header and a footer in
these files. Sometimes these headers have a row count that can be used to verify that the file
contains all the expected data.
Some of the relevant checks are:
Verify that the size of the file is within the expected range, where applicable.
Verify that the header, footer and column heading rows have the expected format and appear at the
expected location within the flat file.
Perform any row count checks to cross-check the count in the header against the number of records in the
delimited data.

Example: A financial reporting company generates files with a header that contains the summary amount
and line items that carry the detailed split. The sum of the amounts in the line items should match the
summary amount in the header.
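
Sample query for the summary-versus-detail check (a minimal sketch, assuming the header row is loaded into a hypothetical single-row file_header table, the line items into a hypothetical file_detail table, and an Oracle-style dual table is available)
SELECT (SELECT summary_amount FROM file_header) header_amount,
(SELECT SUM(line_amount) FROM file_detail) detail_total
FROM dual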

File arrival, processing and deletion times


Files arrive periodically into a specific network folder or an FTP location before getting consumed by a
process. Usually, there are specific requirements that need to be met regarding the file arrival time,
the order of arrival and the retention of the files.

Example: A pharma company gets a set of files from a vendor on a daily basis. The process consuming
these files expects the complete set of files to be available before processing.
1. A file that was supposed to come yesterday was delayed and arrived sometime after today's file,

causing issues due to the difference in the order of processing the files.


2. After a file gets processed, it is supposed to be moved to a specific directory where it is
retained for a specified period of time and then deleted. However, the file did not get copied over.

Automate file ingestion testing using ETL Validator

ETL Validator comes with Component Test Case and File Watcher which can be used to test Flat
Files.
Flat File Component: The flat file component is part of the Component Test Case. It can be used to
define data type and data quality rules on the incoming flat file. The data in the flat file can also be
compared with data from the database.
File Watcher: Using File Watcher, test plans can be triggered automatically when a new file comes
into a directory, so that the test cases on the file can be executed automatically before the files are
used further by the consuming process.
SFTP Connection: Makes it easy to compare and validate flat files located in a remote SFTP
location.

FLAT FILE DATA TYPE TESTING


The purpose of Data Type testing is to verify that the type and length of the data
in the flat file are as expected.
Data Type Check
Verify that the type and format of the data in the inbound flat file matches the expected data type for
the file. For date, timestamp and time data types, the values are expected to be in a specific format
so that they can be parsed by the consuming process.
Example: An ID column of the flat file is expected to have only numbers. However, a few rows in the flat file
have characters.
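
Sample query to find the invalid rows (a minimal sketch, assuming the flat file data can be imported into a hypothetical customer_stage table and an Oracle-style REGEXP_LIKE condition is available)
SELECT id FROM customer_stage WHERE NOT REGEXP_LIKE(id, '^[0-9]+$')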

Data Length Check


The length of string and number data values in the flat file should not exceed the maximum allowed length for
those columns.

Example: Data for the comments column has more than 4000 characters in the inbound flat file while the
limit for the corresponding column in the database is only 2000 characters.
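
Sample query to find the offending rows (a minimal sketch, assuming the flat file data is loaded into a hypothetical customer_stage table with a comments column)
SELECT comments FROM customer_stage WHERE LENGTH(comments) > 2000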

Not Null Check


Verify that any required data elements in the flat file have data for all the rows.

Example: Date of Birth is a required data element but some of the records are missing values in the
inbound flat file.
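
Sample query to count the missing values (a minimal sketch, assuming the flat file data is loaded into a hypothetical customer_stage table)
SELECT count(1) FROM customer_stage WHERE date_of_birth IS NULL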

Automate flat file data type testing with ETL Validator

ETL Validator provides the capability to specify data type checks on the flat file in the flat file
component. Based on the data types specified, ETL Validator automatically checks all the records
in the incoming flat file to find any invalid records.

FLAT FILE DATA QUALITY TESTING


The purpose of Data Quality tests is to verify the accuracy of the data in the
inbound flat files.
Duplicate Data Checks
Check for duplicate rows in the inbound flat file with the same unique key column or a unique
combination of columns as per business requirement.
Example: The business requirement says that the combination of First Name, Last Name, Middle Name and
Date of Birth should be unique for the Customer list flat file.
Sample query to identify duplicates (assuming that the flat file data can be imported into a
database table)
SELECT fst_name, lst_name, mid_name, date_of_birth, count(1) FROM Customer GROUP BY
fst_name, lst_name, mid_name, date_of_birth HAVING count(1)>1

Reference Data Checks


Flat file standards may dictate that the values in certain columns should adhere to the values in a
domain. Verify that the values in the inbound flat file conform to the reference data standards.

Example: Values in the country_code column should have a valid country code from a Country Code
domain.
select distinct country_code from address
minus
select country_code from country

Data Validation Rules


Many data fields can contain a range of values that cannot be enumerated. However, there are
reasonable constraints or rules that can be applied to detect situations where the data is clearly
wrong. Instances of fields containing values violating the validation rules defined represent a quality
gap that can impact inbound flat file processing.
Example: Date of birth (DOB). This is defined as the DATE datatype and can assume any valid date.
However, a DOB in the future, or more than 100 years in the past, is probably invalid. Also, the date of
birth of a child should not be earlier than that of their parents.
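
Sample query to flag suspect dates of birth (a minimal sketch, assuming the flat file data is loaded into a hypothetical customer_stage table and Oracle-style SYSDATE and ADD_MONTHS functions are available)
SELECT date_of_birth FROM customer_stage
WHERE date_of_birth > SYSDATE OR date_of_birth < ADD_MONTHS(SYSDATE, -1200)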

Data Integrity Checks


This check addresses "keyed" relationships of entities within a domain. The goal is to identify orphan
records in the child entity with a foreign key to the parent entity.
1. Count of records with null foreign key values in the flat file
2. Count of invalid foreign key values in the flat file that do not have a corresponding primary key in
the parent flat file or database table

Example: Consider a file import process for a CRM application which imports contact lists for existing
Accounts. The contact lists are CSV files with a column containing the corresponding account_id. Let's assume
that the contact list can be loaded into a database table for the purpose of validation.
1. Count of null or unspecified account_id values in the contact list:
SELECT count(1) FROM contacts where account_id is null
2. Count of invalid foreign key values in the contact list:
SELECT account_id FROM contacts
minus
SELECT s.account_id FROM accounts s, contacts c where s.account_id=c.account_id

Automate flat file data quality testing using ETL Validator

ETL Validator supports defining data quality rules in the Flat File Component for automating the data
quality testing without writing any database queries. Custom rules can be defined and added to the
Data Model template.

FLAT FILE DATA COMPLETENESS TESTING


Data in the inbound flat files is generally processed and loaded into a database.
In some cases the output may also be another flat file. The purpose of Data
Completeness tests is to verify that all the expected data is loaded in the target
from the inbound flat file. Some of the tests that can be run are: compare and
validate counts, aggregates (min, max, sum, avg) and actual data between the flat
file and the target.
Record Count Validation
Compare count of records of the flat file and database table. Check for any rejected records.
Example: A simple count of records comparison between the source and target tables.
Source Query (assuming the flat file data is loaded into 'customer' table for validation)
SELECT count(1) src_count FROM customer
Target Query
SELECT count(1) tgt_count FROM customer_dim

Column Data Profile Validation


Column or attribute level data profiling is an effective tool to compare source and target data without
actually comparing the entire data. It is similar to comparing the checksum of your source and target
data. These tests are essential when testing large amounts of data.
Some of the common data profile comparisons that can be done between the flat file and target are:
Compare unique values in a column between the flat file and target
Compare max, min, avg, max length, min length values for columns depending on the data type
Compare null values in a column between the flat file and target
For important columns, compare data distribution (frequency) in a column between the flat file
and target

Example 1: Compare column counts with values (non-null values) between source and target for each
column based on the mapping.
Source Query (assuming the flat file data is loaded into 'customer' table for validation)

SELECT count(row_id), count(fst_name), count(lst_name), avg(revenue) FROM customer


Target Query
SELECT count(row_id), count(first_name), count(last_name), avg(revenue) FROM customer_dim
Example 2: Compare the number of customers by country between the source and target.
Source Query (assuming the flat file data is loaded into 'customer' table for validation)
SELECT country, count(*) FROM customer GROUP BY country
Target Query
SELECT country_cd, count(*) FROM customer_dim GROUP BY country_cd

Compare entire flat file and target data


Compare data (values) between the flat file and the target data, effectively validating 100% of the data. In
regulated industries such as finance and pharma, 100% data validation might be a compliance
requirement. It is also a key requirement for data migration projects. However, performing 100% data
validation is a challenge when large volumes of data are involved. This is where ETL testing tools
such as ETL Validator can be used, because they have an inbuilt ELV engine (Extract, Load,
Validate) capable of comparing large volumes of data.
Example: Write a source query on the flat file that matches the data in the target table after
transformation.
Source Query (assuming the flat file data is loaded into 'customer' table for validation)
SELECT cust_id, fst_name, lst_name, fst_name||','||lst_name, DOB FROM Customer
Target Query
SELECT integration_id, first_name, Last_name, full_name, date_of_birth FROM Customer_dim

Automate flat file data completeness testing using ETL Validator

ETL Validator comes with Flat File Component and Data Profile Component as part
of Component Test Case for automating the comparison of flat file and target data. It takes care of
loading the flat file data into a table for running validations.
Data Profile Component: Automatically computes profile of the flat file data and target query
results - count, count distinct, nulls, avg, max, min, maxlength and minlength.
Component Test Case: Provides a visual test case builder that can be used to compare multiple
flat files and targets.

FLAT FILE DATA TRANSFORMATION TESTING


Data in the inbound flat file is transformed by the consuming process and
loaded into the target (table or file). It is important to test the transformed data.
There are two approaches for testing transformations - white box testing and
black box testing.
Transformation testing using White Box approach
White box testing is a testing technique that examines the program structure and derives test data
from the program logic/code.
For transformation testing, this involves reviewing the transformation logic from the flat file data
ingestion design document and corresponding code to come up with test cases.
The steps to be followed are listed below:
Review the transformation design document
Apply transformations on the flat file data using SQL or a procedural language such as PL/SQL to
reflect the ETL transformation logic
Compare the results of the transformed data with the data in the target table or target flat file.
The advantage with this approach is that the tests can be rerun easily on a larger data set. The
disadvantage of this approach is that the tester has to reimplement the transformation logic.

Example: In a financial company, the interest earned on the savings account depends on
the daily balance in the account for the month. The daily balance for the month is part of an
inbound CSV file for the process that computes the interest.
1. Review the requirement and design for calculating the interest.
2. Implement the logic using your favorite programming language.
3. Compare your output with data in the target table.
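
Sample query for step 2 of this example (a minimal sketch, assuming the daily balances from the CSV file are loaded into a hypothetical daily_balance table and that a flat 2% annual rate is applied to the average daily balance; the actual rate and formula would come from the design document)
SELECT account_id, AVG(balance) * 0.02 / 12 AS expected_interest
FROM daily_balance
GROUP BY account_id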

Transformation testing using Black Box approach


Black-box testing is a method of software testing that examines the functionality of an application
without peering into its internal structures or workings. For transformation testing, this involves
reviewing the transformation logic from the mapping design document and setting up the test data
appropriately.
The steps to be followed are listed below:

Review the requirements document to understand the transformation requirements


Prepare test data in the flat file to reflect different transformation scenarios
Come up with the transformed data values or the expected values for the test data from the previous
step
Compare the results of the transformed test data in the target table with the expected values.
The advantage with this approach is that the transformation logic does not need to be
reimplemented during the testing. The disadvantage of this approach is that the tester needs to
set up test data for each transformation scenario and come up with the expected values for the
transformed data manually.

Example: In a financial company, the interest earned on the savings account depends on the daily
balance in the account for the month.
1. Review the requirement for calculating the interest.
2. Setup test data in the flat file for various scenarios of daily account balance.
3. Compare the transformed data in the target table with the expected values for the test data.

Automate data transformation testing using ETL Validator

ETL Validator comes with Component Test Case which can be used to test transformations
using the White Box approach or the Black Box approach.
Visual Test Case Builder: Component test case has a visual test case builder that makes it easy
to rebuild the transformation logic for testing purposes.
Workschema: ETL Validator's workschema stores the test data from source and target queries.
This makes it easy for the tester to implement transformations and compare using a Script
Component.
Benchmark Capability: Makes it easy to baseline the target table (expected data) and compare the
latest data with the baselined data.

FLAT FILE INGESTION PERFORMANCE TESTING

The goal of performance testing is to validate that the process consuming the
inbound flat files is able to handle flat files with the expected data volumes and
inbound arrival frequency.

Example 1: The process ingesting the flat file might perform well when there are only a few
records in the file but perform badly when there is a large number of rows.
Example 2: The flat file ingestion process may also perform badly as the data volumes increase in the
target table.

End-to-End Data Testing of Flat File ingestion


Integration testing of the inbound flat file ingestion process and the related applications involves the
following steps:
Estimate expected data volumes in each of the source flat files for the consuming process for the
next 1-3 years.
Set up test data for performance testing, either by generating sample flat files or by getting sample flat
files.
Execute the flat file ingestion process to load the test data into the target.
Execute the flat file ingestion process again with large data volumes in the target tables to identify
bottlenecks.
