
Q1- What is the difference between an ODS and a Staging Area?

A1 - ODS: the Operational Data Store, which contains operational data.

The ODS comes after the staging area.
E.g., suppose we have day-level granularity in the OLTP system and year-level granularity in the data warehouse. If the business (a manager) asks for week-level granularity, we would have to go back to the OLTP system and summarize the day-level data up to week level, which would be painstaking. So what we do is maintain week-level granularity in the ODS, holding the data for about 30 to 90 days.
Note: the ODS contains cleansed data only, i.e. data that has already passed through the staging area.
Staging Area: it comes after extraction has finished. The staging area consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. A place to hold the data and do calculations.
In other words, we can say it is a temporary work area.
A2 - ODS is the Operational Data Store, which holds data from the time the business gets started. That is, it holds the history of data up to yesterday's data (depending on customer requirements). Sometimes it holds real-time data as well.

Staging is the area before the ODS, which holds only 15 / 30 / 45 / 90 days of data, based on customer requirements. In some projects, staging is used as pre-production data; that depends.

Q2- Suppose we have some 10,000-odd records in the source system, and when we load them into the target, how do we ensure that all 10,000 loaded records contain no garbage values? How do we test this, given that we can't check every record because the number of records is huge?

1. (ETL tool) In the Workflow Monitor, once the session status shows Succeeded, right-click the session and open its properties; there you can see the number of source rows, successfully loaded target rows, and rejected rows.
2. (TOAD) Use TOAD to validate the result.
It requires two steps:

1. Compare record counts:

SELECT COUNT(*) FROM source_table;
SELECT COUNT(*) FROM target_table;

2. If the source and target tables have the same attributes and data types, compare the full content:

SELECT * FROM source_table
MINUS
SELECT * FROM target_table;

Otherwise, we have to go for attribute-wise testing of each attribute according to the design document, as sketched below.
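For attribute-wise testing, a minimal SQL sketch (assuming the two tables share a hypothetical key column sale_id and the attribute under test is amount) could be:

-- rows where the attribute differs between source and target,
-- including NULL-on-one-side mismatches
SELECT s.sale_id, s.amount AS source_amount, t.amount AS target_amount
FROM   source_table s
JOIN   target_table t ON t.sale_id = s.sale_id
WHERE  s.amount <> t.amount
   OR (s.amount IS NULL AND t.amount IS NOT NULL)
   OR (s.amount IS NOT NULL AND t.amount IS NULL);

An empty result for every attribute, together with matching counts, gives reasonable confidence without inspecting each record by hand.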
3. (Excel) Copy and paste the table data into an Excel sheet and compare the source and target using macros.

If the data set is too large, say the table has 100,000 (1 lakh) records, export the table data into a flat file, for example a .csv or .txt file. Then split those files into smaller chunks, convert them to Excel, and again compare using macros or simple Excel formulas.

Q3- What are the ETL tester's responsibilities?

A - An ETL tester primarily tests source data extraction, business transformation logic, and target table loading. There are many tasks involved in doing this, which are given below:

1. Stage table / SFS or MFS file created from the source upstream system; the checks below come under this:

a) Record count check
b) Reconcile records with the source data
c) No junk data loaded
d) Key or mandatory fields not missing
e) No duplicate data loaded (see the SQL sketch after this list)
f) Data type and size check

2. Business transformation logic applied; the checks below come under this:

a) Business data checks, e.g. a telephone number can't be more than 10 digits or contain character data
b) Record count check after active and passive transformation logic is applied
c) Derived fields are computed properly from the source data
d) Check data flow from stage to intermediate tables
e) Surrogate key generation check, if any

3. Target table loading from the stage file or table after applying transformations; the checks below come under this:

a) Record count check from the intermediate table or file to the target table
b) Mandatory or key field data not missing or NULL
c) Aggregate or derived values loaded correctly into fact tables
d) Check views created based on the target table
e) Truncate-and-load table check
f) CDC applied correctly on incremental load tables
g) Dimension table check and history table check
h) Business rule validation on loaded tables
i) Check reports based on the loaded fact and dimension tables
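Several of these checks reduce to simple SQL. Here is a minimal sketch for the duplicate and mandatory-field checks, assuming a hypothetical stage table stg_customer with key column customer_id and mandatory column customer_name:

-- duplicate data check: any key value appearing more than once
SELECT customer_id, COUNT(*)
FROM   stg_customer
GROUP  BY customer_id
HAVING COUNT(*) > 1;

-- mandatory field check: key or mandatory columns must not be NULL
SELECT *
FROM   stg_customer
WHERE  customer_id IS NULL
   OR  customer_name IS NULL;

Both queries should return no rows on a clean load.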

Q4- Find out the number of columns in a flat file

In ETL testing, if the source is a flat file, how will you verify the data count validation?
A1 - The metadata is always in the top row of a flat file; the delimiters (, or ; or @) separate each column's metadata.
A2 - By using the "wc" Unix command we can find the line count of a file.
A3 - To find the count in a flat file, you can import the file into an Excel sheet (use Data -> Import External Data -> Import Data) and find the count of records using COUNT() (go to Insert -> Function -> COUNT) in Excel.
A4 - Not always (in my experience, quite rarely in fact). Most often the flat file is just that, a flat file. On UNIX, the wc command is great; on Windows, one could open the file in Notepad and press CTRL+END to jump to the last row, where one will see the row count, or, as stated above, use Excel. Or you could just have your ETL program do the counting for you; those methods vary based on which ETL software you are using (SSIS, Informatica, etc.).
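If the file can be exposed to the database, another option is to let SQL do the counting. A minimal sketch, assuming an Oracle database with a directory object pointing at the file's folder (all names here are hypothetical):

-- external table that reads the flat file in place
CREATE TABLE src_file_ext (
  col1 VARCHAR2(100),
  col2 VARCHAR2(100),
  col3 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('source_file.csv')
);

-- row count of the flat file
SELECT COUNT(*) FROM src_file_ext;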

Q 5 - What is a mapping, session, worklet, workflow, mapplet?

A - Session: a session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.
Mapplet: a mapplet is a reusable set of transformations; it encapsulates a whole piece of logic.
Workflow: the pipeline that passes, or flows, the data from source to target.

A2 - A mapping represents data flow from sources to targets.

A mapplet creates or configures a set of transformations.

A workflow is a set of instructions that tell the Informatica server how to execute the tasks.

A worklet is an object that represents a set of tasks.

A session is a set of instructions to move data from sources to targets.

Q 6 - What are cache files?

A - Dynamic and static cache in Informatica

In the Informatica Lookup transformation we have the option to cache the lookup table (a cached lookup). If we don't use the lookup cache, it is called an uncached lookup.
In an uncached lookup we do the lookup directly on the base table and return output values based on the lookup condition. If the lookup condition matches, it returns the value from the lookup table; if the lookup condition is not satisfied, it returns either NULL or a default value. This is how an uncached lookup works.
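As a rough SQL analogy (with hypothetical tables), an uncached lookup behaves like an outer join that substitutes a default value when the lookup condition finds no match:

SELECT o.order_id,
       NVL(c.customer_name, 'UNKNOWN') AS customer_name  -- default when no match
FROM   orders o
LEFT JOIN customers c
       ON c.customer_id = o.customer_id;                 -- the lookup condition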
Now let us see what a cached lookup is.
In a cached lookup, the Integration Service creates a cache when the first row entering the lookup is processed. Once the cache is created, the Integration Service always queries the cache instead of the lookup table, which saves a lot of time.
The lookup cache can be of different types, such as a dynamic cache or a static cache.

What is a static cache?
The Integration Service creates a static cache by default when creating a lookup cache. With a static cache, the Integration Service does not update the cache while it processes the transformation; this is why it is called static.
A static cache works like the cached lookup described above: once the cache is created, the Integration Service always queries the cache instead of the lookup table.
With a static cache, when the lookup condition is true it returns the value from the lookup table; otherwise it returns NULL or the default value.
The important point about a static cache is that you cannot insert into or update the cache.

What is a dynamic cache?
With a dynamic cache we can insert or update rows in the cache as rows pass through. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache stays synchronized with the target.

Q 7 - What is a data warehouse?

A - Bill Inmon, known as the father of data warehousing, defined it as follows: "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."

Subject oriented: the data addresses a specific subject such as sales or inventory.

Integrated: the data is obtained from a variety of sources.

Time variant: the data is stored in such a way that every change is captured with its point in time, so historical states can be reconstructed.

Non volatile: data is never removed, i.e. historical data is also kept.

Q 8 - What is the difference between a database and a data warehouse?


A - A database is a collection of related data. A data warehouse is also a collection of information, as well as a supporting system.

A database is a collection of related data, whereas a data warehouse stores historical data; business users take their decisions based on that historical data.

In a database we maintain only current data, typically not more than about 3 years old, but in a data warehouse we maintain history data going back to the starting day of the enterprise. DML commands (insert, update, delete) can be used freely in a database; in a data warehouse, once the data is loaded, we do not perform such operations.

A database is used for insert, update, and delete operations, whereas a data warehouse is used for selects, to analyse the data.

Database vs. Data Warehouse:

- Tables and joins in a database are complex since they are normalized; in a data warehouse they are simple since they are de-normalized.
- ER modeling techniques are used for database design; dimensional modeling techniques are used for data warehouse design.
- A database is optimized for write operations; a data warehouse is optimized for read operations.
- A database performs slowly on analytical queries; a data warehouse gives high performance on analytical queries.
- A database uses the OLTP concept; a data warehouse uses the OLAP concept, meaning the data warehouse stores historical data.

A database is a collection of related data, all relating to the same subject, whereas a data warehouse is a collection of data integrated from different sources and stored in one container for taking managerial decisions (getting knowledge).

In a database we use CRUD operations, meaning create, read, update, and delete, but in a data warehouse we mainly use the select operation.

Q 9 - What are the benefits of data warehousing?

A - Historical information for comparative and competitive analysis.

Enhanced data quality and completeness.

Supplementing disaster recovery plans with another data backup source.

Q 10- What are the types of data warehouse?

A - There are mainly three types of data warehouse:

Enterprise Data Warehouse

Operational data store

Data Mart

Q 11- What is the difference between data mining and data warehousing?

A - In data mining, the operational data is analyzed using statistical techniques and clustering techniques to find hidden patterns and trends. So data mining does a kind of summarization of the data, which data warehouses can use for faster analytical processing for business intelligence.
A data warehouse may make use of data mining for faster analytical processing of the data.
Data mining is one of the key concepts for analyzing data in a data warehouse.
Generally, basic testing concepts remain the same across all domains, so the basic testing questions will also remain the same. The only addition would be some questions on the domain; e.g., in the case of ETL testing interview questions, these would cover some ETL concepts, how-tos for some specific types of checks / tests in SQL, and a set of best practices. Here is a list of some ETL testing interview questions.

Q 12- What is ETL?

A - ETL stands for extract, transform, and load: extracting data from outside source systems, transforming the raw data to make it fit for use by different departments, and loading the transformed data into target systems such as a data mart or data warehouse.
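As a minimal illustration (all table and column names here are hypothetical), once the extracted data sits in a staging table, the transform and load steps can often be expressed together in a single SQL statement:

-- load: insert the transformed rows into the warehouse fact table
INSERT INTO dw_sales_fact (sale_id, sale_date, amount_usd)
SELECT s.sale_id,
       TO_DATE(s.sale_dt, 'YYYY-MM-DD'),   -- transform: string to DATE
       s.amount_local * r.usd_rate         -- transform: currency conversion
FROM   stg_sales s
JOIN   stg_fx_rates r ON r.currency = s.currency;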

Q 13 - Why is ETL testing required?

A - To verify the correctness of data transformations against the signed-off business requirements and rules.
To verify that the expected data is loaded into the data mart or data warehouse without loss of any data.
To validate the accuracy of reconciliation reports (if any), e.g. comparing the reports of transactions made via a bank ATM: the ATM report vs. the bank account report; a sketch of such a check follows below.
To make sure the complete process meets performance and scalability requirements.
Data security is also sometimes part of ETL testing.
To evaluate the reporting efficiency.
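A minimal sketch of such a reconciliation check, assuming hypothetical staging and target tables holding the ATM transactions:

SELECT src.total_amount AS source_total,
       tgt.total_amount AS target_total
FROM  (SELECT SUM(txn_amount) AS total_amount FROM stg_atm_transactions) src,
      (SELECT SUM(txn_amount) AS total_amount FROM fact_atm_transactions) tgt;
-- the two totals should match; any difference points to lost or corrupted records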

Q 14- What is a data warehouse?

A - A data warehouse is a database used for reporting and data analysis.

Q 15- What are the characteristics of a Data Warehouse?

A - Subject Oriented, Integrated, Time-variant and Non-volatile

Q 16- What is the difference between Data Mining and Data Warehousing?

A - Data mining means analyzing data from different perspectives and condensing it into useful decision-making information. It can be used to increase revenue, cut costs, increase productivity, or improve any business process. There are lots of tools available in the market for various industries to do data mining. Basically, it is all about finding correlations or patterns in large relational databases.
Data warehousing comes before data mining: it is the process of compiling and organizing data into one database from various source systems, whereas data mining is the process of extracting meaningful information from that database (the data warehouse).

Q 17 - What are the main stages of Business Intelligence?

A - Data Sourcing > Data Analysis > Situation Awareness > Risk Assessment >
Decision Support

Q 18- What tools have you used for ETL testing?


A - 1. Data access tools, e.g. TOAD, WinSQL, AQT (used to analyze the content of tables)
2. ETL tools, e.g. Informatica, DataStage
3. Test management tools, e.g. Test Director, Quality Center (used to maintain requirements, test cases, defects, and the traceability matrix)

Below are a few more questions that can be asked:

Q 19- What is a Data Mart?
Q 20- Data Warehouse Testing vs Database Testing
Q 21- Who are the participants of data warehouse testing?
Q 22- How to prepare test cases for ETL / Data Warehousing testing?
Q 23- What is OLTP and OLAP?
Q 24- What is a lookup table?
Q 25- What is MDM (Master Data Management)?
Q 26- Give some examples of real-time data warehousing.
Q 27 -
