
Q1- What is the difference between an ODS and a Staging Area?

A1 - ODS: the Operational Data Store, which contains operational data.

The ODS comes after the staging area.
E.g., suppose we have day-level granularity in the OLTP system and year-level granularity in the data warehouse. If the business (a manager) asks for week-level granularity, we would have to go back to the OLTP system and summarize the day-level data up to week level, which would be painstaking. So what we do is maintain week-level granularity in the ODS, holding the data for about 30 to 90 days.
Note: the ODS contains cleansed data only, i.e. data that has already passed through the staging area.
Staging Area: it comes after extraction has finished. The staging area consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. A place to hold the data and do calculations.
In other words, we can say it is a temporary work area.
A2 - ODS is the Operational Data Store, which holds data from the time the business gets started. That is, it holds the history of data up to yesterday's data (depending on customer requirements). Sometimes it holds real-time data as well.

Staging is the area before the ODS, which holds only 15 / 30 / 45 / 90 days of data, based on customer requirements. In some projects, staging is used as pre-production data; that depends.

Q2- Suppose we have some 10,000-odd records in the source system, and when we load them into the target, how do we ensure that all 10,000 loaded records contain no garbage values? How do we test this, given that we can't check every record because the number of records is huge?

1. (ETL tool) In the Workflow Monitor, once the session status shows Succeeded, right-click the session and open its properties; there you can see the number of source rows, successfully loaded target rows, and rejected rows.
2. (TOAD) Use TOAD to validate the result.
It requires two steps:

1. Compare record counts:

SELECT COUNT(*) FROM source_table;
SELECT COUNT(*) FROM target_table;

2. If the source and target tables have the same attributes and data types, compare the full content:

SELECT * FROM source_table
MINUS
SELECT * FROM target_table;

Otherwise, we have to go for attribute-wise testing of each attribute according to the design document, as sketched below.
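For attribute-wise testing, a minimal SQL sketch (assuming the two tables share a hypothetical key column sale_id and the attribute under test is amount) could be:

-- rows where the attribute differs between source and target,
-- including NULL-on-one-side mismatches
SELECT s.sale_id, s.amount AS source_amount, t.amount AS target_amount
FROM   source_table s
JOIN   target_table t ON t.sale_id = s.sale_id
WHERE  s.amount <> t.amount
   OR (s.amount IS NULL AND t.amount IS NOT NULL)
   OR (s.amount IS NOT NULL AND t.amount IS NULL);

An empty result for every attribute, together with matching counts, gives reasonable confidence without inspecting each record by hand.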
3. (Excel) Copy and paste the table data into an Excel sheet and compare the source and target using macros.

If the data set is too large, say the table has 100,000 (1 lakh) records, export the table data into a flat file, for example a .csv or .txt file. Then split those files into smaller chunks, convert them to Excel, and again compare using macros or simple Excel formulas.

Q3- What are the ETL tester's responsibilities?

A - An ETL tester primarily tests source data extraction, business transformation logic, and target table loading. There are many tasks involved in doing this, which are given below:

1. Stage table / SFS or MFS file created from the source upstream system; the checks below come under this:

a) Record count check
b) Reconcile records with the source data
c) No junk data loaded
d) Key or mandatory fields not missing
e) No duplicate data loaded (see the SQL sketch after this list)
f) Data type and size check

2. Business transformation logic applied; the checks below come under this:

a) Business data checks, e.g. a telephone number can't be more than 10 digits or contain character data
b) Record count check after active and passive transformation logic is applied
c) Derived fields are computed properly from the source data
d) Check data flow from stage to intermediate tables
e) Surrogate key generation check, if any

3. Target table loading from the stage file or table after applying transformations; the checks below come under this:

a) Record count check from the intermediate table or file to the target table
b) Mandatory or key field data not missing or NULL
c) Aggregate or derived values loaded correctly into fact tables
d) Check views created based on the target table
e) Truncate-and-load table check
f) CDC applied correctly on incremental load tables
g) Dimension table check and history table check
h) Business rule validation on loaded tables
i) Check reports based on the loaded fact and dimension tables
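Several of these checks reduce to simple SQL. Here is a minimal sketch for the duplicate and mandatory-field checks, assuming a hypothetical stage table stg_customer with key column customer_id and mandatory column customer_name:

-- duplicate data check: any key value appearing more than once
SELECT customer_id, COUNT(*)
FROM   stg_customer
GROUP  BY customer_id
HAVING COUNT(*) > 1;

-- mandatory field check: key or mandatory columns must not be NULL
SELECT *
FROM   stg_customer
WHERE  customer_id IS NULL
   OR  customer_name IS NULL;

Both queries should return no rows on a clean load.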

Q4- Find out the number of columns in a flat file

In ETL testing, if the source is a flat file, how will you verify the data count validation?
A1 - The metadata is always in the top row of a flat file; the delimiters (, or ; or @) separate each column's metadata.
A2 - By using the "wc" Unix command we can find the line count of a file.
A3 - To find the count in a flat file, you can import the file into an Excel sheet (use Data -> Import External Data -> Import Data) and find the count of records using COUNT() (go to Insert -> Function -> COUNT) in Excel.
A4 - Not always (in my experience, quite rarely in fact). Most often the flat file is just that, a flat file. On UNIX, the wc command is great; on Windows, one could open the file in Notepad and press CTRL+END to jump to the last row, where one will see the row count, or, as stated above, use Excel. Or you could just have your ETL program do the counting for you; those methods vary based on which ETL software you are using (SSIS, Informatica, etc.).
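If the file can be exposed to the database, another option is to let SQL do the counting. A minimal sketch, assuming an Oracle database with a directory object pointing at the file's folder (all names here are hypothetical):

-- external table that reads the flat file in place
CREATE TABLE src_file_ext (
  col1 VARCHAR2(100),
  col2 VARCHAR2(100),
  col3 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('source_file.csv')
);

-- row count of the flat file
SELECT COUNT(*) FROM src_file_ext;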

Q 5 - What is a mapping, session, worklet, workflow, mapplet?

A - Session: a session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.
Mapplet: a mapplet is a reusable set of transformations; it encapsulates a whole piece of logic.
Workflow: the pipeline that passes, or flows, the data from source to target.

A2 - A mapping represents data flow from sources to targets.

A mapplet creates or configures a set of transformations.

A workflow is a set of instructions that tell the Informatica server how to execute the tasks.

A worklet is an object that represents a set of tasks.

A session is a set of instructions to move data from sources to targets.

Q 6 - What are cache files?

A - Dynamic and static cache in Informatica

In the Informatica Lookup transformation we have the option to cache the lookup table (a cached lookup). If we don't use the lookup cache, it is called an uncached lookup.
In an uncached lookup we do the lookup directly on the base table and return output values based on the lookup condition. If the lookup condition matches, it returns the value from the lookup table; if the lookup condition is not satisfied, it returns either NULL or a default value. This is how an uncached lookup works.
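As a rough SQL analogy (with hypothetical tables), an uncached lookup behaves like an outer join that substitutes a default value when the lookup condition finds no match:

SELECT o.order_id,
       NVL(c.customer_name, 'UNKNOWN') AS customer_name  -- default when no match
FROM   orders o
LEFT JOIN customers c
       ON c.customer_id = o.customer_id;                 -- the lookup condition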
Now let us see what a cached lookup is.
In a cached lookup, the Integration Service creates a cache when the first row entering the lookup is processed. Once the cache is created, the Integration Service always queries the cache instead of the lookup table, which saves a lot of time.
The lookup cache can be of different types, such as a dynamic cache or a static cache.

What is a static cache?
The Integration Service creates a static cache by default when creating a lookup cache. With a static cache, the Integration Service does not update the cache while it processes the transformation; this is why it is called static.
A static cache works like the cached lookup described above: once the cache is created, the Integration Service always queries the cache instead of the lookup table.
With a static cache, when the lookup condition is true it returns the value from the lookup table; otherwise it returns NULL or the default value.
The important point about a static cache is that you cannot insert into or update the cache.

What is a dynamic cache?
With a dynamic cache we can insert or update rows in the cache as rows pass through. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache stays synchronized with the target.

Q 7 - What is a data warehouse?

A - Bill Inmon, known as the father of data warehousing, defined it as follows: "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."

Subject oriented: the data addresses a specific subject such as sales or inventory.

Integrated: the data is obtained from a variety of sources.

Time variant: the data is stored in such a way that every change is captured with its point in time, so historical states can be reconstructed.

Non volatile: data is never removed, i.e. historical data is also kept.

Q 8 - What is the difference between a database and a data warehouse?


A - A database is a collection of related data. A data warehouse is also a collection of information, as well as a supporting system.

A database is a collection of related data, whereas a data warehouse stores historical data; business users take their decisions based on that historical data.

In a database we maintain only current data, typically not more than about 3 years old, but in a data warehouse we maintain history data going back to the starting day of the enterprise. DML commands (insert, update, delete) can be used freely in a database; in a data warehouse, once the data is loaded, we do not perform such operations.

A database is used for insert, update, and delete operations, whereas a data warehouse is used for selects, to analyse the data.

Database vs. Data Warehouse:

- Tables and joins in a database are complex since they are normalized; in a data warehouse they are simple since they are de-normalized.
- ER modeling techniques are used for database design; dimensional modeling techniques are used for data warehouse design.
- A database is optimized for write operations; a data warehouse is optimized for read operations.
- A database performs slowly on analytical queries; a data warehouse gives high performance on analytical queries.
- A database uses the OLTP concept; a data warehouse uses the OLAP concept, meaning the data warehouse stores historical data.

A database is a collection of related data, all relating to the same subject, whereas a data warehouse is a collection of data integrated from different sources and stored in one container for taking managerial decisions (getting knowledge).

In a database we use CRUD operations, meaning create, read, update, and delete, but in a data warehouse we mainly use the select operation.

Q 9 - What are the benefits of data warehousing?

A - Historical information for comparative and competitive analysis.

Enhanced data quality and completeness.

Supplementing disaster recovery plans with another data backup source.

Q 10- What are the types of data warehouse?

A - There are mainly three types of data warehouse:

Enterprise Data Warehouse

Operational data store

Data Mart

Q 11- What is the difference between data mining and data warehousing?

A - In data mining, the operational data is analyzed using statistical techniques and clustering techniques to find hidden patterns and trends. So data mining does a kind of summarization of the data, which data warehouses can use for faster analytical processing for business intelligence.
A data warehouse may make use of data mining for faster analytical processing of the data.
Data mining is one of the key concepts for analyzing data in a data warehouse.
Generally, basic testing concepts remain the same across all domains, so the basic testing questions will also remain the same. The only addition would be some questions on the domain; e.g., in the case of ETL testing interview questions, these would cover some ETL concepts, how-tos for some specific types of checks / tests in SQL, and a set of best practices. Here is a list of some ETL testing interview questions.

Q 12- What is ETL?

A - ETL stands for extract, transform, and load: extracting data from outside source systems, transforming the raw data to make it fit for use by different departments, and loading the transformed data into target systems such as a data mart or data warehouse.
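As a minimal illustration (all table and column names here are hypothetical), once the extracted data sits in a staging table, the transform and load steps can often be expressed together in a single SQL statement:

-- load: insert the transformed rows into the warehouse fact table
INSERT INTO dw_sales_fact (sale_id, sale_date, amount_usd)
SELECT s.sale_id,
       TO_DATE(s.sale_dt, 'YYYY-MM-DD'),   -- transform: string to DATE
       s.amount_local * r.usd_rate         -- transform: currency conversion
FROM   stg_sales s
JOIN   stg_fx_rates r ON r.currency = s.currency;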

Q 13 - Why is ETL testing required?

A - To verify the correctness of data transformations against the signed-off business requirements and rules.
To verify that the expected data is loaded into the data mart or data warehouse without loss of any data.
To validate the accuracy of reconciliation reports (if any), e.g. comparing the reports of transactions made via a bank ATM: the ATM report vs. the bank account report; a sketch of such a check follows below.
To make sure the complete process meets performance and scalability requirements.
Data security is also sometimes part of ETL testing.
To evaluate the reporting efficiency.
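A minimal sketch of such a reconciliation check, assuming hypothetical staging and target tables holding the ATM transactions:

SELECT src.total_amount AS source_total,
       tgt.total_amount AS target_total
FROM  (SELECT SUM(txn_amount) AS total_amount FROM stg_atm_transactions) src,
      (SELECT SUM(txn_amount) AS total_amount FROM fact_atm_transactions) tgt;
-- the two totals should match; any difference points to lost or corrupted records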

Q 14- What is a data warehouse?

A - A data warehouse is a database used for reporting and data analysis.

Q 15- What are the characteristics of a Data Warehouse?

A - Subject Oriented, Integrated, Time-variant and Non-volatile

Q 16- What is the difference between Data Mining and Data Warehousing?

A - Data mining means analyzing data from different perspectives and condensing it into useful decision-making information. It can be used to increase revenue, cut costs, increase productivity, or improve any business process. There are lots of tools available in the market for various industries to do data mining. Basically, it is all about finding correlations or patterns in large relational databases.
Data warehousing comes before data mining: it is the process of compiling and organizing data into one database from various source systems, whereas data mining is the process of extracting meaningful information from that database (the data warehouse).

Q 17 - What are the main stages of Business Intelligence?

A - Data Sourcing > Data Analysis > Situation Awareness > Risk Assessment >
Decision Support

Q 18- What tools have you used for ETL testing?


A - 1. Data access tools, e.g. TOAD, WinSQL, AQT (used to analyze the content of tables)
2. ETL tools, e.g. Informatica, DataStage
3. Test management tools, e.g. Test Director, Quality Center (used to maintain requirements, test cases, defects, and the traceability matrix)

Below are a few more questions that can be asked:

Q 19- What is a Data Mart?
Q 20- Data Warehouse Testing vs Database Testing
Q 21- Who are the participants of data warehouse testing?
Q 22- How to prepare test cases for ETL / Data Warehousing testing?
Q 23- What is OLTP and OLAP?
Q 24- What is a lookup table?
Q 25- What is MDM (Master Data Management)?
Q 26- Give some examples of real-time data warehousing.
Q 27 -
