Staging is the area before the ODS that holds only 15 / 30 / 45 / 90 days of data, depending on the customer requirement.
In some projects, staging is also used as pre-production data; it depends.
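Since the retention window varies by customer, a scheduled purge job usually enforces it. Below is a minimal Python sketch, assuming a hypothetical SQLite staging database with a table stg_orders and a load_date column; all of these names are illustrative, not from the source.

from datetime import datetime, timedelta
import sqlite3

RETENTION_DAYS = 30  # 15 / 30 / 45 / 90, per the customer requirement

# Hypothetical staging table (stg_orders) and date column (load_date).
conn = sqlite3.connect("staging.db")
cutoff = (datetime.now() - timedelta(days=RETENTION_DAYS)).strftime("%Y-%m-%d")
conn.execute("DELETE FROM stg_orders WHERE load_date < ?", (cutoff,))
conn.commit()
conn.close()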
Q2 - Suppose we have some 10,000-odd records in the source system. When we load them into the target, how do we ensure that none of the 10,000 loaded records contain garbage values? How do we test this, given that we can't check every record because the number of records is huge?
A -
1. (ETL tool) In the Workflow Monitor, after the session shows the status Succeeded, right-click it and open its properties; there you can see the number of source rows, successfully loaded target rows, and rejected rows.
2. Use TOAD to validate the result. This takes two steps:
If the table is too large, say 1 lakh (100,000) records, export the table data into a flat file such as a .csv or .txt file.
Then split those files into smaller chunks, convert them to Excel, and compare them using macros or simple Excel formulas.
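When row-by-row eyeballing is impossible, a scripted comparison of the two extracts scales better. Here is a minimal Python sketch, assuming the source and target have both been exported to CSV files with a common key column; the file names and the id column are illustrative assumptions.

import csv

def load_rows(path, key="id"):
    # Index every record by its key column for fast comparison.
    with open(path, newline="") as f:
        return {row[key]: row for row in csv.DictReader(f)}

source = load_rows("source_extract.csv")
target = load_rows("target_extract.csv")

missing = source.keys() - target.keys()    # rows lost during the load
extra = target.keys() - source.keys()      # unexpected rows in the target
mismatched = [k for k in source.keys() & target.keys()
              if source[k] != target[k]]   # garbage or wrongly transformed values

print(f"source={len(source)} target={len(target)} "
      f"missing={len(missing)} extra={len(extra)} mismatched={len(mismatched)}")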
An ETL tester primarily tests source data extraction, business transformation logic, and target table loading. There are many tasks involved in doing this, which are given below:
1. Stage table / SFS or MFS file created from the source upstream system - the following checks come under this (a sketch of such checks follows this list):
a) Business data checks, e.g. a telephone number can't be more than 10 digits or contain character data
b) Record count check for active records and correctly applied transformation logic
c) Fields derived from the source data are correct
d) Check the data flow from the stage table to the intermediate table
e) Surrogate key generation check, if any
2. Target table loading from the stage file or table after applying the transformations - similar checks come under this.
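As promised above, here is a minimal Python sketch of field-level checks (a) and (c); the record layout, column names, and the derived-field rule are illustrative assumptions, not taken from any real project.

import re

def check_record(rec):
    errors = []
    # (a) business data check: a telephone number must be exactly 10 digits
    if not re.fullmatch(r"\d{10}", rec.get("telephone", "")):
        errors.append("bad telephone")
    # (c) derived field check: full_name must equal first_name + last_name
    expected = f"{rec.get('first_name', '')} {rec.get('last_name', '')}".strip()
    if rec.get("full_name") != expected:
        errors.append("bad derived full_name")
    return errors

sample = {"telephone": "98765432", "first_name": "A", "last_name": "B",
          "full_name": "A B"}
print(check_record(sample))  # -> ['bad telephone'] (only 8 digits)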
Q 5 - What are a session, a mapplet, and a workflow?
A - Session: A session is a set of instructions that tells the Informatica server how and when to move data from sources to targets.
Mapplet: A mapplet is a set of transformations that we can build for reusability; it encapsulates a whole piece of logic.
Workflow: A workflow is the pipeline that passes or flows the data from source to target. It is a set of instructions that tells the Informatica server how to execute the tasks.
Q 6 - What are cache files (lookup caches)?
A -
In an Informatica Lookup transformation we have the option to cache the lookup table (a cached lookup). If we don't use the lookup cache, it is called an uncached lookup.
In an uncached lookup we look up directly against the base table and return output values based on the lookup condition. If the lookup condition matches, it returns the value from the lookup table; if the lookup condition is not satisfied, it returns either NULL or a default value. This is how an uncached lookup works.
Now let's see what a cached lookup is.
In a cached lookup, the Integration Service creates a cache when the first row in the lookup is processed. Once the cache is created, the Integration Service always queries the cache instead of the lookup table. This saves a lot of time.
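A toy Python sketch of the contrast, where query_lookup_table stands in for a per-row query against the lookup table; the data and function names are illustrative assumptions.

def query_lookup_table(key):
    table = {"C001": "Gold", "C002": "Silver"}  # pretend this hits the database
    return table.get(key, None)                 # NULL/default when no match

def uncached_lookup(rows):
    # Uncached: one "query" against the table for every input row.
    return [query_lookup_table(r) for r in rows]

def cached_lookup(rows):
    # Cached: the cache is built on first use, then reused for later rows.
    cache = {}
    out = []
    for r in rows:
        if r not in cache:
            cache[r] = query_lookup_table(r)
        out.append(cache[r])
    return out

rows = ["C001", "C002", "C001", "C999"]
print(uncached_lookup(rows))  # ['Gold', 'Silver', 'Gold', None]
print(cached_lookup(rows))    # same result, far fewer table queries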
A lookup cache can be of different types, such as a static cache or a dynamic cache.
What is a Static Cache?
The Integration Service creates a static cache by default when it creates a lookup cache. With a static cache, the Integration Service does not update the cache while it processes the transformation; this is why it is called static.
A static cache works like the cached lookup described above: once the cache is created, the Integration Service always queries the cache instead of the lookup table.
With a static cache, when the lookup condition is true it returns the value from the lookup table; otherwise it returns NULL or the default value.
The important point about a static cache is that you cannot insert into or update the cache.
What is a Dynamic Cache?
With a dynamic cache, we can insert or update rows in the cache as rows pass through. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is kept synchronized with the target.
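A toy Python sketch contrasting the two cache types; the cache contents and names are illustrative assumptions.

cache = {"C001": "Gold"}

def static_lookup(key):
    # Static cache: read-only; an unmatched key returns a default (NULL).
    return cache.get(key, None)

def dynamic_lookup(key, value):
    # Dynamic cache: rows passing through insert or update the cache,
    # keeping it synchronized with the target.
    if cache.get(key) != value:
        cache[key] = value
    return cache[key]

print(static_lookup("C999"))             # None - the cache never changes
print(dynamic_lookup("C999", "Bronze"))  # 'Bronze' - row inserted into the cache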
A - Bill Inmon is known as the father of data warehousing. His definition: "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process."
In a database we maintain only current data, typically not more than about 3 years, but in a data warehouse we maintain historical data, i.e. from the starting day of the enterprise. DML commands (insert, update, delete) are routinely run against a database, whereas once data is loaded into a data warehouse we do not perform such operations, because the warehouse is non-volatile.
[Diagram: Database, Data Warehouse, and Data Mart, with queries run against each]
A - ETL stands for extract, transform, and load: extracting data from outside source systems, transforming the raw data to make it fit for use by different departments, and loading the transformed data into target systems such as a data mart or a data warehouse.
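A minimal Python sketch of these three steps, assuming a hypothetical source.csv with name and amount columns; the file names, columns, and cleaning rules are illustrative.

import csv

def extract(path):
    # Extract: read raw records from the source system's export file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: clean and standardize the raw data for downstream use.
    for row in rows:
        row["name"] = row["name"].strip().title()
        row["amount"] = float(row["amount"])
    return rows

def load(rows, path):
    # Load: write the transformed records into the target.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

load(transform(extract("source.csv")), "warehouse_fact.csv")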
A - To verify the correctness of data transformations against the signed-off business requirements and rules.
To verify that the expected data is loaded into the data mart or data warehouse without any loss of data.
To validate the accuracy of reconciliation reports (if any), e.g. when comparing the report of transactions made via bank ATMs (the ATM report) against the bank account report.
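For the ATM reconciliation example, here is a minimal Python sketch with made-up transaction data; the transaction IDs and amounts are illustrative.

# (txn_id, amount) pairs from each report - illustrative data only.
atm_report = [("T1", 100.0), ("T2", 250.0), ("T3", 40.0)]
account_report = [("T1", 100.0), ("T2", 255.0)]

atm = dict(atm_report)
acct = dict(account_report)

unposted = atm.keys() - acct.keys()        # on the ATM report, not on the account
amount_breaks = {t: (atm[t], acct[t])
                 for t in atm.keys() & acct.keys()
                 if atm[t] != acct[t]}     # same transaction, different amounts

print("unposted:", unposted)               # {'T3'}
print("amount breaks:", amount_breaks)     # {'T2': (250.0, 255.0)}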
Q 16- What is the difference between Data Mining and Data Warehousing?
A - Data mining is analyzing data from different perspectives and condensing it into useful decision-making information. It can be used to increase revenue, cut costs, increase productivity, or improve any business process. There are lots of tools available in the market for various industries to do data mining. Basically, it is all about finding correlations or patterns in large relational databases.
Data warehousing comes before data mining. It is the process of compiling and organizing data into one database from various source systems, whereas data mining is the process of extracting meaningful data from that database (the data warehouse).
A - Data Sourcing > Data Analysis > Situation Awareness > Risk Assessment >
Decision Support