
ETL Transformations

Whenever you need a value from a stored procedure, you use the Stored Procedure transformation. It is also used whenever you want to perform some activity on the database before or after loading data through the ETL process. The same can be achieved with shell scripts as well.
What are snapshots? What are materialized views and where do we use them? What is a materialized view log?

A materialized view is just that: a view that has been materialized into a "table". The big difference between a view and a materialized view is that a view executes its query on the fly to return results, while a materialized view executes the query once and then stores (caches) the results. You can then query the MVIEW directly. This means that once a materialized view has been built, queries can run significantly faster than if they were run on the fly against the underlying data. This performance benefit makes MVIEWs particularly suitable for creating aggregates (summaries of fact tables). You could, for example, build daily and monthly summaries of sales records from a fact table of sales transactions.

A further benefit of MVIEWs is that (in Oracle 9i upwards) they can take part in the query rewrite option. This means a query to sum sales by year (which does not directly exist as an MVIEW) could be automatically rewritten by Oracle to query the monthly summary rather than the underlying data, automatically improving performance. Finally, MVIEWs can be refreshed either entirely or with just the changes, depending on how they were designed, and you can build indexes on them. MVIEWs can be used for almost any purpose (but most often for query performance); they have nothing specifically to do with an audit trail on DML operations. That is a complete misunderstanding.

What is Full load & Incremental or Refresh load?

A full load is the entire data dump, loaded the very first time. After that, to keep the target data synchronized with the source data, there are two further techniques. Refresh load: the existing data is truncated and reloaded completely. Incremental load: the delta, or difference between target and source data, is dumped at regular intervals. The timestamp of the previous delta load has to be maintained.
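The timestamp-watermark idea above can be sketched in a few lines of Python. This is a minimal illustration, not a tool-specific implementation; the record fields and values are invented for the example:

```python
from datetime import datetime

# Source rows with a last-modified timestamp (illustrative records).
source = [
    {"id": 1, "name": "alice", "updated": datetime(2023, 1, 1)},
    {"id": 2, "name": "bob",   "updated": datetime(2023, 3, 1)},
    {"id": 3, "name": "carol", "updated": datetime(2023, 5, 1)},
]

def incremental_load(rows, last_load_time):
    """Return the delta (rows changed since the previous load) plus the
    new watermark to persist for the next run."""
    delta = [r for r in rows if r["updated"] > last_load_time]
    new_watermark = max((r["updated"] for r in rows), default=last_load_time)
    return delta, new_watermark

# First run: watermark far in the past, so this behaves like a full load.
delta, watermark = incremental_load(source, datetime(1970, 1, 1))
print(len(delta))  # 3

# Next run: only rows modified after the stored watermark are extracted.
delta, watermark = incremental_load(source, datetime(2023, 2, 1))
print([r["id"] for r in delta])  # [2, 3]
```

In a real pipeline the watermark would be persisted (in a control table or parameter file) between runs instead of being held in memory.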
What is the difference between an ETL tool and an OLAP tool?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some cleansing of the data in the process. Examples: Informatica, DataStage, etc. An OLAP tool is meant for reporting; in OLAP the data is available in a multidimensional model, so you can write simple queries to extract data from the database. Examples: Business Objects, Cognos, etc.

Where do we use connected and unconnected lookups?

Unconnected lookups are used in cases where it is not necessary for all rows to pass through the lookup, i.e. the lookup query need not be executed for every input row.
What is a mapping, session, worklet, workflow, mapplet?

A mapping represents a dataflow from sources to targets. A mapplet creates or configures a set of transformations. A workflow is a set of instructions that tell the Informatica server how to execute tasks. A worklet is an object that represents a set of tasks. A session is a set of instructions to move data from sources to targets.
Can Informatica load heterogeneous targets from heterogeneous sources?

Yes, Informatica can load heterogeneous targets from heterogeneous sources.

Can anyone please explain why and where we exactly use lookups?

You can use the Lookup transformation to perform many tasks, including:
Get a related value. For example, your source includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.

What are the various test procedures used to check whether data has loaded correctly?

You can check the session log files for the total number of records added, the number of records updated, the number of records rejected, and any related errors. That is the answer the interviewer is expecting.
How do you tell the Aggregator stage that the input data is already sorted?

By enabling the Sorted Input property in the Aggregator properties.

List all the ETL and reporting tools available in the market.

ETL, as far as I know: Informatica, DataStage, Ab Initio, Hyperion, Python, Oracle Warehouse Builder. Reporting: Business Objects, MicroStrategy, Siebel Analytics, Cognos.

How you capture changes in data if the source system does not have option of storing date/time field in source table from where you need to extract the data?
The DW database can be Oracle or Teradata. The requirement here is to pull data from the source system, and the ETL needs to devise a mechanism to identify changed or new records. The source system can be a legacy system such as an AS/400 application or a mainframe application. List all such methods of data capture. The ETL can be Informatica, DataStage, or custom ETL code.
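One common answer when the source has no timestamp column is to compute a hash of each row's attributes and compare it with the hash captured on the previous run. A minimal sketch, with invented record fields; a production version would persist the hashes in a control table:

```python
import hashlib

def row_hash(row):
    """Stable hash of a row's non-key attributes."""
    payload = "|".join(str(row[k]) for k in sorted(row) if k != "id")
    return hashlib.md5(payload.encode()).hexdigest()

def detect_changes(source_rows, previous_hashes):
    """Compare source rows against hashes captured on the previous run.
    previous_hashes maps id -> hash; returns (new, changed) row lists."""
    new, changed = [], []
    for row in source_rows:
        h = row_hash(row)
        if row["id"] not in previous_hashes:
            new.append(row)
        elif previous_hashes[row["id"]] != h:
            changed.append(row)
    return new, changed

previous = {1: row_hash({"id": 1, "name": "alice"})}
current = [{"id": 1, "name": "alicia"}, {"id": 2, "name": "bob"}]
new, changed = detect_changes(current, previous)
print([r["id"] for r in new], [r["id"] for r in changed])  # [2] [1]
```

Other approaches worth listing in an interview: full-table comparison against the previous extract, database triggers writing to an audit table, and log-based change data capture.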
RE: What are push and pull etl strategies?

Push strategy: the ETL waits for a notification that data is ready to be processed; if the notification contains no data, the ETL must go back to the source itself. Used in real-time ETL. Pull strategy: the ETL runs a scheduled job to retrieve the data. Used in scheduled ETL.
RE: What is the difference between an ODS and a Staging Area?

ODS: an Operational Data Store, which contains data; the ODS comes after the staging area. For example, suppose we have day-level granularity in the OLTP system and year-level granularity in the data warehouse. If the business (a manager) asks for week-level granularity, we would have to go back to the OLTP system and summarize the day level up to the week level, which would be painstaking. So what we do is maintain week-level granularity in the ODS for about 30 to 90 days of data. Note: the ODS contains cleansed data only, i.e. data that has already passed through the staging area.

Staging Area: it comes after extraction has finished. The staging area consists of: 1. metadata; 2. the work area where we apply our complex business rules; 3. a place to hold the data and do calculations. In other words, it is a temporary work area.
RE: Let's suppose we have some 10,000-odd records in the source...

To do this you must profile the data at the source: determine the domain of all the values, the actual number of rows, and the data types in the source. After the data is loaded into the target, the process is repeated, i.e. checking the data values with respect to range, type, etc., and checking the actual number of rows inserted. If the results before and after the load match, then we are OK. This process is typically automated in ETL tools.
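The profile-and-compare step can be sketched as follows. This is an illustrative minimum (the column name and values are invented); real profiling would also cover ranges, null counts, and distributions:

```python
def profile(rows, column):
    """Minimal profile of one column: row count, distinct values, type set."""
    values = [r[column] for r in rows]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "types": {type(v).__name__ for v in values},
    }

source = [{"qty": 5}, {"qty": 7}, {"qty": 5}]
target = [{"qty": 5}, {"qty": 7}, {"qty": 5}]

# The load is considered OK when the profiles match before and after.
assert profile(source, "qty") == profile(target, "qty")
print(profile(source, "qty"))  # {'count': 3, 'distinct': 2, 'types': {'int'}}
```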
RE: What are parameter files ? Where do we use them?

A parameter file is a text file in which you define values for the parameters used in an Informatica session; the file is then referenced in the session properties. When the Informatica session runs, the values for the parameters are fetched from the specified file. For example, $$ABC is defined in the Informatica mapping, and the value for this parameter is defined in a file called abc.txt as:

[foldername_session_name]
$$ABC=hello world

In the session properties, you give abc.txt in the parameter file name field.
RE: What are the different Lookup methods used in Informatica?

A connected lookup receives input from the pipeline, sends output back to the pipeline, and can return any number of values; it does not contain a return port. An unconnected lookup can return only one column; it does contain a return port.
RE: Session Execution

Truncate Target Table: the Integration Service truncates target tables before running a session. Depending on the target database and the primary key-foreign key relationships in the session target, the Integration Service might issue a delete or a truncate command. Post-SQL: enter post-session SQL commands to execute against the target database after the Integration Service writes to the target.
RE: Surrogate key

The primary key from the operational system does not actually serve as a unique constraint in the DW, because the data inside a warehouse represents snapshots over time; reusing the same primary key as the operational system would therefore violate uniqueness. Hence we use a surrogate key, also called the data warehousing key, when moving data from the ODS to the DW, so that the constraints and logical integrity are met.

RE: What is Global and Local shortcut. explain with advantages

Global shortcuts work across multiple repositories; local shortcuts work across multiple folders in the same repository.
RE: Where do we use semi-additive and non-additive facts?

Additive: a measure that can participate in arithmetic calculations across all dimensions. Example: sales profit. Semi-additive: a measure that can participate in arithmetic calculations across only some dimensions. Example: an account balance, which can be summed across accounts but not across time. Non-additive: a measure that cannot participate in arithmetic calculations across any dimension. Example: temperature.

RE: Explain the ETL process diagram from source to final DWH using multiple sources, with a diagram.

Sources (database, flat file, COBOL file, XML file) ----> Staging Area (Transformation) ----> Target ----> Interface ----> Reports
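The flow in the diagram can be mirrored by a toy pipeline. This is a conceptual sketch only; the source names, the field, and the uppercase rule are all invented for illustration:

```python
# A toy pipeline mirroring the diagram: sources -> staging -> transform -> target.
def extract(sources):
    """Pull raw rows from several heterogeneous sources into a staging list."""
    staged = []
    for rows in sources.values():
        staged.extend(rows)
    return staged

def transform(staged):
    """Apply a business rule in the staging area (here: uppercase the name)."""
    return [{**row, "name": row["name"].upper()} for row in staged]

def load(rows, target):
    """Write the transformed rows into the target store."""
    target.extend(rows)
    return target

sources = {
    "database":  [{"name": "alice"}],
    "flat_file": [{"name": "bob"}],
}
target = load(transform(extract(sources)), [])
print(target)  # [{'name': 'ALICE'}, {'name': 'BOB'}]
```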

RE: Develop a mapping to extract the employee drawing ...

I don't think this can be done using a mapping alone. The following SQL gives the department number with the maximum total salary paid to all of its employees put together:

SELECT b.deptno
FROM (SELECT deptno, SUM(salary) sal
      FROM emp
      GROUP BY deptno) b
WHERE b.sal = (SELECT MAX(a.sal)
               FROM (SELECT deptno, SUM(salary) sal
                     FROM emp
                     GROUP BY deptno) a);

RE: If a lookup on the target table is taken, can we update the rows without an Update Strategy transformation?

The Update Strategy transformation determines whether to insert, update, delete, or reject a record for the target. We can bypass it by using a Router to divide rows into groups (insert, update, etc.) and connecting each group to one of multiple instances of the target. In the session, for each target instance, we check the appropriate box to mark records for insert, update, or delete.
RE: User Defined Environment Variables

User-defined environment variables are used to pass parameters at the job level, stage level, and sequencer job level.
RE: What is an entity relation? How does it work with data warehouse modeling?

An entity is nothing but an object with characteristics. We call it an entity in the logical view; in the physical view the entity is called a table. An entity relationship is nothing but maintaining primary key-foreign key relations between the tables, keeping the data consistent and satisfying the normal forms. There are four types of entity relationships: 1. one-to-one, 2. one-to-many, 3. many-to-one, 4. many-to-many. In data warehouse modeling, an entity relationship is a relationship between the dimension and fact tables (i.e. primary-foreign key relations between these tables). The fact table gets data from the dimension tables because it contains the primary keys of the dimension tables as foreign keys, which is how summarized data is obtained for each record.

RE: Extract / Transform Process

Extraction means getting data from single or multiple sources and loading it into the staging database; in other words, bringing the data from different sources (such as databases or flat files) into DataStage (or any ETL tool) for further processing. In the extraction we ensure nullability handling and cleanse the data. Transformation is where we implement the following: 1. the business logic for the necessary fields; 2. population of the primary and foreign keys; 3. aggregation of data; 4. data conversion if necessary; 5. implementation of SCDs.
RE: In What scenario ETL coding is preferred than Data...

1. We should go for an ETL tool when we have to extract data from multiple source systems (flat files, Oracle, COBOL, etc.) at one time, where PL/SQL or SQL alone cannot fit.

2. We can update the target without using an Update Strategy transformation by setting the session properties, if the source is a database.

3. Stop on errors = 1: if you set this option to 1, the session stops after the first error row; if it is 0, the session does not stop no matter how many errors occur.

4. Lookups can be used for validation purposes.

RE: What is the methodology and process followed for ETL testing in a data warehouse environment?

From the ETL spec, a document is created containing the source table (one schema) and the target table (another schema), with the logic used in transforming source to target. We write a database query implementing the logic in the document against the source schema and take its output. Then we write a simple SELECT statement against the target schema and take its output. Compare the two outputs: if they are the same, well and fine; otherwise it's a bug. Database knowledge is a must for ETL testing.
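The compare-two-outputs step can be sketched with SQLite. The table names, columns, and data are invented for the demo; in practice the two queries would run against the real source and target schemas, with the source query carrying the documented transformation logic:

```python
import sqlite3

# One demo database standing in for two schemas: a source and a target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE tgt_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
conn.executemany("INSERT INTO tgt_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# Reimplement the documented transformation logic as a query on the source...
expected = conn.execute("SELECT id, amount FROM src_orders ORDER BY id").fetchall()
# ...and compare it with a straight select on the target.
actual = conn.execute("SELECT id, amount FROM tgt_orders ORDER BY id").fetchall()

print("PASS" if expected == actual else "BUG")  # PASS
```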
RE: Clean Data before Loading

Warehouse data is used as source data for data analysis and reporting. The data is organized into groups and categories and then summarized over those groups (dimensions). These groups are based on exact matches.
For example, "house", "houses", and "home" would fall into different groups because they are not exact matches, even though logically they are the same and should be in one group. The process of data cleansing corrects this. This is only one example of data cleansing.
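Normalizing such values into one canonical group can be sketched like this. The synonym map is a hypothetical stand-in; in a real cleansing step it might come from a reference table or a fuzzy-matching library:

```python
# Hypothetical synonym map for the example values above.
CANONICAL = {"houses": "house", "home": "house"}

def cleanse(value):
    """Normalize a raw category value to its canonical group."""
    v = value.strip().lower()
    return CANONICAL.get(v, v)

raw = ["house", "Houses", " home ", "apartment"]
print([cleanse(v) for v in raw])  # ['house', 'house', 'house', 'apartment']
```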

The point is that if data is not cleansed, the resulting reports and OLAP cubes will contain too many categories, making them hard to read. The results would also be skewed, because factual data (totals, counts, etc.) would be distributed across both the good and the bad categories. Once loaded into the data warehouse, this is very difficult, if not impossible, to change.

RE: Data Cleansing Phase

The ultimate goal of data cleansing is to improve the organization's confidence in its data. First, set the bar for what kind of quality you are trying to obtain; I usually shoot for a 99% confidence level in my data. List the types of data errors that need to be addressed, such as:
1) Missing data: nulls, zeros, zero-length strings, and corrupted rows.
2) Data that contains unwanted junk, such as an apostrophe, a comma, or an extra space.
3) Numeric data errors, such as a negative value that should be positive.
4) Telephone numbers in the wrong format.
Some errors are database errors and others are business rule errors.

Next, write aggregate queries to find the errors (or use an ETL tool). Analyze the query results or transformation reports, measure the impact if the errors go unfixed, and so on.
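Those aggregate error checks can be sketched as follows. The rows, the phone format, and the rules are invented to mirror the error types listed above; real checks would run as SQL against the warehouse:

```python
import re

rows = [
    {"phone": "555-1234", "amount": 10.0},
    {"phone": "n/a",      "amount": -5.0},   # bad phone format, negative amount
    {"phone": None,       "amount": 20.0},   # missing phone
]

PHONE_RE = re.compile(r"^\d{3}-\d{4}$")

# Aggregate counts, one per error type (illustrative rules).
errors = {
    "missing_phone":   sum(1 for r in rows if not r["phone"]),
    "bad_format":      sum(1 for r in rows if r["phone"] and not PHONE_RE.match(r["phone"])),
    "negative_amount": sum(1 for r in rows if r["amount"] < 0),
}
print(errors)  # {'missing_phone': 1, 'bad_format': 1, 'negative_amount': 1}
```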

RE: Data Extraction Tools

1. Informatica 2. DataStage 3. Ab Initio 4. Oracle Warehouse Builder 5. Oracle Data Integrator 6. SSIS

RE: ETL developer Responsibilities

An ETL developer analyzes the data and develops/understands the ETL specifications; develops mappings; unit tests the mappings with test data; provides support for the code while it is in QA; and provides support during the initial days after the code moves to production.

RE: Cognos vs Informatica

Cognos and Informatica are not tools that do the same work. Informatica is an ETL tool: it extracts data from various sources (Oracle, DB2, Sybase, etc.), transforms it, and loads the data into the warehouse. Cognos is a reporting tool: it is used by the business to generate reports (pictorial or statistical representations), drawing on the data warehouse. Oracle Business Intelligence is another reporting tool that is becoming popular today.

Delete Child Record transformations


If I delete one record in the master table, the same record should also be deleted in the dependent table. Which transformation can we use for this?

Can we look up a table from a Source Qualifier transformation, i.e. as an unconnected lookup?

You cannot look up from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table and perform the lookup that way.