Informatica - Question - Answer

Deleting duplicate rows using Informatica


Q1. Suppose we have duplicate records in the Source System and we want to load only the unique
records into the Target System, eliminating the duplicate rows. What will be the approach?
Ans.

Let us assume that the source system is a Relational Database. The source table has duplicate
rows. Now, to eliminate the duplicate records, we can check the Distinct option of the Source Qualifier of
the source table and load the target accordingly.
Source Qualifier Transformation DISTINCT clause
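As a hedged illustration, with Select Distinct checked the Integration Service would issue a default query of roughly the following shape (the table and column names here are made up for the example):

SELECT DISTINCT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.CITY
FROM CUSTOMERS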

Deleting duplicate rows for FLAT FILE sources


Now suppose the source system is a Flat File. Here in the Source Qualifier you will not be able to
select the Distinct option, as it is disabled for flat file sources. Hence the next approach may be to
use a Sorter Transformation and check the Distinct option. When we select the Distinct option, all
the columns will be selected as keys, in ascending order by default.

Sorter Transformation DISTINCT clause

Deleting Duplicate Records Using Informatica Aggregator


Another way to handle duplicate records in a source batch run is to use an Aggregator
Transformation and check the Group By checkbox on the ports carrying the duplicate data. Here
you have the flexibility to select the last or the first of the duplicate records. Apart
from that, using a Dynamic Lookup Cache of the target table, associating the input ports with the
lookup ports and checking the Insert Else Update option will also help to eliminate the duplicate records in
the source and hence load only unique records in the target.

Loading Multiple Target Tables Based on Conditions


Q2. Suppose we have some serial numbers in a flat file source. We want to load the serial numbers into
two target files, one containing the EVEN serial numbers and the other the ODD ones.
Ans.
After the Source Qualifier place a Router Transformation. Create two Groups, namely EVEN and
ODD, with filter conditions MOD(SERIAL_NO,2)=0 and MOD(SERIAL_NO,2)=1 respectively. Then
output the two groups into two flat file targets.

Router Transformation Groups Tab

Normalizer Related Questions


Q3. Suppose in our Source Table we have data as given below:

Student Name | Maths | Life Science | Physical Science
Sam          | 100   | 70           | 80
John         | 75    | 100          | 85
Tom          | 80    | 100          | 85

We want to load our Target Table as:

Student Name | Subject Name     | Marks
Sam          | Maths            | 100
Sam          | Life Science     | 70
Sam          | Physical Science | 80
John         | Maths            | 75
John         | Life Science     | 100
John         | Physical Science | 85
Tom          | Maths            | 80
Tom          | Life Science     | 100
Tom          | Physical Science | 85

Describe your approach.


Ans.
Here, to convert the Columns to Rows, we have to use a Normalizer Transformation followed by an
Expression Transformation to decode the column taken into consideration. For more details on how
the mapping is performed please visit Working with Normalizer
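For comparison only, the same column-to-row transpose could be written in plain SQL with a UNION ALL, which may help visualise what the Normalizer produces (the table name SOURCE_TABLE and the column names are assumed):

SELECT STUDENT_NAME, 'Maths' AS SUBJECT_NAME, MATHS AS MARKS FROM SOURCE_TABLE
UNION ALL
SELECT STUDENT_NAME, 'Life Science', LIFE_SCIENCE FROM SOURCE_TABLE
UNION ALL
SELECT STUDENT_NAME, 'Physical Science', PHYSICAL_SCIENCE FROM SOURCE_TABLE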
Q4. Name the transformations that convert one row to many rows, i.e. increase the output row count
compared to the input. Also, what is the name of the transformation that does the reverse?
Ans.
Normalizer as well as Router transformations are Active transformations that can produce more output
rows than input rows.
The Aggregator transformation is the Active transformation that performs the reverse action.
Q5. Suppose we have a source table and we want to load three target tables based on source rows
such that the first row moves to the first target table, the second row to the second target table, the third
row to the third target table, the fourth row again to the first target table, and so on. Describe your approach.
Ans.
We can clearly understand that we need a Router transformation to route or filter source data to the
three target tables. Now the question is what the filter conditions will be. First of all we need
an Expression Transformation where we have all the source table columns and, along with them,
another i/o port, say SEQ_NUM, which gets a sequence number for each source row from the
NextVal port of a Sequence Generator with Start Value 0 and Increment By 1. Now the filter conditions for
the three Router groups will be:
MOD(SEQ_NUM,3)=1 connected to the 1st target table, MOD(SEQ_NUM,3)=2 connected to the 2nd target
table, MOD(SEQ_NUM,3)=0 connected to the 3rd target table.

Router Transformation Groups Tab
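Purely as an illustration of the MOD logic, and not as part of the Informatica mapping itself, an Oracle-side analogue of the same round-robin split might look like this (SRC_TABLE is an assumed name):

SELECT * FROM (SELECT s.*, ROWNUM AS SEQ_NUM FROM SRC_TABLE s)
WHERE MOD(SEQ_NUM, 3) = 1   -- rows 1, 4, 7, ... i.e. the first target's share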

Loading Multiple Flat Files using one mapping


Q6. Suppose we have ten source flat files of the same structure. How can we load all the files into the
target database in a single batch run using a single mapping?
Ans.
After we create a mapping to load data into the target database from flat files, we move on to the
session properties of the Source Qualifier. To load a set of source files we need to create a file, say
final.txt, containing the source flat file names (ten files in our case) and set the Source filetype option
to Indirect. Next, point to this flat file final.txt, fully qualified, through the Source file directory and Source
filename properties.
Image: Session Property Flat File
Q7. How can we implement an Aggregation operation without using an Aggregator Transformation
in Informatica?
Ans.
We will use a basic property of the Expression Transformation: while a row is being processed, we can
still access the previous row's data through variable ports. So a simple combination of Sorter, Expression
and Filter transformations is enough to achieve aggregation at the Informatica level.
For detailed understanding visit Aggregation without Aggregator
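A minimal sketch of the Expression port logic, assuming we want a per-DEPTNO running SUM of SAL over data already sorted by DEPTNO (the port names are illustrative, and the evaluation order of the variable ports matters):

V_SUM (variable)       : IIF(DEPTNO = V_PREV_DEPT, V_SUM + SAL, SAL)
V_PREV_DEPT (variable) : DEPTNO
O_RUNNING_SUM (output) : V_SUM

Because variable ports retain their value from the previous row, O_RUNNING_SUM grows within each sorted group and carries the complete group total on the last row of the group; a downstream Filter then keeps only those rows (the linked article shows one way to detect them).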
Q8. Suppose in our Source Table we have data as given below:

Student Name | Subject Name     | Marks
Sam          | Maths            | 100
Tom          | Maths            | 80
Sam          | Physical Science | 80
John         | Maths            | 75
Sam          | Life Science     | 70
John         | Life Science     | 100
John         | Physical Science | 85
Tom          | Life Science     | 100
Tom          | Physical Science | 85

We want to load our Target Table as:

Student Name | Maths | Life Science | Physical Science
Sam          | 100   | 70           | 80
John         | 75    | 100          | 85
Tom          | 80    | 100          | 85

Describe your approach.


Ans.
Here our scenario is to convert many rows to one row, and the transformation that will help us to
achieve this is the Aggregator. Our Mapping will look like this:

Mapping using sorter and Aggregator

We will sort the source data based on STUDENT_NAME ascending followed by SUBJECT ascending.

Sorter Transformation

Now, with STUDENT_NAME in the GROUP BY, the output subject columns are populated as:
MATHS: MAX(MARKS, SUBJECT='Maths')
LIFE_SC: MAX(MARKS, SUBJECT='Life Science')
PHY_SC: MAX(MARKS, SUBJECT='Physical Science')

Aggregator Transformation
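For readers who think in SQL, the Aggregator above behaves much like the following conditional aggregation (SOURCE_TABLE is an assumed name; in Informatica the condition sits inside MAX as shown above):

SELECT STUDENT_NAME,
       MAX(CASE WHEN SUBJECT = 'Maths' THEN MARKS END)            AS MATHS,
       MAX(CASE WHEN SUBJECT = 'Life Science' THEN MARKS END)     AS LIFE_SC,
       MAX(CASE WHEN SUBJECT = 'Physical Science' THEN MARKS END) AS PHY_SC
FROM SOURCE_TABLE
GROUP BY STUDENT_NAME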

Revisiting Source Qualifier Transformation


Q9. What is a Source Qualifier? What are the tasks we can perform using a SQ and why is it an
ACTIVE transformation?
Ans.
A Source Qualifier is an Active and Connected Informatica transformation that reads the rows from a
relational database or flat file source.
We can configure the SQ to join [Both INNER as well as OUTER JOIN] data originating from the same
source database.
We can use a source filter to reduce the number of rows the Integration Service queries.
We can specify a number for sorted ports and the Integration Service adds an ORDER BY clause to
the default SQL query.
We can choose Select Distinct option for relational databases and the Integration Service adds a
SELECT DISTINCT clause to the default SQL query.
Also, we can write a Custom/User-Defined SQL query which will override the default query in the SQ by
changing the default settings of the transformation properties.
Also, we have the option to write Pre as well as Post SQL statements to be executed before and after
the SQ query in the source database.
Since the transformation provides us with the Select Distinct property, the Integration Service adds a
SELECT DISTINCT clause to the default SQL query when it is set, which in turn changes the number of
rows returned by the database to the Integration Service; hence it is an Active transformation.
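As a hedged illustration of how these properties shape the generated SQL, with a source filter, two sorted ports and Select Distinct set, the default query would come out roughly like this (table, columns and filter value are assumed):

SELECT DISTINCT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.CITY
FROM CUSTOMERS
WHERE CUSTOMERS.CITY = 'Singapore'
ORDER BY CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME

Note that the Source Filter itself is entered without the WHERE keyword; the Integration Service adds it, as Q14 below explains.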
Q10. What happens to a mapping if we alter the datatypes between Source and its corresponding
Source Qualifier?
Ans.
The Source Qualifier transformation displays the transformation datatypes. The transformation
datatypes determine how the source database binds data when the Integration Service reads it.
Now, if we alter the datatypes in the Source Qualifier transformation, or if the datatypes in the source
definition and the Source Qualifier transformation do not match, the Designer marks the mapping as
invalid when we save it.
Q11. Suppose we have used the Select Distinct and the Number Of Sorted Ports property in the SQ
and then we add Custom SQL Query. Explain what will happen.
Ans.
Whenever we add a Custom SQL or SQL override query, it overrides the User-Defined Join, Source
Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation.
Hence only the user-defined SQL query will be fired in the database and all the other options will be
ignored.
Q12. Describe the situations where we will use the Source Filter, Select Distinct and Number Of Sorted
Ports properties of Source Qualifier transformation.
Ans.
Source Filter option is used basically to reduce the number of rows the Integration Service queries so
as to improve performance.
Select Distinct option is used when we want the Integration Service to select unique values from a
source, filtering out unnecessary data earlier in the data flow, which might improve performance.
Number Of Sorted Ports option is used when we want the source data to be in a sorted fashion so that
it can be used in following transformations like Aggregator or Joiner, which, when configured for
sorted input, improve performance.
Q13. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the
OUTPUT PORTS order in the SQ transformation do not match?
Ans.
A mismatch or change in the order of the list of selected columns relative to the connected transformation
output ports may result in session failure.
Q14. What happens if, in the Source Filter property of the SQ transformation, we include the keyword WHERE,
say WHERE CUSTOMERS.CUSTOMER_ID > 1000?
Ans.
We use the source filter to reduce the number of source records. If we include the string WHERE in the
source filter, the Integration Service fails the session.
Q15. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier
transformation.
Ans.
We use the Joiner transformation while joining source data from heterogeneous sources, as well as to
join flat files.
Use the Joiner transformation when we need to join the following types of sources:
Join data from different Relational Databases.
Join data from different Flat Files.
Join relational sources and flat files.
Q16. What is the maximum number we can use in Number Of Sorted Ports for a Sybase source system?
Ans.
Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do
not sort more than 16 columns.
Q17. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to Target tables
TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded after TGT1?
Ans.
If we have multiple Source Qualifier transformations connected to multiple targets, we can designate
the order in which the Integration Service loads data into the targets.
In the Mapping Designer, we need to configure the Target Load Plan based on the Source Qualifier
transformations in a mapping to specify the required loading order.
Image: Target Load Plan

Target Load Plan Ordering


Q18. Suppose we have a Source Qualifier transformation that populates two target tables. How do you
ensure TGT2 is loaded after TGT1?
Ans.
In the Workflow Manager, we can configure Constraint Based Load Ordering for a session. The
Integration Service orders the target load on a row-by-row basis. For every row generated by an active
source, the Integration Service loads the corresponding transformed row first to the primary key table,
then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables
having primary and foreign key relationships, we will go for Constraint Based Load Ordering.
Image: Constraint based loading

Revisiting Filter Transformation


Q19. What is a Filter Transformation and why is it an Active one?
Ans.
A Filter transformation is an Active and Connected transformation that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next
transformation in the pipeline. TRUE and FALSE are the implicit return values from any filter condition
we set. If the filter condition evaluates to NULL, the row is assumed to be FALSE.
The numeric equivalent of FALSE is zero (0) and any non-zero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed
through it. A filter condition returns TRUE or FALSE for each row that passes through the
transformation, depending on whether a row meets the specified condition. Only rows that return TRUE
pass through this transformation. Discarded rows do not appear in the session log or reject files.
Q20. What is the difference between Source Qualifier transformations Source Filter to Filter
transformation?
Ans.

SQ Source Filter | Filter Transformation
Source Qualifier transformation filters rows when read from a source. | Filter transformation filters rows from within a mapping.
Source Qualifier transformation can only filter rows from Relational Sources. | Filter transformation filters rows coming from any type of source system, at the mapping level.
Source Qualifier limits the row set extracted from a source. | Filter transformation limits the row set sent to a target.
Source Qualifier reduces the number of rows used throughout the mapping and hence provides better performance. | To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible to filter out unwanted data early in the flow of data from sources to targets.
The filter condition in the Source Qualifier transformation only uses standard SQL, as it runs in the database. | Filter transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.

Revisiting Joiner Transformation


Q21. What is a Joiner Transformation and why is it an Active one?
Ans.
A Joiner is an Active and Connected transformation used to join source data from the same source
system or from two related heterogeneous sources residing in different locations or file systems.
The Joiner transformation joins sources with at least one matching column. The Joiner transformation
uses a condition that matches one or more pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch.
The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
In the Joiner transformation, we must configure the transformation properties namely Join Condition,
Join Type and Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to
join two rows. Depending on the type of join selected, the Integration Service either adds the row to
the result set or discards the row .
The Joiner transformation produces result sets based on the join type, condition, and input data
sources. Hence it is an Active transformation.
Q22. State the limitations where we cannot use Joiner in the mapping pipeline.
Ans.
The Joiner transformation accepts input from most transformations. However, following are the
limitations:
Joiner transformation cannot be used when either of the input pipelines contains an Update
Strategy transformation.
Joiner transformation cannot be used if we connect a Sequence Generator transformation directly
before the Joiner transformation.
Q23. Out of the two input pipelines of a joiner, which one will you set as the master pipeline?
Ans.
During a session run, the Integration Service compares each row of the master source against the
detail source.
The master and detail sources need to be configured for optimal performance .

To improve performance for an Unsorted Joiner transformation, use the source with fewer rows as
the master source. The fewer unique rows in the master, the fewer iterations of the join comparison
occur, which speeds the join process.
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows
before it reads the detail rows. The Integration Service blocks the detail source while it caches rows
from the master source . Once the Integration Service reads and caches all master rows, it unblocks
the detail source and reads the detail rows.
To improve performance for a Sorted Joiner transformation, use the source with fewer duplicate key
values as the master source.
When the Integration Service processes a sorted Joiner transformation, it blocks data based on the
mapping configuration and it stores fewer rows in the cache, increasing performance. Blocking logic is
possible if master and detail input to the Joiner transformation originate from different sources .
Otherwise, it does not use blocking logic. Instead, it stores more rows in the cache.
Q24. What are the different types of Joins available in Joiner Transformation?
Ans.
In SQL, a join is a relational operator that combines data from multiple tables into a single result set.
The Joiner transformation is similar to an SQL join except that data can originate from different types of
sources.
The Joiner transformation supports the following types of joins :
Normal
Master Outer
Detail Outer
Full Outer

Join Type property of Joiner Transformation

Note: A normal or master outer join performs faster than a full outer or detail outer join.
Q25. Define the various Join Types of Joiner Transformation.
Ans.
In a normal join , the Integration Service discards all rows of data from the master and detail source
that do not match, based on the join condition.
A master outer join keeps all rows of data from the detail source and the matching rows from the
master source. It discards the unmatched rows from the master source.
A detail outer join keeps all rows of data from the master source and the matching rows from the
detail source. It discards the unmatched rows from the detail source.
A full outer join keeps all rows of data from both the master and detail sources.
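If it helps to map these onto SQL, the four join types correspond roughly to the queries below, with MASTER and DETAIL as assumed table names (in Informatica the preserved side is named after the pipeline, not after a SQL keyword):

Normal join       : SELECT * FROM DETAIL d INNER JOIN MASTER m ON d.key_col = m.key_col
Master outer join : SELECT * FROM DETAIL d LEFT OUTER JOIN MASTER m ON d.key_col = m.key_col   (all detail rows kept)
Detail outer join : SELECT * FROM MASTER m LEFT OUTER JOIN DETAIL d ON d.key_col = m.key_col   (all master rows kept)
Full outer join   : SELECT * FROM DETAIL d FULL OUTER JOIN MASTER m ON d.key_col = m.key_col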
Q26. Describe the impact of number of join conditions and join order in a Joiner Transformation.
Ans.
We can define one or more conditions based on equality between the specified master and detail
sources.
Both ports in a condition must have the same datatype . If we need to use two ports in the join
condition with non-matching datatypes we must convert the datatypes so that they match. The
Designer validates datatypes in a join condition.
Additional ports in the join condition increase the time necessary to join two sources.

The order of the ports in the join condition can impact the performance of the Joiner transformation. If
we use multiple ports in the join condition, the Integration Service compares the ports in the order we
specified.
NOTE: Only the equality operator is available in the Joiner join condition.
Q27. How does the Joiner transformation treat NULL value matching?
Ans.
The Joiner transformation does not match null values .
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service
does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and
then join on the default values.
Note: If a result set includes fields that do not contain data in either of the sources, the Joiner
transformation populates the empty fields with null values. If we know that a field will return a NULL
and we do not want to insert NULLs in the target, set a default value on the Ports tab for the
corresponding port.
Q28. Suppose we configure Sorter transformations in the master and detail pipelines with the following
sorted ports in order: ITEM_NO, ITEM_NAME, PRICE.
When we configure the join condition, what are the guidelines we need to follow to maintain the sort
order?
Ans.
If we have sorted both the master and detail pipelines in order of the ports say ITEM_NO,
ITEM_NAME and PRICE we must ensure that:
Use ITEM_NO in the First Join Condition.
If we add a Second Join Condition, we must use ITEM_NAME.
If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in
the Second Join Condition.
If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the
Integration Service fails the session .
Q29. What are the transformations that cannot be placed between the sort origin and the Joiner
transformation so that we do not lose the input sort order?
Ans.
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted
data.
However do not place any of the following transformations between the sort origin and the Joiner
transformation:
Custom
Unsorted Aggregator
Normalizer
Rank
Union transformation
XML Parser transformation

XML Generator transformation


Mapplet [if it contains any one of the above mentioned transformations]
Q30. Suppose we have the EMP table as our source. In the target we want to view those employees
whose salary is greater than or equal to the average salary for their department.
Describe your mapping approach.
Ans.
Our Mapping will look like this:
Image: Mapping using Joiner
To start with the mapping we need the following transformations:
After the Source qualifier of the EMP table place a Sorter Transformation . Sort based
on DEPTNO port.

Sorter Ports Tab

Next we place a Sorted Aggregator Transformation . Here we will find out the AVERAGE
SALARY for each (GROUP BY) DEPTNO .
When we perform this aggregation, we lose the data for individual employees. To maintain employee
data, we must pass a branch of the pipeline to the Aggregator Transformation and pass a branch with
the same sorted source data to the Joiner transformation to maintain the original data. When we join
both branches of the pipeline, we join the aggregated data with the original data.

Aggregator Ports Tab

Aggregator Properties Tab

So next we need a Sorted Joiner Transformation to join the sorted aggregated data with the original
data, based on DEPTNO.
Here we will be taking the aggregated pipeline as the Master and the original dataflow as the Detail pipeline.

Joiner Condition Tab

Joiner Properties Tab

After that we need a Filter Transformation to filter out the employees having salary less than average
salary for their department.
Filter Condition: SAL>=AVG_SAL

Filter Properties Tab

Lastly we have the Target table instance.
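For reference, the whole requirement could also be pushed down to the database as a single query; the mapping above mirrors this logic with Informatica transformations (EMP, DEPTNO and SAL are the familiar Oracle sample-schema names):

SELECT E.*
FROM EMP E
     INNER JOIN (SELECT DEPTNO, AVG(SAL) AS AVG_SAL
                 FROM EMP
                 GROUP BY DEPTNO) A
        ON E.DEPTNO = A.DEPTNO
WHERE E.SAL >= A.AVG_SAL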

Revisiting Sequence Generator Transformation


Q31. What is a Sequence Generator Transformation?
Ans.
A Sequence Generator transformation is a Passive and Connected transformation that generates
numeric values.
It is used to create unique primary key values, replace missing primary keys, or cycle through a
sequential range of numbers.
This transformation by default contains ONLY two OUTPUT ports, namely CURRVAL and NEXTVAL.
We cannot edit or delete these ports, nor can we add ports to this unique transformation.
We can create approximately two billion unique numeric values with the widest range from 1 to
2147483647.
Q32. Define the Properties available in Sequence Generator transformation in brief.
Ans.

Sequence Generator Property | Description
Start Value | Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. Default is 0.
Increment By | Difference between two consecutive values from the NEXTVAL port. Default is 1.
End Value | Maximum value generated by the Sequence Generator. After reaching this value the session will fail if the Sequence Generator is not configured to cycle. Default is 2147483647.
Current Value | Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence. Default is 1.
Cycle | If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
Number of Cached Values | Number of sequential values the Integration Service caches at a time. Default value for a standard Sequence Generator is 0. Default value for a reusable Sequence Generator is 1,000.
Reset | Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.

Q33. Suppose we have a source table populating two target tables. We connect the NEXTVAL port of
the Sequence Generator to the surrogate keys of both the target tables.
Will the Surrogate keys in both the target tables be same? If not how can we flow the same sequence
values in both of them.
Ans.
When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key
columns of the target tables, the sequence numbers will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target
receives a block of sequence numbers from the Sequence Generator transformation only after the first
target table receives its block of sequence numbers.
Suppose we have 5 rows coming from the source; the targets will then have the sequence values
TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10). [Taking into consideration Start Value 0, Current Value 1 and
Increment By 1.]
Now suppose the requirement is like that we need to have the same surrogate keys in both the
targets.
Then the easiest way to handle the situation is to put an Expression Transformation in between the
Sequence Generator and the Target tables. The SeqGen will pass unique values to the expression
transformation, and then the rows are routed from the expression transformation to the targets.

Sequence Generator
Q34. Suppose we have 100 records coming from the source. Now for a target column population we
used a Sequence generator.
Suppose the Current Value is 0 and End Value of Sequence generator is set to 80. What will happen?
Ans.
End Value is the maximum value the Sequence Generator will generate. After it reaches the End value
the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
The session failure can be avoided if the Sequence Generator is configured to Cycle through the
sequence, i.e. whenever the Integration Service reaches the configured end value for the sequence, it
wraps around and starts the cycle again, beginning with the configured Start Value.
Q35. What are the changes we observe when we promote a non-reusable Sequence Generator to a
reusable one?
And what happens if we set the Number of Cached Values to 0 for a reusable transformation?
Ans.

When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of
Cached Values is set to 1000 by default, and the Reset property is disabled.
When we try to set the Number of Cached Values property of a Reusable Sequence Generator to 0
in the Transformation Developer we encounter the following error message:
The number of cached values must be greater than zero for reusable sequence transformation.

Which is the fastest? Informatica or Oracle?


In our previous article, we tested the performance of the ORDER BY operation in Informatica and
Oracle and found that, in our test condition, Oracle performs sorting 14% faster than Informatica.
This time we will look into the JOIN operation, not only because JOIN is the single most important data
set operation but also because the performance of JOIN can give crucial data to a developer in order to
develop proper push down optimization manually.
Informatica is one of the leading data integration tools in today's world. More than 4,000 enterprises
worldwide rely on Informatica to access, integrate and trust their information assets with it. On the
other hand, Oracle database is arguably the most successful and powerful RDBMS system, trusted
since the 1980s in all sorts of business domains and across all major platforms. Both of these
systems are best in the technologies that they support. But when it comes to application
development, developers often face the challenge of striking the right balance of operational load sharing
between these systems. This article will help them take an informed decision.

Which JOINs data faster? Oracle or Informatica?


As an application developer, you have the choice of either using joining syntaxes in database level to
join your data or using JOINER TRANSFORMATION in Informatica to achieve the same outcome. The
question is which system performs this faster?

Test Preparation
We will perform the same test with 4 different data points (data volumes) and log the results. We will
start with 1 million data in detail table and 0.1 million in master table. Subsequently we will test with 2
million, 4 million and 6 million detail table data volumes and 0.2 million, 0.4 million and 0.6 million
master table data volumes. Here are the details of the setup we will use,
1. Oracle 10g database as relational source and target
2. Informatica PowerCentre 8.5 as ETL tool
3. Database and Informatica setup on different physical servers using HP UNIX
4. Source database table has no constraint, no index, no database statistics and no partition
5. Source database table is not available in Oracle shared pool before the same is read
6. There is no session level partition in Informatica PowerCentre
7. There is no parallel hint provided in extraction SQL query
8. Informatica JOINER has enough cache size
We have used two sets of Informatica PowerCentre mappings created in Informatica PowerCentre
designer. The first mapping m_db_side_join will use an INNER JOIN clause in the source qualifier to
join data at database level. The second mapping m_Infa_side_join will use an Informatica JOINER to join
data at Informatica level. We have executed these mappings with different data points and logged the
results.

Further to the above test we will execute m_db_side_join mapping once again, this time with proper
database side indexes and statistics and log the results.

Result
The following graph shows the performance of Informatica and the database in terms of time taken by
each system to join data. The average time is plotted along the vertical axis and data points are
plotted along the horizontal axis.

Data Point | Master Table Record Count | Detail Table Record Count
1          | 0.1 M                     | 1 M
2          | 0.2 M                     | 2 M
3          | 0.4 M                     | 4 M
4          | 0.6 M                     | 6 M

Verdict
In our test environment, Oracle 10g performs the JOIN operation 24% faster than the Informatica Joiner
Transformation without a database index, and 42% faster with a database index.
Assumption
1. Average server load remains same during all the experiments
2. Average network speed remains same during all the experiments

Note

1. This data can only be used for performance comparison but cannot be used for performance
benchmarking.
2. This data is only indicative and may vary in different testing conditions.

Which is the fastest? Informatica or Oracle?


Think about a typical ETL operation often used in enterprise level data integration. A lot of data
processing can be redirected either to the database or to the ETL tool. In general, both the database
and the ETL tool are reasonably capable of doing such operations with almost the same efficiency and
capability. But in order to achieve optimized performance, a developer must carefully consider and
decide which system to trust for each individual processing task.
In this article, we will take a basic database operation, Sorting, and we will put these two systems to the
test in order to determine which does it faster than the other, if at all.

Which sorts data faster? Oracle or Informatica?


As an application developer, you have the choice of either using ORDER BY in database level to sort
your data or using SORTER TRANSFORMATION in Informatica to achieve the same outcome. The
question is which system performs this faster?

Test Preparation
We will perform the same test with different data points (data volumes) and log the results. We will start
with 1 million records and we will be doubling the volume for each next data points. Here are the
details of the setup we will use,
1. Oracle 10g database as relational source and target
2. Informatica PowerCentre 8.5 as ETL tool
3. Database and Informatica setup on different physical servers using HP UNIX
4. Source database table has no constraint, no index, no database statistics and no partition
5. Source database table is not available in Oracle shared pool before the same is read
6. There is no session level partition in Informatica PowerCentre
7. There is no parallel hint provided in extraction SQL query
8. The source table has 10 columns and first 8 columns will be used for sorting
9. Informatica sorter has enough cache size
We have used two sets of Informatica PowerCentre mappings created in Informatica PowerCentre
designer. The first mapping m_db_side_sort will use an ORDER BY clause in the source qualifier to
sort data at database level. The second mapping m_Infa_side_sort will use an Informatica sorter to sort
data at Informatica level. We have executed these mappings with different data points and logged the
results.

Result

The following graph shows the performance of Informatica and Database in terms of time taken by
each system to sort data. The time is plotted along vertical axis and data volume is plotted along
horizontal axis.

Verdict
The above experiment demonstrates that the Oracle database is faster in the SORT operation
than Informatica by an average factor of 14%.
Assumption
1. Average server load remains same during all the experiments
2. Average network speed remains same during all the experiments

Note
This data can only be used for performance comparison but cannot be used for performance
benchmarking.

Informatica Reject File - How to Identify rejection reason


Saurav Mitra

When we run a session, the Integration Service may create a reject file for each target instance in the
mapping to store the target reject records. With the help of the Session Log and the Reject File we can
identify the cause of data rejection in the session. Eliminating the cause of rejection will lead to
rejection-free loads in the subsequent session runs. If the Informatica Writer or the Target
Database rejects data for any valid reason, the Integration Service logs the rejected records into the
reject file. Every time we run the session, the Integration Service appends the rejected records to the
reject file.

Working with Informatica Bad Files or Reject Files


By default the Integration Service creates the reject files or bad files in the $PMBadFileDir process
variable directory. It writes the entire reject record row in the bad file although the problem may be in
any one of the columns. The reject files have a default naming convention
like [target_instance_name].bad. If we open the reject file in an editor we will see comma separated
values having some tags/indicators and some data values. We will see two types of Indicators in the
reject file. One is the Row Indicator and the other is the Column Indicator.
For reading the bad file, the best method is to copy the contents of the bad file and save the same as
a CSV (Comma Separated Value) file. Opening the csv file will give an excel sheet type look and feel.
The first column in the reject file is the Row Indicator, which determines whether the row was
destined for insert, update, delete or reject. It is basically a flag that determines the Update Strategy for
the data row. When the Commit Type of the session is configured as User-defined, the row indicator
indicates whether the transaction was rolled back due to a non-fatal error, or if the committed
transaction was in a failed target connection group.

List of Values of Row Indicators:

Row Indicator | Indicator Significance | Rejected By
0             | Insert                 | Writer or target
1             | Update                 | Writer or target
2             | Delete                 | Writer or target
3             | Reject                 | Writer
4             | Rolled-back insert     | Writer
5             | Rolled-back update     | Writer
6             | Rolled-back delete     | Writer
7             | Committed insert       | Writer
8             | Committed update       | Writer
9             | Committed delete       | Writer

Now come the Column Data values followed by their Column Indicators, which determine the data
quality of the corresponding column.

List of Values of Column Indicators:

Column Indicator | Type of data | Writer Treats As
D | Valid data or Good Data | Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key while inserting.
O | Overflowed Numeric Data | Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N | Null Value | The column contains a null value. Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.
T | Truncated String Data | String data exceeded a specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.

Also to be noted: the second column contains the column indicator flag value 'D', which signifies that
the Row Indicator (the first column) is valid.
Now let us see what data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
Reading this row: 0 (insert) is the row indicator, 7 and John are valid data (D), 5000.375 is flagged as
overflowed numeric data (O), the empty value is a null (N), and the address is flagged as truncated string data (T).

Implementing Informatica Incremental Aggregation

Using incremental aggregation, we apply captured changes in the source data (the CDC part) to aggregate
calculations in a session. If the source changes incrementally and we can capture the changes, then
we can configure the session to process those changes. This allows the Integration Service to update
the target incrementally, rather than forcing it to delete the previous load's data, process the entire source
data and recalculate the same data each time you run the session.

Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of Jan, we
will use the entire source. This allows the Integration Service to read and store the necessary
aggregate data information. In the 2nd week of Jan, when we run the session again, we will filter out the
CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then
processes this new data and updates the target accordingly.
Use incremental aggregation when the changes do not significantly change the target. If
processing the incrementally changed source alters more than half the existing target, the session may
not benefit from using incremental aggregation. In this case, drop the table, recreate the target with the
entire source data and recalculate the same aggregation formula.
Incremental aggregation may be helpful in cases when we need to load data into monthly
facts on a weekly basis.
Let us see a sample mapping to implement incremental aggregation:
Image: Incremental Aggregation Sample Mapping
Look at the Source Qualifier query to fetch the CDC part using a BATCH_LOAD_CONTROL
table that saves the last successful load date for the particular mapping.
Image: Incremental Aggregation Source Qualifier
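A hedged sketch of such a CDC extraction query; the column names of BATCH_LOAD_CONTROL and of the source table are assumptions for illustration, not taken from the screenshot:

SELECT S.CUSTOMER_KEY, S.INVOICE_KEY, S.AMOUNT, S.LOAD_DATE
FROM SALES_SOURCE S
WHERE S.LOAD_DATE > (SELECT MAX(LAST_LOAD_DATE)
                     FROM BATCH_LOAD_CONTROL
                     WHERE MAPPING_NAME = 'm_incremental_agg')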

Look at the ports tab of Expression transformation.

Look at the ports tab of Aggregator Transformation.

Now the most important session properties configuration to implement incremental aggregation:

If we want to reinitialize the aggregate cache, say during the first week of every month, we will
configure another session identical to the previous one, the only change being that the Reinitialize
aggregate cache property is checked.

Now have a look at the source table data:

CUSTOMER_KEY | INVOICE_KEY | AMOUNT | LOAD_DATE
1111         | 5001        | 100    | 01/01/2010
2222         | 5002        | 250    | 01/01/2010
3333         | 5003        | 300    | 01/01/2010
1111         | 6007        | 200    | 07/01/2010
1111         | 6008        | 150    | 07/01/2010
2222         | 6009        | 250    | 07/01/2010
4444         | 1234        | 350    | 07/01/2010
5555         | 6157        | 500    | 07/01/2010

After the first Load on 1st week of Jan 2010, the data in the target is as follows:

CUSTOMER_KEY | INVOICE_KEY | MON_KEY | AMOUNT
1111         | 5001        | 201001  | 100
2222         | 5002        | 201001  | 250
3333         | 5003        | 201001  | 300

Now during the 2nd week's load it will process only the incremental data in the source, i.e. those records
having a load date greater than the last session run date. After the 2nd week's load, incremental
aggregation of the incremental source data with the aggregate cache file data will update the target
table with the following dataset:

CUSTOMER_KEY | INVOICE_KEY | MON_KEY | AMOUNT | Remarks/Operation
1111         | 6008        | 201001  | 450    | The cache file updated after aggregation
2222         | 6009        | 201001  | 500    | The cache file updated after aggregation
3333         | 5003        | 201001  | 300    | The cache file remains the same as before
4444         | 1234        | 201001  | 350    | New group row inserted in cache file
5555         | 6157        | 201001  | 500    | New group row inserted in cache file

The first time we run an incremental aggregation session, the Integration Service processes the entire
source. At the end of the session, the Integration Service stores aggregate data for that session run in

two files: the index file and the data file. The Integration Service creates the files in the cache directory
specified in the Aggregator transformation properties. Each subsequent time we run the session with
incremental aggregation, we use the incremental source changes in the session. For each input
record, the Integration Service checks historical information in the index file for a corresponding group.
If it finds a corresponding group, the Integration Service performs the aggregate operation
incrementally, using the aggregate data for that group, and saves the incremental change. If it does not
find a corresponding group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the existing target. It saves
modified aggregate data in the index and data files to be used as historical data the next time you run
the session.
Each subsequent time we run a session with incremental aggregation, the Integration Service creates
a backup of the incremental aggregation files. The cache directory for the Aggregator transformation
must contain enough disk space for two sets of the files.
The Integration Service creates new aggregate data, instead of using historical data, when we
configure the session to reinitialize the aggregate cache, delete the cache files, etc.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is
lost.
Note: To protect the incremental aggregation files from file corruption or disk failure,
periodically back up the files.

Using Informatica Normalizer Transformation


Saurav Mitra
Normalizer, a native transformation in Informatica, can ease many complex data transformation
requirements. Learn how to use the Normalizer effectively here.

Using Normalizer Transformation


A Normalizer is an Active transformation that returns multiple rows from a source row; it returns
duplicate data for single-occurring source columns. The Normalizer transformation parses multiple-occurring
columns from COBOL sources, relational tables, or other sources. Normalizer can be used
to transpose the data in columns to rows.
Normalizer effectively does the opposite of what Aggregator does!

Example of Data Transpose using Normalizer


Think of a relational table that stores four quarters of sales by store, and we need to create a row for
each sales occurrence. We can configure a Normalizer transformation to return a separate row for
each quarter, like below:

The following source rows contain four quarters of sales by store:


Source Table

Store  | Quarter1 | Quarter2 | Quarter3 | Quarter4
Store1 | 100      | 300      | 500      | 700
Store2 | 250      | 450      | 650      | 850

The Normalizer returns a row for each store and sales combination. It also returns an index (GCID) that
identifies the quarter number:

Target Table

Store   | Sales | Quarter
Store 1 | 100   | 1
Store 1 | 300   | 2
Store 1 | 500   | 3
Store 1 | 700   | 4
Store 2 | 250   | 1
Store 2 | 450   | 2
Store 2 | 650   | 3
Store 2 | 850   | 4

How Informatica Normalizer Works


Suppose we have the following data in source:

Name | Month | Transportation | House Rent | Food
Sam  | Jan   | 200            | 1500       | 500
John | Jan   | 300            | 1200       | 300
Tom  | Jan   | 300            | 1350       | 350
Sam  | Feb   | 300            | 1550       | 450
John | Feb   | 350            | 1200       | 290
Tom  | Feb   | 350            | 1400       | 350

and we need to transform the source data and populate this as below in the target table:

Name | Month | Expense Type | Expense
Sam  | Jan   | Transport    | 200
Sam  | Jan   | House rent   | 1500
Sam  | Jan   | Food         | 500
John | Jan   | Transport    | 300
John | Jan   | House rent   | 1200
John | Jan   | Food         | 300
Tom  | Jan   | Transport    | 300
Tom  | Jan   | House rent   | 1350
Tom  | Jan   | Food         | 350

... and so on.
Now below is the screenshot of a complete mapping which shows how to achieve this result using
Informatica PowerCenter Designer.
Image: Normalization Mapping Example 1
I will explain the mapping further below.

Setting Up Normalizer Transformation Property


First we need to set the number of occurrences property of the Expense head as 3 in the Normalizer
tab of the Normalizer transformation, since we have Food, House Rent and Transportation.
This in turn will create the corresponding 3 input ports in the Ports tab, along with the fields Individual
and Month.

In the Ports tab of the Normalizer the ports will be created automatically, as configured in the
Normalizer tab.
Interestingly, we will observe two new columns, namely:

GK_EXPENSEHEAD
GCID_EXPENSEHEAD

The GK field generates a sequence number starting from the value defined in the Sequence field, while GCID
holds the value of the occurrence field, i.e. the column number of the input Expense head.

Here 1 is for FOOD, 2 is for HOUSERENT and 3 is for TRANSPORTATION.

Now the GCID tells us which expense corresponds to which field while converting columns to rows.
Below is the screenshot of the expression to handle this GCID efficiently:
Image: Expression to handle GCID
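The expression typically boils down to a DECODE on the generated column id; a minimal sketch with assumed port names, following the 1/2/3 assignment above:

EXPENSE_TYPE: DECODE(GCID_EXPENSEHEAD, 1, 'Food', 2, 'House rent', 3, 'Transport')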

Informatica Dynamic Lookup Cache


A Lookup cache does not change once built. But what if the underlying lookup table changes its data
after the lookup cache is created? Is there a way so that the cache always remains up-to-date even if
the underlying table changes?

Dynamic Lookup Cache


Let's think about this scenario. You are loading your target table through a mapping. Inside the
mapping you have a Lookup and in the Lookup, you are actually looking up the same target
table you are loading. You may ask me, "So? What's the big deal? We all do it quite often...".
And yes, you are right. There is no "big deal" because Informatica (generally) caches the lookup table
at the very beginning of the mapping, so whatever records get inserted into the target table through
the mapping will have no effect on the Lookup cache. The lookup will still hold the previously cached
data, even if the underlying target table is changing.
But what if you want your Lookup cache to get updated as and when the target table is changing?
What if you want your lookup cache to always show the exact snapshot of the data in your target table
at that point in time? Clearly this requirement will not be fulfilled if you use a static cache. You
will need a dynamic cache to handle this.

But why would anyone need a dynamic cache?


To understand this, let's first understand a static cache scenario.



Static Cache Scenario
Let's suppose you run a retail business and maintain all your customer information in a customer
master table (an RDBMS table). Every night, all the customers from your customer master table are loaded
into a Customer Dimension table in your data warehouse. Your source customer table is a transaction
system table, probably in 3rd normal form, and does not store history. Meaning, if a customer changes
his address, the old address is updated with the new address. But your data warehouse table stores
the history (may be in the form of SCD Type-II). There is a mapping that loads your data warehouse table
from the source table. Typically you do a Lookup on the target (static cache) and check every
incoming customer record to determine if the customer already exists in the target or not. If the
customer does not already exist in the target, you conclude the customer is new and INSERT the record,
whereas if the customer already exists, you may want to update the target record with this new
record (if the record is updated). This is illustrated below. You don't need a dynamic Lookup cache for
this.
Image: A static Lookup Cache to determine if a source record is new or updatable
Dynamic Lookup Cache Scenario
Notice in the previous example I mentioned that your source table is an RDBMS table. This ensures
that your source table does not have any duplicate record.
But what if you had a flat file as source, with many duplicate records?
Would the scenario be the same? No, see the illustration below.

Image: A Scenario illustrating the use of dynamic lookup cache


Here are some more examples when you may consider using dynamic lookup,

Updating a master customer table with both new and updated customer information
coming together as shown above
Loading data into a slowly changing dimension table and a fact table at the same
time. Remember, you typically lookup the dimension while loading to fact. So you load dimension table
before loading fact table. But using dynamic lookup, you can load both simultaneously.
Loading data from a file with many duplicate records and to eliminate duplicate
records in target by updating a duplicate row i.e. keeping the most recent row or the initial row

Loading the same data from multiple sources using a single mapping. Just consider
the previous retail business example. If you have more than one shop and Linda has visited two of
your shops for the first time, the customer record for Linda will come twice during the same load.

So, How does dynamic lookup work?


When the Integration Service reads a row from the source, it updates the lookup cache by performing
one of the following actions:

Inserts the row into the cache: If the incoming row is not in the cache, the
Integration Service inserts the row in the cache based on input ports or generated Sequence-ID. The
Integration Service flags the row as insert.
Updates the row in the cache: If the row exists in the cache, the Integration Service
updates the row in the cache based on the input ports. The Integration Service flags the row as
update.
Makes no change to the cache: This happens when the row exists in the cache and
the lookup is configured to insert new rows only; or the row is not in the cache and the
lookup is configured to update existing rows only; or the row is in the cache but, based on the lookup
condition, nothing changes. The Integration Service flags the row as unchanged.
Notice that the Integration Service actually flags the rows based on the above three conditions.
And that's a great thing, because, if you know the flag, you can actually reroute the row to achieve
different logic. This flag port is called

NewLookupRow
Using the value of this port, the rows can be routed for insert, update or to do nothing. You just need to
use a Router or Filter transformation followed by an Update Strategy.
Oh, forgot to tell you the actual values that you can expect in NewLookupRow port are:

0 = Integration Service does not update or insert the row in the cache.
1 = Integration Service inserts the row into the cache.
2 = Integration Service updates the row in the cache.
When the Integration Service reads a row, it changes the lookup cache depending on the results of the
lookup query and the Lookup transformation properties you define. It assigns the value 0, 1, or 2 to the
NewLookupRow port to indicate if it inserts or updates the row in the cache, or makes no change.
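A minimal sketch of that routing, assuming a Router with two user-defined groups feeding Update Strategy transformations (DD_INSERT and DD_UPDATE are the standard update strategy constants):

Router group NEW      : NewLookupRow = 1  -->  Update Strategy expression DD_INSERT
Router group CHANGED  : NewLookupRow = 2  -->  Update Strategy expression DD_UPDATE
Default group (NewLookupRow = 0)          -->  dropped or left unconnected, since nothing changed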

Posted 2nd June 2012 by Shankar Prasad


Data Warehouse and Informatica Interview Questions


*******************Shankar Prasad*******************************

1. Can 2 Fact Tables share the same Dimension Tables? How many Dimension tables are
associated with one Fact Table in your project?
Ans: Yes.
2.What is ROLAP, MOLAP, and DOLAP...?
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP
(Desktop OLAP). In these three OLAP
architectures, the interface to the analytic layer is typically the same; what is
quite different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best
implemented by storing the data multidimensionally; that is,
data must be stored multidimensionally in order to be viewed in a
multidimensional manner.
In ROLAP, the premise is that the data should be stored in the relational model; that is,
OLAP capabilities are best provided
against the relational database.
DOLAP, is a variation that exists to provide portability for the OLAP user. It
creates multidimensional datasets that can be
transferred from server to desktop, requiring only the DOLAP software to exist
on the target system. This provides significant
advantages to portable computer users, such as salespeople who are
frequently on the road and do not have direct access to
their office server.
3.What is an MDDB? and What is the difference between MDDBs and RDBMSs?
Ans: Multidimensional Database. There are two primary technologies that are
used for storing the data used in OLAP applications.
These two technologies are multidimensional databases (MDDB) and relational
databases (RDBMS). The major difference
between MDDBs and RDBMSs is in how they store data. Relational
databases store their data in a series of tables and
columns. Multidimensional databases, on the other hand, store their data
in large multidimensional arrays.
For example, in an MDDB world, you might refer to a sales figure as Sales
with Date, Product, and Location coordinates of
12-1-2001, Car, and south, respectively.

Advantages of MDDB:
Retrieval is very fast because
The data corresponding to any combination of dimension members can be retrieved
with a single I/O.
Data is clustered compactly in a multidimensional array.
Values are caluculated ahead of time.

The index is small and can therefore usually reside completely in memory.
Storage is very efficient because
The blocks contain only data.
A single index locates the block corresponding to a combination of sparse
dimension numbers.
4. What is MDB modeling and RDB Modeling?
Ans:
5. What is Mapplet and how do u create Mapplet?
Ans: A mapplet is a reusable object that represents a set of transformations. It
allows you to reuse transformation logic and can
contain as many transformations as you need.
Create a mapplet when you want to use a standardized set of transformation
logic in several mappings. For example, if you
have several fact tables that require a series of dimension keys, you can
create a mapplet containing a series of Lookup
transformations to find each dimension key. You can then use the mapplet in
each fact table mapping, rather than recreate the
same lookup logic in each mapping.
To create a new mapplet:
1. In the Mapplet Designer, choose Mapplets-Create Mapplet.
2. Enter a descriptive mapplet name.
The recommended naming convention for mapplets is mpltMappletName.
3. Click OK.
The Mapping Designer creates a new mapplet in the Mapplet Designer.
4. Choose Repository-Save.
6. What are transformations used for?
Ans: Transformations are the manipulation of data from how it appears in the source system(s) into
another form in the data warehouse or mart, in a way that enhances or simplifies its meaning. In short,
you transform data into information.

This includes data merging, cleansing, and aggregation:
Datamerging: the process of standardizing data types and fields. Suppose one source system calls
integer-type data smallint whereas another calls similar data decimal; the data from the two source
systems needs to be rationalized when moved into the Oracle data format called number.
Cleansing: this involves identifying and correcting any inconsistencies or inaccuracies.
Eliminating inconsistencies in the data from multiple sources.
Converting data from different systems into a single consistent data set suitable for analysis.
Meeting a standard for establishing data elements, codes, domains, formats and naming conventions.
Correcting data errors and filling in missing data values.
Aggregation: the process whereby multiple detailed values are combined into a single summary value,
typically summation numbers representing dollars spent or units sold.
Generates summarized data for use in aggregate fact and dimension tables (a SQL sketch follows this list).
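A minimal SQL sketch of the aggregation step referred to above (the table and column names are hypothetical, not taken from the text):

-- Roll detail rows up to one summary row per product per day
SELECT product_id,
       sales_date,
       SUM(amount)   AS total_amount,
       SUM(quantity) AS total_units
  FROM sales_detail
 GROUP BY product_id, sales_date;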

Data transformation is an interesting concept in that some transformation can occur during the extract,
some during the transformation, or even, in limited cases, during the load portion of the ETL process.
The type of transformation function u need will most often determine where it should be performed.
Some transformation functions could even be performed in more than one place, because many of the
transformations u will want to perform already exist in some form or another in more than one of the
three environments (source database or application, ETL tool, or the target db).
7. What is the difference btween OLTP & OLAP?
Ans: OLTP stand for Online Transaction Processing. This is standard, normalized
database structure. OLTP is designed for
Transactions, which means that inserts, updates, and deletes must be fast. Imagine
a call center that takes orders. Call takers are continually taking calls and entering
orders that may contain numerous items. Each order and each item must be inserted
into a database. Since the performance of the database is critical, we want to maximize
the speed of inserts (and updates and deletes). To maximize performance, we
typically try to hold as few records in the database as possible.
OLAP stands for Online Analytical Processing. OLAP is a term that means many
things to many people. Here, we will use the term OLAP and Star Schema pretty
much interchangeably. We will assume that a star schema database is an OLAP
system. (This is not the same thing that Microsoft calls OLAP; they extend OLAP to
mean the cube structures built using their product, OLAP Services.) Here, we will
assume that any system of read-only, historical, aggregated data is an OLAP
system.
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is
almost always used to support decision-making in the organization. That is why
many data warehouses are considered to be DSS (Decision Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only,
historical, aggregated data.
By read-only, we mean that the person looking at the data won't be changing it. If
a user wants to look at the sales yesterday for a certain product, they should not have the
ability to change that number.
The historical part may just be a few minutes old, but usually it is at least a day
old. A data warehouse usually holds data that goes back a certain period in time,
such as five years. In contrast, standard OLTP systems usually only hold data as long
as it is current or active. An order table, for example, may move orders to an
archive table once they have been completed, shipped, and received by the
customer.
When we say that data warehouses and data marts hold aggregated data, we need
to stress that there are many levels of aggregation in a typical data warehouse.
8. If the data source is in the form of an Excel spreadsheet, then how do you use it?
Ans: PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file.
Like relational sources, the Designer uses ODBC to import a Microsoft Excel source. You do not need
database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the
following tasks:
Install the Microsoft Excel ODBC driver on your system.
Create a Microsoft Excel ODBC data source for each source file in the
ODBC 32-bit Administrator.
Prepare Microsoft Excel spreadsheets by defining ranges and
formatting columns of numeric data.
Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the
Designer. Ranges display as source definitions
when you import the source.
9. Which DBs are RDBMS and which are MDDB? Can u name them?
Ans: MDDB ex. Oracle Express Server (OES), Essbase by Hyperion Software, PowerPlay by Cognos; and
RDBMS ex. Oracle, SQL Server etc.

10. What are the modules/tools in Business Objects? Explain their purpose briefly.
Ans: BO Designer, Business Query for Excel, BO Reporter, InfoView, Explorer, WebIntelligence (WEBI),
BO Publisher, Broadcast Agent, and ZABO.
InfoView: IT portal entry into WebIntelligence & Business Objects.
Base module required for all options to view and refresh reports.
Reporter: Upgrade to create/modify reports on LAN or Web.
Explorer: Upgrade to perform OLAP processing on LAN or Web.
Designer: Creates semantic layer between user and database.
Supervisor: Administer and control access for group of users.
WebIntelligence: Integrated query, reporting, and OLAP analysis over the
Web.
Broadcast Agent: Used to schedule, run, publish, push, and broadcast prebuilt reports and spreadsheets,
including event notification and response capabilities, event filtering, and calendar-based notification,
over the LAN, email, pager, fax, Personal Digital Assistant (PDA), Short Messaging Service (SMS), etc.
Set Analyzer: Applies set-based analysis to perform functions such as exclusion, intersections, unions,
and overlaps visually.
Developer Suite: Build packaged, analytical, or customized apps.
11. What are Ad hoc queries and Canned Queries/Reports? And how do u create them?
(Plz check this page: C:\BObjects\Quries\Data Warehouse - About Queries.htm)
Ans: The data warehouse will contain two types of query. There will be fixed
queries that are clearly defined and well understood, such as regular reports,
canned queries (standard reports) and common aggregations. There will also
be ad hoc queries that are unpredictable, both in quantity and frequency.

Ad Hoc Query: Ad hoc queries are the starting point for any analysis into a database. Any business
analyst wants to know what is inside the database. He then proceeds by calculating totals, averages,
maximum and minimum values for most attributes within the database. These are the unpredictable
element of a data warehouse. It is exactly that ability to run any query when desired and expect a
reasonable response that makes the data warehouse worthwhile, and makes the design such a significant
challenge.
The end-user access tools are capable of automatically generating the database query that answers any
question posed by the user. The user will typically pose questions in terms that they are familiar with
(for example, sales by store last week); this is converted into the database query by the access tool,
which is aware of the structure of information within the data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries contain
prompts that allow you to customize the query for your specific needs. For example, a prompt may ask
you for a school, department, term, or section ID. In this instance you would enter the name of the
school, department or term, and the query will retrieve the specified data from the warehouse. You can
measure the resource requirements of these queries, and the results can be used for capacity planning
and for database design.
The main reason for using a canned query or report rather than creating your own is that your chances
of misinterpreting data or getting the wrong answer are reduced. You are assured of getting the right
data and the right answer.
12. How many Fact tables and how many dimension tables u did? Which table
precedes what?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp
13. What is the difference between STAR SCHEMA & SNOW FLAKE SCHEMA?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp
14. Why did u choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?
Ans: Because of its denormalized structure, i.e., the dimension tables are denormalized. Why
denormalize? The first (and often only) answer is: speed. An OLTP structure is designed for data
inserts, updates, and deletes, but not for data retrieval. Therefore, we can often squeeze some speed
out of it by denormalizing some of the tables and having queries go against fewer tables. These queries
are faster because they perform fewer joins to retrieve the same record set (see the SQL sketch after
the list below). Joins are also confusing to many end users. By denormalizing, we can present the user
with a view of the data that is far easier for them to understand.

Benefits of STAR SCHEMA:
- Far fewer tables.
- Designed for analysis across time.
- Simplifies joins.
- Less database space.
- Supports drilling in reports.
- Flexibility to meet business and technical needs.
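To illustrate the "fewer joins" point, here is a hedged sketch of a typical star-schema query (SALES_FACT, PRODUCT_DIM and STORE_DIM are hypothetical tables, not taken from the text); the query touches only the fact table and the dimensions it groups by:

-- Sales by product and region straight off the star schema
SELECT p.product_name,
       s.region,
       SUM(f.sales_amount) AS total_sales
  FROM sales_fact  f,
       product_dim p,
       store_dim   s
 WHERE f.product_key = p.product_key
   AND f.store_key   = s.store_key
 GROUP BY p.product_name, s.region;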
15. How do u load the data using Informatica?
Ans: Using session.

16. (i) What is FTP? (ii) How do u connect to a remote machine? (iii) Is there another way to
use FTP without a special utility?
Ans: (i): The FTP (File Transfer Protocol) utility program is commonly used for
copying files to and from other computers. These
computers may be at the same site or at different sites thousands of miles
apart. FTP is a general protocol that works on UNIX systems as well as other non-UNIX systems.
(ii): Remote connect commands:
ftp machinename
ex: ftp 129.82.45.181 or ftp iesg
If the remote machine has been reached successfully, FTP responds by asking for a login name and
password. When u enter ur own login name and password for the remote machine, it returns the prompt
like below
ftp>
and permits u access to ur own home directory on the remote machine. U should be able to move around
in ur own directory and to copy files to and from ur local machine using the FTP interface commands.
Note: U can set the mode of file transfer to ASCII (the default; it transmits seven bits per character).
Use the ASCII mode with any of the following:
- Raw data (e.g. *.dat or *.txt, codebooks, or other plain text documents)
- SPSS Portable files
- HTML files
If u set the mode of file transfer to Binary, the binary mode transmits all eight bits per byte and thus
provides less chance of a transmission error, and must be used to transmit files other than ASCII files.
For example use binary mode for the following types of files:
- SPSS System files
- SAS Dataset
- Graphic files (eg., *.gif, *.jpg, *.bmp, etc.)
- Microsoft Office documents (*.doc, *.xls, etc.)

(iii): Yes. If u r using Windows, u can access a text-based FTP utility from a DOS prompt.
To do this, perform the following steps:
1. From the Start menu, choose Programs > MS-DOS Prompt.
2. Enter ftp ftp.geocities.com. A prompt will appear.
   (or) Enter ftp to get the ftp prompt, then ftp> open hostname, ex. ftp> open ftp.geocities.com
   (it connects to the specified host).
3. Enter ur Yahoo! GeoCities member name.
4. Enter ur Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.
17.What cmd is used to transfer multiple files at a time using FTP?

Ans: mget ==> To copy multiple files from the remote machine to the local machine. You will be
prompted for a y/n answer before transferring each file. mget * copies all files in the current
remote directory to ur current local directory, using the same file names.
mput ==> To copy multiple files from the local machine to the remote
machine.
18. What is an Filter Transformation? or what options u have in Filter
Transformation?
Ans: The Filter transformation provides the means for filtering records in a
mapping. You pass all the rows from a source
transformation through the Filter transformation, then enter a filter condition
for the transformation. All ports in a Filter
transformation are input/output, and only records that meet the condition
pass through the Filter transformation.
Note: Discarded rows do not appear in the session log or reject files
To maximize session performance, include the Filter transformation as close to the sources in the
mapping as possible. Rather than passing records you plan to discard through the mapping, you can then
filter out unwanted data early in the flow of data from sources to targets.
You cannot concatenate ports from more than one transformation into the
Filter transformation; the input ports for the filter
must come from a single transformation. Filter transformations exist within the
flow of the mapping and cannot be
unconnected. The Filter transformation does not allow setting output
default values.

19. What are the default sources supported by Informatica PowerMart?


Ans :
Relational tables, views, and synonyms.
Fixed-width and delimited flat files that do not contain binary data.
COBOL files.
20. When do u create the Source Definition ? Can I use this Source Defn to any
Transformation?
Ans: When working with a file that contains fixed-width binary data, you
must create the source definition.
The Designer displays the source definition as a table, consisting of names, datatypes, and
constraints. To use a source definition in a mapping, connect a source definition to a Source
Qualifier or Normalizer transformation. The Informatica Server uses these transformations to read
the source data.
21. What is Active & Passive Transformation ?
Ans: Active and Passive Transformations
Transformations can be active or passive. An active transformation can change the number of records
passed through it. A passive transformation never changes the record count. For example, the Filter
transformation removes rows that do not meet the filter condition defined in the transformation.

Active transformations that might change the record count include the
following:
Advanced External Procedure
Aggregator
Filter
Joiner
Normalizer
Rank
Source Qualifier
Note: If you use PowerConnect to access ERP sources, the ERP Source
Qualifier is also an active transformation.
/*
You can connect only one of these active transformations to the same
transformation or target, since the Informatica
Server cannot determine how to concatenate data from different sets of
records with different numbers of rows.
*/
Passive transformations that never change the record count include
the following:
Lookup
Expression
External Procedure
Sequence Generator
Stored Procedure
Update Strategy
You can connect any number of these passive transformations, or connect
one active transformation with any number of
passive transformations, to the same transformation or target.
22. What is staging Area and Work Area?
Ans: Staging Area : - Holding Tables on DW Server.
- Loaded from Extract Process
- Input for Integration/Transformation
- May function as Work Areas
- Output to a work area or Fact Table
Work Area: - Temporary Tables
- Memory

23. What is Metadata? (plz refer DATA WHING IN THE REAL WORLD BOOK page #
125)
Ans: Defn: Data About Data
Metadata contains descriptive data for end users. In a data warehouse the
term metadata is used in a number of different
situations.
Metadata is used for:
Data transformation and load
Data management
Query management
Data transformation and load:
Metadata may be used during data transformation and load to describe the source
data and any changes that need to be made. The advantage of storing metadata
about the data being transformed is that as source data changes the changes can be
captured in the metadata, and transformation programs automatically regenerated.
For each source data field the following information is reqd:
Source Field:
Unique identifier (to avoid any confusion occurring between 2 fields of the same name from different sources).
Name (local field name).
Type (storage type of data, like character, integer, floating point and so on).
Location
- system (system it comes from, ex. Accounting system).
- object (object that contains it, ex. Account table).
The destination field needs to be described in a similar way to the source:
Destination:
Unique identifier
Name
Type (database data type, such as Char, Varchar, Number and so on).
Tablename (name of the table the field will be part of).
The other information that needs to be stored is the transformation or
transformations that need to be applied to turn the source data into the destination
data:
Transformation:
Transformation (s)
- Name
- Language (name of the language that the transformation is written in).
- module name
- syntax
The Name is the unique identifier that differentiates this from any other similar
transformations.
The Language attribute contains the name of the language that the transformation is written in.
The other attributes are module name and syntax. Generally these will be mutually
exclusive, with only one being defined. For simple transformations such as simple
SQL functions the syntax will be stored. For complex transformations the name
of the module that contains the code is stored instead.
Data management:
Metadata is reqd to describe the data as it resides in the data warehouse. This is needed by the
warehouse manager to allow it to track and control all data movements. Every object in the database
needs to be described.

Metadata is needed for all the following:


Tables
- Columns
- name
- type
Indexes
- Columns
- name
- type
Views
- Columns
- name
- type
Constraints
- name
- type
- table
- columns
Aggregations, Partition information also need to be stored in Metadata( for details
refer page # 30)
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata
used by the warehouse manager to describe the data in the data warehouse is also reqd by the query manager.
The query manager will also generate metadata about the queries it has run. This metadata can be used
to build a history of all queries run and generate a query profile for each user, group of users and
the data warehouse as a whole.
The metadata that is reqd for each query is:
- query
- tables accessed
- columns accessed
  - name
  - reference identifier
- restrictions applied
  - column name
  - table name
  - reference identifier
  - restriction
- join criteria applied
- aggregate functions used
- group by criteria
- sort criteria
- syntax
- execution plan
- resources

24. What kind of Unix flavours r u experienced with?


Ans: Solaris 2.5 SunOs 5.5 (Operating System)
Solaris 2.6 SunOs 5.6 (Operating System)
Solaris 2.8 SunOs 5.8 (Operating System)
AIX 4.0.3
5.5.1 2.5.1 May 96 sun4c, sun4m, sun4d, sun4u, x86, ppc
5.6 2.6 Aug. 97 sun4c, sun4m, sun4d, sun4u, x86
5.7 7 Oct. 98 sun4c, sun4m, sun4d, sun4u, x86
5.8 8 2000 sun4m, sun4d, sun4u, x86

25. What are the tasks that are done by Informatica Server?
Ans:The Informatica Server performs the following tasks:
Manages the scheduling and execution of sessions and batches
Executes sessions and batches
Verifies permissions and privileges
Interacts with the Server Manager and pmcmd.
The Informatica Server moves data from sources to targets based on metadata
stored in a repository. For instructions on how to move and transform data, the
Informatica Server reads a mapping (a type of metadata that includes
transformations and source and target definitions). Each mapping uses a session to
define additional information and to optionally override mapping-level options. You
can group multiple sessions to run as a single unit, known as a batch.
26. What are the two programs that communicate with the Informatica Server?
Ans: Informatica provides Server Manager and pmcmd programs to communicate
with the Informatica Server:
Server Manager. A client application used to create and manage sessions and
batches, and to monitor and stop the Informatica Server. You can use information
provided through the Server Manager to troubleshoot sessions and improve session
performance.
pmcmd. A command-line program that allows you to start and stop sessions and
batches, stop the Informatica Server, and verify if the Informatica Server is running.
27. When do u reinitialize Aggregate Cache?
Ans: Reinitializing the aggregate cache overwrites historical aggregate data with new
aggregate data. When you reinitialize the
aggregate cache, instead of using the captured changes in source tables, you typically need to use the
entire source table.
For example, you can reinitialize the aggregate cache if the source for a
session changes incrementally every day and
completely changes once a month. When you receive the new monthly source,
you might configure the session to reinitialize

the aggregate cache, truncate the existing target, and use the new source
table during the session.

/? Note: To be clarified when Server Manager works for the following ?/


To reinitialize the aggregate cache:
1.In the Server Manager, open the session property sheet.
2.Click the Transformations tab.
3.Check Reinitialize Aggregate Cache.
4.Click OK three times to save your changes.
5.Run the session.
The Informatica Server creates a new aggregate cache, overwriting the existing
aggregate cache.
/? To be checked for step 6 & step 7 after a successful run of the session ?/
6.After running the session, open the property sheet again.
7.Click the Data tab.
8.Clear Reinitialize Aggregate Cache.
9.Click OK.
28. (i) What is Target Load Order in Designer?
Ans: Target Load Order: - In the Designer, you can set the order in which the
Informatica Server sends records to various target
definitions in a mapping. This feature is crucial if you want to maintain referential
integrity when inserting, deleting, or updating
records in tables that have the primary key and foreign key constraints applied to
them. The Informatica Server writes data to
all the targets connected to the same Source Qualifier or Normalizer
simultaneously, to maximize performance.

28. (ii) What are the minimum conditions that u need to have so as to use the Target Load Order
option in Designer?
Ans: U need to have multiple Source Qualifier transformations.
To specify the order in which the Informatica Server sends data to targets,
create one Source Qualifier or Normalizer
transformation for each target within a mapping. To set the target load order,
you then determine the order in which each
Source Qualifier sends data to connected targets in the mapping.
When a mapping includes a Joiner transformation, the Informatica Server
sends all records to targets connected to that
Joiner at the same time, regardless of the target load order.
28(iii). How do u set the Target load order?
Ans: To set the target load order:
1. Create a mapping that contains multiple Source Qualifier transformations.
2. After you complete the mapping, choose Mappings-Target Load Plan.
A dialog box lists all Source Qualifier transformations in the mapping, as
well as the targets that receive data from each
Source Qualifier.
3. Select a Source Qualifier from the list.
4. Click the Up and Down buttons to move the Source Qualifier within the load order.
5. Repeat steps 3 and 4 for any other Source Qualifiers you wish to reorder.
6. Click OK and choose Repository-Save.

29. What u can do with Repository Manager?


Ans: We can do the following tasks using the Repository Manager:
To create usernames, you must have one of the following privileges:
- Administer Repository privilege
- Super User privilege
To create a user group, you must have one of the following privileges:
- Administer Repository privilege
- Super User privilege
To assign or revoke privileges, you must have one of the following privileges:
- Administer Repository privilege
- Super User privilege
Note: You cannot change the privileges of the default user groups or the default
repository users.
30. What u can do with Designer ?
Ans: The Designer client application provides five tools to help you create
mappings:
Source Analyzer. Use to import or create source definitions for flat file, Cobol,
ERP, and relational sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.
Note: The Designer allows you to work with multiple tools at one time. You can also work in multiple
folders and repositories.
31. What are different types of Tracing Levels u hv in Transformations?
Ans: Tracing levels in transformations:
Terse - Indicates when the Informatica Server initializes the session and its components. Summarizes
session results, but not at the level of individual records.
Normal - Includes initialization information as well as error messages and notification of rejected data.
Verbose initialization - Includes all information provided with the Normal setting plus more extensive
information about initializing transformations in the session.
Verbose data - Includes all information provided with the Verbose initialization setting.

Note: By default, the tracing level for every transformation is Normal.


To add a slight performance boost, you can also set the tracing level to Terse,
writing the minimum of detail to the session log
when running a session containing the transformation.

31(i). What the difference is between a database, a data warehouse and a data
mart?
Ans: -- A database is an organized collection of information.
-- A data warehouse is a very large database with special sets of tools to
extract and cleanse data from operational systems
and to analyze data.
-- A data mart is a focused subset of a data warehouse that deals with a
single area of data and is organized for quick
analysis.
32. What is Data Mart, Data WareHouse and Decision Support System explain
briefly?
Ans: Data Mart:
A data mart is a repository of data gathered from operational data and other
sources that is designed to serve a particular
community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data
mart is on meeting the specific demands of a particular group of knowledge users in
terms of analysis, content, presentation, and ease-of-use. Users of a data mart can
expect to have data presented in terms that are familiar.
In practice, the terms data mart and data warehouse each tend to imply the
presence of the other in some form. However, most writers using the term seem to
agree that the design of a data mart tends to start from an analysis of user
needs and that a data warehouse tends to start from an analysis of what
data already exists and how it can be collected in such a way that the data
can later be used. A data warehouse is a central aggregation of data (which can be
distributed physically); a data mart is a data repository that may derive from a data
warehouse or not and that emphasizes ease of access and usability for a particular
designed purpose. In general, a data warehouse tends to be a strategic but
somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting
an immediate need.
Data Warehouse:
A data warehouse is a central repository for all or significant parts of the data that
an enterprise's various business systems collect. The term was coined by W. H.
Inmon. IBM sometimes uses the term "information warehouse."
Typically, a data warehouse is housed on an enterprise mainframe server. Data
from various online transaction processing (OLTP) applications and other sources is
selectively extracted and organized on the data warehouse database for use by
analytical applications and user queries. Data warehousing emphasizes the
capture of data from diverse sources for useful analysis and access, but does not
generally start from the point-of-view of the end user or knowledge worker who may
need access to specialized, sometimes local databases. The latter idea is known as
the data mart.
Data mining, Web mining, and a decision support system (DSS) are three kinds of applications that can
make use of a data warehouse.
Decision Support System:
A decision support system (DSS) is a computer program application that analyzes
business data and presents it so that users can make business decisions more easily.

It is an "informational application" (in distinction to an "operational application" that


collects the data in the course of normal business operation).
Typical information that a decision support application might gather and
present would be:
Comparative sales figures between one week and the next
Projected revenue figures based on new product sales assumptions
The consequences of different decision alternatives, given past experience in a
context that is described
A decision support system may present information graphically and may include an
expert system or artificial intelligence (AI). It may be aimed at business executives
or some other group of knowledge workers.
33. What r the differences between Heterogeneous and Homogeneous?
Ans:
Heterogeneous:
- Stored in different schemas.
- Stored in different file or db types.
- Spread across several countries.
- Different platform and H/W configuration.
Homogeneous:
- Common structure.
- Same database type.
- Same data center.
- Same platform and H/W configuration.
34. How do you use DDL commands in a PL/SQL block, e.g. accept a table name from the user and drop it
if available, else display a message?
Ans: To invoke DDL commands in PL/SQL blocks we have to use Dynamic SQL; the package used is DBMS_SQL.
35. What r the steps to work with Dynamic SQL?
Ans: Open a dynamic cursor, parse the SQL statement, bind input variables (if any), execute the SQL
statement of the dynamic cursor, and close the cursor.
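A minimal PL/SQL sketch of questions 34 and 35 (the table name EMP_BKP and the messages are hypothetical, not taken from the text); note that Oracle executes DDL at parse time when it goes through DBMS_SQL:

DECLARE
   v_cur  INTEGER;
   v_cnt  NUMBER;
BEGIN
   -- check whether the table exists before attempting the drop
   SELECT COUNT(*) INTO v_cnt
     FROM user_tables
    WHERE table_name = 'EMP_BKP';

   IF v_cnt > 0 THEN
      v_cur := DBMS_SQL.OPEN_CURSOR;                               -- open a dynamic cursor
      DBMS_SQL.PARSE(v_cur, 'DROP TABLE EMP_BKP', DBMS_SQL.NATIVE); -- DDL runs at parse time
      DBMS_SQL.CLOSE_CURSOR(v_cur);                                -- close the cursor
      DBMS_OUTPUT.PUT_LINE('Table dropped.');
   ELSE
      DBMS_OUTPUT.PUT_LINE('Table not available.');
   END IF;
END;
/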
36. Which package and procedure are used to find/check the free space available for db objects like
tables/procedures/views/synonyms etc.?
Ans: The package is DBMS_SPACE, the procedure is UNUSED_SPACE, and the table is DBA_OBJECTS.
Note: See the script to find free space @ c:\informatica\tbl_free_space
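A hedged example of calling the procedure (the segment name EMP is hypothetical; the parameter names follow the documented DBMS_SPACE.UNUSED_SPACE signature, but check your Oracle version):

DECLARE
   v_total_blocks  NUMBER;
   v_total_bytes   NUMBER;
   v_unused_blocks NUMBER;
   v_unused_bytes  NUMBER;
   v_file_id       NUMBER;
   v_block_id      NUMBER;
   v_last_block    NUMBER;
BEGIN
   -- report allocated vs unused space for one table segment
   DBMS_SPACE.UNUSED_SPACE(
      segment_owner             => USER,
      segment_name              => 'EMP',
      segment_type              => 'TABLE',
      total_blocks              => v_total_blocks,
      total_bytes               => v_total_bytes,
      unused_blocks             => v_unused_blocks,
      unused_bytes              => v_unused_bytes,
      last_used_extent_file_id  => v_file_id,
      last_used_extent_block_id => v_block_id,
      last_used_block           => v_last_block);

   DBMS_OUTPUT.PUT_LINE('Total bytes : ' || v_total_bytes);
   DBMS_OUTPUT.PUT_LINE('Unused bytes: ' || v_unused_bytes);
END;
/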
37. Does Informatica allow it if EmpId is the PKey in the target table and the source data has 2 rows
with the same EmpId? If u use a lookup for the same situation, does it allow loading 2 rows or only 1?
Ans: => No, it will not; it generates a primary key constraint violation (it loads 1 row).
=> Even with a lookup, no, if EmpId is the PKey.
38. If Ename is varchar2(40) from one source (Siebel) and Ename is char(100) from another source
(Oracle), and the target has Name varchar2(50), then how does Informatica handle this situation?
How does Informatica handle string and number datatypes from sources?
39. How do u debug mappings? I mean where do u attack?

40. How do u query the Metadata tables for Informatica?
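No answer is given here; as a hedged illustration only (the MX view and column names are an assumption on my part and vary by repository version), the repository can be queried with ordinary SQL, for example:

-- Hypothetical example: list mappings per folder from the repository MX views
SELECT subject_area, mapping_name
  FROM rep_all_mappings
 ORDER BY subject_area, mapping_name;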


41(i). When do u use connected lookup n when do u use unconnected lookup?
Ans:
Connected Lookups : A connected Lookup transformation is part of the mapping data flow. With connected
lookups, you can have multiple return values. That is, you can pass multiple values
from the same row in the lookup table out of the Lookup transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates
=> Finding a value based on multiple conditions
Unconnected Lookups : An unconnected Lookup transformation exists separate from the data flow in the
mapping. You write an expression using
the :LKP reference qualifier to call the lookup within another transformation.
Some common uses for unconnected lookups include:
=> Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example,
updating slowly changing dimension tables)
=> Calling the same lookup multiple times in one mapping

41(ii). What r the differences between Connected lookups and Unconnected lookups?
Ans: Although both types of lookups perform the same basic task, there are some important differences:

Connected Lookup:
- Part of the mapping data flow.
- Can return multiple values from the same row.
- You link the lookup/output ports to another transformation.
- Supports default values. If there's no match for the lookup condition, the server returns the
  default value for all output ports.
- More visible: shows the data passing in and out of the lookup.
- Cache includes all lookup columns used in the mapping (that is, lookup table columns included in the
  lookup condition and lookup table columns linked as output ports to other transformations).

Unconnected Lookup:
- Separate from the mapping data flow.
- Returns one value from each row.
- You designate the return value with the Return port (R).
- Does not support default values. If there's no match for the lookup condition, the server returns NULL.
- Less visible: you write an expression using :LKP to tell the server when to perform the lookup.
- Cache includes lookup/output ports in the lookup condition and the lookup/return port.

42. What do u need to concentrate on after getting the explain plan?
Ans: The 3 most significant columns in the plan table are named OPERATION, OPTIONS, and OBJECT_NAME.
For each step, these tell u which operation is going to be performed and which object is the target of
that operation.
Ex:
**************************
TO USE EXPLAIN PLAN FOR A QRY
**************************
SQL> EXPLAIN PLAN
     SET STATEMENT_ID = 'PKAR02'
     FOR
     SELECT JOB, MAX(SAL)
     FROM EMP
     GROUP BY JOB
     HAVING MAX(SAL) >= 5000;
Explained.
**************************
TO QUERY THE PLAN TABLE
**************************
SQL> SELECT RTRIM(ID)||' '||
            LPAD(' ', 2*(LEVEL-1))||OPERATION
            ||' '||OPTIONS
            ||' '||OBJECT_NAME STEP_DESCRIPTION
     FROM PLAN_TABLE
     START WITH ID = 0 AND STATEMENT_ID = 'PKAR02'
     CONNECT BY PRIOR ID = PARENT_ID
            AND STATEMENT_ID = 'PKAR02'
     ORDER BY ID;

STEP_DESCRIPTION
----------------------------------------------------
0 SELECT STATEMENT
1   FILTER
2     SORT GROUP BY
3       TABLE ACCESS FULL EMP

43. How are components interfaced in PSoft?


Ans:

44. How do u do the analysis of an ETL?


Ans:
==============================================================
45. What is Standard, Reusable Transformation and Mapplet?
Ans: Mappings contain two types of transformations, standard and reusable. Standard
transformations exist within a single
mapping. You cannot reuse a standard transformation you created in another
mapping, nor can you create a shortcut to that transformation. However, often you
want to create transformations that perform common tasks, such as calculating the
average salary in a department. Since a standard transformation cannot be used by
more than one mapping, you have to set up the same transformation each time you
want to calculate the average salary in a department.
Mapplet: A mapplet is a reusable object that represents a set of
transformations. It allows you to reuse transformation logic
and can contain as many transformations as you need. A mapplet can contain
transformations, reusable transformations, and
shortcuts to transformations.
46. How do u copy Mapping, Repository, Sessions?
Ans: To copy an object (such as a mapping or reusable transformation) from a
shared folder, press the Ctrl key and drag and drop
the mapping into the destination folder.
To copy a mapping from a non-shared folder, drag and drop the mapping into the
destination folder.
In both cases, the destination folder must be open with the related tool active.
For example, to copy a mapping, the Mapping Designer must be active. To copy a
Source Definition, the Source Analyzer must be active.

Copying Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not
dragging it to the workbook.
When asked if you want to make a copy, click Yes, then enter a new
name and click OK.
Choose Repository-Save.

Repository Copying: You can copy a repository from one database to another. You use this feature
before upgrading, to preserve the original repository. Copying repositories provides a quick way to
copy all metadata you want to use as a basis for a new repository.
If the database into which you plan to copy the repository contains an existing
repository, the Repository Manager deletes the existing repository. If you want to
preserve the old repository, cancel the copy. Then back up the existing repository
before copying the new repository.
To copy a repository, you must have one of the following privileges:
Administer Repository privilege

Super User privilege


To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
   - Repository (Required): Name for the repository copy. Each repository name must be unique within
     the domain and should be easily distinguished from all other repositories.
   - Database Username (Required): Username required to connect to the database. This login must have
     the appropriate database permissions to create the repository.
   - Database Password (Required): Password associated with the database username. Must be in US-ASCII.
   - ODBC Data Source (Required): Data source used to connect to the database.
   - Native Connect String (Required): Connect string identifying the location of the database.
   - Code Page (Required): Character set associated with the repository. Must be a superset of the code
     page of the repository you want to copy.
   If you are not connected to the repository you want to copy, the Repository Manager asks you to log in.
3. Click OK.
4. If asked whether you want to delete existing repository data in the second repository, click OK to
   delete it. Click Cancel to preserve the existing repository.

Copying Sessions:
In the Server Manager, you can copy stand-alone sessions within a folder, or copy
sessions in and out of batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission
Super User privilege
To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session.
The Server Manager makes a copy of the session. The Informatica Server names the
copy after the original session, appending a number, such as session_name1.
47. What are shortcuts, and what is advantage?

Ans: Shortcuts allow you to use metadata across folders without making copies,
ensuring uniform metadata. A shortcut inherits all
properties of the object to which it points. Once you create a shortcut, you can
configure the shortcut name and description.
When the object the shortcut references changes, the shortcut inherits those
changes. By using a shortcut instead of a copy,
you ensure each use of the shortcut exactly matches the original object. For
example, if you have a shortcut to a target
definition, and you add a column to the definition, the shortcut automatically
inherits the additional column.
Shortcuts allow you to reuse an object without creating multiple objects in the
repository. For example, you use a source
definition in ten mappings in ten different folders. Instead of creating 10 copies
of the same source definition, one in each
folder, you can create 10 shortcuts to the original source definition.
You can create shortcuts to objects in shared folders. If you try to create a
shortcut to a non-shared folder, the Designer
creates a copy of the object instead.

You can create shortcuts to the following repository objects:


Source definitions
Reusable transformations
Mapplets
Mappings
Target definitions
Business components
You can create two types of shortcuts:
Local shortcut. A shortcut created in the same repository as the original
object.
Global shortcut. A shortcut created in a local repository that references an
object in a global repository.

Advantages: One of the primary advantages of using a shortcut is maintenance. If you need to change all
instances of an object, you can edit the original repository object. All shortcuts accessing the object
automatically inherit the changes.
Shortcuts have the following advantages over copied repository objects:
You can maintain a common repository object in a single location. If you need to
edit the object, all shortcuts immediately inherit the changes you make.
You can restrict repository users to a set of predefined metadata by asking users to
incorporate the shortcuts into their work instead of developing repository objects
independently.
You can develop complex mappings, mapplets, or reusable transformations, then
reuse them easily in other folders.
You can save space in your repository by keeping a single repository object and
using shortcuts to that object, instead of creating copies of the object in multiple
folders or multiple repositories.
48. What are Pre-session and Post-session Options?

(Plzz refer Help Using Shell Commands n Post-Session Commands and Email)
Ans: The Informatica Server can perform one or more shell commands before or
after the session runs. Shell commands are
operating system commands. You can use pre- or post- session shell
commands, for example, to delete a reject file or
session log, or to archive target files before the session begins.
The status of the shell command, whether it completed successfully or
failed, appears in the session log file.
To call a pre- or post-session shell command you must:
1.
Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or
batch file for Windows NT servers.
2.
Configure the session to execute the pre- or post-session shell commands.
You can configure a session to stop if the Informatica Server encounters an error
while executing pre-session shell commands.
For example, you might use a shell command to copy a file from one directory to another. For a Windows
NT server you would use the following shell command to copy the SALES_ADJ file from the target
directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing\
For a UNIX server, you would use the following command line to perform a similar
operation:
cp sales/sales_adj marketing/
Tip: Each shell command runs in the same environment (UNIX or Windows NT) as
the Informatica Server. Environment settings in one shell command script do not
carry over to other scripts. To run all shell commands in the same environment, call
a single shell script that in turn invokes other scripts.
49. What are Folder Versions?
Ans: In the Repository Manager, you can create different versions within a folder to
help you archive work in development. You can copy versions to other folders as
well. When you save a version, you save all metadata at a particular point in
development. Later versions contain new or modified metadata, reflecting work that
you have completed since the last version.
Maintaining different versions lets you revert to earlier work when needed. By
archiving the contents of a folder into a version each time you reach a development
landmark, you can access those versions if later edits prove unsuccessful.
You create a folder version after completing a version of a difficult mapping, then
continue working on the mapping. If you are unhappy with the results of subsequent
work, you can revert to the previous version, then create a new version to continue
development. Thus you keep the landmark version intact, but available for
regression.
Note: You can only work within one version of a folder at a time.
50. How do u automate/schedule sessions/batches, and did u use any tool for automating sessions/batches?

Ans: We scheduled our sessions/batches using Server Manager.


You can either schedule a session to run at a given time or interval, or you can
manually start the session.
U need to have the Create Sessions and Batches privilege with Read and Execute permissions, or the
Super User privilege.
If you configure a batch to run only on demand, you cannot schedule it.
Note: We did not use any tool for automation process.
51. What are the differences between 4.7 and 5.1 versions?
Ans: New transformations were added, like the XML Transformation and MQ Series Transformation, and
PowerMart and PowerCenter are both the same from the 5.1 version.
52. What r the procedure that u need to undergo before moving Mappings/sessions
from Testing/Development to Production?
Ans:
53. How many values does it (the Informatica Server) return when it passes through a Connected Lookup
and an Unconnected Lookup?
Ans: A Connected Lookup can return multiple values, whereas an Unconnected Lookup will return only one
value, that is, the Return value.
54. What is the difference between PowerMart and PowerCenter in 4.7.2?
Ans: If You Are Using PowerCenter
PowerCenter allows you to register and run multiple Informatica Servers against the
same repository. Because you can run
these servers at the same time, you can distribute the repository session load
across available servers to improve overall
performance.
With PowerCenter, you receive all product functionality, including distributed
metadata, the ability to organize repositories into
a data mart domain and share metadata across repositories.
A PowerCenter license lets you create a single repository that you can
configure as a global repository, the core component
of a data warehouse.
If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata and
multiple registered servers. Also, the various
options available with PowerCenter (such as PowerCenter Integration Server for BW,
PowerConnect for IBM DB2,
PowerConnect for SAP R/3, and PowerConnect for PeopleSoft) are not available with
PowerMart.

55. What kind of modifications can u do/perform with each transformation?
Ans: Using transformations, you can modify data in the following ways:
Task - Transformation
- Calculate a value - Expression
- Perform aggregate calculations - Aggregator
- Modify text - Expression
- Filter records - Filter, Source Qualifier
- Order records queried by the Informatica Server - Source Qualifier
- Call a stored procedure - Stored Procedure
- Call a procedure in a shared library or in the COM layer of Windows NT - External Procedure
- Generate primary keys - Sequence Generator
- Limit records to a top or bottom range - Rank
- Normalize records, including those read from COBOL sources - Normalizer
- Look up values - Lookup
- Determine whether to insert, delete, update, or reject records - Update Strategy
- Join records from different databases or flat file systems - Joiner
56. Expressions in transformations: explain briefly how do u use them?
Ans: Expressions in transformations
To transform data passing through a transformation, you can write an expression. The most obvious
examples of these are the Expression and Aggregator transformations, which perform calculations on
either single values or an entire range of values within a port. Transformations that use expressions
include the following:
Transformation - How It Uses Expressions
- Expression - Calculates the result of an expression for each row passing through the transformation,
  using values from one or more ports.
- Aggregator - Calculates the result of an aggregate expression, such as a sum or average, based on all
  data passing through a port or on groups within that data.
- Filter - Filters records based on a condition you enter using an expression.
- Rank - Filters the top or bottom range of records, based on a condition you enter using an expression.
- Update Strategy - Assigns a numeric code to each record based on an expression, indicating whether the
  Informatica Server should use the information in the record to insert, delete, or update the target.
In each transformation, you use the Expression Editor to enter the expression. The
Expression Editor supports the transformation language for building expressions. The
transformation language uses SQL-like functions, operators, and other components
to build the expression. For example, as in SQL, the transformation language
includes the functions COUNT and SUM. However, the PowerMart/PowerCenter
transformation language includes additional functions not found in SQL.
When you enter the expression, you can use values available through ports.
For example, if the transformation has two input ports representing a price and

sales tax rate, you can calculate the final sales tax using these two values. The ports
used in the expression can appear in the same transformation, or you can use output
ports in other transformations.
57. In case a flat file (which comes thru FTP as source) has not arrived, then what happens? Where do u
set this option?
Ans: U get a fatal error which causes the server to fail/stop the session.
U can set the Event-Based Scheduling option in Session Properties under the General tab --> Advanced options:
- Indicator File to Wait For (Optional): Required to use event-based scheduling. Enter the indicator
  file (or directory and file) whose arrival schedules the session. If you do not enter a directory,
  the Informatica Server assumes the file appears in the server variable directory $PMRootDir.
58. What is the Test Load Option and when do you use it in Server Manager?
Ans: When testing sessions in development, you may not need to process the entire source. If this is
true, use the Test Load Option (Session Properties > General tab > Target Options: choose Target Load
options as Normal (option button), with Test Load checked (check box) and No. of rows to test, e.g.
2000 (text box)). You can also click the Start button.
59. SCD Type 2 and SGT difference?
60. Differences between 4.7 and 5.1?
61. Tuning Informatica Server for improving performance? Performance Issues?
Ans: See /* C:\pkar\Informatica\Performance Issues.doc */
62. What is Override Option? Which is better?
63. What will happen if u increase buffer size?
64. what will happen if u increase commit Intervals? and also decrease commit
Intervals?
65. What kind of Complex mapping u did? And what sort of problems u faced?
66. If u have 10 mappings designed and u need to implement some changes(may be
in existing mapping or new mapping need to
be designed) then how much time it takes from easier to complex?

67. Can u refresh Repository in 4.7 and 5.1? and also can u refresh pieces (partially)
of repository in 4.7 and 5.1?
68. What is BI?
Ans: http://www.visionnet.com/bi/index.shtml
69. Benefits of BI?
Ans: http://www.visionnet.com/bi/bi-benefits.shtml
70. BI Faq
Ans: http://www.visionnet.com/bi/bi-faq.shtml
71. What is difference between data scrubbing and data cleansing?
Ans: Scrubbing data is the process of cleaning up the junk in legacy data and
making it accurate and useful for the next generations
of automated systems. This is perhaps the most difficult of all conversion
activities. Very often, this is made more difficult when
the customer wants to make good data out of bad data. This is the dog work. It is also the most
important and cannot be done without the active participation of the user.
DATA CLEANING - a two-step process including DETECTION and then CORRECTION of errors in a data set.
72. What is Metadata and Repository?
Ans:
Metadata. Data about data.
It contains descriptive data for end users.
Contains data that controls the ETL processing.
Contains data about the current state of the data warehouse.
ETL updates metadata, to provide the most current state.
Repository. The place where you store the metadata is called a repository. The
more sophisticated your repository, the more
complex and detailed metadata you can store in it. PowerMart and
PowerCenter use a relational database as the
repository.

73. SQL * LOADER?


Ans: http://downloadwest.oracle.com/otndoc/oracle9i/901_doc/server.901/a90192/ch03.htm#1004678
74. Debugger in Mapping?
75. Parameter passing in 5.1 version exposure?
76. What is the filename which u need to configure in Unix while Installing
Informatica?

77. How do u select duplicate rows using Informatica i.e., how do u use
Max(Rowid)/Min(Rowid) in Informatica?
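A hedged SQL sketch of the ROWID approach (the table EMP and the key EMPNO are hypothetical, not taken from the text):

-- Keep one row per EMPNO, deleting the rest (classic MIN(ROWID) pattern)
DELETE FROM emp a
 WHERE a.rowid > (SELECT MIN(b.rowid)
                    FROM emp b
                   WHERE b.empno = a.empno);

-- Or simply list the duplicate keys
SELECT empno, COUNT(*)
  FROM emp
 GROUP BY empno
HAVING COUNT(*) > 1;

The same MIN(ROWID) condition can also be placed in a SELECT used as a Source Qualifier SQL override so that only one row per key is read into the mapping.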
**********************************Shankar Prasad*************************************************
Posted 2nd June 2012 by Shankar Prasad

Datawarehouse- BASIC DEFINITIONS - Informatica


Datawarehouse - BASIC DEFINITIONS (by Shankar Prasad)

DWH : is a repository of integrated information, specifically structured for queries and analysis.
Data and information are extracted from heterogeneous sources as they are generated. This makes it
much easier and more efficient to run queries over data that originally came from different sources.
Data Mart : is a collection of subject areas organized for decision support based on the needs of a
given department, e.g. sales, marketing etc. The data mart is designed to suit the needs of a
department. A data mart is much less granular than the warehouse data.
Data Warehouse : is used on an enterprise level, while data marts is used on a
business division / department level. Data warehouses are arranged around the
corporate subject areas found in the corporate data model. Data warehouses
contain more detail information while most data marts contain more summarized
or aggregated data.
OLTP : Online Transaction Processing. This is standard, normalized database
structure. OLTP is designed for Transactions, which means that inserts, updates
and deletes must be fast.
OLAP : Online Analytical Processing. Read-only, historical, aggregated data.
Fact Table : contains the quantitative measures about the business.
Dimension Table : descriptive data about the facts (business).

Conformed dimensions : dimension tables shared by fact tables. These tables connect separate star
schemas into an enterprise star schema.
Star Schema : is a set of tables comprised of a single, central fact table
surrounded by de-normalized dimensions. Star schema implement dimensional
data structures with de-normalized dimensions
Snow Flake : is a set of tables comprised of a single, central fact table surrounded by normalized
dimension hierarchies. Snowflake schemas implement dimensional data structures with fully normalized
dimensions.
Staging Area : it is the work place where raw data is brought in, cleaned,
combined, archived and exported to one or more data marts. The purpose of
data staging area is to get data ready for loading into a presentation layer.
Queries : The DWH contains 2 types of queries. There will be fixed queries that
are clearly defined and well understood, such as regular reports, canned
queries and common aggregations.
There will also be ad hoc queries that are unpredictable, both in quantity and
frequency.
Ad Hoc Query : ad hoc queries are the starting point for any analysis into a database. It is the ability to run any query when desired and expect a reasonable response that makes the data warehouse worthwhile and makes its design such a significant challenge.
The end-user access tools are capable of automatically generating the database query that answers any question posed by the user.
Canned Queries : are pre-defined queries. Canned queries contain prompts that
allow you to customize the query for your specific needs
Kimball (Bottom up) vs Inmon (Top down) approaches :
According to Ralph Kimball, when you plan to design analytical solutions for an enterprise, try building data marts. When you have 3 or 4 such data marts, you will have an enterprise-wide data warehouse built up automatically, without time and effort exclusively spent on building the EDWH, because the time required for building a data mart is less than that for an EDWH.
INMON : try to build an enterprise-wide data warehouse first, and all the data marts will be subsets of the EDWH. According to him, independent data marts cannot make up an enterprise data warehouse under any circumstance; they will remain isolated stovepipes of information.

*********************************************************************************************************************
Dimensional Data Model :
Dimensional data model is most often used in data warehousing systems. This is different from
the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can
imagine, the same data would then be stored differently in a dimensional model than in a 3rd
normal form model.
To understand dimensional data modeling, let's define some of the terms commonly used in this
type of modeling:
Dimension: A category of information. For example, the time dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time
Dimension.
Hierarchy: The specification of levels that represents the relationships between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year --> Quarter --> Month --> Day.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales
amount would be such a measure. This measure is stored in the fact table with the appropriate
granularity. For example, it can be sales amount by store by day. In this case, the fact table
would contain three columns: A date column, a store column, and a sales amount column.
Lookup Table: The lookup table provides the detailed information about the attributes. For
example, the lookup table for the Quarter attribute would include a list of all of the quarters
available in the data warehouse. Each row (each quarter) may have several fields, one for the
unique ID that identifies the quarter, and one or more additional fields that specifies how that
particular quarter is represented on a report (for example, first quarter of 2001 may be
represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more
lookup tables, but fact tables do not have direct relationships to one another. Dimensions and
hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup
tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and
is radially connected to other surrounding objects (dimension lookup tables) like a star. A star
schema can be simple or complex. A simple star consists of one fact table; a complex star can
have more than one fact table.
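As an illustration only, a minimal SQL sketch of such a star, using the hypothetical date/store/sales example described above (all table and column names are assumed):

-- two dimension (lookup) tables
CREATE TABLE date_dim (
    date_key      INTEGER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR(10),
    quarter_name  VARCHAR(10),
    year_num      INTEGER
);
CREATE TABLE store_dim (
    store_key  INTEGER PRIMARY KEY,
    store_name VARCHAR(50),
    city       VARCHAR(50)
);
-- the central fact table, holding the measure at day/store grain
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim (date_key),
    store_key    INTEGER REFERENCES store_dim (store_key),
    sales_amount NUMERIC(12,2)
);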
Snowflake Schema: The snowflake schema is an extension of the star schema, where each
point of the star explodes into more points. The main advantage of the snowflake schema is
the improvement in query performance due to minimized disk storage requirements and joining
smaller lookup tables. The main disadvantage of the snowflake schema is the additional
maintenance effort needed due to the increased number of lookup tables.
Whether one uses a star or a snowflake largely depends on personal preference and business
needs. Personally, I am partial to snowflakes, when there is a business case to analyze the
information at that particular level.
Slowly Changing Dimensions:
The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a
nutshell, this applies to cases where the attribute for a record varies over time. We give an
example below:
Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in
the customer lookup table has the following record:
Customer Key    Name         State
1001            Christina    Illinois
At a later date, she moved to Los Angeles, California in January 2003. How should ABC Inc.
now modify its customer table to reflect this change? This is the "Slowly Changing Dimension"
problem.
There are in general three ways to solve this type of problem, and they are categorized as
follows:
Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is
treated essentially as two people.
Type 3: The original record is modified to reflect the change.
We next take a look at each of the scenarios and how the data model and the data looks like
for each of them. Finally, we compare and contrast among the three alternatives.
Type 1 Slowly Changing Dimension:
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original
information. In other words, no history is kept.
In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:
Customer Key    Name         State
1001            Christina    California
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no
need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For
example, in this case, the company would not be able to know that Christina lived
in Illinois before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data
warehouse to keep track of historical changes.
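For illustration only, a Type 1 change is a simple overwrite in place; a minimal SQL sketch against a hypothetical CUSTOMER_DIM table (table and column names are assumed):

-- Type 1: overwrite the attribute; no history is kept
UPDATE customer_dim
SET    state = 'California'
WHERE  customer_key = 1001;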
Type 2 Slowly Changing Dimension:
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new
information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
After Christina moved from Illinois to California, we add the new information as a new row into
the table:
Customer Key    Name         State
1001            Christina    Illinois
1005            Christina    California
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:

- This will cause the size of the table to grow fast. In cases where the number of rows for the
table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse
to track historical changes.
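For illustration only, a Type 2 change adds a new row with a new surrogate key while the old row is left untouched; a minimal SQL sketch using the same hypothetical CUSTOMER_DIM table (names assumed):

-- Type 2: add a new row for the new value; the old row (1001, 'Christina', 'Illinois') remains
INSERT INTO customer_dim (customer_key, name, state)
VALUES (1005, 'Christina', 'California');
-- real implementations usually also carry effective/expiry dates or a current-row flag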
Type 3 Slowly Changing Dimension :
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular
attribute of interest, one indicating the original value, and one indicating the current value.
There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
Customer Key, Name, Original State, Current State, Effective Date
After Christina moved from Illinois to California, the original information gets updated, and we
have the following table (assuming the effective date of change is January 15, 2003):
Customer Key    Name         Original State    Current State    Effective Date
1001            Christina    Illinois          California       15-JAN-2003
Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more than once. For
example, if Christina later moves to Texas on December 15, 2003, the California information
will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
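For illustration only, a Type 3 change updates the extra columns in place; a minimal SQL sketch using the same hypothetical CUSTOMER_DIM table (names assumed):

-- Type 3: keep the prior value in ORIGINAL_STATE and overwrite CURRENT_STATE
UPDATE customer_dim
SET    current_state  = 'California',
       effective_date = TO_DATE('2003-01-15', 'YYYY-MM-DD')
WHERE  customer_key = 1001;
-- if Christina later moves again, only CURRENT_STATE is overwritten and 'California' is lost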
Surrogate key :
A surrogate key is frequently a sequential number but doesn't have to be. Having the key
independent of all other columns insulates the database relationships from changes in data
values or database design and guarantees uniqueness.
Some database designers use surrogate keys religiously regardless of the suitability of other
candidate keys. However, if a good key already exists, the addition of a surrogate key will
merely slow down access, particularly if it is indexed.
The concept of a surrogate key is important in a data warehouse; surrogate means deputy or substitute. A surrogate key is a small integer (say 4 bytes) that can uniquely identify a record in the dimension table; however, it has no business meaning. Data warehouse experts suggest that the production keys used in the source databases should not be used as primary keys in the dimension tables; in their place, surrogate keys, which are generated automatically, should be used.
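For illustration only, a minimal sketch of generating a surrogate key from an Oracle-style sequence (the table, column and sequence names are assumed; in Informatica the Sequence Generator transformation typically plays the same role):

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (customer_key, customer_id, name, state)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Christina', 'Illinois');
-- CUSTOMER_ID stands for the production (natural) key, kept only as an ordinary attribute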

Conceptual, Logical, And Physical Data Models:
There are three levels of data modeling: conceptual, logical, and physical. This section will explain the difference among the three, the order in which each one is created, and how to go from one level to the other.
Conceptual Data Model
Features of the conceptual data model include:
- Includes the important entities and the relationships among them.
- No attribute is specified.
- No primary key is specified.
At this level, the data modeler attempts to identify the highest-level relationships among the different entities.
Logical Data Model
Features of the logical data model include:
- Includes all entities and relationships among them.
- All attributes for each entity are specified.
- The primary key for each entity is specified.
- Foreign keys (keys identifying the relationship between different entities) are specified.
- Normalization occurs at this level.
At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how it will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
Physical Data Model
Features of the physical data model include:
- Specification of all tables and columns.
- Foreign keys are used to identify relationships between tables.
- Denormalization may occur based on user requirements.
- Physical considerations may cause the physical data model to be quite different from the logical data model.
At this level, the data modeler will specify how the logical data model will be realized in the database schema.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
What Is OLAP :

OLAP stands for On-Line Analytical Processing. The first attempt to provide a definition to OLAP
was by Dr. Codd, who proposed 12 rules for OLAP. Later, it was discovered that this particular
white paper was sponsored by one of the OLAP tool vendors, thus causing it to lose objectivity.
The OLAP Report has proposed the FASMI test: Fast Analysis of Shared Multidimensional Information. For a more detailed description of both Dr. Codd's rules and the FASMI test, please visit The OLAP Report.
For people on the business side, the key feature out of the above list is "Multidimensional." In
other words, the ability to analyze metrics in different dimensions such as time, geography,
gender, product, etc. For example, sales for the company is up. What region is most
responsible for this increase? Which store in this region is most responsible for the increase?
What particular product category or categories contributed the most to the increase? Answering
these types of questions in order means that you are performing an OLAP analysis.
Depending on the underlying technology used, OLAP can be broadly divided into two different camps: MOLAP and ROLAP. A discussion of the different OLAP types can be found in the MOLAP, ROLAP, and HOLAP section.
In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and
Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and
ROLAP.
MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a
multidimensional cube. The storage is not in the relational database, but in proprietary
formats.
Advantages:
- Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
- Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.
Disadvantages:
- Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself.
- Requires additional investment: Cube technology is often proprietary and does not already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional investments in human and capital resources are needed.
ROLAP
This methodology relies on manipulating the data stored in the relational database to give the
appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of
slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
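For illustration only, a minimal sketch of that idea against the hypothetical star-schema tables sketched earlier (names assumed): slicing the sales measure down to one quarter is nothing more than an extra WHERE clause on the generated query.

SELECT   s.store_name,
         SUM(f.sales_amount) AS sales_amount
FROM     sales_fact f
JOIN     store_dim s ON s.store_key = f.store_key
JOIN     date_dim  d ON d.date_key  = f.date_key
WHERE    d.quarter_name = 'Q1 2001'   -- the added "slice"
GROUP BY s.store_name;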
Advantages:
- Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.
- Can leverage functionalities inherent in the relational database: Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
Disadvantages:
- Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.
- Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building out-of-the-box complex functions into the tool, as well as the ability to allow users to define their own functions.
HOLAP
HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data.
Bill Inmon vs. Ralph Kimball:
In the data warehousing field, we often hear about discussions on where a person /
organization's philosophy falls into Bill Inmon's camp or into Ralph Kimball's camp. We describe
below the difference between the two.
Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system.
An enterprise has one data warehouse, and data marts source their information from the data
warehouse. In the data warehouse, information is stored in 3rd normal form.
Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the
enterprise. Information is always stored in the dimensional model.
There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouses in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as data marts. Only when more data marts are built later do they evolve into a data warehouse.
Informatica Interview Question Answer
by Shankar Prasad
-----------------------------------------------------------------------------------------------------------------------
Q. What are Target Types on the Server?
A. Target Types are File, Relational and ERP.
Q. How do you identify existing rows of data in the target table using
lookup transformation?
A. There are two ways to look up the target table to verify whether a row exists or not:
1. Use a connected dynamic cache lookup and then check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table / cache or not.
2. Use an unconnected lookup, call it from an expression transformation, and check the lookup condition port value (Null / Not Null) to decide whether the incoming record already exists in the table or not.
Q. What are Aggregate transformations?
A. The Aggregator transformation is much like the GROUP BY clause in traditional SQL.
It is a connected, active transformation which takes the incoming data from the mapping pipeline, groups it based on the group-by ports specified, and calculates aggregate functions (avg, sum, count, stddev, etc.) for each of those groups.
From a performance perspective, if your mapping has an Aggregator transformation, use filters and sorters very early in the pipeline if there is any need for them.
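As an illustration only (the EMP table and its columns are assumed sample objects), an Aggregator with a group-by port on DEPTNO corresponds to SQL such as:

SELECT   deptno,
         SUM(sal) AS total_sal,
         AVG(sal) AS avg_sal,
         COUNT(*) AS emp_count
FROM     emp
GROUP BY deptno;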
Q. What are various types of Aggregation?
A. Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST,
MEDIAN, PERCENTILE, STDDEV, and VARIANCE.
Q. What are Dimensions and various types of Dimension?
A. Dimensions are classified into 3 types:
1. SCD TYPE 1 (Slowly Changing Dimension): this contains current data.
2. SCD TYPE 2 (Slowly Changing Dimension): this contains current data + complete historical data.
3. SCD TYPE 3 (Slowly Changing Dimension): this contains current data + partial historical data.
Q. What are 2 modes of data movement in Informatica Server?

A. The data movement mode depends on whether Informatica Server should process
single byte or multi-byte character data. This mode selection can affect the
enforcement of code page relationships and code page validation in the Informatica
Client and Server.
a) Unicode - the IS allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters).
b) ASCII - the IS holds all data in a single byte.
The IS data movement mode can be changed in the Informatica Server configuration
parameters. This comes into effect once you restart the Informatica Server.

Q. What is Code Page Compatibility?


A. Compatibility between code pages is used for accurate data movement when the
Informatica Sever runs in the Unicode data movement mode. If the code pages are
identical, then there will not be any data loss. One code page can be a subset or
superset of another. For accurate data movement, the target code page must be a
superset of the source code page.
Superset - A code page is a superset of another code page when it contains all the characters encoded in the other code page and also contains additional characters not contained in the other code page.
Subset - A code page is a subset of another code page when all characters in the code page are encoded in the other code page.
Q. What is Code Page used for?
A. Code Page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page of the source data.
Q. What is Router transformation?
A. It is different from filter transformation in that we can specify multiple conditions
and route the data to multiple targets depending on the condition.

Q. What is Load Manager?


A. While running a Workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.
When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks
query
conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mappings, reader, writer, and transformation threads to extract,
transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.

Q. What is Data Transformation Manager?


A. After the load manager performs validations for the session, it creates the DTM
process. The DTM process is the second process associated with the session run. The
primary purpose of the DTM process is to create and manage threads that carry out
the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica server writes messages to the session log, it includes the thread type and thread ID.
Following are the types of threads that DTM creates:
Master Thread - Main thread of the DTM process. Creates and manages all other
threads.
Mapping Thread - One Thread to Each Session. Fetches Session and Mapping
Information.
Pre and Post Session Thread - One Thread each to Perform Pre and Post Session
Operations.
Reader Thread - One Thread for Each Partition for Each Source Pipeline.

Writer Thread - One thread for each partition, if a target exists in the source pipeline, to write to the target.
Transformation Thread - One or More Transformation Thread For Each Partition.

Q. What is Session and Batches?


A. Session - A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
Sequential - runs sessions one after the other.
Concurrent - runs sessions at the same time.
Q. What is a source qualifier?
A. It represents all data queried from the source.
Q. Why we use lookup transformations?
A. Lookup Transformations can access data from relational tables that are not sources
in mapping. With Lookup transformation, we can accomplish the following tasks:
Get a related value - get the Employee Name from the Employee table based on the Employee ID.
Perform Calculation.
Update slowly changing dimension tables - We can use unconnected lookup
transformation to determine whether the records already exist in the target or not.
Q. While importing the relational source definition from the database, what metadata of the source do you import?
A. Source name
Database location
Column names
Data types
Key constraints
Q. In how many ways can you update a relational source definition and what are they?
A. Two ways
1. Edit the definition
2. Reimport the definition

Q. Where should you place the flat file to import the flat file definition to
the designer?
A. Place it in a local folder.
Q. Which transformation do you need while using COBOL sources as source definitions?
A. The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

Q. How can you create or import flat file definition in to the warehouse
designer?
A. You can create a flat file definition in the Warehouse Designer. In the Warehouse Designer, you can create a new target: select the type as flat file. Save it and you can enter various columns for that created target by editing its properties. Once the target is created, save it. You can import it from the Mapping Designer.
Q. What is a mapplet?
A. A mapplet should have a mapplet Input transformation which receives input values, and an Output transformation which passes the final modified data back to the mapping. It is a set of transformations whose logic can be reused. When the mapplet is displayed within the mapping, only the input and output ports are displayed, so the internal logic is hidden from the end user's point of view.
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. What are connected and unconnected transformations?
A. Connected Transformation: A transformation which participates in the mapping data flow. A connected transformation can receive multiple inputs and provide multiple outputs.
Unconnected: An unconnected transformation does not participate in the mapping data flow. It can receive multiple inputs and provides a single output.

Q. In how many ways can you create ports?


A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation
They can be created using two methods:
1. Using transformation developer
2. Create normal one and promote it to reusable
Q. What are mapping parameters and mapping variables?
A. A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
Q. Can you use the mapping parameters or variables created in one mapping in another mapping?
A. No.
We can use mapping parameters or variables in any transformation of the same mapping or mapplet in which you have created the mapping parameters or variables.
Q. How can you improve session performance in an Aggregator transformation?
A. 1. Use sorted input. Use a Sorter before the Aggregator.
2. Do not forget to check the option on the Aggregator that tells the Aggregator that the input is sorted on the same keys as the group by. The key order is also very important.
Q. Is there an aggregate cache in the Aggregator transformation?
A. The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
Q. What are the differences between the Joiner transformation and the Source Qualifier transformation?
A. You can join heterogeneous data sources in the Joiner transformation, which we cannot achieve in the Source Qualifier transformation.
You need matching keys to join two relational sources in the Source Qualifier transformation, whereas you do not need matching keys to join two sources in the Joiner.
Two relational sources should come from the same data source in the Source Qualifier; in the Joiner you can also join relational sources which are coming from different sources.
Q. In which conditions can we not use joiner
transformations?
A. You cannot use a Joiner transformation in the following situations (according to Informatica 7.1):
- Either input pipeline contains an Update Strategy transformation.
- You connect a Sequence Generator transformation directly before the Joiner transformation.
Q. What are the settings that you use to configure the Joiner transformation?
A. Master and detail source
Type of join
Condition of the join
Q. What are the join types in joiner transformation?
A. Normal (Default) -- only matching rows from both master and
detail
Master outer -- all detail rows and only matching rows from
master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail ( matching or
non matching)
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the
Informatica Server reads all the records from the master source
and builds index and data caches based on the master rows.
After building the caches, the Joiner transformation reads records
from the detail source and performs joins.
Q. Why use the lookup transformation?

A. To perform the following tasks.


Get a related value. For example, if your source table includes
employee ID, but you want to include the employee name in your
target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values
used in a calculation, such as gross sales per invoice or sales tax,
but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup
transformation to determine whether records already exist in the
target.

Differences Between Connected and Unconnected Lookups

Connected Lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- Cache includes all lookup columns used in the mapping (that is, lookup source columns included in the lookup condition and lookup source columns linked as output ports to other transformations).
- Can return multiple columns from the same row or insert into the dynamic lookup cache.
- If there is no match for the lookup condition, the PowerCenter Server returns the default value for all output ports. If you configure dynamic caching, the PowerCenter Server inserts rows into the cache or leaves it unchanged.
- If there is a match for the lookup condition, the PowerCenter Server returns the result of the lookup condition for all lookup/output ports. If you configure dynamic caching, the PowerCenter Server either updates the row in the cache or leaves the row unchanged.
- Passes multiple output values to another transformation. Link lookup/output ports to another transformation.
- Supports user-defined default values.

Unconnected Lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use a static cache only.
- Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- Designate one return port (R). Returns one column from each row.
- If there is no match for the lookup condition, the PowerCenter Server returns NULL.
- If there is a match for the lookup condition, the PowerCenter Server returns the result of the lookup condition into the return port.
- Passes one output value to another transformation. The lookup/output/return port passes the value to the transformation calling the :LKP expression.
- Does not support user-defined default values.
Q. What is meant by lookup caches?


A. The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
Q. What are the types of lookup caches?
A. Persistent cache: You can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the lookup transformation to rebuild the lookup cache.
Static cache: You can configure a static or read-only cache for any lookup table. By default the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into the cache and the target, you can create a lookup transformation to use a dynamic cache. The Informatica server dynamically inserts data into the target table.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
Q: What do you know about Informatica and ETL?
A: Informatica is a very useful GUI based ETL tool.
Q: FULL and DELTA files. Historical and Ongoing load.
A: A FULL file contains the complete data as of today, including history data; a DELTA file contains only the changes since the last extract.
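For illustration only, a DELTA extract is often produced by filtering the source on a last-modified timestamp; the table, column and the $$LAST_EXTRACT_DATE mapping parameter below are assumptions, not part of the original answer:

-- sketch of a Source Qualifier override for a DELTA (incremental) extract
SELECT *
FROM   customer_src
WHERE  last_update_ts > TO_DATE('$$LAST_EXTRACT_DATE', 'YYYY-MM-DD HH24:MI:SS');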
Q: Power Center/ Power Mart which products have you worked with?
A: Power Center will have Global and Local repository, whereas Power Mart will have only
Local repository.
Q: Explain what are the tools you have used in Power Center and/or Power
Mart?
A: Designer, Server Manager, and Repository Manager.

Q: What is a Mapping?
A: Mapping Represent the data flow between source and target
Q: What are the components must contain in Mapping?
A: Source definition, Transformation, Target Definition and Connectors

Q: What is Transformation?
A: A transformation is a repository object that generates, modifies, or passes data. A transformation performs a specific function. There are two types of transformations:
1. Active - affects the rows during the transformation, i.e. it can change the number of rows that pass through it. E.g.: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive - does not change the number of rows that pass through it. E.g.: Expression, External Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source Qualifier.

Q: Which transformation can be overridden at the Server?


A: Source Qualifier and Lookup Transformations
Q: What is connected and unconnected Transformation and give Examples?
Q: What are Options/Type to run a Stored Procedure?
A:
Normal: During a session, the stored procedure runs where the transformation exists in
the mapping on a row-by-row basis. This is useful for calling the stored procedure for each
row of data that passes through the mapping, such as running a calculation against an
input port. Connected stored procedures run only in normal mode.
Pre-load of the Source. Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or performing joins of
data in a temporary table.
Post-load of the Source. After the session retrieves data from the source, the stored
procedure runs. This is useful for removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the stored
procedure runs. This is useful for verifying target tables or disk space on the target
system.
Post-load of the Target. After the session sends data to the target, the stored
procedure runs. This is useful for re-creating indexes on the database.
It must contain at least one Input and one Output port.
Q: What kinds of sources and of targets can be used in Informatica?
A:
Sources may be Flat file, relational db or XML.

Target may be relational tables, XML or flat files.


Q: Transformations: What are the different transformations
you have worked with?

A:
Source Qualifier (XML, ERP, MQ)
Joiner
Expression
Lookup
Filter
Router
Sequence Generator
Aggregator
Update Strategy
Stored Proc
External Proc
Advanced External Proc
Rank
Normalizer
Q: What are active/passive transformations?

A: Passive transformations do not change the number of rows passing through them, whereas active transformations change the number of rows passing through them.
Active: Filter, Aggregator, Rank, Joiner, Source Qualifier
Passive: Expression, Lookup, Stored Proc, Seq. Generator
Q: What are connected/unconnected transformations?
A:
Connected transformations are part of the mapping pipeline. The input and output ports
are connected to other transformations.
Unconnected transformations are not part of the mapping pipeline. They are not linked in
the map with any input or output ports. Eg. In Unconnected Lookup you can pass multiple
values to unconnected transformation but only one column of data will be returned from
the transformation. Unconnected: Lookup, Stored Proc.
Q: In target load ordering, what do you order - Targets or Source Qualifiers?
A: Source Qualifiers. If there are multiple targets in the mapping, which are populated
from multiple sources, then we can use Target Load ordering.
Q: Have you used constraint-based load ordering? Where do you set this?
A: Constraint based loading can be used when you have multiple targets in the mapping
and the target tables have a PK-FK relationship in the database. It can be set in the
session properties. You have to set the Source Treat Rows as: INSERT and check the box
Constraint based load ordering in Advanced Tab.
Q: If you have a FULL file that you have to match and load into a corresponding
table, how will you go about it? Will you use Joiner transformation?
A: Use Joiner and join the file and Source Qualifier.
Q: If you have 2 files to join, which file will you use as the master file?
A: Use the file with lesser nos. of records as master file.
Q: If a sequence generator (with increment of 1) is connected to (say) 3 targets
and each target uses the NEXTVAL port, what value will each target get?
A: Each target will get values in increments of 3 (for example 1, 4, 7, ...; 2, 5, 8, ...; 3, 6, 9, ...).
Q: Have you used the Abort, Decode functions?
A: Abort can be used to Abort / stop the session on an error condition.
If the primary key column contains NULL, and you need to stop the session from
continuing then you may use ABORT function in the default value for the port. It can be
used with IIF and DECODE function to Abort the session.
Q: Have you used SQL Override?
A: It is used to override the default SQL generated in the Source Qualifier / Lookup
transformation.
Q: If you make a local transformation reusable by mistake, can you undo the
reusable action?
A: No
Q: What is the difference between filter and router transformations?
A: Filter can filter the records based on ONE condition only whereas Router can be used to
filter records on multiple condition.
Q: Lookup transformations: Cached/un-cached

A: When the Lookup transformation is cached, the Informatica Server caches the data and the index. This is done at the beginning of the session, before reading the first record from the source. If the Lookup is uncached, then the Informatica server reads the data from the database for every record coming from the Source Qualifier.
Q: Connected/unconnected if there is no match for the lookup, what is
returned?
A: Unconnected Lookup returns NULL if there is no matching record found in the Lookup
transformation.
Q: What is persistent cache?
A: When the Lookup is configured to be a persistent cache Informatica server does not
delete the cache files after completion of the session. In the next run Informatica server
uses the cache file from the previous session.
Q: What is dynamic lookup strategy?
A: The Informatica server compares the data in the lookup table and the cache, if there is
no matching record found in the cache file then it modifies the cache files by inserting the
record. You may use only (=) equality in the lookup condition.
If multiple matches are found in the lookup then Informatica fails the session. By default
the Informatica server creates a static cache.
Q: Mapplets: What are the 2 transformations used only in mapplets?
A: Mapplet Input / Source Qualifier, Mapplet Output
Q: Have you used Shortcuts?
A: Shortcuts may be used to refer to another mapping. Informatica refers to the original mapping. If any changes are made to the mapping / mapplet, they are immediately reflected in the mapping where the shortcut is used.
Q: If you used a database when importing sources/targets that was dropped
later on, will your mappings still be valid?
A: No
Q: In expression transformation, how can you store a value from the previous
row?
A: By creating a variable in the transformation.
Q: How does Informatica do variable initialization? Number/String/Date
A: Number 0, String blank, Date 1/1/1753
Q: Have you used the Informatica debugger?
A: Debugger is used to test the mapping during development. You can give breakpoints in
the mappings and analyze the data.

Q: What do you know about the Informatica server architecture? Load Manager,
DTM, Reader, Writer, Transformer.
A:
Load Manager is the first process started when the session runs. It checks for validity of
mappings, locks sessions and other objects.
DTM process is started once the Load Manager has completed its job. It starts a thread for
each pipeline.
Reader scans data from the specified sources.
Writer manages the target/output data.
Transformer performs the task specified in the mapping.

Q: Have you used partitioning in sessions? (not available with Powermart)


A: It is available in PowerCenter. It can be configured in the session properties.
Q: Have you used External loader? What is the difference between normal and
bulk loading?
A: The external loader will perform a direct data load to the table / data files, bypassing the SQL layer, and will not log the data. During a normal data load, data passes through the SQL layer and is logged into the archive log file, and as a result it is slow.
Q: Do you enable/disable decimal arithmetic in session properties?
A: Disabling Decimal Arithmetic will improve the session performance but it converts
numeric values to double, thus leading to reduced accuracy.
Q: When would you use multiple update strategies in a mapping?
A: When you would like to insert and update the records in a Type 2 Dimension table.
Q: When would you truncate the target before running the session?
A: When we want to load the entire data set, including history, in one shot. In that case the update strategy does not need dd_update or dd_delete; it does only dd_insert.
Q: How do you use stored proc transformation in the mapping?
A: Inside a mapping we can use a Stored Procedure transformation, pass input parameters and get back the output parameters. When handled through the session, it can be invoked either in pre-session or post-session scripts.
Q: What did you do in the stored procedure? Why did you use stored proc
instead of using expression?
A:
Q: When would you use SQ, Joiner and Lookup?
A:
If we are using multiples source tables and they are related at the database, then we can
use a single SQ.
If we need to Lookup values in a table or Update Slowly Changing Dimension tables then
we can use Lookup transformation.
Joiner is used to join heterogeneous sources, e.g. Flat file and relational tables.
Q: How do you create a batch load? What are the different types of batches?
A: Batch is created in the Server Manager. It contains multiple sessions. First create
sessions and then create a batch. Drag the sessions into the batch from the session list
window.
Batches may be sequential or concurrent. A sequential batch runs the sessions sequentially. Concurrent sessions run in parallel, thus optimizing the server resources.
Q: How did you handle reject data? What file does Informatica create for bad
data?
A: Informatica saves the rejected data in a .bad file. Informatica adds a row identifier for
each record rejected indicating whether the row was rejected because of Writer or Target.
Additionally for every column there is an indicator for each column specifying whether the
data was rejected due to overflow, null, truncation, etc.
Q: How did you handle runtime errors? If the session stops abnormally how
were you managing the reload process?

Q: Have you used pmcmd command? What can you do using this command?
A: pmcmd is a command line program. Using this command
You can start sessions
Stop sessions
Recover session
Q: What are the two default repository user groups
A: Administrators and Public


Q: What are the Privileges of Default Repository and Extended Repository user?
A:
Default Repository Privileges:
- Use Designer
- Browse Repository
- Create Sessions and Batches
Extended Repository Privileges:
- Session Operator
- Administer Repository
- Administer Server
- Super User
Q: How many different locks are available for repository objects
A: There are five kinds of locks available on repository objects:

Read lock. Created when you open a repository object in a folder for which you do not have
write permission. Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you
have write permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server
starts a scheduled session or batch.
Fetch lock. Created when the repository reads information about repository objects from
the database.
Save lock. Created when you save information to the repository.
Q: What is Session Process?
A: The Load Manager process. Starts the session, creates the DTM process, and sends
post-session email when the session completes.
Q: What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, transform data,
and handle pre and post-session operations.


Q: When the Informatica Server runs a session, what are the tasks handled?
A:
Load Manager (LM):
o LM locks the session and reads session properties.
o LM reads the parameter file.
o LM expands the server and session variables and parameters.
o LM verifies permissions and privileges.
o LM validates source and target code pages.
o LM creates the session log file.
o LM creates the DTM (Data Transformation Manager) process.

Data Transformation Manager (DTM):

o DTM process allocates DTM process memory.

o DTM initializes the session and fetches the mapping.


o DTM executes pre-session commands and procedures.
o DTM creates reader, transformation, and writer threads for each source pipeline. If the
pipeline is partitioned, it creates a set of threads for each partition.
o DTM executes post-session commands and procedures.
o DTM writes historical incremental aggregation and lookup data to disk, and it writes
persisted sequence values and mapping variables to the repository.
o Load Manager sends post-session email
Q: What is Code Page?
A: A code page contains the encoding to specify characters in a set of one or more
languages.
Q: How to handle the performance in the server side?
A: Informatica tool has no role to play here. The server administrator will take up the
issue.

Q: What are the DTM (Data Transformation Manager) Parameters?


A:
DTM Memory parameter - Default buffer block size / Data & Index cache size,
Reader parameter - Line sequential buffer length for flat files,
General parameter - Commit interval (source and target) / Others - enabling lookup cache,
Event-based scheduling - Indicator file to wait for.

1. Explain about your projects:


Architecture
Dimension and Fact tables
Sources and Targets
Transformations used
Frequency of populating data
Database size

2. What is dimension modeling?


Unlike the ER model, the dimensional model is very asymmetric, with one large central table called the fact table connected to multiple dimension tables. It is also called a star schema.

3. What are mapplets?


Mapplets are reusable objects that represent a collection of transformations.
Transformations not to be included in mapplets are:
Cobol source definitions
Joiner transformations
Normalizer Transformations
Non-reusable sequence generator transformations
Pre or post session procedures
Target definitions
XML Source definitions
IBM MQ source definitions
Power mart 3.5 style Lookup functions

4. What are the transformations that use cache for performance?


Aggregator, Lookups, Joiner and Ranker

5. What are the active and passive transformations?


An active transformation changes the number of rows that pass through the mapping.
1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Ranker
5. Update strategy
6. Aggregator
7. Advanced External procedure
8. Normalizer
9. Joiner
Passive transformations do not change the number of rows that pass through the
mapping.
1. Expressions
2. Lookup
3. Stored procedure
4. External procedure
5. Sequence generator
6. XML Source qualifier

6. What is a lookup transformation?


Used to look up data in a relational table, view, or synonym. The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares lookup transformation port values to lookup table column values based on the lookup condition. The result is passed to other transformations and the target.
Used to :
Get related value
Perform a calculation
Update slowly changing dimension tables.
Difference between connected and unconnected lookups. Which is better?
Connected:
Receives input values directly from the pipeline.
Can use a dynamic or static cache.
Cache includes all lookup columns used in the mapping.
Can return multiple columns from the same row.
If there is no match, can return default values.
Default values can be specified.
Unconnected:
Receives input values from the result of a :LKP expression in another transformation.
Only a static cache can be used.
Cache includes all lookup/output ports in the lookup condition and the lookup or return port.
Can return only one column from each row.
If there is no match, it returns NULL.
Default values cannot be specified.

Explain various caches :


Static:
Caches the lookup table before executing the transformation. Rows are not added dynamically.
Dynamic:
Caches the rows as and when they are passed.
Unshared:
Within the mapping, if the lookup table is used in more than one transformation, the cache built for the first lookup can be used for the others. It cannot be used across mappings.
Shared:
If the lookup table is used in more than one transformation/mapping, the cache built for the first lookup can be used for the others. It can be used across mappings.
Persistent:
If the cache generated for a lookup needs to be preserved for subsequent use, a persistent cache is used. It will not delete the index and data files. It is useful only if the lookup table remains constant.

What are the uses of index and data caches?


The conditions are stored in the index cache and the records from the lookup are stored in the data cache.
7. Explain aggregate transformation?


The aggregate transformation allows you to perform aggregate calculations, such as averages, sum, max, min, etc. The aggregate transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only.
Performance issues?
The Informatica server performs calculations as it reads, and stores the necessary group and row data in an aggregate cache.
Create sorted input ports and pass the input records to the Aggregator in sorted form, by group and then by port.

Incremental aggregation?
In the Session property tab there is an option for performing incremental aggregation. When the Informatica server performs incremental aggregation, it passes new source data through the mapping and uses historical cache (index and data cache) data to perform new aggregation calculations incrementally.

What are the uses of index and data cache?


The group data is stored in index files and the row data is stored in data files.
8. Explain update strategy?


Update strategy defines the sources to be flagged for insert, update, delete, and reject at
the targets.
What are update strategy constants?
DD_INSERT,0
DD_UPDATE,1
DD_DELETE,2
DD_REJECT,3

If DD_UPDATE is defined in the update strategy and Treat Source Rows As is set to INSERT in the session, what happens?
Hint: If anything other than DATA DRIVEN is specified in the session, the update strategy in the mapping is ignored.

What are the three areas where the rows can be flagged for particular
treatment?
In mapping, In Session treat Source Rows and In Session Target Options.
What is the use of Forward/Reject rows in Mapping?

9. Explain the expression transformation?


Expression transformation is used to calculate values in a single row before writing to the
target.
What are the default values for variables?
Hint: String = Null, Number = 0, Date = 1/1/1753

10. Difference between Router and filter transformation?


In the Filter transformation the records are filtered based on one condition and the rejected rows are discarded. In the Router multiple conditions are placed and the rejected rows can be assigned to a port.

How many ways you can filter the records?


1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Ranker
5. Update strategy
11. How do you call stored procedure and external procedure transformations?
An External Procedure can be called in the pre-session and post-session tags in the session property sheet.
Stored procedures are called in the Mapping Designer by three methods:
1. Select the icon and add a Stored Procedure transformation.
2. Select Transformation > Import Stored Procedure.
3. Select Transformation > Create and then select Stored Procedure.
12. Explain Joiner transformation and where it is used?
While a Source qualifier transformation can join data originating from a common source
database, the joiner transformation joins two related heterogeneous sources residing in
different locations or file systems.
Two relational tables existing in separate databases
Two flat files in different file systems.
Two different ODBC sources
In one transformation how many sources can be coupled?
Two sources can be coupled. If more than two are to be coupled, add another Joiner in the hierarchy.
What are join options?
Normal (Default)
Master Outer
Detail Outer
Full Outer

13. Explain Normalizer transformation?


The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to organize the data according to your own needs. A Normalizer transformation can appear anywhere in a data flow when you normalize a relational source. Use a Normalizer transformation instead of the Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation appears, creating input and output ports for every column in the source.
14. What is Source qualifier transformation?
When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the records that the Informatica server reads when it runs a session.


Join Data originating from the same source database.
Filter records when the Informatica server reads the source data.
Specify an outer join rather than the default inner join.
Specify sorted ports
Select only distinct values from the source
Create a custom query to issue a special SELECT statement for the Informatica server to
read the source data.

15. What is the Rank transformation?


Filters the required number of records from the top or from the bottom.
16. What is target load option?
It defines the order in which informatica server loads the data into the targets.
This is to avoid integrity constraint violations

17. How do you identify the bottlenecks in mappings?

Bottlenecks can occur in:
1. Targets
The most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify a target bottleneck by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Solution:
Drop or disable indexes or constraints
Perform bulk load (ignores database log)
Increase commit interval (recovery is compromised)
Tune the database for RBS, dynamic extension etc.

2. Sources
Add a Filter transformation after each Source Qualifier with a condition that lets no records through. If the time taken is the same, the source is the problem.
You can also identify a source problem with a read test session: copy the mapping with only the sources and Source Qualifiers, remove all transformations and connect to a file target. If the performance is the same, there is a source bottleneck.
Using a database query: copy the read query directly from the session log and execute it against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, the query can be modified using optimizer hints.
Solutions:
Optimize queries using hints.
Use indexes wherever possible.
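For example (the table, index and column names below are invented, not taken from any real session log), an Oracle index hint added to the copied read query might look like this:

SELECT /*+ INDEX(ord ORD_DATE_IDX) */ ord.order_id, ord.order_date, ord.amount
FROM orders ord
WHERE ord.order_date >= TO_DATE('2006-01-01', 'YYYY-MM-DD');

If the hinted query fetches the first row noticeably faster, the same hint can be placed in the Source Qualifier SQL override.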
3. Mapping
If both source and target are OK, then the problem could be in the mapping. Add a Filter transformation before the target; if the time is the same, there is a mapping bottleneck.
(OR) Look at the performance monitor in the session property sheet and view the counters.
Solutions:
High error rows and rows spilled from the lookup cache indicate a mapping bottleneck.
Optimize single-pass reading.
Optimize Lookup transformation:
1. Caching the lookup table:
When caching is enabled the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled the server queries the lookup table on a row-by-row basis.
Static, dynamic, shared, unshared and persistent caches are available.
2. Optimizing the lookup condition:
Whenever multiple conditions are placed, the condition with the equality sign should take precedence.
3. Indexing the lookup table:
The cached lookup table should be indexed on the ORDER BY columns; the session log contains the ORDER BY statement.
For an uncached lookup, since the server issues a SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup table on the columns in the condition.
Optimize Filter transformation:
You can improve efficiency by filtering early in the data flow. Instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data, use a Source Qualifier filter to remove those same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible to remove unnecessary data early in the data flow.
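As a small illustration (the table and column names are made up), the same condition can live either in the Source Qualifier's Source Filter property, where it is appended to the generated WHERE clause, or in a Filter transformation:

Source Filter (Source Qualifier):  ORDERS.ORDER_STATUS = 'SHIPPED'
Filter condition (Filter transformation):  ORDER_STATUS = 'SHIPPED'

Pushing the condition into the Source Qualifier means the unwanted rows never leave the database, which is usually the cheaper option.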
Optimize Aggregator transformation:
1. Group by simpler columns, preferably numeric columns.
2. Use sorted input. Sorted input decreases the use of aggregate caches; the server assumes all input data are sorted and performs aggregate calculations as it reads.
3. Use incremental aggregation in the session property sheet.
Optimize Sequence Generator transformation:
1. Try creating a reusable Sequence Generator transformation and use it in multiple mappings.
2. The Number of Cached Values property determines the number of values the Informatica server caches at one time.
Optimize Expression transformation:
1. Factor out common logic.
2. Minimize aggregate function calls.
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions.
4. Sessions
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The Informatica server creates performance details when you enable Collect Performance Data on the General tab of the session properties.
Performance details display information about each Source Qualifier, target definition and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.
Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck.
Low BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
5. System (Networks)

18. How to improve the Session performance?


1. Run concurrent sessions.
2. Partition the session (PowerCenter).
3. Tune parameters: DTM buffer pool, buffer block size, index cache size, data cache size, commit interval, tracing level (Normal, Terse, Verbose Init, Verbose Data).
The session has memory to hold 83 sources and targets; if there are more, the DTM buffer can be increased.
The Informatica server uses the index and data caches for Aggregator, Rank, Lookup and Joiner transformations. The server stores the transformed data from the above transformations in the data cache before returning it to the data flow, and stores group information for those transformations in the index cache.
If the allocated data or index cache is not large enough to store the data, the server stores the data in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows; this can be seen from the counters.
Since the data cache is generally larger than the index cache, it should be allocated more memory than the index cache.
4. Remove the staging area.
5. Turn off session recovery.
6. Reduce error tracing.
19. What are tracing levels?
Normal (default):
Logs initialization and status information, errors encountered and skipped rows due to transformation errors; summarizes session results but not at the row level.
Terse:
Logs initialization, error messages and notification of rejected data.
Verbose Init:
In addition to normal tracing, logs additional initialization information, names of index and data files used and detailed transformation statistics.
Verbose Data:
In addition to Verbose Init, records row-level logs.

20. What are slowly changing dimensions?

Slowly changing dimensions are dimension tables that have slowly increasing data as well as updates to existing data.
21. What are mapping parameters and variables?
A mapping parameter is a user-definable constant that takes a value before running a session. It can be used in SQ overrides, Expression transformations etc.
Steps:
Define the parameter in the Mapping Designer (parameters & variables).
Use the parameter in the expressions.
Define the value for the parameter in the parameter file.
A mapping variable is defined similarly to a parameter, except that the value of the variable is subject to change.
It picks up its value in the following order:
1. From the session parameter file
2. As stored in the repository object from the previous run
3. As defined in the initial value in the Designer
4. Default values
Q. What are the output files that the Informatica server creates during the
session running?
Informatica server log: Informatica server (on UNIX) creates a log for all status and error
messages (default name: pm.server.log). It also creates an error log for error messages.
These files will be created in Informatica home directory
Session log file: Informatica server creates session log file for each session. It writes
information about session into log files such as initialization process, creation of sql
commands for reader and writer threads, errors encountered and load summary. The
amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping. Session
detail includes information such as table name, number of rows written or rejected. You
can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance
details which helps you where performance can be improved. To generate this file select
the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: Informatica server creates control file and a target file when you run a session
that uses the external loader. The control file contains the information about the target
flat file such as data format and loading instructions for the external loader.
Post session email: Post session email allows you to automatically communicate
information about a session run to designated recipients. You can create two different
messages. One if the session completed successfully the other if the session fails.
Indicator file: If you use the flat file as a target, you can configure the Informatica server
to create indicator file. For each target row, the indicator file contains a number to
indicate whether the row was marked for insert, update, delete or reject.
Output file: If session writes to a target file, the Informatica server creates the target file
based on file properties entered in the session property sheet.
Cache files: When the Informatica server creates memory cache it also creates cache
files.
For the following circumstances Informatica server creates index and data cache files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation

Q. What is the difference between Joiner transformation and Source Qualifier transformation?
A. You can join heterogeneous data sources in a Joiner transformation, which we cannot do in a Source Qualifier transformation.
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row of a
data in a cached look up transformation. It allocates memory for the cache based on the
amount you configure in the transformation or session properties. The Informatica server
stores condition values in the index cache and output values in the data cache.
Q. What is meant by parameters and variables in Informatica and how are they used?
A. Parameter: A mapping parameter represents a constant value that you can define
before running a session. A mapping parameter retains the same value throughout the
entire session.
Variable: A mapping variable represents a value that can change through the session.
Informatica Server saves the value of a mapping variable to the repository at the end of
each successful session run and uses that value the next time you run the session
Q. What is target load order?
You specify the target load order based on source qualifiers in a mapping. If you have
multiple source qualifiers connected to multiple targets, you can define the order in which
Informatica server loads data into the targets
Informatica is a leading data integration software. The products of the company support various enterprise-wide data integration and data quality solutions including data warehousing, data migration, data consolidation, data synchronization, data governance, master data management, and cross-enterprise data integration.

The important Informatica Components are:


Power Exchange
Power Center
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue
This section will contain some useful tips and tricks for optimizing Informatica performance. This includes some of the real-time problems or errors and ways to troubleshoot them, best practices etc.

Q1: Introduce Yourself.


Re: What is incremental aggregation and how is it done?
Answer: When using incremental aggregation, you apply captured changes in the source to the aggregate calculations in a session. If the source changes only incrementally and you can capture the changes, you can configure the session to process only those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.

Q2: What is datawarehousing?


A collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.
Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data.
The term data warehousing generally refers to the combination of many different databases across an entire enterprise. Contrast with data mart.
Q3: What is the need of datawarehousing?
Q4: Diff b/w OLTP & OlAP
OLTP
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High volume transactions
Transaction recovery is necessary
OLAP
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low volume transactions
Transaction recovery is not necessary

Q5: Why do we use OLTP & OLAP


Q6: How to handle decimal in informatica while using flatfies?
while importing flat file definetion just specify the scale for a neumaric data type. in the mapping, the flat file source
supports only number datatype(no decimal and integer). In the SQ associated with that source will have a data type as decimal
for that number port of the source.

Q7: Why do we use update stratgey?


Session properties (Treat Source Rows As) can flag rows as INSERT, UPDATE, DELETE or REJECT, but with session properties we can drive a single flow only.
An SCD has to insert and update at the same time, which is possible only with an Update Strategy transformation; using the Update Strategy transformation we can create SCD mappings easily.

-----------------

Actually it is important to use an Update Strategy transformation in SCDs, as SCDs maintain historical data, especially Type 2 dimensions. In this case we may need to flag rows going to the same target for different database operations, hence we have no choice but to use an Update Strategy, as at session level this is not possible.

Q8: Can we use update strategy in flatfiles?

Data in flat file cannot be updated

Q9: If yes why? If not why?


Q10: What is junk dimension?
A junk dimension is a collection of random transactional codes or text attributes that are unrelated to any
particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk
attributes.

Q11: Diff between IIF and DECODE?

You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if SALES is zero or negative:

IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS ))), 0 )

You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following shows how you can use DECODE instead of IIF:

DECODE( TRUE,
SALES > 0 AND SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS )

Q12: Diff b/w co-related subquery and nested subquery

A correlated subquery runs once for each row selected by the outer query. It contains a reference to a value from the row selected by the outer query.
A nested subquery runs only once for the entire nesting (outer) query. It does not contain any reference to the outer query row.
For example
Correlated subquery:
select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal = (select max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested subquery:
select empname, basicsal, deptno from emp where (deptno, basicsal) in (select deptno, max(basicsal) from emp group by deptno)

Q13: What is Union?


The Union transformation is a multiple input group transformation that you use to merge data from multiple
pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources similar to the
UNION ALL SQL statement to combine the results from two or more SQL statements. Similar
to the UNION ALL statement the Union transformation does not remove duplicate rows.

The Integration Service processes all input groups in parallel. The Integration Service concurrently reads sources
connected to the Union transformation and pushes blocks of data into the input groups of
the transformation. The Union transformation processes the blocks of data based on the order it receives the
blocks from the Integration Service.
You can connect heterogeneous sources to a Union transformation. The Union transformation merges sources
with matching ports and outputs the data from one output group with the same ports as the input groups.
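For comparison, the SQL that the Union transformation mirrors looks like this (the table names are hypothetical):

SELECT customer_id, customer_name FROM customers_us
UNION ALL
SELECT customer_id, customer_name FROM customers_eu;

As with UNION ALL, every input group must supply the same number of ports with compatible datatypes, and duplicate rows are not removed.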

Q14: How to use union?


What is the difference between star schema and snowflake schema?
Star schema: a star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table which points towards the dimension tables. The advantages of a star schema are better slice-and-dice performance and easy understanding of the data.
Snowflake schema: a snowflake schema describes a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.
In a star schema every dimension will have a primary key.

In a star schema a dimension table will not have any parent table, whereas in a snowflake schema a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost to the lowermost level.
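A minimal DDL sketch of the difference, using invented table names: in the star version the product hierarchy stays in one denormalized dimension, while the snowflake version breaks the category level out into a parent (outrigger) table.

-- Star schema dimension: hierarchy kept in one table
CREATE TABLE dim_product (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  category_name VARCHAR2(100),
  brand_name    VARCHAR2(100)
);

-- Snowflake: the category level is normalized into its own table
CREATE TABLE dim_category (
  category_key  NUMBER PRIMARY KEY,
  category_name VARCHAR2(100)
);

CREATE TABLE dim_product_sf (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  category_key  NUMBER REFERENCES dim_category (category_key)
);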

Q15: How many data sources are available?


Q16: What is scd:
SCD - slowly changing dimension.
It is capturing slowly changing data, i.e. data that changes very slowly with respect to time. For example, the address of a customer may change in rare cases; a customer's address never changes frequently.
There are 3 types of SCD:
Type 1 - only the most recent changed data is stored.
Type 2 - the recent data as well as all past data (historical data) is stored.
Type 3 - partially historical data and recent data are stored; it stores the most recent update and the most recent history.
As a data warehouse holds historical data, Type 2 is the most useful for it.
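A hedged sketch of what a Type 2 dimension might look like (all table and column names and values are invented): each change creates a new row with its own surrogate key and validity dates.

CREATE TABLE dim_customer (
  customer_key   NUMBER PRIMARY KEY,   -- surrogate key, a new one per version
  customer_id    NUMBER,               -- natural key from the source
  address        VARCHAR2(200),
  effective_date DATE,
  end_date       DATE,                 -- NULL for the current version
  current_flag   CHAR(1)
);

-- Two versions of the same customer after an address change:
--   1001, 501, 'Old Street 1', 01-JAN-2005, 14-MAR-2006, 'N'
--   1002, 501, 'New Road 9',   15-MAR-2006, NULL,        'Y'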

Q17: Types of scd


Q18: How can we improve the session performance?
Re: How does the Informatica server increase session performance through partitioning the source?
Answer: For relational sources, the Informatica server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection. The Informatica server reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica server creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server creates a separate file for each partition (of a source file). You can choose to merge the targets.

Q19:What do you mean by informatica?


Q20: Diff b/w dimensions and fact table

Dimension Table features


1. It provides the context/descriptive information for fact table measurements.
2. Provides entry points to data.
3. Structure of a dimension: surrogate key, one or more other fields that compose the natural key (NK) and a set of attributes.
4. Size of a dimension table is smaller than a fact table.
5. In a schema, more dimension tables are present than fact tables.
6. A surrogate key is used to prevent primary key (PK) violations (to store historical data).
7. Values of fields are in numeric and text representation.
Fact Table features
1. It provides the measurements of an enterprise.
2. Measurement is the amount determined by observation.
3. Structure of a fact table: foreign keys (FK), degenerate dimensions and measurements.
4. Size of a fact table is larger than a dimension table.
5. In a schema, fewer fact tables are observed compared to dimension tables.
6. The composite of foreign key and degenerate dimension fields acts as the primary key.
7. Values of the fields are always in numeric or integer form.

Performance tuning in Informatica?


The goal of performance tuning is to optimize session performance so sessions run during the available load window for the Informatica Server. Increase the session performance by the following.
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid unnecessary network connections.
Flat files: if your flat files are stored on a machine other than the Informatica server, move those files to the machine that hosts the Informatica server.
Relational data sources: minimize the connections to sources, targets and the Informatica server to improve session performance. Moving the target database onto the server system may improve session performance.
Staging areas: if you use staging areas you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load to multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte whereas Unicode mode takes 2 bytes to store a character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
We can improve session performance by configuring the network packet size, which allows more data to cross the network at one time. To do this go to Server Manager and choose server configure database connections.
If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop constraints and indexes before you run the session and rebuild them after the session completes.
Running parallel sessions by using concurrent batches also reduces the time for loading the data, so concurrent batches may increase session performance.
Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
If your session contains a Filter transformation, create that Filter transformation nearer to the sources, or use a filter condition in the Source Qualifier.
Aggregator, Rank and Joiner transformations may often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.

You can also perform the following tasks to optimize the mapping:
Configure single-pass reading.
Optimize datatype conversions.
Eliminate transformation errors.
Optimize transformations.
Optimize expressions.

RE: Why did you use stored procedure in your ETL Appli...
hi
Usage of a stored procedure has the following advantages:
1. checks the status of the target database
2. drops and recreates indexes
3. determines if enough space exists in the database
4. performs a specialized calculation
=======================================
Stored procedures in Informatica are useful to impose complex business rules.
=======================================
Static cache:
1. A static cache remains the same during the session run.
2. A static cache can be used for relational and flat file lookup types.
3. A static cache can be used in both unconnected and connected Lookup transformations.
4. We can handle multiple matches in a static cache.
5. We can use operators other than equality, such as <, >, <=, >= and =.
Dynamic cache:
1. A dynamic cache changes during the session run.
2. A dynamic cache can be used only for relational lookup types.
3. A dynamic cache can be used only in connected lookups.
4. We cannot handle multiple matches in a dynamic cache.
5. We can use only the = operator with a dynamic cache.

Q. What is the difference between $ and $$ in a mapping or parameter file? In which cases are they generally used?
A. $ prefixes are used to denote session parameters and variables, and $$ prefixes are used to denote mapping parameters and variables.
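As a rough sketch (the folder, workflow, session and parameter names below are invented), a parameter file groups both kinds under a session heading; the $ entries feed session-level parameters and the $$ entries feed mapping parameters and variables:

[MyFolder.WF:wf_load_customers.ST:s_m_load_customers]
$DBConnection_Source=ORA_SRC
$InputFile_Cust=/data/in/customers.dat
$$LoadDate=2006-07-31
$$Region=EMEA

At run time the session reads this file (its path is given in the session or workflow properties) and substitutes the values.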
How to connect two or more tables with a single Source Qualifier?
Create an Oracle source with however many columns you want and write the join query in the SQL Query override. The column order and data types must be the same as in the SQL query.
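A small example of such an override (the tables and columns are hypothetical); the Source Qualifier ports must be listed in the same order and with the same data types as the columns in this SELECT:

SELECT c.customer_id, c.customer_name, o.order_id, o.order_date
FROM customers c, orders o
WHERE c.customer_id = o.customer_id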
A set of workflow tasks is called a worklet.
Workflow tasks include: 1) Timer 2) Decision 3) Command 4) Event Wait 5) Event Raise 6) Email etc. We use these in different situations.
=======================================
A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows then we use worklets. To execute a worklet, it has to be placed inside a workflow.

The use of a worklet in a workflow is similar to the use of a mapplet in a mapping.

Worklets are reusable workflows. A worklet might contain more than one task. We can use these worklets in other workflows.
Which will perform better, IIF or DECODE?
DECODE performs better than the IIF condition; DECODE can be used instead of multiple nested IIFs.
The DECODE function also exists in SQL, whereas the IIF function does not, and DECODE gives clearer readability so others can understand the logic.

What is Source Qualifier transformation?

The SQ is an active transformation. It performs one of the following tasks: join data from the same source database, filter rows when PowerCenter reads source data, perform an outer join, or select only distinct values from the source.
In the Source Qualifier transformation a user can define join conditions, filter the data and eliminate duplicates. The default Source Qualifier query can be overwritten by the above options; this is known as SQL Override.
The Source Qualifier represents the records that the Informatica server reads when it runs a session. When we add a relational or a flat file source definition to a mapping, we need to connect it to a Source Qualifier transformation, which represents those records.

How many dimension tables did you had in your project and name some
dimensions (columns)?
Product Dimension : Product Key, Product id, Product Type, Product name, Batch Number.
Distributor Dimension: Distributor key, Distributor Id, Distributor Location,
Customer Dimension : Customer Key, Customer Id, CName, Age, status, Address, Contact
Account Dimension : Account Key, Acct id, acct type, Location, Balance,

What is meant by clustering?

A cluster stores rows from two (or more) related tables physically together in the same data blocks, based on common column values, so that joined data can be retrieved more efficiently.

What are the rank caches?

When the server runs a session with a Rank transformation, it compares each input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an index cache and row data in a data cache.
Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:

Standalone Repository: A repository that functions individually; it is unrelated to any other repositories.

Global Repository: This is a centralized repository in a domain. This repository can contain shared objects across the repositories in a domain. The objects are shared through global shortcuts.

Local Repository: A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.

Versioned Repository: This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows efficient development, testing and deployment of metadata in the production environment.
Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or more languages. The code page is
selected based on source of the data. For example if source contains Japanese text then the code page
should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set, refers to a specific
set of data that describes the characters the application recognizes. This influences the way that application
stores, receives, and sends character data.
Q. Which all databases PowerCenter Server on Windows can connect to?
A. PowerCenter Server on Windows can connect to following databases:

IBM DB2

Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which all databases PowerCenter Server on UNIX can connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Informatica Mapping Designer

Q. How to execute PL/SQL script from Informatica mapping?


A. Stored Procedure (SP) transformation can be used to execute PL/SQL Scripts. In SP Transformation
PL/SQL procedure name can be specified. Whenever the session is executed, the session will call the pl/sql
procedure.
Q. How can you define a transformation? What are different types of transformations available in
Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a
set of transformations that perform specific functions. For example, an Aggregator transformation performs
calculations on groups of data. Below are the various transformations available in Informatica:

Aggregator
Application Source Qualifier
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Output
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator

XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file
source when it runs a session. When a relational or a flat file source definition is added to a mapping, it is
connected to a Source Qualifier transformation.
PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier has the capability to override this default query by changing the default settings of the transformation properties. The list of selected ports or the order they appear in the default query should not be changed in the overridden query.
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums.
Unlike Expression Transformation, the Aggregator transformation can only be used to perform calculations
on groups. The Expression transformation permits calculations on a row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data. While grouping the
data, the aggregator transformation outputs the last row of each group unless otherwise specified in the
transformation properties.
Various group by functions available in Informatica are : AVG, COUNT, FIRST, LAST, MAX, MEDIAN,
MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping Aggregate Transformation, the session option for
Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes
new source data through the mapping and uses historical cache data to perform new aggregation
calculations incrementally.
Q. How Union Transformation is used?
A. The union transformation is a multiple input group transformation that can be used to merge data from
various sources (or pipelines). This transformation works just like UNION ALL statement in SQL, that is
used to combine result set of two SELECT statements.
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational table, view or synonym. It
compares lookup transformation ports (input ports) to the source column values based on the lookup
condition. Later returned values can be passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is the difference between a connected look up and unconnected look up?
A. A connected lookup takes input values directly from other transformations in the pipeline.
An unconnected lookup doesn't take inputs directly from any other transformation, but it can be used in any transformation (like Expression) and can be invoked as a function using the :LKP expression. So, an unconnected lookup can be called multiple times in a mapping.
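For illustration (the lookup and port names here are made up), an unconnected lookup is called from an Expression transformation output port like this:

IIF( ISNULL(CUSTOMER_NAME),
     :LKP.lkp_customer_name(CUSTOMER_ID),
     CUSTOMER_NAME )

The :LKP reference returns the single port designated as the return port in the lkp_customer_name Lookup transformation.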
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The mapplet contains set of
transformations and it allows us to reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored
as a metadata separate from any other mapping that uses the transformation. Whenever any changes to a
reusable transformation are made, all the mappings where the transformation is used will be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row-by-row. By default every row is marked to be inserted in the
target table. If the row has to be updated/inserted based on some logic Update Strategy transformation is
used. The condition can be specified in Update Strategy to mark the processed row for update or insert.
Following options are available for update strategy :

DD_INSERT : If this is used the Update Strategy flags the row for insertion. Equivalent
numeric value of DD_INSERT is 0.

DD_UPDATE : If this is used the Update Strategy flags the row for update. Equivalent
numeric value of DD_UPDATE is 1.

DD_DELETE : If this is used the Update Strategy flags the row for deletion. Equivalent
numeric value of DD_DELETE is 2.

DD_REJECT : If this is used the Update Strategy flags the row for rejection. Equivalent
numeric value of DD_REJECT is 3.

Re: What are anti-joins?

Answer: Anti-joins are written using the NOT EXISTS or NOT IN constructs. An anti-join between two tables returns rows from the first table for which there are no corresponding rows in the second table. In other words, it returns rows that fail to match the sub-query on the right side.
Suppose you want a list of departments with no employees. You could write a query like this:
SELECT d.department_name
FROM departments d
MINUS
SELECT d.department_name
FROM departments d, employees e
WHERE d.department_id = e.department_id
ORDER BY department_name;
The above query will give the desired results, but it might be clearer to write the query using an anti-join:
SELECT d.department_name
FROM departments d
WHERE NOT EXISTS (SELECT NULL
                  FROM employees e
                  WHERE e.department_id = d.department_id)
ORDER BY d.department_name;

Re: Without using any transformations, how can you load the data into the target?

If I were the candidate I would simply say: if there are no transformations to be done, I will simply run an insert script if the source and target can talk to each other, or simply source -> source qualifier -> target. If the interviewer says the SQ is a transformation, then say "then I don't know; I have always used Informatica when there is some kind of transformation involved, because that is what Informatica is mainly used for".


What is a source qualifier?
What is a surrogate key?
What is difference between Mapplet and reusable transformation?
What is DTM session?
What is a Mapplet?
What is a look up function? What is default transformation for the look up function?
What is difference between a connected look up and unconnected look up?
What is up date strategy and what are the options for update strategy?
What is subject area?
What is the difference between truncate and delete statements?
What kind of Update strategies are normally used (Type 1, 2 & 3) & what are the differences?
What is the exact syntax of an update strategy?
What are bitmap indexes and how and why are they used?
What is bulk bind? How does it improve performance?
What are the different ways to filter rows using Informatica transformations?
What is referential Integrity error? How do you rectify it?
What is DTM process?
What is target load order?
What exactly is a shortcut and how do you use it?
What is a shared folder?
What are the different transformations where you can use a SQL override?
What is the difference between a Bulk and Normal mode and where exactly is it defined?
What is the difference between Local & Global repository?
What are data driven sessions?
What are the common errors while running a Informatica session?
What are worklets and what is their use?
What is change data capture?
What exactly is tracing level?
What is the difference between constraints based load ordering and target load plan?
What is a deployment group and what is its use?
When and how a partition is defined using Informatica?
How do you improve performance in an Update strategy?
How do you validate all the mappings in the repository at once?
How can you join two or more tables without using the source qualifier override SQL or a Joiner
transformation?
How can you define a transformation? What are different types of transformations in Informatica?
How many repositories can be created in Informatica?

How many minimum groups can be defined in a Router transformation?


How do you define partitions in Informatica?
How can you improve performance in an Aggregator transformation?
How does the Informatica know that the input is sorted?
How many worklets can be defined within a workflow?
How do you define a parameter file? Give an example of its use.
If you join two or more tables and then pull out about two columns from each table into the source
qualifier and then just pull out one column from the source qualifier into an Expression transformation
and then do a generate SQL in the source qualifier how many columns will show up in the generated SQL.
In a Type 1 mapping with one source and one target table what is the minimum number of update
strategy transformations to be used?
At what levels can you define parameter files and what is the order?
In a session log file where can you find the reader and the writer details?
For joining three heterogeneous tables how many joiner transformations are required?
Can you look up a flat file using Informatica?
While running a session what default files are created?
Describe the use of Materialized views and how are they different from a normal view.
Contributed by Mukherjee, Saibal (ETL Consultant)
Many readers are asking Where's the answer? Well, it will take some time before I get time to write it. But there is no reason to get upset; the Informatica help files should have all of these answers!

Loading & testing fact/transactional/balances (data), which is valid between dates!


Tuesday, July 25th, 2006

This is going to be a very interesting topic for ETL & data modelers who design processes/tables to load fact or transactional data which keeps on changing between dates, e.g. prices of shares, company ratings, etc.

The table above shows an entity in the source system that contains time-variant values, but they don't change daily. The values are valid over a period of time; then they change.

1. What table structure should be used in the data warehouse?

Maybe Ralph Kimball or Bill Inmon can come up with a better data model! But for ETL developers or ETL leads the decision is already made, so let's look for a solution.

2. What should be the ETL design to load such a structure?
Design A

There is a one-to-one relationship between the source row and the target row.

There is a CURRENT_FLAG attribute, which means that every time the ETL process gets a new value it has to add a new row with the current flag and go to the previous row and retire it. This is a very costly ETL step and it will slow down the ETL process.

From the report writer's point of view this model is a major challenge to use, because what if the report wants a rate which is not current? Imagine the complex query.
Design B

In this design a snapshot of the source table is taken every day.

The ETL is very easy. But can you imagine the size of the fact table when the source has more than 1 million rows (1 million x 365 days = ? rows per year)? And what if the values change every hour or minute?

But you have a very happy user who can write SQL reports very easily.
Design C

Can there be a compromise? How about using from date (time) to date (time)! The report writer can simply provide a date (time) and a straight SQL query can return the value/row that was valid at that moment.

However the ETL is indeed as complex as the A model, because while the current row runs from the current date to infinity, the previous row has to be retired from its from-date to today's date minus one.

This kind of ETL coding also creates lots of testing issues, as you want to make sure that for any given date and time only one instance of the row exists (for the primary key).
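A hedged sketch of the Design C read, with invented table and bind-variable names; a report only has to supply one point in time:

SELECT r.share_id, r.rate
FROM   share_rates r
WHERE  r.share_id = :p_share_id
AND    :p_as_of_date BETWEEN r.valid_from AND r.valid_to;

The query stays simple precisely because the ETL guarantees that the validity ranges never overlap.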
Which design is better? I have used all of them, depending on the situation.
3. What should be the unit test plan?
There are various cases where the ETL can miss, and when planning for test cases your plan should be to precisely test those. Here are some examples of test plans:
a. There should be only one value for a given date/date time.
b. During the initial load, when data is available for multiple days, the process should go sequentially and create snapshots/ranges correctly.
c. At any given time there should be only one current row.
d. etc.
