
Informatica- Complex Scenarios and their Solutions

Author(s)

• Aatish Thulasee Das


• Rohan Vaishampayan
• Vishal Raj

Date written (MM/DD/YY): 07/01/2003

Declaration

We hereby declare that this document is based on our personal experiences and / or
experiences of our project members. To the best of our knowledge, this document
does not contain any material that infringes the copyrights of any other individual or
organization including the customers of Infosys.
Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

Project Details

• Projects involved: REYREY


• H/W Platform: 516 RAM, Microsoft Windows 2000
• S/W Environment: Informatica
• Appln. Type: ETL tool
• Project Type: Data Warehousing

Target readers: Data warehousing teams using ETL tools

Keywords

ETL Tools, Informatica, Data Warehousing

INDEX

INFORMATICA - COMPLEX SCENARIOS AND THEIR SOLUTIONS
Author(s)
Date written (MM/DD/YY): 07/01/2003
Declaration
Project Details
Target readers
Keywords
INTRODUCTION
SCENARIOS
1. PERFORMANCE PROBLEMS WHEN A MAPPING CONTAINS MULTIPLE SOURCES AND TARGETS
1.1 Background
1.2 Problem Scenario
1.3 Solution
2. WHEN SOURCE DATA IS A FLAT FILE
2.1 Background
2.2 Problem Scenario
2.3 Solution
3. EXTRACTING DATA FROM THE FLAT FILE CONTAINING NESTED RECORD SETS
4. TOO LARGE LOOKUP TABLES
5. COMPLEX LOGIC FOR SEQUENCE GENERATION
Introduction

This document is based on the learning we gained while working on the ‘Reynolds
and Reynolds’ project in CAPS (PCC), Pune. We have compiled best practices for
overcoming the complex scenarios we faced during the ETL process. The document also
describes some common best practices to follow while developing mappings.

Scenarios:

1. Performance problems when a mapping contains multiple sources and targets

1.1 Background

In Informatica, multiple sources can be mapped to multiple targets in a single
mapping. This property is useful for keeping related mappings in one place: it
reduces the number of sessions that must be created, and all the related loading
takes place in one go. It is quite logical to group different sources and targets
that share the same logic into the same mapping.

1.2 Problem Scenario:

In the multiple-target scenario, if some of the sub-mappings contain complex
transformations, performance degrades drastically: a single database connection
has to handle multiple database statements. The mapping is also difficult to
manage. For example, if one sub-mapping has a performance problem, the other
sub-mappings suffer the same degradation.

1.3 Solution:

Divide and rule. It is always better to divide a complex mapping (i.e. multiple
sources and targets) into simple mappings with one source and one target each.
That greatly helps in managing the mappings. All the related mappings can also be
executed in parallel in different sessions: each session establishes its own
connection, and the server handles all the requests in parallel against the
multiple targets. Each session can be placed into a batch and run in
‘CONCURRENT’ mode.
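The idea can be sketched outside Informatica with ordinary concurrent jobs. This is only an analogy: each one-source/one-target "mapping" below is a placeholder function, and the names are made up, not taken from the project.

```python
# Hypothetical sketch of "divide and rule": each simple one-source/one-target
# load runs as its own job, so a slow load no longer blocks the others --
# analogous to placing each session in a batch run in CONCURRENT mode.
from concurrent.futures import ThreadPoolExecutor

def load(source, target):
    # Placeholder for one simple mapping: read one source, write one target.
    return f"{source} -> {target} loaded"

mappings = [("src_orders", "tgt_orders"),
            ("src_items", "tgt_items"),
            ("src_dealers", "tgt_dealers")]

# All three loads are submitted at once; the pool runs them concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda m: load(*m), mappings))

print(results)
```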
2. When the source data is a Flat File

2.1 Background

What is a Flat File?

A flat file is one in which table data is gathered in lines of ASCII text, with the
value from each table cell separated by a delimiter or space and each row represented
by a new line.

Below is a sample of the flat file that was used during the project.

Fig 2.1: In_Daily - Flat File.
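A delimited flat file of this kind can be read with a few lines of standard-library code. The record codes and field values below are illustrative, not copied from the project file:

```python
import csv
import io

# Illustrative comma-delimited flat file: one record per line, the first
# field being a record-type code (as in the In_Daily file described above).
flat_file = io.StringIO(
    "BH,20030701,BATCH01\n"
    "DH,D100,Smith Motors\n"
)

# csv.reader splits each line on the delimiter into a list of string fields.
rows = list(csv.reader(flat_file))
print(rows)
```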

2.2 Problem Scenario

When the above flat file was loaded into Informatica, the Source Analyzer appeared as
shown below.
Fig 2.2: In_Daily - Flat File after loading into Informatica.

Two issues were encountered while loading the above flat file:

1. The data types of the fields in the flat file did not match the corresponding fields
in the target tables. For example, in Fig 2.1, in the first row (the record
corresponding to “BH”) the fourth field has the data type “Date”, while in the third
row (the record corresponding to “CR”) the fourth field is “Char”; in the target table
the corresponding field had the data type “Char”.

2. The sizes of the fields in the flat file did not match the corresponding fields in
the target tables. For example, in Fig 2.1, in the eighth row (the record
corresponding to “QR”) the fifth field has a size of 100, but after loading, the
Source Analyzer showed the field size as 45 (as shown in Fig 2.2); likewise, the
fifth field of the “CR” record has a size of 5, while the corresponding field in the
target table had a size of 100.

2.3 Solution

Following is the solution we used to address the above problems:

1. Since the data was so heterogeneous, we decided to keep all the data types in
the source qualifier as “String” and converted them to match the fields to which
they were mapped.
2. Regarding the size of the fields, we changed each field to the maximum possible
size, as in the example mentioned.
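The read-everything-as-string approach can be sketched as follows. The field names, target types, and date format here are assumptions for illustration, not the project's actual layout:

```python
from datetime import datetime

# Sketch of the approach: every field arrives as a string (as in the source
# qualifier), then each is converted according to the data type of the target
# column it maps to. Names and formats are illustrative.
def to_target(value, target_type):
    if target_type == "date":
        return datetime.strptime(value, "%Y%m%d").date()
    if target_type == "number":
        return int(value)
    return value  # "char" columns stay as strings

record = {"txn_date": "20030701", "amount": "150", "code": "CR"}
types = {"txn_date": "date", "amount": "number", "code": "char"}

converted = {field: to_target(value, types[field])
             for field, value in record.items()}
```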

3 Extracting data from the flat file containing nested record sets

3.1 Background:

The flat file shown in the previous section (Fig 2.2) contains nested record sets. To
explain the nesting, the records of the above file are restructured in Fig 3.1.
Fig 3.1: In_Daily - Flat File restructured in the nested form (levels 1 to 3 marked).

Here the data is at three levels. The first level contains the batch file
information, starting with a BH record and ending with a BT record. The second level
contains the dealer records within the batch file, starting with a DH record and
ending with a DT record. The third level contains the information on the different
activities of a particular dealer.

3.2 Problem Scenario:

The data required for loading had to be in a form where a single row consists of
the dealer details as well as the different activities done by that dealer. But
only the second-level data (i.e. the 2nd and 14th rows in the flat file shown above)
contains the dealer details, while the third-level data contains the activity details
for the dealers. Both had to be concatenated into a single row of the target table.

3.3 Solution:

In this kind of scenario, the dealer information (second-level data) should be
stored in variables, using a condition that identifies a dealer-information row;
that row is then filtered out in the next transformation. So for that particular
row of the flat file (i.e. the dealer information), the data is stored in the
variables. For each dealer-activity row (third-level data), the row is passed to
the next transformation together with the dealer information that was stored in the
variables during the previous row.
The same is done here:
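Outside Informatica, the same hold-the-parent-and-join logic looks roughly like this. The record codes ("DH" for a dealer header, "AC" for an activity detail) and fields are illustrative:

```python
# Sketch of the flattening logic: a dealer header row (level 2) is held in a
# variable and emits nothing, while each activity row (level 3) is emitted
# joined to the most recently seen dealer -- mirroring the mapping variables.
def flatten(rows):
    dealer = None                # plays the role of the mapping variables
    for row in rows:
        code = row[0]
        if code == "DH":         # dealer header: remember it, filter it out
            dealer = row[1:]
        elif code == "AC":       # activity detail: concatenate with dealer info
            yield dealer + row[1:]

rows = [("DH", "D100", "Smith Motors"),
        ("AC", "SALE", "150"),
        ("AC", "SERVICE", "75")]

flat = list(flatten(rows))
print(flat)
```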

4. Too Large Lookup Tables:

4.1 Background:

What is a Lookup Transformation?

A Lookup transformation is used in a mapping to look up data in a relational table,
view, or synonym (see Fig 4.1). A lookup definition can be imported from any
relational database to which both the Informatica Client and Server can connect.
Multiple Lookup transformations can be used in a mapping.

The Informatica Server queries the lookup table based on the lookup ports in the
transformation (See Fig 4.2). It compares Lookup transformation port values to lookup
table column values based on the lookup condition. Use the result of the lookup to pass
to other transformations and the target.
You can use the Lookup transformation to perform many tasks, including:

• Get a related value. For example, your source table includes an employee ID, but
you want the employee name in your target table to make your summary data easier
to read.
• Perform a calculation. Many normalized tables include values used in a calculation,
such as gross sales per invoice or sales tax, but not the calculated value (such as net
sales).
• Update slowly changing dimension tables. You can use a Lookup transformation to
determine whether records already exist in the target.
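The "get a related value" task above amounts to a keyed lookup. A toy version, with made-up employee data, looks like this:

```python
# Toy version of a lookup returning a related value: the source rows carry
# employee IDs, and the lookup table supplies the matching names.
employees = {101: "Asha", 102: "Ravi"}          # illustrative lookup table

source_rows = [{"emp_id": 101, "sales": 500},
               {"emp_id": 102, "sales": 300}]

# For each source row, the lookup result is passed on to the target row.
target_rows = [dict(row, emp_name=employees.get(row["emp_id"]))
               for row in source_rows]

print(target_rows)
```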

(The actual screens are attached for reference.)

Fig 4.1: LOOKUP is a kind of Transformation.


Fig 4.2: The Lookup conditions to be specified in order to get Lookup values.

4.2 Problem Scenario:

In the project, one of the mappings had large lookup tables that hampered its
performance because:

a. they consumed a lot of cache memory unnecessarily, and

b. more time was spent searching for a relatively small number of values in a large
lookup table.

Thus, loading data from the source table(s) to the target table(s) took considerably
more time than it normally should.

4.3 Our Solution:

We eliminated the first problem by simply using the lookup table as one of the source
tables itself. Source and target tables are not cached in Informatica, and hence it
made sense to use the large lookup table as a source (see Fig 4.3). This also ensured
that cache memory was not wasted unnecessarily and could be used for other tasks.

Fig 4.3: The mapping showing the use of the Lookup table as a source table (multiple
source tables joined in the Source Qualifier).

Fig 4.4: The join condition (user-defined join SQL) in the Source Qualifier.

After using the lookup table as a source, we used a join condition in the Source
Qualifier. This reduced the search time taken by Informatica, as the number of rows
to be searched was drastically reduced: the join condition eliminates the excess rows
that would otherwise have passed through the Lookup transformation. Thus the second
problem was also eliminated.
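The user-defined join that replaces the row-by-row lookup can be demonstrated in plain SQL. This sketch uses an in-memory SQLite database with made-up table and column names:

```python
import sqlite3

# Sketch of the workaround: the large "lookup" table is treated as a second
# source and joined in SQL, so only matching rows flow through the mapping.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (dealer_id TEXT, amount INT);
    CREATE TABLE big_lookup (dealer_id TEXT, region TEXT);
    INSERT INTO src VALUES ('D100', 500), ('D200', 300);
    INSERT INTO big_lookup VALUES ('D100', 'EAST'), ('D300', 'WEST');
""")

# The user-defined join in the Source Qualifier corresponds to this query;
# non-matching rows never reach the transformation pipeline.
rows = con.execute("""
    SELECT s.dealer_id, s.amount, l.region
    FROM src s JOIN big_lookup l ON s.dealer_id = l.dealer_id
""").fetchall()

print(rows)
```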

5 Complex logic for Sequence Generation:

5.1 Background:

What is a Sequence Generator?


A Sequence Generator is a transformation that generates a sequence of numbers once you
specify a starting value (see Fig 5.2) and the increment by which to increase it.
(The actual screens are attached for reference.)
Fig 5.1: The Sequence Generator is a kind of transformation.
Fig 5.2: The transformation details to be filled in to generate a sequence.

5.2 Problem Scenario:

In the project, one of the mappings had two requirements:

a. During the transfer of data to a column of a target table, the Sequence Generator
was required to fire only selectively. But by its nature, the Sequence Generator
fires every time a row is loaded into the target table.

b. Another requirement was that the numbers generated by the Sequence Generator
had to be in order, without gaps.

For example: the values loaded into the column of the target table were either
sequence-generated or obtained from a lookup table. Whenever the lookup condition
returned a value, that value would populate the target table, but at the same time
the Sequence Generator would also fire, and its CURRVAL (current value, see Fig 5.1)
would be incremented by 1. So when the next sequence value was loaded into the
column, the difference between the sequence-generated values would be 2 instead of 1.
Thus the generated sequence would not be continuous; there would be gaps or holes
in it.
5.3 Our Solution:

A basic rule for the Sequence Generator is that it fires whenever a row is loaded
into the target table. To prevent it from firing unnecessarily, we created two
instances of the same target table. (See Fig 5.3)

Fig 5.3: The mapping showing two instances of the same target table.

The Sequence Generator was mapped to the column in the first instance of the target
table (see Fig 5.3), whereas the value returned from the lookup table (if any) was
mapped to the same column in the second instance (see Fig 5.3).

All the other values for the remaining columns in the target table were filtered on
the basis of the value returned from the lookup table: if the lookup table returned
a value, a row in the second instance of the target table was populated, and the
Sequence Generator was not triggered.

If the lookup table returned a null value, a row was populated in the first instance
of the target table; in this case the Sequence Generator fired and its value was
loaded into the column of the target table.
Thus, by gaining control over the triggering of the Sequence Generator, we could
avoid the “holes” or gaps in the generated sequence.
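The two-instance trick amounts to consuming the counter only when the lookup misses. A minimal sketch, with an illustrative lookup table and key values:

```python
import itertools

# Sketch of the two-instance idea: the counter is consumed only when the
# lookup misses, so the generated sequence stays gap-free. Values are made up.
seq = itertools.count(1)        # stands in for the Sequence Generator
lookup = {"B": 99}              # keys whose value comes from the lookup table

def assign(key):
    if key in lookup:           # "second instance": lookup value, counter untouched
        return lookup[key]
    return next(seq)            # "first instance": the Sequence Generator fires

values = [assign(k) for k in ["A", "B", "C", "D"]]
print(values)   # sequence-generated values are 1, 2, 3 with no holes
```

Had `assign` called `next(seq)` unconditionally (as the original single-instance mapping effectively did), the sequence-generated values would have been 1, 3, 4, leaving a hole at 2.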