
Informatica Best Practices and Lessons Learnt
Contents
1. INTRODUCTION....................................................................................................................... 4
1.1. PURPOSE........................................................................................................................................4
2. EFFICIENT USAGE OF TRANSFORMATIONS..................................................................5
2.1. SOURCE QUALIFIER TRANSFORMATION.........................................................................................5
2.2. EXPRESSION TRANSFORMATION.....................................................................................................6
2.2.1. NUMERIC OPERATIONS OVER STRING OPERATION.....................................................................6
2.2.2. OPTIMISE CHAR-VARCHAR COMPARISONS.................................................................................6
2.2.3. USING OPERATORS INSTEAD OF FUNCTIONS............................................................................6
2.2.4. USE INTEGER VALUES FOR COMPARISON...................................................................................7
2.2.5. OPTIMISE IIF FUNCTIONS..........................................................................................................7
2.2.6. VARIABLES................................................................................................................................7
2.3. LOOKUP TRANSFORMATION...........................................................................................................8
2.3.1. CACHE LOOKUPS.......................................................................................................................9
2.3.1.1. SHARED CACHE....................................................................................................................9
2.3.1.2. PERSISTENT CACHE..............................................................................................................9
2.3.2. OPTIMIZING LOOKUP CONDITION...........................................................................................10
2.3.3. LOOKUP TYPES........................................................................................................................10
2.4. AGGREGATOR TRANSFORMATION................................................................................................11
2.5. FILTER & ROUTER TRANSFORMATIONS.......................................................................................11
2.6. JOINER TRANSFORMATION...........................................................................................................12
2.7. SEQUENCE GENERATOR................................................................................................................12
2.8. UPDATE STRATEGY......................................................................................................................14
2.9. UNION TRANSFORMATION...........................................................................................................15
2.10. SQL TRANSFORMATION...............................................................................................................15
2.11. PUSHDOWN OPTIMIZATION..........................................................................................................16
2.12. PERFORMANCE AUDIT CHECKLIST..............................................................................................17
3. SHORTCUTS, REUSABLE OBJECTS & MAPPLETS.....................................................18
3.1. BEST PRACTICES..........................................................................................................................18
3.2. MAPPLETS....................................................................................................................................19
4. WORKING WITH LINKS................................................................................................... 20
4.1. LINK CONDITIONS........................................................................................................................20
4.1.1. EXAMPLE OF LINK CONDITIONS.............................................................................................20
5. REFERENCES................................................................................................................... 21

1. Introduction

1.1. Purpose
This document collates best practices and lessons learnt on the ETL tool Informatica. The best practices are based on Informatica recommendations and TCS's long experience in handling various Informatica projects. This document details how different transformations can be used optimally for effective development and also describes the usage of shortcuts, which enables developers to re-use existing logic.

2. Efficient Usage of transformations

PowerCenter environments vary widely from organization to organization, but most sessions and mappings can benefit from the implementation of common objects and optimization procedures. The following are some of the tips that can be followed while developing mappings to help ensure optimization.

 Reduce the number of transformations in a mapping, since there is an overhead involved in moving data between transformations.
 Calculate once, use many times - Avoid calculating or testing the same value multiple times in the transformations. Within an expression, use variables to calculate a value that is used several times.
 Facilitate transformation reuse - Use mapping parameters and variables, and use mapplets to encapsulate multiple reusable transformations.
 Use active transformations near the source (i.e., place filters and aggregators as close to the source as possible), which reduces the number of records to be processed.
 Select the appropriate driving/master table while using joins. The table with the lesser number of rows should be the driving/master table.
 Use flat files - Flat files located on the server machine load faster than a database located on the server machine. Fixed-width files are faster to load than delimited files because delimited files require extra parsing.
 Use shortcuts - Common code used across applications should be maintained in a shared folder. This eases maintenance and code changes and helps leverage the existing logic.
 Create a persistent cache for a group of reusable lookups - In case there is a need to perform multiple lookups on the same table within the same mapping or different mappings, it is recommended to use a reusable lookup with persistent cache enabled.

The following are some of the transformation-specific best practices that can be followed for mapping development.

2.1. Source Qualifier Transformation

In the Source Qualifier transformation, it is recommended to ensure good performance by utilising the inherent power of the source system. If the source is a flat file there is little that can be done to enhance performance, as all joins and sorting need to be done on the PowerCenter Server. However, if the source system is a database, the power of the database can be used to maximise the performance of the mapping. The following are the general techniques that can be used to push work to the database and block unwanted rows from processing:
 Data Filtration
 Data Joining
 Data Sorting
 Data Aggregation
 Utilise single-pass reads.

a. Data Filtration - It is a good practice to minimise, as far as possible, the amount of data fetched from the source application. In case of date-driven extracts, it is preferred to use a date parameter or a condition such as "date > sysdate - (interval)" to limit the records extracted. The filter condition for source data extraction should be added to the source qualifier in the filter condition box, as in the sketch below.
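A minimal sketch of a date-driven filter condition entered in the Source Qualifier's source filter property (the column name LAST_UPDATE_DT and the mapping parameter $$EXTRACT_FROM_DATE are illustrative, and the SYSDATE arithmetic assumes an Oracle source):
LAST_UPDATE_DT > SYSDATE - 7
Alternatively, driven by a mapping parameter supplied through the parameter file:
LAST_UPDATE_DT >= TO_DATE('$$EXTRACT_FROM_DATE', 'YYYY-MM-DD')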

b. Data Joining - The Informatica Joiner transformation utilizes a large amount of disk space and requires considerable processing, particularly if the data is not pre-sorted. Joining table fields in the database enables the utilization of indexes and hints so as to return the data as efficiently as possible.
c. Data Sorting - In case there is an Aggregator or Joiner transformation in a mapping, it is recommended to sort the data in the source qualifier. This can be done by moving the ports to be sorted to the top of the source qualifier and then defining, in the properties tab, the number of ports to be sorted. If two ports are designated as the number of sorted ports, Informatica will automatically generate a SQL statement with an order by clause using those two ports, as sketched below.
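For illustration, assuming a source table ORDERS with CUST_ID and ORDER_DT as the first two connected ports and the number of sorted ports set to 2, the generated query would take roughly this form (table and column names are illustrative):
SELECT ORDERS.CUST_ID, ORDERS.ORDER_DT, ORDERS.ORDER_AMT FROM ORDERS ORDER BY ORDERS.CUST_ID, ORDERS.ORDER_DT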
d. Data Aggregation - In case data needs to be aggregated in the source qualifier, a SQL override is required, as in the sketch below.
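A minimal sketch of an aggregating SQL override (table and column names are illustrative); note that the select list must match the order of the source qualifier ports:
SELECT CUST_ID, SUM(ORDER_AMT) AS TOTAL_AMT FROM ORDERS GROUP BY CUST_ID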
e. Utilize single-pass reads - Single-pass reading is the server's ability to use one Source Qualifier to populate multiple targets. In case the same source table populates multiple target tables, it is recommended to have one source qualifier rather than separate source qualifier pipelines.
Note: Great caution should be exercised with source qualifier SQL overrides. Source qualifier ports must be in the same order as the fields defined in the SQL override, and every port of the source qualifier must be connected to another transformation in the same mapping. If the query has to be changed, ensure that the new port/field order is maintained.
In Humana, it is recommended to define the SQL override at the mapping level. This ensures consistency across mappings, since the tool provides the capability to define the SQL override at both the mapping and session level. It is also recommended to parameterize the table schema name, so that a change in the table schema does not impact the mapping, as in the sketch below. In case of complex queries, especially where large volumes of data exist in the tables, it is advisable to run the query in the database to understand the explain plan; a hint can later be applied in the SQL override.
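A minimal sketch of an override with a parameterized schema name (the mapping parameter $$SRC_SCHEMA, the table, and the columns are illustrative):
SELECT CUST_ID, CUST_NAME, LAST_UPDATE_DT FROM $$SRC_SCHEMA.CUSTOMER WHERE LAST_UPDATE_DT > SYSDATE - 1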

2.2. Expression transformation

2.2.1. Numeric operations over string operation


Numeric operations are faster than string operations: the Integration Service processes numeric operations faster than string operations. For example, if a lookup can be performed on either EMPLOYEE_NAME or EMPLOYEE_ID, it is recommended to look up on EMPLOYEE_ID, which improves performance.

2.2.2. Optimise char-varchar comparisons


When the Integration Service performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. For char-varchar comparisons, it is recommended to trim trailing spaces before comparing, as in the sketch below.
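A minimal sketch in the expression language (the port names are illustrative):
IIF(RTRIM(CUST_NAME_CHAR) = CUST_NAME_VCHAR, 'MATCH', 'NO MATCH')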

2.2.3. Using Operators Instead of Functions


The Integration Service reads expressions written with operators faster than expressions written with functions. Where possible, use operators to write expressions. For example, if an expression contains nested CONCAT functions:
CONCAT(CONCAT(CUSTOMERS.FIRST_NAME, ' '), CUSTOMERS.LAST_NAME)
it is recommended to rewrite the expression with the || operator as follows:
CUSTOMERS.FIRST_NAME || ' ' || CUSTOMERS.LAST_NAME

2.2.4. Use integer values for comparison
Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. ZIP code information as a Char or Varchar datatype. If the ZIP code data is converted to an Integer datatype, the lookup database stores the ZIP code "94303-1234" as 943031234. This increases the speed of lookup comparisons based on ZIP code, as in the sketch below.
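A minimal sketch of the conversion in the expression language (the port name ZIP_CODE is illustrative; REPLACECHR removes the hyphen before the integer conversion):
TO_INTEGER(REPLACECHR(0, ZIP_CODE, '-', NULL))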

2.2.5. Optimise IIF functions


IIF functions can return a value and an action, which allows for more compact expressions. For example, suppose a source has three Y/N flags, FLG_A, FLG_B, and FLG_C, and the requirement is to return values based on the following conditions:
IF all the three flags (FLG_A, FLG_B, FLG_C) = Y THEN (VAL_A+ VAL_B+VAL_C)
IF two flags (FLG_A, FLG_B) = Y AND FLG_C = N THEN (VAL_A+ VAL_B)
IF two flags (FLG_A, FLG_C) = Y AND FLG_B =N THEN (VAL_A+ VAL_C)
IF two flags (FLG_B, FLG_C) = Y AND FLG_A = N THEN (VAL_B+ VAL_C)

This can be done in the following way using expression transformation:

 Option 1
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y', VAL_A + VAL_B + VAL_C, IIF( FLG_A =
'Y' and FLG_B = 'Y' AND FLG_C = 'N', VAL_A + VAL_B , IIF( FLG_A = 'Y' and FLG_B = 'N'
AND FLG_C = 'Y', VAL_A + VAL_C, IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N',
VAL_A , IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y', VAL_B + VAL_C, IIF( FLG_A =
'N' and FLG_B = 'Y' AND FLG_C = 'N', VAL_B , IIF( FLG_A = 'N' and FLG_B = 'N' AND
FLG_C = 'Y', VAL_C, IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N', 0.0, ))))))))
This expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons.

 Option 2
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
This results in three IIFs, three comparisons, two additions, and a faster session.

It is recommended to follow Option 2 as part of effective development.

2.2.6. Variables
Variables are very important in mappings to temporarily assign values, which can then be reused within the Expression transformation for other output ports. If the same calculation is required multiple times in the same transformation, it is recommended to do the calculation in a local variable. Informatica evaluates input ports first, followed by variable ports in the order they appear in the transformation, and output ports last. This behaviour can be used to good effect for running-sum type calculations. For example:

Example 1

Port_Name    Port-Type   Expression
Balance_in   I           N/A
Cust_id      I/O         N/A
V_Balance    V           IIF(v_cust_id = cust_id, v_balance + balance_in, balance_in)
V_cust_id    V           Cust_id
O_balance    O           V_balance

In this example the balance is calculated by determining whether the incoming cust_id is the same as the cust_id in the previous record. If the cust_id is the same, a running sum is performed by adding balance_in to the total balance of the previous record. Once the balance calculation is performed, the v_cust_id variable is reset to the value in the current row, ready for the next row.

Example 2

Port_Name      Port-Type   Expression
Balance_in     I           N/A
Cust_id        I           N/A
V_Cust_type    V           :LKP.lkp_cust_type(Cust_id)
Customer_type  O           DECODE(v_Cust_type, 'a', 'personal', 'b', 'corporate')
Balance        O           IIF(v_Cust_type = 'a', 1, 0)

In this example v_Cust_type is evaluated before the output ports, so the value held is that of the current record. This value is then used to evaluate two further output ports. With this technique the lookup is called only once to perform both operations (Customer_type and Balance).

2.3. Lookup Transformation

The Lookup transformation is used in a mapping to look up data in a flat file, relational table, view, or synonym. The lookup definition can be imported from any flat file or relational database to which both the PowerCenter Client and Integration Service can connect. A lookup can be either connected or unconnected. The Integration Service queries the lookup source based on the lookup ports in the transformation and a lookup condition, and the Lookup transformation returns the result of the lookup to the target or another transformation. It is recommended to define a SQL override to limit the number of rows cached on the server from the database. The transformation provides four options for handling multiple matches from the lookup (use first value, use last value, use any value, report error).

The following tasks can be performed with a Lookup transformation:

 Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID; retrieve the employee name from the lookup table.
 Perform a calculation. Retrieve a value from a lookup table and then use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
 Update slowly changing dimension tables. Determine whether rows exist in a target and accordingly apply an insert or update to the target table.

Lookups can be optimized in the following ways:
a. Caching lookups:
 Shared cache
 Persistent cache
b. Reducing the number of cached rows
c. Optimizing the lookup condition

2.3.1. Cache lookups
Lookups can generate a cache file which persists for the duration of the session, or can even be created as a permanent named cache. This means that the data required for the lookup is read from its source only once. Informatica creates an index on the lookup cache, minimizing the processing time for calls to the cache.

2.3.1.1. Shared Cache


The lookup cache can be shared between multiple transformations. An unnamed cache can be shared between transformations within the same mapping. A named cache can be shared between transformations in the same or different mappings.

2.3.1.2. Persistent Cache


Persistent lookups are used when there is a need to save and reuse existing cache files. A persistent cache is used when the lookup table does not change between session runs. The first time the Informatica Server runs a session using a persistent lookup cache, it saves the cache files to disk instead of deleting them on session completion. The next time the Informatica Server runs a session which calls the same persistent lookup, it builds the memory cache from the cache files, eliminating the time required to read the lookup table. If the lookup table changes occasionally, the session properties can be overridden to recache the lookup from the database.

Advantages:
 The Informatica Server uses existing cache files for subsequent lookups, eliminating the time required to build the cache again
 Cache files can be shared
 A persistent lookup cache can be used for multiple lookups on the same table within the same mapping or different mappings

Disadvantages:
 The cache needs to be rebuilt if the lookup table changes occasionally
 The cache needs to be rebuilt if the lookup transformation is changed, the data movement code is changed, or the database connection information is changed
 The lookup SQL override has to be the same in all lookups using the same persistent cache file, else the mapping will fail

In case there are multiple interdependent jobs in a workflow which use the same persistent lookup cache, it is recommended to have dummy jobs at the beginning of the workflow for persistent cache file generation. This eases support and maintenance and reduces job execution time. As a practice, please ensure that the existing cache file is deleted prior to re-building the cache. Although Informatica internally handles deletion of existing cache files, it is still preferred to have a UNIX script for deletion (for example, in case the session for building the lookup cache is disabled or is not executed, the subsequent session would otherwise use the stale cache file available on the server). The addition of a UNIX script job ensures that the files are deleted; this can be done through a command task.

2.3.2. Optimizing Lookup Condition
 Condition Order: When using a Lookup transformation, improve lookup performance by placing all conditions that use the equality operator '=' first in the list of conditions on the condition tab.
Place the conditions in the following order to optimize lookup performance:
 Equal to (=)
 Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
 Not equal to (!=)

Override SQL: Care should be taken when using an override SQL, as the query will need to be rewritten if the required lookup fields change. Also, the override SQL's select list should match the lookup ports in the Lookup transformation, as in the sketch below. The use of indexes on the source of the lookup enables optimization of the lookup SQL; the indexes should be created on the fields used in the lookup conditions.
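A minimal sketch of a lookup SQL override that limits the cached rows and aliases the select list to the lookup ports (the table, columns, and filter are illustrative):
SELECT CUSTOMER.CUST_ID AS CUST_ID, CUSTOMER.CUST_TYPE AS CUST_TYPE FROM CUSTOMER WHERE CUSTOMER.ACTIVE_FLAG = 'Y'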

2.3.3. Lookup types


The following are the different lookup types:
 Connected lookup
 Unconnected lookup
 Static Lookup
 Dynamic Lookup

Connected Lookup
A connected lookup receives input values directly from the pipeline. It can return multiple columns and can use a dynamic lookup cache.
Unconnected Lookup
An unconnected lookup exists separately from the pipeline in the mapping; it is called from another transformation using the :LKP reference qualifier, whereas a connected lookup sits in the pipeline of the mapping and its output simply flows on like that of any other transformation. An unconnected lookup can return only one port. Though the two lookup types can often be used interchangeably, the following are conditions where an unconnected lookup is preferably used (see the sketch after this list):
 Calling the same lookup multiple times within the same mapping, for different conditions
 Only one value is required from the lookup table
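A minimal sketch of invoking an unconnected lookup from an Expression transformation port (the lookup name lkp_cust_type and the ports are illustrative):
IIF(ISNULL(CUST_TYPE), :LKP.lkp_cust_type(CUST_ID), CUST_TYPE)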

Dynamic Lookup
Dynamic lookup is used when the target table is also the lookup table. When you use a dynamic
cache, the Informatica Server updates the lookup cache as it passes rows to the target.

Static Lookup
In case of static lookup, Informatica Server builds the look up cache when it processes the first
lookup request. It queries the cache based on the lookup condition for each row that passes into
the transformation.

How to make use of Dynamic lookup for the sequence number in Informatica
Sometimes there is a need to create a generated key for a column in the target table. For lookup
ports with an Integer or Small Integer datatype, you can associate a generated key instead of an
input port. To do this, select Sequence-ID in the Associated Port column. When Sequence-ID in
the Associated Port column is selected, the Informatica Server generates a key when it inserts a
row into the lookup cache. Map the lookup/output ports to the target to ensure that the lookup
cache and target are synchronized.

2.4. Aggregator Transformation

The Aggregator transformation performs aggregate calculations, such as averages and sums. The Integration Service performs aggregate calculations as it reads, and stores group and row data in an aggregate cache. It creates an index cache to store group values and a data cache to store calculations based on the group-by ports.

Aggregator transformations can be optimized by performing the following tasks:

 Group by simple columns. When possible, use numbers instead of strings and dates in the columns used for the GROUP BY. Avoid complex expressions in the Aggregator expressions.
 Use sorted input wherever possible, ensuring that the group-by ports are in the same order at the top of the transformation as the order by clause of the incoming data.
 Use incremental aggregation.
 Minimise aggregate function calls. Optimise the expression; e.g. sum(A) + sum(B) is more complex but gives the same results as sum(A+B).
 Filter the data before aggregating it. It is recommended to have a Filter transformation in the mapping prior to the Aggregator transformation. This reduces unnecessary aggregation.
 Limit the number of connected input/output or output ports to the ports required in the target table. This reduces the amount of data the Aggregator transformation stores in the data cache.

2.5. Filter & Router Transformations

A Filter transformation is an active transformation: it may change the number of rows which pass through it. The Filter transformation allows rows that meet the specified filter condition to pass through and drops rows that do not meet the condition. It allows data to be filtered based on one or more conditions.

The Router transformation, in contrast, evaluates data based on one or more conditions and routes the rows of data that meet each condition into a separate output group. It creates a default output group for the rows that do not meet any of the filter conditions. In a Router transformation, one or more router groups can be defined, and all the groups have the same number of ports. Two different router groups cannot be connected to the same transformation.

The following measures can be taken to obtain optimal performance with filters and routers (see the sketch after this list).
a) Remove error records. The filter or router should be used to ensure that records in error are not passed forward into further complex transformations.
b) Use this transformation as near to the source as possible. This reduces the number of rows passed to the next transformation and ensures that processing time is not wasted on records that will only be filtered out later.
c) If multiple filters are required, consider using a router. This makes the mapping process easier to follow.
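A minimal sketch of the conditions involved (port names and values are illustrative). A filter condition that drops error records:
NOT ISNULL(CUST_ID) AND ERROR_FLAG = 'N'
Equivalent router groups, each with its own condition (rows matching neither fall into the default group):
PERSONAL_CUSTOMERS: CUST_TYPE = 'P'
CORPORATE_CUSTOMERS: CUST_TYPE = 'C'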

2.6. Joiner Transformation

Joiner transformations require additional space at run time to hold intermediate results. Joiner
transformations need a data cache to hold the master table rows and an index cache to hold
the join columns from the master table. It is important to ensure sufficient memory is available
to hold the data and the index cache so the system does not page to disk.

 Define the master rows. Always use the smallest set of data as the master set as these
form the cache. The smaller the cache the more efficiently the calls to the cache will be
handled. Also less of the memory will be used for the cache.
 Use normal joins whenever possible as this will reduce the number of rows
 Only use a joiner transformation where there is no possibility of joining the data natively. It
is often used for joining two heterogeneous sources.

2.7. Sequence generator


The Sequence Generator transformation generates numeric values. It is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. It is a connected transformation with two output ports that can be connected to one or more transformations, and it can be made reusable. In case there are multiple mappings loading the same target table which require a sequence generator for primary key generation, it is recommended to use a reusable Sequence Generator transformation.

Cache values: The sequence generator retrieves the next value for the sequence from the repository. Caching a number of values cuts down on the number of calls made to the repository database. A cached value setting of 1000 minimizes these calls and considerably speeds up performance. However, if expected volumes are less than 1000 rows, it is advised to make the cache approximately the same as the expected volume. Caching values also facilitates running parallel sessions of the same mapping, avoiding duplicate sequence values - very important if the sequence is used as a unique key.
Note: During code deployment from the QA to the production repository, Informatica prompts for how to handle cached values - whether to overwrite the value which exists in the target repository with the value from the source repository, or keep the existing value. Selecting the wrong option means that the last value in development could overwrite a usually higher value in production. In such a scenario, when the mapping is executed it will generate values that already exist in the target table; if constraints are active, the records will all fail, which is a huge overhead, and if constraints are not active, the records will load with duplicate keys, which involves a great deal of manual intervention to correct.

An Oracle sequence called via a stored procedure has one added advantage: it retrieves the nextval for the sequence from the database. Whilst this is not as efficient as using a cached Informatica sequence, an Oracle sequence is the only way to guarantee prevention of duplicates. In Informatica, checking the retain existing sequence option during deployment prevents the sequence value from being overwritten by the value stored in the development environment.
The following are the options for connecting the Sequence Generator transformation to the target tables.

Option 1
In case the Sequence Generator transformation is connected to two target tables directly, it generates two different sequence numbers for the two target tables. For example, for a given row, CUSTOMER.CUSTOMER_ID = 1 and CUSTOMER_LOCATION.CUSTOMER_ID = 2.

Option 2
In case there is an Expression transformation in between the Sequence Generator transformation and the two target tables, it generates the same sequence number for both target tables. For example, for a given row, CUSTOMER.CUSTOMER_ID = 1 and CUSTOMER_LOCATION.CUSTOMER_ID = 1.

2.8. Update Strategy
The Update Strategy transformation controls how rows are flagged for insert, update, delete, or reject within a mapping. This transformation is essential if there is a need to flag rows destined for the same target for different database operations, or a need to reject rows.

The update strategy can be set at two levels:

1. Within a session. When a session is configured, it can be defined to either treat all rows in the same way (for example, treat all rows as inserts) or use instructions coded into the mapping to flag rows for different database operations. The following settings are available at the session level:

 Insert: Treat all rows as insert. If inserting the row violates a primary or foreign key
constraint in the database, the Integration Service rejects the row.
 Update: Treat all rows as update. For each row, if the Integration Service finds a
corresponding row in the target table (based on the primary key value), the
Integration Service updates it. Note that the primary key constraint must exist in the
target definition in the repository.
 Delete: Treat all rows as delete. For each row, the Integration Service looks for a
matching primary key value in the target table. If it exists, the Integration Service
deletes the row. The primary key constraint must exist in the target definition.
 Data driven: Integration Service follows instructions coded into Update Strategy
within the mapping to determine how to flag rows for insert, delete, update, or reject.
If the mapping for the session contains an Update Strategy transformation, this field
is marked Data Driven by default in the session.

2. Within a mapping. Within a mapping, the Update Strategy transformation is used to flag rows for insert, delete, update, or reject, as in the sketch below.
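A minimal sketch of an Update Strategy expression using the DD_ constants, where lkp_cust_id is an illustrative port holding the result of a target lookup:
IIF(ISNULL(lkp_cust_id), DD_INSERT, DD_UPDATE)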

Forwarding rejected rows


Update Strategy transformation can be configured to either pass rejected rows to the next
transformation or drop them. By default, the Integration Service forwards rejected rows to the next
transformation. The Integration Service flags the rows for reject and writes them to the session
reject file. If the Forward Rejected Rows option is unchecked, the Integration Service drops rejected rows and writes them to the session log file. Unselect this option to filter out unnecessary records that are not needed by later transformations.

Note: Take care when positioning an Aggregator after the Update Strategy transformation. If an Aggregator transformation is placed downstream of the Update Strategy, the Aggregator processes the rows on the basis of their flag (update, insert, delete, reject). For example, if a row is flagged for delete and is later used to calculate a sum in the Aggregator, the Integration Service subtracts the value appearing in that row; if the row had been flagged for insert, the Integration Service would add its value to the sum.

2.9. Union Transformation

Union transformation merges data from multiple sources similar to the UNION ALL SQL
statement to combine the results from two or more SQL statements. Similar to the UNION ALL
statement, the Union transformation does not remove duplicate rows.

The following measure can be taken for efficient usage of the Union transformation.

Remove duplicate rows: Add a Router or Filter transformation immediately after the Union transformation to filter out unwanted or duplicate records.

2.10. SQL Transformation

SQL transformation uses external SQL queries or queries that are defined in the transformation.
When an SQL transformation is configured to run in script mode, the Integration Service
processes an external SQL script for each input row. When the transformation runs in query
mode, the Integration Service processes an SQL query that is defined in the transformation.

Following measures can be taken for efficient usage of SQL transformation.

 Use static query


Each time the Integration Service processes a new query in a session, it calls a function called SQLPrepare to create an SQL procedure and pass it to the database. When the query changes for each input row, there is a performance impact. When the transformation runs in query mode, construct a static query in the transformation to improve performance. A static query statement does not change, although the data in the query clause changes. To create a static query, use parameter binding instead of string substitution in the SQL Editor. With parameter binding, the parameters in the query clause are bound to values from the transformation input ports, as in the sketch below.
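A minimal sketch of a static query using parameter binding in query mode, where CUST_ID is an illustrative input port of the SQL transformation:
SELECT CUST_NAME, CUST_TYPE FROM CUSTOMER WHERE CUST_ID = ?CUST_ID?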

 No Transaction statement in the query


When an SQL query contains commit and rollback query statements, the Integration Service
must recreate the SQL procedure after each commit or rollback. To optimize performance, do
not use transaction statements in an SQL transformation query.

 Select Static Database Connection over dynamic database connection


The SQL transformation's configuration defines how it connects to the database: it can use a static connection, or the connection information can be provided to the transformation at run time. When the transformation uses a static connection, a connection from the Workflow Manager connections is selected and the SQL transformation connects to the database once during the session. With dynamic connection information, the SQL transformation connects to the database each time the transformation processes an input row.

2.11. Pushdown Optimization

Transformation logic can be pushed to the source or target database using pushdown
optimization. In pushdown optimization, the Integration Service translates the transformation logic
into SQL queries and sends the SQL queries to the database. The source or target database
executes the SQL queries to process the transformations. The amount of transformation logic that
can be pushed to the database depends on the database, transformation logic, mapping and
session configuration. At run time, for a session configured for pushdown optimization, the Integration Service analyzes the mapping and writes one or more SQL statements based on the mapping transformation logic. The Integration Service analyzes the transformation logic,
mapping, and session configuration to determine the transformation logic it can push to the
database. The Integration Service executes any SQL statement generated against the source or
target tables, and it processes any transformation logic that it cannot push to the database. Use
the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the
Integration Service can push to the source or target database. Pushdown Optimization Viewer
provides the option to view the messages related to pushdown optimization.

Types of Pushdown optimization

 Source-side pushdown optimization: The Integration Service analyzes the mapping from
the source to the target or until it reaches a downstream transformation it cannot push to the
database. The Integration Service generates and executes a SELECT statement based on
the transformation logic for each transformation it can push to the database. Then, it reads
the results of this SQL statement and continues to run the session.

 Target-side pushdown optimization: The Integration Service analyzes the mapping from
the target to the source or until it reaches an upstream transformation it cannot push to the
database. It generates an INSERT, DELETE, or UPDATE statement based on the
transformation logic for each transformation it can push to the database. The Integration
Service processes the transformation logic up to the point that it can push the transformation
logic to the target database. After that it executes the generated SQL.

 Full pushdown optimization: The Integration Service pushes as much transformation logic
as possible to both source and target databases. If a session is configured for full pushdown
optimization and the Integration Service cannot push all the transformation logic to the
database, it performs source-side or target-side pushdown optimization instead. To use full
pushdown optimization, the source and target must be on the same database.

Note: For a large volume of data in full pushdown optimization, the database server must run a long transaction, and the following database performance issues may be encountered:
 High usage of database resources
 Database tables may be locked for long periods of time, which increases the chance of deadlock

2.12. Performance Audit Checklist
The following design best practices can be audited from a performance point of view:
a. Drop indexes and key constraints from the target table before loading
b. Use bulk load when the session inserts a large amount of data
c. Configure single-pass reading if there are several sessions reading from the same source
d. If the mapping contains lookup transformations, lookup caching should be enabled
e. Order of the lookup conditions: conditions using the equality operator should be placed first when there is more than one lookup condition
f. Use sorted input for aggregators
g. Use incremental aggregation
h. Use operators instead of functions
i. Enable sessions for parallel pipeline partitioning
j. Use the correct buffer block size
k. Use the correct cache sizes for lookup, aggregator, and joiner caches

3. Shortcuts, Reusable Objects & Mapplets

Any object within a folder can be shared: sources, targets, mappings, transformations, mapplets, tasks, worklets and sessions. To share objects in a folder, the folder must be designated as shared. Once the folder is shared, shortcuts can be created to objects within the shared folder. In case an object is referenced repeatedly in multiple mappings or across multiple folders, it is recommended to place the object in a shared folder. Reusable objects defined in a non-shared folder cannot be referenced from other folders, whereas reusable objects defined in a shared folder can be.
Advantages:
 Eases maintenance. In case there is a need to change all instances of a shortcut object, the original repository object in the shared folder can simply be modified, and all shortcuts accessing the object inherit the changes. In contrast, if there are multiple copies of an object, every instance of the object has to be edited, or the object recopied, to obtain the same result.
 Common repository objects are maintained in a single location.
 Accelerates subsequent project development cycles by accessing the shared objects.
 Saves space in the repository by keeping a single repository object and using shortcuts to that object, instead of creating copies of the object in multiple folders or multiple repositories.
 Reduces project cost by leveraging the reusable assets.

Disadvantages:
 Changes to an object referenced by shortcuts can invalidate the mappings or mapplets using the shortcut and any sessions using these objects; the mappings and sessions then have to be validated again. To avoid invalidating repository objects, create shortcuts to objects only in their finalized version.
 Stale view of the referenced object. When the Designer is launched and an object that uses a shortcut is dragged into the workspace, the current version of the referenced object is displayed. However, if another user then edits and saves changes to the referenced object, the shortcut displayed in the workspace is no longer an accurate view of the referenced object. It is recommended always to refresh a shortcut in the workspace by clicking Edit > Revert to Saved.
 Co-ordination is required between the development team and the owner of the shared folder for any changes related to shared objects.
 The order in which folders are promoted to UAT and production environments becomes more important, as in this approach the shortcuts need to be promoted before the mappings.

3.1. Best Practices


In the shortcut design, it is recommended to ensure the following as part of effective development:
 Keep shared objects in centralized folders. This keeps maintenance simple and also simplifies the process of copying folders into a production repository.
 Changes to an object referenced by shortcuts can invalidate the mappings or mapplets using the shortcut and any sessions using these objects. To avoid invalidating repository objects, create shortcuts to objects only in their finalized version.
 After editing a referenced object, make sure affected mappings are still valid.
 Refresh views of shortcuts when working in a multiuser environment. To refresh a shortcut in the workspace, click Edit > Revert to Saved, or use Repository > Close All Tools in the destination folder and then reopen the workspace.

3.2. Mapplets

Within an environment there can be several transformations which can be grouped together to form a specific business function that is always the same. Rather than just reusing the individual transformations that make up that function, it is recommended to group the transformations into a reusable mapplet.
Pros
 Changes made to any transformation in a mapplet are reflected in all the mappings that use it
 The functionality of the mapplet becomes available to all the mappings which use that particular mapplet

Cons
 Individual transformations from a mapplet cannot be used in isolation; the transformations have to be used as a group, i.e. the entire mapplet has to be used.
 Prior to any change to a reusable mapplet, all dependent mappings need to be analyzed for the impact.

4. Working with Links

Links are used to connect tasks in a workflow. Informatica provides the ability to specify conditions on links to create branches in the workflow. It does not allow links to be used to create loops in the workflow, and each link in the workflow can run only once.

4.1. Link Conditions


Once two tasks are linked in the Workflow Manager, a link condition can be specified to determine the order of execution in the workflow. In case a link condition is not specified, the Integration Service runs the next task in the workflow by default. Predefined or user-defined workflow variables can be used in the link condition. If the link condition evaluates to True, the Integration Service runs the next task in the workflow; if it evaluates to False, it does not run the next task.

4.1.1. Example of Link Conditions


For example, a workflow has two Session tasks, s_m_claims_extract and s_m_claims_load. The requirement is that the Integration Service runs the second Session task only if the first Session task had no failed rows at the target. To accomplish this, a link condition is set on the link between the two sessions so that s_m_claims_load runs only if the number of failed target rows for s_m_claims_extract is 0. In this scenario the following condition is set on the link:

$s_m_claims_extract.TgtFailedRows = 0

After the link condition is specified in the Expression Editor, the Workflow Manager validates the
link condition and displays it next to the link in the workflow.
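As a further illustration (assuming the same session names), predefined task variables such as Status can be combined in a single link condition, for example to also require that the first session succeeded:
$s_m_claims_extract.Status = SUCCEEDED AND $s_m_claims_extract.TgtFailedRows = 0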

5. References

1. Informatica PowerCenter, version 8.6, Installation Guide
2. Informatica PowerCenter, version 8.6, Workflow Administrator Guide
3. Informatica PowerCenter, version 8.6, Performance Tuning Guide
