Informatica PowerCenter Development Best Practices

TABLE OF CONTENTS

Abstract
Content overview
1. Lookup - Performance considerations
1.1. Unwanted columns
1.2. Size of the source versus size of lookup
1.3. JOIN instead of Lookup
1.4. Conditional call of lookup
1.5. SQL query
1.6. Increase cache
1.7. Cachefile file-system
1.8. Useful cache utilities
2. Workflow performance - basic considerations
2.1. SQL tuning
3. Pre/Post-Session command - Uses
4. Sequence generator - design considerations
5. FTP Connection object - platform independence

Abstract
This article explains a few important PowerCenter development best practices, covering lookups, workflow performance, and related topics.

Content overview
Lookup - Performance considerations
Workflow performance - basic considerations
Pre/Post-Session commands - Uses
Sequence generator - design considerations
FTP Connection object - platform independence

1. Lookup - Performance considerations


What is a Lookup transformation? It is not just another transformation that fetches data to look up against source data. A Lookup is an important and useful transformation when used effectively. If used improperly, the performance of your mapping will be severely impaired. Let us see the different scenarios where you can face problems with Lookup, and how to tackle them.

1.1. Unwanted columns


By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. If not all the columns are required for the lookup condition or return, delete the unwanted columns from the transformation. If the unwanted columns are not removed, the cache size will increase.

1.2. Size of the source versus size of lookup


Let us say you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). PowerCenter then builds the cache for the lookup table and checks the 10 source rows against the cache. It takes more time to build the cache of 1 million rows than to go to the database 10 times and look up against the table directly. Use an uncached lookup instead of building the static cache, as the number of source rows is far smaller than that of the lookup.
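To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The per-row costs are invented placeholders, not measured PowerCenter figures, and the function names are hypothetical; only the shape of the comparison matters.

```python
# Rough cost model: cached vs. uncached lookup.
# All timings are hypothetical placeholders, not PowerCenter measurements.

def cached_cost(lookup_rows, source_rows, build_ms_per_row=0.01, probe_ms=0.001):
    """Build the whole cache once, then probe it in memory per source row."""
    return lookup_rows * build_ms_per_row + source_rows * probe_ms


def uncached_cost(source_rows, query_ms=5.0):
    """Issue one database query per source row; no cache is built."""
    return source_rows * query_ms


def cheaper_mode(lookup_rows, source_rows):
    """Return which lookup mode the model estimates to be cheaper."""
    if uncached_cost(source_rows) < cached_cost(lookup_rows, source_rows):
        return "uncached"
    return "cached"


# 10 source rows against a 1-million-row lookup table
print(cheaper_mode(lookup_rows=1_000_000, source_rows=10))
# Many source rows against the same lookup table
print(cheaper_mode(lookup_rows=1_000_000, source_rows=500_000))
```

With these assumed costs the model favours the uncached lookup for the small source and the cached lookup for the large one. Real break-even points depend on the database, the network and the row width, so measure before deciding.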

1.3. JOIN instead of Lookup


In the same context as above, if the Lookup transformation comes right after the Source Qualifier and there is no active transformation in between, you can instead use the SQL override of the Source Qualifier and join to the lookup table with an ordinary database join, provided both tables are in the same database and schema.
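A minimal sketch of what such a join could look like, run here against an in-memory SQLite database from Python so that it is self-contained. The tables `orders` (source) and `customers` (lookup) and their columns are hypothetical; the SELECT statement is the kind of text that would go into the Source Qualifier SQL override.

```python
import sqlite3

# Hypothetical tables: ORDERS is the source, CUSTOMERS is the lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    INSERT INTO orders VALUES (1, 10, 99.5), (2, 11, 20.0), (3, 99, 5.0);
    INSERT INTO customers VALUES (10, 'Acme'), (11, 'Globex');
""")

# LEFT OUTER JOIN keeps source rows that have no match and returns NULL for
# the looked-up column, which is what a lookup does on a miss.
override_sql = """
    SELECT o.order_id, o.customer_id, o.amount, c.customer_name
    FROM orders o
    LEFT OUTER JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
"""
rows = conn.execute(override_sql).fetchall()
for row in rows:
    print(row)
```

Note that a join returns one row per match, so it behaves like the lookup only when the lookup key is unique in the lookup table.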

1.4. Conditional call of lookup


Instead of using connected lookups with filters for a conditional lookup call, use an unconnected lookup. Is the single-column return a problem? Go ahead and change the SQL override to concatenate the required columns into one big column, then break it into individual columns again on the calling side.
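The concatenate-then-split idea can be sketched as follows, again with SQLite from Python as a stand-in. The table, the column names, the `|` delimiter and both helper functions are assumptions made for illustration; in a mapping the split would be done with expression ports on the calling side.

```python
import sqlite3

DELIM = "|"  # assumed delimiter; pick one that cannot occur in the data

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT, tier TEXT);
    INSERT INTO customers VALUES (10, 'Acme', 'Berlin', 'GOLD');
""")


def lookup_packed(customer_id):
    """Mimic an unconnected lookup: one return value holding several columns."""
    row = conn.execute(
        "SELECT name || ? || city || ? || tier FROM customers WHERE customer_id = ?",
        (DELIM, DELIM, customer_id),
    ).fetchone()
    return row[0] if row else None


def unpack(packed):
    """Calling side: split the packed value back into individual columns."""
    if packed is None:
        return (None, None, None)
    name, city, tier = packed.split(DELIM)
    return (name, city, tier)


packed = lookup_packed(10)
print(packed)
print(unpack(packed))
```

Columns that can be NULL need a default before concatenation (for example via COALESCE), because in SQLite and several other databases concatenating NULL yields NULL for the whole value.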

1.5. SQL query


Find the execution plan of the Lookup SQL and see if you can add some indexes or hints to the query to make it fetch data faster. You may need the help of a database developer to accomplish this if you are not an SQL specialist yourself.
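As an illustration of reading a plan before and after adding an index, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` from Python. SQLite is only a stand-in here: the plan syntax and output of your actual database (Oracle, SQL Server, and so on) will differ, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, f"name{i}", "EU" if i % 2 else "US") for i in range(1000)],
)

# Hypothetical lookup query: fetch one column by the lookup condition column.
lookup_sql = "SELECT customer_name FROM customers WHERE customer_id = ?"


def plan(sql):
    """Return the plan steps SQLite reports for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


plan_before = plan(lookup_sql)
conn.execute("CREATE INDEX idx_customers_id ON customers (customer_id)")
plan_after = plan(lookup_sql)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the table; afterwards it reports a search using `idx_customers_id`. The exact wording of the plan text varies between SQLite versions.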

1.6. Increase cache


If none of the above options improves performance, the problem may lie with the cache. The cache assigned to the lookup may not be large enough to hold the data or index of the lookup. Whatever data does not fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter does not find the data you are looking for in the cache, it swaps data from the file into the cache and keeps doing this until the data is found. This is quite expensive, because this type of operation is very I/O intensive. To prevent it, increase the size of the cache so that the entire data set resides in memory. When increasing the cache, you also have to be aware of system constraints: if the cache size is greater than the resources available, the session will fail for lack of resources.

1.7. Cachefile file-system


In many cases, if the cache directory is on a different file system than that of the hosting server, writing the cache files may take time and add latency. So, with the help of your system administrators, try to look into this aspect as well.

1.8. Useful cache utilities

If the same lookup SQL is being used by another lookup, then a shared cache or a reusable lookup should be used. Also, if you have a table where the data does not change often, you can use the persistent cache option to build the cache once and reuse it in consecutive flows.

2. Workflow performance - basic considerations


Though performance tuning has been the most feared part of development, it is the easiest once the intricacies are known. With each newer version of PowerCenter, there is added flexibility for the developer to build better-performing workflows. The major blocks for performance are the design of the mapping and, if databases are involved, SQL tuning. Regarding the design of the mapping, I have a few basic considerations. Please note that these are not rules of thumb, but they will help you act sensibly in different scenarios.

1. I would always suggest that you think twice before using an Update Strategy, though it adds a certain level of flexibility to the mapping. If you have a straight-through mapping that takes data from the source and directly inserts all the records into the target, you don't need an Update Strategy.
2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading into it. Use the truncate option in the session properties if you wish to clean the table before loading. I would avoid a separate pipeline in the mapping that runs before the load with an Update Strategy transformation.
3. Say you have 3 sources and 3 targets with one-to-one mapping. If the loads are independent according to the business requirement, I would create 3 different mappings and 3 different session instances and run them all in parallel in the workflow after the "Start" task. I've observed that the workflow runtime comes down to between 30% and 60% of that of serial processing.
4. PowerCenter is built to work on high volumes of data, so let the server be completely busy. Induce parallelism as far as possible into the mapping/workflow.
5. If using a transformation like a Joiner or Aggregator, sort the data on the join keys or group-by columns prior to these transformations to decrease the processing time.
6. Filtering should be done at the database level instead of within the mapping. The database engine is much more efficient at filtering than PowerCenter.

The above examples are just some things to consider when tuning a mapping.
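Point 5 is easier to appreciate with a small illustration of why pre-sorted input helps an aggregation. The Python sketch below is a generic analogy, not PowerCenter's internal algorithm; the function names and sample rows are made up.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum amounts per key, assuming rows arrive sorted by key.

    Only one group is held in memory at a time, and each total is
    emitted as soon as the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def aggregate_unsorted(rows):
    """Sum amounts per key without any ordering guarantee.

    Every group has to be kept until the last row has been read.
    """
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 2)]
# In practice the sort would happen in the database or upstream in the mapping.
sorted_rows = sorted(rows, key=itemgetter(0))

print(list(aggregate_sorted(sorted_rows)))
print(aggregate_unsorted(rows))
```

Both functions produce the same totals; the difference is that the sorted variant can stream results with a small, constant amount of state, while the unsorted variant must cache all groups until the input ends.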

2.1. SQL tuning


SQL queries/actions occur in PowerCenter in one of the following ways:

Relational Source Qualifier
Lookup SQL Override
Stored Procedures
Relational Target

Using the execution plan to tune a query is the best way to gain an understanding of how the database will process the data. Some things to keep in mind when reading the execution plan include: "Full table scans are not evil", "Indexes are not always fast", and "Indexes can be slow too". Analyse the table data to see whether picking up 20 records out of 20 million is best done using an index or a table scan. Is fetching 10 records out of 15 faster using an index, or is a full table scan easier? Many times the relational target indexes create performance problems when loading records into the relational target. If the indexes are needed for other purposes, it is suggested to drop the indexes at the time of loading and then rebuild them in post-SQL. When dropping indexes on a target, you should consider integrity constraints and the time it takes to rebuild the index after the load versus the actual load time.

3. Pre/Post-Session command - Uses

It is a very good practice to email the success or failure status of a task once it is done. In the same way, when a business requirement calls for it, make use of the Post-Session Success and Failure email for proper communication. The built-in feature offers more flexibility, with session logs as attachments, and also provides other run-time data such as the workflow run instance ID. Any archiving activities around the source and target flat files can be easily managed within the session using the session properties for flat file command support, which is new in PowerCenter v8.6. For example, after writing the flat file target, you can set up a command to zip the file to save space. If the target flat files need any editing of data that your mapping couldn't accommodate, write a shell/batch command or script and call it in the Post-Session command task. I prefer making trade-offs between PowerCenter capabilities and the OS capabilities in these scenarios.
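As one possible shape for such a script, the Python sketch below compresses a flat file and removes the uncompressed copy. The function name and the sample file are hypothetical, and the demo writes into a temporary directory; a real post-session command would pass the actual target file path to the script.

```python
import gzip
import os
import shutil
import tempfile


def compress_target_file(path):
    """Gzip a flat file next to the original and remove the uncompressed copy."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


# Demo on a throwaway file standing in for a flat file target.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "orders.dat")
    with open(target, "w") as handle:
        handle.write("1,Acme\n2,Globex\n")
    archived = compress_target_file(target)
    print(os.path.basename(archived), os.path.exists(target))
```

The same effect can be had with a one-line `gzip` or `zip` shell command; a script is mainly worthwhile when the step also needs renaming, date-stamping or error handling.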

4. Sequence generator - design considerations


In most cases, I would advise you to avoid using the Sequence Generator transformation when populating an ID column in the relational target table. I suggest you instead create a sequence on the target database and enable a trigger on that table to fetch the value from the database sequence. There are many advantages to using a database sequence generator:

Fewer PowerCenter objects will be present in a mapping, which reduces development time and also maintenance effort.
ID generation is PowerCenter independent if a different application is used in future to populate the target.
Migration between environments is simplified because there is no additional overhead of considering the persistent values of the sequence generator from the repository database.

In all of the above cases, a sequence created in the target database makes life a lot easier for table data maintenance and also for PowerCenter development. In fact, databases have specific, focused mechanisms to deal with sequences, so in effect you are implementing a manual push-down optimization in your PowerCenter mapping design. DBAs will always complain about triggers on the databases, but I would still insist on using the sequence-trigger combination, even for huge volumes of data.
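The sequence-trigger combination can be sketched as follows. SQLite (driven from Python) is used only so the example is self-contained; it has no CREATE SEQUENCE, so a one-row table stands in for the sequence and an AFTER INSERT trigger assigns the key. On a database with real sequences you would use its native sequence and trigger syntax instead. All object names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a database sequence (SQLite has no CREATE SEQUENCE).
    CREATE TABLE customer_seq (next_val INTEGER NOT NULL);
    INSERT INTO customer_seq VALUES (1);

    CREATE TABLE customer_dim (customer_key INTEGER, customer_name TEXT);

    -- The trigger assigns the key, so no loader has to supply it.
    CREATE TRIGGER trg_customer_key AFTER INSERT ON customer_dim
    BEGIN
        UPDATE customer_dim
           SET customer_key = (SELECT next_val FROM customer_seq)
         WHERE rowid = NEW.rowid;
        UPDATE customer_seq SET next_val = next_val + 1;
    END;
""")

# The loader (PowerCenter or any other application) inserts only business columns.
conn.executemany(
    "INSERT INTO customer_dim (customer_name) VALUES (?)",
    [("Acme",), ("Globex",), ("Initech",)],
)
keyed_rows = conn.execute(
    "SELECT customer_key, customer_name FROM customer_dim ORDER BY customer_key"
).fetchall()
print(keyed_rows)
```

Because the key is assigned inside the database, the mapping needs no Sequence Generator and there is no persistent sequence value in the repository to carry across environments.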

5. FTP Connection object - platform independence


If you have any files to be read as source from a Windows server when your PowerCenter server is hosted on UNIX/Linux, then make use of FTP users on the Windows server and use the File Reader with an FTP Connection object. This connection object can be added like any other connection string. This gives the flexibility of platform independence. It will further reduce the overhead of having SAMBA mounts on the Informatica boxes.
