Sie sind auf Seite 1von 8

Informatica PowerCenter

Performance Tuning Tips

Pittsburgh Informatica User Group


March 22nd 2016

Gregory Reynolds reynolds.gregory@gmail.com


Topics

General Tips

Source Qualifiers

Lookups

Sorters / Aggregators / Joiners

Targets

Other Options
General Tips
Remove Dead Weight
Filter unneeded records in the source qualifier
Turn off or remove unused ports
Filter and aggregate as soon as possible
Trim strings immediately

Set Proper Attributes on All Ports


Improper data types require behind the scenes conversions or cause loss of data
Incorrect data sizes require conversions and may truncate data unintentionally

Watch Your Logs


Dont ignore warnings and non-critical errors. It requires a significant amount
of resources to write to a log file
Choose terse level logging in production to save space and processing
Source Qualifier
Balance Work Effort
Consider the time it takes the source to do work instead
PowerCenter is faster at many things / DBMS is faster at others

Local Files
When reading from files it is almost always faster to ship the file to the local server
before reading it then it is to try and read it with a remote connection
Large data files likely would be better off
staged before trying to do any
complex transformations

Use Built-In Options


A sort is usually faster here
than anywhere in the mapping
Same thing for distinct
Choose Deterministic Output if possible
Lookup
Unconnected Lookups
Using an unconnected lookup multiple times saves all of the cost associated with building the cache
except the first time.

Cache Type
Static cache is the default for a reason. Best for a large amount of lookup records
Disabling the cache sends every lookup to the database as an individual request. Works well for a very
small amount of lookup requests versus a large table.
Persistent cache allows you to reuse a cache across multiple sessions. Warning: it may require a lot of
drive space
Dynamic cache keeps records in the lookup up to date automatically as new data is sent to the target

Size Matters
Too small a cache allotment will force large lookups to go to disk
Too large a cache will waste available memory that other transformations could use

Filter The Data


Source data can be filtered
without using an override
More records filtered mean less time to
retrieve and smaller more efficient caches
Sorter, Aggregator, and Joiner
Maximum Cost
No data will pass through until every record has been read and sorted
Large cache usage
Session will fail if cache grows larger than capacity

Waste Not Want Not


Filter out as many unneeded records as is possible before hand

Sorted Input
Whenever possible use sorted input to save the majority of processing and cache cost from
joiners as well as aggregators. Data will flow in incremental chunks if the input is sorted
versus waiting until the end, releasing memory and cache requirements as it does so.

Simplified Group Bys


Grouping by a simple port may save an aggregator having to complete all the grouping
before making any final decisions.

Smaller Master Set


Designate the master data in a sorter as the side that is expected to have fewer rows.
Two large data flows will not join as quickly
Targets
Rejected Rows
Rejecting rows from the target database will slow down normal operations by
potentially several seconds per record! Look out for unexpected NULL values, bad
dates, and invalid keys.

Update Else Insert


With this option selected, every insert is approximately twice as slow as an update
Other options include using a lookup to determine the appropriate action before hand

Commit Not
Setting too low a threshold for database commits will require transactions to wait more
often while the commit is completing. Setting it too high will cause the database to use
an increasing amount of temp space and may cause it to slow down transaction speeds

Bulk Loading
Bulk loading is a fast cheat to be able to push large amount of data into some types of
databases at high speed but there cant any indices on the target table
Other Tuning Options
Workflow Parallelization

Session Partitioning

Push-down Optimization

Data Partitions

Buffer Size

Indices and Primary Keys

System Competition

Redesign slow mappings to find a better way

Das könnte Ihnen auch gefallen