
General Informatica Best Practices

Performance and Tuning Overview

- Identifying ETL Bottlenecks
  - Target Bottlenecks
  - Source Bottlenecks
  - Mapping Bottlenecks
  - Session Bottlenecks
  - System Bottlenecks
- Partitioning
- How to make it fly

Identifying Bottlenecks

Target Bottleneck

Common sources of problems:
- indexes or key constraints
- database checkpoints
- small database network packet size
- too many target instances in your mapping
- target table is too wide

Common solutions:
- drop indexes and key constraints before loading, and rebuild them after loading
- use bulk loading or external loaders when practical
- increase the database network packet size
- decrease the frequency of database checkpoints
- optimize how the target database files are allocated across disks
- when using partitions, consider partitioning your target table as well
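The drop, load, rebuild pattern can be sketched with Python's built-in sqlite3 module. SQLite here is only a stand-in for the target database, and the table `sales_target` and index `idx_sales_customer` are hypothetical names. A single transaction with `executemany` approximates a bulk load; a real database's bulk loader or external loader will behave differently.

```python
import sqlite3


def bulk_load(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
    """Load rows into the hypothetical table `sales_target`, dropping its
    index before the load and rebuilding it once afterwards."""
    cur = conn.cursor()
    # 1. Drop the index so individual inserts do not have to maintain it.
    cur.execute("DROP INDEX IF EXISTS idx_sales_customer")
    # 2. Insert everything in one transaction (a rough stand-in for bulk loading).
    with conn:
        cur.executemany(
            "INSERT INTO sales_target (customer_id, item) VALUES (?, ?)", rows
        )
    # 3. Rebuild the index in a single pass over the loaded data.
    cur.execute("CREATE INDEX idx_sales_customer ON sales_target (customer_id)")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM sales_target").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_target (customer_id INTEGER, item TEXT)")
    conn.execute("CREATE INDEX idx_sales_customer ON sales_target (customer_id)")
    print(bulk_load(conn, [(i % 10, f"item{i}") for i in range(1000)]))
```

In a PowerCenter session, the drop and rebuild statements would typically live in the target's pre- and post-session SQL rather than in a separate script.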

Source Bottleneck

Common sources of problems:
- slow queries
- small database network packet size
- wide source tables

Common solutions:
- analyze the query issued by the Source Qualifier; it appears in the session log
- consider using database optimizer hints when joining several tables in a Source Qualifier
- consider indexing tables when you have ORDER BY or GROUP BY clauses
- try database parallel queries if supported
- try partitioning the session; if appropriate, partition your source database as well
- test a Source Qualifier conditional filter versus filtering at the database level
- increase the database network packet size
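To check whether an index actually helps an ORDER BY, you can compare query plans before and after creating it. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for your source database's plan tool (Oracle, SQL Server, and others have their own); the table `orders` and index `idx_orders_customer` are hypothetical. A "TEMP B-TREE" entry in SQLite's plan means the rows are sorted in a temporary structure instead of being read in index order.

```python
import sqlite3


def uses_temp_sort(conn: sqlite3.Connection, query: str) -> bool:
    """Return True if SQLite's plan for `query` sorts or groups via a temporary B-tree."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The human-readable plan text is the last column of each plan row.
    return any("TEMP B-TREE" in row[-1] for row in plan)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    query = "SELECT customer_id FROM orders ORDER BY customer_id"
    print("sort needed without index:", uses_temp_sort(conn, query))
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print("sort needed with index:", uses_temp_sort(conn, query))
```

The same before/after comparison applies to the query copied from the session log: run it through your database's explain facility and look for sort steps or full scans.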

Mapping Bottleneck

Common sources of problems:
- too many transformations
- unused links between ports
- too many input/output or output ports in Aggregator or Rank transformations
- unnecessary data type conversions

Common solutions:
- eliminate transformation errors
- if several mappings read from the same source, try single-pass reading
- optimize data types: use integers for comparisons, and don't convert back and forth between data types
- optimize lookups and lookup tables by using the lookup cache and indexing the lookup tables
- put your filters early in the data flow and use simple filter conditions
- for Aggregators, use sorted input, group by integer columns, and simplify expressions
- use reusable Sequence Generators and increase the number of cached values
- if you use the same logic in different data streams, apply it before the streams branch off
- optimize expressions: isolate slow and complex expressions, and reduce or simplify aggregate functions
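Why sorted input helps an Aggregator can be illustrated outside Informatica: when rows arrive sorted by the group key, each group can be finished and released as soon as the key changes, whereas unsorted input forces every group to be kept until the last row. The sketch below contrasts the two approaches in Python; it models the idea only and is not how PowerCenter's Aggregator is implemented.

```python
from collections.abc import Iterable, Iterator
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows: Iterable[tuple[int, float]]) -> Iterator[tuple[int, float]]:
    """Sum amounts per integer key, assuming rows arrive sorted by key.

    Only the current group is held in memory; each total is emitted as soon
    as the key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def aggregate_unsorted(rows: Iterable[tuple[int, float]]) -> dict[int, float]:
    """Hash-style aggregation: every group stays in memory until the input ends."""
    totals: dict[int, float] = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0.0) + amount
    return totals


if __name__ == "__main__":
    rows = [(1, 2.0), (1, 3.0), (2, 5.0), (3, 1.0), (3, 1.0)]
    print(list(aggregate_sorted(rows)))
    print(aggregate_unsorted(rows))
```

The sorted version relies on its input really being sorted: `groupby` would emit a key more than once if rows with the same key were not adjacent, which is why sorted input must be guaranteed upstream (for example by a Sorter or an ORDER BY in the source query).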

Session Bottleneck

Common sources of problems:
- inappropriate memory allocation settings
- sessions running in series rather than in parallel
- error tracing override set to a high level

Common solutions:
- calculate the DTM buffer pool size and buffer block size
- make sure data and index caches stay in memory; paging to disk is very slow
- if your mapping allows it, use partitioning
- run sessions in parallel, within concurrent batches, whenever possible
- increase the database commit interval
- turn off recovery and decimal arithmetic (they're off by default)
- use the Debugger rather than high error tracing, and always reduce your tracing level for production runs
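The buffer calculation can be written down as simple arithmetic. The sketch below uses a simplified sizing model whose constants are assumptions for illustration, not values taken from this document: each source and target needs at least two buffer blocks per partition, a buffer block should hold about 100 rows of the widest row, and roughly 90% of the DTM buffer pool is usable for blocks. Check these figures against the performance tuning guide for your PowerCenter version before relying on them.

```python
def min_buffer_block_size(max_row_bytes: int, rows_per_block: int = 100) -> int:
    """Bytes one buffer block needs to hold `rows_per_block` rows of the widest row.
    The default of 100 rows is an assumed rule of thumb."""
    return max_row_bytes * rows_per_block


def min_dtm_buffer_size(
    num_sources: int,
    num_targets: int,
    partitions: int,
    block_size: int,
    usable_pct: int = 90,
) -> int:
    """Smallest DTM buffer pool (bytes) giving two blocks per source and target
    per partition, when only `usable_pct` percent of the pool is assumed
    available for buffer blocks."""
    blocks_needed = (num_sources + num_targets) * 2 * partitions
    numerator = blocks_needed * block_size * 100
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-numerator // usable_pct)


if __name__ == "__main__":
    block = min_buffer_block_size(max_row_bytes=2000)
    print("buffer block size:", block)
    print("DTM buffer size:", min_dtm_buffer_size(2, 1, 2, block))
```

For a mapping with two sources, one target, two partitions, and a widest row of 2,000 bytes, this model gives a 200,000-byte block and a DTM buffer pool of about 2.67 MB.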

System Bottleneck

Common sources of problems:
- slow network connections
- overloaded or under-powered servers
- slow disk performance

Common solutions:
- get the best machines to run your server; better yet, use several servers against the same repository (PowerCenter only)
- use multiple CPUs and session partitioning
- make sure Informatica servers and database servers are located close to each other on your network
- if you have several CPUs, several disk drives, and gobs of RAM, consider running the Informatica server and the database server on the same machine
- shut down unneeded processes or network services on your servers
- use 7-bit ASCII data movement (the default) if you don't need Unicode
- evaluate hard disk performance, and try locating sources and targets on different drives
- get as much RAM as you can for your servers
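A quick way to evaluate disk performance is to time a large sequential write that is forced to disk. The function below is a rough, hypothetical helper, not an Informatica tool: it writes zeros to a temporary file in a given directory, calls `fsync` so the data leaves the operating system's page cache, and reports MiB per second. Results vary with caching layers and concurrent load, so treat them as relative comparisons between drives rather than absolute figures.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, total_mb: int = 64, chunk_kb: int = 1024) -> float:
    """Write roughly `total_mb` MiB of zeros to a temp file in `directory`,
    fsync it, delete it, and return the observed throughput in MiB/s."""
    chunk = b"\0" * (chunk_kb * 1024)
    chunks = (total_mb * 1024) // chunk_kb
    written_mb = chunks * chunk_kb / 1024
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push data to the device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return written_mb / elapsed


if __name__ == "__main__":
    print(f"{write_throughput_mb_s(tempfile.gettempdir(), total_mb=16):.1f} MiB/s")
```

Running it once against the directory holding source files and once against the target directory shows whether the two locations perform differently enough to be worth separating.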

Partitioning

- A partition is a pipeline stage that executes in a single thread.
- Partition points mark the thread boundaries in a pipeline and divide the pipeline into stages.
- The partition strategy can be different at each partition point in the pipeline.
- Adding partitions increases the number of threads created by Informatica.
- PowerCenter allows up to 16 partitions at each partition point.
- Adding partition points creates more threads, which can improve performance.
- HOWEVER, the load on the server also increases: if the server is undersized, partitioning adds no value and can actually decrease performance.
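The idea that partition points are thread boundaries can be modeled with ordinary threads and bounded queues: each stage runs in its own thread, and the queue between two stages plays the role of a partition point. This is only an illustration of the concept, not how PowerCenter implements its threads; the function names are hypothetical, the sketch has no error handling, and in CPython the global interpreter lock limits how much CPU-bound work actually runs in parallel.

```python
import queue
import threading
from collections.abc import Callable, Iterable

_DONE = object()  # sentinel marking the end of the row stream


def _stage(func: Callable, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a thread that applies `func` to each item from `inbox` and forwards the result."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate end-of-stream downstream
                return
            outbox.put(func(item))

    t = threading.Thread(target=run)
    t.start()
    return t


def run_pipeline(rows: Iterable, transforms: list[Callable]) -> list:
    """Reader thread -> one thread per transform -> writer (the calling thread)."""
    queues = [queue.Queue(maxsize=100) for _ in range(len(transforms) + 1)]
    workers = [_stage(f, queues[i], queues[i + 1]) for i, f in enumerate(transforms)]

    def reader() -> None:
        for row in rows:
            queues[0].put(row)
        queues[0].put(_DONE)

    reader_thread = threading.Thread(target=reader)
    reader_thread.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:  # writer stage drains the last queue
        results.append(item)
    for t in [reader_thread, *workers]:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), [lambda x: x * 2, lambda x: x + 1]))
```

The bounded queues mirror the trade-off described above: more stages mean more threads working at once, but every extra thread also consumes server resources.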

Partitioning (continued)

Partition types:
- Round Robin
- Key Range
- Hash Key
- Pass Through

Performance can be increased by changing the partitioning strategy at different partition points:
- Source Qualifier: Key Range or Hash Auto-Keys
- Expression or Filter: Round Robin
- Sorter and Aggregator: Hash Auto-Keys
- Target: Key Range
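The difference between the partition types comes down to how each one chooses a target partition for a row. The functions below are hypothetical illustrations of that choice, not PowerCenter's internals: `zlib.crc32` stands in for whatever hash function the engine really uses (it is chosen because, unlike Python's built-in `hash()` on strings, it gives the same result in every process). Pass Through is omitted because it does not redistribute rows at all; each partition simply keeps the rows it already has.

```python
import zlib
from bisect import bisect_right
from itertools import count
from typing import Any, Callable

Row = dict[str, Any]
Assign = Callable[[Row], int]


def round_robin(num_partitions: int) -> Assign:
    """Send rows to partitions in rotation, ignoring their content (balances row counts)."""
    counter = count()
    return lambda row: next(counter) % num_partitions


def hash_key(num_partitions: int, key: Callable[[Row], Any]) -> Assign:
    """Rows with equal key values always land in the same partition,
    which is what grouping transformations such as Aggregators need."""
    return lambda row: zlib.crc32(str(key(row)).encode("utf-8")) % num_partitions


def key_range(boundaries: list[int], key: Callable[[Row], int]) -> Assign:
    """Partition i receives keys in [boundaries[i-1], boundaries[i]).
    `boundaries` must be sorted; this yields len(boundaries) + 1 partitions."""
    return lambda row: bisect_right(boundaries, key(row))


def split(rows: list[Row], assign: Assign, num_partitions: int) -> list[list[Row]]:
    """Distribute rows into `num_partitions` lists using the chosen strategy."""
    parts: list[list[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[assign(row)].append(row)
    return parts


if __name__ == "__main__":
    rows = [{"id": i, "region": r} for i, r in enumerate(["N", "S", "N", "E", "S", "N"])]
    print([[r["id"] for r in p] for p in split(rows, round_robin(3), 3)])
    print([[r["id"] for r in p] for p in split(rows, key_range([2, 4], lambda r: r["id"]), 3)])
```

Round Robin suits row-by-row transformations like Expression or Filter because any row can go anywhere; Hash Key keeps each group in one partition, which grouping transformations depend on; Key Range lines partitions up with how a source or target table is already split.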
