Beruflich Dokumente
Kultur Dokumente
Description
Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations. For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache. Too many Lookups and Aggregators encumber performance because each requires index cache and data cache. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings is another option. Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory. Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many Lookups in a mapping causes contention of disk and memory, which can lead to thrashing, leaving insufficient memory to run a mapping efficiently.
Number of Disk Reads Cached Lookup LKP_Manufacturer Build Cache Read Source Records Execute Lookup Total # of Disk Reads LKP_DIM_ITEMS Build Cache Read Source Records Execute Lookup Total # of Disk Reads 200 5000 0 5200 100000 5000 0 105000 Un-cached Lookup 0 5000 5000 10000 0 5000 5000 10000
Consider the case where MANUFACTURER is the lookup table. If the lookup table is cached, it will take a total of 5200 disk reads to build the cache and execute the lookup. If the lookup table is not cached, then it will take a total of 10,000 total disk reads to execute the lookup. In this case, the number of records in the lookup table is small in comparison with the number of times the lookup is executed. So this lookup should be cached. This is the more likely scenario. Consider the case where DIM_ITEMS is the lookup table. If the lookup table is cached, it will result in 105,000 total disk reads to build and execute the lookup. If the lookup table is not cached, then the disk reads would total 10,000. In this case the number of records in the lookup table is not small in comparison with the number of times the lookup will be executed. Thus the lookup should not be cached. Use the following eight step method to determine if a lookup should be cached: 1. 2. 3. 4. 5. Code the lookup into the mapping. Select a standard set of data from the source. For example, add a where clause on a relational source to load a sample 10,000 rows. Run the mapping with caching turned off and save the log. Run the mapping with caching turned on and save the log to a different name than the log created in step 3. Look in the cached lookup log and determine how long it takes to cache the lookup object. Note this time in seconds: LOOKUP TIME IN SECONDS = LS.
8.
Within a specific session run for a mapping, if the same lookup is used multiple times in a mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. Using the same lookup multiple times in the mapping will be more resource intensive with each successive instance. If multiple cached lookups are from the same table but are expected to return different columns of data, it may be better to setup the multiple lookups to bring back the same columns even though not all return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.
Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from cache files instead of the database. This feature should only be used when the lookup table is not expected to change between session runs.
Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache file.
Reducing the Number of Cached Rows There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.
Use Pre- and Post-Session SQL Commands You can specify pre- and post-session SQL commands in the Properties tab of the Source Qualifier transformation and in the Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target before the session runs, then recreate them when the session completes.
You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.
You can use mapping parameters and variables in SQL executed against the source, but not against the target. Use a semi-colon (;) to separate multiple statements. The PowerCenter Server ignores semi-colons within single quotes, double quotes, or within /* ...*/. If you need to use a semi-colon outside of quotes or comments, you can escape it with a back slash (\). The Workflow Manager does not validate the SQL.
Use Environmental SQL For relational databases, you can execute SQL commands in the database environment when connecting to the database. You can use this for source, target, lookup, and stored procedure connection. For instance, you can set isolation levels on the source and target systems to avoid deadlocks. Follow the guidelines mentioned above for using the SQL statements.
10