
Informatica

PowerMart / PowerCenter 8.6

Introduction
PowerMart and PowerCenter provide an environment that allows you to load data into a centralized location, such as a data mart, data warehouse, or operational data store (ODS).
You can extract data from multiple sources, transform the data according to business
logic you build in the client application, and load the transformed data into file and
relational targets.
Informatica provides the following integrated components:

Informatica repository. The Informatica repository is at the center of the Informatica suite. You create a set of metadata tables within the repository database that the Informatica applications and tools access. The Informatica Client and Server access the repository to save and retrieve metadata.

Informatica Client. Use the Informatica Client to manage users, define sources
and targets, build mappings and mapplets with the transformation logic, and create
sessions to run the mapping logic. The Informatica Client has three client
applications: Repository Manager, Designer, and Server Manager.

Informatica Server. The Informatica Server extracts the source data, performs the
data transformation, and loads the transformed data into the targets.

PowerMart/PowerCenter Architecture

Sources
PowerMart and PowerCenter access the following sources:

Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.

File. Fixed and delimited flat file, COBOL file, and XML.

Extended. If you use PowerCenter, you can purchase additional PowerConnect products to access business sources such as PeopleSoft, SAP R/3, Siebel, and IBM MQSeries.

Mainframe. If you use PowerCenter, you can purchase PowerConnect for IBM DB2 for faster access to IBM DB2 on MVS.

Other. Microsoft Excel and Access.



Targets
PowerMart and PowerCenter can load data into the following targets:

Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL
Server, and Teradata.

File. Fixed and delimited flat files and XML.

Extended. If you use PowerCenter, you can purchase an integration server to load data into SAP BW. You can also purchase PowerConnect for IBM MQSeries to load data into IBM MQSeries message queues.

Other. Microsoft Access.

You can load data into targets using ODBC or native drivers, FTP, or
external loaders.

Repository
The Informatica repository is a set of tables that stores the metadata you create using the
Informatica Client tools. You create a database for the repository, and then use the
Repository Manager to create the metadata tables in the database.
You add metadata to the repository tables when you perform tasks in the Informatica Client
application such as creating users, analyzing sources, developing mappings or mapplets,
or creating sessions. The Informatica Server reads metadata created in the Client
application when you run a session. The Informatica Server also creates metadata such as
start and finish times of a session or session status.
When you use PowerCenter, you can develop global and local repositories to share metadata:

Global repository. The global repository is the hub of the domain. Use the global
repository to store common objects that multiple developers can use through
shortcuts. These objects may include operational or application source definitions,
reusable transformations, mapplets, and mappings.

Local repositories. A local repository is within a domain that is not the global
repository. Use local repositories for development. From a local repository, you can
create shortcuts to objects in shared folders in the global repository. These objects
typically include source definitions, common dimensions and lookups, and enterprise
standard transformations. You can also create copies of objects in non-shared
folders.

Informatica Client
The Informatica Client comprises three applications that you use to manage the repository, design mappings and mapplets, and create sessions to load the data.
Repository Manager. Use the Repository Manager to create and administer the metadata
repository. You can create repository users and groups, assign privileges and permissions,
manage folders and locks, and print Crystal Reports containing repository data.
Designer. Use the Designer to create mappings that contain transformation instructions for
the Informatica Server. Before you can create mappings, you must add source and target
definitions to the repository. The Designer has five tools that you use to analyze sources,
design target schemas, and build source-to-target mappings:

Source Analyzer. Import or create source definitions.
Warehouse Designer. Import or create target definitions.
Transformation Developer. Develop reusable transformations to use in mappings.
Mapplet Designer. Create sets of transformations to use in mappings.
Mapping Designer. Create mappings that the Informatica Server uses to extract, transform, and load data.

Server Manager. Use the Server Manager to create, schedule, execute, and monitor
sessions. You create a session based on a mapping in the repository and schedule it to run
against an Informatica Server. You can view scheduled and running sessions for each
Informatica Server in the domain. You can also access details about those sessions.

Informatica Server
The Informatica Server reads mapping and session information from the
repository. It extracts data from the mapping sources and stores the data
in memory while it applies the transformation rules that you configure in
the mapping. The Informatica Server loads the transformed data into the
mapping targets.
You can install the Informatica Server on a Windows NT/2000 or UNIX
server machine.
You can communicate with the Informatica Server using pmcmd, a
command line program.
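For example, assuming an Integration Service named IntSvc in a domain named Dom, and a workflow wf_load_emp in folder DW (all hypothetical names), a start command in the 8.x pmcmd syntax might look like the sketch below; older releases used a positional, session-based syntax, so check the pmcmd reference for your release:

    pmcmd startworkflow -sv IntSvc -d Dom -u Administrator -p MyPassword -f DW wf_load_emp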

Connectivity
PowerMart and PowerCenter use the following types of connectivity:

Network Protocol
Native Drivers
ODBC

The Informatica Client uses ODBC and native drivers to connect to source, target, and repository databases. The Server Manager and the Informatica Server use TCP/IP or IPX/SPX to communicate with each other. The Informatica Server uses native drivers to connect to the databases to move data; you can optionally use ODBC instead.

Connectivity Overview


Metadata Reporter
The Metadata Reporter is a web-based application that allows you to run reports against repository metadata.


Using Repository Manager


Use the Repository Manager to administer your repositories. The Repository Manager
allows you to navigate through multiple folders and repositories, and perform the following
tasks:

Perform repository maintenance. You can create, copy, restore, upgrade, back up, and delete repositories. With a global repository, you can register and unregister local repositories. You can import and export repository connection information in the registry and edit repository connection information.

Implement repository security. You can create, edit, and delete repository users
and user groups. You can assign and revoke repository privileges and folder
permissions.

Perform folder functions. You can create, edit, copy, and delete folders. All the work
you perform in the Designer is stored in folders. If you want to share metadata, you
can configure a folder to be shared.

View metadata. You can analyze sources, targets, mappings, and shortcut
dependencies, search by keyword, and view the properties of repository objects.

Customize the Repository Manager. You can add, edit, and remove repositories in the Navigator, and view or hide windows.

Run repository reports. You can run repository reports such as the Source to Target
Dependency report or the Session report. You can also add and remove customized
reports.

Repository Manager Windows


The Repository Manager can display the following windows:

Navigator. Displays all objects that you create in the Repository Manager,
the Designer, and the Server Manager. It is organized first by repository,
then by folder and folder version. Viewable objects include sources, targets,
dimensions, cubes, mappings, mapplets, transformations, sessions, and
batches. You can also view folder versions and business components.

Main. Provides properties of the object selected in the Navigator window. The columns in this window change depending on the object selected in the Navigator window.

Dependency. Shows dependencies on sources, targets, mappings, and shortcuts for objects selected in either the Navigator or Main window.

Output. Provides the output of tasks executed within the Repository Manager, such as creating a repository.

Repository Manager Windows


Repository Manager Navigator


Repository Objects
You create repository objects using the Repository Manager, Designer, and Server
Manager client tools. You can view the following objects in the Navigator window of the
Repository Manager:

Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions. Definitions of database objects or files that contain the target
data.

Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.

Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.

Reusable transformations. Transformations that you can use in multiple mappings.

Mapplets. A set of transformations that you can use in multiple mappings.

Sessions and batches. Sessions and batches store information about how and when the Informatica Server moves data. Each session corresponds to a single mapping. You can group several sessions together in a batch.

Design Process
The goal of the design process is to create mappings that depict the flow of
data between sources and targets, including changes made to the data before it
reaches the targets. However, before you can create a mapping, you must first
create or import source and target definitions. You might also want to create
reusable objects such as reusable transformations or mapplets.
Perform the following design tasks in the Designer:
1. Import source definitions. Use the Source Analyzer to connect to the sources and import the source definitions.

2. Create or import target definitions. Use the Warehouse Designer to define relational, flat file, or XML targets to receive data from sources. You can import target definitions from a relational database, or you can manually create a target definition.

3. Create the target tables. If you add a target definition to the repository that does not exist in a relational database, you need to create target tables in your target database. You do this by generating and executing the necessary SQL code within the Warehouse Designer, as sketched below.
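For example, for the hypothetical T_EMPLOYEES target used later in this guide, the SQL the Warehouse Designer generates and executes would resemble the following (column names and datatypes are illustrative, assuming an Oracle target):

    -- DDL generated and executed by the Warehouse Designer (illustrative)
    CREATE TABLE T_EMPLOYEES (
        EMPLOYEE_ID NUMBER NOT NULL,
        LAST_NAME   VARCHAR2(30),
        FIRST_NAME  VARCHAR2(30),
        SALARY      NUMBER(10,2),
        PRIMARY KEY (EMPLOYEE_ID)
    );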

Design Process
4. Design mappings. Once you have source and target definitions in the
repository, you can create mappings in the Mapping Designer. A mapping is a
set of source and target definitions linked by transformation objects that define
the rules for data transformation. A transformation is an object that performs a
specific function in a mapping, such as looking up data or performing
aggregation.
5. Create mapping objects. Optionally, you can create reusable objects for use in multiple mappings. Use the Transformation Developer to create reusable transformations. Use the Mapplet Designer to create mapplets. A mapplet is a reusable set of transformations that may also contain sources.
6. Debug mappings. Use the Mapping Designer to debug a valid mapping to
gain troubleshooting information about data and error conditions.
7. Import and export repository objects. You can import and export
repository objects, such as sources, targets, transformations, mapplets, and
mappings to archive or share metadata.

Designer Windows
You can display the following windows in the Designer:

Navigator. Connect to repositories, and open folders within the Navigator. You can also copy objects and create shortcuts within the Navigator.
Workspace. Open different tools in this window to create and edit
repository objects such as sources, targets, mapplets, transformations, and
mappings.
Output. View details about tasks you perform, such as saving your work or
validating a mapping.
Status bar. Displays the status of the operation you perform.
Overview. An optional window to simplify viewing a workspace that
contains a large mapping or multiple objects.
Instance data. View transformation data while you run the Debugger to
debug a mapping.
Target data. View target data while you run the Debugger to debug a
mapping.

Designer Windows


Debugger Window


Server Manager
Use the Server Manager to create, schedule, monitor, edit, copy, and abort
sessions. You can group multiple sessions to run as a single unit, known as a
batch. When you create a session, you select a valid mapping and configure
other settings such as connections, error handling, and scheduling. You may
also be able to override some transformation properties.
When you monitor sessions, the Server Manager displays status such as
scheduled, completed, and failed sessions. It also displays some errors
encountered while running the session. You can find a complete log of errors in
the session log and server log files. Before you create a session, you must
configure the following connection information:

Informatica Server connection. Register the Informatica Server with the repository before you can start it or create a session to run against it.

Database connections. Create connections to source and target systems.

Other connections. If you want to use external loaders or FTP, you configure access within the Server Manager.


Session Properties
You can set the following properties when you create a session:

Informatica Server. If you use PowerCenter, you can select an Informatica Server to run a session.

Source and target location. Select a connection or specify a path for the
source and target data.

Scheduling information. Schedule the session to run on demand or on a repeating schedule.

Error handling. Configure error handling parameters that determine how the Informatica Server behaves when it encounters errors.

Post-session email. Send post-session email depending on the success or failure of the session.

Pre- and post-session scripts. Run shell commands before or after the session.

Server Manager Windows


The Server Manager displays the following windows:

Navigator. View and select configured sessions.

Configure. Create and edit sessions.

Monitor. View information about running and completed sessions.

Output. View messages from the Informatica Server.

Status. Displays the status of the operation you perform.


Server Manager Windows


Creating a Repository
To create a repository:
1. Launch the Repository Manager by choosing Programs-PowerCenter (or PowerMart) Client-Repository Manager from the Start Menu.
2. In the Repository Manager, choose Repository-Create Repository.
Note: You must be running the Repository Manager in Administrator mode to see the Create Repository option on the menu. Administrator mode is the default when you install the program.
3. In the Create Repository dialog box, specify the name of the new repository, as well as the parameters needed to connect to the repository database through ODBC.


Creating a Repository


Creating Repository Users & Groups


You can create a repository user profile for everyone working in the repository,
each with a separate username and password. You can also create user
groups and assign each user to one or more groups. Then, grant repository
privileges to each group, so users in the group can perform tasks within the
repository (such as use the Designer or create sessions).
The repository user profile is not the same as the database user profile. While a
particular user might not have access to a database as a database user, that
same person can have privileges to a repository in the database as a repository
user.
Informatica tools include two basic types of security:

Privileges. Repository-wide security that controls which task or set of tasks a single user or group of users can access.

Permissions. Security assigned to individual folders within the repository. You can perform various tasks for each privilege.


Repository Privileges
Use Designer. Can edit metadata, and import and export objects in the Designer, with read and write permission at the folder level.

Browse Repository. Can browse repository content through the Repository Manager, add and remove reports, import, export, or remove the registry, and change user password.

Create Sessions and Batches. Can create, import, export, modify, start, stop, and delete sessions and batches through the Server Manager with folder-level read, write, and execute permissions. Can configure some connections used by the Informatica Server.

Session Operator. Can use the command line program (pmcmd) to start sessions and batches. Can start, view, monitor, and stop sessions or batches with folder-level read permission and the Create Sessions and Batches privilege, using the Server Manager.

Repository Privileges
Administer Repository. Can create, upgrade, back up, delete, and restore repositories. Can create and modify folders, create and modify users and groups, and assign privileges to users and groups.

Administer Server. Can configure connections to the Informatica Server through the Server Manager and pmcmd.

Super User. Can perform all tasks across all folders in the repository, including unlocking locks and managing global object permissions.

Folders
Folders provide a way to organize and store all metadata in the repository,
including mappings, schemas, and sessions. Folders are designed to be
flexible, to help you organize your data warehouse logically. Each folder has a
set of properties you can configure to define how users access the folder. For
example, you can create a folder that allows all repository users to see objects
within the folder, but not to edit them. Or you can create a folder that allows
users to share objects within the folder.
Shared Folders
When you create a folder, you can configure it as a shared folder. Shared
folders allow users to create shortcuts to objects in the folder. If you have a reusable transformation that you want to use in several mappings or across multiple folders, you can place the object in a shared folder.
For example, you may have a reusable Expression transformation that
calculates sales commissions. You can then use the object in other folders by
creating a shortcut to the object.


Folder Permissions
Permissions allow repository users to perform tasks within a folder. With folder
permissions, you can control user access to the folder, and the tasks you
permit them to perform.
Folder permissions work closely with repository privileges. Privileges grant
access to specific tasks while permissions grant access to specific folders with
read, write, and execute qualifiers.
However, any user with the Super User privilege can perform all tasks across
all folders in the repository. Folders have the following types of permissions:

Read permission. Allows you to view the folder as well as objects in the
folder.

Write permission. Allows you to create or edit objects in the folder.

Execute permission. Allows you to execute or schedule a session or batch in the folder.


Folder Permission Levels


You can grant folder permissions on the following levels of security:
Owner. The owner of the folder.
Owner's Group. Each user in the owner's repository group. If the owner belongs to more than one group, you must select one of those groups for the owner's group.
Repository. All groups and users in the repository.
Each permission level includes the permissions of the level above it.

Creating Folders
To Create a New Folder:
Choose Folder-Create


Importing Sources
Use the Source Analyzer to import or create source definitions for flat file, XML, COBOL, ERP, and relational sources.


Import from Database


To import source definitions from a database, create an ODBC connection and select the tables from the database.


Import from File


To import a file into the Source Analyzer, select the file from the local disk.


Viewing Source Definitions

Double-click the title bar of the source definition for the table.
The Edit Tables dialog box opens and displays all the properties of this source definition. The Table tab shows the name of the table, business name, owner name, and the database type. You can add a comment in the Description section.
Note: To change the source table name, click Rename.
Click the Columns tab.
The Columns tab displays the column descriptions for the source. You can modify the source definition, and change or delete columns. Any changes you make in this dialog box affect the source definition, not the source.

Viewing Source Definitions


Creating Targets
You can create target definitions in the Warehouse Designer for file and relational
sources. Create definitions in the following ways:

Import the definition for an existing target. Import the target definition
from a relational target.

Create a target definition based on a source definition. Drag one of the following existing source definitions into the Warehouse Designer to make a target definition:
o Relational source definition
o Flat file source definition
o COBOL source definition

Manually create a target definition. Create and design a target definition in the Warehouse Designer.

Design several related targets. Create several related target definitions at the same time. You can create the overall relationship, called a schema, as well as the target definitions, through wizards in the Designer. The Cubes and Dimensions Wizards follow common principles of data warehouse design to simplify the process of designing related targets.

Creating Targets


Creating a Pass-Through Mapping


The next step is to create a mapping to depict the flow of data between sources
and targets. To create and edit mappings, you use the Mapping Designer tool in the
Designer. The mapping interface in the Designer is component-based, meaning
that it shows you every step in the process of moving data between sources and
targets. In addition, transformations depict how the Informatica Server modifies
data before it loads a target.


Creating Simple Mapping

Switch to the Mapping Designer.
Choose Mappings-Create.
While the workspace may appear blank, in fact it contains a new mapping without any sources, targets, or transformations.
In the Mapping Name dialog box, enter <Mapping Name> as the name of the new mapping and click OK.
The naming convention for mappings is m_MappingName.
In the Navigator, under the <Repository Name> repository and <Folder Name> folder, click the Sources node to view source definitions added to the repository.


Creating Simple Mapping

Click the icon representing the EMPLOYEES source and drag it into the workspace.


Creating Simple Mapping


The source definition appears in the workspace. The Designer automatically
connects a Source Qualifier transformation to the source definition. After you add
the target definition, you connect the Source Qualifier to the target.
Click the Targets icon in the Navigator to open the list of all target definitions.
Click and drag the icon for the T_EMPLOYEES target into the workspace.
The target definition appears. The final step is connecting the Source Qualifier to this target definition.


Creating Simple Mapping


To connect the Source Qualifier to the target definition:
Click once in the middle of the <Column Name> in the Source Qualifier. Hold down the mouse button, and drag the cursor to the <Column Name> in the target. Then release the mouse button.
An arrow (called a connector) now appears between the two columns.


Transformations
A transformation is any part of a mapping that generates or modifies data. Every mapping
includes a Source Qualifier transformation, representing all the columns of information read
from a source and temporarily stored by the Informatica Server. In addition, you can add transformations that calculate a sum, look up a value, or generate a unique ID, modifying information before it reaches the target.
When you build a mapping, you add transformations and configure them to handle data
according to your business purpose. Perform the following tasks to incorporate a
transformation into a mapping:
Create the transformation. Create it in the Mapping Designer as part of a mapping, in the Mapplet Designer as part of a mapplet, or in the Transformation Developer as a reusable transformation.
Configure the transformation. Each type of transformation has a unique set of options that you can configure.
Connect the transformation to other transformations and target definitions. Drag one port to another to connect them in the mapping or mapplet.

Transformation Descriptions
Advanced External Procedure (Active/Connected). Calls a procedure in a shared library or in the COM layer of Windows NT.

Aggregator (Active/Connected). Performs aggregate calculations.

ERP Source Qualifier (Active/Connected). Represents the rows that the Informatica Server reads from an ERP source when it runs a session.

Expression (Passive/Connected). Calculates a value.

External Procedure (Passive/Connected or Unconnected). Calls a procedure in a shared library or in the COM layer of Windows NT.

Filter (Active/Connected). Filters records.

Transformation Descriptions
Input (Passive/Connected). Defines mapplet input rows. Available only in the Mapplet Designer.

Joiner (Active/Connected). Joins records from different databases or flat file systems.

Lookup (Passive/Connected or Unconnected). Looks up values.

Normalizer (Active/Connected). Normalizes records, including those read from COBOL sources.

Output (Passive/Connected). Defines mapplet output rows. Available only in the Mapplet Designer.

Rank (Active/Connected). Limits records to a top or bottom range.

Sequence Generator (Passive/Connected). Generates primary keys.

Transformation Descriptions
Source Qualifier (Active/Connected). Represents the rows that the Informatica Server reads from a relational or flat file source when it runs a session.

Router (Active/Connected). Routes data into multiple transformations based on a group expression.

Stored Procedure (Passive/Connected or Unconnected). Calls a stored procedure.

Update Strategy (Active/Connected). Determines whether to insert, delete, update, or reject records.

XML Source Qualifier (Passive/Connected). Represents the rows that the Informatica Server reads from an XML source when it runs a session.

Transformations Toolbar


Aggregator Transformation
The Aggregator transformation allows you to perform aggregate
calculations, such as averages and sums. The Aggregator transformation
is unlike the Expression transformation, in that you can use the Aggregator
transformation to perform calculations on groups. The Expression
transformation permits you to perform calculations on a row-by-row basis
only.
When using the transformation language to create aggregate expressions, you can use conditional clauses to filter records, providing more flexibility than the SQL language.
The Informatica Server performs aggregate calculations as it reads, and stores the necessary group and row data in an aggregate cache.
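For example, a conditional clause inside an aggregate function filters the rows that contribute to the calculation; SALES and QUANTITY here are hypothetical ports:

    -- entered in an output port: sums SALES only for multi-item rows
    SUM( SALES, QUANTITY > 1 )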


Ports in Aggregator Transformation


To configure ports in the Aggregator transformation, you can:
Enter an aggregate expression in any output port, using conditional clauses or non-aggregate functions in the port.
Create multiple aggregate output ports.
Configure any input, input/output, output, or variable port as a Group By port, and use non-aggregate expressions in the port.
Improve performance by connecting only the necessary input/output ports to subsequent transformations, reducing the size of the data cache.
Use variable ports for local variables.
Create connections to other transformations as you enter an expression.

Components of Aggregator Transformation


The Aggregator is an active transformation, changing the number of rows in the data flow. It must be connected to the data flow. The Aggregator transformation has several components and options:
Aggregate expression. Entered in an output port. Can include non-aggregate expressions and conditional clauses.
Group by port. Indicates how to create groups. Can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted Input option. Use to improve session performance. To use Sorted Input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
Aggregate cache. The Aggregator stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache.

Aggregate Cache
When you run a session that uses an Aggregator transformation, the Informatica
Server creates index and data caches in memory to process the transformation. If
the Informatica Server requires more space, it stores overflow values in cache files.
You configure the cache parameters in the session properties.


Creating an Aggregator Transformation


To use an Aggregator transformation in a mapping, you add the Aggregator transformation to
the mapping, then configure the transformation with an aggregate expression and group by
ports, if desired.
To create an Aggregator transformation:
1. In the Mapping Designer, choose Transformation-Create. Select the Aggregator transformation. The naming convention for Aggregator transformations is AGG_TransformationName. Enter a description for the transformation. This description appears in the Repository Manager, making it easier for you or others to understand what the transformation does.
2. Enter a name for the Aggregator, click Create. Then click Done. The Designer creates the Aggregator transformation.
3. Drag the desired ports from the source transformation to the Aggregator transformation. The Designer creates input/output ports for each port you include.
4. Double-click the title bar of the transformation to open the Edit Transformations dialog box.
5. Select the Ports tab.

Creating an Aggregator Transformation


6. Click the Group By option for each column you want the Aggregator to use in creating groups. You can optionally enter a default value to replace null groups.
7. If you want to use a non-aggregate expression to modify groups, click the Add button and enter a name and datatype for the port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field, enter the non-aggregate expression using one of the input ports, then click OK. Select Group By.
8. Click Add and enter a name and datatype for the aggregate expression port. Make the port an output port by clearing Input (I). Click in the right corner of the Expression field to open the Expression Editor. Enter the aggregate expression, click Validate, then click OK. Make sure the expression validates before closing the Expression Editor.
9. Add default values for specific ports as necessary. If certain ports are likely to contain null values, you might specify a default value if the target database does not handle null values.
10. Select the Properties tab.
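As a sketch of steps 6 through 8, an Aggregator grouped by a hypothetical STORE_ID port might define one output port, TOTAL_SALES, carrying this aggregate expression:

    -- aggregate expression in output port TOTAL_SALES (grouped by STORE_ID)
    SUM( QUANTITY * PRICE )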

Creating an Aggregator Transformation


Creating an Aggregator Transformation


Cache Directory. Local directory where the Informatica Server creates the index and data caches and, if necessary, index and data files. By default, the Informatica Server uses the directory entered in the Server Manager for the server variable $PMCacheDir. If you enter a new directory, make sure the directory exists and contains enough memory/disk space for the aggregate caches.

Tracing Level. Amount of detail displayed in the session log for this transformation.

Sorted Input. Indicates input data is presorted by groups. Select this option only if the mapping passes data to the Aggregator that is sorted by the Aggregator group by ports and by the same sort order configured for the session. Note: Use the Source Qualifier Number of Sorted Ports option to sort relational sources.

Expression Transformation
You can use the Expression transformation to calculate values in a single row before you write to the target.
For example, you might need to adjust employee salaries, concatenate first and last
names, or convert strings to numbers.
You can use the Expression transformation to perform any non-aggregate calculations.
You can also use the Expression transformation to test conditional statements before
you output the results to target tables or other transformations.


Expression Transformation
Calculating Values
To use the Expression transformation to calculate values for a single row, you must include the following ports:
Input or input/output ports for each value used in the calculation. For example, when calculating the total price for an order, determined by multiplying the unit price by the quantity ordered, one port provides the unit price and the other provides the quantity ordered.
Output port for the expression. You enter the expression as a configuration option for the output port. The return value for the output port needs to match the return value of the expression.
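For instance, an output port (called OUT_TOTAL_PRICE here for illustration) would carry an expression such as:

    -- output port expression; UNIT_PRICE and QUANTITY are the input ports
    UNIT_PRICE * QUANTITY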

Expression Transformation
Adding Multiple Calculations
You can enter multiple expressions in a single Expression transformation. As long as you
enter only one expression for each output port, you can create any number of output ports in
the transformation. In this way, you can use one Expression transformation rather than
creating separate transformations for each calculation that requires the same set of data.
For example, you might want to calculate several types of withholding taxes from each
employee paycheck, such as local and federal income tax, Social Security and Medicare.
Since all of these calculations require the employee salary, the withholding category, and/or
the corresponding tax rate, you can create one Expression transformation with the salary and
withholding category as input/output ports and a separate output port for each necessary
calculation.
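As a sketch, a single Expression transformation could define one output port per calculation over the same inputs (the port names and rate ports below are hypothetical):

    -- output port OUT_LOCAL_TAX
    SALARY * LOCAL_TAX_RATE
    -- output port OUT_FEDERAL_TAX
    SALARY * FEDERAL_TAX_RATE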


Creating an Expression Transformation


To create an Expression transformation:
1. In the Mapping Designer, choose Transformation-Create. Select the Expression transformation and add it to the mapping. Enter a name for it (the convention is EXP_TransformationName) and click OK.
2. Create the input ports. If you have the input transformation available, you can select Link Columns from the Layout menu and then click and drag each port used in the calculation into the Expression transformation. With this method, the Designer copies the port into the new transformation and creates a connection between the two ports. Or, you can open the Edit dialog box and create each port manually.
Note: If you want to make this transformation reusable, you must create each port manually within the transformation.
3. Repeat the previous step for each input port you want to add to the expression.
4. Create the output ports (O) you need, making sure to assign a port datatype that matches the expression return value. The naming convention for output ports is OUT_PORTNAME.

Creating an Expression Transformation


5. Click the small button that appears in the Expression section of the dialog box and enter the expression in the Expression Editor. To prevent typographic errors, use the listed port names and functions where possible.
6. If you select a port name that is not connected to the transformation, the Designer copies the port into the new transformation and creates a connection between the two ports.
7. Port names used as part of an expression in an Expression transformation follow stricter rules than port names in other types of transformations:
A port name must begin with a single- or double-byte letter or single- or double-byte underscore (_).
It can contain any of the following single- or double-byte characters: a letter, number, underscore (_), $, #, or @.
8. Check the expression syntax by clicking Validate. If necessary, make corrections to the expression and check the syntax again. Then save the expression and exit the Expression Editor.
9. Connect the output ports to the next transformation or target.
10. Select a tracing level on the Properties tab to determine the amount of transaction detail reported in the session log file.
11. Choose Repository-Save.

Lookup Transformation
Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. Import a lookup definition from any relational database to which both the Informatica Client and Server can connect. You can use multiple Lookup transformations in a mapping.
The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. Use the result of the lookup to pass to other transformations and the target.
You can use the Lookup transformation to perform many tasks, including:
Get a related value. For example, if your source table includes employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

Lookup Transformation
You can configure the Lookup transformation to perform different types of lookups. You can configure the transformation to be connected or unconnected, cached or uncached:
Connected or unconnected. Connected and unconnected transformations receive input and send output in different ways.
Cached or uncached. Sometimes you can improve session performance by caching the lookup table. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts rows into the cache during the session. Informatica recommends that you cache the target table as the lookup. This enables you to look up values in the target and insert them if they do not exist.

Connected Lookup Transformation


The following steps describe the way the Informatica Server processes a connected Lookup transformation:
1. A connected Lookup transformation receives input values directly from another transformation in the pipeline.
2. For each input row, the Informatica Server queries the lookup table or cache based on the lookup ports and the condition in the transformation.
3. If the transformation is uncached or uses a static cache, the Informatica Server returns values from the lookup query. If the transformation uses a dynamic cache, the Informatica Server inserts the row into the cache when the lookup query does not find the row in the cache. It flags the row as new or existing, based on the result of the lookup query.
4. The Lookup transformation passes return values from the query to the next transformation. If the transformation uses a dynamic cache, you can pass rows to a Filter or Router transformation to filter new rows to the target.

Unconnected Lookup Transformation


An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation. You can call the Lookup transformation more than once in a mapping.
A common use for unconnected Lookup transformations is to update slowly changing dimension tables. The following steps describe the way the Informatica Server processes an unconnected Lookup transformation:
1. An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation, such as an Update Strategy transformation.
2. The Informatica Server queries the lookup table or cache based on the lookup ports and condition in the transformation.
3. The Informatica Server returns one value into the return port of the Lookup transformation.
4. The Lookup transformation passes the return value into the :LKP expression.
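For example, an expression in an Update Strategy or Expression transformation calls the unconnected lookup by name and passes it input values; the lookup and port names below are hypothetical:

    -- returns the value of the lookup's return port for the matching row
    :LKP.LKP_GET_EMP_NAME( EMP_ID )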

Differences between Connected and Unconnected Lookup

Connected Lookup: Receives input values directly from the pipeline.
Unconnected Lookup: Receives input values from the result of a :LKP expression in another transformation.

Connected Lookup: You can use a dynamic or static cache.
Unconnected Lookup: You can use a static cache.

Connected Lookup: Cache includes all lookup columns used in the mapping (that is, lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations).
Unconnected Lookup: Cache includes all lookup/output ports in the lookup condition and the lookup/return port.

Connected Lookup: Can return multiple columns from the same row or insert into the dynamic lookup cache.
Unconnected Lookup: Designate one return port (R). Returns one column from each row.

Differences between Connected and Unconnected Lookup

Connected Lookup: If there is no match for the lookup condition, the Informatica Server returns the default value for all output ports. If you configure dynamic caching, the Informatica Server inserts rows into the cache.
Unconnected Lookup: If there is no match for the lookup condition, the Informatica Server returns NULL.

Connected Lookup: Pass multiple output values to another transformation. Link lookup/output ports to another transformation.
Unconnected Lookup: Pass one output value to another transformation. The lookup/output/return port passes the value to the transformation calling the :LKP expression.

Connected Lookup: Supports user-defined default values.
Unconnected Lookup: Does not support user-defined default values.

Lookup Components
When you configure a Lookup transformation in a mapping, you define the
following components:
Lookup table
Ports
Properties
Condition


Lookup Table
You can import a lookup table from the mapping source or target database, or
you can import a lookup table from any database that both the Informatica
Server and Client machine can connect to. If your mapping includes
heterogeneous joins, you can use any of the mapping sources or mapping
targets as the lookup table.
The lookup table can be a single table, or you can join multiple tables in the
same database using a lookup query override. The Informatica Server queries
the lookup table or an in-memory cache of the table for all incoming rows into
the Lookup transformation.
Connect to the database to import the lookup table definition. The Informatica Server can connect to a lookup table using a native database driver or an ODBC driver. However, native database drivers improve session performance.


Lookup Table
Indexes and a Lookup Table
If you have privileges to modify the database containing a lookup table, you can
improve lookup initialization time by adding an index to the lookup table. This is
important for very large lookup tables. Since the Informatica Server needs to
query, sort, and compare values in these columns, the index needs to include
every column used in a lookup condition.
You can improve performance by adding indexes for the following lookups:
Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY statement.
Uncached lookups. Because the Informatica Server issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.
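As a sketch, if a lookup condition compares the hypothetical EMPLOYEE_ID and DEPT_ID columns, an index covering those columns could be added like this:

    -- index on the columns used in the lookup condition (illustrative)
    CREATE INDEX IDX_EMP_LOOKUP ON EMPLOYEE (EMPLOYEE_ID, DEPT_ID);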

Lookup Ports
The Ports tab contains options similar to other transformations, such as port
name, datatype, and scale. In addition to input and output ports, the Lookup
transformation includes a lookup port type that represents columns of data in
the lookup table. An unconnected Lookup transformation also includes a return
port type that represents the return value.


Lookup Ports
Input port (Connected or Unconnected; minimum of 1). Create an input port for each lookup port you want to use in the lookup condition. You must have at least one input or input/output port in each Lookup transformation.

Output port (Connected or Unconnected; minimum of 1). Create an output port for each lookup port you want to link to another transformation. You can designate both input and lookup ports as output ports. For connected lookups, you must have at least one output port. For unconnected lookups, use the return port (R) to designate a return value.

Lookup port (Connected or Unconnected; minimum of 1). The Designer automatically designates each column in the lookup table as a lookup (L) and output port (O).

Return port (Unconnected; 1 only). Use only in unconnected Lookup transformations. Designates the column of data you want to return based on the lookup condition. You can designate one lookup/output port as the return port.

Lookup Transformation Properties


Properties for the Lookup transformation identify the database source, how the
Informatica Server processes the transformation, and how it handles caching
and multiple matches.
On the Properties tab, you can configure properties such as a SQL override for
the lookup, the lookup table name, and tracing level for the transformation.
Most of the options on this tab allow you to configure caching properties.


Lookup Transformation Properties


Lookup SQL Override. Overrides the default SQL statement to query the lookup table. Specifies the SQL statement you want the Informatica Server to use for querying lookup values. Use only with the lookup cache enabled. Enter only the SELECT, FROM, and WHERE clauses when you enter the SQL override. Do not enter the ORDER BY clause.

Lookup Table Name. Specifies the name of the table from which the transformation looks up and caches values. You can import a table, view, or synonym from another database by selecting the Import button on the dialog box that displays when you first create a Lookup transformation. If you enter a lookup SQL override, you do not need to add an entry for this option.

Lookup Caching Enabled. Indicates whether the Lookup transformation caches lookup values during the session. When lookup caching is enabled, the Informatica Server queries the lookup table once, caches the values, and looks up values in the cache during the session. This can improve session performance. When you disable caching, each time a row passes into the transformation, the Informatica Server issues a SELECT statement to the lookup table for lookup values.
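For example, a lookup SQL override can restrict what the Informatica Server caches; the table and column names below are hypothetical, and each selected column is aliased to match its lookup port:

    -- override that caches only active employees (illustrative)
    SELECT EMPLOYEE.EMPLOYEE_ID AS EMPLOYEE_ID, EMPLOYEE.NAME AS NAME
    FROM EMPLOYEE
    WHERE EMPLOYEE.STATUS = 'ACTIVE'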

Lookup Transformation Properties


Lookup Policy on Multiple Match. Available for Lookup transformations that are uncached or use a static cache. Determines what happens when the Lookup transformation finds multiple rows that match the lookup condition. You can select the first or last record returned from the cache or lookup table, or report an error. The Informatica Server fails a session when it encounters a multiple match while processing a Lookup transformation with a dynamic cache.

Lookup Condition. Displays the lookup condition you set in the Condition tab.

Location Information. Specifies the database containing the lookup table. You can select the exact database or you can use the $Source or $Target variable. If you use one of these variables, the lookup table must reside in the source or target database you specify when you configure the session. When you have more than one relational source in the mapping, the session fails if you use $Source.

Source Type. Indicates that the Lookup transformation reads values from a relational database.

Lookup Transformation Properties


Recache if Stale. The Recache from Database option replaces the Recache if Stale and Lookup Cache Initialize options.

Tracing Level. Sets the amount of detail included in the session log when you run a session containing this transformation.

Lookup Cache Directory Name. Specifies the directory used to build the lookup cache files when the Lookup transformation is configured to cache the lookup table. Also used to save the persistent lookup cache files when the Lookup Persistent option is selected. By default, the Informatica Server uses the $PMCacheDir directory configured for the Informatica Server.

Lookup Cache Initialize. The Recache from Database option replaces the Lookup Cache Initialize and Recache if Stale options.

Lookup Cache Persistent. Indicates whether the Informatica Server uses a persistent lookup cache, which consists of at least two cache files. If a Lookup transformation is configured for a persistent lookup cache and persistent lookup cache files do not exist, the Informatica Server creates the files during the session. You can use this only when you enable lookup caching.

Lookup Transformation Properties


Lookup Data Cache Size. Indicates the maximum size the Informatica Server allocates to the data cache in memory. If the Informatica Server cannot allocate the configured amount of memory when initializing the session, it fails the session. When the Informatica Server cannot store all the data cache data in memory, it pages to disk as necessary. The Lookup Data Cache Size is 2,000,000 bytes by default. The minimum size is 1,024 bytes. Use only with the lookup cache enabled.

Lookup Index Cache Size. Indicates the maximum size the Informatica Server allocates to the index cache in memory. If the Informatica Server cannot allocate the configured amount of memory when initializing the session, it fails the session. When the Informatica Server cannot store all the index cache data in memory, it pages to disk as necessary. The Lookup Index Cache Size is 1,000,000 bytes by default. The minimum size is 1,024 bytes. Use only with the lookup cache enabled.

Dynamic Lookup Cache. Indicates to use a dynamic lookup cache. Inserts new rows into the lookup cache as it passes rows to the target table. You can use this only when you enable lookup caching.

Cache File Name Prefix. Specifies the file name prefix to use with persistent lookup cache files. The Informatica Server uses the file name prefix as the file name for the persistent cache files it saves to disk. Only enter the prefix. Do not enter .idx or .dat. If the named persistent cache files exist, the Informatica Server builds the memory cache from the files. If the named persistent cache files do not exist, the Informatica Server rebuilds the persistent cache files. Use only with persistent lookup cache.

Lookup Condition
The Informatica Server uses the lookup condition to test incoming values. It is similar to the
WHERE clause in an SQL query. When you configure a lookup condition for the
transformation, you compare transformation input values with values in the lookup table or
cache, represented by lookup ports. When you run a session, the Informatica Server queries
the lookup table or cache for all incoming values based on the condition.
You must enter a lookup condition in all Lookup transformations. Some guidelines for the
lookup condition apply for all Lookup transformations, and some guidelines vary depending
on how you configure the transformation.


Lookup Condition
Use the following guidelines when you enter a condition for any Lookup transformation:
The datatypes in a condition must match.
Use one input port for each lookup port used in the condition. You can use the same input port in more than one condition in a transformation.
When you enter multiple conditions, the Informatica Server evaluates each condition as an AND, not an OR. The Informatica Server returns only rows that match all the conditions you specify.
The Informatica Server matches null values. For example, if an input lookup condition column is NULL, the Informatica Server evaluates the NULL equal to a NULL in the lookup table.
The lookup condition guidelines and the way the Informatica Server processes matches vary depending on whether you configure the transformation for a dynamic cache or for an uncached or static cache.
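For instance, a two-part condition on the Condition tab pairs each lookup port with an input port (the names below are hypothetical); the Informatica Server evaluates the parts together as an AND:

    EMPLOYEE_ID = IN_EMPLOYEE_ID
    DEPT_ID = IN_DEPT_ID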

Lookup Condition Uncached or Static Cache


Uncached or Static Cache
Use the following guidelines when you configure a Lookup transformation without a cache or to use a static cache:
If you configure a Lookup transformation to use a static cache, or not to cache, you can use the following operators when you create the lookup condition: =, >, <, >=, <=, !=
If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.
The input value must meet all conditions for the lookup to return a value.
The condition can match equivalent values or supply a threshold condition. For example, you might look for customers who do not live in California, or employees whose salary is greater than $30,000. Depending on the nature of the source and condition, the Lookup might return multiple values.

Lookup Condition Uncached or Static Cache


Handling Multiple Matches
Lookups find a value based on the conditions you set in the Lookup transformation. If the lookup condition is not based on a unique key, or if the lookup table is denormalized, the Informatica Server might find multiple matches in the lookup table or cache.
You can configure the static Lookup transformation to handle multiple matches in the following ways:
Return the first matching value, or return the last matching value. You can configure the transformation either to return the first matching value or the last matching value. The first and last values are the first values and last values found in the lookup cache that match the lookup condition. When you cache the lookup table, the Informatica Server determines which record is first and which is last by generating an ORDER BY clause for each column in the lookup cache. The Informatica Server then sorts each lookup source column in the lookup condition in ascending order. The Informatica Server sorts numeric columns in ascending numeric order (such as 0 to 10), date/time columns from January to December and from the first of the month to the end of the month, and string columns based on the sort order configured for the session.
Return an error. The Informatica Server returns the default value for the output ports.
Note: The Informatica Server fails the session when it encounters multiple keys for a Lookup transformation configured to use a dynamic cache.
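So, for a cached lookup whose condition uses hypothetical NAME and DEPT_ID columns, the cache-build query recorded in the session log would resemble:

    -- generated query; this ordering defines the first and last match (illustrative)
    SELECT NAME, DEPT_ID, EMPLOYEE_ID FROM EMPLOYEE ORDER BY NAME, DEPT_ID, EMPLOYEE_ID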

Lookup Condition Dynamic Cache


Dynamic Cache
If you configure a Lookup transformation to use a dynamic cache, you can only use the
equality operator (=) in the lookup condition.
Handling Multiple Matches
You cannot configure handling for multiple matches in a Lookup transformation configured to
use a dynamic cache. The Informatica Server fails the session when it encounters multiple
matches either while caching the lookup table or looking up values in the cache that contain
duplicate keys.


Lookup Caches
The Informatica Server builds a cache in memory when it processes the first row of data in a
cached Lookup transformation. It allocates memory for the cache based on the amount you
configure in the transformation or session properties. The Informatica Server stores condition
values in the index cache and output values in the data cache. The Informatica Server
queries the cache for each row that enters the transformation.
The Informatica Server also creates cache files by default in the $PMCacheDir. If the data
does not fit in the memory cache, the Informatica Server stores the overflow values in the
cache files. When the session completes, the Informatica Server releases cache memory and
deletes the cache files unless you configure the Lookup transformation to use a persistent
cache.
When configuring a lookup cache, you can specify any of the following options:

Persistent cache. You can save the lookup cache files and reuse them the next time the
Informatica Server processes a Lookup transformation configured to use the cache.

Recache from Database. If the persistent cache is not synchronized with the lookup
table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache. You can configure a static, or read-only, cache for any lookup table. By
default, the Informatica Server creates a static cache. It caches the lookup table and
looks up values in the cache for each row that comes into the transformation. When the
lookup condition is true, the Informatica Server returns a value from the lookup cache.
The Informatica Server does not update the cache while it processes the Lookup
transformation.

Dynamic cache. If you want to cache the target table and insert new rows into the cache
and the target, you can create a Lookup transformation to use a dynamic cache. The
Informatica Server dynamically inserts data into the lookup cache and passes data to the
target table.

Shared cache. You can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping. You can share a named
cache between transformations in the same or different mappings.


Creating Lookup Transformation


To create a Lookup transformation:

1. In the Mapping Designer, choose Transformation-Create. Select the Lookup
transformation. Enter a name for the lookup. The naming convention for Lookup
transformations is LKP_TransformationName. Click OK.

2. In the Select Lookup Table dialog box, you can choose the lookup table. Click the
Import button if the lookup table is not in the source or target database.


3. If you want to manually define the lookup transformation, click the Skip button.

4. Define input ports for each lookup condition you want to define.

5. For an unconnected Lookup transformation, create a return port for the value you want
to return from the lookup.

6. Define output ports for the values you want to pass to another transformation.

7. For Lookup transformations that use a dynamic lookup cache, associate an input port or
sequence ID with each lookup port.

8. Add the lookup conditions. If you include more than one condition, place the
conditions using equal signs first to optimize lookup performance.

9. On the Properties tab, set the properties for the lookup.

10. Click OK.

11. For unconnected Lookup transformations, write an expression in another transformation
using :LKP to call the unconnected Lookup transformation.
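For step 11, a minimal sketch of such a call, assuming an unconnected lookup named
LKP_GET_CUSTOMER_NAME with a single input port (the names are illustrative):

    :LKP.LKP_GET_CUSTOMER_NAME(CUSTOMER_ID)

The lookup runs once for the row and hands its return port value back to the calling
expression.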


Sequence Generator Transformation


The Sequence Generator transformation generates numeric values. You can use the
Sequence Generator to create unique primary key values, replace missing primary keys, or
cycle through a sequential range of numbers.
The Sequence Generator transformation is a connected transformation. It contains two
output ports that you can connect to one or more transformations. The Informatica Server
generates a value each time a row enters a connected transformation, even if that value is
not used. When NEXTVAL is connected to the input port of another transformation, the
Informatica Server generates a sequence of numbers. When CURRVAL is connected to the
input port of another transformation, the Informatica Server generates the NEXTVAL value
plus one.
You can make a Sequence Generator reusable, and use it in multiple mappings. You might
reuse a Sequence Generator when you perform multiple loads to a single target.
For example, if you have a large input file that you separate into three sessions running in
parallel, you can use a Sequence Generator to generate primary key values. If you use
different Sequence Generators, the Informatica Server might accidentally generate
duplicate key values. Instead, you can use the same reusable Sequence Generator for all
three sessions to provide a unique value for each target row.

Creating Sequence Generator Transformation


To create a Sequence Generator transformation:

1. In the Mapping Designer, select Transformation-Create. Select the Sequence Generator
transformation. The naming convention for Sequence Generator transformations is
SEQ_TransformationName.

2. Enter a name for the Sequence Generator, and click Create. Then click Done. The
Designer creates the Sequence Generator transformation.

3. Double-click the title bar of the transformation to open the Edit Transformations
dialog box.


4. Enter a description for the transformation. This description appears in the Repository
Manager, making it easier for you or others to understand what the transformation does.

5. Select the Properties tab. Enter settings as necessary.


Stored Procedure Transformation


A Stored Procedure transformation is an important tool for populating and
maintaining databases. Database administrators create stored procedures to
automate time-consuming tasks that are too complicated for standard SQL
statements.
Not all databases support stored procedures, and database implementations
vary widely in their syntax. You might use stored procedures to:

Drop and recreate indexes.

Check the status of a target database before moving records into it.

Determine if enough space exists in a database.

Perform a specialized calculation.


The stored procedure must exist in the database before creating a Stored
Procedure transformation, and the stored procedure can exist in a source, target,
or any database with a valid connection to the Informatica Server.
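As a sketch of the kind of procedure you might call, here is a hypothetical Oracle PL/SQL
procedure that checks whether a minimum amount of free space exists and reports the
result through an output parameter; the procedure name, parameters, and logic are
illustrative, not part of the product:

    CREATE OR REPLACE PROCEDURE SP_CHECK_SPACE (
        MIN_MB    IN  NUMBER,
        HAS_ROOM  OUT NUMBER
    ) AS
    BEGIN
        -- Set HAS_ROOM to 1 when at least MIN_MB megabytes are free, else 0.
        SELECT CASE WHEN NVL(SUM(BYTES), 0) / 1048576 >= MIN_MB THEN 1 ELSE 0 END
          INTO HAS_ROOM
          FROM USER_FREE_SPACE;
    END;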


Creating Stored Procedure Transformation


There are two ways to configure the Stored Procedure transformation:

Use the Import Stored Procedure dialog box to automatically configure the ports used by
the stored procedure.

Configure the transformation manually, creating the appropriate ports for any input or
output parameters.

Stored Procedure transformations are created as Normal type by default, which means that
they run during the mapping, not before or after the session.
New Stored Procedure transformations are not created as reusable transformations. To
create a reusable transformation, click Make Reusable in the Transformation properties
after creating the transformation.


Import Stored Procedure


When you import a stored procedure, the Designer creates ports based on the
stored procedure input and output parameters. You should import the stored
procedure whenever possible.

There are three ways to import a stored procedure in the Mapping Designer:

Select the stored procedure icon and add a Stored Procedure transformation.

Select Transformation-Import Stored Procedure.

Select Transformation-Create, and then select Stored Procedure.


Modes of Stored Procedure Transformation


The Stored Procedure transformation runs in one of two modes:

Connected

Unconnected

The mode you use depends on what your stored procedure does, and how often the stored
procedure should run in a mapping.


Connected
The flow of data through a mapping in connected mode also passes through the Stored
Procedure transformation. All data entering the transformation through the input ports
affects the stored procedure. You should use a connected stored procedure when you
need data from an input port sent as an input parameter to the stored procedure, or the
results of a stored procedure sent as an output parameter to another transformation.


Configuring connected Stored Procedure Transformation

To configure a connected Stored Procedure transformation:

Create the Stored Procedure transformation in your mapping.

Drag ports from upstream transformations to connect to any available input ports.

Drag ports from the output ports of the Stored Procedure to other transformations or
targets.

Double-click the transformation, and select the Properties tab. Select the appropriate
database in the Connection Information item if you did not select it when creating the
transformation.

Select the Tracing level for the transformation. If you are testing the mapping, select
the Verbose Initialization option to provide the most information in the event that the
transformation fails. Click OK.

Choose Repository-Save to save changes to the mapping.


Unconnected
The unconnected Stored Procedure transformation is not connected directly to
the flow of the mapping. It either runs before or after the session, or is called by
an expression in another transformation in the mapping.


Configuring Unconnected Stored Procedure Transformation

An unconnected Stored Procedure transformation is not directly connected to the flow of
data through the mapping. Instead, the stored procedure runs either:

From an expression. Called from an expression written in the Expression Editor within
another transformation in the mapping.

Pre- or post-session. Runs before or after a session.
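For the expression case, you call the transformation with the :SP reference qualifier and
capture the output value with the PROC_RESULT variable. A minimal sketch, assuming the
illustrative SP_CHECK_SPACE procedure shown earlier:

    :SP.SP_CHECK_SPACE(500, PROC_RESULT)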


Source Qualifier Transformation


When you add a relational or a flat file source definition to a mapping, you need to connect
it to a Source Qualifier transformation. The Source Qualifier represents the records that the
Informatica Server reads when it runs a session.
You can use the Source Qualifier to perform the following tasks:

Join data originating from the same source database. You can join two or more tables
with primary-foreign key relationships by linking the sources to one Source Qualifier.

Filter records when the Informatica Server reads source data. If you include a filter
condition, the Informatica Server adds a WHERE clause to the default query.

Specify an outer join rather than the default inner join. If you include a user-defined
join, the Informatica Server replaces the join information specified by the metadata in the
SQL query.

Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds
an ORDER BY clause to the default SQL query.

Select only distinct values from the source. If you choose Select Distinct, the
Informatica Server adds a SELECT DISTINCT statement to the default SQL query.

Create a custom query to issue a special SELECT statement for the Informatica
Server to read source data. For example, you might use a custom query to perform
aggregate calculations or execute a stored procedure.
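To make these options concrete, here is a sketch of the SQL the Informatica Server might
generate for a Source Qualifier that joins two tables, applies a source filter, and has
one sorted port; the table and column names are assumptions for illustration:

    SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.STATE, ORDERS.ORDER_ID, ORDERS.TOTAL
    FROM CUSTOMERS, ORDERS
    WHERE CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
      AND CUSTOMERS.STATE <> 'CA'
    ORDER BY CUSTOMERS.CUSTOMER_ID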

Configuring Source Qualifier Transformation


To configure a Source Qualifier:

In the Designer, open a mapping.

Double-click the title bar of the Source Qualifier.

In the Edit Transformations dialog box, click Rename, enter a descriptive name for the
transformation, and click OK. The naming convention for Source Qualifier transformations
is SQ_TransformationName.

Click the Properties tab.



Option: Description

SQL Query: Defines a custom query that replaces the default query the Informatica Server
uses to read data from sources represented in this Source Qualifier.

User-Defined Join: Specifies the condition used to join data from multiple sources
represented in the same Source Qualifier transformation.

Source Filter: Specifies the filter condition the Informatica Server applies when
querying records.

Number of Sorted Ports: Indicates the number of columns used when sorting records queried
from relational sources. If you select this option, the Informatica Server adds an ORDER
BY to the default query when it reads source records. The ORDER BY includes the number of
ports specified, starting from the top of the Source Qualifier. When selected, the
database sort order must match the session sort order.

Tracing Level: Sets the amount of detail included in the session log when you run a
session containing this transformation.

Select Distinct: Specifies if you want to select only unique records. The Informatica
Server includes a SELECT DISTINCT statement if you choose this option.


Filter Transformation
The Filter transformation provides the means for filtering rows in a mapping. You
pass all the rows from a source transformation through the Filter transformation,
and then enter a filter condition for the transformation. All ports in a Filter
transformation are input/output, and only rows that meet the condition pass
through the Filter transformation.
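A filter condition is any expression that evaluates to TRUE or FALSE for each row. For
example, a condition that passes only rows for well-paid employees outside California
might look like this (port names are illustrative):

    SALARY > 30000 AND STATE != 'CA'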


Creating a Filter Transformation


To create a Filter transformation:

In the Designer, switch to the Mapping Designer and open a mapping.

Choose Transformation-Create. Select Filter transformation, and enter the name of the new
transformation. The naming convention for the Filter transformation is
FIL_TransformationName. Click Create, and then click Done.

Select and drag all the desired ports from a source qualifier or other transformation to
add them to the Filter transformation. After you select and drag ports, copies of these
ports appear in the Filter transformation. Each column has both an input and an output
port.

Double-click the title bar of the new transformation.

Click the Properties tab. A default condition appears in the list of conditions. The
default condition is TRUE (a constant with a numeric value of 1).


Joiner Transformation
While a Source Qualifier transformation can join data originating from a common source
database, the Joiner transformation joins two related heterogeneous sources residing in
different locations or file systems. The combination of sources can be varied. You can use
the following sources:

Two relational tables existing in separate databases

Two flat files in potentially different file systems

Two different ODBC sources

Two instances of the same XML source

A relational table and a flat file source

A relational table and an XML source

If two relational sources contain keys, then a Source Qualifier transformation can easily join
the sources on those keys. Joiner transformations typically combine information from two
different sources that do not have matching keys, such as flat file sources.
The Joiner transformation allows you to join sources that contain binary data.


Creating a Joiner Transformation


To create a Joiner transformation:

In the Mapping Designer, choose Transformation-Create. Select the Joiner transformation.
Enter a name for the Joiner. Click OK. The naming convention for Joiner transformations
is JNR_TransformationName. Enter a description for the transformation. This description
appears in the Repository Manager, making it easier for you or others to understand or
remember what the transformation does.

The Designer creates the Joiner transformation. Keep in mind that you cannot use a
Sequence Generator or Update Strategy transformation as a source to a Joiner
transformation.

Drag all the desired input/output ports from the first source into the Joiner
transformation. The Designer creates input/output ports for the source fields in the
Joiner as detail fields by default. You can edit this property later.

Select and drag all the desired input/output ports from the second source into the Joiner
transformation. The Designer configures the second set of source fields as master fields
by default.

Double-click the title bar of the Joiner transformation to open the Edit Transformations
dialog box.

Select the Ports tab.

Click any box in the M column to switch the master/detail relationship for the sources.
Change the master/detail relationship if necessary by selecting the master source in the
M column.


Select the Condition tab and set the condition.
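The join condition pairs a master port with the corresponding detail port using the
equality operator. A sketch, assuming both sources carry a CUSTOMER_ID column (when port
names collide, the Designer appends a number to one of them):

    CUSTOMER_ID1 = CUSTOMER_ID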



Select the Properties tab and enter any additional settings for the transformation.
Joiner Setting: Description

Case-Sensitive String Comparison: If selected, the Informatica Server uses case-sensitive
string comparisons when performing joins on string columns.

Cache Directory: Specifies the directory used to cache master records and the index to
these records. By default, the caches are created in a directory specified by the server
variable $PMCacheDir. If you override the directory, be sure there is enough disk space
on the file system. The directory can be a mapped or mounted drive.

Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.

Null Ordering in Master: Not applicable for this transformation type.

Null Ordering in Detail: Not applicable for this transformation type.

Tracing Level: Amount of detail displayed in the session log for this transformation. The
options are Terse, Normal, Verbose Data, and Verbose Initialization.


Rank Transformation
The Rank transformation allows you to select only the top or bottom rank of data. You can
use a Rank transformation to return the largest or smallest numeric value in a port or
group. You can also use a Rank transformation to return the strings at the top or the bottom
of a session sort order. During the session, the Informatica Server caches input data until it
can perform the rank calculations.
The Rank transformation differs from the transformation functions MAX and MIN, in that it
allows you to select a group of top or bottom values, not just one value. For example, you
can use Rank to select the top 10 salespersons in a given territory. Or, to generate a
financial report, you might also use a Rank transformation to identify the three departments
with the lowest expenses in salaries and overhead. While the SQL language provides
many functions designed to handle groups of data, identifying top or bottom strata within a
set of rows is not possible using standard SQL functions.


Creating a Rank Transformation


To create a Rank transformation:

In the Mapping Designer, choose Transformation-Create. Select the Rank transformation.
Enter a name for the Rank. The naming convention for Rank transformations is
RNK_TransformationName. Enter a description for the transformation. This description
appears in the Repository Manager.

Click OK, and then click Done. The Designer creates the Rank transformation.

Link columns from an input transformation to the Rank transformation.

Click the Ports tab, and then select the Rank (R) option for the port used to measure
ranks.



Click the Properties tab and select whether you want the top or bottom rank.

Setting: Description

Cache directory: Local directory where the Informatica Server creates the index and data
caches and, if necessary, index and data files. By default, the Informatica Server uses
the directory entered in the Server Manager for the server variable $PMCacheDir. If you
enter a new directory, make sure the directory exists and contains enough disk space for
the rank caches.

Top/Bottom: Specifies whether you want the top or bottom ranking for a column.

Number of Ranks: The number of rows you want to rank.

Case-Sensitive String Comparison: When running in Unicode mode, the Informatica Server
ranks strings based on the sort order selected for the session. If the session sort order
is case-sensitive, select this option to enable case-sensitive string comparisons, and
clear this option to have the Informatica Server ignore case for strings. If the sort
order is not case-sensitive, the Informatica Server ignores this setting. By default,
this option is selected.

Tracing level: Determines the amount of information the Informatica Server writes to the
session log about data passing through this transformation during a session.


Router Transformation
A Router transformation is similar to a Filter transformation because both transformations
allow you to use a condition to test data. A Filter transformation tests data for one condition
and drops the rows of data that do not meet the condition. However, a Router
transformation tests data for one or more conditions and gives you the option to route rows
of data that do not meet any of the conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router
Transformation in a mapping instead of creating multiple Filter transformations to perform
the same task. The Router transformation is more efficient when you design a mapping and
when you run a session. For example, to test data based on three conditions, you only
need one Router transformation instead of three Filter transformations to perform this task.
Likewise, when you use a Router transformation in a mapping, the Informatica Server
processes the incoming data only once. When you use multiple Filter transformations in a
mapping, the Informatica Server processes the incoming data for each transformation.


Router Transformation Components


A Router transformation consists of input and output groups, input and output ports, group filter
conditions, and properties that you configure in the Designer.


Working with Groups


A Router transformation has the following types of groups:

Input

Output

Input Group
The Designer copies property information from the input ports of the input group to
create a set of output ports for each output group.

Output Groups
There are two types of output groups:

User-defined groups

Default group

You cannot modify or delete output ports or their properties.


Creating Group Filter Conditions
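A group filter condition is an expression that returns TRUE or FALSE for each row. As an
illustrative sketch, three user-defined groups that route orders by amount might use
conditions like these (the port name is an assumption):

    ORDER_TOTAL >= 10000
    ORDER_TOTAL >= 1000 AND ORDER_TOTAL < 10000
    ORDER_TOTAL < 1000

Rows that satisfy none of the group filter conditions pass to the default group.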


Creating a Router Transformation


To create a Router transformation:

1. In the Mapping Designer, open a mapping.

2. Choose Transformation-Create. Select Router transformation, and enter the name of the
new transformation. The naming convention for the Router transformation is
RTR_TransformationName. Click Create, and then click Done.

3. Select and drag all the desired ports from a transformation to add them to the Router
transformation, or you can manually create input ports on the Ports tab.

4. Double-click the title bar of the Router transformation to edit transformation
properties.

5. Click the Transformation tab and configure transformation properties as desired.

6. Click the Properties tab and configure tracing levels as desired.

7. Click the Groups tab, and then click the Add button to create a user-defined group.
The Designer creates the default group when you create the first user-defined group.

8. Click the Group Filter Condition field to open the Expression Editor.

9. Enter a group filter condition.

10. Click Validate to check the syntax of the conditions you entered.


Update Strategy Transformation


When you design your data warehouse, you need to decide what type of information to store in targets.
As part of your target table design, you need to determine whether to maintain all the historic data or
just the most recent changes.
For example, you might have a target table, T_CUSTOMERS, that contains customer data. When a
customer address changes, you may want to save the original address in the table, instead of updating
that portion of the customer record. In this case, you would create a new record containing the updated
address, and preserve the original record with the old customer address. This illustrates how you might
store historical information in a target table. However, if you want the T_CUSTOMERS table to be a
snapshot of current customer data, you would update the existing customer record and lose the original
address.
The model you choose constitutes your update strategy, how to handle changes to existing records. In
PowerMart and PowerCenter, you set your update strategy at two different levels:

Within a session. When you configure a session, you can instruct the Informatica Server
to either treat all records in the same way (for example, treat all records as inserts), or use
instructions coded into the session mapping to flag records for different database
operations.

Within a mapping. Within a mapping, you use the Update Strategy transformation to flag
records for insert, delete, update, or reject.


Setting up Update Strategy for a Session


Specifying an Option for all Rows


During session configuration, you can select a single database operation for all records.
For the Treat Rows As setting, you have the following options:

Setting: Description

Insert: Treat all records as inserts. If inserting the record violates a primary or
foreign key constraint in the database, the Informatica Server rejects the record.

Delete: Treat all records as deletes. For each record, if the Informatica Server finds a
corresponding record in the target table (based on the primary key value), the
Informatica Server deletes it. Note that the primary key constraint must exist in the
target definition in the repository.

Update: Treat all records as updates. For each record, the Informatica Server looks for a
matching primary key value in the target table. If it exists, the Informatica Server
updates the record. Again, the primary key constraint must exist in the target
definition.

Data Driven: The Informatica Server follows instructions coded into Update Strategy
transformations within the session mapping to determine how to flag records for insert,
delete, update, or reject. If the mapping for the session contains an Update Strategy
transformation, this field is marked Data Driven by default. If you do not choose the
Data Driven setting, the Informatica Server ignores all Update Strategy transformations
in the mapping.


Update Strategy Settings


The setting you choose depends on your update strategy and the status of data in target
tables:

Setting: Use To

Insert: Populate the target tables for the first time, or maintain a historical data
warehouse. In the latter case, you must set this strategy for the entire data warehouse,
not just a select group of target tables.

Delete: Clear target tables.

Update: Update target tables. You might choose this setting whether your data warehouse
contains historical data or a snapshot. Later, when you configure how to update
individual target tables, you can determine whether to insert updated records as new
records or use the updated information to modify existing records in the target.

Data Driven: Exert finer control over how you flag records for insert, delete, update,
or reject. Choose this setting if records destined for the same table need to be flagged
on occasion for one operation (for example, update) or for a different operation (for
example, reject). In addition, this setting provides the only way you can flag records
for reject.


Specifying Options for individual Target Tables


Once you determine how to treat all rows in the session (insert, delete, update, or data
driven), you also need to set update strategy options for individual targets. You set the
following options in the Targets section of the Session Wizard:


Insert. Select this option to insert a row into a target table.

Delete. Select this option to delete a record from a table.

Truncate table. Select this option to truncate the target table before loading data.

Update. You have three different options in this situation:

Option: Description

Update as update: Update each record flagged for update if it exists in the target table.

Update as insert: Insert each record flagged for update.

Update else insert: Update the record if it exists. Otherwise, insert it.


Update Strategy Within a Mapping


For the greatest degree of control over your update strategy, you add Update Strategy
transformations to a mapping. The most important feature of this transformation is its
update strategy expression, used to flag individual records for insert, delete, update, or
reject.
Operation   Constant    Numeric Value
Insert      DD_INSERT   0
Update      DD_UPDATE   1
Delete      DD_DELETE   2
Reject      DD_REJECT   3
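For example, an update strategy expression tests each row and returns one of these
constants. A sketch with illustrative port names:

    IIF(ISNULL(CUSTOMER_ID), DD_REJECT,
        IIF(CHANGED_FLAG = 1, DD_UPDATE, DD_INSERT))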


Creating Update Strategy Transformation


To create an Update Strategy transformation:

1. In the Mapping Designer, open or create a mapping.

2. Click the Update Strategy button on the Transformations toolbar.

3. Click and drag across the area where you want the transformation to appear. When you
release the mouse button, a new Update Strategy transformation appears.

4. Choose Layout-Link Columns.

5. Click and drag all the ports from another transformation representing data you want to
pass through the Update Strategy transformation. In the Update Strategy transformation,
the Designer creates a copy of each port you click and drag. The Designer also connects
the new port to the original port. Each port in the Update Strategy transformation is a
combination input/output port. Normally, you would select all of the columns destined for
a particular target. After they pass through the Update Strategy transformation, this
information is flagged for update, insert, delete, or reject.

6. Double-click the transformation title bar.

7. Click Rename, enter a descriptive name, and click OK. The naming convention for Update
Strategy transformations is UPD_TransformationName.



8. Click the Properties tab.

9. Click the button in the Update Strategy Expression field. The Expression Editor
appears.

10. Enter an update strategy expression to flag records as inserts, deletes, updates, or
rejects.

11. Validate the expression and click OK to close the Expression Editor.

12. Click OK to return to the Designer.

13. Connect the ports in the Update Strategy transformation to another transformation or
a target instance.

14. Choose Repository-Save.


XML Source Qualifier Transformation


When you add an XML source definition to a mapping, you need to connect it to an XML Source
Qualifier transformation. The XML Source Qualifier represents the data elements that the Informatica
Server reads when it runs a session with XML sources.
You can use the XML Source Qualifier only with an XML source definition. You can link only one XML
Source Qualifier to one XML source definition. An XML Source Qualifier always has one input/output
port for every column in the XML source. When you create an XML Source Qualifier for a source
definition, the Designer automatically links each port in the XML source definition to a port in the XML
Source Qualifier. You cannot remove or edit any of the links. If you remove an XML source definition
from a mapping, the Designer also removes the corresponding XML Source Qualifier.
You can link ports of one group to ports in different transformations to form separate data flows.
However, you cannot link ports from more than one group in an XML Source Qualifier to ports in the
same target transformation.
If you drag columns of more than one group in an XML Source Qualifier to one transformation, the
Designer copies the columns of all the groups to the transformation. However, it links only the ports of
the first group to the corresponding ports of the columns created in the transformation.
A group in an XML Source Qualifier can link to one group in an XML target definition. You can link more
than one group in an XML Source Qualifier to an XML target definition.
You cannot use an XML Source Qualifier in a mapplet.


Normalizer Transformation
The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to
organize the data according to your own needs. A Normalizer transformation can appear anywhere in a
data flow when you normalize a relational source. Use a Normalizer transformation instead of the
Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL
source into the Mapping Designer workspace, the Normalizer transformation automatically appears,
creating input and output ports for every column in the source.
You primarily use the Normalizer transformation with COBOL sources, which are often stored in a
denormalized format. The OCCURS statement in a COBOL file nests multiple records of information in
a single record. Using the Normalizer transformation, you break out repeated data within a record into
separate records. For each new record it creates, the Normalizer transformation generates a unique
identifier. You can use this key value to join the normalized records.
You can also use the Normalizer transformation with relational sources to create multiple rows from a
single row of data.
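As an illustration of the denormalized layout the OCCURS statement produces, consider
this hypothetical COBOL record fragment; the Normalizer would break the three repeated
phone fields into three output rows, each carrying the generated key:

    01  CUSTOMER-REC.
        05  CUSTOMER-ID    PIC 9(5).
        05  CUSTOMER-NAME  PIC X(30).
        05  PHONE-NUMBER   OCCURS 3 TIMES PIC X(12).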


External Procedure Transformation


External Procedure transformations operate in conjunction with procedures you create outside of the
Designer interface to extend PowerMart/PowerCenter functionality.
Although the standard transformations provide you with a wide range of options, there are occasions
when you might want to extend the functionality provided with PowerMart and PowerCenter. For
example, the range of standard transformations (Expression, Stored Procedure, Filter, and so forth)
may not provide the exact functionality you need. If you are an experienced programmer, you may want
to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of
creating the necessary Expression transformations in a mapping.
To obtain this kind of extensibility, you can use the Transformation Exchange (TX) dynamic invocation
interface built into PowerMart and PowerCenter. Using TX, you can create an Informatica External
Procedure transformation and bind it to an external procedure that you have developed. You can bind
External Procedure transformations to two kinds of external procedures:

COM external procedures (available on Windows NT/2000 only)

Informatica external procedures (available on Windows NT/2000 and Solaris, HP-UX, and
AIX)

To use TX, you must be an experienced C, C++, or Visual Basic programmer.

You can use multi-threaded code in both external procedures and advanced external
procedures.


Advanced External Procedure Transformation


Use the Advanced External Procedure transformation to create external transformation applications,
such as sorting and aggregation, which require all input rows to be processed before emitting any
output rows. To support this process, the input and output functions occur separately in Advanced
External Procedure transformation. The advanced external procedure specified in the transformation is
an input function, and is passed only through the input ports. The output function is a separate callback
function provided by Informatica that can be called from the Advanced External Procedure library. The
output callback function is used to pass all the output port values from the Advanced External
Procedure library to the Informatica Server. In contrast, in the External Procedure transformation, an
external procedure function does both input and output, and its parameters consist of all the ports of the
transformation.
Advanced External Procedure transformations are connected transformations. You cannot reference an
Advanced External Procedure transformation in an expression.


Differences Between External and Advanced External Procedures

External Procedure: Single return value; one row in, one row out. Each input has one or
zero outputs.
Advanced External Procedure: Multiple outputs; multiple rows in, multiple rows out.

External Procedure: Supports COM and Informatica procedures.
Advanced External Procedure: Supports Informatica procedures only.

External Procedure: Passive; allowed in concatenation data flows.
Advanced External Procedure: Active; not allowed in concatenation data flows.

External Procedure: Connected or unconnected; can be called from an expression.
Advanced External Procedure: Connected only; cannot be called from an expression.


Mapplets
A mapplet is a reusable object that represents a set of transformations. It allows you to reuse
transformation logic and can contain as many transformations as you need. You create mapplets in the
Mapplet Designer.
Create a mapplet when you want to use a standardized set of transformation logic in several mappings.
For example, if you have several fact tables that require a series of dimension keys, you can create a
mapplet containing a series of Lookup transformations to find each dimension key. You can then use
the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping.
To create a mapplet, you add, connect, and configure transformations to complete the desired
transformation logic.
After you save a mapplet, you can use it in a mapping to represent the transformations within the
mapplet. When you use a mapplet in a mapping, you use an instance of the mapplet. Like a reusable
transformation, any changes made to the mapplet are automatically inherited by all instances of the
mapplet.


Mapplet Input
Data passing through a mapplet comes from a source. Source data for a mapplet can
originate from one of two places:

Sources within the mapplet. Mapplet input can originate from within the mapplet if you
include one or more source definitions in the mapplet. When you use more than one
source definition in a mapplet, you must connect the sources to a single Source Qualifier or
ERP Source Qualifier transformation. When you use the mapplet in a mapping, the
mapplet provides source data for the mapping.

Sources outside the mapplet. Mapplet input can originate from outside a mapplet if you
include an Input transformation to define mapplet input ports. When you use the mapplet in
a mapping, data passes through the mapplet as part of the mapping pipeline.


Mapplet Input Using Sources within Mapplet


You can use one or more source definitions in a mapplet to provide source data for the
mapplet. Source definitions can represent either file, relational, or ERP data. When you
include source definitions in a mapplet, you can connect them to one of the following
transformations:

Source Qualifier

ERP Source Qualifier

You cannot connect sources to a Normalizer transformation. You cannot use COBOL, MQ,
or XML source definitions in a mapplet.


Mapplet Input Using Sources Outside Mapplet


You can connect a mapplet to sources in a mapping by creating mapplet input ports. To create mapplet
input ports, you add an Input transformation to the mapplet. Each port in the Input transformation
connected to another transformation in the mapplet becomes a mapplet input port.
When you use an Input transformation in a mapplet, you must connect at least one port in the Input
transformation to another transformation in the mapplet. You cannot connect ports in an Input
transformation directly to an Output transformation.
You can connect an Input transformation to multiple transformations in a mapplet. However, you can
connect each port in the Input transformation to only one transformation in the mapplet. For example,
you can connect one port in an Input transformation to a Lookup transformation and a different port to
an Expression transformation. You cannot connect the same port to both the Lookup transformation and
the Expression transformation.
When you use the mapplet in a mapping, the Designer displays all available input ports below the Input
transformation name. You do not have to use all mapplet input ports in each mapping, but you must use
at least one.


Mapplet Output
To pass data out of a mapplet, you create mapplet output ports. To create mapplet output ports, you add
Output transformations to the mapplet. Each port in an Output transformation connected to another
transformation in the mapplet becomes a mapplet output port. Each mapplet must contain at least one
Output transformation, and at least one port in the Output transformation must be connected within the
mapplet.
Each Output transformation in a mapplet represents a group of mapplet output ports, or output group.
Each output group can pass data to a single pipeline in the mapping. To pass data from a mapplet to
more than one pipeline, create an Output transformation for each pipeline.
When you use a mapplet in a mapping, you connect ports in each output group to different pipelines.
You do not have to use all mapplet output ports in a mapping, but you must use at least one.


Creating Cubes and Dimensions


The Warehouse Designer provides an interface to let you create and edit cubes
and dimensions.
Multi-dimensional metadata refers to the logical organization of data used for
analysis in OLAP applications. This logical organization is generally specialized
for the most efficient data representation and access by end users of the OLAP
application.


Creating a Dimension
Before you can create a cube, you need to create dimensions. Complete each of the
following steps to create a dimension:
1. Enter a dimension description.

2. Add levels to the dimension.

3. Add hierarchies to the dimension.

4. Add level instances to the hierarchies.


Step 1: Creating a Dimension

1. In the Warehouse Designer, choose Targets-Create/Edit Dimension. The Dimension Editor
displays.

2. Select Add Dimension.

3. Enter the following information:

Name. Dimension names must be unique in a folder.

Description.

Database type. The database type of a dimension must match the database type of the
cube. Note: You cannot change the database type once you create the dimension.

4. Click OK.


Step 2: Add Levels to the Dimension

After you create the dimension, add as many levels as needed. Levels hold the properties necessary to
create target tables.
1. In the Dimension Editor, select Levels and click Add Level.

3. Click Level Properties.

4. Click the Import from Source Fields button.


5. Select a source table from which you want to copy columns to the level. The columns
display in the Source Fields section.


Step 3: Add Hierarchies to the Dimension

1. In the Dimension Editor, select Hierarchies.

2. Click Add Hierarchy.


3. Enter a hierarchy name, description, and select normalized or non-normalized.

Normalized cubes restrict redundant data.

Non-normalized cubes allow for redundant data, which increases speed for retrieving data.


Step 4: Add Levels to Hierarchy


After you create a hierarchy, you add levels to it. You can have only one root level in a
hierarchy.
To add a level to a hierarchy:

1. From the Dimension Editor, drill down to view the levels in the dimension.

2. Drag the level you want to define as the root level in the hierarchy.


3. Enter a target table name and description of the target table.


Creating a Cube: Step 1

After you create dimensions, you can create a cube.
To create a cube:

1. From the Warehouse Designer, choose Targets-Create Cube.


2. Enter the following information:

Cube name. The cube name must be unique in a folder.

Cube type: Normalized or Non-normalized. Normalized dimensions must have a normalized
cube. Likewise, non-normalized dimensions must have a non-normalized cube.

Database type. The database type for the cube must match the database type for the
dimensions in the cube.

3. Click Next.


Creating a Cube: Step 2


4. Specify the dimensions and hierarchies to include in the cube.


Creating a Cube: Step 3

Add measures to the cube.


Creating a Cube: Step 4

Add a name for the fact table.


Viewing Metadata for Cubes and Dimensions

You can view the metadata for cubes and dimensions in the Repository Manager.
To view cube or dimension metadata:

In the Repository Manager, open a folder.

Drill down to the cube or dimension you want to analyze.

The Repository Manager displays the metadata for each object.


Mapping Wizards

The Designer provides two mapping wizards to help you create mappings quickly and easily. Both
wizards are designed to create mappings for loading and maintaining star schemas, a series of
dimensions related to a central fact table. You can, however, use the generated mappings to load
other types of targets.
You choose a different wizard and different options in each wizard based on the type of target you
want to load and the way you want to handle historical data in the target:

Getting Started Wizard. Creates mappings to load static fact and dimension tables, as
well as slowly growing dimension tables.

Slowly Changing Dimensions Wizard. Creates mappings to load slowly changing dimension
tables based on the amount of historical dimension data you want to keep and the method
you choose to handle historical dimension data.

After using a mapping wizard, you can edit the generated mapping to further customize it.


Using Getting Started Wizards

The Getting Started Wizard creates mappings to load static fact and dimension tables, as well as
slowly growing dimension tables.
The Getting Started Wizard can create two types of mappings:

Simple Pass Through. Loads a static fact or dimension table by inserting all rows. Use
this mapping when you want to drop all existing data from your table before loading new
data.

Slowly Growing Target. Loads a slowly growing fact or dimension table by inserting new
rows. Use this mapping to load new data when existing data does not require updates.


Using Slowly Changing Dimension Wizards


The Slowly Changing Dimensions Wizard creates mappings to load slowly changing dimension
tables:

Type 1 Dimension mapping. Loads a slowly changing dimension table by inserting new
dimensions and overwriting existing dimensions. Use this mapping when you do not want a
history of previous dimension data.

Type 2 Dimension/Version Data mapping. Loads a slowly changing dimension table by
inserting new and changed dimensions using a version number and incremented primary key
to track changes. Use this mapping when you want to keep a full history of dimension data
and to track the progression of changes.

Type 2 Dimension/Flag Current mapping. Loads a slowly changing dimension table by
inserting new and changed dimensions using a flag to mark current dimension data and an
incremented primary key to track changes. Use this mapping when you want to keep a full
history of dimension data, tracking the progression of changes while flagging only the
current dimension.

Type 2 Dimension/Effective Date Range mapping. Loads a slowly changing dimension table by
inserting new and changed dimensions using a date range to define current dimension data.
Use this mapping when you want to keep a full history of dimension data, tracking changes
with an exact effective date range.

Type 3 Dimension mapping. Loads a slowly changing dimension table by inserting new
dimensions and updating values in existing dimensions. Use this mapping when you want to
keep the current and previous dimension values in your dimension table.


Steps for Creating Slowly Growing Target Mapping


To create a Slowly Growing Target mapping:

1. In the Mapping Designer, choose Mappings-Wizards-Getting Started.

2. Enter a mapping name and select Slowly Growing Target, and click Next. The naming
convention for mapping names is mMappingName.



Select a source definition to be used in the mapping.
All available source definitions appear in the Select Source Table list. This list
includes shortcuts, flat file, relational, and ERP sources.


Enter a name for the mapping target table. Click Next. The naming convention for target
definitions is T_TARGET_NAME.

Select the column or columns from the Target Table Fields list that you want the
Informatica Server to use to look up data in the target table. Click Add.
The wizard adds selected columns to the Logical Key Fields list.
Tip: The column or columns you select should be a key in the source.
When you run the session, the Informatica Server performs a lookup on existing target
data. The Informatica Server returns target data when Logical Key Fields columns match
corresponding target columns.
To remove a column from Logical Key Fields, select the column and click Remove.
Note: The Fields to Compare for Changes field is disabled for the Slowly Growing
Targets mapping.


Configuring a Slowly Growing Target Session

The Slowly Growing Target mapping flags new source rows, and then inserts them to the target
with a new primary key. The mapping uses an Update Strategy transformation to indicate new
rows must be inserted. Therefore, when you create a session for the mapping, configure the
session as follows:

For the source, set Treat Rows As to Data Driven.

To ensure rows are inserted into the target properly, click the Target Options button to
access the Targets dialog box and select Insert.


Understanding the Transformations


Transformation Name (Transformation Type): Description

SQ_SourceName (Source Qualifier or ERP Source Qualifier): Selects all rows from the
source you choose in the Mapping Wizard.

LKP_GetData (Lookup): Caches the existing target table. Compares a logical key column in
the source against the corresponding key column in the target.

EXP_DetectChanges (Expression): Uses the following expression to flag source rows that
have no matching key in the target (indicating they are new):
IIF(ISNULL(PM_PRIMARYKEY),TRUE,FALSE)
Populates the NewFlag field with the results. Passes all rows to FIL_InsertNewRecord.

FIL_InsertNewRecord (Filter): Uses the following filter condition to filter out any rows
from EXP_DetectChanges that are not marked new (TRUE): NewFlag. Passes new rows to
UPD_ForceInserts.

UPD_ForceInserts (Update Strategy): Uses DD_INSERT to insert rows to the target.

SEQ_GenerateKeys (Sequence Generator): Generates a value for each new row written to the
target, incrementing values by 1. Passes values to the target to populate the
PM_PRIMARYKEY column.

T_TargetName (Target Definition): Instance of the target definition for new rows to be
inserted into the target.


Creating a Type 1 Dimension Mapping

The Type 1 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions to the target. Rows containing changes to
existing dimensions are updated in the target by overwriting the existing dimension. In the
Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you
do not need to keep any previous versions of dimensions in the table.


Understanding the Mapping

The Type 1 Dimension mapping performs the following:

Selects all rows

Caches the existing target as a lookup table

Compares logical key columns in the source against corresponding columns in the target
lookup table

Compares source columns against corresponding target columns if key columns match

Flags new rows and changed rows

Creates two data flows: one for new rows, one for changed rows

Generates a primary key for new rows

Inserts new rows to the target

Updates changed rows in the target, overwriting existing rows


Steps for Creating Type 1 Dimension Mapping


To create a Type 1 Dimension mapping:

In the Mapping Designer, choose Mappings-Wizards-Slowly Changing Dimension.

Enter a mapping name and select Type 1 Dimension, and click Next. The naming convention
for mappings is mMappingName.


Select a source definition to be used by the mapping. All available source definitions
appear in the Select Source Table list. This list includes shortcuts, flat file,
relational, and ERP sources.


Enter a name for the mapping target table. Click Next. The naming convention for target
definitions is T_TARGET_NAME.

Select the column or columns you want to use as a lookup condition from the Target Table
Fields list and click Add.
The wizard adds selected columns to the Logical Key Fields list.
Tip: The column or columns you select should be a key in the source.
When you run the session, the Informatica Server performs a lookup on existing target data.
The Informatica Server returns target data when Logical Key Fields columns match
corresponding target columns.
To remove a column from Logical Key Fields, select the column and click Remove.


Configuring a Type 1 Dimension Session

The Type 1 Dimension mapping inserts new rows with a new primary key and updates
existing rows. When you create a session for the mapping, configure the session as
follows:

For the source, set Treat Rows As to Data Driven and select the source
database.

Select the target database. Then to ensure the Informatica Server loads
rows to the target properly, click the Target Options button. Select Insert and
Update (as Update).


Creating a Type 2 Dimension Mapping / Version Data Mapping

The Type 2 Dimension/Version Data mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are
tracked in the target table by versioning the primary key and creating a version number for
each dimension in the table. In the Type 2 Dimension/Version Data target, the current
version of a dimension has the highest version number and the highest incremented primary
key of the dimension.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension
table when you want to keep a full history of dimension data in the table. Version numbers
and versioned primary keys track the order of changes to each dimension.
When you use this option, the Designer creates two additional fields in the target:

PM_PRIMARYKEY. The Informatica Server generates a primary key for each row
written to the target.

PM_VERSION_NUMBER. The Informatica Server generates a version number for each row written
to the target.


Handling Keys

In a Type 2 Dimension/Version Data mapping, the Informatica Server generates a new
primary key value for each new dimension it inserts into the target. An Expression
transformation increments key values by 1,000 for new dimensions.
When updating an existing dimension, the Informatica Server increments the existing
primary key by 1.
For example, the Informatica Server inserts the following new row with a key value of
65,000 since this is the sixty-fifth dimension in the table.

PM_PRIMARYKEY   ITEM     STYLES
65000           Sandal   5

The next time you run the session, the same item has a different number of styles. The
Informatica Server creates a new row with updated style information and increases the
existing key by 1 to create a new key of 65,001. Both rows exist in the target, but the
row with the higher key version contains current dimension data.

PM_PRIMARYKEY   ITEM     STYLES
65000           Sandal   5
65001           Sandal   14

Numbering Versions

In addition to versioning the primary key, the Informatica Server generates a matching
version number for each row inserted into the target. Version numbers correspond to the
final digit in the primary key. New dimensions have a version number of 0.
For example, in the data below, the versions are 0, 1, and 2. The highest version number
contains the current dimension data.

PM_PRIMARYKEY   ITEM     STYLES   PM_VERSION_NUMBER
65000           Sandal   5        0
65001           Sandal   14       1
65002           Sandal   17       2


Understanding the Mapping

The Type 2 Dimension/Version Data mapping performs the following:

Selects all rows

Caches the existing target as a lookup table

Compares logical key columns in the source against corresponding columns in the target
lookup table

Compares source columns against corresponding target columns if key columns match

Flags new rows and changed rows

Creates two data flows: one for new rows, one for changed rows

Generates a primary key and version number for new rows

Inserts new rows to the target

Increments the primary key and version number for changed rows

Inserts changed rows in the target


Steps for Creating a Type 2 Dimension Mapping / Version Data Mapping

To create a Type 2 Dimension/Version Data mapping:

In the Mapping Designer, choose Mappings-Wizards-Slowly Changing Dimensions.

Enter a mapping name and select Type 2 Dimension. Click Next. The naming convention for
mappings is mMappingName.


Select a source definition to be used by the mapping. All available source definitions
appear in the Select Source Table list. This list includes shortcuts, flat file,
relational, and ERP sources.

176

Steps for Creating a Type 2 Dimension/Version Data Mapping

• Select the column or columns you want to use as a lookup condition from the Target Table Fields list and click Add.

177

Steps for Creating a Type 2 Dimension/Version Data Mapping

• Click Next. Select Keep Version Number in Separate Column.

178

Configuring a Type 2 Dimension/Version Data Mapping

The Type 2 Dimension/Version Data mapping inserts both new and updated rows with a unique primary key. When you create a session for the mapping, configure the session as follows:
• For the source, set Treat Rows As to Data Driven and select the source database.
• To ensure rows are inserted into the target properly, click the Target Options button to access the Targets dialog box and select Insert.

179

Creating a Type 2 Dimension Mapping / Flag Current Mapping

The Type 2 Dimension/Flag Current mapping filters source rows based on user-defined comparisons and inserts both new and changed dimensions into the target. Changes are tracked in the target table by flagging the current version of each dimension and versioning the primary key. In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current flag set to 1 and the highest incremented primary key.
Use the Type 2 Dimension/Flag Current mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table, with the most current data flagged. Versioned primary keys track the order of changes to each dimension.
When you use this option, the Designer creates two additional fields in the target:
• PM_CURRENT_FLAG. The Informatica Server flags the current row 1 and all previous versions 0.
• PM_PRIMARYKEY. The Informatica Server generates a primary key for each row written to the target.

180

Understanding the Mapping


The Type 2 Dimension/Flag Current mapping performs the following:
• Selects all rows
• Caches the existing target as a lookup table
• Compares logical key columns in the source against corresponding columns in the target lookup table
• Compares source columns against corresponding target columns if key columns match
• Flags new rows and changed rows
• Creates two data flows: one for new rows, one for changed rows
• Generates a primary key and current flag for new rows
• Inserts new rows to the target
• Increments the existing primary key and sets the current flag for changed rows
• Inserts changed rows in the target
• Updates existing versions of the changed rows in the target, resetting the current flag to indicate the row is no longer current
181

Steps for Creating a Type 2 Dimension/Flag Current Mapping

• Select Mark the Current Dimension Record with a Flag.

182

Creating a Type 2 Dimension Mapping / Effective Date Range Mapping

The Type 2 Dimension/Effective Date Range mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are tracked
in the target table by maintaining an effective date range for each version of each dimension in the
target. In the Type 2 Dimension/Effective Date Range target, the current version of a dimension
has a begin date with no corresponding end date.
Use the Type 2 Dimension/Effective Date Range mapping to update a slowly changing dimension
table when you want to keep a full history of dimension data in the table. An effective date range
tracks the chronological history of changes for each dimension.
When you use this option, the Designer creates three additional fields in the target:
• PM_BEGIN_DATE. For each new and changed dimension written to the target, the Informatica Server uses the system date to indicate the start of the effective date range for the dimension.
• PM_END_DATE. For each dimension being updated, the Informatica Server uses the system date to indicate the end of the effective date range for the dimension.
• PM_PRIMARYKEY. The Informatica Server generates a primary key for each row written to the target.

183

Understanding the Mapping

The Type 2 Dimension/Effective Date Range mapping performs the following:
• Selects all rows
• Caches the existing target as a lookup table
• Compares logical key columns in the source against corresponding columns in the target lookup table
• Compares source columns against corresponding target columns if key columns match
• Flags new rows and changed rows
• Creates three data flows: one for new rows, one for changed rows, one for updating existing rows
• Generates a primary key and beginning of the effective date range for new rows
• Inserts new rows to the target
• Generates a primary key and beginning of the effective date range for changed rows
• Inserts changed rows in the target
• Updates existing versions of the changed rows in the target, generating the end of the effective date range to indicate the row is no longer current

184

Steps for Creating a Type 2 Dimension/Effective Date Range Mapping

• Select Mark the Dimension Records with their Effective Date Range.

185

Creating a Type 3 Dimension Mapping

The Type 3 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions to the target. Rows containing changes to existing
dimensions are updated in the target. When updating an existing dimension, the Informatica
Server saves existing data in different columns of the same row and replaces the existing data
with the updates. The Informatica Server optionally enters the system date as a timestamp for
each row it inserts or updates. In the Type 3 Dimension target, each dimension contains current
dimension data.
Use the Type 3 Dimension mapping to update a slowly changing dimension table when you want
to keep only current and previous versions of column data in the table. Both versions of the
specified column or columns are saved in the same row.
When you use this option, the Designer creates additional fields in the target:
• PM_PREV_ColumnName. The Designer generates a previous column corresponding to each column for which you want historical data. The Informatica Server keeps the previous version of dimension data in these columns.
• PM_PRIMARYKEY. The Informatica Server generates a primary key for each row written to the target.
• PM_EFFECT_DATE. An optional field. The Informatica Server uses the system date to indicate when it creates or updates a dimension.
186

Understanding the Mapping


The Type 3 Dimension mapping performs the following:
• Selects all rows
• Caches the existing target as a lookup table
• Compares logical key columns in the source against corresponding columns in the target lookup table
• Compares source columns against corresponding target columns if key columns match
• Flags new rows and changed rows
• Creates two data flows: one for new rows, one for updating changed rows
• Generates a primary key and optionally notes the effective date for new rows
• Inserts new rows to the target
• Writes previous values for each changed row into previous columns and replaces previous values with updated values
• Optionally uses the system date to note the effective date for inserted and updated values
• Updates changed rows in the target


187

Steps for Creating a Type 3 Dimension Mapping

• If you want the Informatica Server to timestamp new and changed rows, select Effective Date. The wizard displays the columns the Informatica Server compares and the name of the column to hold historic values.

188

Server Architecture

The Informatica Server moves data from sources to targets based on mapping and session
metadata stored in a repository.
Session Process
The Informatica Server uses both process memory and system shared memory to perform these
tasks. It runs as a daemon on UNIX and as a service on Windows NT/2000. The Informatica
Server uses the following processes to run a session:
• The Load Manager process. Starts the session, creates the DTM process, and sends post-session email when the session completes.
• The DTM process. Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

189

Load Manager Process

The Load Manager is the primary Informatica Server process. It performs the following tasks:
• Manages session and batch scheduling.
• Locks the session and reads session properties.
• Reads the parameter file.
• Expands the server and session variables and parameters.
• Verifies permissions and privileges.
• Validates source and target code pages.
• Creates the session log file.
• Creates the Data Transformation Manager (DTM) process, which executes the session.

190

Data Transformation Manager (DTM) Process

The DTM process is the second process associated with a session run. The primary purpose of
the DTM process is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is also known
as buffer memory. The default memory allocation is 12,000,000 bytes. It creates the main thread,
which is called the master thread. The master thread creates and manages all other threads.
Thread Type                     Description
Master Thread                   Main thread of the DTM process. Creates and manages all other threads. Handles stop and abort requests from the Load Manager.
Mapping Thread                  One thread for each session. Fetches session and mapping information. Compiles the mapping. Cleans up after session execution.
Pre- and Post-Session Threads   One thread each to perform pre- and post-session operations.
Reader Thread                   One thread for each partition for each source pipeline. Reads sources. Relational sources use relational threads, and file sources use file threads.
Writer Thread                   One thread for each partition, if a target exists in the source pipeline. Writes to targets.
Transformation Thread           One or more transformation threads for each partition.

191

Running a Session

When the Informatica Server runs a session, it performs the following tasks as configured in the session properties:
1. Load Manager locks the session and reads session properties.
2. Load Manager reads the parameter file.
3. Load Manager expands the server and session variables and parameters.
4. Load Manager verifies permissions and privileges.
5. Load Manager validates source and target code pages.
6. Load Manager creates the session log file.
7. Load Manager creates the DTM process.
8. DTM process allocates DTM process memory.
9. DTM initializes the session and fetches the mapping.
10. DTM executes pre-session commands and procedures.
11. DTM creates reader, transformation, and writer threads for each source pipeline. If the pipeline is partitioned, it creates a set of threads for each partition.
12. DTM executes post-session commands and procedures.
13. DTM writes historical incremental aggregation and lookup data to disk, and it writes persisted sequence values and mapping variables to the repository.
14. Load Manager sends post-session email.

192

System Resources

The Informatica Server uses the following system resources:
• CPU
• Shared memory
• Buffer memory

Cache Memory
The DTM process creates in-memory index and data caches to temporarily store data used by the following transformations:
• Aggregator transformation (without sorted input)
• Rank transformation
• Joiner transformation
• Lookup transformation (with caching enabled)

193

Output Files and Caches

The Informatica Server creates the following output files:
• Informatica Server Log
• Session Log File
• Session Details File
• Performance Detail File
• Reject Files
• Control File
• Post-Session Email
• Output File
• Cache Files

194

Cache Files

The Informatica Server writes to the index and data cache files during the session in the following cases:
• The mapping contains one or more Aggregator transformations, and the session is configured for incremental aggregation.
• The mapping contains a Lookup transformation that is configured to use a persistent lookup cache, and the Informatica Server runs the session for the first time.
• The mapping contains a Lookup transformation that is configured to initialize the persistent lookup cache.
• The DTM runs out of cache memory and pages to the local cache files. The DTM may create multiple files when processing large amounts of data. The session fails if the local directory runs out of disk space.

After the session completes, the DTM generally deletes the overflow index and data files. It does not delete the cache files under the following circumstances:
• The session is configured to perform incremental aggregation.
• The session is configured with a persistent lookup cache.

195

Configuring Server Manager

You can configure the following settings in the Server Manager:
• Configure Server Manager display options. You can configure display options such as grouping sessions or docking and undocking windows.
• Register Informatica Servers. Before you can start an Informatica Server, you must register it with the repository.
• Create source and target database connections. Create connections to each source and target database. You must create connections to a database before you can create a session that accesses the database.
• Create FTP connections. After you create FTP connections, you can configure a session to use FTP to access source or target files.
• Create external loader connections. Create connections to Oracle, Sybase IQ, and Teradata external loaders. You must create these connections before you can configure a session to use an external loader.

196

Server Manager Window

197

Server Variables

Server Variable     Required/Optional   Description
$PMRootDir          Required            A root directory to be used by any or all other server variables. Informatica recommends you use the Server installation directory as the root directory.
$PMSessionLogDir    Required            Default directory for session logs. Defaults to $PMRootDir/SessLogs.
$PMBadFileDir       Required            Default directory for reject files. Defaults to $PMRootDir/BadFiles.
$PMCacheDir         Required            Default directory for the lookup cache, index and data caches, and index and data files. Defaults to $PMRootDir/Cache. To avoid performance problems, always use a drive local to the Informatica Server for the cache directory. Do not use a mapped or mounted drive for cache files.
$PMTargetFileDir    Required            Default directory for target files. Defaults to $PMRootDir/TgtFiles.
$PMSourceFileDir    Required            Default directory for source files. Defaults to $PMRootDir/SrcFiles.
$PMExtProcDir       Required            Default directory for external procedures. Defaults to $PMRootDir/ExtProc.
$PMTempDir          Required            Default directory for temporary files. Defaults to $PMRootDir/Temp.

198

Server Variables

Server Variable             Required/Optional   Description
$PMSuccessEmailUser         Optional            Email address to receive post-session email when the session completes successfully. Use to address post-session email.
$PMFailureEmailUser         Optional            Email address to receive post-session email when the session fails. Use to address post-session email. The default value is an empty string.
$PMSessionLogCount          Optional            Number of session logs the Informatica Server archives for the session. Defaults to 0. Use to archive session logs.
$PMSessionErrorThreshold    Optional            Number of errors the Informatica Server allows before failing the session. Defaults to 0. Use to configure the Stop On option in the session property sheet.

199

Working with Sessions

A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.
• You create and maintain sessions in the Server Manager.
• When you create a session, you enter general information such as the session name, session schedule, and the Informatica Server to run the session.
• You can also select options to execute pre-session shell commands, send post-session email, and FTP source and target files.
• Using session properties, you can also override parameters established in the mapping, such as source and target location, source and target type, error tracing levels, and transformation attributes.

200

Creating a Session

To create a new session, you must enter the following information:
• Mapping used for the session
• Session name, which must be unique among all sessions in a given folder
• Source type
• Update strategy for writing to targets
• Target type
• Schedule for the session to run
• Server on which you want the session to run

201

Using Session Wizard

Use the Session Wizard to create a session for a valid mapping. The Session Wizard has the following pages, and each of those pages has multiple dialog boxes where you enter session properties:
• General page. Enter source and target information and performance configuration.
• Sources page. Enter source information for heterogeneous sessions.
• Time page. Schedule the session.
• Log Files page. Enter log file and error handling information.
• Transformation page. Override transformation properties.
• Partition page. Configure the session for partitioning.

202

Starting a Session

You can start a session using:
• Server Manager
• pmcmd command

203

Monitoring a Session

The Server Manager allows you to monitor sessions on an Informatica Server. When monitoring a
session, you can use information provided through the Server Manager to troubleshoot sessions
and improve session performance.
When you poll the Informatica Server, it indicates the following types of session status in the Monitor window:
• Initializing. Indicates that the session is initializing.
• Scheduled. Indicates that the session is scheduled.
• Running. Indicates that the session is running.
• Completed. Indicates that the session completed successfully.
• Failed. Indicates that the session has failed.

204

Monitor Window Contents


Column Name       Description
Session Name      Session name.
Server Name       Server running the session.
Top Level Batch   Top or outermost batch containing the session. If the session is not a part of a nested batch, this field displays the folder name.
Batch             Batch containing the session, if the session is batched. If the session is a standalone session, this field displays the folder name.
Status            Session status.
Start Time        Time the session started.
Completion Time   Time the session completed.
First Error       First error that occurred in the session.
Mapping Name      Mapping used in the session.
Session Run Mode  Indicates whether the session is a regular or debug session.
User Name         User who started the session.

205

Monitoring Session Details

When you run a session, the Server Manager creates session details that provide load statistics
for each target in the mapping. You can view session details during the session or after the
session completes.
Session Detail    Description
Table Name        Name of target table. If you have multiple instances of a target, this field shows both the target instance name and the table name. The target instance display format is Table Name:Instance Name.
Loaded            Number of rows written to the target.
Failed            Number of rows rejected by the target.
Read Throughput   Rate at which the Informatica Server read rows from the source (bytes/sec).
Write Throughput  Rate at which the Informatica Server wrote data into the target (rows/sec).
Current Message   The most recent error message written to the session log. If you view details after the session completes, this field displays the last error message.

206

Stopping a Session

To stop a session in the Server Manager:
• In the Server Manager Navigator, select the session you want to stop.
• To stop a session running against the Informatica Server configured in the session properties, choose Server Requests-Stop or use the Stop button on the toolbar.
• To stop a session running against an Informatica Server other than the one configured in the session properties, use the Stop button on the toolbar to select the Informatica Server running the session.

To abort a session in the Server Manager:
• In the Server Manager Navigator, select the session you want to abort.
• To abort a session running against the Informatica Server configured in the session properties, choose Server Requests-Abort.

207

Managing Batches

Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
• Sequential. Runs sessions one after the other.
• Concurrent. Runs sessions at the same time.

Nesting Batches
Each batch can contain any number of sessions or other batches. You can nest batches several
levels deep, defining batches within batches. Nested batches are useful when you want to control
a complex series of sessions that must run sequentially or concurrently.
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by
default. However, you can configure a batched session to run on its own schedule by selecting the
Use Absolute Time session option.

208

Recovering a Batch

When a session or sessions in a batch fail, you can perform recovery to complete the batch. The steps you take vary depending on the type of batch:
• Sequential batch. If the batch is sequential, you can recover data from the session that failed and run the remaining sessions in the batch.
• Concurrent batch. If a session within a concurrent batch fails, but the rest of the sessions complete successfully, you can recover data from the failed session targets to complete the batch. However, if all sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again.

209

Using PMCMD

You can use the command line program pmcmd to communicate with the Informatica Server. This
does not replace the Server Manager, since there are many tasks that you can perform only with
the Server Manager.
You can perform the following actions with pmcmd:
• Determine if the Informatica Server is running.
• Start sessions and batches.
• Stop sessions and batches.
• Recover sessions.
• Stop the Informatica Server.

pmcmd returns zero on success and non-zero on failure.
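Because pmcmd reports status through its exit code, you can gate later steps of a shell script on it. A minimal sketch using the ping syntax shown later in this section; the user name, host, and port are placeholders, and the password is read from a hypothetical PMPASSWORD environment variable via the %password_env_var form:

pmcmd ping admin %PMPASSWORD infa_host:4001
if [ $? -ne 0 ]; then
    echo "Informatica Server is not reachable" >&2
    exit 1
fi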

210

Parameters for PMCMD command

You need the following information to use pmcmd:
• Repository username. This can be configured optionally as an environment variable.
• Repository password. This can be configured optionally as an environment variable.
• Connection type. The type of connection from the client machine to the Informatica Server (TCP/IP or IPX/SPX).
• Port or connection. The TCP/IP port number or IPX/SPX connection (Windows NT/2000 only) to the Informatica Server.
• Host name. The machine hosting the Informatica Server (if running pmcmd from a remote machine through a TCP/IP connection).
• Session or batch name. The names of any sessions or batches you want to start or stop.
• Folder name. The folder names for those sessions or batches (if their names are not unique in the repository).
• Parameter file. The directory and name of the parameter file you want the Informatica Server to use with the session or batch.

211

Pinging Informatica Server

To determine if the Informatica Server is running:

Use the following syntax to ping the Informatica Server on a Windows NT/2000 system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}]
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address}

Use the following syntax to ping the Informatica Server on a UNIX system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}]
[hostname:]portno

212

Ping Return Values

pmcmd ping returns 0 when the Informatica Server is running, and a non-zero code for the following failure conditions:
• The Informatica Server is down, or pmcmd cannot connect to the Informatica Server. The TCP/IP host name or port number, or IPX/SPX address (if applicable) may be incorrect, or a network problem occurred.
• An internal pmcmd error occurred. Contact Informatica Technical Support.
• The Informatica Server timed out while waiting for the request. Try sending it again.

213

Starting Sessions / Batches Using PMCMD

Use the following syntax to start a session or batch on a Windows NT/2000 system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var}
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address} [folder_name:]{session_name |
batch_name} [:pf=param_file] session_flag wait_flag

Use the following syntax to start a session or batch on a UNIX system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno [folder_name:]{session_name | batch_name}
[:pf=param_file] session_flag wait_flag
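As a concrete illustration, the following hypothetical UNIX invocation starts the session s_load_orders in folder Sales with a parameter file. The user name, host, port, folder, session, and file names are placeholders, and the example assumes the trailing flags mark the name as a session rather than a batch (session_flag = 1) and run pmcmd in wait mode (wait_flag = 1):

pmcmd start admin %PMPASSWORD infa_host:4001 Sales:s_load_orders :pf=orders.par 1 1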

214

PMCMD Return Values


pmcmd start returns 0 on success:
• If pmcmd was called in wait mode (wait_flag = 1), 0 indicates the session or batch ran successfully.
• If pmcmd was not called in wait mode (wait_flag = 0), 0 indicates the request to start the session was successfully transmitted to the Informatica Server, and it acknowledged the request.

Non-zero codes report the following failure conditions:
• The Informatica Server is down, or pmcmd cannot connect to the Informatica Server. The TCP/IP host name or port number, or IPX/SPX address (if applicable) may be incorrect, or a network problem occurred.
• The specified session or batch name does not exist. Or, if you specified a folder name, the folder does not contain the specified session or batch.
• An error occurred in starting or running the session or batch. This return value may appear if you run pmcmd in non-wait mode (wait_flag = 0) and the Informatica Server returns a negative acknowledgment. If this is not the case, look for more information in the server error log.
• Usage error: the wrong parameters were passed to pmcmd.
• An internal pmcmd error occurred. Contact Informatica Technical Support.
• You used an invalid username or password.
• You do not have the appropriate permissions or privileges to perform this action.
• The Informatica Server timed out while waiting for the request. Try sending it again.

215

PMCMD Return Values

Return Value   Description
10   pmcmd successfully started the session or batch in wait mode (wait_flag = 1). However, while checking the status of the running session or batch, pmcmd attempted to communicate with the Informatica Server 100 consecutive times, and the Informatica Server timed out trying to receive the request. (This error occurs only under extremely rare conditions.) If pmcmd returns from wait mode with this error code, the session or batch may or may not be running. If you use pmcmd in wait mode to see if the session or batch completed successfully, use the Server Manager to check its status or open the server error log file. If you use pmcmd in wait mode as a series of related commands, you may need to work around this return value.
13   The username environment variable [variable name] is not defined.
14   The password environment variable [variable name] is not defined.
15   The username environment variable is missing.
16   The password environment variable is missing.
17   Parameter file does not exist.
18   The Informatica Server found the parameter file, but experienced errors expanding the start values for the session parameters. The parameter file may not have the start values for the session parameters.

216

Stopping Sessions / Batches Using PMCMD

Use the following syntax to stop a session or batch on a Windows NT/2000 system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var}
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address} [folder_name:]{session_name |
batch_name} [:pf=param_file] session_flag wait_flag

Use the following syntax to stop a session or batch on a UNIX system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno [folder_name:]{session_name | batch_name}
[:pf=param_file] session_flag wait_flag
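Mirroring the hypothetical start example earlier, a placeholder invocation that stops the same session in wait mode:

pmcmd stop admin %PMPASSWORD infa_host:4001 Sales:s_load_orders 1 1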

217

Recovering Sessions Using PMCMD

Use pmcmd to recover a standalone session. You cannot use pmcmd to recover a session in a batch.
Use the following syntax to recover a standalone session on a Windows NT/2000 system:
pmcmd startrecovery {user_name | %user_env_var}
{password | %password_env_var}
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address}
[folder_name:]session_name [:pf=param_file] wait_flag

Use the following syntax to recover a standalone session on a UNIX system:
pmcmd startrecovery {user_name | %user_env_var}
{password | %password_env_var} [hostname:]portno
[folder_name:]session_name [:pf=param_file] wait_flag
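For example, a hypothetical UNIX invocation that recovers the standalone session s_load_orders in folder Sales in wait mode (all names, host, and port are placeholders):

pmcmd startrecovery admin %PMPASSWORD infa_host:4001 Sales:s_load_orders 1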

218

Stopping the Server Using PMCMD

Use the following syntax to stop the Informatica Server on a Windows NT/2000 system:
pmcmd stopserver {user_name | %user_env_var}
{password | %password_env_var}
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address}

Use the following syntax to stop the Informatica Server on a UNIX system:
pmcmd stopserver {user_name | %user_env_var}
{password | %password_env_var} [hostname:]portno
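For example, against the same hypothetical host and port used in the earlier examples:

pmcmd stopserver admin %PMPASSWORD infa_host:4001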

219

PMCMD Stop Server Return Values


pmcmd stopserver returns 0 when the Informatica Server successfully stopped, and a non-zero code for the following failure conditions:
• The Informatica Server is down, or pmcmd cannot connect to the Informatica Server. The TCP/IP host name or port number, or IPX/SPX address (if applicable) may be incorrect, or a network problem occurred.
• An internal pmcmd error occurred. Contact Informatica Technical Support.
• An error occurred while stopping the Informatica Server. Contact Informatica Technical Support.
• You used an invalid username or password.
• You do not have the appropriate permissions or privileges to perform this action.
• The Informatica Server timed out while waiting for the request. Try sending it again.

The following codes report environment variable problems:
Return Value   Description
13   The username environment variable [variable name] is not defined.
14   The password environment variable [variable name] is not defined.
15   The username environment variable is missing.
16   The password environment variable is missing.

220

Reject Loading

During a session, the Informatica Server creates a reject file for each target instance in the
mapping. If the writer or the target rejects data, the Informatica Server writes the rejected row into
the reject file.
The reject file and session log contain information that helps you determine the cause of the
reject.
You can correct reject files and load them to relational targets using the Informatica reject loader utility. The reject loader also creates another reject file for the data that the writer or target rejects during the reject loading.
Each time you run a session, the Informatica Server appends rejected data to the reject file.
Complete the following tasks to load reject data into the target:
• Locate the reject file.
• Correct bad data.
• Run the reject loader utility.

221

Reading Reject Files

Reject files contain rows of data rejected by the writer or the target database. Though the Informatica Server writes the entire row in the reject file, the problem generally centers on one column within the row. To help you determine which column caused the row to be rejected, the Informatica Server adds row and column indicators to give you more information about each column:
• Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells whether the row was marked for insert, update, delete, or reject.
• Column indicator. Column indicators appear after every column of data. The alphabetical character indicators tell whether the data was valid, overflow, null, or truncated.
The following sample reject file shows the row and column indicators:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D
0,D,1,D,April,D,2000,D,1,D,2560974.96,D,2560974.96,D
3,D,1,D,April,D,2000,D,0,D,0.00,D,0.00,D
0,D,1,D,August,D,1997,D,2,D,2283.76,D,4567.53,D
0,D,3,D,December,D,1999,D,1,D,273825.03,D,273825.03,D
0,D,1,D,September,D,1997,D,1,D,0.00,D,0.00,D

222

Reading Reject Files

Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do with the row of data.

Row Indicator   Meaning   Rejected By
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer
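As a quick illustration outside the reject loader itself, the comma-delimited layout shown in the sample above lets you isolate the rows the writer marked as rejects with standard UNIX tools; the reject file name here is a placeholder:

awk -F, '$1 == 3' t_orders.bad    # print only rows whose row indicator is 3 (Reject)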
223

Reading Reject Files

Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of the data preceding it.

Column Indicator   Type of Data   Writer Treats As
D   Valid data.   Good data. The writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key.
O   Overflow. Numeric data exceeded the specified precision or scale for the column.   Bad data, if you configured the mapping target to reject overflow or truncated data.
N   Null. The column contains a null value.   Good data. The writer passes it to the target, which rejects it if the target database does not accept null values.
T   Truncated. String data exceeded a specified precision for the column, so the Informatica Server truncated it.   Bad data, if you configured the mapping target to reject overflow or truncated data.
224

Running Reject Loading Session

After you correct the reject file and rename it to reject_file.in, you can use the reject loader to send
those files through the writer to the target database.
Use the reject loader utility from the command line to load rejected files into target tables. The
syntax for reject loading differs on UNIX and Windows NT/2000 platforms.
Use the following syntax for UNIX:
pmrejldr pmserver.cfg [folder_name:]session_name
Use the following syntax for Windows NT/2000:
pmrejldr [folder_name:]session_name
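For example, a hypothetical UNIX invocation that reloads the corrected rejects for session s_load_orders in folder Sales (pmserver.cfg is the server configuration file named in the syntax above):

pmrejldr pmserver.cfg Sales:s_load_orders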

225

Commit Points

A commit interval is the interval at which the Informatica Server commits data to relational targets during a session. You can choose between the following types of commit interval:
• Target-based commit. The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
• Source-based commit. The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.

226

Target Based Commit


During a target-based commit session, the Informatica Server continues to fill the writer buffer
after it reaches the commit interval. When the buffer block is filled, the Informatica Server issues a
commit command. As a result, the amount of data committed at the commit point generally
exceeds the commit interval.
For example, a session is configured with a target-based commit interval of 10,000. The writer buffers fill every 7,500 rows. When the Informatica Server reaches the commit interval of 10,000, it continues processing data until the writer buffer is filled. The second buffer fills at 15,000 rows, and the Informatica Server issues a commit to the target. If the session completes successfully, the Informatica Server issues commits after 15,000, 22,500, 30,000, and 40,000 rows.

227

Source Based Commit


During a source-based commit session, the Informatica Server commits data to the target based on the number of rows from an active source in a single pipeline. These rows are referred to as source rows. An active source can be any of the following active transformations:
• Advanced External Procedure
• Source Qualifier
• Normalizer
• Aggregator
• Joiner
• Rank
• Mapplet, if it contains one of the above transformations

Note: Although the Filter, Router, and Update Strategy transformations are active transformations,
the Informatica Server does not use them as active sources in a source-based commit session.
The Informatica Server generates a commit row from the active source at every commit interval.
When each target in the pipeline receives the commit row, the Informatica Server performs the
commit.
The number of rows held in the writer buffers does not affect the commit point for a source-based
commit session.
228

Performance Tuning
The most common performance bottleneck occurs when the Informatica Server writes to a target database. You can identify performance bottlenecks by the following methods:
• Running test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
• Studying performance details. You can create a set of information called performance details to identify session bottlenecks. Performance details provide information such as buffer input and output efficiency.
• Monitoring system performance. You can use system monitoring tools to view percent CPU usage, I/O waits, and paging to identify system bottlenecks.

229

Identifying Performance Bottleneck


Performance bottlenecks can occur in the source and target databases, the mapping,
the session, and the system. Generally, you should look for performance bottlenecks
in the following order:
1.

Target

2.

Source

3.

Mapping

4.

Session

5.

System

230

Identifying Target Bottleneck


The most common performance bottleneck occurs when the Informatica Server writes to a target
database. You can identify target bottlenecks by configuring the session to write to a flat file target.
If the session performance increases significantly when you write to a flat file, you have a target
bottleneck.
Optimizing the Target Database
If your session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the Informatica Server. If your session writes to a relational target, consider performing the following tasks to increase performance:
• Drop indexes and key constraints.
• Increase checkpoint intervals.
• Use bulk loading.
• Use external loading.
• Turn off recovery.
• Increase database network packet size.
• Optimize Oracle target databases.

231

Identifying Source Bottleneck


Performance bottlenecks can occur when the Informatica Server reads from a source database. If
your session reads from a flat file source, you probably do not have a source bottleneck. You can
improve session performance by setting the number of bytes the Informatica Server reads per line
if you read from a flat file source.
If the session reads from a relational source, you can use the following to identify the bottleneck:
• Filter transformation
• Read test mapping
• Database query

Optimizing the Source Database
If your session reads from a relational source, review the following suggestions for improving performance:
• Optimize the query.
• Create tempdb as an in-memory database.
• Use conditional filters.
• Increase database network packet size.
• Connect to Oracle databases using the IPC protocol.

232

Identifying Mapping Bottleneck


You can identify mapping bottlenecks by using a Filter transformation in the mapping.
If you determine that you do not have a source bottleneck, you can add a Filter transformation in
the mapping before each target definition. Set the filter condition to false so that no data is loaded
into the target tables. If the time it takes to run the new session is the same as the original
session, you have a mapping bottleneck.
You can also identify mapping bottlenecks by using performance details. High errorrows and
rowsinlookupcache counters indicate a mapping bottleneck.
High Rowsinlookupcache Counters
Multiple lookups can slow down the session. You might improve session performance by locating
the largest lookup tables and tuning those lookup expressions.
High Errorrows Counters
Transformation errors impact session performance. If a session has large numbers in any of the
Transformation_errorrows counters, you might improve performance by eliminating the errors.

233

Optimizing a Mapping
Generally, you reduce the number of transformations in the mapping and delete unnecessary links
between transformations to optimize the mapping. You should configure the mapping with the least
number of transformations and expressions to do the most amount of work possible. You should
minimize the amount of data moved by deleting unnecessary links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup
transformations), limit connected input/output or output ports. Limiting the number of connected
input/output or output ports reduces the amount of data the transformations store in the data cache.
You can also perform the following tasks to optimize the mapping:
• Configure single-pass reading.
• Optimize datatype conversions.
• Eliminate transformation errors.
• Optimize transformations.
• Optimize expressions.

234

Identifying Session Bottleneck


You can identify a session bottleneck by using the performance details. The Informatica Server
creates performance details when you enable Collect Performance Data on the General tab of the
session properties.
Performance details display information about each Source Qualifier, target definition, and
individual transformation. All transformations have some basic counters that indicate the number
of input rows, output rows, and error rows.
Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck. Low BufferInput_efficiency and BufferOutput_efficiency counter values also indicate a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.

235

Optimizing a Session
You can perform the following tasks to improve overall performance:
• Run concurrent batches.
• Partition sessions.
• Reduce error tracing.
• Remove staging areas.
• Tune session parameters.

236

Identifying System Bottleneck


You can identify system bottlenecks by using system tools to monitor CPU usage, memory usage,
and paging.
The Informatica Server uses system resources for transformation processing, session execution, and reading and writing data. The Informatica Server also uses system memory for other data such as aggregate, joiner, rank, and cached lookup tables. You can use system performance monitoring tools to monitor the amount of system resources the Informatica Server uses and identify system bottlenecks.
On Windows NT/2000, you can use system tools in the Task Manager or Administrative Tools.
On UNIX systems, you can use system tools such as vmstat and iostat to monitor system performance.

237

Identifying System Bottleneck on Windows NT / 2000


Use the Windows NT/2000 Performance Monitor to create a chart that provides the following information:
• Percent processor time. If you have several CPUs, monitor each CPU for percent processor time. If the processors are utilized at more than 80%, you may consider adding more processors.
• Pages/second. If pages/second is greater than five, you may have excessive memory pressure (thrashing). You may consider adding more physical memory.
• Physical disks percent time. This is the percent time that the physical disk is busy performing read or write requests. You may consider adding another disk device or upgrading the disk device.
• Physical disks queue length. This is the number of users waiting for access to the same disk device. If the physical disk queue length is greater than two, you may consider adding another disk device or upgrading the disk device.
• Server total bytes per second. This is the number of bytes the server has sent to and received from the network. You can use this information to improve network bandwidth.

238

Identifying System Bottleneck on UNIX


You can use UNIX tools to monitor user background processes, system swapping actions, CPU loading, and I/O load operations. When you tune UNIX systems, tune the server for a major database system. Use the following UNIX tools to identify system bottlenecks on the UNIX system, as shown in the sample commands after this list:
• lsattr -E -l sys0. Use this tool to view current system settings. This tool shows maxuproc, the maximum level of user background processes. You may consider reducing the amount of background processes on your system.
• iostat. Use this tool to monitor the loading operation for every disk attached to the database server. iostat displays the percentage of time that the disk was physically active. High disk utilization suggests that you may need to add more disks. If you use disk arrays, use the utilities provided with the disk arrays instead of iostat.
• vmstat or sar -w. Use this tool to monitor disk swapping actions. Swapping should not occur during the session. If swapping does occur, you may consider increasing your physical memory or reducing the number of memory-intensive applications on the disk.
• sar -u. Use this tool to monitor CPU loading. This tool provides percent usage on user, system, idle time, and waiting time. If the percent time spent waiting on I/O (%wio) is high, you may consider using other under-utilized disks. For example, if your source data, target data, lookup, rank, and aggregate cache files are all on the same disk, consider putting them on different disks.
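For example, to sample these counters while a session runs (the intervals and counts here are arbitrary choices, not Informatica settings):

sar -u 5 12     # CPU usage: 12 samples at 5-second intervals
sar -w 5 12     # swapping activity over the same window
iostat 5 12     # per-disk activity, 12 samples at 5-second intervals
vmstat 5        # memory and paging, refreshed every 5 seconds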

239
