
Informatica ETL Standards and Guidelines

September 17, 2016


Version: 1.0


Document Control

Version History

Version | Date | Author | Modification Description

Reviewers

Date | Reviewer | Department

TABLE OF CONTENTS

1. Overview
   1.1 Purpose
2. Security Standards
   2.1 Informatica Roles & Responsibilities
   2.2 UNIX Security
   2.3 Informatica Security
   2.4 Informatica Authentication Security
   2.5 Informatica Authorization Security
3. Naming Conventions
   3.1 Informatica Server Components
   3.2 Database Connections
   3.3 Transformations
   3.4 Ports
   3.5 Mappings
   3.6 Mapplets
   3.7 Sessions
   3.8 Workflows
   3.9 Worklet
   3.10 Parameter Naming Standards
4. Version Control
   4.1 Informatica Team-based Deploy
5. Informatica Environment Standards
   5.1 Repository Folder Structure
   5.2 Informatica Server Structure
   5.3 Parameter File
   5.4 Reusable Objects
   5.5 Shared Folders
6. Data Quality
   6.1 Data Transformations
   6.2 Default Data
7. Informatica Development Guidelines
   7.1 Mapping Guidelines
   7.2 Mapping and Workflow Documentation
   7.3 Transformation Object Documentation
   7.4 Expression Ports Documentation
   7.5 Mapping Recovery
   7.6 Mapping Templates
   7.7 Source Qualifier
   7.8 Filter
   7.9 Joiner
   7.10 Router
   7.11 Sorter
   7.12 Aggregator
   7.13 Lookup
   7.14 Sequence Generator
   7.15 Update Strategy
   7.16 Target
   7.17 Mapplet
   7.18 Session
   7.19 Workflow/Worklet
8. Change Data Capture (CDC)
   8.1 Full Master
   8.2 Incremental
   8.3 CDC Techniques
9. Logging
   9.1 Informatica Session Logging
   9.2 Informatica Workflow Logging
   9.3 Non-Informatica Logging
10. Error/Exception Handling and Recovery
   10.1 Essential Error Trapping and Recovery
   10.2 Informatica Errors and Warnings
11. ETL Control
   11.1 ETL Control Data Model
   11.2 ETL Transform Rule Data Model
   11.3 Rolling ETL Control Forward
   11.4 Parameter File and ETL_CONTROL Synchronization
   11.5 ETL Start
   11.6 ETL End
   11.7 ETL Control Workflow
12. UNIX
   12.1 UNIX Data Directories
   12.2 UNIX Landing Directories
   12.3 UNIX Informatica Environment Variables
   12.4 UNIX Logging
   12.5 UNIX Naming
13. File Processing
   13.1 File Types
   13.2 File Format and Transfer Validation
   13.3 File Naming
   13.4 File Notification
   13.5 File Landing
   13.6 File Archiving
14. Scheduling
   14.1 Informatica Workflow
   14.2 Corporate Scheduler
   14.3 Data Warehouse Administration Console (DAC)

1. Overview
1.1 Purpose
The purpose of this document is to define the ETL standards and guidelines to be used in the design and
development of the Informatica processes for the Advantage 1.0 project. Because additional ETL standards
may need to be incorporated, this document will evolve over time. The initial version is based on standards
and best practices already developed, practiced, and shared by Informatica (Velocity) and the client. Note
that some guidelines will not apply to the pre-built Informatica mappings that ship with the Oracle BI
Applications.

2. Security Standards

2.1 Informatica Roles & Responsibilities

A number of different roles have been identified, each requiring a different level of access. It is
recommended that the Group feature in Informatica's security model be leveraged to aid in these role
assignments. It is understood that initially some individuals may serve multiple roles, as Support will
primarily be handled within the development groups at the outset.

Developer
o Full access to assigned project folders in Development
o Execute access to project specific database connections in Development
o Read access to assigned project folders in QA
o Read access to assigned project folders in Production
o Update access to assigned deployment groups in Development
o Execute access to project specific database connections in QA
o Update access to assigned deployment groups in QA
Tester
o Read and possibly execute access to assigned project folders in QA
o Execute access to project specific database connections in QA
o Update access to assigned deployment groups in QA
o Read access to assigned project folders in Production
Support
o Full access to Break/Fix (B/F) environment in Development
o Read access to all project folders in Development
o Update access to all B/F deployment groups in Development
o Execute access to all database connections in Development
o Full access to all project folders in QA
o Update access to all B/F deployment groups in QA
o Execute access to all database connections in QA
o Read access to all project folders in Production
o Execute access to all database connections in Production
Administrator

o Full access to all environments including security administration


o Administrators will perform configuration changes and software upgrades
Deployer
o Must perform all deployments via deployment groups

2.2 UNIX Security

The Informatica installation resides on UNIX servers. Thus, in order for Informatica to be installed and
function properly, an Informatica system account must be created. This account must have full privileges to
the directory in which Informatica will be installed. Additionally, it is recommended that Informatica have full
privileges to any directory in which it may need to process data. This includes, but is not limited to:
Temp directories
File processing directories
Archive directories
Log directories
Bad file directories
Cache directories
In order to accommodate multiple projects with distinct parent folders in UNIX, it may be preferable to utilize
the OS Profiles functionality in Informatica. This requires a distinct UNIX user ID for each project, each with its
own set of system variables. The main Informatica environment would still run under the admin ID, but the
workflows in each folder would run under the project-specific UNIX user IDs.
2.3 Informatica Security

Informatica allows for the following types of authentication security as found in the PowerCenter
Administrator Guide, pages 55 - 57:
Native Authentication
For native authentication, the Service Manager stores all user account information and performs all user
authentication within the PowerCenter domain. When a user logs in, the Service Manager uses the native
security domain to authenticate the user name and password.
By default, the PowerCenter domain contains a native security domain. The native security domain is created
at installation and cannot be deleted. A PowerCenter domain can have only one native security domain. You
create and maintain user accounts of the native security domain in the Administration Console. The Service
Manager stores details of the user accounts, including passwords and groups, in the domain configuration
database.
LDAP Authentication
LDAP Authentication is not used currently for Informatica. However, if you have user accounts in an
enterprise LDAP directory service that you want to give access to the PowerCenter applications, you can
configure the PowerCenter domain to use LDAP authentication. Create an LDAP security domain and set up
a filter to specify the users and groups in the LDAP directory service who can access PowerCenter and be
included in the security domain.
The Service Manager imports the users and groups from the LDAP directory service into an LDAP security
domain. You can set up a schedule for the Service Manager to periodically synchronize the list of users and
groups in the LDAP security domain with the list of users and groups in the LDAP directory service. During
synchronization, the Service Manager imports new users and groups from the LDAP directory service and
deletes any user or group that no longer exists in the LDAP directory service.


When a user in an LDAP security domain logs in to a PowerCenter repository or application, the Service
Manager passes the user account name and password to the LDAP directory service for authentication. If
the LDAP server uses SSL security protocol, the Service Manager sends the user account name and
password to the LDAP directory service using the appropriate SSL certificates.
You can use the following LDAP directory services for LDAP authentication:

Microsoft Active Directory Service


Sun Java System Directory Service
Novell e-Directory Service
IBM Tivoli Directory Service
Open LDAP Directory Service

You create and manage LDAP users and groups in the LDAP directory service; assign roles, privileges, and
permissions to users and groups in an LDAP security domain; and assign LDAP user accounts to native
groups to organize them based on their roles in the PowerCenter domain. You cannot use the Administration
Console to create, edit, or delete users and groups in an LDAP security domain.

2.4 Informatica Authentication Security

The Informatica installation will use a combination of native and LDAP security. Native security is necessary
in the event a connection to the Corporate LDAP cannot be established.
For normal use, a security domain named Corporate_LDAP can be created. Developers, testers, and
administrators will log into this Informatica security domain using their Directory Services user ID and
password. This approach reduces the number of logins a user needs to maintain throughout the
security landscape.
In order to connect to the Corporate LDAP, the required settings must be entered in the LDAP Configuration.
To add users to the Corporate_LDAP security domain, the User Search Base field must be set to
ou=person, o=??.com, c=us, as shown in Figure 2.
Additionally, specific syntax must be used to add each individual user. The entire string specified in the User
filter field must be enclosed in parentheses. The | (pipe) symbol combines the individual user IDs with a
logical OR. Lastly, each user ID must be contained in its own parentheses and follow the uid= LDAP syntax.
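For illustration, a hypothetical User filter for three users (the user IDs shown are placeholders, not actual accounts) would follow this pattern:

(|(uid=user1)(uid=user2)(uid=user3))

Here the outer parentheses enclose the entire filter, the | combines the entries with a logical OR, and each uid= entry is wrapped in its own parentheses.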
Finally, after adding new users, the Administrator must click the Synchronize Now button to retrieve the
user information from the Corporate LDAP.
In the future, an LDAP role may be created in the Corporate LDAP system, with the appropriate users
assigned to the role in Corporate LDAP. This would mean that rather than maintaining a long string of users
in the User filter section, a single group can be specified in the Group filter section. Since there are only a
handful of users in the system currently, the User filter approach is preferred, and does not require the
Corporate LDAP team to maintain another LDAP group.
2.5 Informatica Authorization Security

Informatica comes with predefined roles for users. For further description regarding these roles, see the
Informatica Administrator Guide, Chapter 5: Managing Privileges and Roles.

It is important to note that these roles only grant authorization for what a user can do within the Informatica
PowerCenter utilities. They do not grant authorization to specific objects (e.g., sources, targets, mappings)
within the PowerCenter utilities. If a user wants to allow another user to access their objects, they must
explicitly grant access to those objects from within the PowerCenter utility.

3. Naming Conventions

3.1 Informatica Server Components

All Informatica server components are prefixed with the Informatica Domain name, followed by
_<COMPONENT NAME>. Additionally, since multiple nodes may be used to enable the use of Informatica
as a service across the Enterprise, nodes have the additional designation of a numeric suffix. The following
are the names of the Informatica server components:
Development
Domain: INFADEV
Node(s): INFADEV_node01
Integration: INFADEV_integration
Repository: INFADEV_repository
Quality Assurance
Domain: INFAQA
Node(s): INFAQA_node01
Integration: INFAQA_integration
Repository: INFAQA_repository
Production
Domain: INFAPROD
Node(s): INFAPROD_node01
Integration: INFAPROD_integration
Repository: INFAPROD_repository
3.2 Database Connections

The PowerCenter Designer utilizes ODBC connections to databases; as such, there is no standard for
defining ODBC connections.
The PowerCenter Workflow Manager, however, contains a Relational Connection Browser in which
connections to relational databases are defined for the workflows. When naming these connections, spaces
and special characters are not allowed, with the exception of the _ (underscore) character. It is suggested
that the name of the relational connection contain the login name, followed by the _ (underscore) character,
followed by the SID of the relational database. Below is the current connection name:
DB Connection Name: F2INFOD
User: INFDDBR1
SID: F2INFOD



3.3 Transformations

The following table gives standard naming conventions for transformations:

Transformation | Naming Convention
Aggregator | AGG_TransformationName
Application Source Qualifier | SQ_TransformationName
Custom | CT_TransformationName
Expression | EXP_TransformationName
External Procedure | EXT_TransformationName
Filter | FIL_TransformationName
HTTP | HTTP_TransformationName
Java | JTX_TransformationName
Joiner | JNR_TransformationName
Lookup | LKP_TransformationName
MQ Source Qualifier | SQ_MQ_TransformationName
Normalizer | NRM_TransformationName
Rank | RNK_TransformationName
Router | RTR_TransformationName
Sequence Generator | SEQ_TransformationName
Sorter | SRT_TransformationName
Source Qualifier | SQ_TransformationName
SQL | SQL_TransformationName
Stored Procedure | SP_TransformationName
Transaction Control | TC_TransformationName
Union | UN_TransformationName
Update Strategy | UPD_TransformationName
XML Generator | XG_TransformationName
XML Parser | XP_TransformationName
XML Source Qualifier | XSQ_TransformationName

Figure 1 - Transformation Naming Conventions

There may be instances in naming a Lookup transformation when the developer needs to distinguish how a
lookup is performed (e.g. via a key vs. a code). In these instances, a lookup may be named
LKP_TransformationName_KEY or LKP_TransformationName_CD.

3.4 Ports

Ports are generally named for the field they represent. For example, if there is a column in a target table
called FIRST_NAME, the Informatica target definition should have a port of the same name. Thus, for data
lineage as well as developer best practice, it is recommended that a port be named for the source field from
which it comes or for the target field it will populate, as appropriate.
Quite often, however, transformations may have ports not found in sources or targets. In these cases, it is
preferable that the developer use ALL CAPS for the field names, use the _ (underscore) character as a
separator, and exercise judgment when naming the field so as not to cause confusion for other developers.
In circumstances when a transformation contains inputs that will not be output, the port should be prefixed
with IN_ indicating that it is just an input port. Similarly, if the transformation contains output fields that were
not inputs to the transformation, the prefix OUT_ should be used to indicate they are output-only ports.
When naming variable ports, the prefix V_ should be used.


Figure 2 - Port Naming Example

3.5 Mappings

A mapping is a set of source and target definitions linked by transformation objects that define the rules for
data transformation. Mappings represent the data flow between sources and targets. The recommended
naming convention for mappings is m_MappingName, where MappingName is a reasonable description of
the purpose of the mapping. This description should contain the target database schema and possibly the
target load table or subject area. For example, a staging mapping may be called m_STG_TABLE_NAME,
while an IDR mapping may be called m_FDW_TABLE_NAME.
3.6 Mapplets

Mapplets are reusable objects containing transformations. The purpose of a mapplet is to allow multiple
mappings to use the same block of transformation logic. By creating the mapplet once, developers
increase coding efficiency while decreasing diagnosis and maintenance effort. The
recommended naming convention for Mapplets is mplt_MappletName, where MappletName is a reasonable
description of the purpose of the Mapplet.
3.7 Sessions

Sessions are the Workflow representation of a mapping, and all the instructions that surround that mapping.
The recommended naming convention for Sessions is s_MappingName (without the m_). This is the
default naming convention and indicates that a session is related to a specific mapping.
3.8 Workflows

A workflow is a set of instructions that tells the Integration Service how to run tasks such as sessions,
Worklets, email notifications, and shell commands. The recommended naming convention for a Workflow is
wfk_WorkflowName, where WorkflowName is a reasonable description of the purpose of the Workflow.
3.9 Worklet

A Worklet is a reusable object that contains a series of tasks. Each Worklet can be used in multiple
Workflows similar to the way a Mapplet can be used in multiple Mappings. The recommended naming
convention for a Worklet is wlkt_WorkletName where WorkletName is a reasonable description of the
purpose of the Worklet.



3.10 Parameter Naming Standards

Informatica allows Mapping and Session parameters to be overridden at runtime through the use of
Parameter Files.
Mapping Parameters should be defined in a consistent case, separated by underscores to improve
readability. Mapping parameters are prefixed with $$. The ETL design documents should detail the
standard parameters utilized by every mapping or by targeted mappings.
Session Parameters must match the formats defined in Informatica and are typically mixed case; these must
follow predefined naming conventions. Session parameters allow the overriding of source, target and log files
as well as database connection names. Session parameters are prefixed with $.
The following examples are provided:
Mapping Parameter
$$LAST_EXTRACT_DATE
$$EFF_START_DATE
$$ETL_CREATE_DATE
Session Parameter
$DBConnection_Target
$InputFile1
$PMSessionLogFile


4. Version Control

4.1 Informatica Team-based Deploy

Informatica contains built-in version control, allowing developers to track their work as objects are built.
Information about Informatica versioning can be found in the Informatica Repository Guide, Chapter 6:
Working with Versioned Objects. The following guidelines are recommended to ensure proper object tracking
and repository backups.

Objects must be checked-in before they can be made available to the public repository. If an object
is not checked in, other developers, and other parts of the Informatica PowerCenter suite will not be
able to access that object.
Similarly, existing checked-in objects must be checked-out in order to make changes to them.
Clear descriptions of the modifications must be entered into the Description box prior to check-in, to ensure proper communication of the modifications.
Prior to modifying an object that is checked-in, it is recommended that developers use one or more
of the functions in the Analyze menu of the Repository Manager to view dependencies of the
object. This will allow the developer to understand and communicate the effects of the change they
are about to make. If the developer does not have permissions to perform this function, they should
request that this analysis be done by an Administrator.

In addition to using the built-in object versioning of Informatica, it is recommended that the ETL Architect
implement code-freeze checkpoints. At these checkpoints, the Informatica objects should be labeled, the
entire repository should be exported to XML, and the export should be checked in to a separate version
control system (e.g. ClearCase).


5. Informatica Environment Standards

5.1 Repository Folder Structure

The Informatica ETL tool will be used to support multiple applications across the Enterprise. The planned
installation architecture will consist of multiple Informatica servers across each of the planned environments
(Development, Test, and Production).
Folders will be created in the repository for each key application. These folders must be created in
ClearCase also. To reduce the issue of a cluttered repository, rather than create personal folders, sandbox
folders will be created for each project. The sandbox folder would have full permissions for everyone in that
project group. Developers in the project can work in the sandbox folder, and once they are finished
researching or experimenting, they can move their code to the main project folder. Each developer should
clean up all objects in the sandbox folder once the developer is finished. A periodic purge process will be
initiated to remove objects in the sandbox folders that have not been modified for 60 days. The following
figure provides a list of the known applications that will be developed and the structures for the repository
configurations. Additional folders for sharing may also be created.
Each application should also have a folder for break/fix. This folder represents the current production
version of the application, and should only be editable by the Informatica Support Team. The following figure
shows an example of each Informatica application and its break/fix counterpart. Note: the pre-built Informatica
mappings are already organized in separate pre-built folders.
Application | Repository Folder Name
Expense Planning and Corporate Forecasting | FM_EPCF
Expense Planning and Corporate Forecasting Break/Fix | FM_EPCF_BF
Expense Planning and Corporate Forecasting Sandbox | FM_EPCF_SANDBOX
Financial Data Warehouse | FM_FDW
Financial Data Warehouse Break/Fix | FM_FDW_BF
Financial Data Warehouse Sandbox | FM_FDW_SANDBOX

Figure 3 - Repository Folders

5.2 Informatica Server Structure

Informatica provides a directory structure out of the box for development. The following figure lists the
session variables provided with Informatica and describes their usage. Applications should
adhere to these usage rules. Rather than defaulting to $PMRootDir with relative pathing off of root, it may be
preferable to utilize the OS Profiles functionality, where a distinct set of system variables is set up for
distinct users with a different user per folder, allowing the base system variable names to be utilized directly
in the sessions.
Informatica Session Variable | Description of Usage
$PMRootDir | A root directory to be used by any or all other server variables. Informatica recommends using the PowerCenter Server installation directory as the root directory. Instead, we maintain application-specific folders separately from the root installation directory. As such, $PMRootDir should be set to a common application folder ({/opt/Informatica/PowerCenter8.1.1/server/infa_shared}). In the production environment, this folder is logically shared in the UNIX file system across the two failover servers.
$PMSessionLogDir | Default directory for session logs. Defaults to $PMRootDir/SessLogs but should be set in the environment to $PMRootDir.
$PMBadFileDir | Default directory for reject files. Defaults to $PMRootDir/BadFiles but should be set in the environment to $PMRootDir.
$PMCacheDir | Default directory for the lookup cache, index and data caches, and index and data files. To avoid performance problems, always use a drive local to the PowerCenter Server for the cache directory. Do not use a mapped or mounted drive for cache files. Defaults to $PMRootDir/Cache but should be set in the environment to $PMRootDir.
$PMTargetFileDir | Default directory for target files. Defaults to $PMRootDir/TgtFiles but should be set in the environment to $PMRootDir.
$PMSourceFileDir | Default directory for source files. Defaults to $PMRootDir/SrcFiles but should be set in the environment to $PMRootDir.
$PMExtProcDir | Default directory for external procedures. Defaults to $PMRootDir/ExtProc but should be set in the environment to $PMRootDir.
$PMTempDir | Default directory for temporary files. Defaults to $PMRootDir/Temp but should be set in the environment to $PMRootDir.
$PMWorkflowLogDir | Default directory for workflow logs. Defaults to $PMRootDir/WorkflowLogs but should be set in the environment to $PMRootDir.
$PMLookupFileDir | Default directory for lookup files. Defaults to $PMRootDir/LkpFiles but should be set in the environment to $PMRootDir.

Figure 4 - Informatica Provided Server Variables

To support multiple applications and to separate application data at both a repository and UNIX level, the out
of the box Informatica server variables will not be used. In order to separate application data, all session
variables will be defined as $PMRootDir, and all applications must override the session variables with the
directories for their application. These directories must be defined relative to $PMRootDir to avoid
hard-coding full directory paths. For example, the cache directory for the IDR application should point to
$PMRootDir/IDR/Cache for the $PMCacheDir value.
Based on the table above, the following directory structures must be created to support the different
applications. Rather than using the default server variables, the single $PMRootDir reference will be used with
the application path name appended and recorded in each session/workflow.
Default Server Variable | Workflow Variable/Session Variable Override Path Names | Application Unix Path Name
$PMSessionLogDir | $PMRootDir/EPCF/SessLogs | /{fmacdata}/EPCF/SessLogs
$PMBadFileDir | $PMRootDir/EPCF/BadFiles | /{fmacdata}/EPCF/BadFiles
$PMCacheDir | $PMRootDir/EPCF/Cache | /{fmacdata}/EPCF/Cache
$PMTargetFileDir | $PMRootDir/EPCF/TgtFiles | /{fmacdata}/EPCF/TgtFiles
$PMSourceFileDir | $PMRootDir/EPCF/SrcFiles | /{fmacdata}/EPCF/SrcFiles
$PMTempDir | $PMRootDir/EPCF/Temp | /{fmacdata}/EPCF/Temp
$PMExtProcDir | $PMRootDir/EPCF/ExtProc | /{fmacdata}/EPCF/ExtProc
$PMWorkflowLogDir | $PMRootDir/EPCF/WorkflowLogs | /{fmacdata}/EPCF/WorkflowLogs
$PMLookupFileDir | $PMRootDir/EPCF/LkpFiles | /{fmacdata}/EPCF/LkpFiles
TBD | $PMRootDir/EPCF/Parameters | /{fmacdata}/EPCF/Parameters
TBD | $PMRootDir/EPCF/DataIn | /{fmacdata}/EPCF/DataIn
TBD | $PMRootDir/EPCF/DataOut | /{fmacdata}/EPCF/DataOut

Figure 5 - Application Informatica UNIX Structures for EPCF

5.3 Parameter File

All mappings will use parameters to help drive processing and flow. See the ETL Control section for specifics
relevant to ETL Control interaction with parameter file usage. Each mapping, depending on the style of
mapping, will contain standard parameters. Following are some standards and guidelines regarding
parameter usage.


1) Parameter files should be referenced at the workflow level only, rather than at the session level. Should a
special parameter exist for a mapping, incorporate it into the workflow-referenced parameter file, tagged
appropriately to the session (see the Informatica parameter file configuration and the sample file following
this list).
2) The following parameters are used in all mappings:
a. $$EXT_FROM_TM - a timestamp
b. $$EXT_THRU_TM - a timestamp
c. $$LAST_BUSN_PROC_TM - a timestamp
d. $$ETL_CREATE_TM - a timestamp
e. $$SRC_SYS_CD - a string field
f. $$TGT_SYS_CD - a string field
3) The following parameters are used (in addition to other parameters) in type 2 mappings:
a. $$ETL_EFF_FROM_TM - a timestamp
b. $$ETL_EFF_THRU_TM - a timestamp
c. $$ETL_LAST_MOD_TM - a timestamp
d. $$INFINITY_TM - a timestamp
4) Other parameters may be used to ensure consistency, such as:
a. $$CURR_IND_Y, $$CURR_IND_N, $$ACTV_IND_Y, $$ACTV_IND_N - all string fields
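For illustration only, a minimal workflow-level parameter file might look like the following sketch. The folder, workflow, and session names and the values shown are hypothetical placeholders, not project settings:

[FM_EPCF.WF:wfk_EPCF_STG_LOAD.ST:s_m_STG_TABLE_NAME]
$$EXT_FROM_TM=2016-09-01 00:00:00
$$EXT_THRU_TM=2016-09-17 00:00:00
$$ETL_CREATE_TM=2016-09-17 00:00:00
$$SRC_SYS_CD=SRC01
$$TGT_SYS_CD=FDW
$DBConnection_Target=F2INFOD
$PMSessionLogFile=s_m_STG_TABLE_NAME.log

The section heading scopes the parameters to a specific session within the workflow; parameters that apply to every session can instead be placed under a heading that names only the folder and workflow.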

5.4 Reusable Objects

Any mapping transformation and session can be marked reusable. However, there are key ETL components
that should always be marked reusable:
Sequence generators
Lookups for foreign keys (i.e. Reference tables, Dimensions, Hubs and Links)
Sessions
Other mapping transformations should be marked reusable when the transformation can be used in more
than one mapping or mapplet. Keep in mind the following:
1) Changes to any reusable object may affect every place it is used. Before applying a change,
determine its impact by examining where the object is used and how it is used.
2) Lookups - these should be configured to work in the exact same fashion in every case, for ports, return
values and conditions. For example, since a lookup condition can be augmented or altered in a
mapping, the reusable lookup condition should include only the conditions that are mandatory.
5.5 Shared Folders

Informatica allows the sharing of objects by using shared folders.


Sharing objects via a Shared Folder results in increased maintenance and is managed centrally. Care must
be taken when importing or re-importing shared objects in the repository since many applications may be
relying on these shared objects and any relationships that exist between those shared objects. Therefore
sharing of any objects is to be avoided unless approved centrally by the Informatica Administrator. The
default for any object should be to place it in a non-shared folder.
If an object has been approved for sharing, it should be placed in a shared folder for that project. Each
project that uses shared objects should have a corresponding Informatica shared project folder. The name of
that folder should include the word shared after the project name. For example: FDW_Shared.
Shared folders may be appropriate when:
1) Downstream consumers shortcut into the source

2) Reusable objects are made available for downstream consumers


Shared folders are not recommended for the following reasons:
1) Upon deployment, issues often arise where otherwise unaffected mappings that reference
shared objects become invalidated.
2) The development environment and break/fix operations may conflict, since pending
development changes can coexist with potential production fixes.

6. Data Quality

6.1 Data Transformations

Currently, no data quality tools have been purchased to assist with the Advantage project. It is therefore
necessary for Business Analysts to ensure that all data quality related business rules are captured, and
for the ETL designers to ensure these business rules are reflected in the ETL designs using the data
transformations provided by Informatica.
There are basic transformations that are applied regardless of requirements. These are typically
common-sense applications of data quality, for example (an illustrative expression sketch follows this list):
1) Trimming of strings - assuming database string columns will be varchar in nature, it is standard to
trim leading and trailing spaces. This is necessary to avoid issues with joins and data comparisons.
2) Data type conversion - convert incoming data into the target data type.
a. For dates, examine the incoming fields for the various forms of dates possible.
b. For numeric fields, remove any punctuation (commas, currency and percent signs).
3) Case - for certain domain types (codes, identifiers), capitalizing or placing in a consistent case may
be warranted.
4) Truncated fields - this is problematic since there will be data loss. However, trimming may help.
5) Missing data - refer to section 6.2.
6) Domain value mapping - there may be instances where there are known mappings of values into a
common set of values. For example, a flag field may contain any of these values to indicate true/false or
yes/no (0, 1, Y, y, N, n, X, space). Standardize the flag to either a true/false bit or a Y/N character
field for consistency.
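As a hypothetical sketch of the first three rules above, expression output ports in the Informatica transformation language (the port names are placeholders) might be defined as:

OUT_FIRST_NAME: LTRIM(RTRIM(IN_FIRST_NAME))
// trim leading and trailing spaces from the incoming string

OUT_ORDER_DT: TO_DATE(LTRIM(RTRIM(IN_ORDER_DT)), 'YYYYMMDD')
// convert the incoming string to a date, assuming a known source format

OUT_ORDER_AMT: TO_DECIMAL(REPLACECHR(0, IN_ORDER_AMT, '$,%', NULL))
// strip currency, comma and percent characters before the numeric conversion

OUT_STATUS_CD: UPPER(LTRIM(RTRIM(IN_STATUS_CD)))
// force code values into a consistent (upper) case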
6.2 Default Data

In many instances, a piece of data either requires a relationship to another piece of data or is a
required field. In these events, fundamental solutions can be instituted to fill in the gaps or build the
relationships so that the data continues to maintain integrity and value and error conditions are avoided. Some of
the options are:
1) Nulls not allowed - when a database column carries a not-null constraint, a default value is often
required when there is no incoming content. The best
approach may be to alter the database column and allow nulls. If this is not possible, consider the
following defaults:
a. Date - set to some high or low date (e.g. 1900-01-01 or 9999-12-31).
b. Text (code) - any default value consistent with the domain, e.g. UNK.
c. Text - any default value consistent with the domain, e.g. Unknown.
d. Numeric - either 0 or some other value consistent with the domain. The danger here is that the
non-null value may skew the interpretation of the column.
e. Error - if the column is truly required and a default value is not acceptable, which will be the
case for foreign keys, surrogate keys and ETL audit fields, then return an error condition.

2) Missing relative - this situation occurs when a foreign key constraint is violated. The typical design
approach is to stop the process, since there may be an expectation that the relationship must exist.
A couple of alternatives are:
a. Ensure the ETL mappings that load the parent or foreign table completely populate that table
from the same file that is being used in the downstream mapping. This will ensure the
foreign key relationship exists.
b. Fall back to a default foreign key, or set the key to null. If the field is not nullable, a default row is
identified in the relationship and its key is substituted. This may require seeding rows
that contain the default key, e.g. 0 for a surrogate key (see the expression sketch following this list).
3) Data does not fit - this situation occurs when the domain of the field does not match the target
column. Should this occur, a default can be set (see Nulls not allowed). A note of caution: the
data content will be lost, and if not reported, there will be no indication of the situation.
4) Data priming - if key relationships are required (i.e. loading basic default data into a hub or link),
this may require defaulting of a relationship in the event the lookup cannot find the relative. For
example, a one-time SQL load can be run to load a hub or link where the surrogate key is some
default value (e.g. 0 or -99999). When it is determined that a relative is not found and the relationship
is mandatory, the foreign key reference will be set to the default surrogate key.
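The fall-back approach described in 2b can be sketched as an expression port (the lookup and port names are hypothetical, and 0 is assumed to be the seeded default surrogate key):

OUT_CUST_KEY: IIF(ISNULL(LKP_CUST_KEY), 0, LKP_CUST_KEY)
// substitute the seeded default surrogate key when the lookup finds no relative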


7. Informatica Development Guidelines


The following sections provide general development guidelines and should be used as a guide. There may
be cases where these guidelines are not appropriate, and the ETL team should use its best judgment in
applying them. In addition, for those options, features, and components not mentioned in these
guidelines, reference the relevant Informatica help and product documentation.
7.1 Mapping Guidelines

When creating a mapping, there are key considerations and guidelines that foster ease of maintenance,
restart-ability and performance. These guidelines are:
1) Write a single mapping per target when appropriate. This could be when a single source or multiple
sources are needed to populate a target table. This may not make sense when a single high-volume
source has to load multiple targets.
2) Try not to make overly complex mappings; instead, break them into multiple logical mappings. This
will simplify maintenance and restart-ability. This may not always be possible, such as in cases
where better performance is needed.
3) Limit the number of transformations when possible. There is overhead involved in moving data
between transformations. Consider more shared memory for a large number of transformations;
session shared memory between 12 and 40 MB should suffice.
4) Only connect what is used. Delete un-needed links between transformations. This minimizes the
amount of data moved and is very important in the source qualifier.
5) Avoid using stored procedures unless there is no other option. If the logic can be accomplished through
Informatica, use Informatica.
7.2 Mapping and Workflow Documentation

Documentation should be included in all objects within a Mapping or Workflow. In addition, the Mapping or
Workflow itself must have documentation that describes the Mapping/Workflow at a high level.

Figure 6 - Mapping Documentation

The mapping/workflow comments should include a high-level description of the mapping/workflow.
Objects that should be documented include, but are not limited to, the following:
Transformation objects
Expression ports
Expression code
Workflow tasks

Workflows and Worklets


Sessions

7.3 Transformation Object Documentation

Every Transformation object in a Mapping should be documented. In the first tab of each Transformation
object is a place for the description. Include a brief description of the object and its function in the mapping.
Documentation should be simple and non-technical. Example:

Figure 7 - Transformation Documentation


7.4 Expression Ports Documentation

Expressions have a variety of ports: input, output, variable, lookup, etc. Transformations that may be
difficult to decipher should have documentation added.
Expression code is often self-explanatory. However, some expressions can be complex and not easily
understood. In such circumstances, the developer will provide comments, and possibly pseudo-code, that will
help others understand the purpose and logic of the code. All documentation will follow the code, and each
line of documentation will be prefaced with double forward slashes (//). Example:


Figure 8 - Expression Port Documentation
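As a further hypothetical illustration of this commenting style (the port name is a placeholder):

IIF(ISNULL(IN_ORD_AMT), 0, IN_ORD_AMT)
// Default the order amount to zero when the source value is null
// so that downstream calculations are not skewed by nulls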

7.5 Mapping Recovery

To eliminate the need for manual intervention, each mapping must be designed to support automated
recoverability. When intervention is required, those exceptions should be clearly documented. The Workflow
restart features allow sessions to recover simply by restarting the workflow or session.
7.6 Mapping Templates

The use of mapping templates is essential in supporting maintainable and consistent processing. Each of
the templates will contain the standard transformations and flow that are expected for the particular type of
mapping. The working templates should be copied by developers to begin the development of the mappings
that follow the template pattern. PowerCenter Data Stencil, which allows the design of ETL mappings in a
Visio format, may also be used. These Visio flows can be reverse engineered into Informatica mappings in
order to jumpstart the ETL development work. In some cases, mapplets may also be used to simplify
processing or share common code; see the section related to Mapplets.
7.7 Source Qualifier

Informatica has extensive flexibility with its Source Qualifier transformation object. However, with that flexibility,
there are guidelines that will help ensure efficiency and maintainability. Those are:
1) SQL Query - overrides are to be avoided; use one only when there is no better option to resolve a
complex join criteria situation that cannot be achieved other ways. If used, be sure that the columns
in the SELECT clause match the order of the ports passed through the transformation. Avoid all
non-standard SQL.
2) User Defined Join - when the source qualifier has more than one incoming source, use this to
place the join predicates.
3) Source Filter - place any filtering of source data in this box to prevent unnecessary traffic moving
across the network and through additional mapping logic.
4) Number of Sorted Ports - use this when there is a need to sort incoming data. Keep in mind that
the outgoing ports should be ordered (in this transformation's port section) in the fashion expected.
5) Select Distinct - select this when there is a need to obtain unique or distinct sets of data. Keep in
mind that this can be an expensive database operation and may not give the desired result if certain
data is to be transformed (e.g. trimmed).

6) Pre/Post SQL - when using this in the mapping, keep in mind it is embedded in the mapping. It is
recommended that Pre/Post SQL be placed in the session to ensure optimal flexibility.
When overriding table join logic, define all the source tables in the mapping even if they are not needed.

Figure 9 - Source Qualifier Transformation

7.8 Filter

Filter transformations are used to narrow down incoming data for processing, based upon a set of
criteria. Some guidelines are:
1) Use filters when a SQL filtering condition is not feasible; otherwise, filter in the Source Qualifier when
possible.
2) Keep in mind a filter is considered active, so all paths in the processing flow will have to converge at
the filter.
3) A filter is appropriate for a true/false condition determining whether data is to be processed or ignored
(a hypothetical example follows this list).
4) When multiple outcomes or processing dependencies occur, consider a router. See the Router section
for more specifics.
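As a hypothetical example of such a true/false condition, a filter condition written in the Informatica transformation language (the port names are placeholders) might be:

IN_ACTV_IND = 'Y' AND NOT ISNULL(IN_CUST_ID)

This passes only active rows that carry a customer identifier; all other rows are dropped.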
7.9 Joiner

Joiners may be necessary for a number of reasons: joining disparate sources, simplifying source qualifier
joins or addressing performance issues. A few guidelines are:
1) Joiners are expensive transformations; use them only when performance at the database level is
prohibitive. This usually occurs when there are too many joins in a SQL statement.
2) Use joiners to join disparate sources
3) Prior to joining, ensure that the join predicates have been trimmed or transformed accordingly.


4) Be careful about mixing character types. When joining on a field with varying types and lengths, the
join may not work.
5) When configuring the joiner, the following guidelines are relevant to the Properties tab:
a. Join Type - this instructs the joiner to perform an equal join (normal join), a full outer join, or a
master/detail outer join. Carefully read the Informatica help to select which join to use. Typically, a
master outer join keeps all detail rows and matches the master rows against them.
Informatica recommends using the source with fewer rows as the master source; hence a
master outer join is a better performer than a detail outer join.
b. Sorted Input - to boost performance, it is often better to pre-sort the incoming data prior to
the joiner. Achieve this with a SQL sort or a Sorter transformation. This can significantly
improve performance.

Figure 10 - Joiner Properties

7.10 Router

This transformation directs rows down one or more paths. A router is a more efficient transformation than
a filter when multiple pieces of decision logic must be evaluated (see the sketch following this list). Hence,
use a router when:
1) More than one path is taken depending on a filtering condition.
2) A decision path results in one or more true paths.
3) There is a need for a default decision path.
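As a hypothetical sketch, a router used for insert/update routing might define group filter conditions such as (the port and lookup names are placeholders):

INSERTS: ISNULL(LKP_TGT_KEY)
UPDATES: NOT ISNULL(LKP_TGT_KEY) AND IN_SRC_HASH != LKP_TGT_HASH

Rows satisfying neither condition (unchanged records) fall through to the default group.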
7.11 Sorter

Sorting is a standard need when processing and storing data. However, sorting is a performance-intensive
process demanding attention to where the sort occurs and to sorting efficiency. When deciding on the sorting
method, consider:
1) Leverage DBMS sorting features when possible. See the section on the Source Qualifier for selecting a
sorting option. Sorting in the DBMS reduces Informatica resource usage by shifting the work to the DBMS.


2) Consider using a sorter prior to an aggregator to minimize the resource demands incurred
during aggregation.
3) Sort only those fields necessary for the sort.
4) When configuring the sorter Properties, some settings to note:
a. Case Sensitive - checked by default; uncheck the box to increase efficiency when case
sensitivity is not necessary.
b. Distinct - only select this when unique rows are required. This may be preferable to using
an aggregator transformation.
c. Null Treated Low - a DBMS will typically sort nulls first. If this is desired, select this
box to ensure Informatica sorts the data in that fashion.

Figure 11 - Sort Properties

7.12 Aggregator

It may be necessary to aggregate or summarize data, and Aggregator transformations are used to facilitate this
type of processing. Aggregator transformations are also often used for other purposes, such as de-duplicating
records or taking the first, last, max or min values of a record. Aggregators take significant
resources and should be used only when essential to processing. When deciding on the aggregation approach,
consider:
1) Aggregators are active transformations, requiring all data bound for a target instance to be passed
through the transformation. Since this is the case, consider where in the flow to aggregate (upstream or
downstream) and carry as few ports as possible through the transformation.
2) Reserve aggregation for points in the flow where the number of records is smaller; hence, allow filtering
and routing to complete before aggregating.
3) When configuring Ports, some settings to note:
a. GroupBy - select only those fields that are used to group the aggregation. Do not select fields if
they are not in any way a key. If no ports are selected, the transformation will return only
one row.

b. Expression - this enables flexible inclusion of functions that are necessary during
aggregation. If a field is not grouped, Informatica will default to the first value that is
encountered, based upon the GroupBy keys; use the proper function to acquire the desired
result (e.g. MAX, MIN, LAST, FIRST). The expression box is also used to perform aggregate
functions such as SUM, AVG, MEDIAN, etc.

Figure 12 - Aggregator Ports

4) When configuring Properties, some settings to note:


a. Sorted Input - when the incoming keys are in sorted order, select this option to optimize
performance. If possible, sort the incoming data in advance using either the features of the
Source Qualifier or a Sorter transformation.
b. Aggregator Data (Index) Cache Size - in general, follow Informatica recommendations
when sizing an aggregator. Keep the default cache size unless performance becomes a
concern.

Figure 13 - Aggregator Properties

7.13 Lookup

Lookups are useful transformations that foster change detection as well as other functions. Since most
lookups are essentially accessing relational databases, even though flat file lookups are permissible,
performance is a key consideration. Hence, lookup transformations are used when data volumes are
relatively small (less than 500 MB) or when no alternative is available. Small lookups can improve
performance while large volumes may necessitate performing SQL joins to gain efficiency.
Refer to Informatica documentation for recommended settings; however, the following guidelines
should be considered.
1) Only bring in ports that are relevant to the lookup. In other words, do not carry forward into the
transformation ports that have no relation to looking up data.
2) When configuring the lookup, the following guidelines are relevant to the Ports tab:
a. Reusable Lookups - make a lookup reusable if it is used in more than one mapping.
b. Ports - When creating a lookup, select only the ports that are required and cache only those
columns that are required. See the following figure.

c. Disconnected/Connected Lookups - connected lookups are embedded in the flow of the
mapping, while disconnected lookups are referenced in an expression of some sort. Use
disconnected lookups when there is a need to return only one port and the lookup may be
required multiple times in the mapping. Use connected lookups to return more than
one port and when it is natural to incorporate the lookup in the logic flow. When using a disconnected lookup,
ensure an R (return) port is selected in the lookup. Select O (output) for those ports that will
be returned in the connected lookup.

Figure 14 Lookup Transformation Ports

3) Lookup properties provide a range of options to facilitate lookup processing and efficiency. When
configuring the lookup, the following guidelines are relevant to the Properties tab:
a. Lookup SQL Override - overrides are to be avoided. An override should be used only when the lookup
complexity is such that more than one table is required to support the lookup condition. If used, be sure
that the columns in the SELECT clause match the order of the ports passed through the
transformation (see the sketch following this list).
b. Lookup Caching Enabled - select this to cache the table entirely. This is generally the best
option for smaller tables; however, if the table is large and it is necessary to perform a
lookup, then do not check this box, since caching large files is very expensive. Refer to
Informatica documentation for recommendations regarding cache sizing and performance
configuration.
c. Lookup Policy on Multiple Match - generally take the default of Report Error. If there is a
potential to return more than one row and that is acceptable, select one of the other options
(return first, last or any).
d. Lookup Condition - this is used as the selection criteria for the lookup. Ensure that the
condition will result in no more than one row being returned; otherwise, see Lookup Policy
on Multiple Match.
e. Lookup Cache Persistent - check this when the lookup is being used from multiple
mappings and the source data is unlikely to change during the run.
f. Re-cache from Lookup Source - check this when this is expected to be the first mapping to
use the lookup. This is enabled only when Lookup Cache Persistent is enabled. All

subsequent mappings should not have this option checked. This will ensure that the data is
refreshed from the source the first time the lookup is used.
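As a minimal, hypothetical illustration of the SQL override guideline above: if the lookup transformation defines the ports CUST_KEY, CUST_NM and CHECKSUM_VAL in that order, an override that joins two tables must return the columns in the same order. The table and column names are illustrative only:

SELECT c.CUST_KEY     AS CUST_KEY,
       c.CUST_NM      AS CUST_NM,
       s.CHECKSUM_VAL AS CHECKSUM_VAL
FROM   CUSTOMER c,
       CUSTOMER_STAT s
WHERE  c.CUST_KEY = s.CUST_KEY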

Figure 15 - Lookup Transformation Properties

7.14 Sequence Generator

The best practice is to use a Sequence Generator transformation to generate surrogate keys rather than
relying on the database to generate sequence numbers. The predominant reason is to ensure optimal
maintenance and flexibility. A few guidelines are:
1) Make the transformation reusable in all cases. This is to support deployment and visibility into
sequence generators.
2) When caching values, ensure the Number of Cached Values is set to align appropriately with the
number of rows to process. That is, for master data that may change infrequently and is of low
volume, set the count to 10. If the volume is larger, say 10,000 average inserts in a run, set it to
10,000. This is a judgment call, but in no case leave it at 0, since this could cause a performance
issue.
3) When multiple mappings can use the same sequence generator at the same time, it is required to
cache the sequence values.

Figure 16 - Sequence Generator Properties

7.15 Update Strategy

Update Strategy is used to ultimately describe the type of action occurring on the target, whether an insert,
update or delete.
1) Place this transformation as the last transformation prior to the target.
2) Use this transformation in all mappings to ensure the mapping controls the target processing.
Otherwise, the session must be configured to guide target behavior.
3) Constraint based loading may not operate properly when used in tandem with Update Strategies.
4) When configuring the Update Strategy, the following guideline is relevant to the Properties tab:
a. Update Strategy Expression - rather than entering numeric values, enter
DD_INSERT, DD_UPDATE or DD_DELETE, for maintainability purposes.

Figure 17 - Update Strategy

7.16 Target

The Target transformation is the descriptor of the target and also presents several opportunities for SQL
override.
1) When configuring the target, the following guidelines are relevant to the Properties tab:
a. Reject Truncated/Overflowed Rows - leave this unchecked, since rejection processing will
occur in accordance with the reject/error handling standards and guidelines.
b. Update Override - do not use this unless absolutely necessary; it should be extremely rare
that an override is used. If it is, the ETL architect should approve it and ensure it is documented
appropriately. Also, if an override is necessary, consider applying it in the session rather than the
mapping.
c. Table Name Prefix - rather than placing the prefix here, leverage DBMS synonyms or
other features.
d. Pre/Post SQL - this can be used to facilitate DBMS actions that may aid in target
processing, such as index drops, index creates, or running statistics. It is not recommended to
place SQL in the mapping. To foster maintainability, place pre/post SQL in the session
pre/post functions (see the sketch following this list).
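A minimal, hypothetical sketch of the pre/post SQL referred to in item d (Oracle syntax; the index and table names are illustrative only, and this would normally be placed in the session pre/post SQL properties rather than the mapping):

-- Pre-load SQL: make a large non-unique index unusable before the load
ALTER INDEX EDW.IX_SAT_CUSTOMER_BUSN_KEY UNUSABLE;

-- Post-load SQL: rebuild the index after the load completes
ALTER INDEX EDW.IX_SAT_CUSTOMER_BUSN_KEY REBUILD;
-- Table/index statistics can then be refreshed (e.g. with DBMS_STATS) as a separate post-load step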

Figure 18 - Target Properties


7.17 Mapplet

The mapplet object is very useful in supporting reusability, reducing mapping complexity and fostering
maintainability. These are situations where mapplets are appropriate:
1) Elements of ETL common to multiple mappings
2) Complex logic whose encapsulation will simplify processing. When a mapping begins to drift from the standard
templates, a mapplet is a mechanism to encapsulate the differing logic.
3) When multiple sources are required to populate a single target and it becomes necessary to
separate the sources based upon timing or performance.
a. Each mapping will incorporate the source(s) and place the data into a common image to
import into the mapplet
b. The mapplet will contain the common processing that will occur regardless of the source. This
will include transforming the data, performing a lookup against the target, and passing information to
determine the course of action (insert, insert/expire, update).
The following diagram depicts a Hub mapping with multiple sources. Should the mapping require
separation based upon source, the transformations in the box can be separated out into a mapplet. Any
aggregation and union required only because multiple sources are processed may then be removed.
Hence, ultimately the mapplet will contain a Mapping Input, Expression, Lookup, Expression and Mapping
Output, while the mapping itself will continue to contain any Filter or Router needed to ultimately process the target.

Figure 19 - Mapplet - Hub Example W/O Mapplet

The following figure depicts the removal of those common elements as well as those transformations, such
as unions and aggregators, that may no longer be required.

Figure 20 -Mapplet - Hub Example with Mapplet

7.18 Session

Sessions are wrappers to mappings that provide the execution environment parameters, configuration and
instructions to facilitate mapping execution. The initial setup of a session is essential to the long term
operation and execution of a mapping. When configuring a session, follow these guidelines:
1) All sessions should be created as reusable and initially configured to be production ready. When
testing, a copy of the session can be dropped into a worklet/workflow ultimately preserving the
pristine version.
2) When configuring the session Properties tab:
a. Parameter Filename - keep this blank in all but rare cases. The parameter file should be
referenced at the worklet/workflow level.
b. Enable Test Load - for the pristine version, keep this unchecked.
c. Treat Source Rows As - the setting for this depends on the type of mapping and database actions.
For example, for Hub and Link mappings set to Insert or Data Driven (assuming there is an
Update Strategy before each target); for Satellite mappings set to Data Driven.
d. Commit Type - set to Target.
e. Commit Interval - the default of 10,000 may be sufficient; however, this should be tailored to
the target load volume and altered to improve performance when deemed
necessary. Generally, configuring the commit interval is a balancing act and testing
may be required.

Figure 21 - Session Properties

3) When configuring the session Config Object tab:


a. Constraint based load ordering - only check this when there is a need to sequence the
order of loading targets in a mapping. Constraint based loading may not operate properly
when used in tandem with Update Strategies.
b. Cache LOOKUP() Function - check this box in all cases, unless the lookup is very large
and the uncached performance is acceptable.
c. Save session log for these runs - set this to 10.
d. Stop on errors - set this to 1. This should never be set to anything different, unless there is
a rare instance where errors are acceptable.
e. Override tracing - keep this set to the default, which provides the normal mode of tracing.

Figure 22 - Session Config Object

4) When configuring the session Mapping tab:


a. Connections - use the DB connection parameters present in the associated workflow
parameter file. Do not directly reference the connection other than through the parameter.
b. Target Load Type - the options are normal or bulk. Use bulk for loading stage tables,
transactions or tables that are significant in size.
c. Insert - for insert-only mappings, do not check any of the other Update options since a
performance hit may occur.
d. Update as Update - this is typically selected for all Satellite mappings.
e. Update as Insert - not typically used.
f. Update else Insert - not typically used, since the norm is to use an Update Strategy for update
mappings.
g. Delete - use only when deletes are required.
h. Truncate Table Option - Informatica allows the truncate of a table by checking the
Truncate option for the target table in the session properties. When Informatica cannot
issue a truncate, it might issue a delete statement, which can cause performance issues.
When using native database connections, use the Truncate checkbox option. The user on
the connection must have access to truncate the target table; if the connection owner does
not have truncate/drop privileges on the target table, this method cannot be used. If the
connection owner is not the owner of the tables, then the table owner must be specified in the
session properties override. There may be additional restrictions imposed by the RDBMS
that should be considered when using this option. For example, Oracle does not allow
truncate via a synonym; in this case, the owner of the target table must be specified for the
truncate to work. Verify through the session log that the truncate is working. When the use of the
truncate option is not viable (for example, when using ODBC connections), perform a table
truncate via the session's pre-SQL property (see the sketch following this list). When removing all data from a target table that
might be large, delete should be the final option.
i. Pre/Post SQL - use this sparingly; instead, placing the SQL in a SQL task in the workflow/worklet
will make it more visible.
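Where the Truncate option cannot be used, a statement such as the following hypothetical example can be placed in the session's pre-SQL property (the owner and table name are illustrative only, and the connection user must hold the necessary privilege):

-- Qualify the table owner explicitly; Oracle will not truncate via a synonym
TRUNCATE TABLE STG_OWNER.STG_LOAN;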

Figure 23 - Session Mapping

5) When configuring the session Components tab:


a. On Failure E-Mail - every session should reference a reusable email task that is used to send out
consistent emails. Upon failure, the email should be sent out with:
i. User Name - reference the email address that is used in supporting the application. Do not
put in a personal email address.
ii. Subject - send the session name (%s) and text indicating a failure.
iii. Text/Attachment - more detailed text that contains the session name (%s), session
status (%e), number of rejected records (%r), session start and end times (%b, %c)
and session elapsed time (%i), and attach the session log (%g).

Figure 24 - Session Component Tab

Figure 25 - Failure Email


7.19 Workflow/Worklet

Workflows and worklets are the mechanisms to orchestrate the execution of one or more sessions and tasks.
Workflows have configuration parameters that define the execution environment. When configuring a
workflow or worklet, follow these guidelines:
1) All workflow logs are written to the workflow log directory for that project
2) All session logs are written to the session log directory for that project
3) When configuring the worklet/session General tab:
a. All worklets, sessions and tasks within workflows should have the Fail parent if this task
fails option selected. This will cause the workflow/worklet to end in failure when any task
within it fails.

Figure 29 Worklet/Session General

4) When configuring the workflow Properties tab:


a. Parameter Filename - the parameter file should be referenced here, rather than at individual
sessions. All sessions and worklets that operate within the workflow will use the referenced
parameter file unless overridden at the session level.
b. Save workflow log for these runs - keep the default of 10 unless there is a reason to alter it.

Figure 30 - Workflow Properties

5) In the workflow, ensure there are link conditions set between worklets and sessions that evaluate
whether the prior session/worklet completed successfully (for example, a condition such as
$s_m_LOAD_EXAMPLE.Status = SUCCEEDED, where the session name is purely illustrative). This ensures the
subsequent sessions/worklets only process when that condition is met.

Figure 26 - Worklet Condition

8. Change Data Capture (CDC)


Change data capture (CDC) is a technique to determine the effect of a change in the source, allowing history
to be captured. The change detection process will determine if the source data is new, has changed, was
removed or is missing. There are several options in detecting CDC, which depend on the type of the source.
1) Full master files - snapshot images of the data. Deletes are implicit.
2) Incremental delta files - each image of data is tagged as either new, changed or removed (explicit
deletes).
8.1 Full Master

A full master file depicts the snapshot image of a data source. Since there would not be any transaction
codes or indication regarding the event that added, changed or removed data, the ETL process would have
to compare the source with the current target data in order to determine the changes. For performance reasons,
full comparisons may not be possible with high-volume data because they would require a full outer join
between the source and the target.
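Purely to illustrate the comparison concept (not a prescribed implementation), a full compare of a staged snapshot against the current target could be expressed in SQL along the following lines; the table, key and checksum column names are hypothetical:

SELECT stg.CUST_ID  AS SRC_CUST_ID,
       tgt.CUST_ID  AS TGT_CUST_ID,
       CASE
         WHEN tgt.CUST_ID IS NULL                  THEN 'NEW'
         WHEN stg.CUST_ID IS NULL                  THEN 'REMOVED'
         WHEN stg.CHECKSUM_VAL <> tgt.CHECKSUM_VAL THEN 'CHANGED'
         ELSE 'UNCHANGED'
       END AS CHANGE_TYPE
FROM   STG_CUSTOMER stg
FULL OUTER JOIN SAT_CUSTOMER tgt
       ON stg.CUST_ID = tgt.CUST_ID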
To ensure the ETL mappings are consistent in style and approach, use standard templates. See the section
on Mapping Templates.
8.2 Incremental

An incremental file should explicitly indicate the event that occurred with data. Hence, there should be some
sort of transaction code that indicates whether the data is new, changed or removed. The ETL approach in
determining changes is straightforward since the transaction determines the outcome.
The process to determine the changes is:
1) New (insert) - essentially, insert the data into the database. If the data already exists, then error out
the row and report the error. This situation only occurs if the data has already been processed or the
source system presented the data in error.
2) Changed (update) - take the incoming key fields and look up the corresponding data in the target
table using a Lookup transformation. If the result of the transformation indicates the row is found,
then apply the changes. If the data is not found, then error out the row and report the error.
3) Removed (delete) - take the incoming key fields and look up the corresponding data in the target
table using a Lookup transformation. If the result of the transformation indicates the row is found,
then apply the delete. If the data is not found, then error out the row and report the error.
Another option might be to perform an upsert process.
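If an upsert is chosen, the equivalent database operation is a merge. The following Oracle-style sketch uses hypothetical table and column names; within Informatica this behavior would normally be driven by an Update Strategy rather than issued as standalone SQL:

MERGE INTO SAT_CUSTOMER tgt
USING STG_CUSTOMER stg
   ON (tgt.CUST_ID = stg.CUST_ID)
WHEN MATCHED THEN
     UPDATE SET tgt.CUST_NM     = stg.CUST_NM,
                tgt.LAST_MOD_TM = SYSDATE
WHEN NOT MATCHED THEN
     INSERT (CUST_ID, CUST_NM, CREATE_TM)
     VALUES (stg.CUST_ID, stg.CUST_NM, SYSDATE);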
8.3 CDC Techniques

To achieve CDC processing, Informatica offers a few options. However, for information purposes the
following describes a couple of techniques, the preferred one listed first.
1) Function - calculating a checksum value over the data is an effective mechanism for detecting change.
Rather than comparing incoming fields to prior fields, which can lead to errors and inefficient
processing, storing the checksum value is the preferred approach. What will occur is:
a. Calculate a checksum of key incoming data using a function (like MD5) and store it with the
data in the target table.
b. When looking up the data in the database, the lookup will return only the checksum value
and minimal columns to support CDC. This will minimize the size of the lookup.
c. Should the checksum values differ (incoming vs. lookup), the row is targeted for change; otherwise
it is bypassed.
d. When calculating the checksum, always include the surrogate key, foreign keys, and satellite
business data. Do not include the ODS ETL columns (effective dates, current indicator,
active indicator, last modify date, create date, created by id and modified by id).
2) Dynamic Lookup - this type of lookup enables inserts and updates of the data to be applied to
the lookup cache. After the row is added or altered in the cache, the result will be returned to the ETL
flow allowing for subsequent insert/update processing. Additional logic is necessary to synchronize
the lookup cache with the target table. This approach is useful if it is necessary to keep the cache up
to date during processing. See Informatica documentation.
3) Timestamp Detection - use a timestamp on the source data to determine source changes.

9. Logging
A typical solution that uses Informatica and non-Informatica applications will have several layers of logging:
1) Informatica session logging
2) Informatica workflow logging
3) Non Informatica logging
9.1 Informatica Session Logging

Informatica will log data at run time at the session level for every session. Informatica's default is to give the
log file the same name as the session, with a .log extension, and to overwrite the log file every time the
session runs.
Informatica also has the ability to archive log files and in this case, the old log files are stored in the same
directory as current log files but with a different name. Informatica can store log files for a specified number
of runs or based on timestamp (the lowest level of granularity being minute).
To store log files for a specified number of runs, Informatica adds a sequential number to the end of the log
file name (0 representing the current log). This setting is available at the server level (affecting all
applications) or at the individual session level (requiring more control and maintenance).
To store log files by timestamp, this setting is configured at the individual session level (requiring more
control and maintenance).
The standard is to allow Informatica to maintain log files and history. One advantage of letting Informatica
generate log files is in this example: If a session processes twice in a schedule, it may be advantageous to
have access to the log data from both runs. This will be lost if a single log is retained.
The standard for session logging is to use Informatica's versioning of log files with a default setting of keeping
10 logs. With each run, log 9 would be removed, log 8 would become log 9, log 7 would become log 8, and so on.
9.2 Informatica Workflow Logging

Informatica maintains workflow log details in a similar manner to maintaining session log details. The
standards for workflow logging are the same as for session logging: use Informatica's versioning of log files
and retain 10 versions of the log.
9.3 Non-Informatica Logging

Non-Informatica applications (for example Unix scripts) should log status data to flat files. This can be used
for troubleshooting purposes. These logs should be stored by application to allow easier maintenance
(archiving). These files will be generated with dates and times and maintained in a similar manner to
Informatica session and workflow logs.
Run time statistics for non-Informatica processes may be captured in a relational table. This can be used to
monitor run time statistics.

10. Error/Exception Handling and Recovery


Informatica has features that will assist in the handling of errors and recovery. When combined with standard
error and recovery practices this provides a solid framework.
10.1 Essential Error Trapping and Recovery

This approach contains the basic, essential features that are inherent in Informatica and in the design of the
ETL mappings. It consists of minimal data handling to avoid potential errors, plus session,
worklet and workflow configuration.
In many instances, there are aspects of data that either require a relationship to another piece of data or are
required fields. In these cases, fundamental solutions can be instituted to fill in the gaps or build the
relationships so the data continues to maintain integrity and value and error conditions are avoided.
The configuration of sessions, worklets and workflows affects the operation of data loads. The configuration is
flexible, allowing error handling to be tightened or loosened. If an enterprise scheduler is
also required, then the failure and recovery of jobs can be controlled at both levels. Some guidelines:
1) All sessions should have an error threshold of 1. Hence, the session will fail when the first error is
encountered.
2) All sessions and worklets should be configured to fail the parent worklet/workflow when it fails

Figure 27 - Session Runtime Configuration

3) Conditions should be set between sessions or worklets which evaluate the status of the session or
worklet


10.2 Informatica Errors and Warnings

Fatal errors often occur due to a violation of an environment condition (hardware, infrastructure services, and
ETL services), database or ETL logic/function. In addition, errors due to data anomalies may occur, but
those are typically manifested in an ETL function or database constraint error condition. Warnings, in the context
of ETL, are those conditions where data situations or issues are encountered and handled according to a predefined set of rules. Encountering these types of conditions may require altering the predefined set of
rules or falling back to a default result. Data quality programs and best practices will define the approach and
handling of data aberrations.
The following table depicts the typical conditions and the action typically taken.
Condition | Warning/Error | Action Taken
Informatica mapping transformation or function failure | Error | Developer must correct the failing function or transformation
SQL failure condition (key constraint violation, null constraint violation, data type violation, foreign key violation, general SQL failure) | Error | Developer or DBA must correct the condition
Workflow/worklet flow failure | Error | Developer must correct the failing workflow or worklet
Data type of incoming data does not match target data type | Warning | The target field is defaulted to a standard value
Foreign key violation | Warning | Default the foreign key to a standard key value
Null constraint violation | Warning | The target field is defaulted to a standard value

Figure 28 - Warning and Failure Conditions

11. ETL Control


An ETL control process needs to be defined in order to manage the ETL execution and record runtime
statistics. Since the acquisition of data is often time driven, the ETL process may require a set of dates and/or
time intervals by which data is filtered for extraction.
This section includes a sample ETL Control solution. The actual ETL Control process will be
designed based on Freddie Mac business requirements.
ETL Control Table - this database table contains information to help drive ETL processing for a specific
source. This may include dates and/or times that indicate the scope of data extracted from a source.
ETL Transform Rule - since hard-coding codes or other attributes is not recommended, this database table
is used to take the place of hard-coded values or filtering conditions in ETL. For ETL control purposes, it
will contain rules to help drive the generation of the ETL Control.
Parameter File - these are parameters used within Informatica components to drive the ETL process. In
terms of ETL control, key dates or times will be placed into the parameter file to give the Informatica
components (mappings) a means to filter or narrow the scope of incoming data from a particular source.
11.1 ETL Control Data Model

The ETL_CONTROL table is used to assist in the automation of the ETL process. It essentially contains
dates for a specific source/target combination used to drive the scope of the incoming data, as well as to
record the process event. The following table depicts the columns in ETL_CONTROL.
Column | Data Type | Purpose
ETL_CONTROL_KEY | INTEGER | Surrogate key.
SOURCE_SYS_CD | VARCHAR2(20) | The code that represents the source system.
TARGET_SYS_CD | VARCHAR2(20) | The code that represents the target system.
EXTRACT_FROM_TM | TIMESTAMP | The date from which date-sensitive data is extracted. It is used to populate effective-from dates in versioned tables.
EXTRACT_THRU_TM | TIMESTAMP | The date through which date-sensitive data is extracted (inclusive).
LAST_BUSN_PROC_TM | TIMESTAMP | The date and/or time through which the data is valid. Typically this is yesterday in the EDW, yet could be an end of month for a data mart. Used to populate the ETL create and last update dates.
ETL_START_TM | TIMESTAMP | The date and time at which the ETL run began.
ETL_END_TM | TIMESTAMP | The date and time at which the ETL run completed.
ETL_PROC_COMPLETE_FLG | INTEGER | A 0/1 flag that indicates: 0 - run in process or halted, 1 - successful completion.
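For reference, a hypothetical DDL sketch of this table (Oracle syntax), derived directly from the column list above; the constraints shown are illustrative assumptions rather than part of the standard:

CREATE TABLE ETL_CONTROL (
    ETL_CONTROL_KEY        INTEGER       NOT NULL,  -- surrogate key
    SOURCE_SYS_CD          VARCHAR2(20)  NOT NULL,
    TARGET_SYS_CD          VARCHAR2(20)  NOT NULL,
    EXTRACT_FROM_TM        TIMESTAMP,
    EXTRACT_THRU_TM        TIMESTAMP,
    LAST_BUSN_PROC_TM      TIMESTAMP,
    ETL_START_TM           TIMESTAMP,
    ETL_END_TM             TIMESTAMP,
    ETL_PROC_COMPLETE_FLG  INTEGER,                 -- 0 = in process/halted, 1 = complete
    CONSTRAINT PK_ETL_CONTROL PRIMARY KEY (ETL_CONTROL_KEY)
);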

Some key notes about the population of this table are:


- Once the ETL_CONTROL is initially configured (set upon deployment), it is completely maintained by
the ETL process.
- Each time a workflow commences execution, the ETL process will roll the ETL_CONTROL
forward based upon a rule established in the ETL_TRANSFORM_RULE table.
- At successful completion of the workflow, the appropriate row will be closed by updating the
completion flag and ETL end time.
- The ETL_CONTROL dates are subsequently placed into the appropriate parameter file. See the
following sections.

11.2 ETL Transform Rule Data Model

For purposes of supporting ETL control processing, ETL_TRANSFORM_RULE table is defined as follows:
Column | Data Type | Purpose
ETL_TRANSFORM_RULE_KEY | NUMBER | Surrogate key.
RULE_PRIMARY_ID | VARCHAR2(30) | The primary identifier or key.
RULE_SECONDARY_ID | VARCHAR2(30) | The secondary identifier or key.
RULE_VALUE_NBR | NUMBER | A numeric value that is used in the rule, e.g. 1.
RULE_VALUE_DT | DATE | A date that is used in a rule.
RULE_VALUE_TXT | VARCHAR2(255) | The text value that is used for a rule. For ETL control purposes, it will be HOURLY, DAILY, MONTHLY, YESTERDAY (or any other type).
RULE_DESC | VARCHAR2(2056) | A detailed description and purpose of the rule.
VALID_FROM_DT | DATE | The effective start date for the rule, e.g. 01-JAN-1900.
VALID_THRU_DT | DATE | The last day the rule is valid, e.g. 31-DEC-9999.

11.3 Rolling ETL Control Forward

Rolling the ETL Control forward is critical to the automated load process. The ETL process will read the
ETL_CONTROL table for the applicable source and target combination and roll the effective time forward,
based upon rules set forth in the ETL_TRANSFORM_RULE table.
Since the ETL_TRANSFORM_RULE table contains the rules regarding how to increment the effective dates,
refer to the following for the possible increment type situations.
1) Get the latest (most recent EXTRACT_FROM_TM and completed successfully) ETL_CONTROL
based upon the source, target combination (SOURCE_SYS_CD & TARGET_SYS_CD)
2) Get from the ETL_TRANSFORM_RULE table the rule for incrementing by matching the
SOURCE_SYS_CD & TARGET_SYS_CD to RULE_PRIMARY_ID & RULE_SECONDARY_ID
respectively for the valid date range.
3) Increment based upon the following:
RULE_VALUE_TXT - this contains the increment cycle for the source/target. When set to:
HOURLY - the timestamp will roll to the next time period that is not in the future
1. ETL_CONTROL.EXTRACT_FROM_TM is set to the most recent ETL_CONTROL.EXTRACT_THRU_TM + 1 second
2. ETL_CONTROL.EXTRACT_THRU_TM is set to the most recent ETL_CONTROL.EXTRACT_THRU_TM + xx hours (xx = RULE_VALUE_NBR)
3. ETL_CONTROL.LAST_BUSN_PROC_DT is set to EXTRACT_THRU_TM
DAILY - the timestamp will roll to the next time period that is not in the future
1. ETL_CONTROL.EXTRACT_FROM_TM is set to the most recent ETL_CONTROL.EXTRACT_THRU_TM + 1 second
2. ETL_CONTROL.EXTRACT_THRU_TM is set to the most recent ETL_CONTROL.EXTRACT_THRU_TM + xx day(s) (xx = RULE_VALUE_NBR)
3. ETL_CONTROL.LAST_BUSN_PROC_DT is set to EXTRACT_THRU_TM
YESTERDAY - the timestamp will roll through yesterday
1. ETL_CONTROL.EXTRACT_FROM_TM is set to the most recent ETL_CONTROL.EXTRACT_THRU_TM + 1 second
2. ETL_CONTROL.EXTRACT_THRU_TM is set to yesterday's date
3. ETL_CONTROL.LAST_BUSN_PROC_DT is set to EXTRACT_THRU_TM
MONTHLY - the timestamp will roll to the next month that is not in the future
1. ETL_CONTROL.EXTRACT_FROM_TM is set to the first day of the month obtained by adding xx month(s) (xx = RULE_VALUE_NBR) to the most recent ETL_CONTROL.EXTRACT_FROM_TM
2. ETL_CONTROL.EXTRACT_THRU_TM is set to the end of the month for the new EXTRACT_FROM_TM
3. If the dates are in the future, the ETL will abort since it is attempting to process future data

Examples:
HOURLY

The increment frequency is set to HOURLY incrementing by 4 hours.


Column | Recent | New
SOURCE_SYS_CD | PSFT | PSFT
TARGET_SYS_CD | STG | STG
EXTRACT_FROM_TM | 15-OCT-2008 00:00:00 | 15-OCT-2008 04:00:00
EXTRACT_THRU_TM | 15-OCT-2008 03:59:59 | 15-OCT-2008 07:59:59
LAST_BUSN_PROC_DT | 15-OCT-2008 03:59:59 | 15-OCT-2008 07:59:59

DAILY

The increment frequency is set as DAILY incrementing by 1 day.


Column | Recent | New
SOURCE_SYS_CD | PSFT | PSFT
TARGET_SYS_CD | STG | STG
EXTRACT_FROM_TM | 15-JUN-2008 00:00:00 | 16-JUN-2008 00:00:00
EXTRACT_THRU_TM | 15-JUN-2008 23:59:59 | 16-JUN-2008 23:59:59
LAST_BUSN_PROC_DT | 15-JUN-2008 23:59:59 | 16-JUN-2008 23:59:59

MONTHLY

The increment frequency is set as MONTHLY incrementing by 2 months.


Column | Recent | New
SOURCE_SYS_CD | PSFT | PSFT
TARGET_SYS_CD | IDR | IDR
EXTRACT_FROM_TM | 01-MAY-2008 00:00:00 | 01-JUL-2008 00:00:00
EXTRACT_THRU_TM | 30-JUN-2008 23:59:59 | 31-AUG-2008 23:59:59
LAST_BUSN_PROC_DT | 30-JUN-2008 23:59:59 | 31-AUG-2008 23:59:59
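To make the roll-forward concrete, the following hypothetical SQL sketch (Oracle syntax) shows how a new DAILY ETL_CONTROL row could be derived from the most recent successful row. In practice this logic is implemented in the ETL Control start mapping rather than issued as standalone SQL, the sequence name is illustrative, and the 1-day increment stands in for RULE_VALUE_NBR:

INSERT INTO ETL_CONTROL
       (ETL_CONTROL_KEY, SOURCE_SYS_CD, TARGET_SYS_CD,
        EXTRACT_FROM_TM, EXTRACT_THRU_TM, LAST_BUSN_PROC_TM,
        ETL_START_TM, ETL_PROC_COMPLETE_FLG)
SELECT SEQ_ETL_CONTROL.NEXTVAL,                     -- hypothetical surrogate key sequence
       c.SOURCE_SYS_CD,
       c.TARGET_SYS_CD,
       c.EXTRACT_THRU_TM + INTERVAL '1' SECOND,     -- new from = prior thru + 1 second
       c.EXTRACT_THRU_TM + INTERVAL '1' DAY,        -- new thru = prior thru + 1 day
       c.EXTRACT_THRU_TM + INTERVAL '1' DAY,
       SYSTIMESTAMP,
       0                                            -- 0 = run in process
FROM   ETL_CONTROL c
WHERE  c.SOURCE_SYS_CD = 'PSFT'
  AND  c.TARGET_SYS_CD = 'STG'
  AND  c.ETL_PROC_COMPLETE_FLG = 1
  AND  c.EXTRACT_FROM_TM = (SELECT MAX(x.EXTRACT_FROM_TM)
                            FROM   ETL_CONTROL x
                            WHERE  x.SOURCE_SYS_CD = c.SOURCE_SYS_CD
                              AND  x.TARGET_SYS_CD = c.TARGET_SYS_CD
                              AND  x.ETL_PROC_COMPLETE_FLG = 1);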


11.4 Parameter File and ETL_CONTROL Synchronization

Any date-sensitive parameter is updated after the ETL_CONTROL has rolled forward for a run (see prior
sections) and prior to any session processing. The parameter dates are populated from the ETL_CONTROL
using the parameter file initialization mapping (m_PARAMETER_FILE_INITIALIZATION). The synchronization occurs as follows:
1) The ETL_CONTROL is rolled forward (see mapping s_m_EDW_ETL_CONTROL_START) based
upon the parameter source/target values. Note if new date sensitive parameters are required, the
mapping must be changed incorporating the new parameter.
2) A copy of the session that was dropped into a worklet points to its parameter file.
3) The mapping will read the parameter file and detect those strings that contain a date and substitute
the dates that exist in the ETL_CONTROL. See the following table.
Parameter | ETL_CONTROL (after rolled forward)
$$EXT_FROM_TM | EXTRACT_FROM_TM. Note: upon table initialization, this will be set to 1900-01-01 00:00:00.
$$EXT_THRU_TM | EXTRACT_THRU_TM
$$ETL_EFF_FROM_TM | EXTRACT_FROM_TM
$$ETL_EFF_THRU_TM | EXTRACT_FROM_TM - 1 second. Note: upon inserts the ETL logic will use the $$INFINITY_TM parameter, whereas on an update of a row this value ($$ODS_EFF_THRU_TM) is used.
$$LAST_BUSN_PROC_TM | LAST_BUSN_PROC_TM
$$ETL_LAST_MOD_TM | System time
$$ETL_CREATE_TM | System time
$$INFINITY_TM | Set to 9999-12-31 00:00:00

Figure 29 - Parameter and ETL_CONTROL Sync

4) A temporary file is created and subsequently renamed by a command task to the expected
parameter file name
5) The next worklet, in the execution path, will pick up the updates. Note the workflow will continue to
have the old values of the parameter file, whereas the worklets will properly pick up the values in the
updated file.
11.5 ETL Start

The initial ETL Control mapping is key to ensuring ETL processing is executing properly. This mapping fulfills
the following needs:
1) Acquires the most recent ETL_CONTROL for the indicated source and target combination
respectively where the completion is successful.
ETL_CONTROL.SOURCE_SYS_CD = '$$SRC_SYS_CD'
AND ETL_CONTROL.TARGET_SYS_CD = '$$TGT_SYS_CD'
AND ETL_CONTROL.ETL_PROC_COMPLETE_FLG = 1
2) It performs a lookup in ETL_TRANSFORM_RULE to acquire the increment rule (see Section 11.3,
Rolling ETL Control Forward).
3) The process will stop processing when any of the following conditions is met:
a. Could not find a matching ETL_CONTROL row
b. Found an ETL_CONTROL row already existing for the increment start date
c. The increment start date is in the future
d. The increment end date is before the increment start date
4) The dates are incremented and inserted as a new row into ETL_CONTROL

Figure 30 - Start ETL Control Mapping


11.6 ETL End

The ending ETL Control mapping simply closes out the Workflow run by recording relevant completion code
and timing:
1) Acquires the most recent ETL_CONTROL for the indicated source and target combination
respectively where the completion is in process.
ETL_CONTROL.SOURCE_SYS_CD = '$$SRC_SYS_CD'
AND ETL_CONTROL.TARGET_SYS_CD = '$$TGT_SYS_CD'
AND ETL_CONTROL.ETL_PROC_COMPLETE_FLG = 0
2) The process will stop processing when any of the following conditions is met:
a. Could not find a matching ETL_CONTROL row
b. Multiple rows are returned
3) The relevant completion time and status is updated in the ETL_CONTROL row.
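A hypothetical SQL sketch of the close-out performed by this mapping (in practice the update is issued through the mapping's target rather than as standalone SQL, and the source/target codes shown are illustrative):

UPDATE ETL_CONTROL
SET    ETL_END_TM            = SYSTIMESTAMP,
       ETL_PROC_COMPLETE_FLG = 1              -- mark the run as successfully completed
WHERE  SOURCE_SYS_CD         = 'PSFT'
  AND  TARGET_SYS_CD         = 'STG'
  AND  ETL_PROC_COMPLETE_FLG = 0;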

Figure 31 - End ETL Control Mapping


11.7 ETL Control Workflow

The ETL Control components in the workflows orchestrate the flow of incoming data. Essentially, there are
three events that drive the process: first, start the ETL process rolling forward the ETL_CONTROL; second,
initialize the parameter file based upon ETL_CONTROL; and finally, close out the ETL_CONTROL recording
completion.
The following worklet diagram depicts the execution of the ETL roll forward and parameter initialization
process.

[Worklet flow: Start -> s_m_ODS_LOAD_ETL_CONTROL_START -> s_m_ODS_LOAD_PARAMETER_FILE_INITIALIZATION -> c_ODS_Move_New_Control_File]

Figure 32 - Sample Worklet to Roll Forward ETL_CONTROL & Initialize Parameter File

The following diagram depicts the worklet that marks the completion of the ETL process.

[Worklet flow: Start -> s_m_ODS_LOAD_ETL_CONTROL_END]

Figure 33 - Sample Worklet for ETL_CONTROL Completion

Finally, the following workflow depicts the start and end worklets that drive the ETL process.

[Workflow flow: Start -> wl_ODS_START -> wl_ODS_1 -> wl_ODS_2 -> wl_ODS_3 -> wl_ODS_END, with link conditions between each worklet]

Figure 34 - Sample Workflow Depicting ETL Flow

12. UNIX

12.1 UNIX Data Directories

The Informatica installation (INFA_HOME) is in /opt/Informatica/PowerCenter8.1.1. The project specific


folders and Unix scripts will reside under $INFA_HOME/server/infa_shared. The Informatica installation
and working directories could reside on a disk mounted separately from the sftp directories. This will ensure
that if there is a disk failure on the SFTP drive, Informatica can continue processing any files that have been
moved to the working directory.
Additionally, an archive directory will also exist on the SFTP drive. After an Informatica file has been
successfully processed through an entire ETL cycle it will be moved to the archive directory and compressed.
It should not be archived before this time because, until a cycle is complete, there is no certainty that a file or file set
is valid. Only after it has been processed successfully can it be classified as a good file, and only good files
should be archived and compressed.
The following image represents the shared folder structure of the Informatica installation directory. The
shared folder structure contains folders that, when represented with Informatica parameters, will house data
used in the ETL processing.
The server will have a base set of folders and each project, corresponding to an Informatica folder, will have
its own set of folders as well. The following is the listing of the base set of directories:
TgtFiles
BWParam
Backup
LkpFiles
SrcFiles
BadFiles
WorkflowLogs
Cache
Storage
Log
Temp
SessLogs
In the Production and QA environments these folder structures will need to be restricted to the userid and/or
group which the userid executing the Informatica code is utilizing. In the development environment, each
developer will need to have read/edit/execute (7) level access to these folders in order to accommodate
development and testing scenarios. To accommodate this each developer will need to have a userid in the
group which owns the folder (if 775 is used as above) or the access will need to be updated to be 777.

12.2 UNIX Landing Directories

The UNIX landing zones are directories in which ftp files from the source systems will be stored. There are
two classifications of landing zones. There will be inbound and outbound directories for each application.
There must be specific SFTP users created for performing the SFTP of files.
1. SFTP Inbound: /fmacdata/{project}/datain
2. SFTP Outbound: /fmacdata/{project}/dataout/
12.3 UNIX Informatica Environment Variables

The admin user's .profile has the variables and paths, such as INFA_HOME, set to allow Informatica system
commands to run.
Manual startup and shutdown of the Informatica domain and services is performed with the infaservice.sh script:
$INFA_HOME/server/tomcat/bin/infaservice.sh
12.4 UNIX Logging

Custom script logs will reside under the /scripts/logs folder.


12.5 UNIX Naming
Unless there is a standard shell used, all shell scripts should have a .sh extension.
Example: fdw_archive.sh

13. File Processing

13.1 File Types

It is expected that the content of the data extracts provided by various sources will reflect the data in terms of
timing, subject and extraction criteria. There are typically two differing extract formats.
Note - once the file formats have been determined, a detailed description of each format should be
communicated (file name, format, field characteristics, field description, etc.).

- Master file - master or reference data, such as WBS
- Transaction file - transactions, such as financials

13.2 File Format and Transfer Validation

In order to perform validation for transmitted files, the following items are helpful:
Master File
- As of Date - the end period for which the reference data is applicable
- Row Count - count of master rows in the file or checksum of the file
- Checksum - checksum of the file
Transaction File
- Date or Date Range - the period for which the transactions are applicable
- Row Count - count of transactions in the file
- Balance Total - a grand total field that represents a summation of a key value for all records in the file
- Checksum - checksum of the file
There are a couple of options that can be utilized to fulfill these requirements.
1) Use of header, detail and trailer records in one file
2) Two separate files, a summary trigger/signal file consisting of the as of or range date(s), row count
and hash total and a data file that contains the actual business data.
The actual format will be determined depending on the capability of the extraction toolset.
13.3 File Naming

The naming of the incoming file is critical in understanding the source of that file. File names should contain a
timestamp when possible and should also have an appropriate extension. For example:
1) Data file - if a relational source, name the file with the table name appended with a .dat extension (e.g.
source.dat). For a non-relational source, use a meaningful name appended with an appropriate extension.
2) Trigger file - if a relational source, use the source table name appended with an .ind extension (e.g.
source.ind). When the trigger file represents a series of files, the name should be recognizable (e.g.

13.4 File Notification

The fact that the source file has successfully landed in the landing zone should trigger an event that notifies
the load process. There are several options for triggering this event:
1) Event - periodically look for the presence of a file
2) Schedule - a time at which the file is expected to be there
The preferred option is to be notified immediately of the file's presence, to promote efficient and timely
processing rather than waiting. The event can be triggered in a couple of ways:
1) Scheduling tool - the source system scheduling tool will trigger an event or process that can be
recognized by the Informatica environment to subsequently launch a workflow. An integrated
corporate scheduling tool can notify a server to launch an Informatica workflow (via a UNIX
command line).
2) Informatica event - an Informatica workflow will contain an event that continually monitors the
landing zone for the presence of a file. The workflow is constantly polling, and once the file shows up,
the workflow will commence processing. After successful completion, the Informatica Workflow
Manager will reschedule the workflow to await the presence of the file.
a. Trigger file - the event can poll for a file that contains no business data but is used only as
an indication that data is ready for processing. This is useful when waiting for a collection of
files; the trigger file will show up only when all files are ready for processing.
b. Rename of the incoming data file - this is simply the source system renaming the
extension of the data file upon successful completion. For example, the file may be named
*.tmp during transfer and renamed to *.dat when complete.
13.5 File Landing

When files are pushed from the source system, each file should be placed in the appropriate landing
directory. The subsequent Informatica processes will acquire the file and move it to a location on the
Informatica service for load processing.
13.6 File Archiving

Every incoming file will be archived for recovery and audit purposes. Based upon the design requirements,
each file is retained for a period. Essentially, the file is acquired, moved to the working folder, processed and
then archived. Files may also be tar'd and compressed (gzip).
The archived files are typically organized in a similar fashion to the incoming sources; hence, there would be
an archive folder for each source.
When archiving each file or directory, the name should reflect the file or directory and timeframe for which the
files are relevant. For example:
1) File [file name][timestamp].ext (e.g. sbo_2008110113:30:45.tar)
2) Directory [directory name][timestamp] (e.g. lms_20081103144500.tar)
The process by which a file is processed and then archived should consist of:
1) Move the file from the landing zone into the Informatica working directory
2) At the end of the workflow, a UNIX script will archive all of the files within the scope of the workflow
a. Keep the most recent files in the working directory
b. TAR and/or ZIP the referenced files, placing them in the appropriate directory

14. Scheduling

14.1 Informatica Workflow

The Informatica Workflow is a self-contained scheduler that integrates the ETL repository and processing
within its own scheduling function. Depending on enterprise standards, the Workflow may be used to
schedule the entire ETL process. At a minimum, all mappings must belong to at least one workflow.
14.2 Corporate Scheduler

The client's corporate standard is AutoSys. Since it is an enterprise scheduler that can integrate jobs
across distributed servers, it may be used to encapsulate the Informatica workflows. The Informatica
pmcmd command can be wrapped in a shell script for further control of the job. This may also provide more
insight to production support personnel who are exclusively monitoring production jobs via AutoSys.
14.3 Data Warehouse Administration Console (DAC)

The DAC is a centralized console that allows configuration, administration, loading, and monitoring of the
Oracle Business Analytics Warehouse. This includes execution plans that run pre-built Informatica mappings
through individual workflows. The DAC will be used to schedule the pre-built custom Informatica mappings
and any customization made for the Oracle Business Analytics Warehouse. The execution plans can be
executed via a corporate scheduler (AutoSys).
