
Informatica Basics

Data Integration Life Cycle


Overview
Data Integration Life Cycle

1. Access

• All data must be accessed, regardless of its source or structure.
• Data must be extracted from sources as arcane as:
  – Mainframe systems
  – Relational databases
  – Applications
  – XML
  – Messages
  – Spreadsheets
• Informatica products used for data access:
  – Informatica PowerExchange
  – Informatica B2B Data Exchange

Informatica PowerExchange

• Access and update enterprise data, without needing specialized programming skills, in:
  – Major enterprise and packaged applications, whether on-premise, outsourced, or hosted software as a service (SaaS)
  – All major enterprise database systems and data warehousing environments
  – Mainframe systems
  – Midrange systems
  – Message-oriented middleware (MOM)
  – Industry-wide technology standards such as email, JMS, LDAP, and Web services

Access Major Enterprise and Packaged Applications (Supported Applications)

Database and Data Warehouse Environment

Get Secure, Scalable, Real-Time Access to Mainframe Data

• You can rely on PowerExchange to provide secure, scalable, real-time access to your mainframe data.
Message-Oriented Middleware

• You can rely on PowerExchange to provide secure, scalable, real-time access to message queues.
• Supported systems
PowerExchange Features
Accelerated Access to Enterprise Application Data

• No specialized programming skills required to access or update application-managed data
• Designed for minimal administration effort
• Use flexible object filtering techniques to rapidly locate needed data
• Use a point-and-click interface to speed development and reduce errors

Optimized for Each Application

• Support native datatypes and special features of each application
• Use available native interfaces and high-speed APIs
• Expose the capabilities of each application with a purpose-built GUI
PowerExchange Features

Fully Integrated with the Informatica Platform

• Share available metadata automatically with the entire Informatica platform
• Leverage PowerCenter’s reusability, failure recovery, and high performance
• Interface with Informatica Data Quality to optionally cleanse application-managed data
• Enable further enrichment with data from other PowerExchange sources
Informatica B2B Data Exchange

• Informatica B2B Data Exchange is industry-leading software for multi-enterprise data integration.
• It adds secure communication, management, and monitoring capabilities to handle data from internal and external sources.
Informatica B2B Data Exchange Features
Universal Data Transformation

• Provides a single, centralized, consistent, and reusable data transformation service that enables true “any to any” data transformation
• Supplies unique support for:
  – Binary documents (e.g., PDF, Excel, Word)
  – Printing formats (e.g., AFP and PostScript)
  – Large batch files and real-time messages
• Features full support for complex flat files, including deep hierarchy, complex looping, delimited, and fixed- and variable-width data
• Supports complex hierarchical industry standards: HIPAA, HL7, NCPDP, ACORD, DTCC, MVR, EDI, EDIFACT, SWIFT, FIX, NACHA, Telekurs
• Provides native support for XML
Data Integration Life Cycle (Discover)

• Data sources—particularly poorly documented or unknown sources—must be profiled to understand their content and structure.
• Patterns and rules implicit in the data must be inferred.
• Potential data quality issues must be flagged.
• Informatica product used for Discover:
  – Informatica Data Explorer
Informatica Data Explorer

• Informatica Data Explorer combines powerful data profiling and data mapping capabilities, enabling data analysts and data stewards to investigate, document, and resolve data quality issues.
• Informatica Data Explorer provides a complete set of data investigation, discovery, and mapping tools to scan every single data record from any source.
• Informatica Data Explorer also helps improve and maintain data quality over time.
• Informatica Data Explorer can surface the following information about data used in applications, databases, and business systems:
  – Gaps
  – Data inconsistencies
  – Data redundancies
  – Data inaccuracies
Data Profiling and Data Mapping

Sophisticated Data Profiling Capabilities

• Analyze data in three dimensions to automatically profile the content, structure, and quality of highly complex data structures
• Discover hidden inconsistencies and incompatibilities between data sources and target applications
• Automatically apply more than 500 prebuilt profiling rules
• Easily customize new rules for automatically profiling new data entries

Robust Data Mapping Capabilities

• Generate accurate source-to-target mappings between different data structures and define the necessary transformation specifications
• Compare actual data and metadata sources to target application requirements
• Identify data gaps, redundancies, and inaccuracies to be resolved before moving the data
• Identify data anomalies and create a normalized schema capable of supporting the data
Data Integration Life Cycle (Cleanse)

• Data must be cleansed to ensure its quality, accuracy, and completeness.
• Errors or omissions must be addressed.
• Data standards must be enforced, and values must be validated.
• Duplicate data entries must be eliminated.
• Informatica products used for Cleanse:
  – Informatica Data Quality
  – Informatica Identity Resolution
Informatica Data Quality

• Informatica Data Quality provides powerful data analysis, data cleansing, data matching, exception handling, and reporting and monitoring capabilities that enable IT and the business to manage enterprise-wide data quality initiatives.
Data Quality Features

Easy-to-Use Data Quality Workbench

• Design, build, and manage enterprise-wide data quality programs using this intuitive interface
• Deploy data quality rules in real time and as batch processes to drive ongoing data quality processes
• Review and manually correct exceptions
• Assess match clusters and manually select a master or “golden” record
• Create reports and dashboards to monitor data quality improvement

Robust Data Quality Profiling Capabilities

• Use business rules and reference data to analyze and rank data according to completeness, conformity, consistency, duplication, integrity, and accuracy
• Identify, categorize, and quantify low-quality data
Data Quality Features

Comprehensive Data Cleansing and Parsing Capabilities

• Cleanse, standardize, validate, enhance, and enrich all types of master data
• Standardize and validate mailing addresses for a wide range of countries
• Use business rules and reference data dictionaries to parse and standardize free-form text data elements

Flexible Data Matching Capabilities

• Identify relationships between data records to eliminate duplicates before consolidation
• Gain transparency and control over data quality using components that can be applied to any data field
• Apply data matching to any combination of master datatypes
Data Quality Features

Open, Content-Based Reference Data Dictionaries

• Analyze and standardize content and implement data quality business rules using reference dictionaries
• Take advantage of extensive reference content
• Leverage full read/write capability and familiar Microsoft Excel-like functionality
• Create, edit, and enhance reference data dictionaries at any time
• Use data quality reports and scorecards to track improvements
Data Integration Life Cycle (Integrate)

• To maintain a consistent view of data across all systems, data must be integrated and transformed to reconcile discrepancies in the way different systems define and structure various data elements.
• For example, the marketing and finance systems may have completely different business definitions and data formats for “customer profitability,” and these differences require resolution.
• Learn more: Informatica PowerCenter
Informatica PowerCenter

• Informatica PowerCenter is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system in any format, and delivering that data throughout the enterprise at any speed.
• Highly available, high-performing, and fully scalable, PowerCenter serves as the foundation for all data integration projects and enterprise integration initiatives across the enterprise, including:
  – B2B data exchange
  – Data governance
  – Data migration
  – Data synchronization and replication
  – Enterprise data warehousing
  – Integration Competency Centers (ICC)
  – Master data management (MDM)
  – Service-oriented architectures (SOA)
Data Integration Life Cycle (Deliver)

• The right data must be delivered in the right format, at the right time, to all the applications and users that need it.
• Delivering data can range from a single data element or record in support of a real-time business operation to millions of records for trend analysis and enterprise reporting.
• It also involves delivering inactive data to archives/history databases and provisioning masked subsets of production data for non-production systems.
• Data must be both highly available and secure in its delivery.
• Learn more: Informatica PowerExchange, Informatica B2B Data Exchange, Informatica Data Archive
Data Integration Life Cycle (Audit, Manage, and Monitor)

• Data stewards and IT administrators need to collaborate to audit, manage, and monitor data. Key metrics, such as data quality, are constantly measured with an eye toward steady improvement over time.
• The goal is to track progress on key data attributes and flag any new issues for resolution and continual improvement once data is fed back into the data integration life cycle.
• The Informatica Platform provides shared metadata to document where your data is, as well as the business rules and logic associated with your data. The Platform shows the impact of potential changes, which helps all roles respond more quickly and cost-effectively to change.

Data Integration Life Cycle (Define, Design, and Develop)

• Business analysts, data architects, and IT developers need a powerful set of tools to help them collaborate on defining, designing, and developing data integration rules and processes.
• The Informatica Platform includes a common set of integrated tools to make sure all people are working together effectively. The Platform also ensures that metadata is shared and consistent across all data integration roles.
Informatica PowerCenter
Course Objectives

At the end of this course you will:

• Understand how to use all major PowerCenter 8.6 components
• Be able to perform basic Repository administration tasks
• Be able to build basic ETL Mappings and Mapplets
• Be able to create, run, and monitor Workflows
• Understand available options for loading target data
• Be able to troubleshoot most problems
Extract, Transform, and Load

[Diagram: ETL moves data from Operational Systems (RDBMS, mainframe, other) into the Decision Support Data Warehouse.]

Operational Systems
• Transaction-level data
• Optimized for transaction response time
• Current
• Normalized or de-normalized data

ETL (Extract, Transform, Load)
• Cleanse data
• Consolidate data
• Apply business rules
• De-normalize
• Aggregate data

Data Warehouse
• Aggregated data
• Historical data
PowerCenter Architecture

PowerCenter Architecture

[Diagram: heterogeneous sources connect natively to the PowerCenter Server, which writes natively to heterogeneous targets. The client tools (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Repository Server Administrative Console) communicate over TCP/IP with the Repository Services, which access the Repository through the Repository Agent.]
Not shown: client ODBC connections for source and target metadata
PowerCenter 8.6 Components

• PowerCenter Domain
• PowerCenter Node
• PowerCenter Repository Services
• PowerCenter Integration Services
• PowerCenter Reporting Services
• PowerCenter Client
  – Designer
  – Repository Manager
  – Administration Console
  – Workflow Manager
  – Workflow Monitor
• External Components
  – Sources
  – Targets
Repository Topics

By the end of this section you will be familiar with:

• The purpose of the Repository Server and Agent
• The Repository Server Administration Console GUI interface
• The Repository Manager GUI interface
• Repository maintenance operations
• Security and privileges
• Object sharing, searching and locking
• Metadata Extensions
Repository Server
• Each Repository has an independent architecture for the management of the physical Repository tables
• Components: one Repository Server, and a Repository Agent for each Repository
• Client overhead for Repository management is greatly reduced by the Repository Server

[Diagram: the Repository Manager and the Repository Server Administration Console connect to the Repository Server, which runs a Repository Agent for each Repository it manages.]
Repository Server Features

• Manages connections to the Repository from client applications
• Can manage multiple Repositories on different machines on a network
• Uses one Repository Agent process per managed Repository to insert, update and fetch objects from the Repository database tables
• Maintains object consistency by controlling object locking

The Repository Server runs on the same system running the Repository Agent
Repository Server Administration Console

Use the Repository Server Administration Console to administer Repository Servers and Repositories through the Repository Server. The following tasks can be performed:

• Add, edit and remove Repository configurations
• Export and import Repository configurations
• Create a Repository
• Promote a local Repository to a global Repository
• Copy a Repository
• Delete a Repository from the database
• Back up and restore a Repository
• Start, stop, enable and disable a Repository
• View Repository connections and locks
• Close Repository connections
• Upgrade a Repository
Repository Server Administration Console

[Screenshot: console tree with nodes, an HTML information view, and hypertext links to Repository maintenance tasks.]
Repository Management
• Perform all Repository maintenance tasks through the Repository Server from the Repository Server Administration Console
• Create the Repository configuration
• Select a Repository configuration and perform maintenance tasks:
  – Create
  – Delete
  – Backup
  – Copy from
  – Disable
  – Export Connection
  – Make Global
  – Notify Users
  – Propagate
  – Register
  – Restore
  – Un-Register
  – Upgrade
Repository Manager

Use the Repository Manager to navigate through multiple folders and repositories, and to perform the following tasks:

• Manage the Repository
  – Launch the Repository Server Administration Console for this purpose
• Implement Repository security
  – Manage users and user groups
• Perform folder functions
  – Create, edit, copy and delete folders
• View metadata
  – Analyze Source, Target, Mapping and Shortcut dependencies
Repository Manager Interface

[Screenshot: Navigator window, Main window, Dependency window, and Output window.]
Users, Groups and Repository Privileges
Steps:

• Create groups
• Create users
• Assign users to groups
• Assign privileges to groups
• Assign additional privileges to users (optional)
Managing Privileges

Check box assignment of privileges

Folder Permissions

• Assign one user as the folder owner for first-tier permissions
• Select one of the owner’s groups for second-tier permissions
• All users and groups in the Repository will be assigned the third-tier permissions
Object Locking
• Object locks preserve Repository integrity
• Use the Edit menu for viewing locks and unlocking objects
Object Searching
(Menu- Analyze – Search)

• Keyword search
  – Limited to keywords previously defined in the Repository (via the Warehouse Designer)
• Search all
  – Filter and search objects
Object Sharing
• Reuse existing objects
• Enforces consistency
• Decreases development time
• Share objects by using copies and shortcuts

COPY                                      SHORTCUT
Copy object to another folder             Link to an object in another folder
Changes to original object not captured   Dynamically reflects changes to original object
Duplicates space                          Preserves space
Copy from shared or unshared folder       Created from a shared folder

Required security settings for sharing objects:
• Repository privilege: Use Designer
• Originating folder permission: Read
• Destination folder permissions: Read/Write
Adding Metadata Extensions

• Allows developers and partners to extend the metadata stored in the Repository
• Accommodates the following metadata types:
  – Vendor-defined: third-party application vendor-created metadata lists
    (for example, applications such as Ariba or PowerConnect for Siebel can add information such as contacts, version, etc.)
  – User-defined: PowerCenter/PowerMart users can define and create their own metadata
• Requires the Administrator Repository or Super User Repository privilege
Sample Metadata Extensions

Sample user-defined metadata, e.g. contact information, business user

Reusable Metadata Extensions can also be created in the Repository Manager
Design Process

1. Create Source definition(s)
2. Create Target definition(s)
3. Create a Mapping
4. Create a Session Task
5. Create a Workflow from Task components
6. Run the Workflow
7. Monitor the Workflow and verify the results
Source Object Definitions

By the end of this section you will:

• Be familiar with the Designer GUI interface
• Be familiar with Source Types
• Be able to create Source Definitions
• Understand Source Definition properties
• Be able to use the Data Preview option
Source Analyzer

[Screenshot: Designer tools, Analyzer window, and Navigation window.]
Methods of Analyzing Sources

• Import from Database
• Import from File
• Import from COBOL File
• Import from XML File
• Create manually

[Diagram: relational, XML file, flat file, and COBOL file sources are analyzed in the Source Analyzer and stored in the Repository.]
Analyzing Relational Sources
[Diagram: the Source Analyzer reads a relational source (table, view, or synonym) via ODBC; the definition (DEF) is stored in the Repository through the Repository Server and Repository Agent over TCP/IP and a native connection.]
Analyzing Relational Sources
Editing Source Definition Properties

Analyzing Flat File Sources
[Diagram: the Source Analyzer reads a flat file (fixed-width or delimited) from a mapped drive, NFS mount, or local directory; the definition is stored in the Repository through the Repository Server and Repository Agent.]
Flat File Wizard

• Three-step wizard
• Columns can be renamed within the wizard
• Text, Numeric and Datetime datatypes are supported
• The wizard ‘guesses’ the datatype
XML Source Analysis
[Diagram: the Source Analyzer reads a .DTD file from a mapped drive, NFS mount, or local directory; the definition is stored in the Repository.]

In addition to the DTD file, an XML Schema or XML file can be used as a Source Definition
Analyzing VSAM Sources
[Diagram: the Source Analyzer reads a .CBL copybook file from a mapped drive, NFS mount, or local directory; the definition is stored in the Repository.]

Supported numeric storage options: COMP, COMP-3, COMP-6
VSAM Source Properties

Target Object Definitions

By the end of this section you will:

• Be familiar with Target Definition types
• Know the supported methods of creating Target Definitions
• Understand individual Target Definition properties
Creating Target Definitions

Methods of creating Target Definitions:

• Import from Database
• Import from an XML file
• Manual creation
• Automatic creation
Automatic Target Creation

Drag-and-drop a Source Definition into the Warehouse Designer workspace
Import Definition from Database
Can “reverse engineer” existing object definitions from a database system catalog or data dictionary

[Diagram: the Warehouse Designer imports a table, view, or synonym definition from the database via ODBC; the definition is stored in the Repository.]
Manual Target Creation
1. Create an empty definition
2. Add the desired columns
3. Finished target definition

ALT-F can also be used to create a new column
Target Definition Properties
Creating Physical Tables

[Diagram: logical Repository target table definitions become physical target database tables when SQL is executed via the Designer.]
Creating Physical Tables
Create tables that do not already exist in the target database:

• Connect: connect to the target database
• Generate SQL file: create DDL in a script file
• Edit SQL file: modify the DDL script as needed
• Execute SQL file: create the physical tables in the target database

Use Preview Data to verify the results (right-mouse-click on the object)
Transformation Concepts

By the end of this section you will be familiar with:

• Transformation types and views
• Transformation calculation error treatment
• Null data treatment
• Informatica datatypes
• Expression transformation
• Expression Editor
• Informatica functions
• Expression validation
Transformation Types
Informatica PowerCenter provides the following objects for data transformation (23 in PowerCenter 7):

• Aggregator: performs aggregate calculations
• Application Source Qualifier: reads application object sources such as ERP
• Custom: calls a procedure in a shared library or DLL
• Expression: performs row-level calculations
• External Procedure (TX): calls compiled code for each row
• Filter: drops rows conditionally
• Joiner: joins heterogeneous sources
• Lookup: looks up values and passes them to other objects
• Normalizer: reorganizes records from VSAM, relational and flat file sources
• Rank: limits records to the top or bottom of a range
• Input: defines mapplet input rows; available in the Mapplet Designer
• Output: defines mapplet output rows; available in the Mapplet Designer
Transformation Types

• Router: splits rows conditionally
• Sequence Generator: generates unique ID values
• Sorter: sorts data
• Source Qualifier: reads data from flat file and relational sources
• Stored Procedure: calls a database stored procedure
• Transaction Control: defines commit and rollback transactions
• Union: merges data from different databases
• Update Strategy: tags rows for insert, update, delete, reject
• XML Generator: reads data from one or more input ports and outputs XML through a single output port
• XML Parser: reads XML from one or more input ports and outputs data through a single output port
• XML Source Qualifier: reads XML data
Transformation Views

A transformation has three views:

• Iconized: shows the transformation in relation to the rest of the mapping
• Normal: shows the flow of data through the transformation
• Edit: shows transformation ports and properties; allows editing
Edit Mode
Allows users with folder “write” permission to change or create transformation ports and properties:

• Define transformation-level properties
• Define port-level handling
• Enter comments
• Make reusable
• Switch between transformations
Expression Transformation

Performs calculations using non-aggregate functions (row level)

Passive Transformation
Connected

Ports
• Mixed
• Variables allowed

Create the expression in an output or variable port (click the expression field to invoke the Expression Editor)

Usage
• Perform the majority of data manipulation
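For example, a row-level output port expression might look like this (the PRICE and QUANTITY port names are illustrative):

   ROUND(PRICE * QUANTITY, 2)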
Expression Editor
• An expression formula is a calculation or conditional statement
• Used in the Expression, Aggregator, Rank, Filter, Router and Update Strategy transformations
• Performs calculations based on ports, functions, operators, variables, literals, constants and return values from other transformations
Informatica Functions - Samples

Character Functions

• Used to manipulate character data
• CHRCODE returns the numeric value (ASCII or Unicode) of the first character of the string passed to this function
• CONCAT is retained for backwards compatibility only; use || instead

Functions: ASCII, CHR, CHRCODE, CONCAT, INITCAP, INSTR, LENGTH, LOWER, LPAD, LTRIM, RPAD, RTRIM, SUBSTR, UPPER, REPLACESTR, REPLACECHR
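A quick sketch of these functions in an output port (the CUST_NAME port is illustrative):

   INITCAP(LTRIM(RTRIM(CUST_NAME)))   -- trim whitespace, then capitalize each word
   'ID-' || SUBSTR(CUST_NAME, 1, 3)   -- concatenate with the || operator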
Informatica Functions

Conversion Functions

• Used to convert datatypes
• Functions: TO_CHAR (numeric), TO_DATE, TO_DECIMAL, TO_FLOAT, TO_INTEGER, TO_NUMBER

Date Functions

• Used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date
• To pass a string to a date function, first use the TO_DATE function to convert it to a date/time datatype
• Functions: ADD_TO_DATE, DATE_COMPARE, DATE_DIFF, GET_DATE_PART, LAST_DAY, ROUND (date), SET_DATE_PART, TO_CHAR (date), TRUNC (date)
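For instance, converting a string and shifting it by one day (the ORDER_DATE_STR port name and format string are illustrative):

   ADD_TO_DATE(TO_DATE(ORDER_DATE_STR, 'MM/DD/YYYY'), 'DD', 1)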
Informatica Functions
Numerical Functions

• Used to perform mathematical operations on numeric data
• Functions: ABS, CEIL, CUME, EXP, FLOOR, LN, LOG, MOD, MOVINGAVG, MOVINGSUM, POWER, ROUND, SIGN, SQRT, TRUNC

Scientific Functions

• Used to calculate geometric values of numeric data
• Functions: COS, COSH, SIN, SINH, TAN, TANH
Informatica Functions
Special Functions: ERROR, ABORT, DECODE, IIF

• Used to handle specific conditions within a session, search for certain values, and test conditional statements
• IIF(condition, true_result, false_result)

Test Functions: ISNULL, IS_DATE, IS_NUMBER, IS_SPACES

• Used to test if a lookup result is null
• Used to validate data

Encoding Functions: SOUNDEX, METAPHONE

• Used to encode string values
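A short sketch of the conditional functions (the DISCOUNT and STATUS ports are illustrative):

   IIF(ISNULL(DISCOUNT), 0, DISCOUNT)
   DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown')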
Expression Validation

The Validate or ‘OK’ button in the Expression Editor will:

• Parse the current expression
  – Remote port searching (resolves references to ports in other transformations)
• Parse transformation attributes
  – e.g. filter condition, lookup condition, SQL query
• Parse default values
• Check spelling, the correct number of arguments in functions, and other syntactical errors
Variable Ports
• Use to simplify complex expressions
  – e.g. create and store a depreciation formula to be referenced more than once
• Use in another variable port or an output port expression
• Local to the transformation (a variable port cannot also be an input or output port)
• Available in the Expression, Aggregator and Rank transformations
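A minimal sketch of variable-port reuse (all port names are illustrative): the variable ports are evaluated once per row, in port order, and then referenced by the output ports.

   v_GROSS (variable) : PRICE * QUANTITY
   v_TAX   (variable) : v_GROSS * 0.19
   o_TAX   (output)   : v_TAX
   o_NET   (output)   : v_GROSS - v_TAX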
Informatica Data Types
NATIVE DATATYPES                           TRANSFORMATION DATATYPES
Specific to the source and target          PowerMart/PowerCenter internal datatypes
database types                             based on ANSI SQL-92
Display in source and target tables        Display in transformations within
within the Mapping Designer                the Mapping Designer

[Diagram: data flows native → transformation → native.]

• Transformation datatypes allow mix-and-match of source and target database types
• When connecting ports, native and transformation datatypes must be compatible (or must be explicitly converted)
Datatype Conversions
From\To   Integer  Decimal  Double  Char  Date  Raw
Integer      X        X       X      X
Decimal      X        X       X      X
Double       X        X       X      X
Char         X        X       X      X     X
Date                                 X     X
Raw                                              X

• All numeric data can be converted to all other numeric datatypes, e.g. integer, double, and decimal
• All numeric data can be converted to string, and vice versa
• Date can be converted only to date and string, and vice versa
• Raw (binary) can only be linked to raw
• Other conversions not listed above are not supported
• These conversions are implicit; no function is necessary
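When an explicit conversion is needed instead, the conversion functions shown earlier apply (the AMOUNT_STR and ORDER_DATE port names are illustrative):

   TO_DECIMAL(AMOUNT_STR, 2)            -- string to decimal with scale 2
   TO_CHAR(ORDER_DATE, 'YYYY-MM-DD')    -- date to string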
Mappings

By the end of this section you will be familiar with:

• Mapping components
• The Source Qualifier transformation
• Mapping validation
• Data flow rules
• System variables
• Mapping parameters and variables
Mapping Designer

[Screenshot: transformation toolbar, mapping list, and an iconized mapping.]
Pre-SQL and Post-SQL Rules

• Can use any command that is valid for the database type; no nested comments
• Can use mapping parameters and variables in SQL executed against the source
• Use a semi-colon (;) to separate multiple statements
• The Informatica Server ignores semi-colons within single quotes, double quotes or within /* ... */
• To use a semi-colon outside of quotes or comments, ‘escape’ it with a back slash (\)
• The Workflow Manager does not validate the SQL
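For example, a pre-SQL property might hold two statements separated by a semi-colon, while an escaped semi-colon (\;) would instead be passed to the database as a literal character within a single statement (the table names are illustrative):

   DELETE FROM STG_ORDERS; INSERT INTO STG_AUDIT (NOTE) VALUES ('load started')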
Data Flow Rules
• Each Source Qualifier starts a single data stream (a dataflow)
• Transformations can send rows to more than one transformation (split one data flow into multiple pipelines)
• Two or more data flows can meet together, if (and only if) they originate from a common active transformation
  – Cannot add an active transformation into the mix

[Diagram: merging pipelines through passive transformations downstream of a common Source Qualifier is allowed; merging through an active transformation is disallowed.]

The example holds true with a Normalizer in lieu of the Source Qualifier. Exceptions are the Mapplet Input and Joiner transformations.
Connection Validation

Examples of invalid connections in a Mapping:

• Connecting ports with incompatible datatypes
• Connecting output ports to a Source
• Connecting a Source to anything but a Source Qualifier or Normalizer transformation
• Connecting an output port to an output port, or an input port to another input port
• Connecting more than one active transformation to another transformation (invalid dataflow)
Mapping Validation
• Mappings must:
  – Be valid for a Session to run
  – Be end-to-end complete and contain valid expressions
  – Pass all data flow rules
• Mappings are always validated when saved; they can be validated without being saved
• The Output window will always display the reason for invalidity
Workflows
By the end of this section, you will be familiar with:

• The Workflow Manager GUI interface
• Workflow Schedules
• Setting up Server Connections
  – Relational, FTP and External Loader
• Creating and configuring Workflows
• Workflow properties
• Workflow components
• Workflow Tasks
Workflow Manager Interface

[Screenshot: task toolbar, Workflow Designer tools, workspace, Navigator window, Output window, and status bar.]
Workflow Manager Tools

• Workflow Designer
  – Maps the execution order and dependencies of Sessions, Tasks and Worklets for the Informatica Server
• Task Developer
  – Create Session, Shell Command and Email tasks
  – Tasks created in the Task Developer are reusable
• Worklet Designer
  – Creates objects that represent a set of tasks
  – Worklet objects are reusable
Workflow Structure

• A Workflow is a set of instructions for the Informatica Server to perform data transformation and load
• Combines the logic of Session Tasks, other types of Tasks and Worklets
• The simplest Workflow is composed of a Start Task, a Link and one other Task

   Start Task --(Link)--> Session Task
Workflow Scheduler Objects

• Set up reusable schedules to associate with multiple Workflows
  – Used in Workflows and Session Tasks
Server Connections
• Configure Server data access connections
  – Used in Session Tasks

Connection types to configure:
1. Relational
2. MQ Series
3. FTP
4. Custom
5. External Loader
Relational Connections (Native )
• Create a relational (database) connection
  – Instructions to the Server to locate relational tables
  – Used in Session Tasks
Relational Connection Properties
• Define a native relational (database) connection:
  – User name/password
  – Database connectivity information
  – Rollback segment assignment (optional)
  – Optional environment SQL (executed with each use of the database connection)
FTP Connection
• Create an FTP connection
  – Instructions to the Server to FTP flat files
  – Used in Session Tasks
External Loader Connection
• Create an External Loader connection
  – Instructions to the Server to invoke database bulk loaders
  – Used in Session Tasks
Task Developer
• Create basic reusable “building blocks” to use in any Workflow
• Reusable Tasks:
  – Session: a set of instructions to execute Mapping logic
  – Command: specify OS shell / script command(s) to run during the Workflow
  – Email: send email at any point in the Workflow
Session Task
• Server instructions to run the logic of ONE specific Mapping
  – e.g. source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
• Becomes a component of a Workflow (or Worklet)
• If configured in the Task Developer, the Session Task is reusable (optional)
Command Task
• Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
• Becomes a component of a Workflow (or Worklet)
• If configured in the Task Developer, the Command Task is reusable (optional)

Commands can also be referenced in a Session through the Session “Components” tab as Pre- or Post-Session commands
Additional Workflow Components

• Two additional components are Worklets and Links
• Worklets are objects that contain a series of Tasks
• Links are required to connect objects in a Workflow
Developing Workflows
Create a new Workflow in the Workflow Designer:

• Customize the Workflow name
• Select a Server
Workflow Properties

• Customize Workflow properties
• Workflow log displays
• Select a Workflow schedule (optional)
• May be reusable or non-reusable
Workflows Properties

• Create a user-defined Event, which can later be used with the Raise Event Task
• Define Workflow Variables that can be used in later Task objects (example: Decision Task)
Building Workflow Components
• Add Sessions and other Tasks to the Workflow
• Connect all Workflow components with Links
• Save the Workflow
• Start the Workflow

Sessions in a Workflow can be independently executed


Workflow Designer - Links
• Required to connect Workflow Tasks
• Can be used to create branches in a Workflow
• All links are executed, unless a link condition is used which makes a link false
Session Tasks

After this section, you will be familiar with:

• How to create and configure Session Tasks
• Session Task properties
• Transformation property overrides
• Reusable vs. non-reusable Sessions
• Session partitions
Session Task

• Created to execute the logic of a mapping (one mapping only)
• Session Tasks can be created in the Task Developer (reusable) or the Workflow Designer (Workflow-specific)
• Steps to create a Session Task:
  – Select the Session button from the Task toolbar, or
  – Select menu Tasks | Create
Session Task - General

Session Task - Properties

Session Task – Config Object
Session Task - Sources

Session Task - Targets
Session Task - Transformations
• Allows overrides of some transformation properties
• Does not change the properties in the Mapping

Session Task - Partitions
Monitor Workflows

By the end of this section you will be familiar with:

• The Workflow Monitor GUI interface
• Monitoring views
• Server monitoring modes
• Filtering displayed items
• Actions initiated from the Workflow Monitor
• Truncating Monitor logs
Monitor Workflows
• The Workflow Monitor is the tool for monitoring Workflows and Tasks
• Review details about a Workflow or Task in two views:
  – Gantt Chart view
  – Task view
Monitoring Workflows
• Perform operations in the Workflow Monitor:
  – Restart: restart a Task, Workflow or Worklet
  – Stop: stop a Task, Workflow, or Worklet
  – Abort: abort a Task, Workflow, or Worklet
  – Resume: resume a suspended Workflow after a failed Task is corrected
• View Session and Workflow logs
• Abort has a 60-second timeout
  – If the Server has not completed processing and committing data during the timeout period, the threads and processes associated with the Session are killed

Stopping a Session Task means the Server stops reading data
Monitoring Workflows
[Screenshot: Task view showing Task, Workflow, and Worklet names with start and completion times; Start, Stop, Abort and Resume controls; and a status bar.]
Monitor Window Filtering
Task view provides filtering:

• Monitoring filters can be set using drop-down menus
• Minimizes the items displayed in Task view
• Right-click on a Session to retrieve the Session log (copied from the Server to the local PC client)
Debugger

By the end of this section you will be familiar with:

• Creating a Debug Session
• Debugger windows and indicators
• Debugger functionality and options
• Viewing data with the Debugger
• Setting and using breakpoints
• Tips for using the Debugger
Debugger Features

• The Debugger is a wizard-driven tool
  – View source / target data
  – View transformation data
  – Set breakpoints and evaluate expressions
  – Initialize variables
  – Manually change variable values
• The Debugger is Session-driven
  – Data can be loaded or discarded
  – The debug environment can be saved for later use
Debugger Interface
[Screenshot: Debugger windows and indicators, including the Debugger Mode indicator, a solid yellow arrow as the Current Transformation indicator, a flashing yellow SQL indicator, the Transformation Instance Data window, the Target Data window, and the Debugger Log and Session Log tabs.]
Filter Transformation

Drops rows conditionally

Active Transformation
Connected

Ports
• All input / output

Specify a Filter condition

Usage
• Filter rows from flat file sources
• Single-pass source(s) into multiple targets
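A filter condition is simply a boolean expression; rows that evaluate to FALSE are dropped. A minimal sketch (port names are illustrative):

   ORDER_AMOUNT > 0 AND NOT ISNULL(CUSTOMER_ID)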
Aggregator Transformation

Performs aggregate calculations

Active Transformation
Connected

Ports
• Mixed
• Variables allowed
• Group By allowed

Create expressions in output or variable ports

Usage
• Standard aggregations
Informatica Functions

Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE

• Return summary values for non-null data in selected ports
• Use only in Aggregator transformations
• Use in output ports only
• Calculate a single value (and row) for all records in a group
• Only one aggregate function can be nested within an aggregate function
• Conditional statements can be used with these functions
Aggregate Expressions

• Aggregate functions are supported only in the Aggregator transformation
• Conditional aggregate expressions are supported

Conditional SUM format: SUM(value, condition)
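For instance, a conditional aggregation evaluated once per group (the port names are illustrative):

   SUM(ORDER_AMOUNT, ORDER_STATUS = 'SHIPPED')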
Aggregator Properties

Sorted Input property

• Instructs the Aggregator to expect the data to be sorted

Set the Aggregator cache sizes (on the Informatica Server machine)
Sorted Data

• The Aggregator can handle sorted or unsorted data
  – Sorted data can be aggregated more efficiently, decreasing total processing time
• The Server will cache data from each group and release the cached data upon reaching the first record of the next group
• Data must be sorted according to the order of the Aggregator “Group By” ports
• The performance gain will depend upon varying factors
Incremental Aggregation
• Triggered in Session Properties, Performance tab (example: an MTD calculation)
• The cache is saved into $PMCacheDir: aggregatorname.DAT and aggregatorname.IDX
• Upon the next run, the files are overwritten with new cache information

Example: when triggered, the PowerCenter Server will save new MTD totals. Upon the next run (new totals), the Server will subtract the old totals; the difference will be passed forward.

Best practice is to copy these files in case a rerun of the data is ever required. Reinitialize when no longer needed, e.g. at the beginning of new-month processing.
Joiner Transformation

By the end of this section you will be familiar with:

• When to use a Joiner transformation
• Homogeneous joins
• Heterogeneous joins
• Joiner properties
• Joiner conditions
• Nested joins
Homogeneous Joins
Joins that can be performed with a SQL SELECT statement:

• The Source Qualifier contains a SQL join
• Tables are on the same database server (or are synonyms)
• The database server does the join “work”
• Multiple homogeneous tables can be joined
Heterogeneous Joins

Joins that cannot be done with a SQL statement:

• An Oracle table and a Sybase table
• Two Informix tables on different database servers
• Two flat files
• A flat file and a database table
Joiner Transformation

Performs heterogeneous joins on records from different databases or flat file sources

Active Transformation
Connected

Ports
• All input or input / output
• “M” denotes a port that comes from the master source

Specify the Join condition

Usage
• Join two flat files
• Join two tables from different databases
• Join a flat file with a relational table
Sorter Transformation

• Can sort data from relational tables or flat files
• The sort takes place on the Informatica Server machine
• Multiple sort keys are supported
• The Sorter transformation is often more efficient than a sort performed on a database with an ORDER BY clause
Lookup Transformation

By the end of this section you will be familiar with:

• Lookup principles
• Lookup properties
• Lookup conditions
• Lookup techniques
• Caching considerations
Lookup Transformation
Looks up values in a database table and provides data to other components in a Mapping

Passive Transformation
Connected / Unconnected

Ports
• Mixed
• “L” denotes a Lookup port
• “R” denotes a port used as a return value (unconnected Lookup only)

Specify the Lookup condition

Usage
• Get related values
• Verify if records exist or if data has changed
Lookup Properties

• Override Lookup SQL option
• Toggle caching
• Native database connection object name
Additional Lookup Properties

• Set the cache directory
• Make the cache persistent
• Set the Lookup cache sizes
To Cache or not to Cache?
Caching can significantly impact performance:

• Cached
  – Lookup table data is cached locally on the Server
  – Mapping rows are looked up against the cache
  – Only one SQL SELECT is needed
• Uncached
  – Each Mapping row needs one SQL SELECT
• Rule of thumb: cache if the number (and size) of records in the Lookup table is small relative to the number of mapping rows requiring a lookup
Update Strategy Transformation

By the end of this section you will be familiar with:

• Update Strategy functionality
• Update Strategy expressions
• Refresh strategies
• Smart aggregation
Update Strategy Transformation
Used to specify how each individual row will be used to update target tables (insert, update, delete, reject)

Active Transformation
Connected

Ports
• All input / output

Specify the Update Strategy expression

Usage
• Updating Slowly Changing Dimensions
• IIF or DECODE logic determines how to handle the record
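A minimal sketch of such an expression, flagging rows absent from a lookup for insert and all others for update (the LKP_CUSTOMER_KEY port is illustrative):

   IIF(ISNULL(LKP_CUSTOMER_KEY), DD_INSERT, DD_UPDATE)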
Router Transformation

Rows are sent to multiple filter conditions

Active Transformation
Connected

Ports
• All input / output
• Specify filter conditions for each Group

Usage
• Link source data in one pass to multiple filter conditions
Router Transformation in a Mapping

Parameters and Variables

By the end of this section you will understand:

• System variables
• Creating parameters and variables
• Features and advantages
• Establishing values for parameters and variables
System Variables
SYSDATE
• Provides the current datetime on the Informatica Server machine
• Not a static value

$$$SessStartTime
• Returns the session start time as a string; uses the system clock on the machine hosting the Informatica Server
• The format of the string is database type dependent
• Used in SQL overrides
• Has a constant value

SESSSTARTTIME
• Returns the session start time as a date value on the Informatica Server
• Used with any function that accepts transformation date/time datatypes
• Not to be used in a SQL override
• Has a constant value
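For instance, SESSSTARTTIME can feed any date function directly (the SHIP_DATE port is illustrative):

   DATE_DIFF(SESSSTARTTIME, SHIP_DATE, 'DD')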
Mapping Parameters and Variables

• Apply to all transformations within one Mapping
• Represent declared values
• Variables can change in value during run-time
• Parameters remain constant during run-time
• Provide increased development flexibility
• Defined in the Mappings menu
• Format is $$VariableName or $$ParameterName
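A common sketch for incremental loads (the $$LastRunDate variable name and LAST_UPDATED port are illustrative): filter on the variable, then push it forward with a variable function so the next run picks up where this one ended.

   LAST_UPDATED > $$LastRunDate                   -- filter condition
   SETMAXVARIABLE($$LastRunDate, LAST_UPDATED)    -- in an expression port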
Mapping Parameters and Variables
Sample declarations:

• User-defined names
• Set the appropriate aggregation type
• Set an optional initial value

Declare Variables and Parameters in the Designer Mappings menu
Sequence Generator Transformation

Generates unique keys for any port on a row

Passive Transformation
Connected

Ports
• Two predefined output ports, NEXTVAL and CURRVAL
• No input ports allowed

Usage
• Generate sequence numbers
• Shareable across mappings
Sequence Generator Properties

• Number of Cached Values
Dynamic Lookup

By the end of this section you will be familiar with:

• Dynamic lookup theory
• Dynamic lookup advantages
• Dynamic lookup rules
Additional Lookup Cache Options

• Make cache persistent
• Cache File Name Prefix
  – Reuse a cache by name for another similar business purpose
• Recache from Database
  – Overrides other settings, and Lookup data is refreshed
• Dynamic Lookup Cache
  – Allows a row to know about the handling of a previous row
Persistent Caches

• By default, Lookup caches are not persistent
• When the Session completes, the cache is erased
• The cache can be made persistent with the Lookup properties
• When the Session completes, the persistent cache is stored on server hard disk files
• The next time the Session runs, the cached data is loaded fully or partially into RAM and reused
• Can improve performance, but “stale” data may pose a problem
Dynamic Lookup Cache Advantages

• When the target table is also the Lookup table, the cache is changed dynamically as the target load rows are processed in the mapping
• New rows to be inserted into the target, or rows for update to the target, will affect the dynamic Lookup cache as they are processed
• Subsequent rows will know the handling of previous rows
• The dynamic Lookup cache and target load rows remain synchronized throughout the Session run
Update Dynamic Lookup Cache

• NewLookupRow port values
  – 0: static lookup, the cache is not changed
  – 1: insert row into the Lookup cache
  – 2: update row in the Lookup cache
• Does NOT change the row type
• Use the Update Strategy transformation before or after the Lookup to flag rows for insert or update to the target
• Ignore NULL property
  – Per port
  – Ignore NULL values from the input row and update the cache using only non-NULL values from the input
Example: Dynamic Lookup Configuration

• The Router group filter condition should be: NewLookupRow = 1
• This allows isolation of insert rows from update rows
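A sketch of how the two groups can then be handled (the group names are illustrative): the insert group loads directly, while the update group passes through an Update Strategy flagged DD_UPDATE.

   INSERTS group filter:  NewLookupRow = 1
   UPDATES group filter:  NewLookupRow = 2   -- route to an Update Strategy with DD_UPDATE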
Rank Transformation
Filters the top or bottom range of records

Active Transformation
Connected

Ports
• Mixed
• One predefined output port, RANKINDEX
• Variables allowed
• Group By allowed

Usage
• Select the top/bottom
• Specify the number of records
Normalizer Transformation

Normalizes records from relational or VSAM sources

Active Transformation
Connected

Ports
• Input / output or output

Usage
• Required for VSAM Source definitions
• Normalize flat file or relational source definitions
• Generate multiple records from one record
