
Objectives

At the end of this session, you will be able to understand:


 The Information Economy
 ETL: An Overview
 Why ETL Is Still Relevant
 Informatica Overview
 The Informatica Platform
 Why Informatica
 Informatica Partners & Customers
 Informatica Architecture Overview & Components
 Use case 1: Loading a Product Dimension table using a Slowly Changing Dimension (SCD)
 Use case 2: Populating a Sales summary table using Incremental Aggregation
 Job trends
 Scope of this course
The Information Economy
ETL: An Overview

ETL stands for Extraction, Transformation and Load

1. The "E" represents the ability to extract data with high performance and minimal impact on
the source system
2. The "T" represents the ability to transform one or more data sets, in batch or real time, into a
consumable format
3. The "L" stands for loading data into a persistent or virtual data store
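The three steps can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory CSV string as the source and a SQLite database as the target; the function names `extract`, `transform`, `load` and the `product` table are hypothetical and are not part of any Informatica API.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or a database.
SOURCE_CSV = "id,name,price\n1, widget ,2.50\n2,GADGET,10\n"


def extract(text):
    """E: read raw rows from the source without modifying it."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: convert raw strings into a consistent, consumable format."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["price"]))
        for r in rows
    ]


def load(rows, conn):
    """L: write the transformed rows into a persistent data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product "
        "(id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Chain the three steps and return what ended up in the target."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT id, name, price FROM product ORDER BY id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the whole pipeline against a throwaway database. Keeping each step in its own function mirrors the way an ETL tool separates source, transformation and target definitions.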
Why ETL is Still Relevant

1. Data needs to flow from source applications into analytic data stores in a controlled, reliable,
secure manner
2. Information needs to be standardized
3. Operational results need to be consistent and repeatable
4. Operational results need to be confirmed and transparent
5. ETL facilitates the integration of data from various sources for building a data warehouse
6. Businesses have data in multiple databases with different formats
7. Transformation is required to convert and summarize operational data into a consistent,
business-oriented format
8. Derived information can be pre-computed
9. ETL makes data available in a queryable format
The Informatica Approach
Comprehensive, Unified, Open and Economical Approach
Informatica Products & Their Functionalities

A wide range of products is available under the Informatica product suite to help satisfy data
integration requirements within the enterprise and beyond

 Informatica's product portfolio is focused on data integration:


 Data Integration & ETL
 Information Lifecycle Management
 Complex Event Processing
 Data Masking
 Data Quality
 Data Replication
 Data Virtualization
 Master Data Management
 Ultra Messaging

Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data
warehouses
Why Informatica?

1. Proven technology leadership

2. A track record of continuous innovation

3. The most neutral trusted partner

4. Long history of customer success


Introduction to PowerCenter

PowerCenter:
 An ETL (Extract, Transform and Load) tool
 A main advantage of PowerCenter over other ETL tools lies in its robustness; it can be used
on both Windows and Unix-based systems.
 PowerCenter can read from a variety of sources and write to a variety of targets,
transforming the data in between.
Informatica PowerCenter Architecture
PowerCenter is built on a Service-Oriented Architecture (SOA)

 The primary responsibility of the SOA is to help other services perform tasks within a domain

 Informatica uses a client-server architecture that contains several client and server
components

 A service-oriented architecture (SOA) can be defined as a group of services that
communicate with each other. The communication can involve either simple data
passing or two or more services coordinating some activity.
[Figure: architecture diagram with the labels Client Tools, Domain, and Node]
 Domain

 The Informatica domain is the fundamental administrative unit in Informatica. The domain
supports the administration of the distributed services.

 A domain is a collection of nodes and services that you can group in folders based on
administration ownership.

 Each installation can have only one domain, in which multiple tools and services can be
installed and configured, such as PowerCenter, IDQ, MDM, the SAP BW Service, and Web Services
 Node
A node is the logical representation of a machine in a domain.

One node in the domain acts as a gateway to receive service requests from clients and route them to the
appropriate service and node.

A node can be a gateway node or a worker node


Gateway Nodes
A gateway node is any node that you configure to serve as a gateway for the domain.
Only one gateway node acts as the gateway at any given time; that node is called the
master gateway.

A gateway node can run application services, and it can serve as the master gateway
node.

The master gateway node is the entry point to the domain.

Worker Nodes
A worker node is any node not configured to serve as a gateway. A worker node can
run application services, but it cannot serve as a gateway.
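The relationship between a domain, gateway nodes and worker nodes can be modeled roughly as follows. This is a simplified sketch, not Informatica's implementation: the `Node` and `Domain` classes are hypothetical, and picking the first configured gateway as the master is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_gateway: bool = False                    # configured to serve as a gateway?
    services: set = field(default_factory=set)  # application services on this node


@dataclass
class Domain:
    nodes: list
    master: Node = None

    def elect_master(self):
        """Pick one gateway node to act as the master gateway.

        Assumption: the first configured gateway wins.
        """
        gateways = [n for n in self.nodes if n.is_gateway]
        if not gateways:
            raise RuntimeError("a domain needs at least one gateway node")
        self.master = gateways[0]
        return self.master

    def route(self, service):
        """Entry point: find the node that runs the requested service."""
        if self.master is None:
            self.elect_master()
        for node in self.nodes:
            if service in node.services:
                return node.name
        raise LookupError(f"no node runs {service!r}")
```

For example, a domain built from `Node("node01", is_gateway=True, services={"Repository Service"})` and `Node("node02", services={"Integration Service"})` elects `node01` as master gateway and routes a request for the Integration Service to the worker `node02`.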
The Informatica architecture comprises the following components.

Server Components:
1. Repository Service

2. Integration Service

3. SAP BW service

4. Web services hub

Client Components:

1. Repository Manager

2. Designer

3. Workflow Manager

4. Workflow Monitor

5. PowerCenter Administration Console (browser-based)


Server Components

1. Repository Service:

 The Repository Service manages the metadata in the PowerCenter repository database.
 The Repository Service manages connections to the repository from client applications.
The Designer, Workflow Manager, Workflow Monitor and Repository Manager interact
with the repository through the Repository Service.
 The Repository Service ensures the consistency of metadata in the repository. It is
a multi-threaded process that retrieves, inserts, and updates metadata in the
repository database tables.
2. Integration Service:

The Integration Service runs sessions and workflows

 The Integration Service is an important service of the PowerCenter architecture
 When a workflow is started, the Integration Service reads metadata
(mapping, session and session properties) from the repository.
 It assigns and resolves the parameter and variable values, extracts data from the
mapping sources, holds the data in memory while it applies the transformation rules
configured in the mapping, and then loads the transformed data into the mapping
targets.
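The sequence described above (read metadata, resolve parameters, extract, transform in memory, load) can be sketched as below. Everything here is hypothetical: the `REPOSITORY` and `DATA` dictionaries stand in for the repository and the source and target systems, and the `$$MIN_AMOUNT` placeholder only imitates the general style of a mapping parameter. The parsing is deliberately naive and does not reflect how the Integration Service evaluates expressions.

```python
# Hypothetical stand-in for the repository: metadata only, no data.
REPOSITORY = {
    "wf_load_sales": {
        "source": "sales_src",
        "target": "sales_tgt",
        "filter": "amount >= $$MIN_AMOUNT",
        "parameters": {"$$MIN_AMOUNT": "100"},
    }
}

# Hypothetical source and target stores.
DATA = {
    "sales_src": [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}],
    "sales_tgt": [],
}


def resolve(expression, parameters):
    """Substitute parameter values into an expression string."""
    for name, value in parameters.items():
        expression = expression.replace(name, value)
    return expression


def run_workflow(name):
    meta = REPOSITORY[name]                              # 1. read metadata
    rule = resolve(meta["filter"], meta["parameters"])   # 2. resolve parameters
    column, _, threshold = rule.partition(" >= ")
    rows = list(DATA[meta["source"]])                    # 3. extract into memory
    kept = [r for r in rows if r[column] >= int(threshold)]  # 4. transform
    DATA[meta["target"]].extend(kept)                    # 5. load into the target
    return rule, len(kept)
```

Running `run_workflow("wf_load_sales")` resolves the filter, keeps only the rows that satisfy it, and appends them to the hypothetical target list.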
3. SAP BW Service:

 The SAP BW Service listens for RFC requests from SAP BW and initiates workflows to
extract data from, or load data into, SAP BW.

4. Web Services Hub:

 The Web Services Hub receives requests from web service clients and exposes PowerCenter
workflows as services.
Client Components

Client Tools

1. Repository Manager

 Used by the administrator for administration and configuration.
 Manages the PowerCenter repository: assigns permissions to users and groups,
manages folders, and displays PowerCenter repository metadata.
 Access to this component is normally restricted to administrators.
2. Designer

 Used to create and design mappings
 Used to create or import sources, targets, mapplets and transformations. Each of
these is a repository object.
 A mapping can be thought of as a program that carries the metadata of the data flow.
Once created, it has to be saved to the repository.
3. Workflow Manager

 Used to create sessions and workflows
 A session is the run-time entity for a mapping. All run-time properties
for the mapping are configured here. A session is a repository object.
 A workflow is the smallest unit that the Integration Service can run on the
Informatica server
 Multiple reusable or non-reusable session objects can be created inside a single
workflow
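The containment relationship between mappings, sessions and workflows can be pictured with a small data model. The class and field names below are hypothetical and chosen for illustration; they do not correspond to PowerCenter's internal object model.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Design-time object: describes the data flow from source to target."""
    name: str
    source: str
    target: str


@dataclass
class Session:
    """Run-time wrapper around one mapping, carrying its run-time properties."""
    name: str
    mapping: Mapping
    reusable: bool = False
    properties: dict = field(default_factory=dict)


@dataclass
class Workflow:
    """The unit that gets run; holds one or more sessions."""
    name: str
    sessions: list = field(default_factory=list)

    def run(self):
        """Walk the sessions in order and report which mapping each one executes."""
        return [f"{s.name} -> {s.mapping.name}" for s in self.sessions]
```

A workflow holding a single session `s_product_dim` over a mapping `m_product_dim` would report `["s_product_dim -> m_product_dim"]` from `run()`. The point of the model is that a mapping is never run directly: it is always wrapped in a session, which in turn lives inside a workflow.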

4. Workflow Monitor
 Used to monitor the load status of workflows
 Various monitoring views, along with a GUI for viewing log files, are available for
proper monitoring
Informatica Repository:

The Informatica repository is a relational database that stores information, or metadata, used by the
Informatica server and client tools. It is the actual storage of Informatica PowerCenter.
Metadata: data about the data. It can also be described as the structure of the data, without the
actual data.
 The Repository Server stores the sources, targets, mappings, transformations, connection strings,
etc. It also saves the values of parameters and variables (if they are set to be stored explicitly).
 It stores the metadata in a standard format readable by Informatica and helps manage the creation and
deletion of repositories.
 Along with the mapping information, it also saves administrative information such as roles,
permissions, credentials and versions.
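The distinction between metadata and data can be made concrete with a toy repository. The sketch below assumes SQLite as the relational database and uses a made-up `source_def` table; the real PowerCenter repository schema is different and is not shown here.

```python
import sqlite3


def build_repository():
    """Create a toy repository that stores structure (metadata), not rows of data."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE source_def "
        "(source_name TEXT, column_name TEXT, data_type TEXT)"
    )
    repo.executemany(
        "INSERT INTO source_def VALUES (?, ?, ?)",
        [
            ("customers", "customer_id", "INTEGER"),
            ("customers", "customer_name", "TEXT"),
        ],
    )
    return repo


def describe(repo, source_name):
    """Return the column names and types recorded for a source."""
    cur = repo.execute(
        "SELECT column_name, data_type FROM source_def "
        "WHERE source_name = ? ORDER BY rowid",
        (source_name,),
    )
    return cur.fetchall()
```

Note that the repository here knows that a `customers` source has a `customer_id` and a `customer_name` column, but it holds no customer records: that is the sense in which metadata is the structure of the data without the actual data.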
Informatica PowerCenter: DI Solution

Informatica PowerCenter is a premium data integration (DI) solution

 “Database neutral”: will communicate with any database
 Powerful data transformations convert one application’s data to another’s format
The domain includes the Service Manager and a set of application services:

Service Manager. A service that manages all domain operations. It runs the application services and
performs domain functions on each node in the domain. Some domain functions include authentication,
authorization, and logging.

Application Services. Services that represent server-based functionality, such as the Model
Repository Service and the Data Integration Service. The application services that run on a node
depend on the way you configure the services.
Data Migration
A company purchases a new accounts payable application

 PowerCenter can move the existing account data to the new application

 Preserves data lineage for tax, accounting, and other legally mandated purposes
Application Integration

Company A purchases Company B

To achieve the benefits of consolidation, Company B’s billing system must be integrated into
Company A’s billing system
Data Warehousing

 Data warehouses put information from many sources together for analysis

 Data is moved from many databases to the Data warehouse


Middleware
Informatica can connect to a variety of sources, including most application sources

 SAP-certified data integration tool
 Can pull data from and push data into SAP R/3 and SAP BW systems
 Has connectivity adapters for the majority of application sources
 Can be used as middleware between two applications, such as SAP R/3 and SAP BW
Some Unique Features of Informatica

Single administration console to administer all the application services

Unified administration of users, groups, privileges and roles across PC AE tools

Single sign-on for all the client tools: once you log in to one client tool, you are
automatically logged in to the others

Built-in version control

Grid and high availability

Built-in scheduling tool


Job Trends