1. The "E" represents the ability to extract data with high performance and minimal impact on
the source system
2. The "T" represents the ability to transform one or more data sets, in batch or in real time, into a
consumable format
3. The "L" stands for loading data into a persistent or virtual data store
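The three steps can be sketched in a few lines of Python. This is only an illustration of the E, T and L stages, not how PowerCenter implements them: the CSV text, the `orders` table and the function names are all assumptions made up for the example, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,20.00
3,alice,5.25
"""

def extract(text):
    """E: read the raw rows from the source without modifying it."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: convert types and normalize values into a consumable format."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(records, conn):
    """L: write the transformed records into the target data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # assumed target; any database would do
load(transform(extract(SOURCE_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

Keeping each stage in its own function mirrors the separation the three letters describe: the source is only read, the reshaping happens in one place, and the target only receives data that is already in its final format.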
Why ETL is Still Relevant
1. Data needs to flow from source applications into analytic data stores in a controlled, reliable,
secure manner
2. Information needs to be standardized
3. Operational results need to be consistent and repeatable
4. Operational results need to be confirmed and transparent
5. Data from various sources needs to be integrated to build a data warehouse
6. Businesses have data in multiple databases with different formats
7. Transformation is required to convert and summarize operational data into a consistent,
business-oriented format
8. Derived information needs to be pre-computed
9. Data needs to be made available in a queryable format
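As a rough illustration of the points about differing formats, standardization and pre-computed derived information, the sketch below maps rows from two hypothetical systems onto one common layout and then computes a per-customer total. The field names, the sample rows and the `from_crm`, `from_erp` and `summarize` helpers are assumptions invented for this example.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal

# Hypothetical rows from two source systems that record sales differently.
CRM_ROWS = [{"cust": "ACME", "sale_date": "2024-01-15", "amount_usd": "100.00"}]
ERP_ROWS = [
    {"customer_name": "acme", "date": "15/01/2024", "cents": 2550},
    {"customer_name": "globex", "date": "16/01/2024", "cents": 1000},
]

def from_crm(row):
    """Map a CRM row onto the common (customer, day, cents) layout."""
    return (
        row["cust"].upper(),
        date.fromisoformat(row["sale_date"]),
        int(Decimal(row["amount_usd"]) * 100),
    )

def from_erp(row):
    """Map an ERP row onto the same common layout."""
    return (
        row["customer_name"].upper(),
        datetime.strptime(row["date"], "%d/%m/%Y").date(),
        row["cents"],
    )

def summarize(records):
    """Pre-compute a derived value: total sales in cents per customer."""
    totals = defaultdict(int)
    for customer, _day, cents in records:
        totals[customer] += cents
    return dict(totals)

standardized = [from_crm(r) for r in CRM_ROWS] + [from_erp(r) for r in ERP_ROWS]
totals_by_customer = summarize(standardized)
```

Amounts are held as integer cents so that the summary does not depend on floating-point rounding.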
The Informatica Approach
Comprehensive, Unified, Open and Economical Approach
Informatica Products & Their Functionalities
A wide range of products is available under the Informatica product suite to help satisfy data
integration requirements within the enterprise and beyond
Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data
warehouses
Why Informatica?
PowerCenter:
An ETL tool ( Extract, Transform and Load)
The main advantages of PowerCenter over other ETL tools lies in its robustness, for it can be used
in both Windows and Unix based systems.
PowerCenter can read from a variety of different sources and write to as many targets, while
transforming data in between.
Informatica PowerCenter Architecture
PowerCenter is built on a Service Oriented Architecture (SOA)
The primary responsibility of the SOA is to help other services perform tasks within a domain
Informatica uses a client-server architecture that contains several client and server
components
[Diagram: DOMAIN and NODE]
Domain
The Informatica domain is the fundamental administrative unit in Informatica. The domain
supports the administration of the distributed services.
A domain is a collection of nodes and services that you can group in folders based on
administration ownership.
Each installation can have only one domain, in which multiple tools can be installed and
configured, such as PowerCenter, IDQ, MDM, SAP BW Service, and Web Services
Node
A node is the logical representation of a machine in a domain.
One node in the domain acts as a gateway to receive service requests from clients and route them to the
appropriate service and node.
A gateway node can run application services, and it can serve as a master gateway
node
Worker Nodes
A worker node is any node not configured to serve as a gateway. A worker node can
run application services, but it cannot serve as a gateway.
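The relationship between a domain, its gateway node and its worker nodes can be pictured with a small toy model. This is not an Informatica API: the `Node` and `Domain` classes, the `route` method and the node names are hypothetical and exist only to show a gateway receiving a request and resolving which node runs the requested service.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Logical representation of a machine in the domain."""
    name: str
    is_gateway: bool = False
    services: set = field(default_factory=set)

@dataclass
class Domain:
    """A collection of nodes and the services that run on them."""
    nodes: list = field(default_factory=list)

    def gateway(self):
        """Return the node configured to receive client requests."""
        for node in self.nodes:
            if node.is_gateway:
                return node
        raise RuntimeError("domain has no gateway node")

    def route(self, service_name):
        """Return (gateway name, name of the node running the service)."""
        gateway = self.gateway()
        for node in self.nodes:
            if service_name in node.services:
                return gateway.name, node.name
        raise LookupError(f"no node runs {service_name!r}")

# Hypothetical two-node domain: one gateway node and one worker node.
domain = Domain(nodes=[
    Node("node01", is_gateway=True, services={"Repository Service"}),
    Node("node02", services={"Integration Service"}),
])
received_by, runs_on = domain.route("Integration Service")
```

In this model both kinds of node can run application services, but only the node flagged as a gateway receives the request, which matches the distinction drawn above.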
The Informatica architecture comprises the following components:
Server Components:
1. Repository Service
2. Integration Service
3. SAP BW service
Client Components:
1. Repository Manager
2. Designer
3. Workflow Manager
4. Workflow Monitor
1. Repository Service:
The Repository Service manages the metadata in the repository database. It also manages
connections to the PowerCenter repository from client applications.
Mapping Designer, Workflow Manager, Workflow Monitor and Repository Manager interact
with the repository through the Repository Service.
The Repository Service ensures the consistency of metadata in the repository. It is
normally a multi-threaded process that retrieves, inserts, and updates metadata in the
repository database tables.
2. Integration Service:
3. SAP BW Service:
The SAP BW Service listens for RFC requests from SAP BW and initiates workflows to
extract data from, or load data into, SAP BW.
The Web Services Hub receives requests from web service clients and exposes PowerCenter
workflows as services.
Client Components
Client Tools
1. Repository Manager
4. Workflow Monitor
This is used to monitor the load status of workflows
Various observation windows, along with a GUI for viewing the log files, are available for
proper monitoring
Informatica Repository:
The Informatica repository is a relational database that stores information, or metadata, used by the
Informatica Server and Client tools. This is the actual storage of Informatica PowerCenter.
Metadata: It is data about the data. It can also be described as the structure of the data, without the
actual data
The Repository Server stores the sources, targets, mappings, transformations, connection strings,
etc. It also saves the values of parameters and variables (if set to store explicitly)
It stores the metadata in a standard Informatica-readable format and helps manage the creation and
deletion of repositories.
Along with the mapping information, it also saves administrative information such as roles,
permissions, credentials and versions.
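The distinction between metadata and data can be made concrete with any relational database. The sketch below uses SQLite rather than an Informatica repository, and the `customers` table is a made-up example: the first query returns only the structure of the table, the second returns the values stored in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Alice')")

# Metadata: column names and declared types, with no row values involved.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(customers)")]

# Data: the actual row values stored in the table.
data = conn.execute("SELECT id, name FROM customers").fetchall()
```

A repository in the sense described above holds the first kind of information (plus mappings, connections and administrative details), not the rows themselves.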
Informatica PowerCenter – DI Solution
Service Manager. A service that manages all domain operations. It runs the application services and
performs domain functions on each node in the domain. Some domain functions include authentication,
authorization, and logging.
Application Services. Services that represent server-based functionality, such as the Model
Repository Service and the Data Integration Service. The application services that run on a node
depend on the way you configure the services.
Data Migration
A company purchases a new accounts payable application
PowerCenter can move the existing account data to the new application
Preserves data lineage for tax, accounting, and other legally mandated purposes
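One simple way to picture lineage during a migration is to carry the origin of every record along with it. The sketch below is an assumption-laden illustration, not PowerCenter's lineage mechanism: the `MigratedAccount` class, the `AP-` identifier scheme, the `legacy_ap` system name and the sample rows are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MigratedAccount:
    account_id: str
    balance_cents: int
    source_system: str  # lineage: the system the record came from
    source_key: str     # lineage: the record's identifier in that system

def migrate(legacy_rows, source_system):
    """Convert legacy rows to the new layout while recording their origin."""
    migrated = []
    for row in legacy_rows:
        migrated.append(MigratedAccount(
            account_id=f"AP-{row['id']:05d}",  # hypothetical new key format
            balance_cents=row["balance_cents"],
            source_system=source_system,
            source_key=str(row["id"]),
        ))
    return migrated

# Hypothetical rows exported from the old accounts payable application.
legacy = [{"id": 7, "balance_cents": 125000}, {"id": 42, "balance_cents": 990}]
accounts = migrate(legacy, "legacy_ap")
```

Because each migrated record keeps its source system and source key, an auditor can trace any account in the new application back to the row it was created from.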
Application Integration
To achieve the benefits of consolidation, Company B’s billing system must be integrated into
Company A’s billing system
Data Warehousing
Data warehouses put information from many sources together for analysis