Course Structure
What is ETL? What is Informatica?
Informatica Products
Informatica PowerMart training
Session I
- Overview of PowerMart
Session II
- Working with PowerMart Repository Manager
Session III
- Working with PowerMart Designer
Session IV
- Working with PowerMart Server Manager
- Performance tuning
- Case Study
- Quiz (test your Informatica skills)
What is ETL?
ETL (Extraction, Transformation and Loading) is the process by which data from the operational systems is integrated, transformed and loaded into the data warehouse environment.
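Extraction
[Figure: data is extracted from the source systems (Oracle, 80 tables; Sybase, 50 tables; text files) towards the target]
Transformation
[Figure: source rows are transformed in the staging area]
Emp id   Last Name   First Name
10001    Jones       Indiana
10002    Holmes      Sherlock
Loading
[Figure: the data warehouse is loaded either directly from the source (direct load) or from the staging area]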
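The three steps above can be sketched in a few lines of Python. This is only an illustration of the ETL idea, not of how PowerMart works internally: the text source, the cleanup rules and the `employee_dim` table are all assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a text file holding the sample rows shown above.
SOURCE_TEXT = """emp_id,last_name,first_name
10001, jones ,indiana
10002,HOLMES,sherlock
"""

def extract(text):
    """Extraction: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: clean up the values (assumed rules: trim, title-case)."""
    staged = []
    for row in rows:
        staged.append((
            int(row["emp_id"]),
            row["last_name"].strip().title(),
            row["first_name"].strip().title(),
        ))
    return staged

def load(conn, staged):
    """Loading: write the staged rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee_dim "
        "(emp_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
    )
    conn.executemany("INSERT INTO employee_dim VALUES (?, ?, ?)", staged)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_TEXT)))
loaded_rows = warehouse.execute(
    "SELECT emp_id, last_name, first_name FROM employee_dim ORDER BY emp_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the source, staging area and warehouse split shown in the figures.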
What is Informatica?
A market-leading provider of e-business infrastructure and analytic software that enables customers to automate the integration, analysis and real-time delivery of critical corporate information via web, wireless and voice
Informatica applications include
- eCRM application
- eBusiness Operations application
- eProcurement
More than 1,370 customers, including 60 percent of the Fortune 100 companies, are using Informatica's analytic solutions
More than 900 companies are using Informatica products
Founded in 1993
HQ: Redwood City, CA
Informatica Products
Informatica provides the following suite of products for data integration
PowerCenter: enterprise data integration hub
PowerMart: application deployment platform
PowerCenter.e: PowerCenter extension for e-business data
PowerConnect: high performance data extraction
PowerPlug: data model import utilities
PowerBridge: metadata bridge to Hyperion Essbase
Analytic Business Components: developer productivity tools
Mobile Access: delivery of corporate data and analytics via wireless devices and voice recognition
Session II
- Viewing/removing locks
- Generating metadata reports
- Import/export registry
- Overview of PowerMart Designer
- Create/import source in Source Analyzer
- Create/import target in Warehouse Designer
- Understanding Transformation Objects
- Suggested naming conventions
Session IV
- Create sessions
- Create batches
- Run a session/batch
- Performance tuning techniques
- Case study
- Quiz
Session I
Objective
- Familiarize with Informatica PowerMart and its components
- Hands on with Repository Manager
Components of PowerMart
PowerMart Designer
Multi-faceted tool for visually defining mappings and transformations
PowerMart Repository
An open metadata store for definitions about mappings, transformations and other data mart details
PowerMart Server
A pipelined, multi-threaded server engine that is able to overlap data extraction, transformation and loading
System Requirements
[Screenshot: Repository Manager window with its Navigator, Main, Dependency and Status Bar areas]
Enter the repository name, database username and password, select the ODBC data source created previously, enter the native connect string and click OK. The list of tables created for the repository appears in the Output window.
Creating Groups
- Connect to a repository
- To create groups, choose Security > Manage Groups
- Click Add
- Click OK
Creating Users
- To create users, choose Security > Manage Users
- Click Add
- Enter a username
- Enter the password twice to confirm it
- Click Group Memberships
- To add the user to a group, select the group in the Not Member list and click Add
- To remove the user from a group, select the group in the Member list and click Remove
Assign Privileges
- Choose Security > Manage Privileges
- Click Add
- Repository groups without the selected privilege appear
Session II
Objective
- Familiarize with Repository Manager
- Familiarize with Designer
- Types of ports
- Naming conventions for different objects
View/Remove Locks
Locks prevent users from duplicating or overwriting each other's work
Choose Edit > Show Locks to view all the locked objects
For each lock, the following details are shown
- Username locking the object
- Folder containing the object
- Version containing the object
- Object type (folder, session, reusable transformation etc.)
- Object name
- Lock type (Read, Write, Execute, Save, Fetch)
- Lock time, hostname, application (Server Manager, Designer, Server etc.)
Generate Reports
Types of default reports
- Mapping report
- Source/target dependencies report
To add and view a report
- Click on Add to add a report to the installed report list
- Select from the default reports available, or from custom reports created using Crystal Reports, and click OK
- To view or print a report, select it from the Reports menu
- Select Print Preview to view the report
- Provide the username, password and ODBC data source name
- Click Print Preview and select the wildcard character if reports for all the folders, tables and versions have to be viewed
Import/Export Registry
Purpose: to simplify the process of setting up client systems
The registry contains the following connection information
- Repository name
- Database user name and password
- Repository user name and password
- ODBC data source name (DSN)
Don't forget to create the DSN before importing the registry, as the registry does not include the ODBC data source itself
To export the registry
- Choose Tools > Export Registry
- In the dialog box, enter the name for the file and click Save
To import the registry
- Choose Tools > Import Registry
- Select the file and open it
- A dialog box confirms the merging of data source information
PowerMart Designer
Warehouse Designer
- to import or create target definitions
Transformation Developer
- to create reusable transformations
Mapplet Designer
- to create mapplets
Mapping Designer
- to create mappings
[Screenshot: Designer window showing the Navigator, the Source Analyzer, Mapping Designer and Warehouse Designer tools, the Workspace, the Output window and the Status Bar]
Source Analyzer
Reads, analyzes and "reverse engineers" schema information of operational databases and flat files
Stores metadata information in the repository
To import source definitions
- Click on the Connect button and the list of tables appears
- Select the tables which you will be using as source tables
- Click OK to add the selected tables to the Source Analyzer workspace
To create a source definition
- Give a name to the new source and select the database type
- Click the Create and then the Done button to make the blank source appear in the workspace
- To enter column names, data types and field lengths, double-click on the newly created source structure
- Click on the Columns tab
- Click on the Add a new column button to add new columns to the source and specify the details
- Click OK to accept the changes
Warehouse Designer
Provides the following features
- Create a new target
- Import the target structure
To generate the SQL for a new target
- Select the newly created table and choose Target > Generate/Execute SQL from the menu bar
- Connect to the warehouse by giving the ODBC data source, user id and password
- Select the appropriate generation options and click on the Generate SQL file button
- To view or edit the SQL, click on Edit SQL file
How to import a target table definition?
- Choose Target > Import from Database
- Connect to the database by selecting the ODBC data source and entering the user name and password
- Select the tables to be used as targets from the list of tables available in the database
- Click OK to get the tables in the workspace
Mapping Designer
Visual aid to creating and editing source-to-target mappings. Dataflow diagramming.
Method of creating dataflow links through combinations of PowerMart 4.6's transformation objects.
Sources, targets and transformation objects can be dragged and dropped into a workspace to construct the transformation pipeline.
Transformation Objects
To create a transformation
- Click on Transformation > Create
- Select the transformation object and give a name to it
- Click on Create and then Done
Aggregator
- Performs an aggregate calculation (Count, Average etc.)
Expression
- Performs custom calculations of a simple or complex nature, using data from one or more input ports
Filter
- Performs a test on all records before allowing them to be sent to the next object
Joiner
- Joins data from disparate sources, such as mainframes, flat files and relational databases
Lookup
- Looks up values
Sequence Generator
- Generates unique ID values in the same fashion as a sequence in a relational database
Source Qualifier
- Represents data temporarily stored on the PowerMart server
Stored Procedure
- Calls a stored procedure and captures return values
Update Strategy
- Defines how the PowerMart server should handle updates to existing records in targets
Rank
- Performs comparisons and groupings
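To make the roles of these objects concrete, the sketch below chains a filter, a lookup and an aggregator over a few rows in plain Python. It only mimics the idea of rows flowing from one object to the next. The order rows, the product lookup table and the function names are hypothetical and are not PowerMart APIs.

```python
from collections import defaultdict

# Hypothetical source rows: one dict per record.
orders = [
    {"order_id": 1, "product_id": "P1", "quantity": 2},
    {"order_id": 2, "product_id": "P2", "quantity": 0},
    {"order_id": 3, "product_id": "P1", "quantity": 5},
    {"order_id": 4, "product_id": "P3", "quantity": 1},
]

# Hypothetical lookup table: product_id -> product name.
product_names = {"P1": "Widget", "P2": "Gadget"}

def filter_rows(rows):
    """Filter: only rows that pass the test reach the next object."""
    return (row for row in rows if row["quantity"] > 0)

def lookup_rows(rows, lookup):
    """Lookup: add a looked-up value to each row (None when not found)."""
    for row in rows:
        yield {**row, "product_name": lookup.get(row["product_id"])}

def aggregate_rows(rows):
    """Aggregator: count and average quantity per product."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        group = totals[row["product_id"]]
        group["count"] += 1
        group["sum"] += row["quantity"]
    return {
        product: {"count": g["count"], "average": g["sum"] / g["count"]}
        for product, g in totals.items()
    }

looked_up = list(lookup_rows(filter_rows(orders), product_names))
summary = aggregate_rows(looked_up)
```

Each function takes rows in and passes rows on, which is the same shape as connecting output ports of one object to input ports of the next.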
Suggested Naming Conventions
Object                Naming convention
Filter                FIL_TransformationName
Joiner                JNR_TransformationName
Lookup                LKP_TransformationName
Normalizer            NRM_TransformationName
Rank                  RNK_TransformationName
Sequence Generator    SEQ_TransformationName
Stored Procedure      SP_TransformationName
Source Qualifier      SQ_TransformationName
Update Strategy       UPD_TransformationName
Mappings              m_MappingName
Mapplets              mplt_MappletName
Sessions              s_MappingName
Sequential Batches    bs_SequentialBatchName
Concurrent Batches    bc_ConcurrentBatchName
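A small helper can apply or check these prefixes, for example when reviewing object names exported from a repository. The sketch below is a hypothetical utility, not part of PowerMart. The prefixes come from the conventions listed above; the function names and the sample object names are made up.

```python
# Prefixes taken from the naming conventions listed above.
PREFIXES = {
    "Filter": "FIL_",
    "Joiner": "JNR_",
    "Lookup": "LKP_",
    "Normalizer": "NRM_",
    "Rank": "RNK_",
    "Sequence Generator": "SEQ_",
    "Stored Procedure": "SP_",
    "Source Qualifier": "SQ_",
    "Update Strategy": "UPD_",
    "Mapping": "m_",
    "Mapplet": "mplt_",
    "Session": "s_",
    "Sequential Batch": "bs_",
    "Concurrent Batch": "bc_",
}

def conventional_name(object_type, base_name):
    """Return base_name with the prefix suggested for object_type."""
    if object_type not in PREFIXES:
        raise ValueError(f"no naming convention listed for {object_type!r}")
    return PREFIXES[object_type] + base_name

def follows_convention(object_type, name):
    """Check that name starts with the suggested prefix and has a body."""
    prefix = PREFIXES[object_type]
    return name.startswith(prefix) and len(name) > len(prefix)

filter_name = conventional_name("Filter", "ActiveCustomers")
mapping_name = conventional_name("Mapping", "LoadProducts")
```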
Mapping Designer
Ports
To design the basic flow of data between sources and targets
Types: input, output, variable
Variable Ports
For aggregator, expression and rank transformations
Use variables to
- simplify complex expressions
- store temporary data
- store values from prior rows
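The "store values from prior rows" use is the least obvious of the three, so here is a rough Python analogy. A local variable that survives from one row to the next plays the role of a variable port. The `salary` column and the `change` output are hypothetical, and the real evaluation order of ports in PowerMart is not modelled here.

```python
def add_change_from_prior(rows):
    """Yield each row with the difference from the previous row's salary."""
    v_prior_salary = None  # plays the role of a variable port
    for row in rows:
        # Output value: computed from the input value and the variable.
        if v_prior_salary is None:
            change = None
        else:
            change = row["salary"] - v_prior_salary
        yield {**row, "change": change}
        # Update the variable only after the output has been computed,
        # so that it still held the prior row's value above.
        v_prior_salary = row["salary"]

rows = [
    {"emp_id": 1, "salary": 100},
    {"emp_id": 2, "salary": 130},
    {"emp_id": 3, "salary": 120},
]
result = list(add_change_from_prior(rows))
changes = [row["change"] for row in result]
```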
Session III
Objective
- Hands on with Designer
- Familiarize with Server Manager
Mapping Creation
How to create a Mapping?
- Open the Mapping Designer workspace
- Choose Mappings > Create
- Open the Sources from the navigator, which you would have created/imported using the Source Analyzer
- Drag and drop the source table from the navigator into the workspace
- Choose Transformation > Create to create a transformation object
- Select the type of transformation object you want to create and give a proper name to it
- Drag and drop the required fields (ports) from the Source Qualifier to the transformation object
- Add new ports in the transformation object and define them as variable ports to do complex transformations
- Open the Targets from the navigator, which you would have created/imported using the Warehouse Designer
- Select the target table, drag and drop it into the Designer workspace
- Drag and drop the output ports from the last transformation object to the corresponding ports in the target
- Choose Repository > Save to store the mapping
- Check the Output window for any errors
Mapping Wizards
Wizards help to create mappings quickly and easily
Wizards are designed to create mappings for loading and maintaining star schemas
Slowly Changing Dimensions Wizard
Type I Dimension Mapping
- Keep most recent values in target
[Figure: before the change, source and target hold the same row]
          Emp id   Name    Email
Source    1001     Shane   Shane@xyz.com
Target    1001     Shane   Shane@xyz.com
[Figure: after the email changes in the source, the target row is overwritten and the old value Shane@xyz.com is not kept]
          Emp id   Name    Email
Source    1001     Shane   Shane@abc.co.in
Target    1001     Shane   Shane@abc.co.in
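The Type I behaviour can be sketched as a plain overwrite keyed on the employee id. This is a minimal illustration of the idea, not the mapping the wizard generates. The function name and the dictionary-based target are assumptions.

```python
def apply_type1(target, source_rows, key="emp_id"):
    """Insert new dimension rows and overwrite changed ones in place."""
    for row in source_rows:
        target[row[key]] = dict(row)  # no history is kept
    return target

# Target before the change, using the sample row from the figures.
target = {1001: {"emp_id": 1001, "name": "Shane", "email": "Shane@xyz.com"}}

# The email changes in the source; the target keeps only the new value.
apply_type1(target, [{"emp_id": 1001, "name": "Shane", "email": "Shane@abc.co.in"}])
```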
Slowly Changing Dimensions Wizard
Type II Dimension Mapping
Version Data Mapping
- insert new and changed dimensions with version number and incremented primary key
- full history and progress of changes
[Figure: the source row (Emp id 10, Name Shane, Email Shane@xyz.com) is inserted into the target with PM_PRIMARYKEY 1000 and PM_VERSION_NUMBER 0]
[Figure: Type II Dimension/Versioning. After the first change in the source, the target holds two rows for Emp id 10, with PM_VERSION_NUMBER 0 and 1]
[Figure: Type II Dimension/Versioning. After the second change, the target holds three rows for Emp id 10, with PM_VERSION_NUMBER 0, 1 and 2]
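A minimal sketch of the versioning idea follows, assuming a list of dictionaries as the target. The column names PM_PRIMARYKEY and PM_VERSION_NUMBER and the starting key 1000 come from the figures. The rule that keys simply continue from the highest key used so far is an assumption, as are the later sample email values.

```python
def apply_version_change(target, source_row, key="emp_id", first_key=1000):
    """Append source_row as a new version if it is new or has changed."""
    history = [row for row in target if row[key] == source_row[key]]
    if history:
        latest = max(history, key=lambda row: row["PM_VERSION_NUMBER"])
        if all(latest[col] == val for col, val in source_row.items()):
            return target  # nothing changed, nothing to insert
        version = latest["PM_VERSION_NUMBER"] + 1
    else:
        version = 0
    # Assumption: keys continue from the highest key used so far.
    next_key = max((row["PM_PRIMARYKEY"] for row in target), default=first_key - 1) + 1
    target.append({"PM_PRIMARYKEY": next_key, **source_row, "PM_VERSION_NUMBER": version})
    return target

versioned = []
for email in ["Shane@xyz.com", "Shane@abc.com", "Shane@abc.com", "Shane@example.org"]:
    apply_version_change(versioned, {"emp_id": 10, "name": "Shane", "email": email})
```

The third call repeats the previous email, so no row is added for it. Earlier versions are never modified, which is what preserves the full history.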
Slowly Changing Dimensions Wizard
Type II Dimension Mapping
Flag Current Mapping
- insert new and changed dimensions with flags and incremented primary key
- full history and flagging only current dimensions
[Figure: the source row (Emp id 10, Name Shane, Email Shane@xyz.com) is inserted into the target with PM_PRIMARYKEY 1000 and a PM_CURRENT_FLAG column]
[Figure: after the first change in the source, the target holds two rows for Emp id 10, with PM_CURRENT_FLAG N and Y]
[Figure: after the second change, the target holds three rows for Emp id 10, with PM_CURRENT_FLAG N, N and Y]
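The flag variant differs from versioning only in how the current row is marked. A sketch under the same assumptions as before (list-of-dictionaries target, keys continuing from the highest key used, made-up later email values), with the Y/N flag values taken from the figures:

```python
def apply_flag_change(target, source_row, key="emp_id", first_key=1000):
    """Flag the existing current row 'N' and append the new row as 'Y'."""
    current = next(
        (row for row in target
         if row[key] == source_row[key] and row["PM_CURRENT_FLAG"] == "Y"),
        None,
    )
    if current is not None:
        if all(current[col] == val for col, val in source_row.items()):
            return target  # unchanged: keep the current row as it is
        current["PM_CURRENT_FLAG"] = "N"
    # Assumption: keys continue from the highest key used so far.
    next_key = max((row["PM_PRIMARYKEY"] for row in target), default=first_key - 1) + 1
    target.append({"PM_PRIMARYKEY": next_key, **source_row, "PM_CURRENT_FLAG": "Y"})
    return target

flagged = []
for email in ["Shane@xyz.com", "Shane@abc.com", "Shane@example.org"]:
    apply_flag_change(flagged, {"emp_id": 10, "name": "Shane", "email": email})
```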
Slowly Changing Dimensions Wizard
Type II Dimension Mapping
Effective Date Range Mapping
- insert new and changed dimensions with date range to define current dimension data
- full history and tracking changes with an exact effective date range
[Figure: the source row (Emp id 10, Name Shane) is inserted into the target with PM_PRIMARYKEY 1000]
[Figure: after a change in the source, a second row for Emp id 10 is inserted with PM_PRIMARYKEY 1001 and the date 03/01/00]
[Figure: after a further change, the target holds three rows for Emp id 10]
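For the date range variant, each row carries the period during which it was current. In the sketch below the column names PM_BEGIN_DATE and PM_END_DATE are assumptions (the figures do not show them legibly), an open end date of None marks the current row, and the load dates and later email value are sample values.

```python
from datetime import date

def apply_dated_change(target, source_row, load_date, key="emp_id", first_key=1000):
    """Close the current row's date range and append the new row."""
    current = next(
        (row for row in target
         if row[key] == source_row[key] and row["PM_END_DATE"] is None),
        None,
    )
    if current is not None:
        if all(current[col] == val for col, val in source_row.items()):
            return target  # unchanged: the open range stays open
        current["PM_END_DATE"] = load_date
    # Assumption: keys continue from the highest key used so far.
    next_key = max((row["PM_PRIMARYKEY"] for row in target), default=first_key - 1) + 1
    target.append({
        "PM_PRIMARYKEY": next_key,
        **source_row,
        "PM_BEGIN_DATE": load_date,
        "PM_END_DATE": None,  # None marks the current row
    })
    return target

dated = []
apply_dated_change(dated, {"emp_id": 10, "name": "Shane", "email": "Shane@xyz.com"}, date(2000, 1, 1))
apply_dated_change(dated, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"}, date(2000, 3, 1))
```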
Slowly Changing Dimensions Wizard
Type III Dimension Mapping
- insert new and update values in existing dimensions
[Figure: the source row (Emp id 10, Name Shane) is inserted into the target with PM_PRIMARYKEY 1; the target has the columns PM_PRIMARYKEY, Emp id, Name, Email, PM_Prev_ColumnName and PM_EFFECT_DATE (01/01/00)]
[Figure: after a change in the source, the existing target row (PM_PRIMARYKEY 1, Emp id 10, Name Shane) is updated in place, with the email Shane@abc.com]
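Type III keeps one row per dimension and stores the previous value of a tracked column next to the current one. A minimal sketch, assuming a list-of-dictionaries target. The PM_Prev_ and PM_EFFECT_DATE column names follow the figures; the function name, the key numbering and the sample dates are assumptions.

```python
from datetime import date

def apply_type3(target, source_row, tracked, load_date, key="emp_id"):
    """Insert new rows; for changed rows keep the previous tracked value."""
    prev_col = f"PM_Prev_{tracked}"
    existing = next((row for row in target if row[key] == source_row[key]), None)
    if existing is None:
        target.append({
            "PM_PRIMARYKEY": len(target) + 1,  # assumed simple numbering
            **source_row,
            prev_col: None,
            "PM_EFFECT_DATE": load_date,
        })
    elif existing[tracked] != source_row[tracked]:
        existing[prev_col] = existing[tracked]  # remember the old value
        existing.update(source_row)             # then overwrite in place
        existing["PM_EFFECT_DATE"] = load_date
    return target

dim = []
apply_type3(dim, {"emp_id": 10, "name": "Shane", "email": "Shane@xyz.com"}, "email", date(2000, 1, 1))
apply_type3(dim, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"}, "email", date(2000, 3, 1))
```

Only one previous value is kept, so a second change would push the first value out of the row. That is the trade-off against the Type II mappings.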
Mapplet Designer
Mapplet
Reusable object that represents a set of transformation logic for use in multiple mappings
Rules for Objects in Mapplets
Do not use the following in a mapplet
- Joiner
- COBOL source definition
- Normalizer
- Target definitions
How to create a Mapplet?
- Open the Mapplet Designer workspace
- Choose Mapplets > Create
- Create an Input transformation to define mapplet input ports if the mapplet contains no sources
- Create the transformation objects to be used in the mapplet
- One Input transformation can be connected to only one transformation, so to pass the same values to two separate data flows, connect the Input transformation to another transformation and then split the data flow
- Use Output transformations to create output ports, creating one Output transformation for each mapplet output group
- Connect all the input ports, the ports in the transformation objects and the output ports to complete the data flow
- Choose Repository > Save to store the mapplet
- Check the Output window for validation status
- If the mapplet is not valid, correct the problem and re-save the mapplet
Server Manager
Monitor, add, edit and delete Informatica Server information in the repository
Sessions
- Set of instructions that tell the Informatica Server how and when to move data from sources to targets
Batch
- Group of sessions
Types of batch
Sequential
- Runs the sessions one after the other
Concurrent
- Runs all the sessions at the same time
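The difference between the two batch types can be illustrated with ordinary Python threads. This is only an analogy for the scheduling behaviour. `run_session` is a hypothetical stand-in that sleeps instead of moving data, and nothing here calls the Informatica Server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_session(name, log):
    """Stand-in for running one session; records start and end events."""
    log.append(("start", name))
    time.sleep(0.05)  # placeholder for the actual data movement
    log.append(("end", name))
    return name

def run_sequential_batch(session_names):
    """Sequential batch: each session starts after the previous one ends."""
    log = []
    for name in session_names:
        run_session(name, log)
    return log

def run_concurrent_batch(session_names):
    """Concurrent batch: all sessions are started together."""
    log = []
    with ThreadPoolExecutor(max_workers=len(session_names)) as pool:
        list(pool.map(lambda name: run_session(name, log), session_names))
    return log

sequential_log = run_sequential_batch(["s_LoadCustomers", "s_LoadProducts"])
concurrent_log = run_concurrent_batch(["s_LoadCustomers", "s_LoadProducts"])
```

In the sequential log the events of one session never interleave with those of another. In the concurrent log they usually do, and the batch finishes in roughly the time of its longest session.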
[Screenshot: Server Manager window showing the Navigator, the Monitor, Output and Configure windows and the Status Bar]
Server Manager
How to add database connections?
- Connect to a repository
- Choose Server Configuration > Database Connections
- Click Add
- Enter the following information
- Click OK to add this connection to the Data Sources list
- Click Close to save all the changes
Session IV
Objective
- Hands on with Server Manager
- Understand how to tune performance in Informatica
Creating Session
How to create a Session?
- Click on the folder in the navigator which contains the mapping
- Select Operations > Add Session from the menu bar
- The Session Wizard appears
- Select the source type, how to treat rows and the database connection name for relational sources
- For the target, select the type and the database connection name (if relational)
- Click Next
- On the Time page, enter a schedule for the session
- Click Next
- On the Log Files page, enter the settings for the session log file name, stop on (number of errors), perform recovery, override tracing etc.
- Click Next
- On the Transformations page, override mapping or mapplet transformation attributes as needed
Creating Batches
How to create a batch?
- Select the folder in the navigator for which sessions have been created
- Choose Operations > Add Batch
- Specify whether it is a concurrent batch
- Enter the schedule for the batch
Run a Session/Batch
- Select the session/batch to be run from the folder in the navigator
- Make sure the Monitor option of the server is checked
- Read the number of rows loaded, the number of rows failed or the first error message from the dialog box
- Click on the Open Log File button to investigate why a session failed
Performance Tuning
Check the Collect Performance Data option in the session properties of a session in Server Manager
[Table: High/Low readings of the "Buffer Input efficiency" and "Buffer Output efficiency" counters for the Source Qualifier and the Target, each combination leading to one of the evaluations below]
- Source database slow, eliminate read bottleneck
- Target database slow, eliminate write bottleneck
- DTM slow, optimize session or mapping
The counters help identify
- Read/Write/DTM bottlenecks
- Caching problems
- Transformation errors
- Shared memory allocation problems
To avoid write bottleneck
- Utilize the SQL loader facility of the database
- Drop indexes before the load and rebuild them after the load
- Increase the database block size
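The second tip can be shown on any relational database. The sketch below uses Python's built-in sqlite3 module purely for illustration. The `order_fact` table, the index name and the generated rows are hypothetical, and on a real warehouse the same pattern would be applied with the target database's own DDL.

```python
import sqlite3

def bulk_load(conn, rows):
    """Load rows into order_fact with the index dropped during the load."""
    # Drop the index so it is not maintained row by row while loading.
    conn.execute("DROP INDEX IF EXISTS idx_order_fact_customer")
    conn.executemany(
        "INSERT INTO order_fact (order_id, customer_key) VALUES (?, ?)", rows
    )
    # Rebuild the index once, after all rows are in place.
    conn.execute("CREATE INDEX idx_order_fact_customer ON order_fact (customer_key)")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_fact (order_id INTEGER, customer_key INTEGER)")
conn.execute("CREATE INDEX idx_order_fact_customer ON order_fact (customer_key)")

bulk_load(conn, [(i, i % 10) for i in range(1000)])

row_count = conn.execute("SELECT COUNT(*) FROM order_fact").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('order_fact')")]
```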
To avoid DTM bottleneck
- Run parallel sessions in concurrent batches
- Use incremental aggregation for mappings that use aggregation
- Optimize mapping
- Optimize session
For shared memory allocation problems
- Increase the shared memory size
To optimize sessions
- Increase shared memory size
- Increase buffer block size for very large row sizes
To optimize mapping
- Utilize single pass reads, use SQL override
- Place filters and aggregators as close to the source as possible
[Figure: Reader, DTM and Writer]
Case Study
Case I
Product data is captured on two platforms: one in a relational table and the other in a flat file. Combine the data from these two sources and put it into the Product dimension table.
Case II
In a data warehouse, we create surrogate keys to define the primary keys of dimension tables. Create surrogate keys for the Product table created in the previous case. Populate the fact table with the surrogate keys created in the Product dimension table as the foreign keys.
Case III
The Employee table has data for employees situated in all the countries. Whenever new employee data is added or existing data is modified in the source table, the new/modified data needs to be loaded into the Employees dimension. Create a mapping which checks for new/changed data in the source and loads only those records into the target.
Case IV
Create the following tables in the source database
1. Customers
Customer_id, Customer_name, Address, City, State, Country
2. Employees
Employee_id, First_name, last_name, Designation, Address
3. Products
Product_id, Product_name, item, unitprice
4. Orders
order_id, customer_id, employee_id, order_date, required_date
5. Order details
order_id, product_id, unitprice, quantity, discount
On the target side,
1. Create the dimension tables for customers, products and employees, adding an extra surrogate key column to each.
2. Design a mapping for loading the data into the above-mentioned tables by using the Slowly Changing Dimensions wizard. Try to make use of Type 1, Type 2 and Type 3.
3. Create order_fact in the target db for loading orders. Design a mapping for loading the data that handles insert and update strategies. Populate customer_key and employee_key instead of customer_id and employee_id by using lookup transformations.
4. Create an order_details_fact table with the following measures: Order_id, no_of_products, Tot_qty, Tot_price. Try to make use of the aggregator transformation.
Quiz
1. Where do you generate reports on Metadata?
a. Designer
b. Repository Manager
c. Server Manager
d. Server
c. Report generation
d. Creating sessions
3. When only one object of an entire mapping is reusable it is called?
a. Reusable transformation
b. Mapplet
c. Repeat transformation
d. Duplicate transformation
5. Where do you copy a mapping from one folder to another folder?
a. Designer
b. Repository Manager
c. Server Manager
d. Server
7. How many repositories can you create in one database?
a. 1
b. 2
c. 3
d. Any number
9. Where do you create folders?
a. Designer
b. Repository Manager
c. Server Manager
d. None of the above
11. A mapplet should not have the following transformation
a. Source Qualifier
b. Joiner
c. Expression
d. Aggregator
13. How much is the default size for the index cache?
a. 100 MB
b. 10 MB
c. 1 MB
d. 100 KB
14. How much is the default size for the data cache?
a. 100 MB
b. 200 MB
c. 2 MB
d. 100 KB