Informatica PowerCenter 8 Level I Developer Student Guide Version 8.1 April 2006
Copyright (c) 1998-2006 Informatica Corporation. All rights reserved. Printed in the USA. This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable. The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Informatica Corporation does not warrant that this documentation is error-free. Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX, and SuperGlue are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. Portions of this software are copyrighted by DataDirect Technologies, 1999-2002. Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and University of California, Irvine, Copyright (c) 1993-2002, all rights reserved. Portions of this software contain copyrighted material from The JBoss Group, LLC.
Your right to use such materials is set forth in the GNU Lesser General Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by Informatica, as-is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration is a registered trademark of Meta Integration Technology, Inc. This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2005 The Apache Software Foundation. All rights reserved. This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit and redistribution of this software is subject to terms available at http://www.openssl.org. Copyright 1998-2003 The OpenSSL Project. All Rights Reserved. The zlib library included with this software is Copyright (c) 1995-2003 Jean-loup Gailly and Mark Adler. The Curl license provided with this Software is Copyright 1996-200, Daniel Stenberg, <Daniel@haxx.se>. All Rights Reserved. The PCRE library included with this software is Copyright (c) 1997-2001 University of Cambridge. Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel. The source for this library may be found at ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre. InstallAnywhere is Copyright 2005 Zero G Software, Inc. All Rights Reserved. Portions of the Software are Copyright (c) 1998-2005 The OpenLDAP Foundation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted only as authorized by the OpenLDAP Public License, available at http://www.openldap.org/software/release/license.html.
This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775 and other U.S. Patents Pending. DISCLAIMER: Informatica Corporation provides this documentation as is without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in this documentation at any time without notice.
Table of Contents
List of Figures . . . . . . . . . . xiii
Preface . . . . . . . . . . xix
About This Guide . . . . . . . . . . xx
    Purpose . . . . . . . . . . xx
    Audience . . . . . . . . . . xx
    Document Conventions . . . . . . . . . . xx
Other Informatica Resources . . . . . . . . . . xxi
    Obtaining Informatica Documentation . . . . . . . . . . xxi
    Visiting Informatica Customer Portal . . . . . . . . . . xxi
    Visiting the Informatica Web Site . . . . . . . . . . xxi
    Visiting the Informatica Developer Network . . . . . . . . . . xxi
    Visiting the Informatica Knowledge Base . . . . . . . . . . xxi
    Obtaining Informatica Professional Certification . . . . . . . . . . xxi
    Providing Feedback . . . . . . . . . . xxii
    Obtaining Technical Support . . . . . . . . . . xxii
Step 4: Create and Save Shortcuts . . . . . . . . . . 20
Step 5: Launch the Workflow Manager . . . . . . . . . . 22
Step 6: Navigating the Workflow Manager Tools . . . . . . . . . . 23
Step 7: Workflow Manager Task Toolbar . . . . . . . . . . 23
Step 8: Database Connection Objects . . . . . . . . . . 24
Feature 3: Revert to Saved . . . . . . . . . . 118
Feature 4: Link Path . . . . . . . . . . 119
Feature 5: Propagating Ports . . . . . . . . . . 120
Feature 6: Autolink by Name and Position . . . . . . . . . . 121
Feature 7: Moving Ports . . . . . . . . . . 123
Feature 8: Shortcut to Port Editing from Normal View . . . . . . . . . . 124
Feature 9: Create Transformation Methods . . . . . . . . . . 124
Feature 10: Scale-to-Fit . . . . . . . . . . 125
Feature 11: Designer Options . . . . . . . . . . 126
Feature 12: Object Shortcuts and Copies . . . . . . . . . . 127
Feature 13: Copy Objects Within and Between Mappings . . . . . . . . . . 128
Step 2: Step Through the Debug Wizard . . . . . . . . . . 170
Step 3: Use the Debugger to Locate the Error . . . . . . . . . . 172
Step 4: Fix the Error and Confirm the Data is Correct . . . . . . . . . . 174
Unit 9 Lab A: Load Promotions Dimension Table (Lookup and Persistent Cache) . . . . 195
Instructions . . . . . . . . . . 198
Step 1: Create a Shortcut to a Shared Relational Source Table . . . . . . . . . . 198
Step 2: Create a Shortcut to a Shared Relational Target Table . . . . . . . . . . 198
Step 3: Create a Mapping . . . . . . . . . . 198
Step 4: Create Lookups for the Start and Expiry Date Keys . . . . . . . . . . 198
Step 5: Create and Run the Workflow . . . . . . . . . . 201
Data Results . . . . . . . . . . 203
Lesson 10-2. Aggregator Transformation . . . . . . . . . . 216
    Aggregator Cache . . . . . . . . . . 219
Lesson 10-3. Active and Passive Transformations . . . . . . . . . . 220
Lesson 10-4. Data Concatenation . . . . . . . . . . 221
Lesson 10-5. Self-Join . . . . . . . . . . 222
Data Results . . . . . . . . . . 264
Step 12: Prepare, Run, and Monitor the Second Run . . . . . . . . . . 264
Mapplets . . . . . . . . . . 317
Mapping Input Transformation . . . . . . . . . . 319
Mapping Output Transformation . . . . . . . . . . 321
User-Defined Event . . . . . . . . . . 365
Lesson 17-2. Event Raise Task . . . . . . . . . . 366
Lesson 17-3. Command Task . . . . . . . . . . 367
Lesson 17-4. Reusable Tasks . . . . . . . . . . 369
Lesson 17-5. Reusable Session Tasks . . . . . . . . . . 369
Lesson 17-6. Reusable Session Configurations . . . . . . . . . . 370
Lesson 17-7. pmcmd Utility . . . . . . . . . . 371
List of Figures
Figure 2-1. Navigator Window . . . . . . . . . . 18
Figure 2-2. DEV_SHARED Folder and Subfolders . . . . . . . . . . 18
Figure 2-3. Designer Tools . . . . . . . . . . 19
Figure 2-4. DEV_SHARED Target subfolder . . . . . . . . . . 21
Figure 2-5. Student folder with new objects . . . . . . . . . . 22
Figure 2-6. Application Toolbar . . . . . . . . . . 22
Figure 2-7. Task Toolbar Default Position . . . . . . . . . . 23
Figure 2-8. Task Toolbar After Being Moved . . . . . . . . . . 24
Figure 2-9. Relational Connection Browser . . . . . . . . . . 25
Figure 3-1. Normal view of the payment flat file definition displayed in the Source Analyzer . . . . . . . . . . 38
Figure 3-2. Mapping with Source and Target Definitions . . . . . . . . . . 39
Figure 3-3. Normal view of the completed mapping . . . . . . . . . . 40
Figure 3-4. Completed Session Task Target Properties . . . . . . . . . . 42
Figure 3-5. Completed Workflow . . . . . . . . . . 43
Figure 3-6. Successful Run of a Workflow Depicted in the Task View of the Workflow Monitor . . . . . . . . . . 43
Figure 3-7. Properties for the Completed Session Run . . . . . . . . . . 44
Figure 3-8. Source/Target Statistics for the Completed Session Run . . . . . . . . . . 44
Figure 3-9. Data Preview of the STG_PAYMENT Target Table . . . . . . . . . . 45
Figure 3-10. Source Definitions with a PK/FK Relationship Displayed in the Source Analyzer . . . . . . . . . . 52
Figure 3-11. Normal View of the Completed Mapping . . . . . . . . . . 54
Figure 3-12. Generated SQL for the m_Stage_Product Mapping . . . . . . . . . . 54
Figure 3-13. Properties of the Completed Session Run . . . . . . . . . . 56
Figure 3-14. Source/Target Statistics for the Completed Session Run . . . . . . . . . . 56
Figure 3-15. Data Preview of the STG_PRODUCT Target Table . . . . . . . . . . 56
Figure 3-16. Normal view of the promotions flat file definition displayed in the Source Analyzer . . . . . . . . . . 62
Figure 3-17. Iconic View of the Completed Mapping . . . . . . . . . . 63
Figure 3-18. Properties of the Completed Session Run . . . . . . . . . . 63
Figure 3-19. Source/Target Statistics for the Completed Session Run . . . . . . . . . . 64
Figure 3-20. Data Preview of the STG_DEALERSHIP Target Table . . . . . . . . . . 64
Figure 3-21. Data Preview of the STG_PROMOTIONS Target Table . . . . . . . . . . 65
Figure 4-1. Source Analyzer View of the customer_layout Flat File Definition . . . . . . . . . . 82
Figure 4-2. Target Designer View of the STG_CUSTOMERS Table Relational Definition . . . . . . . . . . 83
Figure 4-3. Mapping with Source and Target Definitions . . . . . . . . . . 83
Figure 4-4. Mapping with Newly Added Filter Transformation . . . . . . . . . . 84
Figure 4-5. Properties Tab of the Filter Transformation . . . . . . . . . . 85
Figure 4-6. Completed Properties Tab of the Filter Transformation . . . . . . . . . . 85
Figure 4-7. Filter Transformation Linked to the Expression Transformation . . . . . . . . . . 86
Figure 4-8. Sample Expression . . . . . . . . . . 88
Figure 4-9. Iconic View of the Completed Mapping . . . . . . . . . . 88
Figure 4-10. Session Task Source Properties . . . . . . . . . . 90
Figure 4-11. Contents of the customer_list.txt File List . . . . . . . . . . 90
Figure 4-12. Properties for the Completed Session Run . . . . . . . . . . 91
Figure 4-13. Source/Target Statistics for the Completed Session Run . . . . . . . . . . 91
Figure 4-14. Data Preview of the STG_CUSTOMERS Target Table . . . . . . . . . . 92
Figure 4-15. General Properties for the Workflow . . . . . . . . . . 93
Figure 4-16. Customized Repeat Selections . . . . . . . . . . 94
Figure 4-17. Completed Schedule Options . . . . . . . . . . 94
Figure 5-1. Normal View of the Heterogeneous Sources, Source Qualifiers and Target . . . . . . . . . . 106
Figure 5-2. Joiner Transformation Button . . . . . . . . . . 107
Figure 5-3. Normal View of Heterogeneous Sources Connected to a Joiner Transformation . . . . . . . . . . 107
Figure 5-4. Edit View of the Ports Tab for the Joiner Transformation . . . . . . . . . . 108
Figure 5-5. Edit View of the Condition Tab for Joiner Transformation Without a Condition . . . . . . . . . . 108
Figure 5-6. Edit View of the Condition Tab for the Joiner Transformation with Completed Condition . . . . . . . . . . 109
Figure 5-7. Normal View of Completed Mapping, Heterogeneous Sources Not Displayed . . . . . . . . . . 110
Figure 5-8. Task Details of the Completed Session Run . . . . . . . . . . 111
Figure 5-9. Source/Target Statistics for the Session Run . . . . . . . . . . 111
Figure 5-10. Data Preview of the STG_TRANSACTIONS Table . . . . . . . . . . 112
Figure 5-11. View of an Unorganized Mapping . . . . . . . . . . 116
Figure 5-12. Arranged View of a Mapping . . . . . . . . . . 117
Figure 5-13. Iconic View of an Arranged Mapping . . . . . . . . . . 117
Figure 5-14. Selecting Multiple Links . . . . . . . . . . 118
Figure 5-15. Designer Warning Box . . . . . . . . . . 118
Figure 5-16. Selecting the forward link path . . . . . . . . . . 119
Figure 5-17. Highlighted forward link path . . . . . . . . . . 119
Figure 5-18. Highlighted link path going forward and backward . . . . . . . . . . 120
Figure 5-19. Selecting to propagate the attributes . . . . . . . . . . 120
Figure 5-20. Propagation attribute dialog box . . . . . . . . . . 121
Figure 5-21. Autolink dialog box . . . . . . . . . . 122
Figure 5-22. Defining a prefix in the autolink dialog box . . . . . . . . . . 123
Figure 5-23. Expression after the AGE port has been moved . . . . . . . . . . 124
Figure 5-24. Click and drag method of moving ports . . . . . . . . . . 124
Figure 5-25. Creating a transformation using the menu . . . . . . . . . . 125
Figure 5-26. Create Transformation dialog box . . . . . . . . . . 125
Figure 5-27. Normal View of the Newly Created Aggregator Transformation . . . . . . . . . . 125
Figure 5-28. Zoom options . . . . . . . . . . 126
Figure 5-29. Navigator window in the Designer . . . . . . . . . . 127
Figure 6-1. Source Analyzer view of the employees_layout flat file definition . . . . . . . . . . 142
Figure 6-2. Target Designer view of the STG_EMPLOYEES relational table definition . . . . . . . . . . 142
Figure 6-3. Transformation edit dialog box showing how to make a transformation reusable . . . . . . . . . . 143
Figure 6-4. Question box letting you know the action is irreversible . . . . . . . . . . 143
Figure 6-5. Transformation edit dialog box of a reusable transformation . . . . . . . . . . 143
Figure 6-6. Navigator window depicting the Transformations node . . . . . . . . . . 144
Figure 6-7. Partial mapping with source and target . . . . . . . . . . 144
Figure 6-8. Transformation Toolbar . . . . . . . . . . 145
Figure 6-9. Lookup Transformation table location dialog box . . . . . . . . . . 145
Figure 6-10. Dialog box 1 of the 3 step Flat File Import Wizard . . . . . . . . . . 145
Figure 6-11. Normal view of the newly created Lookup Transformation . . . . . . . . . . 146
Figure 6-12. Lookup Transformation condition box . . . . . . . . . . 147
Figure 6-13. Source properties for the employee_list file list . . . . . . . . . . 148
Figure 6-14. Task Details of the completed session run . . . . . . . . . . 149
Figure 6-15. Source/Target Statistics of the completed session run . . . . . . . . . . 149
Figure 6-16. Data Preview of the STG_EMPLOYEES target table . . . . . . . . . . 150
Figure 6-17. Mapping with Source and Target definitions . . . . . . . . . . 157
Figure 6-18. Completed Mapping . . . . . . . . . . 158
Figure 6-19. Task Details of the completed session run . . . . . . . . . . 159
Figure 6-20. Source/Target Statistics for the session run . . . . . . . . . . 160
Figure 6-21. Data preview of the STG_DATES table - screen 1 . . . . . . . . . . 161
Figure 6-22. Data preview of the STG_DATES table - screen 2 scrolled right . . . . . . . . . . 161
Figure 7-1. Debug Session creation dialog box . . . . . . . . . . 171
Figure 7-2. Debug Session connections dialog box . . . . . . . . . . 171
Figure 7-3. Designer while running a Debug Session . . . . . . . . . . 172
Figure 7-4. Customize Toolbars Dialog Box . . . . . . . . . . 173
Figure 7-5. Debugger Toolbar . . . . . . . . . . 173
Figure 8-1. Expanded view of m_DIM_DATES_LOAD . . . . . . . . . . 186
Figure 8-2. Sequence Generator Transformation icon . . . . . . . . . . 186
Figure 8-3. Normal view of the sequence generator NEXTVAL port connected to a target column . . . . . . . . . . 186
Figure 8-4. Normal view of connected ports to the target . . . . . . . . . . 187
Figure 8-5. Task Details of the completed session run . . . . . . . . . . 188
Figure 8-6. Source/Target statistics for the session run . . . . . . . . . . 188
Figure 8-7. Data Preview of the DIM_DATES table . . . . . . . . . . 189
Figure 9-1. m_DIM_PROMOTIONS_LOAD mapping . . . . . . . . . . 198
Figure 9-2. m_DIM_DATES from the previous lab that populated the DIM_DATES table . . . . . . . . . . 199
Figure 9-3. Select Lookup Table . . . . . . . . . . 199
Figure 9-4. Lookup Condition . . . . . . . . . . 200
Figure 9-5. m_DIM_PROMOTIONS_LOAD completed mapping . . . . . . . . . . 201
Figure 9-6. Task Details of the completed session run . . . . . . . . . . 202
Figure 9-7. Source/Target Statistics of the completed session run . . . . . . . . . . 202
Figure 9-8. Data Preview of the DIM_PROMOTIONS target table . . . . . . . . . . 203
Figure 9-9. Preview files created when Persistent Cache is set on Lookup Transformation . . . . . . . . . . 203
Figure 9-10. Find in workspace dialog box . . . . . . . . . . 206
Figure 9-11. View Dependencies dialog box . . . . . . . . . . 207
Figure 9-12. Transformation compare objects dialog box . . . . . . . . . . 208
Figure 9-13. Compare Transformation objects Properties details . . . . . . . . . . 209
Figure 9-14. Target comparison dialog box . . . . . . . . . . 210
Figure 9-15. Column differences between two target tables . . . . . . . . . . 210
Figure 10-1. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD mapping . . . . . . . . . . 230
Figure 10-2. Employee_central.txt . . . . . . . . . . 230
Figure 10-3. Renaming an instance of a Reusable Transformation . . . . . . . . . . 231
Figure 10-4. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD after most links removed . . . . . . . . . . 231
Figure 10-5. Sorter Transformation Icon on Toolbar . . . . . . . . . . 231
Figure 10-6. Aggregator Transformation Icon on Toolbar . . . . . . . . . . 233
Figure 10-7. Partial mapping flow depicting the flow from the Sorter to the Filter to the Aggregator . . . . . . . . . . 233
Figure 10-8. Split data stream joined back together . . . . . . . . . . 234
Figure 10-9. Iconic view of the completed self-join mapping . . . . . . . . . . 236
Figure 10-10. Source properties for the employee_list.txt file list . . . . . . . . . . 236
Figure 10-11. Task Details of the completed session run . . . . . . . . . . 237
Figure 10-12. Source/Target Statistics of the completed session run . . . . . . . . . . 237
Figure 10-13. Data preview of the self-join of Managers and Employees in the STG_EMPLOYEES target table - screen 1 . . . . . . . . . . 238
Figure 10-14. Data preview of the STG_EMPLOYEES target table - screen 2 scrolled right . . . . . . . . . . 238
Figure 11-1. Mapping copy Target Dependencies dialog box . . . . . . . . . . 257
Figure 11-2. Iconic view of the m_DIM_EMPLOYEES_MAPPING . . . . . . . . . . 257
Figure 11-3. Router Groups . . . . . . . . . . 258
Figure 11-4. Update Strategy set to INSERT . . . . . . . . . . 259
Figure 11-5. Iconic view of the completed mapping . . . . . . . . . . 261
Figure 11-6. Source Filter Value . . . . . . . . . . 262
Figure 11-7. Writers section of Target schema . . . . . . . . . . 263
Figure 11-8. Task Details of the completed session run . . .
Figure 11-9. Source/Target Statistics . . .
Figure 11-10. Data Results for DIM_EMPLOYEES . . .
Figure 11-11. Data Results for the Error Flat File (Located on the Machine Hosting the Integration Service Process) . . .
Figure 11-12. Task Details tab results for second run . . .
Figure 11-13. Source/Target Statistics for second run . . .
Figure 11-14. Data preview showing updates to the target table . . .
Figure 12-1. Port tab view of a dynamic Lookup . . .
Figure 12-2. Port to Port Association . . .
Figure 12-3. Iconic View of the Completed Mapping . . .
Figure 12-4. Error Log Choice Screen . . .
Figure 12-5. Task Details of the Completed Session Run . . .
Figure 12-6. Source/Target Statistics for the Session Run . . .
Figure 12-7. Data preview of the DIM_CUSTOMERS table . . .
Figure 12-8. Flat file error log . . .
Figure 13-1. Source Analyzer view of the STG_TRANSACTIONS and STG_PAYMENT tables . . .
Figure 13-2. Declare Parameters and Variables screen . . .
Figure 13-3. Parameter entry . . .
Figure 13-4. Lookup Ports tab showing input, output and return ports checked/unchecked . . .
Figure 13-5. Aggregator ports with Group By ports checked . . .
Figure 13-6. Finished Aggregator . . .
Figure 13-7. Aggregator to Target Links . . .
Figure 13-8. Iconic view of the completed mapping . . .
Figure 13-9. Task Details of the completed session run . . .
Figure 13-10. Source/Target Statistics of the completed session run . . .
Figure 13-11. Data Preview of the FACT_SALES target table . . .
Figure 14-1. Mapplet Designer view of mplt_AGG_SALES . . .
Figure 14-2. Mapplet Designer view of MPLT_AGG_SALES with Input and Output transformations . . .
Figure 14-3. Iconic view of the m_FACT_SALES_LOAD_MAPPLET_xx mapping . . .
Figure 15-1. Source table definition . . .
. . . . Figure 15-2. Target table definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 15-3. Task Details of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 15-4. Source/Target Statistics of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 15-5. Data Preview of the FACT_PROMOTIONS_AGG_DAILY table . . . . . . . . . . . . . . . . . . . . . Figure 16-1. Workflow variable declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-2. Link condition testing if a session run was successful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-3. Assignment Task expression declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-4. Decision Task Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-5. Link condition testing for a Decision Task condition of TRUE . . . . . . . . . . . . . . . . . . . . . . . Figure 16-6. Email Task Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-7. Completed Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-8. Gantt chart view of the completed workflow run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-9. View Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-10. Value of the $$WORKFLOW_RUNS variable after first run . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-11. Gantt chart view of the completed workflow run after the weekly load runs . . . . . . . . . . . . . Figure 16-12. Task Details of the completed session run . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 18-1. Timer Task Relative time setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 18-2. Email Task Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 18-3. Control Task Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. 263 . 263 . 264 . 264 . 265 . 265 . 266 . 283 . 284 . 285 . 285 . 286 . 286 . 287 . 288 . 307 . 308 . 308 . 309 . 310 . 311 . 312 . 312 . 313 . 313 . 314 . 326 . 327 . 328 . 336 . 336 . 340 . 340 . 340 . 354 . 355 . 355 . 356 . 357 . 358 . 358 . 359 . 359 . 360 . 360 . 360 . 384 . 385 . 386
xvi
Figure 18-4. Completed Worklet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386 Figure 18-5. Completed Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 Figure 18-6. Gantt chart view of the completed workflow run. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Preface
Welcome to the PowerCenter 8 Level I Developer course. Data integration is a large undertaking with many potential areas of concern. The PowerCenter infrastructure will greatly assist you in your data integration efforts and alleviate much of the risk. This course prepares you for that challenge by teaching you the most commonly used components of the product. You will build a small data warehouse, using PowerCenter to extract data from source tables and files, transform it, load it into a staging area, and finally load it into the data warehouse. The instructor will teach you about mappings, transformations, sources, targets, workflows, sessions, workflow tasks, connections, and the Velocity methodology.
- Create and debug mappings
- Create, run, monitor and troubleshoot workflows
- Provide experience in designing mappings
Audience
This course is designed for data integration and data warehousing implementers. You should be familiar with data integration and data warehousing terminology and with using Microsoft Windows.
Document Conventions
This guide uses the following formatting conventions:
If you see: >
It means: Indicates a submenu to navigate to.

If you see: boldfaced text
It means: Indicates text you need to type or enter.
Example: Click the Rename button and name the new source definition S_EMPLOYEE.

If you see: UPPERCASE
It means: Database tables and column names are shown in all UPPERCASE.
Example: T_ITEM_SUMMARY

If you see: italicized text
It means: Indicates a variable you must replace with specific information.
Example: Connect to the Repository using the assigned login_id.

If you see: Note:
It means: The following paragraph provides additional facts.
Example: Note: You can select multiple objects to import by using the Ctrl key.

If you see: Tip:
It means: The following paragraph provides suggested uses or a Velocity best practice.
Example: Tip: The m_ prefix for a mapping name is
- Informatica Documentation
- Informatica Customer Portal
- Informatica web site
- Informatica Developer Network
- Informatica Knowledge Base
- Informatica Professional Certification
- Informatica Technical Support
The site contains information on how to create, market, and support customer-oriented add-on solutions based on interoperability interfaces for Informatica products.
Providing Feedback
Email any comments on this guide to aconlan@informatica.com.
support@informatica.com for technical inquiries support_admin@informatica.com for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
North America / South America
Informatica Corporation Headquarters
100 Cardinal Way, Redwood City, California 94063, United States
Toll Free: 877 463 2435
Standard Rate: United States 650 385 5800

Europe / Middle East / Africa
Informatica Software Ltd.
6 Waltham Park, Waltham Road, White Waltham, Maidenhead, Berkshire SL6 3TN, United Kingdom
Toll Free: 00 800 4632 4357
Standard Rate: Belgium +32 15 281 702; France +33 1 41 38 92 26; Germany +49 1805 702 702; Netherlands +31 306 022 797; United Kingdom +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd.
301 & 302 Prestige Poseidon, 139 Residency Road, Bangalore 560 025, India
Toll Free: Australia 00 11 800 4632 4357; Singapore 001 800 4632 4357
Standard Rate: India +91 80 5112 5738
- Informatica Data Integration
- Mapping and Transformations
- Workflows and Tasks
- Metadata
- Integration Consortium: www.eaiindustry.org
- Object Management Group (OMG): www.omg.org
- Common Warehouse Metamodel (CWM): www.omg.org/cwm
- Enterprise Grid Alliance: www.gridalliance.org
- Global Grid Forum (GGF): www.gridforum.org
Unit 1: Data Integration Concepts Informatica PowerCenter 8 Level I Developer
- XML.org: www.xml.org
- Web Services Interoperability Organization: www.ws-i.org
- Supply-Chain Council: www.supply-chain.org
- Carnegie-Mellon Software Engineering Institute (SEI): www.sei.cmu.edu
- APICS Educational and Research Foundation: www.apics.org
- Shared Services and Business Process Outsourcing Association (SBPOA): www.sharedxpertise.org
- www.informatica.com provides information on Professional Services and Education Services
- my.informatica.com provides access to Technical Support, product documentation, the Velocity methodology, the knowledge base, and mapping templates
- devnet.informatica.com, the Informatica Developers Network, offers discussion forums, web seminars, and technical papers
Informatica PowerCenter is deployed for a variety of batch and real-time data integration purposes:
- Data Migration: ERP consolidation, legacy conversion, new application implementation, system consolidation
- Data Synchronization: application integration, business-to-business data transfer
- Data Warehousing: business intelligence reporting, data marts, data mart consolidation, operational data stores
- Data Hubs: master data management; reference data hubs; single view of customer, product, supplier, employee, etc.
- Business Activity Monitoring: business process improvement, real-time reporting
Informatica partners with Composite Software for Enterprise Information Integration (EII): on-the-fly federated views and real-time reporting of information spread across multiple data sources, without moving the data into a centralized repository.
Transformations
Transformations change the data they receive.
- Passive: the number of rows entering and exiting the transformation is the same.
- Active: the number of rows exiting the transformation may not be the same as the number of rows entering it.
- Source Qualifier: reads sources
- Filter: filters data conditionally
- Sorter: sorts data
- Expression: performs logical/mathematical functions on data
- Aggregator: sums, averages, maximums, minimums
- Joiner: joins two data flows
- Lookup: looks up a corresponding value from a table or flat file
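The passive/active distinction above can be sketched in a few lines of Python. This is an analogy only: PowerCenter transformations are configured in the Designer, not written as code, and the column names here are invented for illustration.

```python
# Toy versions of an Expression (passive) and a Filter (active) transformation.

def expression_transform(rows):
    """Passive: emits exactly one output row per input row."""
    return [{**row, "TOTAL": row["PRICE"] * row["QTY"]} for row in rows]

def filter_transform(rows, predicate):
    """Active: may emit fewer rows than it receives."""
    return [row for row in rows if predicate(row)]

rows = [{"PRICE": 10.0, "QTY": 2}, {"PRICE": 5.0, "QTY": 0}]

out_passive = expression_transform(rows)
out_active = filter_transform(rows, lambda r: r["QTY"] > 0)

assert len(out_passive) == len(rows)  # passive: row count unchanged
assert len(out_active) <= len(rows)   # active: row count may change
```

The same contrast applies to the other transformations listed: Expression, Sorter, and Lookup pass every row through, while Filter, Aggregator, Joiner, and Source Qualifier can change the row count.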
Workflows
A workflow is a set of ordered tasks that describe runtime ETL processes. Tasks can be sequenced serially, in parallel and conditionally. Each linked icon represents a task.
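The idea of ordered tasks with conditional links can be sketched as follows. This is a hedged analogy, not PowerCenter code; the task names and the single success status are invented for illustration.

```python
# A workflow as an ordered list of tasks. Each link between tasks can carry
# a condition; here the condition is "run the next task only if the previous
# task succeeded", which mimics a common link condition.

def run_task(name):
    print(f"running {name}")
    return "Succeeded"          # a real task would report its actual status

workflow = ["s_Load_Staging", "s_Load_Warehouse", "cmd_Archive_Files"]

status = "Succeeded"
for task in workflow:
    if status != "Succeeded":   # conditional link: stop the chain on failure
        break
    status = run_task(task)
```

A real workflow can also branch into parallel paths; this sketch shows only the serial, conditional case described above.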
- Sources: can be relational tables or heterogeneous files (flat files, VSAM files, and XML)
- Targets: can be relational tables or heterogeneous files
- Integration Service: the engine that performs all of the extract, transform, and load logic
- Repository Service: manages connectivity to the metadata repositories that contain mapping and workflow definitions
- Repository Service Process: a multi-threaded process that retrieves, inserts, and updates repository metadata
- Repository: contains all of the metadata needed to run ETL processes
- Client Tools: desktop tools used to populate the repository with metadata, execute workflows on the Integration Service, monitor the workflows, and manage the repository
Unit 2: PowerCenter Components and User Interface Informatica PowerCenter 8 Level I Developer
All tools access the repository through the Repository Service. The Workflow Manager and Workflow Monitor connect to the Integration Service. Each client application has its own interface; each interface has toolbars, a navigation window on the left, a workspace on the right, and an output window at the bottom.
Designer
Within the Designer, you can display transformations in the following views:
- Iconized: shows the transformation in relation to the rest of the mapping and minimizes the screen space needed to display the mapping.
- Normal: shows the flow of data through the transformation. This view is typically used when copying/linking ports to other objects.
- Edit: shows transformation ports and properties and allows editing. This view is used to add, edit, or delete ports and to change any of the transformation attributes or properties.
Workflow Manager
In the Workflow Manager, you can display tasks in the following views:
Technical Description
PowerCenter includes two development applications: the Designer, which you will use to create mappings, and the Workflow Manager, which you will use to create and start workflows. This exercise is designed to serve as your first hands-on experience with PowerCenter and to supplement the instructor demonstrations. You will import source and target definitions from a shortcut folder into your own folder.
Objectives
- Learn how to navigate the repository folder structure.
- Understand the purpose of the tools accessed from the Designer and Workflow Manager.
- Create and save source and target shortcuts.
- Learn how to access and edit the database connection objects.
Duration
30 minutes
Unit 2 Lab: Using the Designer and Workflow Manager Informatica PowerCenter 8 Level I Developer
Instructions
Step 1: Launch the Designer and Log Into the Repository
1. Launch the Designer client application from the desktop icon. If no desktop icon is present, select Start > Programs > Informatica PowerCenter > Client > PowerCenter Designer.
2. Maximize the Designer window.
Note: Notice the Navigator window on the left side, which should resemble Figure 2-1.
3. Log into the PC8_DEV repository with the user name studentxx, where xx represents your student number as assigned by the instructor. The password is the same. Passwords are always case-sensitive.
Tip: The user name to log into the repository is an application-level user name: it allows PowerCenter to admit you to the repository with a specific set of application privileges. It is not a database user name.
Double-click the folder DEV_SHARED. This opens the folder and shows you the subfolders associated with it. Figure 2-2 shows the Navigator:
Figure 2-2. DEV_SHARED Folder and Subfolders
Note: Notice that the DEV_SHARED folder has a small blue arm holding it. This icon denotes that DEV_SHARED is a shortcut folder. As you will see later in this lab, dragging an object from a shortcut folder into an open folder creates a shortcut to the object. Tip: Technically, all folders are shared with all users who have the appropriate folder permissions, regardless of whether the folder has the blue arm or not. Do not confuse repository folders with the directories visible in Windows Explorer. Folders are PowerCenter repository objects and are not related to Windows directories.
2. Expand some of the subfolders to see the objects they hold. Note that some subfolders are empty. When a new object, such as a target definition, is created within a folder, it automatically goes into the appropriate subfolder.
Note: Notice that within the Sources subfolder, the source objects are organized under individual nodes (branches in the hierarchy), such as FlatFile, ODBC_EDW, etc. These are based on the type of source and the Data Source Name that was used to import the source definition (more on this later). Very important: you will need to click on these source nodes to locate source definitions that may be hidden from view. Tip: Subfolders are created and managed automatically. Users cannot create, delete, nest, or rename subfolders.
Each PowerCenter application, such as the Designer, shows only subfolders related to the objects that can be created and modified by that application. For example, in the Designer you only see subfolders for sources, targets, mappings, etc.
3. Double-click on your individual student folder. For the remainder of the class, you will create and modify objects in this folder. Some pre-made objects have been provided as well.
Note: Your student folder is now the open folder. Only one folder at a time can be open. The DEV_SHARED folder is now expanded. This distinction is important, as you will see later in this lab.
Select the menu option Tools > Source Analyzer. The workspace to the right of the Navigator window changes to an empty space.
Note: Note the small toolbar directly to the right of the Navigator window, at the top. These are the five Designer tools. Each tool allows you to create and modify one specific type of object, such as sources. Figure 2-3 shows the Designer tools with the first tool (the Source Analyzer) selected. Figure 2-3. Designer Tools
2. With your left mouse button, alternately toggle between the five tools.
The name of each tool is displayed in the upper left corner of the workspace when that tool is active.
Note: The main menu bar (at the very top of your screen) changes depending on which tool is active. Because these menus are context-sensitive, you must already be in the appropriate tool to create or modify a specific type of object.
- The Source Analyzer tool is used to create or modify source objects. They may be relational, flat file, XML, or COBOL sources.
- The Target Designer tool is used to create or modify target objects. They may be relational, flat file, or XML. It does not matter whether these targets are part of an actual data warehouse.
- The Transformation Developer tool is used to create or modify reusable transformations. Non-reusable transformations are created directly in a mapping or mapplet; this distinction will be covered later in the class.
- The Mapplet Designer tool is used to create or modify mapplets.
- The Mapping Designer tool is used to create or modify mappings.
Ensure that the Target Designer is active and that your student folder is open.
Important: In order to copy or shortcut any object into a folder, the destination folder (the folder you are adding to) must be the open folder. If the destination folder is not open, the copy/shortcut will not work.
2. To help see which folder is active, choose View > Workbook to view the PowerCenter Client in Workbook view. The PowerCenter Client displays tabs for each folder at the bottom of the Main window:
3. In the DEV_SHARED folder, expand the Targets subfolder by clicking on the + sign to the left of the subfolder. Figure 2-4 shows the Navigator window:
Figure 2-4. DEV_SHARED Target subfolder
4. Drag and drop the STG_PAYMENT target from the Navigator into the Target Designer workspace. You will see the confirmation message, Create a shortcut to the target table STG_PAYMENT?
5. Click Yes at the confirmation message.
6. Expand the Targets subfolder in your student folder. Note that you have added a shortcut to the STG_PAYMENT staging target table in your own folder.
Tip: PowerCenter shortcuts are pointers to the original object. A shortcut can be used but cannot itself be modified. The original object can be modified, and any changes immediately affect all shortcuts to that object.
7. Open the Source Analyzer tool in your student folder.
8. In the DEV_SHARED folder, expand the Sources subfolder and expand the FlatFile container.
9. Add shortcuts to your folder for the two source definitions listed below:
   PROMOTIONS
   PAYMENT
10. Use the menu option Repository > Save to save these objects in your student folder.
Tip: You should periodically save changes to the repository when using the Designer or the Workflow Manager. The keyboard shortcut Ctrl+S can also be used. There is no auto-save feature.
Left-click the toolbar icon for the Workflow Manager shown in Figure 2-6. This toolbar is usually above the Navigator window.
Figure 2-6. Application Toolbar
Workflow Manager Button
2. Confirm that the Workflow Manager launches and that you are automatically logged into the repository, just as you were in the Designer.
3. Maximize the Workflow Manager application.
Tip: Avoid having two or more instances of the same PowerCenter application (such as the Workflow Manager) running on a machine at the same time. There is no benefit in doing this, and it can result in confusion when editing objects.
4. Browse through the various folders and subfolders in the Workflow Manager Navigator window as you did in the Designer. Note that only subfolders for the objects that can be created with the Workflow Manager are present: Tasks, Sessions, Worklets, and Workflows.
Note: Although a session object is a type of task, it gets its own subfolder because you will typically have many more sessions than the other types of tasks. Only reusable sessions will appear in the Sessions subfolder. Likewise, only reusable tasks (except for sessions) will appear in the Tasks subfolder.
Select the menu option Tools > Task Developer. Just as in the Designer, you will see the workspace clear itself and a toolbar appear to the right of the Navigator window. The idea is the same as with the Designer, except there are three tools instead of five.
2. With your left mouse button, alternately toggle between the three tools. Note that the name of each tool is displayed in the upper left corner of the workspace when that tool is active. Note also the context-sensitive menus, as in the Designer.
- The Task Developer tool is used to create or modify reusable tasks.
- The Worklet Designer tool is used to create or modify worklets.
- The Workflow Designer tool is used to create or modify workflows.
Locate the vertical stripe at the far left-hand side of the task toolbar, as shown in Figure 2-7:
Figure 2-7. Task Toolbar Default Position
2. With your left mouse button, drag the toolbar toward the left and drop it in a convenient location so that all of the buttons are visible. The top of your Workflow Manager should appear similar to Figure 2-8:
Figure 2-8. Task Toolbar After Being Moved
In the Workflow Manager, select the menu option Connections > Relational.
You will see the Relational Connection Browser similar to Figure 2-9.
Figure 2-9. Relational Connection Browser
Note: Each connection object is organized under a database type.
2. Double-click on the NATIVE_TRANS connection object to display its properties. You will not have write privileges.
3. Click OK.
Note: The connection NATIVE_TRANS logs into the database with the user name sdbu. This connection object is shared among the students in the class.
4. Double-click on any of the other connection objects that have your student number. The NATIVE_STG07 connection, for example, has the user name tdbu07. These are the individual student connections used to read from and write to your individual staging tables and the enterprise data warehouse (EDW) tables. Creating additional connection objects is intuitive; experiment if you have extra time.
Tip: Database connection objects are not associated with a specific folder.
- Source Qualifier transformation
- Velocity Methodology
- Source Qualifier joins
- Source pipelines
Type
Active.
Description
A Source Qualifier transformation:
- Selects records from flat file and relational table sources. Only those fields or columns used in the mapping are selected, based on the output connections.
- Converts the data from the source's native datatype to the most compatible PowerCenter transformation datatype.
- Generates a SQL query for relational sources.
- Can perform homogeneous joins between relational tables on the same database.
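The points above can be illustrated with a small sketch of how a default query might be assembled from the connected ports. This is illustrative only: the real SQL is generated by the Integration Service, and the table and column names below are examples, not part of the lab.

```python
# Sketch: a Source Qualifier selects only the columns that are connected
# onward in the mapping, so the generated SELECT lists just those columns.

def default_query(table, connected_ports):
    cols = ", ".join(connected_ports)
    return f"SELECT {cols} FROM {table}"

# Suppose only two of the table's columns are linked in the mapping:
sql = default_query("PAYMENT", ["PAYMENT_ID", "PAYMENT_TYPE_DESC"])
print(sql)  # SELECT PAYMENT_ID, PAYMENT_TYPE_DESC FROM PAYMENT
```

This is why disconnecting unused ports matters: unconnected columns are simply never read from the database.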
Properties
Business Purpose
The use of a Source Qualifier is a product requirement; other types of sources require equivalent transformations (XML Source Qualifier, etc.). It provides an efficient way to filter input fields/columns and to perform homogeneous joins.
Datatype Conversion
Datatype conversion occurs when:
- Passing data between ports with different datatypes
- Passing data from an expression to a port
- Using transformation functions
- Using transformation arithmetic operators

Supported conversions:
- Numeric datatypes <=> other numeric datatypes
- Numeric datatypes <=> String
- Date/Time <=> Date or String (to convert from string to date, the string must be in the default PowerCenter date format MM/DD/YYYY HH24:MI:SS)
Similarly, when writing to a target, the Integration Service converts the data to the target's native datatype. For further information, see the PowerCenter Client Help > Index > port-to-port data conversion.
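As an analogy for the string-to-date conversion above, the following Python sketch parses a string in the same layout as the default PowerCenter date format. This is not PowerCenter code; the sample date value is invented, and strptime format codes stand in for MM/DD/YYYY HH24:MI:SS.

```python
from datetime import datetime

# %m/%d/%Y %H:%M:%S corresponds to MM/DD/YYYY HH24:MI:SS. A string in any
# other layout would fail to parse, just as PowerCenter rejects strings that
# do not match the expected date format.
DEFAULT_FORMAT = "%m/%d/%Y %H:%M:%S"

value = datetime.strptime("04/15/2006 13:45:00", DEFAULT_FORMAT)
print(value.year, value.month, value.day)  # 2006 4 15
```

If your source strings use a different layout, you would convert them explicitly (in PowerCenter, with a transformation function) rather than rely on the default format.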
Templates
- Mapping specification templates
- Source to target field matrix
- Naming conventions; object type prefixes: m_, exp_, agg_, wf_, s_
- Best practices

Velocity phases:
Phase 1: Manage
Phase 2: Architect
Phase 3: Design
Phase 4: Build
Phase 5: Test
Phase 6: Deploy
Phase 7: Operate
Lab Project
The Mersche Motors data model consists of the following star schemas. The labs predominantly use the Sales star schema.
Unit 3: Source Qualifier Informatica PowerCenter 8 Level I Developer
Data is moved first to the staging area and from there to the data warehouse and target flat files.
The labs can source from flat files and/or a relational database.
PRODUCT PRODUCT_COST
Staging Area
The staging area has the following tables:
STG_CUSTOMERS
STG_DATES
STG_DEALERSHIP
STG_EMPLOYEES
STG_INVENTORY
STG_PAYMENT
STG_PRODUCT
STG_PROMOTIONS
STG_TRANSACTIONS
Data Warehouse
The data warehouse has the following tables:
DIM_CUSTOMERS
DIM_DATES
DIM_DEALERSHIP
DIM_EMPLOYEES
DIM_PAYMENT
DIM_PRODUCT
DIM_PROMOTIONS
FACT_INVENTORY
FACT_PRODUCT_AGG_DAILY
FACT_PRODUCT_AGG_WEEKLY
FACT_PROMOTIONS_AGG_DAILY
FACT_SALES
Architecture
The labs use the following architecture and connections:
Integration Service: PC_IService
Repository Name: PC8_DEV
Folders: Student 01 - 20
User Names: student01 - 20
Passwords: student01 - 20
Connectivity
ODBC Connections:
Source Tables: ODBC_TRANS
Staging Area: ODBC_STG (01 - 20)
Data Warehouse: ODBC_EDW (01 - 20)
Native Connections:
Source Tables (relational source): NATIVE_TRANS, database user sdbu with password sdbu
Staging Area (relational targets): NATIVE_STG (01 - 20), database users tdbu01 - 20 with passwords tdbu01 - 20
Data Warehouse (relational targets): NATIVE_EDW (01 - 20), database users tdbu01 - 20 with passwords tdbu01 - 20
Technical Description
PowerCenter will source from a delimited flat file and insert the data into a database table without performing data transformations. In order to avoid duplicate records in subsequent loads, we will configure PowerCenter to truncate the target table before each load.
Objectives
- Open the Designer tools and switch between workspaces
- Import a flat file definition
- Import a table definition
- Create a simple pass-through mapping
- Create a Session task to run the mapping and configure connectivity
- Create a Workflow to run the Session task
- Run the Workflow and monitor the results
Duration
35 minutes
Unit 3 Lab A: Load Payment Staging Table Informatica PowerCenter 8 Level I Developer
SOURCES
Tables Table Name Schema/Owner Selection/Filter
Files
File Name: payment.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter
TARGETS
Tables
Table Name      Schema Owner    Insert  Update  Delete  Unique Key
STG_PAYMENT     TDBUxx          X
Source to Target Field Matrix

Target Table     Target Column
STG_PAYMENT      Payment_id
STG_PAYMENT      Payment_type_desc
Velocity Best Practice: This is the Velocity Source to Target Field Matrix. It is displayed here for you
Instructions
Step 1: Launch the Designer and Review the Source and Target Definitions
1. Launch the Designer application by selecting Start > Programs > Informatica PowerCenter > Client > PowerCenter Designer.
Tip: If an instance of the Designer is already running on your workstation, do not launch another instance. It is unnecessary and potentially confusing to run more than one instance per workstation.
2. Log into the PC8_DEV repository with the user name studentxx and password studentxx, where xx represents your student number as assigned by the instructor.
3. Open your student folder by double-clicking on it.
4. Open the Source Analyzer by selecting the menu option Tools > Source Analyzer.
5. Drag the Shortcut_to_payment source file from the Sources subfolder into the Source Analyzer workspace. Confirm that your source definition appears the same as displayed in Figure 3-1. You may have to drag the box wider to see the Length column.
Figure 3-1. Normal view of the payment flat file definition displayed in the Source Analyzer
6. Open the Target Designer by clicking the respective icon in the toolbar. The icon is shown highlighted below:
7. Drag the Shortcut_to_STG_PAYMENT target table definition from the Targets subfolder into the Target Designer workspace.
8. Review the target definition.
Open the Mapping Designer by clicking the respective icon in the toolbar. The icon is shown highlighted below:
2. Delete the default mapping name and enter the name m_Stage_Payment_Type_xx, where xx refers to your student number. Click OK.
Velocity Best Practice: The m_ prefix for a mapping name is specified in the Informatica Velocity Best Practices. Mapping names should be clear and descriptive so that others can immediately understand the purpose of the mapping. Velocity suggests either the name of the targets being accessed or a meaningful description of the function of the mapping.
3. Expand the Sources subfolder, expand the FlatFile node, and drag-and-drop the source Shortcut_to_payment into the mapping.
4. Expand the Targets subfolder, and drag-and-drop the target Shortcut_to_STG_PAYMENT into the mapping.
5. Select the menu option View > Navigator. This temporarily removes the Navigator window from view in order to increase your mapping screen space. Your mapping should appear as displayed in Figure 3-2.
Figure 3-2. Mapping with Source and Target Definitions
6. Drag-and-drop the port PAYMENT_ID from the Source Qualifier to the PAYMENT_ID field in the target definition. Drag-and-drop the Source Qualifier port PAYMENT_TYPE_DESC to the PAYMENT_TYPE_DESC field in the target definition.
Tip: When linking ports in a mapping as described above, ensure that the tip of your mouse cursor is touching a letter in the name or datatype or any property for the port you are dragging.
7. Right-click in a blank area within the mapping and choose the menu option Arrange All.
8.
9.
1. Launch the Workflow Manager by clicking on the respective icon in the toolbar. The icon is shown highlighted below:
2. Open the Workflow Designer by clicking the respective icon in the toolbar. The icon is shown highlighted below:
3. Delete the default Workflow name and enter wkf_Load_STG_PAYMENT_xx (xx refers to your student number).
Velocity Best Practice: The wkf_ as a prefix for a Workflow name is specified in the Informatica
Velocity Methodology.
b. Click OK.
4. Select Session from the Select the task type to create drop-down box. Enter the Session name s_m_Stage_Payment_Type_xx (xx refers to your student number).
Velocity Best Practice: The s_ as a prefix for a session name is specified in the Informatica Velocity Methodology.
c. Click the Create button. The Mappings list box shows the mappings saved in your folder.
d. Confirm that the m_Stage_Payment_Type_xx mapping is selected and click OK.
e. Click Done.
5.
6. Double-click on the session task that you just created to open it in edit mode.
a. Select the Mapping tab.
b. Select the Source Qualifier icon SQ_Shortcut_to_payment (in the Session properties navigator window).
c. In the Properties area, scroll down and confirm the source file name and location.
i. ii.
Tip: When the Integration Service process runs on UNIX or Linux, the filename is case sensitive.
d. Select the target Shortcut_to_STG_PAYMENT (in the Session properties navigator window).
e. Using the Connections list box, select the NATIVE_STGXX connection object, where XX represents your student number assigned by the instructor.
f. In the Properties area, confirm that the load type is Bulk.
Tip: Setting the load type to bulk will use the target RDBMS bulk loading facility.
g. In the Properties area, scroll down until the Truncate target table option property is visible. Select the check-box.
Your session task information should appear similar to that displayed in Figure 3-4.
Figure 3-4. Completed Session Task Target Properties
h. Click OK.
7. Type Ctrl+S to save your work to the repository.
8. Confirm that your Output Window displays the message:
*******Workflow wkf_Load_STG_PAYMENT is INVALID*******
Workflow wkf_Load_STG_PAYMENT inserted
------------------------------------------------------
Tip: For this section you have created a non-reusable session within the workflow. This session
9. Click the Link Tasks icon in the Tasks Toolbar shown below.
10. Holding down the left mouse button, drag from the Start Task to the s_m_Stage_Payment_Type_xx Session Task and release the mouse. This establishes a link from the Start Task to the Session Task.
11. Type Ctrl+S to save your work to the repository. Confirm that your Output Window displays the message:
...Workflow wkf_Load_STG_PAYMENT tasks validation completed with no errors.
******* Workflow wkf_Load_STG_PAYMENT is VALID *******
Workflow wkf_Load_STG_PAYMENT updated.
---------------------------------------
Right-click on a blank area near the Workflow inside the workspace and select Start Workflow. If the Workflow Monitor is already open, the workflow and session will automatically display. However, if the Monitor is opening for the first time:
a. Right-click on the PC8_DEV repository and choose Connect.
b. Log in with your studentxx id and password.
c. Right-click on PC_IService and choose Connect.
d. Right-click on your Studentxx folder and choose Open.
e. Right-click on wkf_Load_STG_PAYMENT_xx and select Open Latest 20 Runs.
3. Maximize the Workflow Monitor. Note that there are two tabs above the Output window: Gantt Chart and Task View.
4. Select Task View. Your information should appear similar to what is displayed in Figure 3-6.
Figure 3-6. Successful Run of a Workflow Depicted in the Task View of the Workflow Monitor
5.
6. Select the Source/Target Statistics tab. Expand the nodes for the source and target. Note that for the Source and Target objects in the mapping, there is a count of rows in various categories, such as Applied Rows (success), Affected (transformed), and Rejected, as well as an estimated throughput speed.
Figure 3-8. Source/Target Statistics for the Completed Session Run
7. The Session Log will be displayed. Review the log and note the variety of information it shows. Close the Session Log.
8. Select the Gantt Chart tab. Note that the Workflow and the Session are displayed within a horizontal timeline.
Data Results
In the Designer, you can view the data that was loaded into the target.
1. Right-click on the STG_Payment target definition and select Preview Data.
2. Set the ODBC Data Source drop-box to the ODBC_STG Data Source Name.
3. Enter the user name tdbuxx, where xx represents your student number as assigned by the instructor.
4. Enter the password tdbuxx and click the Connect button.
5. Your data should appear as displayed in Figure 3-9.
Figure 3-9. Data Preview of the STG_PAYMENT Target Table
The join is performed on the source database at runtime (when SQL generated by the Source Qualifier executes). Joining data in a Source Qualifier allows the Integration Service to read data in multiple tables in a single pass, which can improve session performance.
Where there is no PK/FK relationship, you can specify a User Defined Join. Enter the join condition in the Source Qualifier properties, e.g., tableA.EmployeeID = tableB.EmployeeID. By default you get an inner join; use the SQL Query override to specify other join types.
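The default inner-join behavior can be pictured with a minimal Python sketch. This is only an illustration of the semantics; the table contents below are hypothetical, not part of the lab schema.

```python
# Minimal sketch of the Source Qualifier's default inner join.
# table_a / table_b rows and columns are invented illustration data.
table_a = [{"EmployeeID": 1, "Name": "Lee"}, {"EmployeeID": 2, "Name": "Kim"}]
table_b = [{"EmployeeID": 1, "Dept": "Sales"}, {"EmployeeID": 3, "Dept": "IT"}]

def inner_join(left, right, key):
    """Keep only rows whose key value appears on both sides."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

joined = inner_join(table_a, table_b, "EmployeeID")
# Only EmployeeID 1 appears in both tables, so a single joined row survives.
```

Rows present on only one side (EmployeeID 2 and 3 above) are dropped, which is exactly why an outer join requires a SQL Query override.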
Example
A business sells a high volume of products and updates the Product Dimension table on a regular basis. To update the dimension table, a join of the PRODUCT and PRODUCT_COST tables is required. Since the source tables are from the same database and have a key relationship, only a single Source Qualifier transformation is needed.
Note the primary key-foreign key relationship between the PRODUCT_ID field of the PRODUCT table and the PRODUCT_CODE field of the PRODUCT_COST table.
Performance Considerations
For relational sources, the number of rows processed can be reduced by using a SQL override and adding a WHERE clause, or by using the Source Filter attribute, if not all rows are required. The default SQL generated by the Source Qualifier can also be customized to improve performance.
Technical Description
PowerCenter will define a homogeneous join between the two Oracle source tables. The source database server will perform an inner join on the tables based on a join statement automatically generated by the Source Qualifier. The join set will be loaded into the staging table.
Objectives
Import relational source definitions
View relationships between relational sources
Use a Source Qualifier to define a homogeneous join and view the generated statement
Duration
30 minutes
Unit 3 Lab B: Load Product Staging Table Informatica PowerCenter 8 Level I Developer
This mapping joins the product table and the product cost table and loads data to the staging area.
Run frequency: once.
Target: truncate.
SOURCES
Tables
Table Name     Schema/Owner  Selection/Filter
PRODUCT        SDBU          N/A
PRODUCT_COST   SDBU          N/A
TARGETS
Tables
Table Name    Schema/Owner  Insert  Update  Delete  Unique Key
STG_PRODUCT   TDBUxx        X
Instructions
Step 1: Import the Source Definitions
1.
1. Right-click in the workspace and select Clear All. Choose the menu option Sources > Import from Database.
i. Set the ODBC Data Source drop-box to the ODBC_TRANS Data Source Name.
ii. Enter the user name sdbu.
iii. Tab down into Owner name and confirm that it defaults to the user name entered above.
iv. Enter the password sdbu and click the Connect button.
v. Expand the node in the Select tables area, and expand the TABLES node.
vi. Import the relational tables PRODUCT and PRODUCT_COST.
Tip: You can select multiple objects for simultaneous import by using the Ctrl key.
2. Save your work. Your Source Analyzer should appear as displayed in Figure 3-10.
Figure 3-10. Source Definitions with a PK/FK Relationship Displayed in the Source Analyzer
Tip: The arrow connecting the keys PRODUCT_ID and PRODUCT_CODE denotes a relationship stored in the Informatica repository. By default, referential integrity (primary key to foreign key) relationships defined on a database are imported when each of the tables in the relationship is imported. The arrowhead is on the primary key end (the parent/independent/"one" end) of the relationship.
Tip: It is not generally good practice to create two different tables with the same primary key. In correct database design, this method of vertically partitioning a table is usually only justified for security or performance reasons. Separating products and their product costs meets neither of these criteria. These two tables do, however, give you a very good example of using a homogeneous join in a mapping.
1. Right-click in the workspace and select Clear All. Choose the menu option Targets > Import from Database.
i. Connect using the ODBC Data Source ODBC_STG, the user name tdbuxx and the password tdbuxx, where xx represents your student number.
ii. Import the relational target definition STG_PRODUCT.
2. Open the Mapping Designer. If a mapping is visible in the workspace, close it by choosing the menu option Mappings > Close.
3. Create a new mapping named m_Stage_Product_xx. For further details about how to do this, see Step 2, Create a Mapping on page 38.
4. Choose the menu option Tools > Options.
a. Set the Tools drop-box at the top to Mapping Designer.
b. Uncheck the check-box Create Source Qualifiers when opening Sources.
c. Click OK.
Tip: The check-box described above specifies whether a Source Qualifier transformation is created automatically every time a Source definition is added to the mapping. Generally, this option is turned off when you want to add several relational Sources to the mapping and create a single Source Qualifier to join them.
5. Add the source definitions PRODUCT and PRODUCT_COST to the mapping. You may need to display the Navigator window by selecting the menu option View > Navigator.
6. Create a Source Qualifier transformation by clicking on the appropriate icon in the transformation toolbar and then clicking in the workspace. The icon is shown highlighted below:
7. In the Select Sources for Source Qualifier Transformation dialog-box, confirm that both sources are selected and click OK.
8. Double-click the Source Qualifier to enter edit mode.
9. Click the Rename button and change the name to sq_Product_Product_Cost.
10. Add the target definition STG_PRODUCT to the mapping.
11. Link each of the output ports in the Source Qualifier to an input port in the target with the same name (i.e., PRODUCT_ID linked to PRODUCT_ID).
12. Link the COST port to the PRODUCT_COST port.
13. Save your mapping and confirm that it is valid. Note that the PRODUCT_CODE port in the Source Qualifier is intended to be unlinked, as it is not required in the target. Confirm that your mapping appears the same as displayed in Figure 3-11.
Figure 3-11. Normal View of the Completed Mapping
14. Click on the Properties tab. Open the SQL Query Editor by clicking the arrow in the SQL Query property. Click the Generate SQL button. Note that the join statement can now be previewed, and that it is an inner join. Also note that the PRODUCT_CODE column is not in the SELECT statement; this is because the column is not linked in the mapping and is not needed. Your SQL Editor should appear as displayed in Figure 3-12.
Figure 3-12. Generated SQL for the m_Stage_Product Mapping
15. Click OK twice.
16.
From the Workflow Manager application, open the Workflow Designer tool. If a Workflow is visible in the workspace, close it by choosing the menu option Workflows > Close. Create a new Workflow named wkf_Stage_Product_xx. For further details about how to do this, see Step 3, Create a Workflow and a Session Task on page 40.
4. Create a new Session by clicking on the appropriate icon in the task toolbar and then clicking in the workspace. The icon is shown highlighted below:
5. Set the relational source connection object property to NATIVE_TRANS. Set the relational target connection object property to NATIVE_STGxx, where xx is your student number. Check the Truncate target table option property in the target properties. In the Properties area, confirm that the load type is Bulk.
6. Link the Start task to the Session task. For further details about how to do this, see Step 3, Create a Workflow and a Session Task on page 40.
7. Right-click in the workspace and select Arrange > Horizontal.
8. Save your work.
1. Confirm that your Task Details appear the same as displayed in Figure 3-13.
Figure 3-13. Properties of the Completed Session Run
2. Confirm that your Source/Target Statistics appear the same as displayed in Figure 3-14.
Figure 3-14. Source/Target Statistics for the Completed Session Run
3. Using the Preview Data option in the Designer, confirm that your target data appears the same as displayed in Figure 3-15. Be sure to log in with user tdbuxx.
Figure 3-15. Data Preview of the STG_PRODUCT Target Table
Technical Description
Both loads use simple pass-through logic, as in Lab A, so we will combine them into one mapping. Even though two sources and two targets are involved, only one Session will be required to run this mapping.
Objectives
Import a fixed-width flat file definition
Define two data flows within one mapping
Duration
20 minutes
Unit 3 Lab C: Load Dealership and Promotions Staging Table Informatica PowerCenter 8 Level I Developer
Simple pass-through mapping with two pipelines. One pipeline extracts from a flat file and loads to an Oracle table. The second pipeline extracts from an Oracle table and loads to another Oracle table.
SOURCES
Tables
Table Name   Schema/Owner  Selection/Filter
DEALERSHIP   SDBU          N/A
Files
File Name       File Location        Fixed/Delimited  Additional File Info
promotions.txt  C:\pmfiles\SrcFiles  Fixed
TARGETS
Tables
Table Name       Schema/Owner  Insert  Update  Delete  Unique Key
STG_DEALERSHIP   TDBUxx        X
STG_PROMOTIONS   TDBUxx        X
Instructions
Step 1: Import the Source Definitions
1. Import the relational source definition DEALERSHIP. For further details about how to do this, see Step 1, Import the Source Definitions on page 52.
2. Edit the source definition for the promotions.txt file in the Source Analyzer.
a. Click the Advanced button in the lower right of the edit box.
b. Make sure that the number of bytes to skip between records is set to 2.
Note: A fixed-width flat file will have bytes at the end of each row that represent a carriage return and a line feed. Depending on the system the file was created on, you will need to skip the appropriate number of bytes; if you don't, your result set will be offset by 1 or 2 bytes. For files created on a mainframe, set the value to 0; for UNIX/Linux, set the value to 1; for all others, set the value to 2.
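The note above can be illustrated with a short sketch. The record content here is invented for illustration; only the terminator-byte arithmetic matters.

```python
# Each record ends with terminator bytes that are not part of the data:
# Windows "\r\n" (skip 2), UNIX/Linux "\n" (skip 1), mainframe none (skip 0).
windows_record = b"ABC123\r\n"   # invented 6-byte fixed-width payload
unix_record = b"ABC123\n"

def bytes_to_skip(record, payload_width):
    """Terminator bytes left over after the fixed-width payload."""
    return len(record) - payload_width

# Setting the wrong skip count shifts every subsequent record's fields.
```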
Confirm that your promotions source definition appears the same as displayed in Figure 3-16.
Figure 3-16. Normal view of the promotions flat file definition displayed in the Source Analyzer
1. Make sure that the option to Create Source Qualifiers when Opening Sources is checked (on). For further details about how to do this, see Step 3, Create the Mapping on page 53. Add the Dealership and Promotions source definitions to the mapping. Confirm that a Source Qualifier was created for each. Add the STG_DEALERSHIP and STG_PROMOTIONS target definitions to the mapping. Link the appropriate Source Qualifier ports to the target ports.
2. Save the mapping and confirm that it is valid.
3. Right-click in a blank area within the mapping and choose the menu option Arrange All Iconic.
1. Create a workflow named wkf_Load_Stage_Dealership_Promotions_xx.
2. Create a Session Task named s_m_Dealership_Promotions_xx that uses the mapping m_Dealership_Promotions_xx.
3. Edit the Session.
a. Set the database connection objects for the sources and targets in the Session. Note that both of the relational target database connections need to be set separately. For further details about how to do this, see Create a Workflow and a Session Task on page 40 and Create the Session and Workflow on page 55.
b. Confirm that the source location information for the Promotions flat file is set correctly. For further details about how to do this, see Create a Workflow and a Session Task on page 40.
c. Check the Truncate target table option property in the target properties.
4. Complete the Workflow, save it, and run it. Confirm that your Task Details appear the same as displayed in Figure 3-18.
Figure 3-18. Properties of the Completed Session Run
Confirm that your Source/Target Statistics appear the same as displayed in Figure 3-19.
Figure 3-19. Source/Target Statistics for the Completed Session Run
5. Preview the target data with user tdbuxx. It should appear the same as Figure 3-20 and Figure 3-21:
Figure 3-20. Data Preview of the STG_DEALERSHIP Target Table
Type
Passive.
Description
The Expression transformation lets you modify individual ports of a single row (or columns within a single row). It also lets you add and suppress ports. It cannot perform aggregation across multiple rows (use the Aggregator transformation).
Business Purpose
You can modify ports using logical and arithmetic operators or built-in functions for:
Character manipulation (concatenate, truncate, etc.)
Datatype conversion (to char, to date, etc.)
Data cleansing (check nulls, replace string, etc.)
Unit 4: Expression, Filter, File Lists, and Workflow Scheduler Informatica PowerCenter 8 Level I Developer
Data manipulation (round, truncate, etc.)
Numerical calculations (exponential, power, log, modulus, etc.)
Scientific calculations (sine, cosine, etc.)
Special (lookup, decode, etc.)
Test (for spaces, number, etc.)
For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.
Numeric and arithmetic/logical operator keypads.
Functions tab for built-in functions.
Ports tab for port values.
Variables tab for mapping and system variables.
Expressions resolve to a single value of a specific datatype. For example, the expression LENGTH('HELLO WORLD') / 2 returns a value of 5.5. The LENGTH function calculates the length of the string, including the blank space, as 11 characters.
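A Python analogue of the same expression, where len plays the role of LENGTH:

```python
# len() counts the embedded space, so the string is 11 characters long,
# and dividing by 2 yields a fractional result.
result = len("HELLO WORLD") / 2   # 11 / 2
```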
Tip: Highlighting a function and pressing F1 will launch the online help and open it at the highlighted
function section.
A transformation variable is created by creating a port and selecting the V check box. When V is checked, the I and O check boxes are grayed out. This indicates that a variable port is neither an input nor an output port.
When a record is processed, the expression is evaluated and the result is assigned to the variable port. The result must be compatible with the datatype selected for the port; otherwise an error is generated. The variable persists across the entire set of records that traverse the transformation, and it may be used or modified anywhere in the set of data that is being processed.
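A rough Python sketch of how a variable port behaves. The item names and the change counter are invented for illustration; the point is that the counter carries its value forward from row to row, unlike input and output ports, which are re-evaluated per record.

```python
# Sketch of a variable port: the counter persists across all rows.
def clean_names(rows):
    v_change_count = 0               # plays the role of a variable port
    out = []
    for name in rows:
        cleaned = name.title()       # expression evaluated for each record
        if cleaned != name:
            v_change_count += 1      # value persists from row to row
        out.append((cleaned, v_change_count))
    return out

result_rows = clean_names(["m_item", "ITEM", "Item"])
```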
Example 1
Check, Clean, and Record Errors. Suppose that we want to address the following:

Clean Up Item Name: The Accounts Receivable department is tired of generating reports with an inconsistent set of Item Names. Some are in UPPERCASE while others are in lowercase; still others are in mixed case. They would like to see all of the data in Title case. They would also like a count of how many changes have been made.
Missing Data: The Systems and Application group is concerned that occasionally some incomplete data is sent to end users. They would like to tag each such record as an error and be able to report on and investigate the data where critical fields are missing.
Invalid Dates: Due to application issues, occasionally dates are not valid. The AR department, as well as the auditors, is very concerned about this issue. They want every record with a bad date tagged and reported on.
Invalid Numbers: The Sales Department is concerned that occasionally they see non-numeric data in a report that covers sales discounts where they expect to see numeric data. Find all errors and tag the records.
Example 2
Calculate Sales Discounting and Inventory Days. Suppose we want to calculate the following:

Discount Tracking: Sales Management would like to compare the suggested sell price to the actual sell price to determine the level of discounting. They plan to do this via a report. They would like a field developed that calculates the sales discount.
Days in Inventory: The Sales and Marketing departments would like to be able to determine how long an item was in inventory.
Performance Considerations
Multiple identical conversions of the same data should be avoided. Ports that do not need modification should bypass the Expression transformation to save buffers.
Type
Active.
Description
The Filter transformation allows rows which meet the filter condition to pass through the transformation. Rows which do not meet the filter condition are skipped.
Business Purpose
A business may choose not to process records that do not meet a data quality criterion, such as records containing a null value in a field that would cause a target constraint violation, or may eliminate from the process date field values that will not provide useful data.
Example 1
Existing customer dimension records need to be updated to reflect changes to columns like address. However, only existing customer records are to be updated. The following example uses a Lookup to verify that the customer exists and a Filter to skip records that do not have an existing customer id (MSTR_CUST_ID). An Update Strategy tags the records that pass the filter condition for update.
Performance Considerations
Filter records that do not meet the selection criterion as early as possible in a mapping to reduce the number of rows processed, increase throughput, and decrease run time. In fact, any active transformation that decreases the number of rows (the Normalizer and the Router can increase the number of rows) should be placed as early as possible in the mapping to reduce the total number of rows processed downstream and improve performance.
The session processes each file in turn. The properties of all files must match the source definition. Wild cards are not allowed. All of the files must exist.
d:\data\eastern_trans.dat e:\data\midwest_trans.dat f:\data\canada_trans.dat
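A file list is just a text file naming one source file per line, and the session reads each named file in order. The sketch below uses the example paths above; parsing the list this way is an illustration of the format, not the Integration Service's actual implementation.

```python
# The example file list from the text: one path per line, no wildcards.
file_list_text = r"""d:\data\eastern_trans.dat
e:\data\midwest_trans.dat
f:\data\canada_trans.dat"""

def files_in_list(text):
    """Return the file paths a session would process, in order."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Every listed file must exist and match the source definition's layout.
```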
Run Options
Run on Integration Service Initialization: runs the workflow each time the Integration Service initializes, and then schedules it based on the other options.
Run on demand: runs the workflow only when asked to.
Run continuously: runs the workflow in a continuous mode; when the workflow finishes, it starts again from the beginning.
Technical Description
PowerCenter will source from a file list. This file list contains the names of three delimited flat files from the regional sales offices. All rows with a customer number of 99999 will need to be filtered out. A number of columns will need to have their data reformatted; this will include substrings, concatenation, and decodes. The target will be truncated until the mapping is fully tested.
Objectives
Create a Filter transformation to eliminate unwanted rows from a flat file source
Create an Expression transformation to reformat incoming rows before they are written to a target
Use the DECODE function as a small lookup to replace values for incoming data before writing to the target
Create a session task that will accept a file list as a source
Create a workflow that can run on a schedule
Duration
60 Minutes
Unit 4 Lab: Load the Customer Staging Table Informatica PowerCenter 8 Level I Developer
A flat file list (customer_east.txt, customer_west.txt, customer_central.txt) of comma-delimited files that need to be filtered and reformatted before they are loaded into the target table. Scheduled to run every night at midnight.
SOURCES
Files
File Name: customer_central.txt, customer_east.txt, customer_west.txt (definition in customer_layout.txt)
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: These 3 comma-delimited flat files will be read into the session using a file list named customer_list.txt. The layout of the flat files can be found in customer_layout.txt.
File list: customer_list.txt, located in C:\pmfiles\SrcFiles
TARGETS
Tables
Table Name      Schema/Owner  Insert  Update  Delete  Unique Key
STG_CUSTOMERS   TDBUxx        X
Mapping flow: Source → Filter → Expression → Target
Customer inquiries are captured using customer_no 99999. The mapping will filter out the customer inquiries.
This mapping will reformat the customer names, gender and telephone number columns.
Target Table    Target Column    Source File       Source Column(s)
STG_CUSTOMER    CUST_ID          customer_layout
STG_CUSTOMER    CUST_NAME        customer_layout   FIRSTNAME, LASTNAME
STG_CUSTOMER    CUST_ADDRESS     customer_layout   ADDRESS
STG_CUSTOMER    CUST_CITY        customer_layout   CITY
STG_CUSTOMER    CUST_STATE       customer_layout   STATE
STG_CUSTOMER    CUST_ZIP_CODE    customer_layout   ZIP
STG_CUSTOMER    CUST_COUNTRY     customer_layout   COUNTRY
STG_CUSTOMER    CUST_PHONE       customer_layout   PHONE_NUMBER
STG_CUSTOMER    CUST_GENDER      customer_layout   GENDER
STG_CUSTOMER    CUST_AGE_GROUP   customer_layout   AGE
STG_CUSTOMER    CUST_INCOME      customer_layout   INCOME
STG_CUSTOMER    CUST_E_MAIL      customer_layout   EMAIL
STG_CUSTOMER    CUST_AGE         customer_layout   AGE

The CUST_PHONE is a reformat of the PHONE_NUMBER column. The PHONE_NUMBER column is in the format 9999999999 and needs to be reformatted to (999) 999-9999. The CUST_GENDER is derived by decoding the GENDER column. The GENDER column is a 1-character column that contains either 'M' (male) or 'F' (female); any other values will resolve to 'UNK'. The CUST_AGE_GROUP is derived by decoding the AGE column. The valid age groups are less than 20, 20 to 29, 30 to 39, 40 to 49, 50 to 60, and greater than 60.
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer client tool.
2. Log into the PC8_DEV repository with the user name studentxx, where xx represents your student number as assigned by the instructor.
3. Open your student folder.
4. Import the customer_layout.txt flat file definition. This file is located in the c:\pmfiles\SrcFiles directory. If the file is located in a different directory, your instructor will specify it. Ensure that the following parameters are selected:
Import field names from first line.
Comma-delimited flat file.
Text Qualifier is Double quotes.
Format of the Date field is Datetime.
Tip: Only one flat file definition is required when using a file list as a source in PowerCenter. All the files that make up the file list must have the same layout in order for the file list to be successfully processed by PowerCenter.
5.
6.
2. Confirm that your target definition appears the same as displayed in Figure 4-2.
Figure 4-2. Target Designer View of the STG_CUSTOMERS Table Relational Definition
Create a new mapping named m_Stage_Customer_Contacts_xx. Add the customer_layout flat file source to the mapping. Add the STG_CUSTOMERS target to the mapping. Your mapping will appear similar to Figure 4-3.
Figure 4-3. Mapping with Source and Target Definitions
Select the Filter transformation tool button located on the Transformation toolbar and place it in the workspace between the Source Qualifier and the Target. The icon is shown highlighted below:
2. Link the following ports from the Source Qualifier to the Filter:
CUSTOMER_NO
FIRSTNAME
LASTNAME
ADDRESS
CITY
STATE
ZIP
COUNTRY
PHONE_NUMBER
GENDER
INCOME
EMAIL
AGE
3.
4. Click the drop-down arrow for the Filter Condition transformation attribute to activate the Expression Editor.
5. Remove the TRUE condition from the Expression Editor.
6. Enter the following expression: CUSTOMER_NO != 99999 OR ISNULL(CUSTOMER_NO)
7. Click OK to return to the Properties of the Filter transformation. The Properties will appear as displayed in Figure 4-6.
Figure 4-6. Completed Properties Tab of the Filter Transformation
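The semantics of the filter condition CUSTOMER_NO != 99999 OR ISNULL(CUSTOMER_NO) can be sketched in Python. The sample customer numbers are invented; note how the ISNULL branch lets rows with a missing customer number pass through rather than being silently dropped.

```python
# Sketch of the Filter semantics: inquiry rows (99999) are dropped;
# all other rows, including NULL customer numbers, pass through.
def passes_filter(customer_no):
    return customer_no is None or customer_no != 99999

rows = [12345, 99999, None]          # illustrative customer numbers
kept = [r for r in rows if passes_filter(r)]
```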
8. Create an Expression transformation directly after the Filter transformation: select the Expression transformation tool button located on the Transformation toolbar and place it in the workspace directly after the Filter. The icon is shown highlighted below:
2. Select the following ports from the Filter transformation and pass them to the Expression transformation:
3. Rename it exp_Format_Name_Gender_Phone. Change the port type to input for all of the ports except AGE. (AGE should remain an input/output port.) Prefix each of these input-only ports with IN_. Create a new output port after the AGE port by positioning the cursor on the AGE port and clicking the add icon.
Port Name = OUT_CUST_NAME
Datatype = String
Precision = 41
Expression = IN_FIRSTNAME || ' ' || IN_LASTNAME
Velocity Best Practice: Prefixing input-only ports with IN_ and output ports with OUT_ is a Velocity best practice. This makes it easier to identify a port's role without opening the transformation. Tip: This new port concatenates the FIRSTNAME and LASTNAME ports into a single string. Do not use the CONCAT function to concatenate in expressions; use || to achieve concatenation. The CONCAT function is available only for backward compatibility.
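As a minimal sketch of what the OUT_CUST_NAME expression computes (Python used here only to illustrate the result of the || concatenation; the sample names are hypothetical):

```python
def out_cust_name(firstname, lastname):
    """Concatenate first and last name with a single space, like
    IN_FIRSTNAME || ' ' || IN_LASTNAME in the Expression transformation."""
    return f"{firstname} {lastname}"

# Precision 41 allows two 20-character names plus the separating space.
name = out_cust_name("JOHN", "SMITH")
```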
4.
Tip: The expression above uses a technique known as nesting functions. The TO_CHAR function is nested inside the SUBSTR function. The TO_CHAR function is performed first. The SUBSTR function is then performed against the return value from TO_CHAR.
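To picture the nesting, here is a hedged Python analogue: an inner conversion-to-string runs first, and an outer substring is applied to its result, mirroring SUBSTR(TO_CHAR(...), ...). The 10-digit phone value and the positions used are illustrative assumptions, not the lab's actual expression:

```python
def to_char(value):
    """Analogue of TO_CHAR: convert a numeric value to a string."""
    return str(value)

def substr(s, start, length):
    """Analogue of SUBSTR with a 1-based start position."""
    return s[start - 1 : start - 1 + length]

# The inner function runs first; the outer works on its return value.
area_code = substr(to_char(5551234567), 1, 3)
```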
5.
Tip: The DECODE function used in the previous expression can be used to replace nested IIF functions or small static lookup tables. The DECODE expression in the previous step returns the value MALE if the incoming port GENDER equals M, FEMALE if GENDER equals F, or UNK if GENDER equals anything other than F or M.
6. Create a new output port:
Port Name = OUT_AGE_GROUP
Datatype = String
Precision = 20
Expression = Write an expression using the DECODE function that assigns the appropriate age group label to each customer based on their age. Use the online help to see details about DECODE. If after 5 minutes you have not successfully created the DECODE statement, refer to the reference section at the end of the lab for the solution. The valid age ranges and age groups are displayed in the table below. The format of the DECODE statement follows the table.
Age Range                  Age Group Text
AGE < 20                   LESS THAN 20
AGE >= 20 AND AGE <= 29    20 TO 29
AGE >= 30 AND AGE <= 39    30 TO 39
AGE >= 40 AND AGE <= 49    40 TO 49
AGE >= 50 AND AGE <= 60    50 TO 60
AGE > 60                   GREATER THAN 60
7. Save your work.
8. Connect the following ports from the Expression transformation to the target table:
AGE -> CUST_AGE
OUT_CUST_NAME -> CUST_NAME
OUT_CUST_PHONE -> CUST_PHONE_NMBR
OUT_GENDER -> CUST_GENDER
OUT_AGE_GROUP -> CUST_AGE_GROUP
9. Connect the following ports from the Filter transformation to the target table:
CUSTOMER_NO -> CUST_ID
ADDRESS -> CUST_ADDRESS
CITY -> CUST_CITY
STATE -> CUST_STATE
ZIP -> CUST_ZIP_CODE
COUNTRY -> CUST_COUNTRY
INCOME -> CUST_INCOME
EMAIL -> CUST_E_MAIL
Save your work. Verify that your mapping is valid. Right-click in the workspace and select Arrange All Iconic.
Figure 4-9. Iconic View of the Completed Mapping
Launch the Workflow Manager and sign into your assigned folder.
2. Open the Workflow Designer tool and create a new workflow named wkf_Stage_Customer_Contacts_xx.
3. Create a Session task using the session task tool button.
4. Select m_Stage_Customer_Contacts_xx from the Mapping list box, and click OK.
5. Link the Start object to the s_m_Stage_Customer_Contacts_xx session task.
6. Edit the s_m_Stage_Customer_Contacts_xx session.
7. Under the Mapping tab:
a. Select SQ_customer_layout located under the Sources folder in the navigator window.
b. Confirm that the Source file directory is set to $PMSourceFileDir\.
c. In Properties | Attribute | Source filename, type customer_list.txt.
Tip: The source instance you are reading is known as a file list. It is a list of files that will be appended together and treated as one source file by PowerCenter. The text file named in Properties | Attribute | Source filename contains the list of text files to be read in as individual sources. To create a file list, open a blank text file with an application such as Notepad and type each text file that is to be read as part of the file list on a separate line. You may precede each file name with directory path information. If you do not provide directory path information, PowerCenter assumes the files are located in the same location as the file list file.
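The indirect-source behavior described in the tip can be pictured with a short Python sketch: the file list is itself a text file naming the real source files, and the reader appends their rows in order as if they were one file. The file names and the fallback-to-the-list's-directory handling are illustrative assumptions, not engine code:

```python
import os

def read_file_list(list_path):
    """Read an indirect (file list) source: each non-blank line of the
    list file names a data file; rows from all files are appended in order."""
    base_dir = os.path.dirname(list_path)  # default: same directory as the list
    rows = []
    with open(list_path) as f:
        for line in f:
            name = line.strip()
            if not name:
                continue
            # A line may carry its own path; otherwise use the list's directory.
            path = name if os.path.dirname(name) else os.path.join(base_dir, name)
            with open(path) as data:
                rows.extend(data.read().splitlines())
    return rows
```

All files named in the list must share the same layout, which is why a single flat file definition suffices for the whole list.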
d. In Properties | Attribute | Source filetype, click the dropdown arrow and change the default from Direct to Indirect.
Tip: When you use the file list feature in PowerCenter, you must set Properties | Attribute | Source filetype to Indirect. The default is Direct. To change the setting, click the dropdown arrow and select the value you want to use.
The file list file used in this exercise lists three text files which are found in the default location of the file list file, $PMSourceFileDir\. Figure 4-11 displays the contents of customer_list.txt.
Figure 4-11. Contents of the customer_list.txt File List
e. Select STG_CUSTOMERS located under the Target folder in the navigator window. Set the relational target connection object property to NATIVE_STGxx, where xx is your student number. Check the Truncate target table option in the target properties.
8. Save your work.
9. Check the Validate messages to ensure your workflow is valid.
10. Start the workflow.
11. Review the session properties.
12. Review the Source/Target Statistics. Your statistics should be the same as displayed in Figure 4-13.
Figure 4-13. Source/Target Statistics for the Completed Session Run
13. If your session failed or had errors, troubleshoot and correct them by reviewing the session log and making any necessary changes to your mapping or workflow.
Data Results
Preview the target data from the Designer. Your data should appear as displayed in Figure 4-14.
Figure 4-14. Data Preview of the STG_CUSTOMERS Target Table
Observe the CUST_PHONE, CUST_GENDER, and CUST_AGE_GROUP columns. These columns were transformed using the Expression transformation. Scroll down and review these columns to verify that you wrote your expressions correctly.
After debugging is complete, run the workflow a final time to perform the initial table load. Open the session task for the mapping and ensure the Truncate target table property is checked. Save any changes to the repository.
4. Select Workflows > Edit. This displays the screen shown in Figure 4-15.
Figure 4-15. General Properties for the Workflow
5. Select the Scheduler tab.
6. Select the Edit Scheduler command button.
7. Type sch_Stage_Customers_Contacts_xx in the Name text box.
8. Select the Schedule tab.
a. Clear the Run on demand check box.
b. Select the Customized Repeat radio button and click the Edit button.
i. Select Week(s) from the Repeat every dropdown box.
ii. Check the Monday, Tuesday, Wednesday, Thursday, and Friday Weekly check boxes.
iii. Select the Run once radio button in the Daily frequency group.
Your customized options should appear the same as displayed in Figure 4-16.
Figure 4-16. Customized Repeat Selections
iv. Click OK.
c. Set the Start Date in the Start options group to tomorrow's date.
d. Set the Start Time to 00:01.
e. Select the Forever radio button in the End options group.
Your schedule options will appear similar to the one displayed in Figure 4-17.
Figure 4-17. Completed Schedule Options
9. 10.
11. Right-click in the workspace and select Schedule Workflow.
12. Check the Workflow Monitor to confirm that the workflow has been scheduled.
References
1. DECODE Statement

DECODE(TRUE,
    AGE < 20, 'LESS THAN 20',
    AGE >= 20 AND AGE <= 29, '20 TO 29',
    AGE >= 30 AND AGE <= 39, '30 TO 39',
    AGE >= 40 AND AGE <= 49, '40 TO 49',
    AGE >= 50 AND AGE <= 60, '50 TO 60',
    AGE > 60, 'GREATER THAN 60')
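The DECODE(TRUE, ...) idiom evaluates its condition/value pairs in order and returns the value paired with the first condition that is TRUE. A Python sketch of the same logic (an illustration, not PowerCenter code):

```python
def age_group(age):
    """Return the age-group label chosen by the reference DECODE statement.
    Pairs are checked in order; the first true condition wins."""
    pairs = [
        (age < 20, 'LESS THAN 20'),
        (20 <= age <= 29, '20 TO 29'),
        (30 <= age <= 39, '30 TO 39'),
        (40 <= age <= 49, '40 TO 49'),
        (50 <= age <= 60, '50 TO 60'),
        (age > 60, 'GREATER THAN 60'),
    ]
    for condition, label in pairs:
        if condition:
            return label
```

Because the pairs are ordered and mutually exclusive here, every integer age falls into exactly one group.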
The labs also illustrate various PowerCenter mapping features and techniques.
Type
Active.
Description
The Joiner transformation combines fields from two data sources into a single combined data source based on one or more common fields, also known as the join condition.
Business Purpose
A business has data from two different systems that needs to be combined to get the desired results.
Example
A business has sales transaction data on a flat file and product data on a relational table. The company needs to join the sales transaction to the product table to get some product information. We need to use the Joiner transformation to accomplish this task.
Joiner Properties
Join Types
The Joiner transformation supports four join types: Normal, Master Outer, Detail Outer, and Full Outer.
Joiner Cache
How it Works
There are two types of cache memory: the index cache and the data cache. All rows from the master source are loaded into cache memory. The index cache contains all port values from the master source where the port is specified in the join condition. The data cache contains all master port values not specified in the join condition. After the cache is loaded, the detail source is compared row by row to the values in the index cache. Upon a match, the rows from the data cache are included in the stream.
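The caching behavior above can be sketched in Python: master rows are loaded into an index (join-key values) keyed to a data cache (their remaining port values), then the detail source streams through and matching rows are emitted. This is an illustration of the idea for a normal join, not the engine's implementation; the column names in the usage example are hypothetical:

```python
def joiner(master_rows, detail_rows, key):
    """Sketch of the Joiner's normal-join cache behavior.

    master_rows/detail_rows: lists of dicts; key: the join-condition port name.
    """
    # Index cache: join-key values; data cache: the remaining master ports.
    cache = {}
    for row in master_rows:
        cache.setdefault(row[key], []).append(
            {k: v for k, v in row.items() if k != key})
    # Detail rows are compared one by one against the index cache.
    joined = []
    for row in detail_rows:
        for master_data in cache.get(row[key], []):
            joined.append({**row, **master_data})
    return joined
```

Since only the master side is cached in full, choosing the smaller source as the master keeps the cache footprint down, which is the performance point made below.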
Key Point
If there is not enough memory specified in the index and data cache properties, the overflow will be written out to disk.
Performance Considerations
The master source should be the source that will take up the least amount of space in cache. Another performance consideration would be the sorting of data prior to the Joiner transformation (discussed later).
Technical Description
PowerCenter will source from a flat file and a relational table. A Joiner transformation is used to create one dataflow that is then written to a relational target. The flat file is missing one field the staging table needs: the cost of each product. This value can be read from the STG_PRODUCT table. Each row of the source file contains a value named Product. This value has an identical corresponding value in the STG_PRODUCT table PRODUCT_ID column. Use the Joiner transformation to join the flat file to the relational table (a heterogeneous join) and then write the results to the STG_TRANSACTIONS table.
Objectives
Create a Joiner transformation and use it to join two data streams from two different Source Qualifiers. Select the master side of the join. Specify a join condition.
Duration
30 minutes
Unit 5 Lab A: Load Sales Transaction Staging Table Informatica PowerCenter 8 Level I Developer
A flat file and an Oracle table will be joined into one source data stream, which will be written to an Oracle target table. Daily Target Append
SOURCES
Tables
Table Name: STG_PRODUCT    Schema/Owner: TDBUxx    Selection/Filter:

Files
File Name: sales_transactions.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter
TARGETS
Tables
Table Name: STG_TRANSACTIONS    Schema Owner: TDBUxx
Insert: X    Update:    Delete:    Unique Key:
Relational Source
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer client tool (if it is not already running) and open your student folder.
2. Open the Source Analyzer tool.
3. Import the sales_transactions.txt comma-delimited flat file.
4. Ensure that the Transaction Date field has a Datatype of Datetime.
5. Save the repository.
1. Verify you are in the Source Analyzer tool, and import the STG_PRODUCT table found in your tdbuxx schema. Use ODBC_STG as the ODBC data source. Note that you are importing the table as a source definition, even though it is in your target (tdbuxx) schema.
2. Open the Target Designer tool. Import the STG_TRANSACTIONS table found in your tdbuxx schema.
Open the Mapping Designer tool. Create a new mapping named m_STG_TRANSACTIONS_xx. Add the sales_transactions flat file source to the new mapping. Add the STG_PRODUCT relational source to the new mapping. Add the STG_TRANSACTIONS relational target to the new mapping. Your mapping should appear similar to Figure 5-1.
Figure 5-1. Normal View of the Heterogeneous Sources, Source Qualifiers and Target
Select the Joiner transformation icon located on the Transformation tool bar with a single left click. Figure 5-2 shows the Joiner transformation button:
Figure 5-2. Joiner Transformation Button
2. Create a new Joiner transformation.
3. Select all the ports from the SQ_sales_transactions object and copy/link them to the Joiner transformation.
4. Select only the PRODUCT_ID and PRODUCT_COST ports from the SQ_STG_PRODUCT object and copy them to the Joiner transformation. Your mapping should be similar to Figure 5-3.
Figure 5-3. Normal View of Heterogeneous Sources Connected to a Joiner Transformation
5. Rename it to jnr_Sales_Transaction_To_STG_PRODUCT. Select the Ports tab. Set the Master (M) property to the STG_PRODUCT ports.
Tip: Which ports should be the Master? If the data is not sorted, use the source that is smaller in rows and bytes. If the source data is sorted, use the source with the fewest join-column duplicates.
Figure 5-4. Edit View of the Ports Tab for the Joiner Transformation
d. Uncheck the output check box for PRODUCT_ID.
e. Rename the PRODUCT_ID port to IN_PRODUCT_ID.
6. Click the Add a new condition button. Figure 5-5 displays the Add a new condition button as selected.
Figure 5-5. Edit View of the Condition Tab for Joiner Transformation Without a Condition
b. Select the Detail dropdown box and set it to PRODUCT. Your condition should be the same as displayed in Figure 5-6.
Figure 5-6. Edit View of the Condition Tab for the Joiner Transformation with Completed Condition
Tip: The Joiner transformation can support multiple port conditions to create a join. If you need multiple port conditions simply click the Add a new condition button to add the other ports that make up the multiple port condition.
c. Click OK.
7. Link the following ports from the Joiner transformation to the corresponding columns in the target object:

Joiner Port -> Target Column
CUST_NO -> CUST_ID
PRODUCT -> PRODUCT_ID
DEALERSHIP -> DEALERSHIP_ID
PAYMENT_DESC -> PAYMENT_DESC
PROMO_ID -> PROMO_ID
DATE_ID -> DATE_ID
TRANSACTION_DATE -> TRANSACTION_DATE
TRANSACTION_ID -> TRANSACTION_ID
EMPLOYEE_ID -> EMPLOYEE_ID
TIME_KEY -> TIME_KEY
SELLING_PRICE -> SELLING_PRICE
PRODUCT_COST -> UNIT_COST
DELIVERY_CHARGES -> DELIVERY_CHARGES
QUANTITY -> SALES_QTY
DISCOUNT -> DISCOUNT
HOLDBACK -> HOLDBACK
REBATE -> REBATE
2. Save the repository.
3. Verify your mapping is valid in the Output window. If the mapping is not valid, correct the errors that are displayed in the message.
1. Launch the Workflow Manager application (if it is not already running) and log in to the repository and your student folder.
2. Open the Workflow Designer tool and create a new workflow named wkf_STG_TRANSACTIONS_xx.
3. Add a new Session task using the session task icon.
4. Select m_STG_TRANSACTIONS_xx from the Mapping list box and click OK.
5. Link the Start object to the s_m_STG_TRANSACTIONS_xx session task object.
6. Edit the s_m_STG_TRANSACTIONS_xx session task.
a. Select the Mapping tab.
b. Select SQ_sales_transactions located under the Sources folder in the Mapping navigator.
c. Confirm that Properties | Attribute | Source file directory is set to $PMSourceFileDir\.
d. In Properties | Attribute | Source filename, verify that sales_transactions.txt is displayed. The file extension (.txt) must be present.
e. Select SQ_STG_PRODUCT located under the Sources folder in the navigator window. Set the Connections | Type to your assigned Native_STGxx connection object.
f. Select STG_TRANSACTIONS located under the Target folder in the navigator window. Set the Connections | Type to your assigned Native_STGxx connection object. Check the Truncate target table option checkbox.
7.
8. Check the Validate messages to ensure your workflow is valid. If you receive an invalid message, correct the problem(s), then re-validate and save.
Step 8: Start the Workflow and View Results in the Workflow Monitor
1. Start the workflow.
2. Confirm that the Workflow Monitor application launches automatically.
3. Maximize the Workflow Monitor.
4. Double-click the session and view the Task Details window. Your information should appear similar to Figure 5-8.
Figure 5-8. Task Details of the Completed Session Run
5. Select the Transformation Statistics tab. Your statistics should be similar to Figure 5-9.
Figure 5-9. Source/Target Statistics for the Session Run
6. If your session failed or had an error, proceed to the next step.
7. Right-click the session again and select Get Session Log.
8. Search the session log for the error messages that caused your session to have issues. Read the messages and correct the problem. Rerun your workflow to test your fixes. Ask your instructor for help if you get stuck.
Data Results
Preview the target data from the Designer. Your data should appear the same as displayed in Figure 5-10.
Figure 5-10. Data Preview of the STG_TRANSACTIONS Table
Technical Description
This lab details the use of 13 PowerCenter Designer features. Each of these features will increase the efficiency of any developer who knows how to use them. At the discretion of the instructor, this lab can also be completed as a demonstration.
Objectives
Auto Arrange
Remove Links
Revert to Saved
Link Path
Propagating Ports
Autolink by Name and Position
Moving Ports
Shortcut to Port Editing from Normal View
Create Transformation Methods
Scale-To-Fit
Designer Options
Object Shortcuts and Copies
Copy Objects Within and Between Mappings
Duration
50 minutes
Instructions
Open a Mapping
In the Designer tool:
1. 2.
1. Choose Layout > Arrange All Iconic or right-click in the Workspace and select Arrange All Iconic.
Figure 5-13. Iconic View of an Arranged Mapping
2. Choose Layout > Arrange All or right-click in the Workspace and select Arrange All.
3. Type Ctrl+S to save.
Tip: Notice the mapping would not save. When only formatting changes are made, it is not considered a change. Another change must be made to the repository in order for the formatting to be saved.
Tip: By default, each selected link changes in color from blue to red. If any other objects (e.g., transformations) were selected along with the links, redo the process. Press the Delete key to remove the connections. Ensure no icons are deleted.
the active mapping in the workspace is reverted. In the Source Analyzer, Target Designer, and Transformation Developer, individual objects may be reverted.
1.
2. 3.
In the Ports tab, select the OUT_CUST_NAME port and click the Delete button.
4. Similarly, delete the AGE port.
5. Edit the SQ_customer_layout Source Qualifier and remove the AGE port.
6. Select only the SQ_customer_layout Source Qualifier and choose Edit > Revert to Saved. The same dialog box appears; all changes must be reverted.
7. Select Yes to proceed. Notice all changes were reverted, not just the changes made to the SQ_customer_layout.
1. Ensure that the mapping is in the arranged normal view.
2. Right-click on CUSTOMER_NO in the SQ_customer_layout Source Qualifier and choose Select Link Path > Forward.
Figure 5-16. Selecting the forward link path
Notice how the path for CUSTOMER_NO, from SQ_customer_layout all the way to STG_CUSTOMERS, is highlighted in red.
Figure 5-17. Highlighted forward link path
3. Right-click on the OUT_CUST_NAME port in the exp_Format_Name_Gender_Phone and select Link Path > Both.
Figure 5-18. Highlighted link path going forward and backward
Notice how the OUT_CUST_NAME port's path not only shows where it proceeds to the STG_CUSTOMERS target definition, but also from its origin all the way back to the customer_layout source definition. Both the IN_FIRSTNAME and IN_LASTNAME are used in the formula to produce OUT_CUST_NAME, so both links are highlighted in red.
1. Edit SQ_customer_layout, change CUSTOMER_NO to CUST_NO, and change the Precision to 10.
2. Click OK.
3. Right-click on CUST_NO in the SQ_customer_layout transformation and select Propagate Attributes.
Figure 5-19. Selecting to propagate the attributes
4. Under Attributes to propagate, choose Name and Precision with a Direction of Forward.
Figure 5-20. Propagation attribute dialog box
5. Choose Preview. Notice the arrow between SQ_customer_layout and fil_Customer_No_99999 turns green. The green arrow indicates the places where a change would be made. Why is there only one change?
6. Click Close.
7. Edit SQ_customer_layout, change GENDER to CUST_GENDER, and change the Precision to 7.
8. Click OK.
9. Right-click on CUST_GENDER in the SQ_customer_layout transformation and select Propagate Attributes.
10.
a. Under Attributes to propagate, choose Name and Precision with a direction of Forward.
b. Select Preview. Notice the green arrows? What will be changed?
c. Select Propagate.
d. Edit exp_Format_Name_Gender_Phone and open the Expression Editor for OUT_GENDER. Notice the expression now contains CUST_GENDER.
e. Close the Propagate dialog box.
Link by name
Link by name and prefix
Link by name and suffix
The Designer adds links between input and output ports that have the same name. Linking by name is case insensitive. Link by name when using the same port names across transformations.
1.
2. Remove the links between the exp_Format_Name_Gender_Phone and the STG_CUSTOMERS target definition.
3. Right-click in the white space inside the mapping. Choose Autolink by Name.
Tip: Only one transformation may be selected in the From Transformation box, and one or more transformations may be selected in the To Transformations box. For objects that contain groups, such as Router transformations or XML targets, select the group name from the To Transformations list.
4. Select the exp_Format_Name_Gender_Phone transformation from the From Transformation dropdown menu; then highlight the STG_CUSTOMERS transformation in the To Transformations box.
5. Click OK.
Notice that nothing happened. Look carefully at the exp_Format_Name_Gender_Phone and STG_CUSTOMERS and you will see that none of the ports match exactly; therefore, autolink by name will not work in this situation. Would autolink by position work?
Tip: When autolinking by name, the Designer adds links between ports that have the same name, case insensitive. The Designer can also link ports based on defined prefixes or suffixes. Adding suffixes and/or prefixes to port names helps identify each port's purpose. For example, a suggested best practice is to use the prefix OUT_ when the port is derived from input ports that were modified as they pass through the transformation. Without this feature, Autolink would skip over the names that don't match and force the developer to manually link the desired ports.
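The prefix matching used in the next steps can be sketched as follows: a from-port links to a to-port when its name, with the given prefix stripped, matches the target name case-insensitively. This Python sketch illustrates the matching rule only; it is not Designer code:

```python
def autolink_by_name(from_ports, to_ports, prefix=""):
    """Pair ports whose names match case-insensitively after removing
    an optional prefix from the source-side port names."""
    targets = {name.upper(): name for name in to_ports}
    links = []
    for port in from_ports:
        stripped = port.upper()
        if prefix and stripped.startswith(prefix.upper()):
            stripped = stripped[len(prefix):]
        if stripped in targets:
            links.append((port, targets[stripped]))
    return links
```

With the prefix OUT_, a port such as OUT_CUST_NAME matches a target column CUST_NAME, which mirrors the lab's observation that only that port gets linked.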
6. Select Layout > Autolink.
7. Select the exp_Format_Name_Gender_Phone transformation from the From Transformation dropdown menu; then highlight the STG_CUSTOMERS transformation in the To Transformations box.
8. Select the Name radio button.
9. Click More to view the options for entering prefixes and suffixes. Note the button toggles to become the Less button.
10. Type OUT_ in the From Transformation Prefix field.
11. Click OK. Notice that only the OUT_CUST_NAME port was linked, because it is the only port with a matching name.
Figure 5-22. Defining a prefix in the autolink dialog box
1. Revert to Saved to reset the mapping.
2. Open the exp_Format_Name_Gender_Phone and click on the Ports tab.
3. Single-click on the AGE port and move it up to the top using the Up arrow icon found in the upper right corner of the toolbar.
4. Single-click on the number to the left of the IN_PHONE_NUMBER port.
5. Single-click and hold the left mouse button and note the faint square that appears at the bottom of the pointer.
Figure 5-24. Click and drag method of moving ports
6. Move PHONE_NUMBER directly below AGE.
7. Click Cancel to discard the changes.
1. Revert to Saved to reset the mapping.
2. Resize or scroll down until the AGE port appears in the exp_Format_Name_Gender_Phone.
3. Double-click on the AGE port. Notice you are now in the Ports tab.
4. Delete the AGE port.
1. Revert to Saved to reset the mapping.
2. On the Transformation toolbar, find the Aggregator Transformation button and single-click.
3. Move the mouse into the Workspace. The cursor changes to crosshairs.
4. Single-click in the workspace where you want to place the transformation. The selected transformation appears in the desired location of the Workspace and the cursor changes back to an arrow.
Tip: When the mouse pointer hovers over a transformation icon in the toolbar, the name of the transformation object appears momentarily.
5.
6.
7. Enter the name agg_TargetTableName and click Create.
8. Click the Done button and the new transformation appears in the Workspace.
Figure 5-27. Normal View of the Newly Created Aggregator Transformation
There are features to change the magnification of the contents of the Workspace. Use the toolbar or the Layout menu options to set zoom levels. The toolbar has the following zoom options:
Figure 5-28. Zoom options
2. Click the Zoom out 10% button on the toolbar.
3. Click anywhere in the Workspace and the mapping will zoom out by 10% each time the mouse is clicked.
4. Keep clicking until the mapping is small enough to fit within the window.
Tip: The Zoom out 10% button uses a selected point as the center point from which to decrease the magnification.
5. Click the Zoom in 10% button on the toolbar.
6. Click anywhere in the Workspace and the mapping will zoom in by 10% each time the mouse is clicked.
Tip: The Zoom in 10% button increases the current magnification of a rectangular area you select. The degree of magnification depends on the size of the area selected, the Workspace size, and the current magnification.
7.
8.
3. Select the Expression transformation type from the drop-down list.
4. Delete the Length/Precision from the selected box.
5. Click OK. Notice how the Length/Precision no longer appears in the Expression transformation.
6.
Select and double-click the DEV_SHARED folder. Note that the folder name in the Navigator window is now bold. This means that the folder is open.
Figure 5-29. Navigator window in the Designer
2. Open your student folder by either double-clicking on it or by right-clicking on it and selecting Open. Note that the DEV_SHARED folder is no longer bold (open) but it remains expanded so you can see the subfolders.
Tip: Only one folder at a time can be open. Any number of folders can be expanded so that the subfolders and objects are visible. As we will see below, it is important to distinguish between expanded folders and the open folder.
3. Open the Mapping Designer and close any mapping that is in the workspace.
4. Expand the Mappings subfolder in the DEV_SHARED folder.
5. Click and drag the m_Stage_Customer_Contacts mapping to the Mapping Designer workspace and release the mouse button.
6. Click Yes. Save the changes to the repository. Note that your folder now has a shortcut to the mapping.
7. Select the menu option Mappings > Edit to see how the shortcut location is displayed.
8. Open the Filter transformation in edit mode. Note that all properties are grayed-out and not editable. A shortcut can never be edited directly.
9. Perform the same click-and-drag operation with the same mapping, only this time press the Ctrl key after you have begun to drag the mapping. Note that this creates a copy of the mapping instead of a shortcut. Click No in the Copy Confirmation message box.
Tip: The destination folder (the folder you are placing the copy or shortcut into) must be the open folder. The origin folder that contains the original object will be expanded.
10.
We will now learn how to copy an object within the same folder. The instructions below are to copy a mapping but the same procedure can be used for any other object.
1. In the Navigator window, select any mapping in your folder.
2. Press Ctrl+C on your keyboard, followed immediately by Ctrl+V.
3. Click Yes in the Copy Confirmation message box. The Copy Wizard will be displayed.
4. The red x on the mapping indicates a conflict. Choose Rename for the conflict resolution.
5. Click the Edit button. If desired, you can supply your own new name to the mapping to replace the 1 added by the Designer. Mappings within a folder must have unique names.
6. Click Next, then Finish.
Tip: A common error when copying objects within a folder is to use the mouse to move the cursor from the object to the workspace after copying the object with Ctrl+C. This is unnecessary and will cause the copy operation to fail.
2. Use your left mouse button to draw a rectangle that encloses the Filter and the Expression transformations. These objects will then appear selected.
3. Press Ctrl+C on your keyboard, followed immediately by Ctrl+V. Note that both transformations have been copied into the mapping, including the data flow between the input and output ports. They have been automatically renamed with a 1 on the end of their names.
4. Open another mapping in the Mapping Designer. It does not matter which mapping is used, provided it is not a shortcut.
5. Press Ctrl+V. The transformations are copied into the open mapping.
Tip: The copy objects within and between mappings feature can be used only within a single folder.
6.
Type
Passive.
Description
A Lookup transformation allows the inclusion of additional information in the transformation process from an external database or flat file source. In SQL terms, a Lookup transformation may be thought of as a sub-query. The basic Lookup transformation types are connected, unconnected, and dynamic.
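A connected Lookup can be pictured as a keyed probe into cached reference data, much like a SQL sub-query. A hedged Python sketch (the dictionary cache, column names, and values are illustrative assumptions):

```python
def lookup(cache, key):
    """Probe the lookup cache; return the matched value, or None (NULL)
    when no row satisfies the lookup condition."""
    return cache.get(key)

# Hypothetical reference data cached from a lookup table.
product_cost = {"P100": 25.0, "P200": 40.0}
cost = lookup(product_cost, "P100")     # a match returns the cached value
missing = lookup(product_cost, "P999")  # a miss returns None, i.e. NULL
```

The NULL-on-miss behavior is what downstream transformations test for, as the policy example later in this section shows.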
Properties
We will discuss only some of the properties in this section. The remaining properties will be discussed in other sections.
Option Lookup SQL Override Lookup Table Name Lookup Policy on Multiple Match Lookup Condition Connection Information
Lookup Type Relational Relational Flat File, Relational Flat File, Relational Relational
Description Overrides the default SQL statement to query the lookup table.Use only with the lookup cache enabled. Specifies the name of the table from which the transformation looks up and caches values. Determines what happens when the Lookup transformation finds multiple rows that match the lookup condition. You can select the first or last row returned from the cache or lookup source, or report an error. Displays the lookup condition you set in the Condition tab. Specifies the database containing the lookup table. You can select the exact database connection or you can use the $Source or $Target variable. If you use one of these variables, the lookup table must reside in the source or target database you specify when you configure the session. If you select the exact database connection, you can also specify what type of database connection it is. Indicates that the Lookup transformation reads values from a relational database or a flat file. Sets the amount of detail included in the session log when you run a session containing this transformation. If you do not define a datetime format for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can enter any datetime format. The default is MM/DD/YYYY HH24:MI:SS.
Option | Lookup Type | Description
Thousand Separator | Flat File | If you do not define a thousand separator for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can choose no separator, a comma, or a period. The default is no separator.
Decimal Separator | Flat File | If you do not define a decimal separator for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can choose a comma or a period decimal separator. The default is period.
Case-Sensitive String Comparison | Flat File | If selected, the Integration Service uses case-sensitive string comparisons when performing lookups on string columns. Note: For relational lookups, the case-sensitive comparison is based on the database support.
Null Ordering | Flat File | Determines how the Integration Service orders null values. You can choose to sort null values high or low. By default, the Integration Service sorts null values high. Note: For relational lookups, null ordering is based on the database support.
Sorted Input | Flat File | Indicates whether or not the lookup file data is sorted.
Business Purpose
A business may bring in data from various sources, but additional data from local sources may be needed, such as product codes, dates, names, etc.
Example
In the following example, an insurance company pays commissions on each new policy; however, duplicate policies may be submitted through clerical error. The goal is to check submitted policies against the current policy list and reject those that are duplicates. A policy number is passed to a connected Lookup transformation, which checks the current policy table for the pre-existence of the policy. If the policy number exists, the matching policy number is returned; if it does not exist, a null value is returned. The return value is used in the Group Filter Condition of the Router transformation. The Router filter condition is ISNULL(POLICY_NO1) and is based on the return value from the Lookup transformation POLICY_NO port, NOT the value from the Source Qualifier. Rows from the source that have no match (null return) in the lookup table will meet the filter condition and pass to the new (POLICY_NEW) target. All other rows go to the Router Default group and are passed to the reject (POLICIES_REJ) target.
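The Lookup-plus-Router logic described above can be modeled with a short Python sketch (the policy numbers and group labels here are invented for illustration, not taken from the lab data):

```python
# Model of the duplicate-policy check: a policy passes to the NEW target only
# when the lookup against the current policy table returns NULL (no match).
current_policies = {"P100", "P200"}  # existing policy table (illustrative)

def route(policy_no):
    # Connected Lookup: returns the matching policy number, or None (NULL)
    match = policy_no if policy_no in current_policies else None
    # Router group filter condition: ISNULL(POLICY_NO1)
    return "NEW" if match is None else "REJECT"

assert route("P300") == "NEW"     # no match: goes to the new-policy target
assert route("P100") == "REJECT"  # duplicate: goes to the Default (reject) group
```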
Performance Considerations
All rows pass through a connected Lookup, so there may be performance degradation from executing additional Lookups when they are not needed. Caching a very large table may require a large amount of memory.
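The effect of caching can be sketched as follows (a conceptual Python model only; in PowerCenter the cache is built and managed by the Integration Service, not by user code):

```python
# Conceptual model: with caching enabled, the lookup source is read once into
# memory; each row then probes the in-memory cache instead of re-querying.
query_count = 0

def query_lookup_source():
    # Stand-in for reading the lookup table or file (expensive per call)
    global query_count
    query_count += 1
    return {101: 50000, 102: 62000}  # illustrative data

cache = query_lookup_source()  # built once, before rows are processed

for employee_id in [101, 102, 101, 102]:
    salary = cache.get(employee_id)  # memory probe, no additional source reads

assert query_count == 1  # the lookup source was read only once
```

The trade-off the text describes is visible in the model: one up-front read buys cheap probes, at the cost of holding the whole table in memory.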
Reusable transformations are listed in the Transformations node of the Navigator. Drag and drop them into any mapping to create an instance, then override the properties as needed.
Key Points
You can also copy them as non-reusable by pressing the Ctrl key while dragging.
You can edit ports only in the Transformation Developer.
Instances dynamically inherit changes.
Source Qualifier transformations cannot be reusable.
Changing reusable transformations can invalidate mappings.
Technical Description
We have three text files coming in daily with employee information that we would like to put into a file list. We need to find a salary for each employee, concatenate first name and last name, change the format of age and phone number and add a load date.
Objectives
Duration
45 Minutes
Unit 6 Lab A: Load Employee Staging Table Informatica PowerCenter 8 Level I Developer
The file list will be read, source data will be reformatted, a load date will be added, and salary information for each employee will be added.
Frequency: Daily. Load: Target append.
SOURCES
Files
File Name: employees_central.txt, employees_east.txt, employees_west.txt (layout defined in employees_layout.txt); file list: employees_list.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: These 3 comma-delimited flat files will be read into the session using the file list employees_list.txt. The layout of the flat files can be found in employees_layout.txt.
TARGETS
Tables
Table Name: STG_EMPLOYEES
Schema/Owner: TDBUxx
Update/Delete/Insert: Insert (X)
Unique Key: (none specified)
LOOKUPS
Lookup Name: lkp_salary
Table: salaries.txt
Location: C:\pmfiles\LkpFiles
Match Condition(s): EMPLOYEE_ID = IN_EMPLOYEE_ID
Filter/SQL Override: (none)
Target Table: STG_EMPLOYEES. Target Columns:
EMPLOYEE_ID
EMPLOYEE_NAME
EMPLOYEE_ADDRESS
EMPLOYEE_CITY
EMPLOYEE_STATE
EMPLOYEE_ZIP_CODE
EMPLOYEE_COUNTRY
EMPLOYEE_PHONE_NMBR
EMPLOYEE_FAX_NMBR (varchar2, from employees_layout column FAX_NUMBER)
EMPLOYEE_EMAIL
EMPLOYEE_GENDER - GENDER is currently either M or F. It needs to be Male, Female, or UNK.
AGE_GROUP (varchar2, derived from employees_layout) - The CUST_AGE_GROUP is derived by decoding the AGE column. The valid age groups are: less than 20, 20 to 29, 30 to 39, 40 to 49, 50 to 60, and greater than 60.
NATIVE_LANG_DESC (varchar2)
SEC_LANG_DESC
TER_LANG_DESC
POSITION_TYPE
REGIONAL_MANAGER
Target Table | Target Column | Data type | Source File | Source Column / Expression
STG_EMPLOYEES | DEALERSHIP_ID | number(p,s) | employees_layout | DEALERSHIP_ID
STG_EMPLOYEES | DEALERSHIP_MANAGER | varchar2 | employees_layout | DEALERSHIP_MANAGER
STG_EMPLOYEES | EMPLOYEE_SALARY | number(p,s) | Derived | A Salary field for each Employee ID can be found in salaries.txt.
STG_EMPLOYEES | HIRE_DATE | date | employees_layout | HIRE_DATE
STG_EMPLOYEES | DATE_ENTERED | date | employees_layout | DATE_ENTERED
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer client tool (if it is not already running) and log into the PC8_DEV repository.
2. Import the employees_layout.txt comma delimited flat file into your student folder. Make sure that you import the field names from the first line.
3. Save the repository. Your source definition should look the same as displayed in Figure 6-1.
Figure 6-1. Source Analyzer view of the employees_layout flat file definition
In the Target Designer, import the STG_EMPLOYEES table. Save the repository. Your target definition should look the same as Figure 6-2.
Figure 6-2. Target Designer view of the STG_EMPLOYEES relational table definition
2. Edit exp_Format_Name_Gender_Phone and check the Make reusable box on the Transformation tab.
Figure 6-3. Transformation edit dialog box showing how to make a transformation reusable
3. The transformation is now saved in the Transformations node within the Navigator window and will be available as a standalone object to drag into any mapping.
Figure 6-5. Transformation edit dialog box of a reusable transformation
4. Review the Transformation dialog box. What differences do you now see?
5. Select the Ports tab. Can you change anything here? Why are you unable to make changes?
6. Open the Transformation Developer by clicking the respective icon in the toolbar.
7. From the Navigator window, locate the Transformations node in your respective student folder.
Figure 6-6. Navigator window depicting the Transformations node
8. Drag exp_Format_Name_Gender_Phone into the Transformation Developer workspace.
9. Edit exp_Format_Name_Gender_Phone and add the prefix re_ to rename it to re_exp_Format_Name_Gender_Phone_Load_Date.
Velocity Best Practice: It is a Velocity recommendation that reusable transformations use the prefix re_.
10. Change the name of the OUT_CUST_NAME port to OUT_NAME. Change the name of the OUT_CUST_PHONE port to OUT_PHONE. Click OK.
11. Open the Mapping Designer by clicking the respective icon in the toolbar. Create a new mapping named m_STG_EMPLOYEES_xx. Add the employees_layout.txt flat file source to the new mapping. Add the STG_EMPLOYEES relational target to the new mapping. Your mapping should appear similar to Figure 6-7.
Figure 6-7. Partial mapping with source and target
1. Select the Lookup transformation tool button located on the Transformations toolbar with a single left click. The selected icon in Figure 6-8 identifies the Lookup tool button.
Figure 6-8. Transformation Toolbar
2. Move your mouse pointer into the Mapping Designer workspace and single-click your left mouse button. This will create a new Lookup transformation.
3. Choose Import > From Flat File for the location of the Lookup Table.
Figure 6-9. Lookup Transformation table location dialog box
4. Locate the c:\pmfiles\LkpFiles directory and select the file salaries.txt. If the file is located in a different directory, your instructor will specify it.
5. The Flat File Import Wizard will appear. Confirm that the Delimited option button is selected.
6. Select the Import field names from first line check box. Your Wizard should appear similar to Figure 6-10.
Figure 6-10. Dialog box 1 of the 3 step Flat File Import Wizard
7. Click Next.
8. Confirm that only the Comma check box under Delimiters is selected.
9. Select the No quotes option button under Text Qualifier.
10. Click Next.
11. Confirm that the field names are displayed under Column Information. These were imported from the first line of the file.
12. Click Finish.
13. Confirm that your Lookup transformation appears as displayed in Figure 6-11.
Figure 6-11. Normal view of the newly created Lookup Transformation
14. Drag and drop EMPLOYEE_ID from SQ_employees_layout to the new Lookup transformation.
15. Edit the Lookup transformation. Rename it to lkp_salaries.
Velocity Best Practice: Velocity naming conventions specify to name Lookup transformations lkp_LOOKUP_TABLE_NAME.
16. Rename the EMPLOYEE_ID input port by adding the prefix IN_.
17. Uncheck the output port for IN_EMPLOYEE_ID.
18. Select the Condition tab.
19. Select the Add a new condition button. PowerCenter will choose the first lookup port and the first input port automatically.
2. Save the repository.
3. Link the following ports from lkp_SALARY to STG_EMPLOYEES:
SALARY -> EMPLOYEE_SALARY
4. Launch the Workflow Manager client and sign into your assigned folder. Open the Workflow Designer tool and create a new workflow named wkf_STG_EMPLOYEES_xx.
5. Create a session task using the session task tool button. Select m_STG_EMPLOYEES_xx from the Mapping list box and click OK. Link the Start object to the s_m_STG_EMPLOYEES_xx session task object.
6. Edit the s_m_STG_EMPLOYEES_xx session. Under the Mapping tab:
Confirm that Source file directory is set to $PMSourceFileDir\. In Properties | Attribute | Source filename type in employees_list.txt. In Properties | Attribute | Source filetype click the drop-down arrow and change the default from Direct to Indirect. Your Mapping | Source | Properties | Attributes should be the same as Figure 6-13.
Figure 6-13. Source properties for the employee_list file list
Select STG_EMPLOYEES located under the Target folder in the navigator window. Set the relational target connection object property to NATIVE_STGxx where xx is your student number. Check the property Truncate target table option in the target properties.
Select lkp_salaries from the Transformations folder in the navigator window. Verify the Lookup source file directory is $PMLookupFileDir\. Type salaries.txt in the Lookup filename.
8. Save the repository.
9. Check Validate messages to ensure your workflow is valid.
10. Start the workflow.
11. Review the Task Details.
Figure 6-14. Task Details of the completed session run
13. Use the Preview Data feature in the Designer to view the data results.
Figure 6-16. Data Preview of the STG_EMPLOYEES target table
Unit 6 Lab A: Load Employee Staging Table Informatica PowerCenter 8 Level I Developer
Technical Description
To load the date staging area, we will use Informatica date functions and variables to transform a date value and date id. The raw dates are in a flat file.
Objectives
Copy an Expression transformation to convert a string date to various descriptive date columns. Use the Expression Editor to create or view expressions and become familiar with date function syntax. Understand the evaluation sequence of input, output, and variable ports. Learn how to use variable ports.
Duration
30 minutes
Unit 6 Lab B: Load Date Staging Table Informatica PowerCenter 8 Level I Developer
A text file will run through an expression to do date manipulation and load to our date staging area.
Frequency: Once. Load: Target append.
SOURCES
Files
File Name: dates.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter
TARGETS
Tables
Table Name: STG_DATES
Update/Delete/Insert: Insert (X)
Source
Expression
Target
This mapping will generate the date staging table from the dates text file. The Expression transformation is used to derive the different date values.
Target Table: STG_DATES. Target Columns:
DATE_ID_LEGACY, DATE_VALUE, DAY_OF_MONTH, MONTH_NUMBER, YEAR_VALUE, DAY_OF_WEEK, DAY_NAME, MONTH_NAME, DAY_OF_YEAR, MONTH_OF_YEAR, WEEK_OF_YEAR, DAY_OVERALL, WEEK_OVERALL, MONTH_OVERALL, YEAR_OVERALL, HOLIDAY_INDICATOR, WORKDAY_INDICATOR, WEEKDAY_INDICATOR, WEEKEND_INDICATOR, QUARTER_OF_YEAR
Target Table | Target Column | Source File | Source Column | Expression
STG_DATES | SEASON | dates.txt | derived | The current season.
STG_DATES | LAST_DAY_IN_MONTH | dates.txt | derived | Flag to indicate the current date is the last day of the month.
STG_DATES | LAST_DAY_IN_QUARTER | dates.txt | derived | Flag to indicate the current date is the last day of the quarter.
STG_DATES | LAST_DAY_IN_YEAR | dates.txt | derived | Flag to indicate the current date is the last day of the year.
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer (if it is not already running) and connect to the PC8_DEV repository.
2. Open your student folder.
3. Import the dates.txt comma delimited flat file source using the Flat File Wizard. Make sure that you import the field names from the first line.
4. Save the repository.
Import the STG_DATES table using the Target Designer. Save the repository.
Create a new mapping named m_STG_DATES_xx. Add dates flat file source to the mapping. Add the STG_DATES target to the mapping. Your mapping should appear similar to Figure 6-17.
Figure 6-17. Mapping with Source and Target definitions
4. Select the re_exp_STG_DATES transformation. With your left mouse button, drag the transformation toward your mapping but DO NOT DROP IT. Hold down the Ctrl key.
5. Drop the transformation into the mapping. Click Yes on the Copy Confirmation message box.
Note: If the confirmation box says Shortcut instead of Copy, try again and make sure that you
hold down the Ctrl key continuously as you drop the transformation into the mapping.
6. Link the two output ports on the Source Qualifier to the two input ports on the Expression transformation, matching the names.
7. Use the Autolink feature to link the output ports in the Expression transformation to the corresponding fields in the target definition - by Position.
8. Save the mapping and confirm it is valid. Your mapping will appear the same as in Figure 6-18.
Figure 6-18. Completed Mapping
9. Edit the Expression transformation and click on the Ports tab.
10. Examine the structure of the Expression transformation ports and expressions. Note that the DATE_ID is an integer that is passed directly to the target table unchanged. The input port DATE supplies a string that describes an individual date, such as 'May 20, 2005'. The variable ports will process that string in various ways in order to extract a specific descriptor, such as the day of the week, the quarter, the month, whether the date is a holiday, etc. These descriptors will later be used in the data warehouse to group and filter report data.
11. Examine some of the variable port expressions and see if you can determine how they work. You can use PowerCenter Help to view the syntax for any function. If you wish, ask your instructor for clarification on any of the expressions. Note that variable ports cannot be output ports, so a separate set of output ports is used at the bottom of the transformation in order to output the data to the target. Most of these output ports simply call a variable port. Variable ports were used in this transformation because they will be resolved one at a time, top to bottom. In this case, some of the later expressions are dependent on the results of the earlier expressions.
Tip: Informatica evaluates ports in the following order: input (and input/output) ports first, then variable ports, then output ports. Variable ports are evaluated in top-down order, so the order in which they appear matters.
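The evaluation order can be modeled in Python: variables are computed top-down, so later variables may reference earlier ones, and output ports simply read the final variable values (an illustrative sketch, not Informatica expression syntax; the port names are invented):

```python
from datetime import datetime

def expression_transform(date_str):
    # Input port: DATE (a string such as 'May 20, 2005')
    # Variable ports evaluate top-down; v_date is computed first, and the
    # later variables depend on its result.
    v_date = datetime.strptime(date_str, "%B %d, %Y")  # e.g. v_DATE_VALUE
    v_month = v_date.month                             # e.g. v_MONTH_NUMBER
    v_quarter = (v_month - 1) // 3 + 1                 # depends on v_month
    # Output ports simply call the variable ports
    return {"MONTH_NUMBER": v_month, "QUARTER_OF_YEAR": v_quarter}

row = expression_transform("May 20, 2005")
# row -> {'MONTH_NUMBER': 5, 'QUARTER_OF_YEAR': 2}
```

Reordering the variables (computing v_quarter before v_month) would break the chain, which is exactly why variable port order matters in the transformation.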
1. Launch the Workflow Manager application (if it is not already running) and connect to the PC8_DEV repository.
2. Open your student folder.
3. Create a new workflow named wkf_Load_STG_DATES_xx.
4. Create a session named s_m_STG_DATES_xx that uses the m_STG_DATES_xx mapping.
5. Edit the session you just created.
a. Select the Mapping tab.
b. Select the Source Qualifier icon SQ_dates.
c. In the Properties area scroll down and confirm the source file name and location. Ensure that the Source Filename property value includes the .txt extension.
d. Select the target STG_DATES.
e. Select your appropriate target connection object.
f. Select the option Truncate target table.
6. Complete the workflow by linking the Start task to the session task.
7. Save the repository.
Start the workflow. Maximize the Workflow Monitor and select the Task View. Review the Task Details. Your information should appear the same as in Figure 6-19.
Figure 6-19. Task Details of the completed session run
Data Results
Use the Preview Data feature in the Designer to view the data results. Your results should appear similar to those in Figure 6-21 through Figure 6-22.
Figure 6-21. Data preview of the STG_DATES table - screen 1
Figure 6-22. Data preview of the STG_DATES table - screen 2 scrolled right
Unit 7: Debugger
In this unit you will learn about:
Debugging mappings
Start the Debugger. A spinning Debugger Mode icon is displayed; it stops when the Integration Service is ready.
Choose an existing session or define a one-time debug session. Options:
Load or discard target data
Save debug environment for later use
Output window - view the Debug or Session log.
Transformation Instance Data window - view transformation data.
Target Instance window - view target data.
Next Instance. Runs until it reaches the next transformation or satisfies a breakpoint condition.
Step to Instance. Runs until it reaches the selected transformation instance or satisfies a breakpoint condition.
Show current instance. Displays the current instance in the Transformation Instance window.
Continue. Runs until it satisfies a breakpoint condition.
Break now. Pauses wherever it is currently processing.
Modify data and breakpoints. When the Debugger pauses, you can modify:
Change data
Change variable values
Add or change breakpoints
Technical Description
The Debugger will be used to track down the cause of the error or errors.
Objectives
Duration
30 minutes
A text file is run through an Expression transformation to do date manipulation and load our date staging area.
Frequency: Once. Load: Target append.
Sources
Files
File Name: dates.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter
Targets
Tables
Table Name: STG_DATES_VIEW
Update/Delete/Insert: Insert (X)
Source
Expression
Target
Target Table: STG_DATES_VIEW
Instructions
Step 1: Copy and Inspect the Debug Mapping
1. Locate the mapping m_STG_DATES_DEBUG and copy it to your folder. If a source or target conflict occurs, choose Reuse.
2. Get an overall idea of what kind of processing is being done.
3. Read each of the expressions in the Expression transformation. Note that the mapping is a simplified version of the one used in Unit 6 Lab B.
You have been told only that there is an error in the data being written to the target, without any further clarification as to the nature of the error.
Tip: Many mapping errors can be found by carefully inspecting the mapping - without using the Debugger. However, if the error cannot be located in a timely fashion in this manner, the Debugger will assist you by showing the actual data passing through the transformation ports. In order to properly use the Debugger, you must first understand the logic of the mapping.
1. Press the F9 key. This invokes the Debug Wizard. The first page of the Debug Wizard is informational. Please read it.
Tip: The Debugger requires a valid mapping and session to run; it cannot help you determine why a mapping is invalid. The Designer Output Window will show you the reason(s) why a mapping is invalid.
2. Your Wizard should appear similar to Figure 7-1 below. Accept the default setting - Create a debug session instance for this mapping - and press the Next button.
Figure 7-1. Debug Session creation dialog box
The next page of the Wizard allows you to set connectivity properties. This information is familiar to you from creating sessions, except that here it is a subset of the regular session options and is formatted somewhat differently.
3. Set the Target Connection Value to your target schema database connection object. The debugger data will be discarded in a later step, so this value will be ignored.
4. Select the Properties tab at the bottom. Your Wizard should appear as in Figure 7-2 below.
Figure 7-2. Debug Session connections dialog box
Ensure that the Source Filename property value includes the .txt extension. In this lab, verify that you enter dates.txt. Ensure that the Target load type property value is set to Normal.
5. Press the Next button.
6. We will not be overriding transformation properties, so press Next again.
7. Accept the defaults on the Session Configuration Wizard page and press Next.
8. The final Wizard page allows us to choose whether or not to discard the target data (the default) and choose which target data to view. Accept the defaults here as well.
9. When you press the Finish button, a Debug session will be created and it will initialize, opening the required database connections. No data will be read until we are ready to view it.
Set the Target Instance and Instance drop-boxes as shown in Figure 7-3 as well.
Note: The term instance is sometimes used as a synonym for transformation.
Figure 7-3. Designer while running a Debug Session
As mentioned earlier, the Debug session is initialized at this point but no data is read. We will manually control the debugger so we can easily review the data values and spot the error. The debugger can be controlled via the Designer menu, via hotkeys (described in the menu), or with the Debug Toolbar. We will use the toolbar.
2. The Debug Toolbar is not visible by default. To make it visible, select the menu option Tools > Customize. You will see the dialog box shown in Figure 7-4.
Figure 7-4. Customize Toolbars Dialog Box
3. Select the Debugger toolbar. Click OK.
4. The Debug Toolbar is short. When it is undocked, it appears as in Figure 7-5. If you cannot see it right away, look for the red stop sign on the right.
Figure 7-5. Debugger Toolbar
Tip: If you cannot find the Debugger Toolbar after using the menu option to select it, another toolbar has shifted it off the screen. Re-arrange the other docked toolbars until you can see it.
5. You can cause one row of data to be read by the Source Qualifier by pressing the third toolbar button - tooltip Next Instance. Note that some data is shown in the Instance window.
6. Toggle the Instance drop-box to the Expression transformation. The data has not yet gone that far.
Note: No data available means null in the Debugger.
7. Press the fourth toolbar button - tooltip Step to Instance. Note that one more row has been read, and the first row has been pushed into the Expression transformation and the target table.
8. Press the Next Instance toolbar button (third) several times. Note that each time it is pressed, one more row is read and one more row (the row that was read from the previous press) is loaded into the target. The Instance window jumps between the Source Qualifier and the Expression (i.e., it follows the row).
9. Press the Step to Instance toolbar button (fourth) several times. Note that it also causes one row to be read and written, but the Instance window shows only the data in one transformation - the one chosen in the drop-box. Examine the data being sent to the target. What is the error? Hint: compare the values with the actual date being read from the source file.
10. Now that you are familiar with the basics of operating the Debugger, locate the cause of the error.
Stop the Debugger by pressing the second toolbar button. Press Yes. Fix the mapping error. Save the Repository. Re-start the Debug Wizard as in Step 2. Note that your Debug session properties (such as connectivity) have been saved locally, making it easier for you to evoke the Debugger again if needed. Confirm that the data being sent to the target is now correct.
Type
Passive.
Description
The Sequence Generator transformation generates unique numeric values that can be used to create keys. The values created by the Sequence Generator are sequential but not guaranteed to be contiguous. The Sequence Generator is an output-only transformation with two output ports, NEXTVAL and CURRVAL. Typically you connect the NEXTVAL port to generate a new key. When connected to multiple targets, the Sequence Generator generates sequential values for each target. To use the same value for each target, pass the output of the Sequence Generator to an Expression transformation before connecting it to the targets.
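A sketch of the NEXTVAL semantics, including the Cycle behavior described under Properties, can be modeled in Python (a conceptual model only; the parameter names mirror the transformation properties but the class itself is invented for illustration):

```python
# Conceptual model of a Sequence Generator: NEXTVAL hands out sequential
# values; with Cycle enabled, it wraps back to Start Value after End Value.
class SequenceGenerator:
    def __init__(self, start_value=1, increment_by=1,
                 end_value=2147483647, cycle=False):
        self.start_value = start_value
        self.increment_by = increment_by
        self.end_value = end_value
        self.cycle = cycle
        self.current_value = start_value

    def nextval(self):
        if self.current_value > self.end_value:
            if not self.cycle:
                raise OverflowError("sequence exhausted")
            self.current_value = self.start_value  # cycle back to start
        value = self.current_value
        self.current_value += self.increment_by
        return value

seq = SequenceGenerator(start_value=1, increment_by=1, end_value=3, cycle=True)
assert [seq.nextval() for _ in range(5)] == [1, 2, 3, 1, 2]
```

The model also shows why values are sequential but not guaranteed contiguous: once values are handed out, any that are discarded downstream leave gaps in the target.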
Properties
Property | Description
Start Value | The start value of the generated sequence that you want the Integration Service to use if you use the Cycle option. If you select Cycle, the Integration Service cycles back to this value when it reaches the end value.
Increment By | The value you want the sequence generator to increment by.
End Value | The maximum value the Integration Service generates.
Current Value | The current value of the sequence.
Cycle | If selected, the Integration Service cycles through the sequence range.
Number of Cached Values | The number of sequential values the Integration Service caches at a time.
Reset | If selected, the Integration Service generates values based on the original current value for each session.
Tracing Level | Level of detail about the transformation that the Integration Service writes into the session log.
Business Purpose
A business receives customer information that is used to update a data warehouse customer dimension table containing customer history. A Sequence Generator is used to create surrogate keys that maintain referential integrity within the dimension table, since a customer may have duplicate entries.
Example
The following example shows a partial mapping where the sequence generator is used to generate a new key for the Dates dimension table.
Performance Considerations
It is best to configure the Sequence Generator transformation as close to the target as possible in a mapping; otherwise the mapping will carry extra sequence numbers through transformations that do not need them.
Technical Description
PowerCenter will extract the dates from a shared relational table and load them into a shared relational table. All columns in the source table have matching columns in the target table. A primary key for the target table will be assigned using the Sequence Generator transformation.
Objectives
Create sources and targets based on shortcuts Create a Sequence Generator transformation Create unique integer primary key values using the NEXTVAL port
Duration
20 Minutes
Unit 8 Lab: Load Date Dimension Table Informatica PowerCenter 8 Level I Developer
The source relational table will be directly loaded into a relational target. The primary key for the target table will be assigned by a sequence generator.
Frequency: Once.
SOURCES
Tables
Table Name: STG_DATES
Schema/Owner: TDBUxx
Selection/Filter: (none)
TARGETS
Tables
Table Name: DIM_DATES
Schema/Owner: TDBUxx
Update/Delete/Insert: Insert (X)
Unique Key: DATE_KEY
Relational Source
Relational Target
Target Table: DIM_DATES. Target Columns:
DATE_KEY, DATE_VALUE, DATE_ID_LEGACY, DATE_OF_MONTH, MONTH_NUMBER, YEAR_VALUE, DAY_OF_WEEK, DAY_NAME, MONTH_NAME, DAY_OF_YEAR, MONTH_OF_YEAR, WEEK_OF_YEAR, DAY_OVERALL, WEEK_OVERALL, MONTH_OVERALL, YEAR_OVERALL, HOLIDAY_INDICATOR, WORKDAY_INDICATOR, WEEKDAY_INDICATOR, WEEKEND_INDICATOR, QUARTER_OF_YEAR, SEASON, LAST_DAY_IN_MONTH
DIM_DATES target columns (continued): LAST_DAY_IN_QUARTER, LAST_DAY_IN_YEAR
Instructions
Step 1: Create a Shortcut to a Shared Relational Source Table
1. Expand the DEV_SHARED folder and locate the source definition STG_DATES in the ODBC_STG node. Notice that this STG_DATES object is a source, while the STG_DATES that you have already used is a target.
2. Ensure that your student folder is open.
3. Drag and drop the STG_DATES source definition from the DEV_SHARED folder into the Source Analyzer.
4. Click Yes to confirm the shortcut.
5. Rename the shortcut SC_STG_DATES.
You should now see the SC_STG_DATES shortcut in your own student folder.
Velocity Best Practice: The SC_ prefix is the Velocity Best Practice naming convention for shortcut objects.
6. In the DEV_SHARED folder, locate the target DIM_DATES. Drag and drop DIM_DATES into the Target Designer. Click Yes to confirm the shortcut. Rename the shortcut SC_DIM_DATES. Save your work.
You will now be able to see the SC_DIM_DATES shortcut in your own student folder.
Create a new mapping named m_DIM_DATES_LOAD_xx. Add the SC_STG_DATES relational source to the new mapping. Add the SC_DIM_DATES relational target to the new mapping. Expand the mapping objects.
From the Transformation toolbar, select the Sequence Generator transformation icon.
Figure 8-2. Sequence Generator Transformation icon
2. Add a Sequence Generator transformation to the mapping. You use the Sequence Generator transformation by connecting the NEXTVAL port to the desired transformation or target and using the widest range of values (1 to 2147483647) with the smallest interval (1).
3. From the Sequence Generator transformation, select the NEXTVAL port and link it to the DATE_KEY column of the SC_DIM_DATES target.
Figure 8-3. Normal view of the sequence generator NEXTVAL port connected to a target column
4. Open the sequence generator for editing.
5. Select the Properties tab and observe the properties available in the sequence generator.
   a. Check the Reset Transformation Attribute Value.
   b. Describe the following properties. Use the Help system to find the answers.
      Increment By: ________________________________________________
      Current Value: ________________________________________________
6. Click the OK button to return to the Normal view of the sequence generator.
7. Save your work.
Link all the ports from the Source Qualifier transformation to the corresponding columns in the target object using Autolink by name. See Figure 8-4.
Figure 8-4. Normal view of connected ports to the target
2. Save your work.
3. Verify that your mapping is valid in the Output window. If the mapping is not valid, correct the errors displayed in the message.
1. Launch the Workflow Manager (if it is not already running), connect to the repository, and open your student folder.
2. From the Workflow Designer, create a new workflow named wkf_DIM_DATES_LOAD_xx.
3. Use the Session task icon to create a new Session task.
4. Associate the m_DIM_DATES_LOAD_xx mapping with the new Session task.
5. Link the Start object to the s_m_DIM_DATES_LOAD_xx session task object.
6. Edit the s_m_DIM_DATES_LOAD_xx session task and set the following options in the Mapping tab:
   - Select SQ_SC_STG_DATES from the Sources folder in the Navigator window. Set the Connections Value to your assigned NATIVE_STGxx connection value.
   - Select SC_DIM_DATES from the Targets folder in the Navigator window. Set the Connections Value to your assigned NATIVE_EDWxx connection value.
   - Set the Target load type to Normal.
   - Check the Truncate target table option in the target properties.
7. Save your work.
8. Check the Validate messages to ensure your workflow is valid. If you receive an invalid message, correct the problem(s), then re-validate and save.
9. Start the workflow.
10. Review the Task Details. Your information should appear similar to Figure 8-5.
11. Select the Source/Target Statistics tab. Your statistics should be similar to Figure 8-6.
Data Results
Preview the target data from the Designer. Your data should appear similar to Figure 8-7.
Figure 8-7. Data Preview of the DIM_DATES table
Properties
This section discusses the cache-related properties of the Lookup transformation. Dynamic cache is discussed in a later module.
Option: Lookup Caching Enabled
Lookup Type: Flat File, Relational
Description: Indicates whether the Integration Service caches lookup values during the session.

Option: Lookup Cache Directory Name
Lookup Type: Flat File, Relational
Description: Specifies the directory used to build the lookup cache files when you configure the Lookup transformation to cache the lookup source. Also used to save the persistent lookup cache files when you select the Lookup Persistent option. By default, the Integration Service uses the $PMCacheDir directory configured for the Integration Service process.
Unit 9: Lookup Caching, More Features and Techniques Informatica PowerCenter 8 Level I Developer
Option: Lookup Cache Persistent
Lookup Type: Flat File, Relational
Description: Indicates whether the Integration Service uses a persistent lookup cache.

Option: Lookup Data Cache Size
Lookup Type: Flat File, Relational
Description: Indicates the maximum size the Integration Service allocates to the data cache in memory. When the Integration Service cannot store all the data cache data in memory, it pages to disk as necessary.

Option: Lookup Index Cache Size
Lookup Type: Flat File, Relational
Description: Indicates the maximum size the Integration Service allocates to the index cache in memory. When the Integration Service cannot store all the index cache data in memory, it pages to disk as necessary.

Option: Cache File Name Prefix
Lookup Type: Flat File, Relational
Description: Use only with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files.

Option: Re-cache From Lookup Source
Lookup Type: Flat File, Relational
Description: Use only with the lookup cache enabled. When selected, the Integration Service rebuilds the lookup cache from the lookup source when it first calls the Lookup transformation instance. If you use a persistent lookup cache, it rebuilds the persistent cache files before using the cache. If you do not use a persistent lookup cache, it rebuilds the lookup cache in memory before using the cache.
Lookup Cache
How it Works
There are two types of cache memory: index cache and data cache. The index cache contains all port values from the lookup table where the port is specified in the lookup condition. The data cache contains all port values from the lookup table that are not in the lookup condition and that are specified as output ports. After the cache is loaded, values from the Lookup input port(s) that are part of the lookup condition are compared to the index cache. Upon a match, the rows from the cache are included in the data stream.
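The two-cache scheme described above can be sketched in plain Python. This is only an illustrative model of the behavior, not Informatica code; the sample rows and port names (DATE_VALUE, DATE_KEY, DAY_NAME) are hypothetical, borrowed from the DIM_DATES lab.

```python
# Illustrative model of a cached lookup: the "index cache" holds the
# lookup-condition port values, the "data cache" holds the output ports.
lookup_table = [
    {"DATE_KEY": 1, "DATE_VALUE": "2006-01-01", "DAY_NAME": "Sunday"},
    {"DATE_KEY": 2, "DATE_VALUE": "2006-01-02", "DAY_NAME": "Monday"},
]

# Build both caches once, before any input rows are processed.
index_cache = {}   # condition port value -> position in the data cache
data_cache = []    # output port values only
for pos, row in enumerate(lookup_table):
    index_cache[row["DATE_VALUE"]] = pos             # port in the condition
    data_cache.append({"DATE_KEY": row["DATE_KEY"],  # output ports
                       "DAY_NAME": row["DAY_NAME"]})

def lookup(date_value):
    """Compare the input value to the index cache; on a match,
    return the cached output-port values."""
    pos = index_cache.get(date_value)
    return data_cache[pos] if pos is not None else None

print(lookup("2006-01-02"))  # {'DATE_KEY': 2, 'DAY_NAME': 'Monday'}
```

Because the caches are built once up front, each subsequent input row costs only a dictionary probe instead of an external read request.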
Key Point
If there is not enough memory specified in the index and data cache properties, the overflow will be written out to disk.
Performance Considerations
Lookup caching typically improves performance because the Integration Service does not need to issue an external read request for each lookup. However, this is true only if the time taken to load the lookup cache is less than the time the external read requests would take. To reduce the amount of cache required, turn off or delete any unused output ports in the Lookup transformation. You can also index the lookup table to speed retrieval time, and you can use WHERE clauses in the SQL override to minimize the amount of data written to cache.
Rule Of Thumb
Cache if the number (and size) of records in the lookup table is small relative to the number of mapping rows requiring a lookup.
Unit 9 Lab A: Load Promotions Dimension Table (Lookup and Persistent Cache)
Business Purpose
Mersche Motors runs a number of promotions that begin and end on certain dates. The promotions are stored in the promotions dimension table. This table stores the start and expiry dates as date keys that reference the date dimension table.
Technical Description
The DIM_PROMOTIONS table requires start and expiration date keys. These exist in the DIM_DATES table that was populated in the previous lab. To obtain these date keys, which were created by the sequence generator, it is necessary to perform a lookup on the DIM_DATES table in the EDW database. The DIM_DATES table changes infrequently, so it will be loaded into cache in a persistent state. The lookup cache will often be reused by other mappings that load dimension tables.
Objectives
Understand how to configure and use a persistent Lookup cache.
Duration
25 minutes
Unit 9 Lab A: Load Promotions Dimension Table (Lookup and Persistent Cache) Informatica PowerCenter 8 Level I Developer
Description: Promotion data is run through the mapping, and a lookup must be performed on the DIM_DATES table to acquire the date keys for the start date and expiration date in the DIM_PROMOTIONS table.
Schedule: To be determined.
Prerequisite: DIM_DATES must be loaded.
SOURCES
Tables
  Table Name:       STG_PROMOTIONS
  Schema/Owner:     TDBUxx
  Selection/Filter: None
TARGETS
Tables
  Table Name:   DIM_PROMOTIONS
  Schema/Owner: TDBUxx
  Operation:    Insert
  Unique Key:   PROMO_ID
LOOKUPS
Lookup Name: lkp_START_DATE_KEY
Table: DIM_DATES
Location: EDW
Match Condition(s): DIM_DATES.DATE_VALUE = STG_PROMOTIONS.START_DATE
Filter/SQL Override: (none)

Lookup Name: lkp_EXPIRY_DATE_KEY
Table: DIM_DATES
Location: EDW
Match Condition(s): DIM_DATES.DATE_VALUE = STG_PROMOTIONS.EXPIRY_DATE
Filter/SQL Override: (none)
Mapping diagram: a Source, two Lookup transformations, and a Target.
Instructions
Step 1: Create a Shortcut to a Shared Relational Source Table
1. In the Source Analyzer, create a shortcut to the STG_PROMOTIONS source table from the DEV_SHARED > Sources > ODBC_STG folder.
2. Rename the shortcut to SC_STG_PROMOTIONS.
3. Save your work.
In the Target Designer, create a shortcut to the DIM_PROMOTIONS target table from the DEV_SHARED > Targets folder. Rename the shortcut to SC_DIM_PROMOTIONS.
Note: If the SC_DIM_DATES target table is not displayed in the Target Designer drag it in from the Targets folder in your student folder. Notice the primary key-foreign key relationships.
3. Create a new mapping named m_DIM_PROMOTIONS_LOAD_xx. Add the source definition shortcut SC_STG_PROMOTIONS to the mapping. Add the target definition shortcut SC_DIM_PROMOTIONS to the mapping.
4. Arrange the transformations appropriately and Autolink the ports By Name.
5. Save your work. It should look like the mapping in Figure 9-1.
Step 4: Create Lookups for the Start and Expiry Date Keys
1. In Figure 9-1, compare START_DATE and EXPIRY_DATE in SQ_SC_STG_PROMOTIONS to START_DK and EXPIRY_DK in the SC_DIM_PROMOTIONS target table. Notice that these two ports are not connected and that the datatypes are different. The target requires key values (numbers), not dates.
2. In what table do these date key values exist? _________________________.
3. The date dimension table (DIM_DATES) was populated in the previous lab; the DATE_KEY was generated by the seq_DIM_DATES_DATE_KEY Sequence Generator transformation, and DATE_VALUE has a datatype of date/time.
4. To acquire the value for START_DK in the DIM_PROMOTIONS target, you need to perform a lookup on the DIM_DATES table. You will base the lookup condition on the ____________________ port from the SQ_SC_STG_PROMOTIONS Source Qualifier and the ____________________ column in the DIM_DATES lookup table.
5. Similarly, to acquire the value for EXPIRY_DK in the DIM_PROMOTIONS target, you will need a second lookup on DIM_DATES. You will base the lookup condition on the ____________________ port from the SQ_SC_STG_PROMOTIONS Source Qualifier and the ____________________ column in the DIM_DATES lookup table.
6. Add a Lookup transformation to the mapping based on the SC_DIM_DATES (shortcut to DIM_DATES) target table.
Figure 9-3. Select Lookup Table
7. Rename the Lookup transformation to lkp_START_DATE_KEY.
8. Click OK.
9. Click Yes to verify that the lookup condition is empty. You will define this shortly.
10. Drag and drop the START_DATE port from the SQ_SC_STG_PROMOTIONS Source Qualifier to an empty port in the lkp_START_DATE_KEY transformation.
11. Make START_DATE input only.
12. Rename START_DATE to IN_START_DATE.
13. Define the Lookup Condition to look like Figure 9-4:
Figure 9-4. Lookup Condition
14. Set the following Lookup properties:
   - Lookup Table Name = DIM_DATES (default)
   - Lookup Caching Enabled = Checked (default)
   - Lookup Cache Persistent = Checked (needs to be set)
   - Cache File Name Prefix = LKPSTUxx (where xx is your student number)
15. Link the DATE_KEY port from the lkp_START_DATE_KEY transformation to the START_DK port in the SC_DIM_PROMOTIONS target.
16. Save your work.
Note: Notice that this transformation has many ports. We could have unchecked the Output column on all ports except the ones we need, but since this Lookup transformation will be persistent, that would have limited its usefulness for other mappings that might leverage it.
The lkp_START_DATE_KEY transformation will not retrieve values for EXPIRY_DK because the lookup conditions will be different.
17. Create a second Lookup transformation called lkp_EXPIRY_DATE_KEY by selecting the lkp_START_DATE_KEY transformation and pressing Ctrl+C and then Ctrl+V.
18. Make the changes necessary to the Lookup to ensure that the EXPIRY_DATE finds the proper DATE_KEY:
   a. Rename it to lkp_EXPIRY_DATE_KEY.
   b. Rename port IN_START_DATE to IN_EXPIRY_DATE.
   c. Verify the Lookup Condition is correct.
19. Link the EXPIRY_DATE port from the SQ_SC_STG_PROMOTIONS Source Qualifier to the IN_EXPIRY_DATE port in the lkp_EXPIRY_DATE_KEY transformation.
20. Link the DATE_KEY port from the lkp_EXPIRY_DATE_KEY transformation to the EXPIRY_DK port in the SC_DIM_PROMOTIONS target.
21. Save your work.
1. Launch the Workflow Manager and sign in to your assigned folder.
2. Create a new workflow named wkf_DIM_PROMOTIONS_LOAD_xx.
3. Create a new Session task using the mapping m_DIM_PROMOTIONS_LOAD_xx.
4. Edit the s_m_DIM_PROMOTIONS_LOAD_xx session task.
5. In the Mapping tab:
   a. Select SQ_SC_STG_PROMOTIONS located under the Sources folder in the Navigator window.
   b. Set the Connections > Type to your assigned NATIVE_STGxx connection object.
   c. Select SC_DIM_PROMOTIONS located under the Targets folder in the Navigator window.
   d. Set the Connections > Type to your assigned NATIVE_EDWxx connection object.
   e. Ensure that the Target load type is set to Normal.
6. Complete the workflow by linking the Start and Session tasks, and save your work.
7. Run the workflow.
8. Review the Task Details.
9. Select the Source/Target Statistics tab. Your statistics tab should appear as in Figure 9-7.
Data Results
Preview the target data. The results should be similar to Figure 9-8. Note the values for START_DK and EXPIRY_DK.
Figure 9-8. Data Preview of the DIM_PROMOTIONS target table
By setting the Lookup Cache Persistent property on the Lookup transformations, two files were created in the cache file directory defined for the Integration Service process. See Figure 9-9. Note that in this lab, these files are on the Integration Service process machine, not your local computer. Also note that the names correspond to the name you entered in the Cache File Name Prefix Lookup property. To view these files, you will need to map to the file system on the Integration Service process machine. Verify that the files have a timestamp similar to when you ran the above workflow.
Figure 9-9. Preview files created when Persistent Cache is set on Lookup Transformation
Technical Description
This lab details the use of four PowerCenter Designer features. Each of these features increases the efficiency of any developer who knows how to use it appropriately. At the discretion of the instructor, this lab can also be completed as a demonstration.
Objectives
- Find Within Workspace
- View Object Dependencies
- Compare Objects
- Overview Window
Duration
15 minutes
Instructions
Open a Mapping
In the Designer tool, perform the following steps:
1. Right-click the DEV_SHARED folder and select Open. This folder will be used in one of the features.
2. Right-click your Studentxx folder and select Open.
3. Drag the m_Stage_Customer_Contacts_xx mapping into the Mapping Designer within your Studentxx folder.
1. Type the word customer in the Find What text box.
2. Click Find Now.
Note: In the Find in workspace feature, the term fields can mean columns in sources or targets or ports in transformations. The term table can mean a source or target definition or a transformation. Velocity Best Practice: By using the Velocity Methodology object naming conventions (such as transformation type prefixes) it will be easier to locate the found objects in the workspace. For example, in Figure 9-10 we know that SQ_customer_layout is a Source Qualifier and fil_Customer_No_99999 is a filter.
1. Select the flat-file source definition promotions in the Navigator window.
2. Right-click and select Dependencies.
You will see the Dependencies dialog box as shown in Figure 9-11.
Figure 9-11. View Dependencies dialog box
3. Click OK. You will see the View Dependencies window, which shows detailed information about each of the dependencies found. Browse through this window, noting that some of the information relates to Team-Based Development (version control) properties such as Version, Timestamp, and Version Comments.
Note: By clicking the Save button on the toolbar, the dependencies can be saved as an .htm file for future reference.
4.
Tip: Dependencies can also be viewed by right-clicking on an object directly in a workspace, such as a source definition in the Mapping Designer or the Source Analyzer.
1. Open the m_DIM_PROMOTIONS_LOAD_xx mapping.
2. Right-click the Lookup transformation lkp_START_DATE_KEY and select Compare Objects.
3. From the Instance 2 drop-down box, select the Lookup transformation lkp_EXPIRY_DATE_KEY. This is the object we wish to compare with the first Lookup transformation. Your screen should appear as in Figure 9-12.
Figure 9-12. Transformation compare objects dialog box
4. Click Compare.
5. Browse the tabs in the Transformations window that appears. Select the Properties tab; what you see should be similar to Figure 9-13.
Figure 9-13. Compare Transformation objects Properties details
Note: A great deal of comparative information is displayed in the tabs. All differences appear in red. Ports highlighted in yellow indicate a difference in an expression, which may not be easily visible in this view. We will now learn how to compare objects that are in different folders.
6. Open the target definition STG_DATES in the Target Designer.
7. Right-click the target and select Compare Objects.
8. The Select Targets dialog box allows you to choose a comparison object in another folder. Click the Browse button for Target 2 and select the DIM_DATES table in the DEV_SHARED folder. Your screen should appear as in Figure 9-14.
9. Click Compare.
10. Browse the information in the various tabs. Note that this method can quickly tell you the differences, if any, between two objects in two different folders. See Figure 9-15.
Figure 9-15. Column differences between two target tables
Tip: In order to compare objects across folders, both folders must be open.
1. In the Mapping Designer, set the zoom level to 100 percent.
2. Click the Toggle Overview Window toolbar button. The Overview window appears in the upper-right corner of your screen.
3. Use your left mouse button to drag the dotted rectangle to a different location within the mapping. If you were searching for a target or a source in a large and complex mapping, this feature would make it faster to locate.
- Sorter transformations
- Aggregator transformations
- Active and passive transformations
- Data concatenation
- Self-joins
Type
Active.
Description
The Sorter transformation sorts the incoming data based on one or more key values; the sort order can be ascending, descending, or mixed. The Sorter transformation's Distinct attribute provides a facility to remove duplicate rows from the input.
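As a rough analogy in plain Python (not Informatica code; the rows and sort keys are hypothetical), a sort on multiple keys with mixed order, followed by an optional distinct pass, looks like this:

```python
rows = [
    {"dealer": "EAST", "sales": 500},
    {"dealer": "WEST", "sales": 300},
    {"dealer": "EAST", "sales": 500},   # duplicate row
    {"dealer": "WEST", "sales": 700},
]

# Mixed sort order: dealer ascending, sales descending.
# (Negating the value works for numeric keys; a real sorter handles any type.)
sorted_rows = sorted(rows, key=lambda r: (r["dealer"], -r["sales"]))

# "Distinct" removes duplicate rows while preserving the sort order.
seen, distinct_rows = set(), []
for r in sorted_rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        distinct_rows.append(r)

print(distinct_rows)
# [{'dealer': 'EAST', 'sales': 500},
#  {'dealer': 'WEST', 'sales': 700},
#  {'dealer': 'WEST', 'sales': 300}]
```

Note that distinct here compares entire rows, which mirrors the Designer behavior of making all ports part of the sort key when Distinct is selected.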
Unit 10: Sorter, Aggregator and Self-Join Informatica PowerCenter 8 Level I Developer
Properties
Option: Sorter Cache Size
Description: The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Integration Service passes all incoming data into the Sorter transformation before it performs the sort. You can specify any amount between 1 MB and 4 GB for the Sorter cache size.

Option: Case Sensitive
Description: Determines whether the Integration Service considers case when sorting data. When you enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.

Option: Work Directory
Description: The directory that the Integration Service uses to create temporary files while it sorts data. After the Integration Service sorts the data, it deletes the temporary files.

Option: Distinct
Description: You can configure the Sorter transformation to treat output rows as distinct. If you configure the Sorter transformation for distinct output rows, the Mapping Designer configures all ports as part of the sort key.

Option: Tracing Level
Description: Sets the amount of detail included in the session log when you run a session containing this transformation.

Option: Null Treated Low
Description: Enable this property if you want the Integration Service to treat null values as lower than any other value when it performs the sort operation.

Option: Transformation Scope
Description: Specifies how the Integration Service applies the transformation logic to incoming data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
Business Purpose
A business may aggregate data on records received from relational sources (Databases) or flat files with related records in random order. Sorting the records prior to passing them on to an Aggregator transformation may improve the overall performance of the aggregation task.
Example
In the following example Gross Profit and Profit Margin are calculated for each item sold. To improve performance of this session a Sorter transformation is added prior to the Aggregator transformation. The Aggregator Sorted Input property must be checked to notify the Aggregator to expect input in sort order.
Sorter Cache
How It Works
If the cache size specified in the properties exceeds the available amount of memory on the Integration Service process machine, the Integration Service fails the session. All of the incoming data is passed into cache memory before the sort operation is performed. If the amount of incoming data is greater than the specified cache size, PowerCenter temporarily stores the data in the Sorter transformation work directory.
Key Points
The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.
Performance Considerations
Using a Sorter transformation may improve performance over an ORDER BY clause in a SQL override in an aggregate session when the source is a database, because the source database may not be tuned with the buffer sizes needed for a database sort.
Type
Active.
Description
The Aggregator transformation calculates aggregates such as sums, minimums, or maximums across multiple groups of rows. The Aggregator transformation can apply expressions to its ports; however, those expressions are applied to a group of rows, unlike the Expression transformation, which applies calculations on a row-by-row basis only. Aggregate functions are created in output ports only. Function grouping requirements are set using the Aggregator Group By ports.
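The group-versus-row distinction can be sketched in plain Python. This is an illustrative model only; the rows, the group-by port (item), and the derived output ports (gross_profit, profit_margin) are hypothetical:

```python
from collections import defaultdict

# Hypothetical input rows: item-level sales to be aggregated per item.
rows = [
    {"item": "A", "revenue": 100.0, "cost": 60.0},
    {"item": "A", "revenue": 150.0, "cost": 90.0},
    {"item": "B", "revenue": 200.0, "cost": 120.0},
]

# Group-by port: item.  Aggregate expressions apply to each group,
# unlike an Expression transformation, which works row by row.
totals = defaultdict(lambda: {"revenue": 0.0, "cost": 0.0})
for r in rows:
    totals[r["item"]]["revenue"] += r["revenue"]
    totals[r["item"]]["cost"] += r["cost"]

# One output row per unique group-by value, with derived output ports.
output = [{"item": item,
           "gross_profit": t["revenue"] - t["cost"],
           "profit_margin": (t["revenue"] - t["cost"]) / t["revenue"]}
          for item, t in totals.items()]
print(output)
```

Each unique item produces exactly one output row, which is why the Aggregator is an active transformation: it changes the number of rows.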
Properties
Option: Cache Directory
Description: Local directory where the Integration Service creates the index and data cache files.

Option: Tracing Level
Description: Amount of detail displayed in the session log for this transformation.

Option: Sorted Input
Description: Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.

Option: Aggregator Data Cache Size
Description: Data cache size for the transformation. Default cache size is set to Auto.

Option: Aggregator Index Cache Size
Description: Index cache size for the transformation. Default cache size is set to Auto.

Option: Transformation Scope
Description: Specifies how the Integration Service applies the transformation logic to incoming data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
Business Purpose
A business may want to calculate gross profit or profit margins based on items sold or summarize weekly, monthly or quarterly sales activity.
Example
The following example calculates values for units sold (OUT_UNITS_SOLD), revenue (OUT_REVENUE), and cost (OUT_COST) for each promotion ID by date.
Aggregator Cache
How It Works
There are two types of cache memory: index cache and data cache. All rows are loaded into cache before any aggregation takes place. The index cache contains the group-by port values. The data cache contains:
- Non-group-by input ports used in non-aggregate output expressions.
- Non-group-by input/output ports.
- Local variable ports.
- Ports containing aggregate functions (multiplied by three).
One output row is returned for each unique occurrence of the group-by ports.
Key Points
- If there is not enough memory specified in the index and data cache properties, the overflow is written out to disk.
- No rows are returned until all of the rows have been aggregated.
- Checking the Sorted Input attribute bypasses caching.
- You enable automatic memory settings by configuring a value for the Maximum Memory Allowed for Auto Memory Attributes or the Maximum Percentage of Total Memory Allowed for Auto Memory Attributes. If either value is set to zero, the Integration Service disables automatic memory settings and uses default values.
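The reason sorted input can bypass caching is that each group's rows arrive contiguously, so a result can be emitted as soon as the group key changes, without first holding every row in memory. A minimal sketch in plain Python, with hypothetical data and keys:

```python
# With sorted input, each group's rows arrive contiguously, so the
# aggregator can emit a result as soon as the group key changes;
# there is no need to cache every row first.
def aggregate_sorted(rows, key, value):
    current_key, total = None, 0
    for r in rows:
        if r[key] != current_key:
            if current_key is not None:
                yield (current_key, total)   # group finished: emit early
            current_key, total = r[key], 0
        total += r[value]
    if current_key is not None:
        yield (current_key, total)           # emit the final group

rows = [{"promo": 1, "units": 5}, {"promo": 1, "units": 3},
        {"promo": 2, "units": 7}]            # pre-sorted by promo
print(list(aggregate_sorted(rows, "promo", "units")))  # [(1, 8), (2, 7)]
```

If the input were not sorted by the group key, rows for the same group could arrive far apart, and the early-emit strategy would produce duplicate groups; that is why Sorted Input must only be selected when the mapping actually delivers sorted data.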
Performance Considerations
Aggregator performance can be increased when you sort the input data in the same order as the Aggregator Group By ports prior to doing the Aggregation. The Aggregator sorted input property would need to be checked. Relational source data can be sorted using an order by clause in the Source Qualifier override. Flat file source data can be sorted using an external sort application or the Sorter transformation. Cache size is also important in assuring optimal performance in the Aggregator. Make sure that your cache size settings are large enough to accommodate all of the data. If they are not the system will cache out to disk causing a slow down in performance.
Passive transformations operate on one row at a time AND preserve the number of rows. Examples: Expression, Lookup, Sequence Generator. Active transformations operate on groups of rows AND/OR change the number of rows. Examples: Source Qualifier, Filter, Joiner, Aggregator.
Data concatenation brings together different pieces of the same record (row). Data concatenation works only if combining branches of the same source pipeline. For example, one branch has a customer ID and the other branch has the customer name. But if either branch contains an active transformation, the correspondence between the branches no longer exists.
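The concatenation rule can be illustrated in plain Python with hypothetical data: positional correspondence between branches holds until an active transformation changes the row count in one branch.

```python
# Two branches of the same source pipeline, split by column.
source = [(1, "Ann"), (2, "Bob"), (3, "Cara")]
branch_ids   = [row[0] for row in source]      # customer IDs
branch_names = [row[1] for row in source]      # customer names

# Concatenation works: row N of each branch came from the same source row.
print(list(zip(branch_ids, branch_names)))
# [(1, 'Ann'), (2, 'Bob'), (3, 'Cara')]

# An active transformation (here, a filter) on one branch breaks the
# positional correspondence: row N no longer matches row N.
filtered_ids = [i for i in branch_ids if i != 2]   # drops a row
print(list(zip(filtered_ids, branch_names)))
# [(1, 'Ann'), (3, 'Bob')] -- customer 3 is now paired with Bob's name!
```

This is exactly why the Designer rejects concatenation downstream of an active transformation in one branch.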
Description
The Joiner transformation combines fields from two data sources into a single combined data source based on one or more common fields, also known as the join condition. When the values to be combined are located within the same pipeline, a self-join provides a solution. The two pipelines being joined must be sorted in the same order.
Business Purpose
A business may have to extract data from a single employee master table with employee data such as names, title, salary and reporting department and create a new table showing only those employees whose salary is greater than the average salary for the department.
Example
The following example creates a new table with only those employees whose salary is greater than the average salary for the department that they work in.
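The example above can be sketched in plain Python (hypothetical data; in a PowerCenter mapping the average would come from an Aggregator branch and the comparison from a sorted Joiner plus a Filter):

```python
# Hypothetical employee master table.
employees = [
    {"name": "Ann",  "dept": "SALES",   "salary": 60000},
    {"name": "Bob",  "dept": "SALES",   "salary": 40000},
    {"name": "Cara", "dept": "SERVICE", "salary": 55000},
    {"name": "Dan",  "dept": "SERVICE", "salary": 45000},
]

# Branch 1 (aggregated): average salary per department.
dept_totals = {}
for e in employees:
    total, count = dept_totals.get(e["dept"], (0, 0))
    dept_totals[e["dept"]] = (total + e["salary"], count + 1)
dept_avg = {d: total / count for d, (total, count) in dept_totals.items()}

# Branch 2 (detail): join each employee back to its department average
# and keep only those earning more than the average -- the self-join.
above_avg = [e["name"] for e in employees
             if e["salary"] > dept_avg[e["dept"]]]
print(above_avg)  # ['Ann', 'Cara']
```

Both branches start from the same source, which is the defining feature of a self-join: one branch is aggregated, then joined back against the detail rows.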
Key Points
The inputs to the Joiner from the single source must separate into two data streams.

For self-joins between two branches of the same pipeline:
- You must add a transformation between the Source Qualifier and the Joiner in at least one branch of the pipeline.
- Data must be pre-sorted by the join key.
- Configure the Joiner to accept sorted input.

For self-joins between records from the same source:
- Create two instances of the source and join the pipelines from each source.
Performance Considerations
There is a performance benefit in a self-join, since it requires both the master and detail sides to be sorted.
Technical Description
We will copy the m_STG_EMPLOYEES_xx mapping created in a previous lab and modify it to derive the manager name and load it into the DEALERSHIP_MANAGER column of the STG_EMPLOYEES table. To do this, we will have to split the data into two streams: one stream will have all employee records, and the other will have only manager records; the streams will then be joined back together using the manager records as the master. On the manager stream we will filter the POSITION_TYPE column for MANAGER records and relate them back to the SALESREP records using DEALERSHIP_ID. This works because there is only one manager per dealership. We will also need to maintain the lookup on the salaries.txt file to ensure that salary data is still populated.
Objectives
- Leverage an existing mapping to solve a data integrity issue
- Split the data stream and use a self-join to bring it back together
- Copy and modify an existing reusable Expression transformation
Duration
70 Minutes
Unit 10 Lab: Reload the Employee Staging Table Informatica PowerCenter 8 Level I Developer
Description: The file list will be read, source data will be reformatted, and salary information for each employee will be added. The names of the managers will be determined to populate the DEALERSHIP_MANAGER column.
Frequency: Daily
Load Method: Target Append
SOURCES
Files:
  File Names: employees_central.txt, employees_east.txt, employees_west.txt (definition in employees_layout.txt)
  File List: employees_list.txt
  File Location: C:\pmfiles\SrcFiles
  Fixed/Delimited: Delimited
  Additional File Info: These 3 comma-delimited flat files will be read into the session using the file list employees_list.txt. The layout of the flat files can be found in employees_layout.txt.
TARGETS
Tables:
  Table Name: STG_EMPLOYEES
  Schema/Owner: TDBUxx
  Operation: Insert
LOOKUPS
Lookup Name: lkp_salary
Table: salaries.txt
Location: C:\pmfiles\LkpFiles
Match Condition(s): EMPLOYEE_ID = IN_EMPLOYEE_ID
[Mapping diagram: Source, Expression, Sorter, Filter, Aggregator, Joiner, Target]
Target Table: STG_EMPLOYEES

Target Columns: EMPLOYEE_ID, EMPLOYEE_NAME, EMPLOYEE_ADDRESS, EMPLOYEE_CITY, EMPLOYEE_STATE, EMPLOYEE_ZIP_CODE, EMPLOYEE_COUNTRY, EMPLOYEE_PHONE_NMBR, EMPLOYEE_FAX_NMBR, EMPLOYEE_EMAIL, EMPLOYEE_GENDER, AGE_GROUP, NATIVE_LANG_DESC, SEC_LANG_DESC, TER_LANG_DESC, POSITION_TYPE, REGIONAL_MANAGER, DEALERSHIP_ID, DEALERSHIP_MANAGER, EMPLOYEE_SALARY, HIRE_DATE, DATE_ENTERED

Column notes:
EMPLOYEE_FAX_NMBR (varchar2) - sourced from the FAX_NUMBER column of employees_layout.
EMPLOYEE_GENDER - GENDER is currently either M or F; it needs to be Male, Female or UNK.
AGE_GROUP (varchar2, derived from employees_layout) - derived from the decoding of the AGE column. The valid age groups are less than 20, 20 to 29, 30 to 39, 40 to 49, 50 to 60 and greater than 60.
NATIVE_LANG_DESC (varchar2)
DEALERSHIP_ID (number(p,s)) - sourced from the DEALERSHIP_ID column of employees_layout.
DEALERSHIP_MANAGER (varchar2) - concatenated FIRSTNAME and LASTNAME of the manager. The employee records are split apart and then joined back together based on DEALERSHIP_ID.
EMPLOYEE_SALARY (number(p,s), derived from employees_layout) - a Salary field for each Employee ID can be found in salaries.txt.
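The gender and age-group derivations can be written with the DECODE function in PowerCenter's expression language. The following is a sketch only; the actual port names in the reusable Expression transformation may differ:

```
-- Gender: map M/F to full words, anything else to UNK
DECODE(GENDER, 'M', 'Male', 'F', 'Female', 'UNK')

-- Age group: DECODE(TRUE, ...) returns the result of the first true condition
DECODE(TRUE,
       AGE < 20, 'Less than 20',
       AGE <= 29, '20 to 29',
       AGE <= 39, '30 to 39',
       AGE <= 49, '40 to 49',
       AGE <= 60, '50 to 60',
       'Greater than 60')
```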
Instructions
Step 1: Copy an Existing Mapping
1. Launch the Designer and sign into your assigned folder.
2. Locate the mapping m_STG_EMPLOYEES_xx in the Navigator window.
3. Copy it and rename it m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx.
4. Open m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx in the Mapping Designer to make it the current mapping for editing.
5. Save your work.
Figure 10-1. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD mapping
Which of these columns can we use to determine Manager records? Answer: ________________________
Which of these columns can we use for a self-join condition to obtain the Dealership Manager name for the employee records? Answer: ________________________
1. Right-click and select Arrange All, then expand the Source Qualifier, Lookup and Target large enough to view all the ports and links.
2. Remove all the links to the lkp_salaries transformation and all of the links to the STG_EMPLOYEES target.
3. Rename the re_exp_Format_Name_Gender_Phone_Load_Date reusable transformation instance to exp_Format_Name_Gender_Phone_Load_Date_Mgr (notice that the instance name changes, but the name of the reusable transformation that this Expression is an instance of stays the same).
Figure 10-3. Renaming an instance of a Reusable Transformation
4. Save your work and notice that the mapping is now invalid. Your mapping should look similar to Figure 10-4 if you Arrange All Iconic.
Figure 10-4. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD after most links removed
2. Select the following ports from SQ_employees_layout and drag them into the Sorter transformation:
   DEALERSHIP_ID
   EMPLOYEE_ID
   ADDRESS
   CITY
   STATE
   ZIP_CODE
   COUNTRY
   FAX_NUMBER
   EMAIL
   NATIVE_LANGUAGE
   SECOND_LANGUAGE
   THIRD_LANGUAGE
   POSITION_TYPE
   REGIONAL_MANAGER
   HIRE_DATE
   DATE_ENTERED
3. Select all the output ports from the exp_FORMAT_NAME_GENDER_PHONE_LOAD_DATE_MGR transformation and drag them into the srt_EMPLOYEES_DEALERSHIP_ID_DESC transformation.
4. Edit the Sorter transformation:
   a. On the DEALERSHIP_ID port, check the checkbox in the 'Key' column to define the sort column.
   b. Rename the following ports:
      OUT_NAME to EMPLOYEE_NAME
      OUT_PHONE to EMPLOYEE_PHONE
      OUT_GENDER to EMPLOYEE_GENDER
      OUT_AGE_GROUP to AGE_GROUP
5. Create a Filter transformation named fil_MANAGERS. Link the following ports from the srt_EMPLOYEES_DEALERSHIP_ID_DESC transformation to the fil_MANAGERS transformation:
6. Set the filter condition to only allow 'MANAGER' position types.
7. Save your work.
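In PowerCenter expression syntax, assuming the port name POSITION_TYPE carried over from the source layout, the filter condition is simply:

```
POSITION_TYPE = 'MANAGER'
```

Rows for which the condition evaluates to FALSE are dropped by the Filter transformation.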
2. Link the following ports from the fil_MANAGERS transformation to the agg_MANAGERS transformation:
   DEALERSHIP_ID
   EMPLOYEE_NAME
3. On the DEALERSHIP_ID port, check the checkbox in the 'Group By' column.
4. Under the Properties tab, check the 'Sorted Input' checkbox.
The mapping depicting the Sorter to Filter to Aggregator flow should be the same as Figure 10-7.
Figure 10-7. Partial mapping flow depicting the flow from the Sorter to the Filter to the Aggregator
Create a Joiner transformation and name it jnr_MANAGERS_EMPLOYEES. On the Properties tab, check the Sorted Input property. Click OK on the Edit Transformations dialog and then click Yes on the 'Join Condition is empty' dialog. The join condition will be set shortly.
4. Link all ports from the agg_MANAGERS transformation into the jnr_MANAGERS_EMPLOYEES Joiner transformation.
5. Link all ports from the srt_EMPLOYEES_DEALERSHIP_ID_DESC transformation to the jnr_MANAGERS_EMPLOYEES transformation.
6. Edit the jnr_MANAGERS_EMPLOYEES transformation:
   a. Rename the two ports linked from the Aggregator transformation as follows:
      DEALERSHIP_ID to MANAGER_DEALERSHIP_ID
      EMPLOYEE_NAME to MANAGER_NAME
   b. Ensure that both ports have checks under the M column, defining them as the Master record.
   c. Rename the following ports linked from the Sorter transformation:
      DEALERSHIP_ID1 to EMPLOYEE_DEALERSHIP_ID
      EMPLOYEE_NAME1 to EMPLOYEE_NAME (remove the '1')
   d. Add the following join condition:
      MANAGER_DEALERSHIP_ID = EMPLOYEE_DEALERSHIP_ID
7. Link the EMPLOYEE_ID port from the jnr_MANAGERS_EMPLOYEES transformation to the IN_EMPLOYEE_ID port in the lkp_salaries Lookup transformation.
Tip: Some ports can be auto-linked by name; the rest must be linked manually.
MANAGER_NAME --> DEALERSHIP_MANAGER
EMPLOYEE_DEALERSHIP_ID --> DEALERSHIP_ID
EMPLOYEE_ID --> EMPLOYEE_ID
EMPLOYEE_NAME --> EMPLOYEE_NAME
ADDRESS --> EMPLOYEE_ADDRESS
CITY --> EMPLOYEE_CITY
STATE --> EMPLOYEE_STATE
ZIP_CODE --> EMPLOYEE_ZIP_CODE
COUNTRY --> EMPLOYEE_COUNTRY
EMPLOYEE_PHONE --> EMPLOYEE_PHONE_NUMBER
FAX_NUMBER --> EMPLOYEE_FAX_NUMBER
EMAIL --> EMPLOYEE_EMAIL
NATIVE_LANGUAGE --> NATIVE_LANG_DESC
SECOND_LANGUAGE --> SEC_LANG_DESC
THIRD_LANGUAGE --> TER_LANG_DESC
POSITION_TYPE --> POSITION_TYPE
REGIONAL_MANAGER --> REGIONAL_MANAGER
HIRE_DATE --> HIRE_DATE
EMPLOYEE_GENDER --> EMPLOYEE_GENDER
AGE_GROUP --> AGE_GROUP
DATE_ENTERED --> DATE_ENTERED
2. Link the SALARY port from the lkp_salaries transformation to the EMPLOYEE_SALARY port in the STG_EMPLOYEES target.
1. Launch the Workflow Manager and sign into your assigned folder.
2. Create a new workflow named wkf_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx.
3. Create a session task using the m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx mapping.
4. Link the Start task to the s_m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx session task.
5. Edit session s_m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx.
   a. In the Mapping tab, confirm that the Source file directory is set to $PMSourceFileDir\.
   b. In Properties > Attribute > Source filename, type in employees_list.txt and change the Source filetype property from Direct to Indirect. The properties should look similar to Figure 10-10.
Figure 10-10. Source properties for the employee_list.txt file list
   c. Select STG_EMPLOYEES located under the Target folder in the Mapping navigator.
      i. Set the relational target connection object property to NATIVE_STGxx, where xx is your student number.
      ii. Check the Truncate target table option in the target properties. (This needs to be set because the data loaded in a previous lab must be replaced.)
   d. Select lkp_salaries from the Transformations folder on the Mapping tab and verify the following property values:
      Lookup source file directory = $PMLookupFileDir\
      Lookup source filename = salaries.txt
Data Results
Preview the target data from the Designer. Your data should appear the same as displayed in Figures 10-13 and 10-14.
Figure 10-13. Data preview of the self-join of Managers and Employees in the STG_EMPLOYEES target table - screen 1
Figure 10-14. Data preview of the STG_EMPLOYEES target table - screen 2 scrolled right
Router transformation
Update Strategy transformation
Source Qualifier override
Target update override
Session task mapping overrides
Type
Active.
Description
The Router transformation is similar to the Filter transformation in that it passes rows that meet a Router Group filter condition to the downstream transformation or target. The Router transformation has a single input group and one or more output groups, with each output group representing a filter condition.
Unit 11: Router, Update Strategy and Overrides Informatica PowerCenter 8 Level I Developer
Business Purpose
A business may receive records that must be re-directed to specific targets; the records are routed to each target based on conditions applied to one or more record (row) fields.
Example
In the following example, a business receives sales results based on responses to coupons featured in local newspapers, magazines, and on their website. Each record is loaded into a different target table based on a promotion code.
In the example, the DEFAULT group routes rows that do not meet any of the group filters to an exception table. This captures records where a promotion code (PROMO_ID) was incorrectly entered, or a new code has not yet been included in a filter group.
Performance Considerations
When splitting row data based on field values, a Router transformation has a performance advantage over multiple Filter transformations: a row is read once into the input group and evaluated once per group, whereas multiple Filter transformations require the same row data to be duplicated for each Filter transformation.
Type
Active.
Description
The Update Strategy transformation tags a row with the appropriate DML (data manipulation language) for the PowerCenter writer to apply to a relational target. Each row can be tagged with one of the following flags (the DD label stands for Data Driven):
DD_INSERT - tags a row for insert to a target
DD_UPDATE - tags a row for update to a target
DD_DELETE - tags a row for delete to a target
DD_REJECT - tags a row for reject
Note: For the row tags DD_DELETE and DD_UPDATE, the table definition in a mapping must have a key identified; otherwise the session created from that mapping will fail. Rows tagged with DD_REJECT are passed on to the next transformation or target and subsequently placed in the appropriate bad file if the Forward Rejected Rows attribute is checked (the default). If the attribute is unchecked, rejected rows are skipped.
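As an illustrative sketch (the Lookup return port name here is hypothetical), an Update Strategy expression typically tests a condition per row and returns one of the DD_ flags, for example tagging new rows for insert and existing rows for update:

```
-- LKP_EMPLOYEE_ID is a hypothetical Lookup return port;
-- NULL means the row does not yet exist in the target
IIF(ISNULL(LKP_EMPLOYEE_ID), DD_INSERT, DD_UPDATE)
```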
Business Purpose
A business process may require more than a single DML action on a target table. For example, a target table may need to retain historical information about previous entries. Rows written to a target table may, based on one or more criteria, have to be inserted, updated or deleted. The Update Strategy transformation can be applied to meet this requirement.
Example
In the following example, a business wants to maintain the MASTER_CUSTOMER table with current information. Using a set of Filter transformations along with previous mapping objects, two data paths have been developed: one for inserts (DD_INSERT), with the addition of a sequence number for new records, and one for updates (DD_UPDATE), to update existing records with new information.
Performance Considerations
Update Strategy transformation performance can vary depending on the mix of updates and inserts. In some cases there may be a performance benefit in splitting a mapping that performs both updates and inserts into two mappings and sessions, one mapping performing the inserts and the other the updates.
Properties
SQL Query - Allows you to override the default SQL query that PowerCenter creates at runtime.
User Defined Join - Allows you to specify a user-defined join. The WHERE keyword is not required, e.g. Table1.ID = Table2.ID.
Source Filter - Allows you to create a WHERE clause that will be inserted into the SQL query that is generated at runtime.
Number of Sorted Ports - PowerCenter will insert an ORDER BY clause into the generated SQL query. The ORDER BY will use the number of ports specified, from the top down. E.g., in the sq_Product_Product_Cost Source Qualifier, if the number of sorted ports = 2, the ORDER BY will be: ORDER BY PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID.
Tracing Level - Specifies the amount of detail written to the session log.
Select Distinct - Allows you to select distinct values only.
Pre SQL - Allows you to specify SQL that will be run before the pipeline runs. The SQL will be run using the connection specified in the session task.
Post SQL - Allows you to specify SQL that will be run after the pipeline has run. The SQL will be run using the connection specified in the session task.
Output is Deterministic - Source or transformation output that does not change between session runs when the input data is consistent between runs. When you configure this property, the Integration Service does not stage source data for recovery if transformations in the pipeline always produce repeatable data.
Output is Repeatable - Source or transformation output that is in the same order between session runs when the order of the input data is consistent. When output is deterministic and repeatable, the Integration Service does not stage source data for recovery.
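To make these properties concrete, here is a sketch of the SQL PowerCenter might generate for a Source Qualifier with a user-defined join, a source filter, and Number of Sorted Ports = 2. The PRODUCT_COST columns and the filter value are hypothetical:

```sql
SELECT PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID, PRODUCT_COST.COST
FROM PRODUCT, PRODUCT_COST
WHERE PRODUCT.PRODUCT_ID = PRODUCT_COST.PRODUCT_ID  -- user-defined join
  AND PRODUCT.GROUP_ID > 100                         -- source filter
ORDER BY PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID        -- 2 sorted ports
```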
By default, target tables are updated based on key values. You can change this in the target properties:
1. Update Override
2. Generate SQL
3. Edit the UPDATE WHERE clause with non-key items
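As a sketch of what the edited override might look like (the column names are hypothetical; the :TU prefix references the value PowerCenter supplies for the named target port):

```sql
UPDATE MASTER_CUSTOMER
SET CUST_NAME = :TU.CUST_NAME,
    CUST_STATUS = :TU.CUST_STATUS
WHERE CUST_SSN = :TU.CUST_SSN  -- non-key column replacing the default key match
```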
You can override some mapping attributes in the Session task Mapping tab.
Examples
User-defined join: Modify a homogeneous join in the Source Qualifier
Source filters: Add a filter to the Source Qualifier
Target writers: Turn a relational target into a flat file
Technical Description
Rows from the STG_EMPLOYEES table need to be loaded into the DIM_EMPLOYEES table. Before loading the rows, EMPLOYEE_ID needs to be tested for NULL values. Invalid rows need to be written to an error file. Valid rows need to be tested to see if they already exist in DIM_EMPLOYEES and tagged for either INSERT or UPDATE accordingly. Finally, any rows sent to the DIM_EMPLOYEES table need to get valid dates from DIM_DATES.
Objectives
Use the Update Strategy transformation to tag rows for INSERT or UPDATE.
Use the Router transformation to conditionally route rows to different target instances.
Override a Source Qualifier property in the Session task.
Use the Default Values option for NULL data replacement.
Override the Target writer option.
Duration
60 minutes
Unit 11 Lab: Load Employee Dimension Table Informatica PowerCenter 8 Level I Developer
Move data from the staging table to the dimension target table, with error rows written to a flat file. Lookups are required for date entries and to the target table to test for existing rows. Frequency: Daily. Target load: Append/Update.
SOURCES
Tables:
  Table Name: STG_EMPLOYEES
  Schema/Owner: TDBUxx
  Selection/Filter: SQ override for daily loads only
TARGETS
Tables:
  Table Name: DIM_EMPLOYEES
  Schema/Owner: TDBUxx
  Operations: Insert, Update
  Unique Key: EMPLOYEE_ID
Files:
  File Name: dim_employees_err1.out
  File Location: C:\pmfiles\TgtFiles
  Fixed/Delimited: Fixed
  Additional File Info: Based on the DIM_EMPLOYEES definition
LOOKUPS
Lookup Name: lkp_DIM_EMPLOYEES_EMPLOYEE_ID
Table: DIM_EMPLOYEES
Location: TDBUxx
Match Condition(s): STG_EMPLOYEES.EMPLOYEE_ID = DIM_EMPLOYEES.EMPLOYEE_ID
[Mapping diagram: Relational Source, Expression, Lookup, Router, Update Strategy]
Target Table: DIM_EMPLOYEES

Target Columns: EMPLOYEE_ID, EMPLOYEE_NAME, EMPLOYEE_ADDRESS, EMPLOYEE_CITY, EMPLOYEE_STATE, EMPLOYEE_ZIP_CODE, EMPLOYEE_COUNTRY, EMPLOYEE_PHONE_NMBR, EMPLOYEE_FAX_NMBR, EMPLOYEE_EMAIL, EMPLOYEE_GENDER, AGE_GROUP, NATIVE_LANG_DESC, SEC_LANG_DESC, TER_LANG_DESC, POSITION_TYPE, DEALERSHIP_ID, REGIONAL_MANAGER, DEALERSHIP_MANAGER, INSERT_DK, UPDATE_DK

INSERT_DK and UPDATE_DK are populated from the DIM_DATES table.
Instructions
Step 1: Copy the Mapping
1. Launch the Designer and open your assigned folder.
2. Copy the m_DIM_EMPLOYEES_LOAD partial mapping from the DEV_SHARED folder to your student folder and rename it m_DIM_EMPLOYEES_LOAD_xx.
3. Click Yes when the Target Dependencies dialog box comes up.
Figure 11-1. Mapping copy Target Dependencies dialog box
4. Open the mapping m_DIM_EMPLOYEES_LOAD_xx. Your mapping should appear similar to Figure 11-2.
Figure 11-2. Iconic view of the m_DIM_EMPLOYEES_MAPPING
2. Edit the exp_NULL_EMPLOYEE_ID Expression transformation and add a Default value of 99999 to the EMPLOYEE_ID port.
3. Click the validate button to validate the default entry and then click OK.
2. Drag both ports from lkp_DIM_EMPLOYEES_EMPLOYEE_ID into the Router.
3. Drag all ports except EMPLOYEE_ID from exp_NULL_EMPLOYEE_ID to the Router.
4. Edit the Router transformation.
   a. Rename the Router to rtr_DIM_EMPLOYEES.
   b. In the Groups tab, add 3 new groups using the Add new group icon:
      i. Name the first group INSERTS.
      ii. Add the Group filter condition.
      iii. Name the second group UPDATES.
      iv. Add the Group filter condition.
      v. Name the third group ERRORS.
      vi. Add the Group filter condition:
         IN_EMPLOYEE_ID = 99999
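As a sketch of the three group filter conditions: only the ERRORS condition is given in the text; the INSERTS and UPDATES conditions shown here are illustrative and assume EMPLOYEE_ID is the Lookup return port, which is NULL when no matching row exists in DIM_EMPLOYEES:

```
INSERTS: ISNULL(EMPLOYEE_ID) AND IN_EMPLOYEE_ID != 99999
UPDATES: NOT ISNULL(EMPLOYEE_ID)
ERRORS:  IN_EMPLOYEE_ID = 99999
```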
2. In the Router, scroll down to the INSERTS Router group and drag all ports, except EMPLOYEE_ID1 and HIRE_DATE1, to the upd_INSERTS Update Strategy transformation.
3. Edit the upd_INSERTS Update Strategy transformation:
   a. Select the Update Strategy Expression Value box.
   b. Delete the 0 and enter DD_INSERT. See Figure 11-4.
4. Create a Lookup transformation named lkp_DIM_DATES_INSERTS that references the SC_DIM_DATES target table.
5. Pass DATE_ENTERED1 from upd_INSERTS to lkp_DIM_DATES_INSERTS.
6. Edit the lkp_DIM_DATES_INSERTS Lookup transformation:
   a. Uncheck all the Output checkmarks on all the ports except DATE_KEY.
   b. Rename the DATE_ENTERED1 port to IN_DATE_ENTERED.
   c. Create the condition DATE_VALUE = IN_DATE_ENTERED. Ensure that you use DATE_VALUE and not DATE_KEY.
   d. In the Properties tab, set the following values:
      Lookup cache persistent = Checked (needs to be set)
      Cache File Name Prefix = LKPSTUxx (where xx is your student number)
1. Link the DATE_KEY port from lkp_DIM_DATES_INSERTS to the INSERT_DK column in the DIM_EMPLOYEES_INSERTS target.
2. Right-click anywhere in the workspace and select Autolink.
3. Select upd_INSERTS from the From Transformation drop-down box and DIM_EMPLOYEES_INSERTS from the To Transformation box.
4. Click the More button and enter a '1' for From Transformation Suffix. Click OK.
5. Iconize the upd_INSERTS, lkp_DIM_DATES_INSERTS and DIM_EMPLOYEES_INSERTS transformations.
6. Save your work.
1. Create an Update Strategy transformation named upd_UPDATES.
2. In the Router, scroll down to the UPDATES Router group and drag all ports, except IN_EMPLOYEE_ID3 and HIRE_DATE3, to the upd_UPDATES Update Strategy transformation.
3. Edit the upd_UPDATES Update Strategy transformation. In the Properties tab, select the Update Strategy Expression Value box. Delete the 0 and enter DD_UPDATE.
1. Right-click the existing lkp_DIM_DATES_INSERTS Lookup transformation and select Copy. Move the cursor to the workspace, right-click, and select Paste.
2. Link DATE_ENTERED3 from upd_UPDATES to IN_DATE_ENTERED in the new Lookup transformation.
3. Edit the new Lookup transformation:
   a. Rename the new Lookup lkp_DIM_DATES_UPDATES.
   b. Ensure the Lookup condition is: DATE_VALUE = IN_DATE_ENTERED.
4. From lkp_DIM_DATES_UPDATES, link DATE_KEY to UPDATE_DK in DIM_EMPLOYEES_UPDATES. Right-click anywhere in the workspace and select Autolink. Select upd_UPDATES from the From Transformation drop-down box and DIM_EMPLOYEES_UPDATES from the To Transformation box. Click the More button and enter a '3' for From Transformation Suffix. Click OK.
5. Iconize the upd_UPDATES, lkp_DIM_DATES_UPDATES and DIM_EMPLOYEES_UPDATES transformations.
6. Save your work.
1. Select the ERRORS group of rtr_DIM_EMPLOYEES from the From Transformation drop-down box and DIM_EMPLOYEES_ERR from the To Transformation box.
2. Click the More>> button and enter a '4' for From Transformation Suffix. Click OK.
3. Delete the link for EMPLOYEE_ID4 and link IN_EMPLOYEE_ID4 instead.
4. Save your work and ensure the mapping is VALID.
5. Arrange All Iconic; the mapping should look similar to Figure 11-5.
1. Launch the Workflow Manager and sign into your assigned folder.
2. Locate and run the wkf_U11_Preload_DIM_PAYMENT_DEALERSHIP_PRODUCT_xx workflow. Make sure that it completed successfully and that all rows were successful.
3. Create a workflow named wkf_DIM_EMPLOYEES_LOAD_xx.
4. Add a new Session task named s_m_DIM_EMPLOYEES_LOAD_xx, using the m_DIM_EMPLOYEES_LOAD_xx mapping.
5. Link the Start task to the new Session task.
6. Select the
   i. Change all DB Connection values that relate to the target tables (DIM) to NATIVE_EDWxx.
   ii. Change all DB Connection values that relate to the source tables (STG) to NATIVE_STGxx.
   iii. Change the $Target connection value to NATIVE_EDWxx as well. (This will take care of the three lookup tables pointing to $Target.)
   b. Click SQ_STG_EMPLOYEES. Scroll down in the Properties section window to the Source Filter attribute. Add the Source Filter condition:
      DATE_ENTERED = '01/02/2003'
Tip: It is sometimes easier to add a quick Source Filter in the Session than to go back and modify the mapping, save it, refresh the session, save it, and then run the workflow. Session-level SQL overrides take precedence over any entries in the mapping until the override is deleted. If using shortcuts, make sure the table prefix is deleted before saving the filter.
   c. Under the Writers section, change the Relational Writer to File Writer. The error handling specifications call for error rows to be written to a file, not a table.
      ii. In the Properties attributes, rename the Output filename to include your student number.
Tip: To create a flat file as a target instead of the original table, simply change the Writers type from Relational to File. A fixed width flat file based on the format of the target definition will be created automatically. The properties of this file can also be altered by the user.
7. Save your work and start the workflow.
8. Review the Task Details and Source/Target statistics. They should be the same as displayed in Figure 11-8 and Figure 11-9.
Figure 11-8. Task Details of the completed session run
Data Results
Preview the DIM_EMPLOYEES target data from the Designer. Your data should appear similar to Figure 11-10.
Figure 11-10. Data Results for DIM_EMPLOYEES
Scroll all the way to the right to confirm that the INSERT_DK column was updated and not the UPDATE_DK column. Also, you may want to review the three rows that were written to the error file. See the instructor for the location of the files. If the Integration Service process runs on UNIX, you may need special permission from your administrator to see the files.
Figure 11-11. Data Results for the Error Flat File (Located on the Machine Hosting the Integration Service Process)
1. Edit the s_m_DIM_EMPLOYEES_LOAD_xx session task.
2. In the Mapping tab, click SQ_STG_EMPLOYEES in the Navigation window. Scroll down the Properties section and edit the Source Filter to reflect day-two loading: 01/03/2003.
3. Save and run the workflow.
4. Review the Task Details and Source/Target statistics.
They should be the same as displayed in Figure 11-12 and Figure 11-13.
Figure 11-12. Task Details tab results for second run
Preview the DIM_EMPLOYEES target data from the Designer. Scroll to the far right of the data screen and notice that there are now entries for UPDATE_DK and new entries at the bottom of the list for INSERT_DK.
Figure 11-14. Data preview showing updates to the target table
Type
Passive.
Description
A basic Lookup transformation allows additional information from an external database or flat file source to be included in the transformation process. However, when the lookup table is also the target, row data may go out of sync with the target table image loaded in memory. The Dynamic Lookup option synchronizes the target lookup table image in memory with its physical table in the database.
Business Purpose
In a data warehouse, dimension tables are frequently updated, and changes to new row data must be captured within a load cycle.
Example
A business updates its customer master table on a daily basis. Within a day, a customer may change their status or correct an error in their information. A new customer record may be added in the morning, and a change to that record may arrive later in the day; the change (an insert followed by an update) needs to be detected dynamically. The following data is an example of two new records followed by two changed records within the day. The record for David Mulberry shows a change in the zip code from 02061 to 02065. The record for Silvia Williamson shows a change in marital status from S to M.
The following mapping uses a Lookup transformation Dynamic Lookup Cache option to capture the changes:
Unit 12: Dynamic Lookup and Error Logging Informatica PowerCenter 8 Level I Developer
NewLookupRow 0 - The Integration Service does not update or insert the row in the cache.
NewLookupRow 1 - The Integration Service inserts the row into the cache.
NewLookupRow 2 - The Integration Service updates the row in the cache.

The Lookup transformation's Associated Port matches a Lookup input port with the corresponding port in the Lookup cache. Ignore Null Inputs for Updates should be checked for ports where null data in the input stream might otherwise overwrite the corresponding field in the Lookup cache. Ignore in Comparison should be checked for any port that is not to be compared. The NewLookupRow flag indicates the type of row manipulation of the cache: if an input row creates an insert in the Lookup cache, the flag is set to 1; if an input row creates an update of the Lookup cache, the flag is set to 2; if no change is detected, the flag is set to 0. A Filter or Router transformation can be used with an Update Strategy transformation to set the proper row tag to update a target table.
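The Filter/Router approach described above can be sketched as group filter conditions on the NewLookupRow port, each feeding an Update Strategy transformation (flag values as defined above):

```
INSERTS group filter: NewLookupRow = 1  -- Update Strategy expression: DD_INSERT
UPDATES group filter: NewLookupRow = 2  -- Update Strategy expression: DD_UPDATE
-- NewLookupRow = 0 rows (no change) fall through to the default group
```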
Performance Considerations
A large lookup table may require more memory resources than are available. A SQL override in the Lookup transformation can be used to reduce the number of rows cached, and thus the amount of memory used by the Lookup cache.
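For example, a Lookup SQL override might restrict the cache to only the rows the load can touch. The table and column names below are hypothetical:

```sql
SELECT CUSTOMER_ID, CUST_STATUS
FROM MASTER_CUSTOMER
WHERE LOAD_DATE >= TO_DATE('01/01/2003', 'MM/DD/YYYY')  -- cache only recent rows
```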
Transformation error. An error occurs within a transformation; the data row has only passed partway through the mapping transformation logic.
Data reject. The data row is fully transformed according to the mapping logic but, due to a data issue, cannot be written to the target. For example:
Target database constraint violations, out-of-space errors, log space errors, null values not accepted
Target table property 'reject truncated/overflowed rows'
A data reject can also be forced by an Update Strategy. These error types are recorded as follows:
Transformation errors:
  Logging OFF (default): All errors are written to the session log, then the row is discarded.
  Logging ON: Fatal errors are written to the session log; all errors are appended to a flat file or relational tables.
Data rejects:
  Logging OFF (default): Appended to the reject (.bad) file configured for the session target.
  Logging ON: Written to row error tables or file.
Relational error logging produces four tables:

PMERR_SESS: Session metadata, e.g. workflow name, session name, repository name
PMERR_MSG: Error messages for a row of data
PMERR_TRANS: Transformation metadata, e.g. transformation group name, source name, port names with datatypes
PMERR_DATA: Error row and source row data in string format, e.g. [indicator1: data1 | indicator2: data2]

Flat File: produces one file containing session metadata followed by de-normalized error information in the following format:

Transformation || Transformation Mapplet Name || Transformation Group || Partition Index || Transformation Row ID || Error Sequence || Error Timestamp || Error UTC Time || Error Code || Error Message || Error Type || Transformation Data || Source Mapplet Name || Source Name || Source Row ID || Source Row Type || Source Data
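As a sketch, a record in this double-pipe-delimited format could be split back into named fields as below. The parsing code and sample values are illustrative, not part of PowerCenter.

```python
# Sketch: splitting one de-normalized record from the flat-file error log.

COLUMNS = [
    "Transformation", "Transformation Mapplet Name", "Transformation Group",
    "Partition Index", "Transformation Row ID", "Error Sequence",
    "Error Timestamp", "Error UTC Time", "Error Code", "Error Message",
    "Error Type", "Transformation Data", "Source Mapplet Name",
    "Source Name", "Source Row ID", "Source Row Type", "Source Data",
]

def parse_error_record(line):
    """Split a '||'-delimited error record into a column -> value dict."""
    values = [v.strip() for v in line.split("||")]
    return dict(zip(COLUMNS, values))

# Synthetic record with one dummy value per column, for illustration only.
sample = "||".join(str(i) for i in range(len(COLUMNS)))
record = parse_error_record(sample)
```

Real log files also carry session metadata lines at the top, which such a parser would need to skip.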
Source row logging does not work downstream of active transformations (where output rows are not uniquely correlated with input rows).
Technical Description
PowerCenter will source from the staging table STG_CUSTOMERS and load the dimension table DIM_CUSTOMERS. A customer may occur more than once in the source, so the data must be tested for new rows, existing rows, and invalid rows. Because a customer row can occur more than once in the source, a dynamic Lookup is needed. Some rows will have null data, so flat file error logging will be used to capture them.
Objectives
Introduce Dynamic Lookups. Reinforce the Update Strategy. Introduce error logging.
Duration
50 minutes
Unit 12 Lab: Load Customer Dimension Table Informatica PowerCenter 8 Level I Developer
Customer data will be loaded into the customer dimension table.

Frequency: Daily
SOURCES
Table Name: STG_CUSTOMERS
Schema/Owner: TDBUxx
Selection/Filter: (none)
TARGETS
Table Name: DIM_CUSTOMERS
Schema/Owner: TDBUxx
Operations: Insert, Update, Delete
Unique Key: CUST_ID
Mapping flow: Relational Source → Lookup → Filter → Update Strategy → Relational Target
Instructions
Step 1: Create a Relational Source Definition
1. Launch the Designer and sign into your assigned folder.
2. Verify you are in the Source Analyzer tool and create a shortcut to the STG_CUSTOMERS source table found in the DEV_SHARED folder.
3. Rename it to SC_STG_CUSTOMERS.
Open the Target Designer tool. Create a shortcut to the DIM_CUSTOMERS target table found in the DEV_SHARED folder. Rename it to SC_DIM_CUSTOMERS.
Open the Mapping Designer tool. Create a new mapping named m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx. Add the SC_STG_CUSTOMERS relational source to the new mapping. Add the SC_DIM_CUSTOMERS relational target to the new mapping. Save your work.
Create a new Lookup transformation using the SC_DIM_CUSTOMERS table. Drag the edge of the Lookup window to make it taller. Select all the ports from SQ_SC_STG_CUSTOMERS and drop them onto an empty port at the bottom of the Lookup. Edit the Lookup transformation.
a. Click on the Dynamic Lookup Cache value.
b. Click on the Insert Else Update value.
c. Select the Ports tab; prefix all ports from SQ_SC_STG_CUSTOMERS with IN_ and remove the 1 from the end of each name.
d. Select the Condition tab and create the condition CUST_ID = IN_CUST_ID.
e. Select the Ports tab again; it should look the same as Figure 12-1. Notice the new port entry called NewLookupRow.
Figure 12-1. Port tab view of a dynamic Lookup
Note: A dynamic Lookup allows inserts and updates to be applied to its cache while the session runs. This enables PowerCenter to keep the Lookup cache populated with correct values.

Note: NewLookupRow stores the values 0, 1 and 2.
f. Under the Associated Port column, click the box where it says N/A and select from the list the port names that you want to associate. See Figure 12-2.
Figure 12-2. Port to Port Association
g. Associate the remaining ports.
h. Clear the Output checkmarks for all of the ports prefixed with IN_.
5. Create a Filter transformation named fil_ROWS_UNCHANGED. Drag all output ports from the Lookup transformation to the Filter transformation. Create a condition that passes all rows marked for update or insert, plus all rows where CUST_ID is NULL. Any row where NewLookupRow != 0 is an insert or an update. If you need assistance, refer to the reference section at the end of the lab.
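One plausible reading of that condition, in PowerCenter terms, is NewLookupRow != 0 OR ISNULL(CUST_ID); this is an interpretation of the step, not necessarily the lab's exact answer. The Python sketch below models the same logic with hypothetical row dictionaries.

```python
# Sketch of the Filter condition's logic: pass rows the dynamic Lookup
# flagged as insert (1) or update (2), plus rows with a NULL CUST_ID so
# they can be rejected downstream.

def passes_filter(row):
    return row["NewLookupRow"] != 0 or row["CUST_ID"] is None

rows = [
    {"CUST_ID": 1,    "NewLookupRow": 1},  # new customer -> pass
    {"CUST_ID": 2,    "NewLookupRow": 0},  # unchanged -> drop
    {"CUST_ID": None, "NewLookupRow": 0},  # invalid -> pass (for rejection)
]
kept = [r for r in rows if passes_filter(r)]   # keeps the first and third rows
```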
Create an Update Strategy transformation named upd_DIM_CUSTOMERS. Drag all ports from the Filter transformation to the Update Strategy transformation. Edit the upd_DIM_CUSTOMERS Update Strategy transformation.
a. Add an Update Strategy expression that marks the row as an insert, update or reject. Use the following pseudo code to construct your expression. If you need assistance, refer to the reference section at the end of the lab.

   If CUST_ID is NULL then reject the row
   Else if NewLookupRow equals 1 then mark the row for insert
   Else if NewLookupRow equals 2 then mark the row for update

b. Ensure the Forward Rejected Rows option is checked. This sends rejected rows to the error logs that will be configured later.
Tip: Refer to the Unit 11 lab for details on the Update Strategy Transformation.
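The pseudo code above maps naturally onto PowerCenter's row-tag constants (DD_INSERT = 0, DD_UPDATE = 1, DD_REJECT = 3). The Python sketch below shows the branching; the row values are hypothetical, and the default branch is an assumption since unchanged rows were already removed by the Filter.

```python
# Sketch of the Update Strategy expression's branching.

DD_INSERT, DD_UPDATE, DD_REJECT = 0, 1, 3

def row_strategy(cust_id, new_lookup_row):
    if cust_id is None:
        return DD_REJECT          # null key: reject the row
    elif new_lookup_row == 1:
        return DD_INSERT          # new in the Lookup cache: insert
    elif new_lookup_row == 2:
        return DD_UPDATE          # changed in the Lookup cache: update
    return DD_REJECT              # assumption: unchanged rows were filtered out

print(row_strategy(None, 1))      # 3 (reject)
print(row_strategy(5, 1))         # 0 (insert)
print(row_strategy(5, 2))         # 1 (update)
```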
5. Launch the Workflow Manager and sign into your assigned folder. Create a new workflow named wkf_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx. Add a new Session task using the m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx mapping. Edit the s_m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx session task.
a. Set the connection value for the SQ_STG_CUSTOMERS source to your assigned NATIVE_STGxx connection object.
b. Set the connection value for the SC_DIM_CUSTOMERS target to your assigned NATIVE_EDWxx connection object.
c. In the Config Object tab:
   i. In the Error Handling section, change the Error Log Type entry from None to Flat File as shown in Figure 12-4.
   ii. Change the Error Log File Name to PMErrorxx.log, where xx refers to your student number.
6. Review the Task Details; your information should appear similar to Figure 12-5.
Figure 12-5. Task Details of the Completed Session Run
7. Select the Source/Target Statistics tab. Your statistics should be the same as displayed in Figure 12-6.
Figure 12-6. Source/Target Statistics for the Session Run
Data Results
Preview the target data from the Designer; your data should appear the same as displayed in Figure 12-7.
Figure 12-7. Data preview of the DIM_CUSTOMERS table
Reference
Unit 13: Unconnected Lookup, Parameters and Variables Informatica PowerCenter 8 Level I Developer
Type
Passive.
Description
The unconnected Lookup transformation allows additional information from an external database table or flat file to be included in the transformation process. It is called from within any transformation that supports expressions.
Business Purpose
A source table or file may have a percentage of records with incomplete data. The holes in the data can be filled by performing a lookup to one or more other tables. Since only a percentage of the rows are affected, it is better to perform the lookup only on the rows that need it rather than on the entire data set.
Example
In the following example an insurance business receives records of policy renewals; a small percentage of the records are missing the CUSTOMER_ID field data. The following mapping uses an unconnected Lookup transformation to fill in the missing data.
Key Points
Use the lookup function within a conditional statement. The condition is evaluated for each row, but the lookup function is called only if the condition evaluates to TRUE. The unconnected Lookup transformation is called using the expression :LKP.lookupname. Data from several input ports may be passed to the Lookup transformation, but only one port may be returned. An unconnected Lookup transformation returns a single value, designated by the Lookup transformation's R (return) port. If no R port is checked, the mapping will be valid but the session created from it will fail at run time.
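The conditional-call behavior can be sketched in Python. The function names are hypothetical stand-ins; the :LKP expression in the comment echoes the insurance example above, where lkp_CUSTOMERS is an assumed lookup name.

```python
# Sketch: the lookup is invoked only when the condition is true, which
# avoids lookup calls for rows that already have the data.

def lookup_customer_id(policy_no, lookup_table):
    """Stand-in for an unconnected Lookup: returns the single R-port value."""
    return lookup_table.get(policy_no)

def fill_customer_id(row, lookup_table):
    # Comparable in intent to:
    #   IIF(ISNULL(CUSTOMER_ID), :LKP.lkp_CUSTOMERS(POLICY_NO), CUSTOMER_ID)
    if row["CUSTOMER_ID"] is None:                                 # condition first
        return lookup_customer_id(row["POLICY_NO"], lookup_table)  # then lookup
    return row["CUSTOMER_ID"]

lkp = {"P-100": 42}
print(fill_customer_id({"POLICY_NO": "P-100", "CUSTOMER_ID": None}, lkp))  # 42
print(fill_customer_id({"POLICY_NO": "P-100", "CUSTOMER_ID": 7}, lkp))     # 7
```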
Performance Considerations
Caching the Lookup can improve performance if the Lookup table is static.
Description
System variables hold information derived from the system. The user cannot control the content of a system variable but can use the information it contains. Three such variables are described below.

SESSSTARTTIME: The time the session starts execution, based on the clock of the Integration Service.
SYSDATE: The current date/time on the system that PowerCenter is running on.
$$$SessStartTime: The session start time returned as a string.
Business Purpose
The main reason system variables are used when building mappings in PowerCenter is that they provide consistency to program execution, which both business and systems professionals will find useful.
Example
Setting a port to the system date. To set the value of a port to the system date, the developer uses an expression within a transformation. For this example we will set the DATE_UPDATED port to the system date:

Port: DATE_UPDATED
Datatype: Date
Expression: SYSDATE
Description
A mapping can use parameters and variables to store information during execution. Each parameter and variable is defined with a specific datatype. Parameters differ from variables in that the value of a parameter is fixed for the duration of a run, while the value of a variable can change. Both parameters and variables can be accessed from anywhere in the mapping. To create a parameter or variable, select Mappings > Parameters and Variables from within the Mapping Designer in the Designer client.
Scope
Parameters and variables can only be used inside the object in which they are created. For instance, a mapping variable created for mapping_1 can be seen and used only in mapping_1; it is not available in any other mapping or mapplet. As a general rule in Informatica, a parameter or variable's scope is the object in which it was created.
User-defined variable and parameter names must always begin with $$.
$$PARAMETER_NAME
or $$VARIABLE_NAME
To change the value of a variable, you must use one of the following functions within an expression:
SetVariable: Sets the variable to a value that you specify (executes only if a row is marked as insert or update). At the end of a successful session, the Integration Service saves either the MAX or MIN of (start value, final value) to the repository, depending on the aggregation type of the variable. Unless overridden, it uses the saved value as the start value of the variable for the next session run. Example: SetVariable($$VAR_NAME, 1)

SetCountVariable: Increments a counter variable. If the row type is Insert, increments by +1; if the row type is Delete, by -1; 0 for Update and Reject.

SetMaxVariable: Compares the current value to the value passed into the function; returns the higher value and sets the current value to it.

SetMinVariable: Compares the current value to the value passed into the function; returns the lower value and sets the current value to it.
At the end of a successful session, the values of variables are saved to the repository. The SetVariable function writes the final value of a variable to the repository based on the Aggregation Type selected when the variable was defined. The final value written to the repository is not necessarily the last value processed by the SetVariable function. The final value written to the repository for a variable that has an Aggregate type of Max will be whichever value is greater, current value or initial value. The final value for a variable with a MIN Aggregation Type will be whichever value is smaller, current value or initial value.
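The persistence rule above can be sketched in a few lines. The function is illustrative and assumes the simple case of one value set during the run; the fall-through branch is an assumption, not documented behavior.

```python
# Sketch: the final value persisted for a mapping variable depends on its
# Aggregation Type, comparing the start value with the value set in the run.

def final_value(start_value, set_value, aggregation):
    if aggregation == "Max":
        return max(start_value, set_value)   # larger of start and set value
    if aggregation == "Min":
        return min(start_value, set_value)   # smaller of start and set value
    return set_value                         # assumption for other types

print(final_value(100, 42, "Max"))   # 100: the start value wins under Max
print(final_value(100, 42, "Min"))   # 42
```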
Variable Definition
The Integration Service determines the value of a variable by checking for it in a specific order. The following table describes the order of precedence.
1. Parameter File: This file can hold definitions of variables and parameters.
2. Repository Saved Value: Values for variables that were saved in the repository upon the successful completion of a session.
3. Initial Value: The initial value as defined by the user.
4. Default Value: The default value set by the system.
Parameter Definition
The Integration Service determines the value of a parameter by checking for it in a specific order. The following table describes the order of precedence.
1. Parameter File: This file can hold definitions of variables and parameters.
2. Initial Value: The initial value as defined by the user.
3. Default Value: The default value set by the system.
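Both precedence orders can be modeled with a single resolution function: for a parameter, the repository-saved source is simply empty, because parameter values are not saved across runs. The dict-based lookup below is an illustration, not a PowerCenter API.

```python
# Sketch: resolve a parameter or variable value by precedence order.

def resolve(name, parameter_file, saved_values, initial_values, default):
    # Check each source in priority order and take the first defined value.
    for source in (parameter_file, saved_values, initial_values):
        if name in source:
            return source[name]
    return default

# A variable: the parameter file overrides the repository-saved value.
v = resolve("$$LAST_RUN_DT",
            {"$$LAST_RUN_DT": "2006-04-01"},   # parameter file
            {"$$LAST_RUN_DT": "2006-03-01"},   # repository saved value
            {}, None)                          # v == "2006-04-01"
```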
Purpose
Mapping variables and parameters are used:
We will discuss two examples, one using a variable and one using a parameter. The first example uses a variable to implement incremental extracts from relational sources. The second example uses parameters to replace naked numbers and strings within expressions.
Example 1
Tracking Last Execution Date To set up a mapping to perform an incremental extract, we will utilize a variable to track when the mapping was last executed. We will then use this variable as part of the SQL that extracts the data to ensure that we only pick up new and modified records. The variable will be updated to today's date when the mapping is complete so that we can use it the next time we run. The SQL WHERE clause will be modified in the Source Qualifier Transformation. The following is an example of a statement that could become part of a Source Qualifier Filter.
F1_LAST_UPDATE_DATE >= '$$LAST_RUN_DT'
Where:
F1_LAST_UPDATE_DATE is a database field that contains the date when the record was last touched.

$$LAST_RUN_DT is a user-created mapping variable that holds the date of the last run. Note that this variable is surrounded by single quotes; the quotes are required so that the SQL syntax will be proper.
SetVariable($$LAST_RUN_DT, SESSSTARTTIME)
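Putting the pieces together, the run-time expansion of the filter can be sketched as plain string substitution over the resolved variable value. The date format below is assumed for illustration.

```python
# Sketch: how $$LAST_RUN_DT is expanded into the Source Qualifier filter.

def build_filter(last_run_dt):
    # Single quotes are kept around the value so the expanded SQL is valid.
    return "F1_LAST_UPDATE_DATE >= '%s'" % last_run_dt

print(build_filter("2006-04-01"))
# F1_LAST_UPDATE_DATE >= '2006-04-01'
```

After a successful run, SetVariable advances $$LAST_RUN_DT, so the next run's filter only picks up rows touched since then.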
Example 2
Replacement of Nameless Numbers It is not always wise to embed a literal number or character string in an expression, because the support team may not understand its meaning. To help eliminate misunderstandings, use parameters to leave a better record of how the value is derived. Without a parameter, the expression might be:
IIF(ISNULL(SOLD_DT),TO_DATE('1/1/3000','MM/DD/YYYY'))
Someone could misinterpret this statement, possibly thinking it is a mistake. With a parameter there would be less chance of a misunderstanding. For instance, the following statement is much clearer:
IIF(ISNULL(SOLD_DT),$$OFFICIAL_DEFAULT_DT)

where $$OFFICIAL_DEFAULT_DT is equal to '1/1/3000'. If all mappings used the same parameter and a common parameter file, then it would be easy to ensure that all processes used the same value. This would ensure
consistency. Additional examples of ways mappings can use variables or parameters to replace nameless numbers and strings are shown below.

Reason/Goal: Replace naked numbers with a number, e.g. an expression that determines whether tech support cases have been open greater than 100 days.
Potential Value: 100. Param or Var: Param. Name: $$MAX_NUM_DAYS_OPEN

Reason/Goal: Replace naked characters; set the value of the Processing Center where the session is executed, using a variable defined in the mapping with its value set in a parameter file.
Potential Value: 'US'. Param or Var: Var. Name: $$REG_PROC_LOCATION

Reason/Goal: Consistency; use parameters to make sure that everyone uses the same value in expressions. Create two parameters that represent yes and no, and have all mappings use the same values via a parameter file.
Potential Values: 'Y' and 'N'. Param or Var: Param. Names: $$YES_1_CHAR and $$NO_1_CHAR
Technical Description
The information needed resides in two separate staging tables. To compound this, the relationship between the two tables does not exist on the database, so referential integrity will have to be created within PowerCenter. Special formulas are needed to process discounts that are out of range. Mapping parameters and variables will be used to make this more efficient.
Objectives
Duration
35 Minutes
Unit 13 Lab: Load Sales Fact Table Informatica PowerCenter 8 Level I Developer
The mapping will have to join two tables to get the payment ID. This relationship does not exist in the RDBMS, so it will need to be created in PowerCenter.

Frequency: Daily
Target: Append
SOURCES
Table Name: STG_TRANSACTIONS. Schema/Owner: TDBUxx
Table Name: STG_PAYMENT. Schema/Owner: TDBUxx
Selection/Filter: WHERE STG_TRANSACTIONS.PAYMENT_DESC = STG_PAYMENT.PAYMENT_TYPE_DESC
TARGETS
Table Name: FACT_SALES
Schema/Owner: TDBUxx
Operations: Insert, Update, Delete
Unique Key: CUST_ID, PRODUCT_KEY, DEALERSHIP_ID, PROMO_ID, DATE_KEY
LOOKUPS
Lookup Name: lkp_DIM_DATES
Table: DIM_DATES
Location: TDBUxx
Lookup Name: lkp_DIM_PRODUCT
Table: DIM_PRODUCT
Match Condition(s): STG_TRANSACTIONS.PRODUCT_ID = DIM_PRODUCT.PRODUCT_ID
Load matrix for the FACT_SALES target table:

CUST_ID: from STG_TRANSACTIONS.
PRODUCT_KEY: from DIM_PRODUCT via lookup.
DEALERSHIP_ID: from STG_TRANSACTIONS.DEALERSHIP_ID.
PAYMENT_ID: from STG_PAYMENT via the Source Qualifier join on payment description (STG_TRANSACTIONS/STG_PAYMENT).
PROMO_ID: from STG_TRANSACTIONS.
DATE_KEY: from DIM_DATES; lookup from STG_TRANSACTIONS to DIM_DATES using TRANSACTION_DATE as the lookup value.
UNITS_SOLD: derived; sum of SALES_QTY.
REVENUE: derived; sum of ((SELLING_PRICE * SALES_QTY) - DISCOUNT - HOLDBACK - REBATE).
COST: derived; sum of (UNIT_COST * SALES_QTY).
DISCOUNT: from STG_PROMOTIONS/derived. If the discount is > 17.75, look up the STG_PROMOTIONS table to select a discount rate; the discount is the discount rate divided by 100 times the selling price.
HOLDBACK: from STG_TRANSACTIONS.HOLDBACK.
REBATE: from STG_TRANSACTIONS.
Instructions
Step 1: Create an Internal Relationship Between two Source Tables
1. Launch the Designer and sign into your assigned folder.
2. Drag the STG_TRANSACTIONS and STG_PAYMENT relational source tables into the Source Analyzer workspace.
3. The PAYMENT_DESC column from STG_TRANSACTIONS and the PAYMENT_TYPE_DESC column of the STG_PAYMENT table are logically related, since they both contain the payment type description, so we can build a join on them. Link the PAYMENT_DESC column from STG_TRANSACTIONS to the PAYMENT_TYPE_DESC column of the STG_PAYMENT table. This will create a PK-FK relationship between the two tables.
Note: Creating the PK-FK relationship within the Source Analyzer does not create this relationship on the actual database tables. The relationship is created only on the source definitions within PowerCenter. Your source definitions should look the same as displayed in Figure 13-1.
Figure 13-1. Source Analyzer view of the STG_TRANSACTIONS and STG_PAYMENT tables
Open the mapping named m_FACT_SALES_LOAD_xx. Add a mapping parameter by clicking Mappings > Parameters and Variables.
3. On the next screen, click the Add a new variable to this table icon. See Figure 13-2.
Figure 13-2. Declare Parameters and Variables screen
4. Create the following parameter:
   Parameter Name = $$MAX_DISCOUNT
   Type = Parameter
   Datatype = decimal
   Precision = 15,2
   Initial Value = 17.25
Create a Lookup transformation using the SC_DIM_PROMOTIONS relational target table and name it lkp_DIM_PROMOTIONS.
2. Name the new port IN_PROMO_ID and make it an input-only port. Make DISCOUNT the Return port. Uncheck the Output ports for all other ports except PROMO_ID and DISCOUNT.
3. Create the lookup condition comparing PROMO_ID to IN_PROMO_ID.
4. Click OK and save the repository.
Edit the exp_DISCOUNT_TEST Expression transformation. If the IN_DISCOUNT port has a value greater than the value passed in via the mapping parameter, we need to get an acceptable value from the DIM_PROMOTIONS table. The variable port v_DISCOUNT will hold the return value. Edit the v_DISCOUNT variable port and add the expression:
IIF(IN_DISCOUNT > $$MAX_DISCOUNT, :LKP.lkp_DIM_PROMOTIONS(PROMO_ID), IN_DISCOUNT)
3. The discount is held as a whole number. We need to change this to a percentage and apply it against the selling price to derive the dollar value of the discount. Edit the output port OUT_DISCOUNT and add the expression:
v_DISCOUNT / 100 * SELLING_PRICE
Create an Aggregator transformation named agg_FACT_SALES. Drag the PRODUCT_KEY port from lkp_DIM_PRODUCT to agg_FACT_SALES. Drag the DATE_KEY port from lkp_DIM_DATES to agg_FACT_SALES.
4. Drag the following ports from the Expression transformation to the Aggregator:
PAYMENT_ID CUST_ID DEALERSHIP_ID PROMO_ID SELLING_PRICE UNIT_COST SALES_QTY HOLDBACK REBATE OUT_DISCOUNT
5. Open the Aggregator and re-order the key ports in the following order: CUST_ID, PRODUCT_KEY, DEALERSHIP_ID, PAYMENT_ID, PROMO_ID, DATE_KEY.
6. Uncheck the output ports for SELLING_PRICE, UNIT_COST and SALES_QTY.
7. Rename the following ports:
   SELLING_PRICE to IN_SELLING_PRICE
   UNIT_COST to IN_UNIT_COST
   SALES_QTY to IN_SALES_QTY
   OUT_DISCOUNT to DISCOUNT
8. Create a new output port after the DISCOUNT port:
Port Name = OUT_UNITS_SOLD
Datatype = decimal
Precision = 3
Expression = SUM(IN_SALES_QTY)
10. Use Autolink by name to link the ports from the agg_FACT_SALES transformation to the SC_FACT_SALES target table. You will need to use the prefix OUT_ to link all of the ports.
11. Launch the Workflow Manager client and sign into your assigned folder.
12. Create a new workflow named wkf_FACT_SALES_LOAD_xx. Add a new Session task named s_m_FACT_SALES_LOAD_xx that uses the m_FACT_SALES_LOAD_xx mapping. Edit the s_m_FACT_SALES_LOAD_xx session.
a. Set the connection value for the sq_STG_TRANSACTIONS_PAYMENT source table to NATIVE_STGxx, where xx is your student number.
b. Set the connection value for the SC_FACT_SALES target table to NATIVE_EDWxx, where xx is your student number.
c. Change the Target load type to Normal.
d. Under the Mapping tab, select the lkp_DIM_DATES transformation and ensure that the Cache File Name Prefix is set to your pre-defined persistent cache (LKPSTUxx).
e. Under the Session Properties tab, set the $Target connection value to NATIVE_EDWxx.
5. Save your work.
6. Start the workflow.
7. Review the Task Details.
Figure 13-9. Task Details of the completed session run
8. Review the Source/Target Statistics. Your statistics should be the same as displayed in Figure 13-10.
Figure 13-10. Source/Target Statistics of the completed session run
Data Results
Preview the target data from the Designer; your data should appear the same as displayed in Figure 13-11.
Figure 13-11. Data Preview of the FACT_SALES target table
Mapplets
Description
Mapplets can combine multiple mapping objects for re-usability; they can also simplify maintenance of complex mappings. A mapplet can receive its input data either from an internal source or from the pipeline of the mapping that calls it. A mapplet must pass data out via a Mapplet Output transformation.
Description
The Mapplet Input transformation acts as an input to a Mapplet.
Example
In the following example, a business needs to apply discounts to its daily sales data, perform a number of lookups, and aggregate the sales values. This functionality is used in several types of feeds, so a mapplet was created to provide it to many mappings. The Mapplet Input transformation receives the sales transactions by customer, discounts are applied, and then two lookups are used to find the product and date keys. An Aggregator is used to sum the cost and revenue. A Mapplet Output transformation is used to pass the output of the mapplet back into the mapping that called it.
Type
Passive.
Description
The Mapplet Output transformation acts as an output from a Mapplet.
Example
The following example illustrates the Mapplet Output transformation.
Warning: When the mapplet is expanded at runtime, an unconnected output group can leave a transformation with no output connections. If that is illegal for the transformation, the mapping will be invalid.
Examples:
If the mapplet outputs are fed by an Expression transformation, the mapping is invalid because an Expression requires a connected output. If the mapplet outputs are fed by a Router, the mapping is valid because a Router can have unconnected output groups.
Technical Description
To take advantage of previously created objects, we will create a mapplet from existing objects used in a previous mapping. This mapplet can then be used in other mappings.
Objectives
Create a Mapplet
Duration
15 Minutes
Instructions
Step 1: Create the Mapplet
1. In the Mapping Designer, re-open the m_FACT_SALES_LOAD_xx mapping.
2. Highlight the following five transformations by holding down the Ctrl key and clicking each with the left mouse button:
3. Select Edit > Copy or type Ctrl+C.
4. Open the Mapplet Designer. Create a mapplet named mplt_AGG_SALES.
a. Select Edit > Paste or type Ctrl+V.
b. Right-click in the workspace and select Arrange All.
c. Select the Scale to Fit icon.
d. Add a Mapplet Input transformation.
e. Name the Mapplet Input transformation in_TRANSACTIONS.
f. Add a Mapplet Output transformation.
h. From the exp_DISCOUNT_TEST transformation, drag all input ports to the Input transformation.
i. From the Aggregator agg_FACT_SALES, drag all output ports to the Output transformation.
j. Select the Scale to Fit icon.
k. The mapplet should look similar to Figure 14-2.
Figure 14-2. Mapplet Designer view of MPLT_AGG_SALES with Input and Output transformations
l. Save your work. Notice the mapplet is invalid. Scroll through the messages in the output window; they point to the expression exp_DISCOUNT_TEST as having an invalid symbol reference. The reference to the parameter $$MAX_DISCOUNT is invalid because the parameter does not exist within the mapplet's parameter definitions.
Note: Mapping parameters and variables that are created in a mapping are not available for use in a mapplet that is called from the mapping.
m. Create a new parameter:
   Parameter Name = $$MAX_DISCOUNT
   Type = Parameter
   Datatype = decimal
   Precision = 15,2
   Initial Value = 17.25
5. Make a copy of the m_FACT_SALES_LOAD_xx mapping and open it in the Mapping Designer. Rename the mapping to m_FACT_SALES_LOAD_MAPPLET_xx. Delete the five transformations that you previously copied to the mapplet. Drag the mapplet mplt_AGG_SALES into the mapping. Use Autolink by name to link the ports from sq_STG_TRANSACTIONS_PAYMENT to the mplt_AGG_SALES input.
6. Manually link the DISCOUNT port to the IN_DISCOUNT port.
7. Use Autolink by name to link the output portion of the mapplet to the target. You will need to specify OUT_ for the prefix and 1 for the suffix.
8. Arrange All Iconic.
9. Save your work.
Designing Mappings

The workshop will give you practice in designing your own mappings.
What to Consider
The mapping process requires much more up-front research than it might appear. Before designing a mapping, it is important to have a clear picture of the end-to-end process that the data will flow through.
Design a high-level view of the mapping and document a picture of the process, using a textual description to explain exactly what the mapping is supposed to accomplish and the methods or steps it will follow to accomplish its goal.

After the high-level flow has been established, document the details at the field level, listing each of the target fields and the source field(s) that are used to create the target field. Document any expression that may take place in order to generate the target field (e.g., a sum of a field, a multiplication of two fields, a comparison of two fields, etc.). Whatever the rules, be sure to document them at this point and remember to keep them at a physical level.

The designer may have to do some investigation at this point for some business rules. For example, the business rules may say 'For active customers, calculate a late fee rate'. The designer of the mapping must determine that, on a physical level, this translates to 'for customers with an ACTIVE_FLAG of 1, multiply the DAYS_LATE field by the LATE_DAY_RATE field'.

Create an inventory of Mappings and Reusable objects. This list is a 'work in progress' and will have to be continually updated as the project moves forward. The lists are valuable to all, but particularly to the lead developer: these objects can be assigned to individual developers and progress tracked over the course of the project.

The administrator or lead developer should gather all of the potential Sources, Targets and Reusable objects and place them in a shared folder accessible to all who may need them. Reusable objects need to be properly documented to make it easier for other developers to determine whether they can or should use them in their own development.

For the developer, the specifications for a mapping should include the required Sources and Targets, additional information regarding derived ports, and how the ports relate from source to target.
The Informatica Velocity methodology provides a matrix that assists in detailing the relationship between source fields and target fields. It also depicts fields that are derived from values in the Source and eventually linked to ports in the target.
If a shared folder for Sources and Targets is not available, the developer will need to obtain the source and target database schema owners, passwords and connect strings. With this information, ODBC connections can be created in the Designer tool to allow access to the Source and Target definitions.

Document any other information about the mapping that is likely to be helpful in developing it. Helpful information may, for example, include source and target database connection information, lookups and how to match data in the lookup tables, data cleansing needed at a field level, potential data issues at a field level, any known issues with particular fields, pre- or post-mapping processing requirements, and any information about specific error handling for the mapping.

The completed mapping design should then be reviewed with one or more team members for completeness and adherence to the business requirements. In addition, the design document should be updated if the business rules change or if more information is gathered during the build process.
[Diagram: a sample mapping — Relational Source → Expression → Router → two Relational Targets]
Mapping Specifics
The following tips, in no particular order, will make the mapping development process more efficient.
One of the first things to do is to bring all required source and target objects into the mapping. After that:
- Only connect fields that are needed or will be used. Only connect from the Source Qualifier those fields needed subsequently.
- Filter early and often. Only manipulate data that needs to be moved and transformed, and reduce the number of non-essential records passed through the mapping.
- Decide whether a Source Qualifier join will produce the needed result, versus creating a Lookup to retrieve the desired results.
- Reduce the number of transformations. An excessive number of transformations increases overhead. Consider increasing the shared memory from 12MB to 25MB or 40MB when using a large number of transformations.
- Make use of variables, local or global, to reduce the number of times functions have to be invoked.
- Watch the data types. The Informatica engine converts compatible data types automatically, but an excessive number of conversions is inefficient.
- Make use of variables, reusable transformations and mapplets for reusable code. These leverage work already done by others.
Unit 15: Mapping Design Informatica PowerCenter 8 Level I Developer
- Use active transformations early in the process to reduce the number of records as early in the mapping as possible.
- When joining sources, select the appropriate driving/master table.
- Utilize single-pass reads: design mappings that use one Source Qualifier to populate multiple targets.
- Remove or reduce field-level stored procedures. These are executed for each record and slow performance.

Lookup Transformation tips:
- When the source is large, cache lookup table columns for lookup tables of 500,000 rows or less. The standard rule of thumb is not to cache tables over 500,000 rows.
- Use equality (=) conditions in the Condition tab if possible.
- Use IIF or DECODE functions when the lookup returns small row sets.
- Avoid date comparisons in lookups; convert dates to strings.

Operations and Expression Transformation tips:
- Numeric operations are faster than string operations.
- Trim Char and Varchar fields before performing comparisons.
- Operators are faster than functions (e.g., || vs. CONCAT).
- Use flat files where possible: file reads/writes are faster than database reads/writes on the same server, and fixed-width files are faster to process than delimited files.
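To illustrate the operator-versus-function tip, the two expressions below produce the same result in the expression editor; the port names are hypothetical:

```
-- Using the || operator (generally faster)
FIRST_NAME || ' ' || LAST_NAME

-- Using nested CONCAT functions (same result, more overhead)
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
```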
Technical Description
The instructions provide enough detail for you to design and build the mapping necessary to load the promotions aggregate table. It is suggested that you use the Velocity best practices discussed during the course. The workshop provides tables that can be filled in before you start building the mapping. If you are unclear about any of the instructions, please ask the instructor.
Objective
Design and create a mapping to load the aggregate table.
Duration
120 minutes
Workshop Details
Sources and Targets
SOURCE: STG_TRANSACTIONS This relational table contains sales transactions for 7 days. It is located in the TDBUxx schema and contains 5,475 rows. For the purposes of this mapping we will read all 7 days of data. See Figure 15-1 for the source table layout.
Figure 15-1. Source table definition
TARGET: FACT_PROMOTIONS_AGG_DAILY This relational table is located in the TDBUxx schema. After running the mapping it should contain 1,073 rows. See Figure 15-2 for the target table layout.
Figure 15-2. Target table definition
Mapping Details
To successfully create the mapping you will need some additional details. Management has decided not to keep track of the Manager Discount or the Employee Discount (PROMO_ID 105 and 200), so these must be excluded from the load.
The PRODUCT_KEY can be obtained from the DIM_PRODUCT table by matching on the PRODUCT_ID. The DATE_KEY can be obtained from the DIM_DATES table by matching the TRANSACTION_DATE to the DATE_VALUE. UNITS_SOLD is derived by summing the SALES_QTY.
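As a sketch, the PRODUCT_KEY lookup described above might be configured as follows (the input port name IN_PRODUCT_ID is an assumption):

```
Lookup table:      DIM_PRODUCT
Lookup condition:  PRODUCT_ID = IN_PRODUCT_ID
Return port:       PRODUCT_KEY
```

The DATE_KEY lookup would be configured the same way against DIM_DATES, matching TRANSACTION_DATE to DATE_VALUE.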
REVENUE is derived by taking the SALES_QTY times the SELLING_PRICE and then subtracting the DISCOUNT, HOLDBACK and REBATE.
Most of the discounts are valid, but occasionally they may be higher than the acceptable value of 17.25. When this occurs you will need to obtain an acceptable value based on the PROMO_ID. The acceptable value can be obtained from the DIM_PROMOTIONS table by matching the PROMO_ID.

The DISCOUNT is a percentage stored as a number. To calculate the actual discount in dollars, divide the DISCOUNT by 100 and multiply it by the SELLING_PRICE.

REVENUE_PER_UNIT is derived by dividing the REVENUE by the SALES_QTY. COST is derived by summing the UNIT_COST. COST_PER_UNIT is derived by summing the UNIT_COST and dividing it by the sum of the SALES_QTY.

Save your work often.
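The derivations above can be sketched as Aggregator-style expressions. This is pseudocode under stated assumptions, not a definitive implementation; DISCOUNT here means the acceptable discount after the 17.25 check has been applied:

```
UNITS_SOLD       = SUM(SALES_QTY)
REVENUE          = SUM((SALES_QTY * SELLING_PRICE) - ((DISCOUNT / 100) * SELLING_PRICE) - HOLDBACK - REBATE)
REVENUE_PER_UNIT = REVENUE / SUM(SALES_QTY)
COST             = SUM(UNIT_COST)
COST_PER_UNIT    = SUM(UNIT_COST) / SUM(SALES_QTY)
```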
SOURCES
Tables (worksheet): Table Name | Schema/Owner | Selection/Filter
TARGETS
Tables (worksheet): Table Name | Schema Owner | Update | Delete | Insert | Unique Key
LOOKUPS
Lookup worksheet (one per lookup): Lookup Name | Table | Location | Match Condition(s) | Filter/SQL Override
(Repeat the lookup worksheet for each additional lookup.)
Workflow Details
This is a simple workflow containing a Start task and a Session task.
Unit 15: Mapping Design Informatica PowerCenter 8 Level I Developer 339
Run Details
Your Task Details should be similar to Figure 15-3.
Figure 15-3. Task Details of the completed session run
- Link Conditions
- Workflow variables
- Assignment tasks
- Decision tasks
- Email tasks
If the link condition is True, the next task is executed. If the link condition is False, the next task is not executed.
To set a condition, right-click a link and enter an expression that evaluates to True or False. You can use workflow variables in the condition (discussed later).
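For example, a link condition that lets the workflow proceed only when the preceding session succeeded might look like this (the session name is hypothetical):

```
$s_LOAD_DAILY.Status = SUCCEEDED
```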
User-defined workflow variables are created by selecting Workflows > Edit and then selecting the Variables tab.
Description
Workflow variables can be user-defined or pre-defined. User-defined workflow variables can be used to pass information from one point in a workflow to another.
1. Declare workflow variables in the workflow Variables tab.
2. Selecting Persistent will write the last value out to the repository and make it available the next time the workflow is executed.
3. Use an Assignment task to set the value of the variable.
4. Use the variable value later in the workflow.

System variables (SYSDATE and WORKFLOWSTARTTIME) can be used, for example, when calculating variable dates and times in Assignment tasks and link conditions.

Task-specific workflow variables are available in Decision, Assignment and Timer tasks, and in link conditions. They include EndTime, ErrorCode, ErrorMsg, FirstErrorCode, FirstErrorMsg, PrevTaskStatus, SrcFailedRows, SrcSuccessRows, StartTime, Status, TgtFailedRows, TgtSuccessRows and TotalTransErrors.
Workflow variables are discussed in more detail in the Workflow Administration Guide.
Business Purpose
A workflow can contain multiple tasks and multiple pipelines. One or more tasks or pipelines may be dependent on the status of previous tasks.
Example
S2 may be dependent on the successful running of S1. Success may be defined as session status = Succeeded and the number of source and target failed rows = zero. The link that precedes S2 can be coded so that S2 will not run unless all three criteria are true. Use the task-specific workflow variables 'Status', 'SrcFailedRows' and 'TgtFailedRows' in the link condition expression. In this case, there is no allowance for only some of the three conditions being true.
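A sketch of the link condition preceding S2, using the task-specific variables named above and assuming the first session task is named S1:

```
$S1.Status = SUCCEEDED AND
$S1.SrcFailedRows = 0 AND
$S1.TgtFailedRows = 0
```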
It may be desired that S4 not run if S3 finished more than 1 hour past the workflow start time. Truncating and testing WORKFLOWSTARTTIME in the link condition preceding S4 is appropriate.
Description
The Assignment task can establish the value of a Workflow Variable (refer to the subsequent Workflow Variables section of this document) whose value can be used at a later point in the workflow, as testing criteria to determine if (or when) other workflow tasks/pipelines should be run. It is a 3-step process: create a Workflow Variable in the workflow properties; establish the value of that variable with an Assignment task; test that variable value at some subsequent point in the workflow.
Business Purpose
Running a workflow task may depend on the results of other tasks or calculations in the workflow. An Assignment task can perform calculations and establish the value of a Workflow Variable. That value may determine whether other tasks or pipelines are run.
Example
S5 should run at least 1 hour after S2 completes. ASGN1 can be coded to set a time that TIMER1 will wait for, before proceeding to S5. To prevent ASGN1 from running until S2 completes, use a Link Condition (refer to Workflow Design section of this document).
Code the Assignment task ASGN1 using the PowerCenter TRUNC date function so that, in pseudocode, the variable date value >= Session 2's EndTime + 1 hour. The Timer task TIMER1 will wait for that variable time to arrive before running S5.
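One possible sketch of ASGN1, assuming a Datetime workflow variable named $$S5_START has been declared (the variable name and TRUNC granularity are assumptions):

```
-- Assignment task ASGN1: compute the earliest time S5 may start
$$S5_START = ADD_TO_DATE(TRUNC($S2.EndTime, 'MI'), 'HH', 1)
```

TIMER1 would then be configured to wait on the Datetime Variable $$S5_START before the workflow proceeds to S5.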
Description
Decision tasks enable the workflow designer to set criteria by which the workflow will or will not proceed to the next set of tasks, depending on whether the criteria evaluate to true or false.
Business Purpose
Commonly, workflows have multiple paths. Some are simply concurrent tasks. Others are pipelines of tasks that should only be run if results of preceding tasks are successful. Still others are pipelines of tasks that should only be run if those results are not successful. What determines the success or failure of a task or group of tasks is User Defined, depending on the business-defined rules and operational rules of processing. That criterion is set as the Decision Condition in a Decision Task and subsequently tested for a True or False condition.
Example
If a session, group of sessions or any combination of workflow tasks is successful, a subsequent set of sessions should run. If any one of the tasks fails or does not produce desired results, those sessions should
not be run. Instead, an email should be sent to the processing operator, perhaps to run a back-out session, or simply to notify the Development Team Lead or Business Unit Lead that an error condition existed.
Description
Email tasks enable PowerCenter to send email messages at various points in a workflow. Users can define email addresses, a subject line and the email message text. When called from within a Session task, the message text can contain variable session-related metadata.
Business Purpose
Various business and operational staff may need to be notified of the progress of a workflow, the status of tasks (or combination of tasks) within it, or various metadata results of a session.
Example
The Business Unit Team Lead may request to receive an email detailing the time a load finished, the total number of rows loaded and the number of rows rejected. This could be accomplished with either a
reusable email task (which allows variable session metadata) called from within a session. If session-specific variable metadata is not required, a standard text message could be sent by using a non-reusable email task that follows the session in the workflow. Operational staff may request receipt of an email if a session-required source file does not arrive by the time the session is scheduled to run. Receipt of the email message would be the operator's signal that some type of manual intervention or restore routine is required to correct the problem.
Performance Considerations
A running, configured email server is required; however, the impact of the Integration Service sending emails is minimal.
Technical Description
The source for the weekly product aggregate table will be the daily product aggregate table. The mapping to load this table is located in the DEV_SHARED folder. A workflow needs to be created that will run the weekly aggregate load session after the daily aggregate load session has run 7 times. This can be accomplished using an assignment task, a decision task, link conditions and session tasks. A load date equal to the beginning day of the week will be used to provide the date key for the weekly aggregate table. The mapping to accomplish this has already been created and will need to be copied from the DEV_SHARED folder. It contains a mapping variable that will be incremented by 1 at the end of the session/mapping run.
Objectives
- Assigning Workflow Variables
- Incrementing Workflow Variables using the Assignment Task
- Branching in a workflow using a Decision Task
- Using Link Conditions
Duration
35 Minutes
Unit 16 Lab: Load Product Weekly Aggregate Table Informatica PowerCenter 8 Level I Developer
Load the weekly product aggregate table from the daily product aggregate table. Weekly Target: Append.
SOURCES
Tables
Table Name: FACT_PRODUCT_AGG_DAILY
Schema/Owner: TDBUxx
Selection/Filter:
TARGETS
Tables
Table Name: FACT_PRODUCT_AGG_WEEKLY
Schema Owner: TDBUxx
Update: X
Delete:
Insert: X
Unique Key: PRODUCT_KEY, DEALERSHIP_ID, DATE_KEY
Email Task
Instructions
Step 1: Copy the Mappings
1. In the Designer, copy the m_FACT_PRODUCT_AGG_DAILY_LOAD mapping and the m_FACT_PRODUCT_AGG_WEEKLY_LOAD mapping from the DEV_SHARED folder.
2. Select Yes for any Target Dependencies.
3. Select Skip or Reuse to resolve any conflicts.
4. Rename the mappings to include your student number.
5. Save your work.
1. In the Workflow Manager, copy the wkf_FACT_PRODUCT_AGG_WEEKLY_LOAD workflow from the DEV_SHARED folder.
2. Resolve the conflict by selecting the m_FACT_PRODUCT_AGG_DAILY_LOAD_xx mapping.
3. Drag the new workflow into the Workflow Designer.
4. Edit the session and make the following changes:
   a. Change the $Target Connection Value to reflect your assigned student connection NATIVE_EDWxx.
   b. Change the Session Log File Name to include your student number.
   c. Change the source and target connections to reflect your assigned student connections, NATIVE_STGxx and NATIVE_EDWxx respectively.
5. Edit the workflow: rename it to include your student number. In the Properties tab, change the Workflow Log File Name to include your student number. In the Variables tab, create a new workflow variable:
   Variable Name = $$WORKFLOW_RUNS
   Datatype = integer
   Persistent = checked
   Default Value = 0
2. Link the s_m_FACT_PRODUCT_AGG_DAILY_LOAD_xx Session task to the Assignment task.
3. Double-click the link.
4. Add a link condition to ensure that the Assignment task only executes if the s_m_FACT_PRODUCT_AGG_DAILY_LOAD_xx Session task was successful. See Figure 16-2 for details.
Figure 16-2. Link condition testing if a session run was successful
5. Edit the Assignment task.
6. Rename it to asgn_WORKFLOW_RUNS.
7. In the Expressions tab, create an expression that increments the User Defined Variable named $$WORKFLOW_RUNS by 1. See Figure 16-3 for details.
2. Link the asgn_WORKFLOW_RUNS Assignment task to the Decision task.
3. Double-click the link.
4. Add a link condition to ensure that the Decision task only executes if the asgn_WORKFLOW_RUNS Assignment task was successful (refer to the previous step).
5. Edit the Decision task.
   a. Rename it to dcn_RUN_WEEKLY.
   b. In the Properties tab, create a Decision Name expression using the modulus function that checks whether this is the seventh run of the workflow. This can be done by dividing the workflow variable by seven and checking whether the remainder is 0. See Figure 16-4 for details.
Tip: The decision task will evaluate the expression and return a value of either TRUE or FALSE. This can be checked in a link condition to determine the direction taken.
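Using the MOD (modulus) function mentioned above, the decision condition might be written as:

```
MOD($$WORKFLOW_RUNS, 7) = 0
```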
1. Create a session task named s_m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx that uses the m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx mapping.
2. Link the dcn_RUN_WEEKLY Decision task to the s_m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx session task.
3. Double-click the link.
4. Add a link condition that checks whether the dcn_RUN_WEEKLY Decision task has returned a value of TRUE, meaning that it is time to load the weekly aggregate table. See Figure 16-5 for details.
Figure 16-5. Link condition testing for a Decision Task condition of TRUE
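The link condition tests the Decision task's pre-defined Condition variable and should be similar to:

```
$dcn_RUN_WEEKLY.Condition = TRUE
```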
5. Set the relational connection value for the SQ_SC_FACT_PRODUCT_AGG_DAILY source to NATIVE_EDWxx, where xx is your student number. Set the relational connection value for the SC_FACT_PRODUCT_AGG_WEEKLY target to NATIVE_EDWxx. Verify the Target load type is set to Normal. In the Properties tab, set the Incremental Aggregation option on.
2. Link the dcn_RUN_WEEKLY Decision task to the Email task. Double-click the link.
3. Add a link condition that checks whether the dcn_RUN_WEEKLY Decision task has returned a value of FALSE, meaning that the daily load has completed and that it is NOT time to load the weekly aggregate table.
4. Rename the Email task to eml_DAILY_LOAD_COMPLETE.
   b. Add anyperson@anycompany.com as the Email User Name. Add appropriate text for the Email Subject and Email Text. See Figure 16-6 for details.
5. Right-click in the workspace and select Arrange > Horizontal.
6. Save your work. Your workflow should appear the same as displayed in Figure 16-7.
Figure 16-7. Completed Workflow
2. Review the workflow in the Gantt view of the Workflow Monitor. It should appear similar to Figure 16-8.
Figure 16-8. Gantt chart view of the completed workflow run
3. Return to the Workflow Manager.
4. Right-click the wkf_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx workflow in the Navigator window and select View Persistent Values. The value should be set to 1. See Figure 16-9 and Figure 16-10.
Figure 16-9. View Workflow Variables
Note: Each time you run the workflow this value will increase by one.
Figure 16-10. Value of the $$WORKFLOW_RUNS variable after the first run
5. Click Cancel to exit.
6. Run the workflow six more times to emulate a week's normal runs. After the last run, the Gantt Chart view should be similar to Figure 16-11.
Figure 16-11. Gantt chart view of the completed workflow run after the weekly load runs
- Event Wait task
- Event Raise task
- Command task
- Reusable tasks
- Reusable Session task
- Reusable Session configuration
- pmcmd Utility
Description
Event Wait tasks wait either for the presence of a named flat file (a pre-defined event) or for some other user-defined event to occur in the workflow processing.
Unit 17: More Tasks and Reusability Informatica PowerCenter 8 Level I Developer
For a predefined event, the task waits for the physical presence of a file in a directory local to the Integration Service process machine. This file is known as an indicator file. If the file does not exist, the Event Wait task will not complete. When the file is found, the Event Wait task completes and the workflow proceeds to subsequent tasks. The Event Wait task can optionally delete the indicator file once detected or the file can be deleted by a subsequent process. For a user-defined event, the developer:
1. Defines an event in the workflow properties (prior to workflow processing).
2. Includes an Event Wait task at a suitable point in the workflow, where further processing must await some specific event.
3. Includes an Event Raise task at a suitable point in the workflow, e.g., after a parallel pipeline has completed. The Event Raise task sets the event to active. (The Event Raise task is described later.)
Pre-Defined Event
Business Purpose
An Event Wait task watching for a flat file by name is placed in a workflow because some subsequent processing is dependent on the presence of the file.
Example
A Session task may be expecting to process a flat file as source data. Inserting a Pre-Defined Event Wait task containing the specific name and location of the flat file causes the workflow to proceed if the file is found. If not found, the workflow goes into a Wait status.
Performance Considerations
The only known consideration is the length of time the Integration Service may have to wait if the file does not arrive. This potential load window slowdown can be averted by proper workflow design that provides alternatives in case a file does not arrive within a reasonable length of time. Refer to the Email task earlier in this section.
User-Defined Event
Business Purpose
An Event Wait task waiting for the occurrence of a user-defined event is strategically placed so that the workflow does not proceed further until a specific series of pre-determined tasks and conditions has occurred. It always works in concert with an Event Raise task. Per the 3 steps mentioned above: the user creates the workflow Event, the Event Raise task triggers the Event (sets it 'active'), and the Event Wait task does not proceed to subsequent tasks until it detects that the specific Event was triggered.
Example
A workflow may have 2 concurrent pipelines containing various tasks, in this order. Pipeline 1 contains S1 and S2; Pipeline 2 contains S3 and S4 and S5. S5 cannot run until S4 runs.
One way to ensure that S5 does not run unless S1 and S2 have run, is to create a workflow Event in the workflow properties, insert an Event Raise task after S2 that triggers (activates) the Event and place a User-Defined Event Wait task after S4 to detect whether the Event has been triggered. If not, the workflow waits until it is triggered.
Performance Considerations
The only known performance consideration is the length of time the Integration Service may have to wait if the Event is not raised. This potential load window slowdown can be averted by proper workflow design that provides alternatives in case the Event does not occur within a reasonable length of time. (Refer to the Email and Timer tasks earlier in this section.)
Description
Event Raise tasks are always used in conjunction with user-defined Event Wait tasks. They signal to an Event Wait task that a particular set of pre-determined events has occurred. A user-defined event is defined as the completion of the tasks from the Start task to the Event Raise task. It is the same 3-step process previously mentioned: the developer defines an Event in the workflow properties; the Event Raise task raises the event at some point in the running workflow; and an Event Wait task is placed at a different point in the workflow to determine whether the Event has been raised.
Business Purpose
This task allows a signal to be passed from one point in the workflow to another, indicating that a particular series of pre-determined events has occurred.
Example
This example is the same as the one in the Event Wait task section of this document. A workflow may have 2 concurrent pipelines containing various tasks, in this order. Pipeline 1 contains S1 and S2; Pipeline 2 contains S3 and S4 and S5. S5 cannot run until S4 runs.
One way to ensure that S5 is not run unless S1 and S2 have run, is to create a workflow Event in the workflow properties, insert an Event Raise task after S2 that triggers (activates) the Event and place a User-Defined Event Wait task after S4 to detect whether the Event has been triggered. If not, the workflow waits until it is triggered.
Performance Considerations
As before, the only known performance consideration is the length of time the Integration Service may have to wait if the Event is not raised. This potential load window slowdown can be averted by proper workflow design that provides alternatives in case the Event does not occur within a reasonable length of time. (Refer again to the Email and Timer tasks earlier in this section.)
Description
Command tasks are inserted in workflows and worklets to enable the Integration Service to run one or more OS commands of any nature. All commands or batch files referenced must be executable by the OS login that owns the Integration Service process.
Business Purpose
OS commands can be used for any operational or Business Unit related procedure and can be run at any point in a workflow. Command tasks can be set to run one or more OS commands or scripts/batch files, before proceeding to the next task in the workflow. If more than one command is coded into a Command task, the entire task can be set to fail if any one of the individual commands fails. Additionally and optionally, each individual command can be set not to run if a preceding command has failed.
Example
A Session task that produces an output file could be followed by a Command task that copies the file to another directory or FTPs it to another machine. The command syntax would be the same as the syntax that would accomplish this at the OS command prompt on the Integration Service process machine.

A Session task that relies on flat file data as its source could be preceded by a Command task containing a script that step-by-step verifies the presence of the file, opens it, and verifies/compares control totals or record counts against some external source of information (again, any sequence of steps that could be accomplished at the OS level).

A series of multiple concurrent or sequential Sessions in a workflow could all be followed by one Command task coded to copy (or move) all session logs created by the workflow to a special daily backup directory.
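The first example might be implemented with a Command task entry as simple as the following; the paths are purely illustrative:

```
cp /pmdata/out/daily_extract.dat /pmdata/archive/daily_extract.dat
```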
Performance Considerations
The only known consideration is the length of time the OS commands collectively take to run on the Integration Service process machine. This is not within the control of the Integration Service.
Session, Email and Command tasks can be reusable. Use the Task Developer to create reusable tasks. Reusable tasks appear in the Navigator Tasks node and can be dragged and dropped into any workflow. In a workflow, a reusable task is indicated by a special symbol.
Business Purpose
Occasionally, the same mapping logic may need to run in multiple workflows. Since a mapping is a reusable object, the developer could code multiple sessions, all based on the same mapping. However, there is a simpler way to create 'like' sessions that are all based on the same mapping: a Reusable Session. Once created in the Task Developer, an instance of the Reusable Session can be placed in any workflow or worklet.
Examples
If the same mapping needs to be used a number of times and a number of Session properties need to be changed in each use (e.g., time-stamped logs, increased Line Sequential Buffer Length, special error handling), the changes could be made in the parent session; every time an instance of the session is placed in a workflow, it automatically takes on all those customized properties. This results in less developer effort than creating separate new sessions, each with multiple customized session properties. A business receives 25 data file sources for its 25 customers. The data structure of each customer is different enough that a different mapping is required for each, to get the data into one common format. Once the data is structured the same, each file needs to be run through common mapping logic to further transform the data in a like manner. If 25 output files were created, 25 instances of one Reusable Session could be used to process all data files. Each workflow would contain one customer-specific session/mapping and one instance of the Reusable Session, pre-coded with common session properties.
Performance Considerations
It is recommended to use reusable session tasks sparingly because retrieving the metadata for a reusable session task and its child instances from the repository takes longer than retrieving the metadata for a non-reusable session task.
Reusable Session Configurations define session properties that can be reused by multiple sessions within a folder:
- Use the Tasks > Session Configuration menu option or the Tasks toolbar icon; this opens the Session Config Browser, where you set session properties.
- Invoke a configuration in a Session task on the Config Object tab, in the Config Name box.
- These session properties can be overridden further down in the Config Object tab.
Description
The pmcmd command line utility allows the developer to perform most Workflow Manager operations outside of the PowerCenter Client tool.
Syntax Example
pmcmd startworkflow -sv integrationservicename -u yourusername -p yourpassword workflowname
This command will start the workflow located on the named Integration Service. You must supply the user name and password to sign in to the Integration Service, as well as the workflow name.
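If the workflow resides in a specific repository folder, the folder name is supplied with the -f option. A fuller example with placeholder names (the -d domain option applies when the Integration Service runs in a domain):

```
pmcmd startworkflow -sv INT_SVC_DEV -d Domain_Dev -u student01 -p mypassword -f STUDENT01 wkf_FACT_PRODUCT_AGG_WEEKLY_LOAD
```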
Description
Worklets are optional processing objects inside workflows. A worklet contains PowerCenter tasks that form a particular grouping or functionally-related set of tasks. Worklets can be created directly in a workflow (non-reusable) or from within the Worklet Designer (reusable).
Unit 18: Worklets and More Tasks Informatica PowerCenter 8 Level I Developer
Business Purpose
A workflow may contain dozens of tasks, whether concurrent or sequential. During workflow design, tasks naturally fall into groupings of meaningfully-related tasks, run in the appropriate operational order. The workflow can run as-is, from start to finish, executing task by task, or the developer can place these natural groupings of tasks into worklets. A worklet's relationship to a workflow is like a subroutine's to a program, or an applet's to an application. In a very large workflow, worklets can be used to encapsulate the natural groupings of tasks.
Example
This example is similar to the one in the Event Wait task section of this document. Workflow with individual tasks: a workflow has two concurrent pipelines containing various tasks. Pipeline 1 contains sessions S1 and S2; Pipeline 2 contains S3 and S4. S5 cannot run until all four sessions have run.
Workflow converted to internal worklets: Worklet1 contains S1 and S2; Worklet2 contains S3 and S4; S5 will run after both worklets complete.
Description
Timer tasks pause workflow processing along their branch until a specified point in time. That point can be based on:
Absolute Time. The user specifies the date and time at which the timer expires.
Datetime Variable. The user provides a variable that tells the Timer task when to expire.
Relative Time. The timer runs for a period measured from the start time of the Timer task, the start time of the workflow or worklet, or the start time of the parent workflow.
Business Purpose
Business or operational processing specifications may require that a workflow run to a certain point, then sit idle for a length of time or until a fixed point in time.
Example
A workflow may contain sessions that should run for no more than a maximum amount of time. A Timer task can be set to wait for that maximum amount of time and then send an email or abort the workflow if the time limit is exceeded. For example, the Timer task could be set to expire one hour after the start of the workflow.
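The relative-time setting above amounts to simple datetime arithmetic: the timer's expiry is the reference start time plus an offset. A minimal sketch, with a hypothetical workflow start time:

```python
from datetime import datetime, timedelta

def timer_fire_time(reference_start, hours=0, minutes=0, seconds=0):
    """Absolute point in time at which a relative Timer task would fire,
    measured from the chosen reference start time."""
    return reference_start + timedelta(
        hours=hours, minutes=minutes, seconds=seconds)

# Hypothetical 2:00 AM workflow start.
workflow_start = datetime(2006, 4, 1, 2, 0, 0)
fires_at = timer_fire_time(workflow_start, hours=1)
print(fires_at)  # 2006-04-01 03:00:00
```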
Performance Considerations
There are no significant performance considerations for the Timer task.
Description
Control tasks are used to alter the normal processing of a workflow. They can stop, abort or fail any workflow or worklet.
Control Options
Business Purpose
When an error condition exists, operational staff may prefer that a workflow or worklet simply stop, abort, or fail, rather than only send an email notification that an error exists.
Example
As with the example in the Pre-Defined Event section (Event Wait task), a workflow may have a session that expects a flat file for source data. If the file does not arrive within one hour of the workflow start time, the desired action may be to fail the workflow. Alternatively, a workflow may have a session that runs to a successful processing conclusion but contains data row errors. A Control task could be placed after the session, with conditions set to stop, abort, or fail the workflow (and an Email task used to notify someone of the issue).
Technical Description
The support team has suggested that a worklet be created that controls the loading of the staging table. This worklet will contain the staging load session task, a Timer task to keep track of the run time, an Email task to inform the administrator should the load take longer than 1 hour and a Control task to stop the worklet if the Session task finishes in less than 1 hour. A workflow will be created that contains the worklet and then runs the fact load after the worklet has completed.
Objectives
Create a Worklet
Create a Timer task
Create a Control task
Create an Email task
Use a Worklet within a Workflow
Duration
25 minutes
[Workflow diagram: Start task, followed by the Worklet, followed by a Session task]
Unit 18 Lab: Load Inventory Fact Table Informatica PowerCenter 8 Level I Developer
Instructions
Step 1: Copy the Mappings
1. Copy the m_STG_INVENTORY_LOAD and the m_FACT_INVENTORY_LOAD mappings from the DEV_SHARED folder.
2. Rename them m_STG_INVENTORY_LOAD_xx and m_FACT_INVENTORY_LOAD_xx.
3. Save your work.
1. Launch the Workflow Manager client and sign in to your assigned folder.
2. Open the Worklet Designer workspace.
3. Select the menu option Worklets > Create.
4. Delete the default Worklet name and enter wklt_STG_INVENTORY_LOAD_xx.
Velocity Best Practice: The wklt_ prefix for a Worklet name is specified in the Informatica Velocity Methodology.
1. Create a Session task named s_m_STG_INVENTORY_LOAD_xx that uses the m_STG_INVENTORY_LOAD_xx mapping.
2. Edit the s_m_STG_INVENTORY_LOAD_xx session:
a. Ensure that the filename for the SQ_inventory flat file source is set to inventory.txt.
b. Set the relational connection value for the STG_INVENTORY target to NATIVE_STGxx, where xx is your student number.
c. Set the Target truncate table option to on.
d. In the General tab, select the Fail parent if this task fails check box.
3. Save your work.
1. Create a Timer task named tim_MAX_RUN_TIME.
2. Edit the tim_MAX_RUN_TIME Timer task:
Velocity Best Practice: The tim_ prefix for a Timer task name is specified in the Informatica Velocity Methodology.
b. In the Timer tab, set the Relative time to start after 1 hour from the start time of this task. See Figure 18-1 for details.
Figure 18-1. Timer Task Relative time setting
3. Link the Start task to the tim_MAX_RUN_TIME Timer task.
4. Save your work.
1. Create an Email task named eml_MAX_RUN_TIME_EXCEEDED.
2. Edit the eml_MAX_RUN_TIME_EXCEEDED Email task:
Velocity Best Practice: The eml_ as a prefix for an Email task name is specified in the Informatica Velocity Methodology.
b. Enter administrator@anycompany.com as the Email User Name, Session s_m_STG_INVENTORY_LOAD_xx exceeded max time allotted as the Email Subject, and something appropriate for the Email Text.
3. Link the tim_MAX_RUN_TIME Timer task to the eml_MAX_RUN_TIME_EXCEEDED Email task.
4. Save your work.
1. Create a Control task named ctl_STOP_TIMEOUT.
2. Edit the ctl_STOP_TIMEOUT Control task:
Velocity Best Practice: The ctl_ as a prefix for a Control task name is specified in the Informatica Velocity Methodology.
c. Select Stop parent for the Control Option Value. See Figure 18-3 for details.
Figure 18-3. Control Task Properties Tab
3. Link the s_m_STG_INVENTORY_LOAD_xx Session task to the ctl_STOP_TIMEOUT Control task.
4. Save your work.
5. Right-click anywhere in the Worklet workspace and select Arrange > Horizontal. Your Worklet should appear the same as displayed in Figure 18-4.
Figure 18-4. Completed Worklet
1. Create a workflow named wkf_FACT_INVENTORY_LOAD_xx.
2. Drag the wklt_STG_INVENTORY_LOAD_xx Worklet from the Worklets folder in the Navigator window into the workflow.
3. Link the Start task to the wklt_STG_INVENTORY_LOAD_xx worklet.
4. Create a Session task named s_m_FACT_INVENTORY_LOAD_xx that uses the m_FACT_INVENTORY_LOAD_xx mapping.
5. Edit the s_m_FACT_INVENTORY_LOAD_xx session:
a. Set the relational connection value for the SQ_STG_INVENTORY source to NATIVE_STGxx, where xx is your student number.
b. Set the relational connection value for the FACT_INVENTORY target to NATIVE_EDWxx, where xx is your student number.
c. Set the Target load type to Normal.
6. Link the wklt_STG_INVENTORY_LOAD_xx worklet to the s_m_FACT_INVENTORY_LOAD_xx Session task.
7. Add a link condition to ensure that the Session task only executes if the wklt_STG_INVENTORY_LOAD_xx Worklet did not fail. Hint: worklet status != FAILED. Your Workflow should appear the same as displayed in Figure 18-5.
Figure 18-5. Completed Workflow
Start the workflow, then review it in the Gantt Chart view of the Workflow Monitor. When completed, it should appear similar to Figure 18-6.
Figure 18-6. Gantt chart view of the completed workflow run
Designing Workflows
The workshop will give you practice in designing your own workflow.
Considerations
The workflow design process requires some up-front research. Before designing a workflow, it is important to have a clear picture of the task-to-task processes.
Design a high-level view of the workflow and document the process within the workflow, using a textual description to explain exactly what the workflow is supposed to accomplish and the methods or steps it will follow to accomplish its goal. The load development process involves the following steps:
Clearly define and document all dependencies.
Analyze the processing resources available.
Develop operational requirements.
Develop tasks, worklets, and workflows based on the results.
Create an inventory of Worklets and Reusable tasks. This list is a work in progress and will have to be continually updated as the project moves forward. The list is valuable to everyone, but particularly to the lead developer. Making an up-front decision to make all Session, Email, and Command tasks reusable will make this easier. The administrator or lead developer should put together a list of database connections to be used for Source and Target connection values. Reusable tasks need to be properly documented to make it easier for other developers to determine whether they can or should use them in their own development.
If the volume of data is sufficiently low for the available hardware to handle, you may consider volume analysis optional, developing the load process solely on the dependency analysis. If the hardware is not adequate to run the sessions concurrently, you will need to prioritize them. The highest priority within a group is usually assigned to sessions with the most child dependencies.
Another possible component to add to the load process is sending e-mail. Three e-mail options are available for notification during the load process:
Post-session e-mails can be sent after a session completes successfully or when it fails.
E-mail tasks can be placed in workflows before or after an event or series of events.
E-mails can be sent when workflows are suspended.
Document any other information about the workflow that is likely to be helpful in developing the workflow. Helpful information may, for example, include source and target database connection information, pre- or post-workflow processing requirements, and any information about specific error handling for the workflow.
Create a Load Dependency Analysis. This should list all sessions by dependency, along with all other events (Informatica or otherwise) that they depend on. Also be sure to specify the dependency relationship between each session or event, the algorithm or logic needed to test the dependency condition during execution, and the impact of any possible dependency test result (e.g., don't run a session, fail a session, fail a parent or worklet).
Create a Load Volume Analysis. This should list all the sources and the row counts and row widths expected for each session. It should include all Lookup transformations in addition to the extract sources: the amount of data read to initialize a lookup cache can materially affect the initialization and total execution time of a session.
The completed workflow design should then be reviewed with one or more team members for completeness and adherence to the business requirements. The design document should also be updated if the business rules change or if more information is gathered during the build process.
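The dependency and prioritization rules above (run a session only after its prerequisites; among ready sessions, give highest priority to those with the most child dependents) can be sketched as a small scheduling routine. The session names and dependency graph here are hypothetical, for illustration only.

```python
def load_order(dependencies):
    """Topologically order sessions.
    dependencies maps session name -> set of prerequisite session names.
    Among sessions whose prerequisites are met, the one with the most
    child dependents is scheduled first."""
    pending = {s: set(deps) for s, deps in dependencies.items()}
    # Count how many sessions depend on each session (child dependents).
    dependents = {s: 0 for s in dependencies}
    for deps in dependencies.values():
        for d in deps:
            dependents[d] += 1
    order = []
    ready = [s for s, deps in pending.items() if not deps]
    while ready:
        ready.sort(key=lambda s: -dependents[s])  # most dependents first
        s = ready.pop(0)
        order.append(s)
        for t, deps in pending.items():
            if s in deps:
                deps.discard(s)
                if not deps and t not in order and t not in ready:
                    ready.append(t)
    if len(order) != len(dependencies):
        raise ValueError("cyclic dependency detected")
    return order

# Hypothetical sessions: the fact load depends on two staging loads.
deps = {
    "s_STAGE_PRODUCT": set(),
    "s_STAGE_CUSTOMER": set(),
    "s_LOAD_FACT": {"s_STAGE_PRODUCT", "s_STAGE_CUSTOMER"},
}
order = load_order(deps)
print(order)
```

A real Load Dependency Analysis would also record non-Informatica events and the action taken when a dependency test fails; this sketch covers only the ordering itself.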
Workflow Overview
[Workflow diagram: Start task leading to Session Task 1 and Session Task 2, followed by a Decision task, a Command task, a Control task, and an Email task]
Workflow Specifics
The following are tips that will make the workflow development process more efficient (not in any particular order):
If developing a sequential workflow, use the Workflow Wizard to create Sessions in sequence. There is also the option to create dependencies between the sessions.
Use a parameter file to define the values for parameters and variables used in a workflow, worklet, mapping, or session. A parameter file can be created using a text editor such as WordPad or Notepad: list the parameters or variables and their values in the file. Parameter files can contain the following types of parameters and variables:
Workflow variables
Worklet variables
Session parameters
Mapping parameters and variables
When using parameters or variables in a workflow, worklet, mapping, or session, the Integration Service checks the parameter file to determine the start value of the parameter or variable. Use a parameter file to initialize workflow variables, worklet variables, mapping parameters, and mapping variables. If start values are not defined for these parameters and variables, the Integration Service checks for the start value of the parameter or variable in other places.
Unit 19: Workflow Design Informatica PowerCenter 8 Level I Developer
Session parameters must be defined in a parameter file. Because session parameters have no default values, the Integration Service fails to initialize the session when it cannot locate the value of a session parameter in the parameter file. To include parameter or variable information for more than one workflow, worklet, or session in a single parameter file, create separate sections for each object within the file. You can also create multiple parameter files for a single workflow, worklet, or session and change the file that these tasks use, as necessary. To specify the parameter file that the Integration Service uses with a workflow, worklet, or session, do either of the following:
Enter the parameter file name and directory in the workflow, worklet, or session properties.
Start the workflow, worklet, or session using pmcmd and enter the parameter file name and directory on the command line.
On hardware systems that are under-utilized, you may be able to improve performance by processing partitioned data sets in parallel, in multiple threads of the same session instance running on the Integration Service node. However, parallel execution may impair performance on over-utilized systems or systems with smaller I/O capacity.
Incremental aggregation is useful for applying captured changes in the source to aggregate calculations in a session. If the source changes only incrementally, and you can capture those changes, you can configure the session to process only the changes. This allows the Integration Service to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.
Target load strategies:
Loading directly into the target is possible when the data is going to be bulk loaded.
Load into flat files and bulk load using an external loader.
Load into a mirror database.
From the Workflow Manager Tools menu, select Options and select the 'Show full names of task' option. This shows the entire name of all tasks in the workflow.
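The parameter file layout described above, with separate sections per workflow, worklet, or session, can be sketched as follows. The folder, workflow, session, and parameter names are hypothetical, and the exact section-header syntax shown is an assumption for illustration, not taken from the product documentation.

```python
# A hypothetical parameter file: one section per object, with
# name=value assignments underneath each section header.
param_text = """\
[DEV_FOLDER.WF:wkf_FACT_INVENTORY_LOAD]
$$LoadDate=2006-04-01
[DEV_FOLDER.WF:wkf_FACT_INVENTORY_LOAD.ST:s_m_STG_INVENTORY_LOAD]
$InputFile1=inventory.txt
"""

def parse_parameter_file(text):
    """Group parameter assignments under their [section] headers."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif "=" in line and current is not None:
            name, _, value = line.partition("=")
            sections[current][name] = value
    return sections

sections = parse_parameter_file(param_text)
print(sections)
```

Parsing a parameter file like this can be useful for validating that every session parameter referenced by a workflow actually has a value before the workflow is started.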
Technical Description
The instructions will provide enough detail for you to design and build a workflow that loads all of the staging tables in a single run. If you are unclear on any of the instructions, please ask the instructor.
Objectives
Duration
60 minutes
Workshop Details
Mappings Required
This section contains a list of the mappings that will be used in the workflow.
Workflow/Worklet Details
This section contains the workflow processing details.
1. Name the workflow wkf_LOAD_ALL_STAGING_TABLES. The workflow needs to start at a certain time each day. For this workshop you can set the start time to be a couple of minutes after the time you complete the workflow. Remember that the start time is relative to the time on the Integration Service process machine.
2. No sessions can begin until an indicator file shows up. The indicator file will be named fileindxx.txt and will be created by you using any text editor. You will need to place this file in the directory indicated by the instructor after you start the workflow. If you are in a UNIX environment you may skip this requirement.
3. In order to utilize the CPUs in a more efficient manner, you will want to run some of the sessions concurrently and some of them sequentially:
a. The sessions containing mappings m_Stage_Payment_Type, m_Stage_Product and m_Dealership_Promotions can be run sequentially.
b. The session containing mapping m_Stage_Customer_Contacts can be run concurrently to the sessions in the previous bullet point.
c. If any of the previous sessions fails, then an email should be sent to the administrator and the workflow aborted. Use administrator@anycompany.com as the Email User Name.
d. The session containing mapping m_STG_EMPLOYEES can only be run after the 4 previously mentioned sessions complete successfully.
e. The session containing mapping m_STG_TRANSACTIONS needs to be run concurrently to the m_STG_EMPLOYEES session.
f. If either of the previous sessions fails, an email should be sent to the administrator.
4. All sessions need to truncate the target tables. You may want to create reusable sessions from previously created workflows.
5. Management only wants the workflow to run a maximum of 50 minutes. Should the workflow take longer than 50 minutes, an email must be sent to the administrator. Should the workflow finish in the allotted time, the Timer task will need to be stopped.
There is more than one solution to the workshop. You will know that your solution has worked when all of the sessions have completed successfully.
Note: For more information and to register to take an exam, see http://www.informatica.com/services/ education_services/certification/default.htm