Course Analytics Data Warehouse Pilot

Informatica Architecture, Migration, and Installation

Prepared for:

The Ohio State University

Prepared by:

Hemant Swarup, Covansys

October 9, 2002

DOCUMENT REVISION LIST


Client:         The Ohio State University
Project:        Course Analytics Pilot Data Conversion
Project Code:   OSU004
Document Name:  Informatica Architecture, Migration, and Installation

Version #   Revision Date   Revision Description   Author
1.0         10/7/2002       Initial Release        Hemant Swarup
1.1         10/9/2002       Reformatting           Don Rohde

Covansys Corporation Confidential Documentation

Table of Contents

1. INTRODUCTION
   1.1. Purpose of Document
   1.2. Background
2. INFORMATICA ARCHITECTURE
   2.1. Development Repository (etl_repository)
   2.2. QA Repository (Proposed)
   2.3. Production Repository (dwprod1_repo)
   2.4. Naming Conventions
3. MIGRATION
   3.1. Version Control
   3.2. Migration Processes
4. INFORMATICA INSTALLATION AND CONFIGURATION
   4.1. Overview
   4.2. Detailed Installation and Configuration
        4.2.1. Installing the Informatica Server
        4.2.2. Configuring the Informatica Server
        4.2.3. Connecting to Databases
        4.2.4. Registering the Informatica Server
        4.2.5. Starting and Stopping the Informatica Server
APPENDIX A PILOT PROJECT INFORMATICA INSTALLATION
   A.1. Overview
   A.2. Pilot Project Informatica Architecture

1. Introduction
1.1. Purpose of Document
The purpose of this document is to provide OSU with a proposed Informatica architecture, guidelines for migrating Informatica objects into production, and installation procedures.

1.2. Background
The Ohio State University engaged Covansys to manage the Course Analytics Data Warehouse Pilot project. Prior to beginning this project, Ohio State selected Informatica PowerMart as its Extract, Transformation and Load (ETL) product to support its data warehousing data conversion needs. Covansys installed Informatica PowerMart and PowerPlug in May 2002. An overview of the installation was documented during the initial project. This information is included as a reference in Appendix A.

2. Informatica Architecture
During the Course Analytics Data Warehouse Pilot project, Informatica was installed on the DWDEV server. This environment was used to support the development, testing and data conversion requirements of the Course Analytics data mart. The development architecture included Informatica PowerMart 5.1.2 and PowerPlug 5.1 on a Windows 2000 Server platform, using Microsoft SQL Server 2000 to support the Informatica repositories. Covansys performed the installation of the production Informatica environment (dwprod1_repo) as part of the current project.

Ohio State wants to establish an Informatica architecture to support its future data warehousing projects. This section proposes an Informatica architecture that will support the development, quality assurance, and production needs of OSU. The diagram in Figure 1 proposes a repository structure that would support an enterprise implementation of Informatica.

etl_repository (development)
    Folders: ETL1 (proposed), ETL2 (proposed), ETL3 (proposed), Shared Objects, Developer 1, Developer 2

dwqa_repo (QA, proposed)
    Folders: ETL1, ETL2, ETL3

dwprod1_repo (production)
    Folders: ETL1, ETL2, ETL3

Figure 1. Proposed Repository Structure for OSU

The proposed Informatica environment will have three repositories: development (etl_repository), QA (dwqa_repo, proposed) and production (dwprod1_repo). Ideally each repository should exist on a different machine with its own Informatica server. The benefit of this architecture is that it will decrease development time and provide the management controls needed to run a production environment: developers can continue to develop while QA testing is being performed. However, due to the high cost of three Informatica server licenses, OSU could configure the development and QA repositories on one server. The risk associated with this approach is that development and QA testing performance may be impacted if both activities are performed at the same time.

The development and QA environments should each have their own databases for unit testing and system testing. It is recommended that each developer have his or her own database schema or file area against which to test mappings. This will ensure that each developer can work independently without causing conflicts with other developers.

2.1. Development Repository (etl_repository)


The development repository should have a folder for each developer and a folder for each grouping of mappings, e.g. student enrollment, departments, etc. For the purpose of explanation these folders have been named ETL1, ETL2, etc. in Figure 1. The following are guidelines to consider with a development environment:

- Create all new programs in the development repository (etl_repository).
- Each developer is assigned a folder where all of his or her development is done. Any versions used during development should be retained in the developer's folder.
- It is important to establish and use standard naming conventions. This topic is discussed in section 2.4.
- It is recommended that the mappings be divided into folders by business area, e.g. student information, financial aid, etc. Dividing the mappings by business area improves the visibility and maintainability of the code.
- After the mappings have been unit tested they should be moved to the appropriate QA folder (ETL1, ETL2, etc.) in the development repository, ready to be moved to the QA repository by the administrator.
- Do not move any unfinished programs to the QA folders. Exceptions should be approved by the appropriate team leader/manager.
- Business rules that can be used across mappings should be made reusable objects. Transformations such as mapplets, lookups and expressions can be made into reusable objects in Informatica and reused between mappings and repositories. These objects can be created in the Shared Objects folder. The use of shared objects will increase the productivity, maintainability, quality and consistency of the systems being built.
- Housekeeping: remove old programs on a frequent basis. The metadata is stored in a set of tables, so removing older (unused) programs will help speed access to the Informatica repository.
- Provide detailed metadata information on your programs.
- All pre-session and post-session scripts should be developed and unit tested in the development environment.
- Each program should be unit tested and documented in the unit test template. For unit testing purposes, the output can be written to the developer's personal schema or to the schema for unit testing. Unit testing should test each business rule coded in the mapping. The results should be verified against source data for each business rule and documented as part of the developer's standard practices.
- After unit testing is complete, move the final version of the programs to the appropriate QA folder, to be moved to the QA repository for system test. The following checklist should be verified in sequence:
  o Check that all the shared objects used by the program are available in the Shared Objects folder.
  o Move the input, output, mappings and the associated information to the appropriate folders.
- A program is considered complete when:
  o It has been unit tested for the business rules.
  o A control balancing test has been performed on the output data (i.e. the number of rows in the source and target match).
  o A referential integrity test has been done.
  o A data quality test has been done (i.e. bad data, if any, has been identified).
  o The mapping has been tested for performance.
  o All of the above has been documented and all the exceptions have been accounted for.
- It is recommended that after a program is moved from the Development folder to the QA folder, or from the QA folder to the Production folder, its version in the previous folder be deleted.
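The control balancing and referential integrity tests in the completion checklist can be scripted outside Informatica. The following is a minimal sketch in Python, assuming the source and target tables are reachable through a DB-API connection; an in-memory SQLite database and the table names src_enroll, tgt_enroll and dept are hypothetical stand-ins for the real source and target databases, which this document does not name.

```python
import sqlite3


def control_balance(conn, source_table, target_table):
    """Return (source_rows, target_rows, balanced) for two tables."""
    cur = conn.cursor()
    # Table names cannot be bound as SQL parameters, so they are interpolated.
    # Only pass trusted, hard-coded names to this function.
    src = cur.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = cur.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src, tgt, src == tgt


def orphan_rows(conn, child_table, child_key, parent_table, parent_key):
    """Return child key values that have no matching row in the parent table."""
    sql = (
        f"SELECT c.{child_key} FROM {child_table} c "
        f"LEFT JOIN {parent_table} p ON c.{child_key} = p.{parent_key} "
        f"WHERE p.{parent_key} IS NULL"
    )
    return [row[0] for row in conn.execute(sql)]


def demo():
    """Run both checks against a small, made-up data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_enroll (student_id INTEGER, dept_code TEXT);
        CREATE TABLE tgt_enroll (student_id INTEGER, dept_code TEXT);
        CREATE TABLE dept (dept_code TEXT PRIMARY KEY);
        INSERT INTO dept VALUES ('MATH'), ('CSE');
        INSERT INTO src_enroll VALUES (1, 'MATH'), (2, 'CSE'), (3, 'ART');
        INSERT INTO tgt_enroll VALUES (1, 'MATH'), (2, 'CSE'), (3, 'ART');
    """)
    balance = control_balance(conn, "src_enroll", "tgt_enroll")
    orphans = orphan_rows(conn, "tgt_enroll", "dept_code", "dept", "dept_code")
    conn.close()
    return balance, orphans
```

In this made-up data set the row counts balance, but the target row with department code ART has no parent row, so the referential integrity check reports it. A matching row count alone therefore does not show that the load is correct, which is why the checklist lists the tests separately.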

2.2. QA Repository (Proposed)


The QA repository should have a folder for each mapping group, e.g. student enrollment, departments, etc. For the purpose of explanation these folders have been named ETL1, ETL2, etc. in Figure 1. The following are guidelines to consider with a QA environment:

- Only the mappings from the QA folders (ETL1, ETL2, ETL3) in the development repository should be moved to the QA repository.
- If a mapping uses any stored procedures or external programs, they will have to be moved separately. Moving or copying a mapping from one repository to another does not copy or move any external procedures used in the mapping.
- Developers should perform all system testing in the QA repository. For system test purposes, developers should use actual data as the source.
- All performance tuning should be done in the QA environment. Performance tuning includes setting up memory parameters, cache parameters, etc.
- System testing also includes testing batch programs, running pre-session and post-session scripts, etc.
- The business analysts should validate the results from QA testing to ensure that the mapping executed all the business rules and that the output matches the expected output. Exceptions should be explained, documented and made available to the users.
- After system testing, the mapping should be moved to the appropriate folder in the Production repository.

2.3. Production Repository (dwprod1_repo)


The following are guidelines to consider with a production environment:

- Developers should not make any program changes in the Production environment. Exceptions should be approved by the appropriate team leader/manager.
- If any program changes are made in the Production environment, the changed programs should be copied back to the development repository. The exception is performance settings made to take advantage of the production server's processors and memory.
- The production programs should be set to read-only access.
- Production ODBC connections should be set up with administrative access only and should not be accessible to developers.
- Developers should synchronize the completed programs between the QA and Production repositories.

2.4. Naming Conventions


As in any architected environment, the naming of objects plays an important role in developing metadata. Informatica naming conventions can be based on either business areas or input sources. The following naming conventions can be adopted in Informatica.

Informatica mappings can be grouped into folders by business area. Each folder will include all the mappings that load a particular business area, e.g. student enrollment, departments, etc. Business areas can also be defined as data bands. The table below provides a few examples of abbreviations for data bands.

Data Band             Abbreviation
Student enrollment    stdnt_enroll
Department            dept

Naming conventions for Informatica mappings and objects:

Mappings
    All mappings should be named m_<target name> or m_<business name>.

Sources
    All sources should be named <table name>_<db name>.

    The table below provides a few examples of abbreviations for input sources.

    Input Source            Abbreviation
    Financial_assistance    fin_assist
    Ccourse_current         ccourse_curr

Targets
    The targets should have the same names as the target tables.

Transformations
    Transformations created for global business rules should be named
        <transformation type>_global_<description>
    For example, a global transformation to filter bad dates should be named
        filter_global_baddates
    A transformation local to a specific input source should be named
        expression_<input_source>

Sessions/Batches
        S_<mapping name>
        B_<functionality name>
    For example, if a set of three mappings that load student information are to be executed in a batch, the batch can be named
        b_student_information

ODBC Connections
    To better manage and control the ODBC connections, a naming convention should be adopted for the connections.
    For source ODBC connections:
        SRC_<database name>
    For target ODBC connections:
        TGT_<database name>
    This will help identify whether a connection is being used to extract data or to load data, and from which database the data is being extracted or to which database it is being loaded.
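Naming conventions such as these are easier to enforce when they can be checked mechanically. The following is a minimal sketch in Python of a name checker built from the patterns above. It is not an Informatica feature: the object type labels, the function names and the allowed character set (letters, digits, underscore and period) are assumptions made for illustration. Because the session and batch templates are written with upper-case prefixes while the example batch name uses lower case, the sketch accepts either.

```python
import re

# Allowed characters in the variable part of a name. This set is an
# assumption; the period is included so that version-suffixed names such as
# m_student_info_v_1.0 pass the check.
NAME = r"[A-Za-z0-9_.]+"

# One pattern per object type, derived from the conventions in this section.
PATTERNS = {
    "mapping": re.compile(rf"m_{NAME}"),
    "session": re.compile(rf"[sS]_{NAME}"),
    "batch": re.compile(rf"[bB]_{NAME}"),
    "source_connection": re.compile(rf"SRC_{NAME}"),
    "target_connection": re.compile(rf"TGT_{NAME}"),
    "global_transformation": re.compile(rf"[a-z]+_global_{NAME}"),
}


def check_name(kind, name):
    """Return True if `name` follows the convention for object type `kind`."""
    return PATTERNS[kind].fullmatch(name) is not None


def violations(objects):
    """Return the (kind, name) pairs that break their naming convention."""
    return [(kind, name) for kind, name in objects if not check_name(kind, name)]
```

For example, violations([("mapping", "m_dept"), ("target_connection", "ods_tgt")]) returns only the second pair, because the connection name does not start with TGT_. The list of objects to check would have to be exported from the repository by some other means, which is outside the scope of this sketch.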

3. Migration
3.1. Version Control
Informatica's version control features are very basic. It is recommended to use them only if no other version control tools are available. A mapping or a folder can be copied or moved from one repository/folder to another using Informatica's Repository Manager tool. A single mapping cannot be versioned in Informatica; Informatica versions the entire folder in which the mapping is stored.

3.2. Migration Processes


1. Copy the mapping to be modified from the production repository into the developer folder in the development repository.
2. Modify the mapping and unit test it.
3. Move the mapping to the appropriate QA folder in the development repository.
4. Version the folder in the QA repository to which the mapping needs to be copied. On versioning the folder, Informatica automatically saves the old version of the folder.
5. Make the new version the active version.
6. Copy/replace the mapping from the development repository into the new version of the folder in the QA repository.
7. System test the new version of the folder.
8. Move the new version of the folder into production.
9. Delete the old version from the production repository (optional).

The above approach will ensure that if for any reason OSU needs to back out of a version, the previous version still exists in the QA and production repositories.

It is recommended to use different versioning schemes for different kinds of releases. This approach will help keep track of the releases. Major releases should be versioned as 1.0, 2.0, etc., minor releases as 1.1, 1.2, 1.3, etc., and patches as 1.1.1, 1.1.2, etc.

On copying the new version of the folder into the production repository, it is recommended to delete the old version. If the mappings in the old folder are scheduled as a batch, the mappings and batches in the new version will have the same names, which can cause a runtime error in Informatica. This can be avoided by suffixing the new mapping and batch names with the version number, e.g. m_student_info_v_1.0, and by deleting the old schedule and creating a new schedule for the new folder.

When mappings are copied from one folder to another or from one repository to another, Informatica does not copy the session information. Each time a single mapping is copied, the session will have to be recreated. When an entire folder is copied from one repository to another, the session information is also copied.
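The release numbering scheme and the version suffix described above can be handled with a few small helpers. The following is a minimal sketch in Python, assuming versions are always written as two or three dot-separated numbers; the function names are hypothetical and nothing here calls Informatica itself.

```python
import re

# Two or three dot-separated numbers, e.g. 2.0, 1.3 or 1.1.2.
VERSION_RE = re.compile(r"\d+(\.\d+){1,2}")


def parse_version(text):
    """Turn '1.2' or '1.1.2' into a tuple of ints such as (1, 2) or (1, 1, 2)."""
    if not VERSION_RE.fullmatch(text):
        raise ValueError(f"not a release version: {text!r}")
    return tuple(int(part) for part in text.split("."))


def release_type(text):
    """Classify a version as 'major', 'minor' or 'patch'."""
    parts = parse_version(text)
    if len(parts) == 3:
        return "patch"
    return "major" if parts[1] == 0 else "minor"


def versioned_name(base_name, version):
    """Suffix a mapping or batch name with its version number."""
    parse_version(version)  # called only to validate the version string
    return f"{base_name}_v_{version}"


def latest(versions):
    """Return the highest version, comparing numerically rather than as text."""
    return max(versions, key=parse_version)
```

Comparing the parsed tuples rather than the raw strings matters once a release reaches two digits: as text, "1.9" sorts after "1.10", whereas latest(["1.9", "1.10"]) returns "1.10".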

4. Informatica Installation and Configuration


4.1. Overview
Before installing Informatica, create an empty schema in the database where the Informatica repository will be installed. To install Informatica, four kinds of user accounts are needed:

1. Windows administrator. This account is used to install the Informatica server.
2. Windows user that has the rights to run a service. This account is used to run the Informatica server.
3. Repository user. This account is created when configuring the Informatica server, using Informatica's Repository Manager.
4. Database user. This account is needed to create the repository in the database defined.

The following steps are required to install and configure the Informatica server on a Windows server. These steps were used to install Informatica to establish the development and production environments.

1. Install and configure the Informatica server.
2. Configure database connectivity for the repository, source and target databases.
3. Register the Informatica server in Informatica Server Manager.
4. Start the Informatica server.

4.2. Detailed Installation and Configuration
4.2.1. Installing the Informatica Server
1. Log on to the Windows machine using the account with administrator rights.
2. Run SETUP.EXE.
3. Choose the directory where the program needs to be installed.
4. When the Edit Service Account dialog box appears, enter the required information:

   Domain (optional):  .       [for development]
                       .       [for production]
   User:               <xxxx>  [for development]*
                       <xxxx>  [for production]*
   Password:           <xxxx>  [for development]
                       <xxxx>  [for production]

   * The user should have the rights to run a service.

4.2.2. Configuring the Informatica Server


After installing the Informatica server, select the Configure Informatica Server option.

1. Keys: Enter the platform keys and the database keys supplied with the software.
2. Network: Select the TCP/IP protocol and enter the following information:
   TCP/IP Host address (optional): Enter the TCP/IP host address as an IP number or a local host name
   TCP/IP Port number: 4001 [Informatica can only run on this port]
3. Repository:
   Repository name: Name of the repository to be created
   Database type: MS SQL Server
   Repository user: Administrator [Informatica default, can be changed if needed]
   Repository password: Administrator [Informatica default, can be changed if needed]
   Database user: <xxxx> [account for the database containing the repository]
   Database password: <xxxx> [password for the above account]
   Connect string: <servername@dbname> [the native connect string that the Informatica server uses to access the database]
   Domain (optional): For an MS SQL Server repository only, the NT domain of the database user specified above.
   Use Trusted Connection (optional): For an MS SQL Server repository only; if selected, the repository uses NT integrated security.
4. Compatibility and Database: Default settings were selected for this tab.
5. Miscellaneous: Default settings were selected for this tab.

4.2.3. Connecting to Databases


To connect to databases Informatica uses either native connectivity or ODBC. OSU uses ODBC connections to connect to the source and target databases.

4.2.4. Registering the Informatica Server


Open Informatica Server Manager and go to Server Configuration > Register Server. Enter the following information:

Server Name: <xxxx>
Host name: <xxxx> [Name or IP address of the machine on which the Informatica server is installed]
$PMRootDir: <C:\ProgramFiles\pmart>

Informatica defaults were chosen for the rest.

4.2.5. Starting and Stopping the Informatica Server


To start the server:

1. Verify that the repository database is running.
2. Log on as a user that has rights to run a service.
3. Go to Control Panel > Services > Informatica.
4. Click Start.
5. Go to Actions > Refresh; the status of the Informatica service should now be Started.
6. If the service fails to start, go to Administrative Tools > Event Viewer > Log > Application. Look for the source PowerMart, select the latest event and view the description of why the service failed to start.

To stop the server:

1. Log on as a user that has rights to run a service.
2. Go to Control Panel > Services > Informatica.
3. Click Stop.

The Informatica server can also be stopped using Informatica Server Manager. However, in order to do that you must be an administrator or super user in Informatica.
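The steps above confirm the state of the service through the Services panel. As a supplementary check that is not part of the documented procedure, a short script can test whether anything is accepting connections on the TCP port configured in section 4.2.2 (4001). The following is a minimal sketch in Python; the function name and the timeout value are illustrative assumptions.

```python
import socket


def server_is_listening(host, port=4001, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP connect;
        # the connection is closed again as soon as the with block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A successful connection only shows that some process is listening on the port; it does not prove that the Informatica server is healthy, so the Event Viewer remains the place to look when the service fails to start.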

Appendix A Pilot Project Informatica Installation

A.1. Overview

The following provides a summary of the Informatica installation for the support of the Course Analytics Data Warehouse Pilot project:

- Informatica was installed on server DWDEV, which will be used for the development of the pilot project. The version of Informatica installed was PowerMart 5.1.2 for Windows 2000. The database for the Informatica repository is SQL Server 2000.
- PowerPlug 5.1 will be used to import metadata from ERwin into the Informatica repository. The information in the Informatica repository will be used by BRIO to publish metadata to the data warehouse users.
- ERwin, PowerPlug, Informatica, and the Data Warehouse Pilot database all reside on the DWDEV server.
- PowerPlug 5.1 was installed on the DWDEV server so it can be accessed remotely using Terminal Services Client, which is widely used at The Ohio State University.
- The Informatica PowerMart clients were installed on several machines that will be used by developers, data modelers, and database administrators.
- Since source systems were not yet identified during the installation process, a copy of the operational data store was used to test the connection between a source database and Informatica.
- To ensure the connectivity between ERwin and Informatica, the physical data model of the operational data store in ERwin was imported into the Informatica repository using PowerPlug 5.1.
- A trial run was conducted using Informatica in which data was extracted from the operational data store and loaded into the target database. This test ensured the connectivity of Informatica between the source and target systems.
- Due to security concerns, usernames, passwords, and other connectivity settings are not stated in this document. For further information regarding these settings, contact the Office of Information Technology.

A.2. Pilot Project Informatica Architecture

[Figure: pilot project Informatica architecture diagram. Labels recoverable from the diagram:
 Users: Informatica Developer/Admin, ERwin Developer/Admin, Brio Developer/Admin
 Software: ERwin 4.0, PowerPlug 5.1, Informatica Server, Brio
 Databases (SQL Server 2000): Copy of ODS Database, Informatica Repository, DW Pilot Database
 Servers (Windows 2000): GEORGE, CARME, ERODSDEV, DWDEV]
