
What are Target Types on the Server?

Target types are relational, file, and XML.

What are Target Options on the Server?


Target options for the file target type are FTP File, Loader, and MQ. There are no target options for the ERP target type. Target options for relational targets are Insert, Update (as Update), Update (as Insert), Update (else Insert), Delete, and Truncate Table.

How do you identify existing rows of data in the target table using a Lookup transformation?

You can use a connected Lookup transformation with a dynamic cache on the target.
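As an illustration (a sketch, not the only approach), an alternative pattern looks up the target's key with an unconnected lookup and flags each row in an Update Strategy transformation; the lookup and port names below are hypothetical, while DD_INSERT and DD_UPDATE are the standard update-strategy constants:

    IIF(ISNULL(:LKP.lkp_target_customer(CUSTOMER_ID)), DD_INSERT, DD_UPDATE)

With a connected lookup that uses a dynamic cache, the cache itself typically indicates whether the row was newly inserted (via the NewLookupRow port in later PowerCenter versions), so the same insert-versus-update decision can be driven from that port instead.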

What is the Aggregator transformation?

The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation lets you perform calculations on groups.

What are the various types of aggregation?

AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, and VARIANCE.
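For intuition, an Aggregator with DEPT_ID as the group-by port and SUM and AVG output ports computes roughly what this SQL does (table and column names here are hypothetical):

    SELECT DEPT_ID, SUM(SALARY) AS TOTAL_SAL, AVG(SALARY) AS AVG_SAL
    FROM EMPLOYEES
    GROUP BY DEPT_ID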

What are dimensions and the various types of dimensions?


The DWH contains two types of tables: dimension tables and fact tables. Dimensions are classified into three types. 1. SCD Type 1 (Slowly Changing Dimension): contains current data only. 2. SCD Type 2: contains current data plus complete historical data. 3. SCD Type 3: contains current data plus one previous value of the changed attribute.
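As a rough SQL illustration against a hypothetical DIM_CUSTOMER table, a Type 1 load overwrites the changed attribute with no history, while a Type 2 load inserts a new versioned row and keeps the old one:

    -- Type 1: overwrite, no history kept
    UPDATE DIM_CUSTOMER SET CITY = 'Berlin' WHERE CUSTOMER_ID = 1001;

    -- Type 2: keep full history by adding a new version of the row
    INSERT INTO DIM_CUSTOMER (CUSTOMER_KEY, CUSTOMER_ID, CITY, VERSION)
    VALUES (50002, 1001, 'Berlin', 2);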

What are the two modes of data movement in the Informatica Server?

The data movement mode depends on whether the Informatica Server should process single-byte or multibyte character data. This mode selection can affect the enforcement of code page relationships and code page validation in the Informatica Client and Server. a) Unicode: the server allows two bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters). b) ASCII: the server holds all data in a single byte. The data movement mode can be changed in the Informatica Server configuration parameters and takes effect once you restart the Informatica Server.

What is Code Page Compatibility?

Compatibility between code pages is used for accurate data movement when the Informatica Server runs in Unicode data movement mode. If the code pages are identical, there will not be any data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page. Superset: a code page is a superset of another code page when it contains all characters encoded in the other code page plus additional characters not contained in the other code page. Subset: a code page is a subset of another code page when all characters in the code page are encoded in the other code page.

What is a Code Page used for?

A code page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.

What is the Router transformation?

A Router transformation allows you to use a condition to test data and is similar to a Filter transformation. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition, whereas a Router transformation tests data for one or more conditions and gives you the option to route rows that do not meet any of the conditions to a default output group. The added advantage over the Filter transformation is that rejected records can also be routed as required: based on the conditions, you can route the data into different targets or target instances.

What is the Load Manager?

While running a workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks: 1. Locks the workflow and reads workflow properties. 2. Reads the parameter file and expands workflow variables. 3. Creates the workflow log file. 4. Runs workflow tasks. 5. Distributes sessions to worker servers. 6. Starts the DTM to run sessions. 7. Runs sessions from master servers. 8. Sends post-session email if the DTM terminates abnormally.

When the PowerCenter Server runs a session, the DTM performs the following tasks: 1. Fetches session and mapping metadata from the repository. 2. Creates and expands session variables. 3. Creates the session log file. 4. Validates session code pages if data code page validation is enabled; checks query conversions if data code page validation is disabled. 5. Verifies connection object permissions. 6. Runs pre-session shell commands. 7. Runs pre-session stored procedures and SQL. 8. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data. 9. Runs post-session stored procedures and SQL. 10. Runs post-session shell commands. 11. Sends post-session email.

What is the Data Transformation Manager?

After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage the threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers; this is also known as buffer memory. It creates the main thread, called the master thread, which creates and manages all other threads. If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica Server writes messages to the session log, it includes the thread type and thread ID. The DTM creates the following types of threads:

Master thread: the main thread of the DTM process; creates and manages all other threads. Mapping thread: one thread for each session; fetches session and mapping information. Pre- and post-session threads: one thread each to perform pre- and post-session operations. Reader thread: one thread for each partition for each source pipeline. Writer thread: one thread for each partition if a target exists in the source pipeline to write to the target. Transformation thread: one or more transformation threads for each partition.

What are Sessions and Batches?

Session: a session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session. Batches: a batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches: sequential (runs sessions one after the other) and concurrent (runs sessions at the same time).

What is a Source Qualifier?

When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads when it executes a session.

Why do we use Lookup transformations?

Lookup transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation, we can accomplish the following tasks: get a related value (for example, get the employee name from the employee table based on the employee ID); perform a calculation; and update slowly changing dimension tables (we can use an unconnected Lookup transformation to determine whether a record already exists in the target). Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. You can import a lookup definition from any relational database to which both the Informatica Client and Server can connect, and you can use multiple Lookup transformations in a mapping.

While importing a relational source definition from a database, what metadata of the source do you import?

Source name, database location, column names, datatypes, and key constraints.

In how many ways can you update a relational source definition, and what are they?

In two ways: 1) by reimporting the source definition; 2) by editing the source definition.

Where should you place the flat file to import the flat file definition into the Designer?

There is no restriction on where to place the source file. From a performance point of view, it is better to place the file in the server's local src folder; if you need the path, check the server properties available in the Workflow Manager. This does not mean we cannot place it in any other folder, but if we place it in the server src folder it is selected by default at session creation time.

To provide support for mainframe source data, which files are used as source definitions?

COBOL copybook files.

Which transformation do you need when using COBOL sources as source definitions?

The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

How can you create or import a flat file definition into the Warehouse Designer?

You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file; when the Informatica Server runs the session, it creates and loads the flat file. (Another view: you can import a flat file directly into the Warehouse Designer; this way it imports the field definitions directly.) You can also create a flat file definition in the Warehouse Designer: create a new target, select the type as flat file, save it, and enter the various columns for that target by editing its properties. Once the target is created and saved, you can import it from the Mapping Designer.

What is a mapplet?

A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings. For example, suppose we have several fact tables that require a series of dimension keys; we can create a mapplet that contains a series of Lookup transformations to find each dimension key and use it in each fact table mapping, instead of recreating the same lookup logic in each mapping. A mapplet is a part (subset) of a mapping.

A mapplet should have a Mapplet Input transformation, which receives input values, and a Mapplet Output transformation, which passes the final modified data back to the mapping. When the mapplet is displayed within the mapping, only the input and output ports are displayed, so the internal logic is hidden from the end user's point of view.

What are the Designer tools for creating transformations?

Mapping Designer, Transformation Developer, and Mapplet Designer.

What are active and passive transformations?

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

What are connected and unconnected transformations?

An unconnected transformation is not connected to another transformation in the pipeline; instead, it is called from within another transformation. It can be called from as many other transformations as needed, so if you need the same logic several times, an unconnected transformation can give better performance. A connected transformation is part of the data flow in the pipeline, while an unconnected transformation is not: much like calling a program by name versus by reference. Use unconnected transformations when you want to call the same transformation many times in a single mapping.

In how many ways can you create ports?

Two ways: 1. drag the port from another transformation; 2. click the Add button on the Ports tab.

What are reusable transformations?

Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since the instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect these changes. This feature can save you a great deal of work.

A transformation that can be reused is known as a reusable transformation.

You can create a reusable transformation in two ways: build it in the Transformation Developer, or create a normal transformation in a mapping and promote it to reusable.

What are the unsupported repository objects for a mapplet?

COBOL source definitions, Joiner transformations, Normalizer transformations, non-reusable Sequence Generator transformations, pre- or post-session stored procedures, target definitions, PowerMart 3.5-style LOOKUP functions, XML source definitions, and IBM MQ source definitions.

What are mapping parameters and mapping variables?

A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica Server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time you run the session. You can use mapping parameters and variables in the SQL query, user-defined join, and source filter of a Source Qualifier transformation. You can also use the system variable $$$SessStartTime. The Informatica Server first generates an SQL query and scans the query to replace each mapping parameter or variable with its start value, then executes the query on the source database.

Can you use the mapping parameters or variables created in one mapping in another mapping?

No. We can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which they were created.

Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?

Yes, because a reusable transformation is not contained within any particular mapplet or mapping.
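A minimal parameter file sketch for reference (folder, workflow, session, and parameter names here are hypothetical; the section-header format shown follows the usual [folder.WF:workflow.ST:session] convention):

    [SalesFolder.WF:wf_load_sales.ST:s_m_load_sales]
    $$LoadDate=12/31/2003
    $DBConnection_Source=ORA_SALES_SRC

The mapping parameter can then appear in, for example, the Source Qualifier source filter as SALES_DATE <= '$$LoadDate'; the Informatica Server substitutes the value from the file before it runs the generated query.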
How can you improve session performance in an Aggregator transformation?

Use sorted input: 1. place a Sorter transformation before the Aggregator; 2. do not forget to check the Sorted Input option on the Aggregator to tell it that the input is sorted on the same keys as the group by. The key order is also very important.

What is the aggregate cache in the Aggregator transformation?

The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.

What are the differences between the Joiner transformation and the Source Qualifier transformation?

A Source Qualifier joins homogeneous sources, while a Joiner joins heterogeneous sources: you can join heterogeneous data sources with a Joiner transformation, which you cannot achieve with a Source Qualifier transformation. You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need such key relationships to join two sources with a Joiner. Two relational sources must come from the same data source to be joined in a Source Qualifier; with a Joiner you can also join relational sources that come from different data sources.

In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
You cannot use a Joiner transformation when: both pipelines begin with the same original data source; both input pipelines originate from the same Source Qualifier transformation; both input pipelines originate from the same Normalizer transformation; both input pipelines originate from the same Joiner transformation; either input pipeline contains an Update Strategy transformation; or either input pipeline contains a connected or unconnected Sequence Generator transformation.

What are the settings that you use to configure the Joiner transformation?

Master and detail source, type of join, and the join condition. The Joiner transformation supports the following join types, which you set on the Properties tab: Normal (default), Master Outer, Detail Outer, and Full Outer.

What are the join types in the Joiner transformation, and how do you create one?

In the Mapping Designer, choose Transformation-Create and select the Joiner transformation. Enter a name and click OK; the naming convention for Joiner transformations is JNR_TransformationName. Enter a description for the transformation (this description appears in the Repository Manager, making it easier for you or others to understand or remember what the transformation does). The Designer creates the Joiner transformation. Keep in mind that you cannot use a Sequence Generator or Update Strategy transformation as a source to a Joiner transformation. Drag all the desired input/output ports from the first source into the Joiner transformation; the Designer creates input/output ports for the source fields in the Joiner as detail fields by default (you can edit this property later). Select and drag all the desired input/output ports from the second source into the Joiner transformation; the Designer configures the second set of source fields as master fields by default. Double-click the title bar of the Joiner transformation to open the Edit Transformations dialog box and select the Ports tab. Click any box in the M column to switch the master/detail relationship for the sources; change the master/detail relationship if necessary by selecting the master source in the M column. Tip: designating the source with fewer unique records as the master increases performance during the join. Add default values for specific ports as necessary; certain ports are likely to contain NULL values, since the fields in one of the sources may be empty, and you can specify a default value if the target database does not handle NULLs. Select the Condition tab and set the condition; click the Add button to add a condition, and you can add multiple conditions. The master and detail ports must have matching datatypes, and the Joiner transformation supports only equivalent (=) joins. Select the Properties tab and enter any additional settings for the transformation.

Click OK, then choose Repository-Save to save changes to the mapping.

The join types are: Normal (default), which keeps only matching rows from both master and detail; Master Outer, which keeps all detail rows and only the matching rows from the master; Detail Outer, which keeps all master rows and only the matching rows from the detail; and Full Outer, which keeps all rows from both master and detail, matching or not.

What are the Joiner caches?
Specifies the directory used to cache master records and the index to these records. By default, the cached files are created in a directory specified by the server variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive.
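To relate the join types described above to SQL, a rough sketch (MASTER_T and DETAIL_T are hypothetical tables joined on ID):

    -- Normal:       DETAIL_T D INNER JOIN MASTER_T M ON D.ID = M.ID
    -- Master Outer: DETAIL_T D LEFT OUTER JOIN MASTER_T M ON D.ID = M.ID   (all detail rows kept)
    -- Detail Outer: MASTER_T M LEFT OUTER JOIN DETAIL_T D ON D.ID = M.ID   (all master rows kept)
    -- Full Outer:   DETAIL_T D FULL OUTER JOIN MASTER_T M ON D.ID = M.ID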

What is the Lookup transformation?

Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. The Informatica Server queries the lookup table based on the lookup ports in the transformation and compares the Lookup transformation port values to the lookup table column values based on the lookup condition. Using it, we can access data from a relational table that is not a source in the mapping. For example, suppose the source contains only Empno but we also want Empname in the mapping; instead of adding another table that contains Empname as a source, we can look up the table and get the Empname into the target.

Why use the Lookup transformation?

To perform the following tasks. Get a related value: for example, if your source table includes an employee ID but you want to include the employee name in your target table to make your summary data easier to read. Perform a calculation: many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales). Update slowly changing dimension tables: you can use a Lookup transformation to determine whether records already exist in the target.

What are the types of lookup?

Connected lookup and unconnected lookup; a lookup cache can be persistent, re-cached from the database, static, dynamic, or shared.
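For reference, an unconnected lookup is invoked from an expression with the :LKP prefix; a minimal sketch (the lookup and port names here are hypothetical):

    :LKP.lkp_employee_name(EMP_ID)

The expression passes EMP_ID to the lookup's input port and receives the single designated return port (for example, the employee name).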

Differences between connected and unconnected lookup?

Connected lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.

Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use a static cache only.
- The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.

In addition, a connected lookup can return or pass multiple columns of data per row (or multiple groups), whereas an unconnected lookup can return only one port. Also, in a connected lookup, if the condition is not satisfied the default value is returned; in an unconnected lookup, if the condition is not satisfied NULL is returned.

What is meant by lookup caches?
The Informatica Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica Server stores condition values in the index cache and output values in the data cache.

Difference between static cache and dynamic cache?

Say, for example, your lookup table is your target table. When you create the lookup with a dynamic cache, it looks up values and, if there is no match, inserts the row into both the target and the lookup cache (hence "dynamic" cache: it builds up as you go along); if there is a match, it updates the row in the target. Static caches, on the other hand, do not get updated when you do a lookup.

Dynamic cache:
- You can insert rows into the cache as you pass rows to the target.
- The Informatica Server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or target table, and you can pass these rows to the target table.

Static cache:
- You cannot insert or update the cache.
- The Informatica Server returns a value from the lookup table or cache when the condition is true. When the condition is not true, the Informatica Server returns the default value for connected transformations and NULL for unconnected transformations.

Which transformation should we use to normalize COBOL and relational sources?

The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to organize the data according to your own needs. A Normalizer transformation can appear anywhere in a data flow when you normalize a relational source. Use a Normalizer transformation instead of the Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.

How does the Informatica Server sort string values in the Rank transformation?

When the Informatica Server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica Server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

What are the Rank caches?

During the session, the Informatica Server compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.

What is the RANKINDEX in the Rank transformation?

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the rank index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5. The port on which you want to generate the rank is known as the rank port; the generated values are known as the rank index.
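As a rough SQL analogue of a Rank transformation that keeps the top 5 salespersons per quarter (the SALES table is hypothetical; RANKINDEX corresponds to the computed rank):

    SELECT QUARTER, SALES_PERSON, SALES_AMOUNT,
           RANK() OVER (PARTITION BY QUARTER ORDER BY SALES_AMOUNT DESC) AS RANKINDEX
    FROM SALES
    -- keep only rows where RANKINDEX <= 5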

What is the Router transformation?

A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.

What are the types of groups in a Router transformation?

A Router transformation has input and output groups. Input group: the Designer copies property information from the input ports of the input group to create a set of output ports for each output group. Output groups: there are two types of output groups, user-defined groups and the default group. You cannot modify or delete output ports or their properties, and you cannot modify or delete the default group.

Why do we use the Stored Procedure transformation?

A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements.

What are the types of data that pass between the Informatica Server and a stored procedure?

Three types of data: input/output parameters, return values, and status codes.

What is the status code?

The status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is used only by the Informatica Server to determine whether to continue running the session or to stop.

What is the Source Qualifier transformation?

When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads when it executes a session. It lets you do the following. Join data originating from the same source database: you can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier. Filter records when the Informatica Server reads source data: if you include a filter condition, the Informatica Server adds a WHERE clause to the default query. Specify an outer join rather than the default inner join: if you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query. Specify sorted ports: if you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query. Select only distinct values from the source: if you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.

Create a custom query to issue a special SELECT statement for the Informatica Server to read source data: for example, you might use a custom query to perform aggregate calculations or execute a stored procedure.

What is the target load order?

You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping.

What is the default join that the Source Qualifier provides?

By default, the Source Qualifier joins the sources with an inner equijoin based on the primary key-foreign key relationships. (The join types Normal, Master Outer, Detail Outer, and Full Outer, set on the Properties tab, belong to the Joiner transformation.)

What are the basic needs to join two sources in a Source Qualifier?

Both tables should have a common field with the same datatype. It is not necessary that they follow a primary key-foreign key relationship, although any such relationship helps from a performance point of view. Also, if you are using a lookup in your mapping and the lookup table is small, try to join that lookup in the Source Qualifier to improve performance.
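As an illustration, for a Source Qualifier over a hypothetical ORDERS table with a filter condition, two sorted ports, and Select Distinct enabled, the generated default query would look roughly like:

    SELECT DISTINCT ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_DATE
    FROM ORDERS
    WHERE ORDERS.STATUS = 'SHIPPED'
    ORDER BY ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID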

What is the Update Strategy transformation?

The model you choose constitutes your update strategy: how to handle changes to existing rows. In PowerCenter and PowerMart, you set your update strategy at two different levels. Within a session: when you configure a session, you can instruct the Informatica Server to either treat all rows in the same way (for example, treat all rows as inserts) or use instructions coded into the session mapping to flag rows for different database operations. Within a mapping: within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.

Describe the two levels at which the update strategy is set.

Within a session, where you instruct the Informatica Server either to treat all records in the same way (for example, treat all records as inserts) or to use the instructions coded into the session mapping; and within a mapping, where you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

What is the default source option for the Update Strategy transformation?

Data driven.

What is Data Driven?

The Informatica Server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag rows for insert, delete, update, or reject. If the mapping for the session contains an Update Strategy transformation, this field is marked Data Driven by default. When the Data Driven option is selected in the session properties, the server uses the update strategy coded in the mapping (DD_UPDATE, DD_INSERT, DD_DELETE, DD_REJECT) and not the options selected in the session properties.

What are the options in the target session for the Update Strategy transformation?

Insert, Delete, Update (Update as Update, Update as Insert, Update else Insert), and Truncate Table. Update as Insert: this option specifies that all update records from the source be flagged as inserts in the target; in other words, instead of updating the records in the target, they are inserted as new records. Update else Insert: this option enables Informatica to flag records either for update, if they are old, or for insert, if they are new records from the source.

What are the types of mapping wizards provided in Informatica?

Simple Pass Through

Slowly Growing Target

Slowly Changing Dimension

Type 1: Most recent values

Type 2: Full history (version, flag, or date). Type 3: Current and one previous value.

The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are designed to create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact table. Getting Started Wizard: creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables. Slowly Changing Dimensions Wizard: creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.

What are the types of mappings in the Getting Started Wizard?

Simple Pass Through mapping: loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table before loading new data. Slowly Growing Target: loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.

What are the mappings that we use for slowly changing dimension tables?

Type 1: rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data. Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.

Type 2: the Type 2 Dimension/Version Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table. Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table; version numbers and versioned primary keys track the order of changes to each dimension.

Type 3: the Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves the existing data in a different column of the same row and replaces the existing data with the update.

What are the different types of Type 2 dimension mappings?

Type 2 Dimension/Version Data mapping: in this mapping, an updated dimension in the source gets inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a primary key. Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions; in addition, it creates a flag value for a changed or new dimension. The flag indicates whether the dimension is new or newly updated: recent dimensions are saved with a current flag value of 1, and updated (historical) dimensions are saved with the value 0. Type 2 Dimension/Effective Date Range mapping: this is another flavour of Type 2 mapping used for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by the effective date range for each version of each dimension.

How can you recognise whether or not newly added rows in the source get inserted into the target?

In the Type 2 mapping we have three options to recognise the newly added rows: version number, flag value, and effective date range.
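As a rough SQL sketch of the flag and effective-date-range flavours described above (the DIM_CUSTOMER table and its columns are hypothetical), an update closes out the current row and inserts the new version:

    -- Flag current: expire the old row, insert the new one flagged as current
    UPDATE DIM_CUSTOMER SET CURRENT_FLAG = 0 WHERE CUSTOMER_ID = 1001 AND CURRENT_FLAG = 1;
    INSERT INTO DIM_CUSTOMER (CUSTOMER_KEY, CUSTOMER_ID, CITY, CURRENT_FLAG)
    VALUES (50003, 1001, 'Hamburg', 1);

    -- Effective date range: close the old row's END_DATE, open a new row
    UPDATE DIM_CUSTOMER SET END_DATE = '2004-06-30' WHERE CUSTOMER_ID = 1001 AND END_DATE IS NULL;
    INSERT INTO DIM_CUSTOMER (CUSTOMER_KEY, CUSTOMER_ID, CITY, BEGIN_DATE, END_DATE)
    VALUES (50004, 1001, 'Hamburg', '2004-07-01', NULL);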

What are the two types of processes that the Informatica Server uses to run a session?

The Load Manager process: starts the session, creates the DTM process, and sends post-session email when the session completes. The DTM process: creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

What are the new features of the Server Manager in Informatica 5.0?

You can use command line arguments for a session or batch; this allows you to change the values of session parameters, mapping parameters, and mapping variables. Parallel data processing: this feature is available for PowerCenter only; if you use the Informatica Server on an SMP system, you can use multiple CPUs to process a session concurrently. Processing session data using threads: the Informatica Server runs the session in two processes, as explained in the previous question.

Can you generate reports in Informatica?

Yes, by using the Metadata Reporter we can generate reports in Informatica. Informatica is an ETL tool, so you cannot build business-analysis reports from it, but you can generate metadata reports.

What is the Metadata Reporter?

It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter, you can access information about your repository without having knowledge of SQL, the transformation language, or the underlying tables in the repository.

Define mapping and session.

Mapping: a set of source and target definitions linked by transformation objects that define the rules for data transformation. Session: a set of instructions that describe how and when to move data from sources to targets.

Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica Server?

Informatica Workflow Manager and Informatica Workflow Monitor.

Why do we partition a session in Informatica?

Performance can be improved by processing data in parallel in a single session by creating multiple partitions of the pipeline. The Informatica Server can achieve high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

To achieve session partitioning, what are the necessary tasks you have to do?

Configure the session to partition source data, and install the Informatica Server on a machine with multiple CPUs.

How does the Informatica Server increase session performance through partitioning the source?

For relational sources, the Informatica Server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection; it reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica Server creates multiple connections to the target and loads partitions of data concurrently. For XML and file sources, the Informatica Server reads multiple files concurrently; for loading the data, it creates a separate file for each partition of a source file, and you can choose to merge the targets.

Why do we use repository connectivity?

Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

What are the tasks that the Load Manager process performs?

Manages session and batch scheduling: when you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica Server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process. Locking and reading the session: when the Informatica Server starts a session, the Load Manager locks the session in the repository; locking prevents you from starting the session again and again. Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file. Verifying permissions and privileges: when the session starts, the Load Manager checks whether or not the user has the privileges to run the session. Creating log files: the Load Manager creates a log file containing the status of the session.

What is the DTM process?

After the Load Manager performs validations for the session, it creates the DTM process. The DTM creates and manages the threads that carry out the session tasks. It creates the master thread, which creates and manages all the other threads.

What are the different threads in the DTM process?

Master thread: creates and manages all other threads.

Mapping thread: one mapping thread is created for each session; it fetches session and mapping information. Pre- and post-session threads: created to perform pre- and post-session operations. Reader thread: one thread is created for each partition of a source; it reads data from the source. Writer thread: created to load data to the target. Transformation thread: created to transform data.

What are the output files that the Informatica Server creates while running a session?

Informatica Server log: the Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory. Session log file: the Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, the creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set. Session detail file: this file contains load statistics for each target in the mapping. Session details include information such as table name and the number of rows written or rejected; you can view this file by double-clicking the session in the Monitor window. Performance detail file: this file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet. Reject file: this file contains the rows of data that the writer does not write to targets. Control file: the Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader. Post-session email: post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails. Indicator file: if you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.

Output file: if a session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet. Cache files: when the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations: Aggregator, Joiner, Rank, and Lookup.

In which circumstances does the Informatica Server create reject files?

When it encounters DD_REJECT in an Update Strategy transformation, when a row violates a database constraint, or when a field in a row is truncated or overflows.

What is polling?

It displays the updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica Server.

Can you copy a session to a different folder or repository?

Yes. By using the Copy Session wizard, you can copy a session into a different folder or repository, but that target folder or repository must contain the mapping of that session. If the target folder or repository does not have the mapping of the session being copied, you have to copy that mapping first before you copy the session. In addition, you can copy the workflow from the Repository Manager; this automatically copies the mapping, associated sources, targets, and session to the target folder.

What is a batch, and what are the types of batches?

A grouping of sessions is known as a batch. Batches are of two types. Sequential: runs sessions one after the other. Concurrent: runs sessions at the same time. If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another; if you have several independent sessions, you can use a concurrent batch, which runs all the sessions at the same time. More generally, a batch is a group of anything; different batches are different groups of different things.

Can you copy batches?

No.

How many sessions can you create in a batch?

Any number of sessions.

When does the Informatica Server mark a batch as failed?

If one of the sessions is configured to "run if previous completes" and that previous session fails.

What is the command used to run a batch?

pmcmd is used to start a batch.

What are the different options used to configure sequential batches?

Two options: run the session only if the previous session completes successfully, or always run the session.

In a sequential batch, can you run a session if the previous session fails?

Yes, by setting the option to always run the session.

Can you start a batch within a batch?

You cannot. If you want to start a batch that resides in a batch, create a new independent batch and copy the necessary sessions into the new batch. (Logically, yes.)

Can you start a session inside a batch individually?

We can start the required session individually only in the case of a sequential batch; in the case of a concurrent batch we cannot do this.

How can you stop a batch?

By using the Server Manager or pmcmd.

What are the session parameters?

Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files. The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters. Database connections. Source file name: use this parameter when you want to change the name or location of a session source file between session runs. Target file name: use this parameter when you want to change the name or location of a session target file between session runs. Reject file name: use this parameter when you want to change the name or location of session reject files between session runs.

What is a parameter file?

A parameter file defines the values for parameters and variables used in a session. A parameter file is a file created with a text editor such as WordPad or Notepad. You can define the following values in a parameter file: mapping parameters, mapping variables, and session parameters. When you start a workflow, you can optionally enter the directory and name of a parameter file. The Informatica Server runs the workflow using the parameters in the file you specify. For UNIX shell users, enclose the parameter file name in single quotes: -paramfile '$PMRootDir/myfile.txt'. For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces; if the name includes spaces, enclose the file name in double quotes: -paramfile "$PMRootDir\my file.txt". Note: when you write a pmcmd command that includes a parameter file located on another machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable: pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt'

How can you access a remote source in your session?

Relational source: to access a relational source situated in a remote place, you need to configure a database connection to the data source. File source: to access a remote source file, you must configure the FTP connection to the host machine before you create the session. Heterogeneous: when your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.

What is the difference between partitioning of relational targets and partitioning of file targets?

If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition; you can configure session properties to merge these target files. Partitioning can be done on both relational and flat file targets. Informatica supports the following partition types: 1. database partitioning, 2. round-robin, 3. pass-through, 4. hash-key partitioning, 5. key-range partitioning. All of these are applicable for relational targets; for flat files, only database partitioning is not applicable. Informatica supports N-way partitioning: you can just specify the name of the target file and create the partitions, and the rest is taken care of by the Informatica session.

What are the transformations that restrict the partitioning of sessions?

Advanced External Procedure transformation and External Procedure transformation: these transformations contain a check box on the Properties tab to allow partitioning. Aggregator transformation: if you use sorted ports, you cannot partition the associated source. Joiner transformation: you cannot partition the master source for a Joiner transformation. Normalizer transformation. XML targets.

Performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so sessions run during the available load window for the Informatica Server. You can increase session performance as follows.

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network connections. Flat files: if your flat files are stored on a machine other than the Informatica Server, move those files to the machine on which the Informatica Server runs. Relational data sources: minimize the connections to sources, targets, and the Informatica Server to improve session performance; moving the target database onto the server system may improve session performance. Staging areas: if you use staging areas, you force the Informatica Server to perform multiple data passes; removing staging areas may improve session performance. You can run multiple Informatica Servers against the same repository; distributing the session load to multiple Informatica Servers may improve session performance. Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte whereas Unicode mode takes two bytes to store a character. If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance; also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes. We can improve session performance by configuring the network packet size, which allows more data to cross the network at one time; to do this, go to the Server Manager, choose Server Configure, and then Database Connections. If your target has key constraints and indexes, they slow the loading of data; to improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes. Running parallel sessions by using concurrent batches also reduces the time for loading the data, so concurrent batches may increase session performance. Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines. In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance. If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache. If your session contains a Filter transformation, create that Filter transformation near the sources, or use a filter condition in the Source Qualifier. Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it; to improve session performance in this case, use the sorted ports option.

What is the difference between a mapplet and a reusable transformation?

A mapplet consists of a set of transformations that is reusable, whereas a reusable transformation is a single transformation that can be reused. If you create variables or parameters in a mapplet, they cannot be used in another mapping or mapplet, unlike the variables created in a reusable transformation, which can be used in any other mapping or mapplet. We cannot include source definitions in reusable transformations, but we can add sources to a mapplet. The whole transformation logic is hidden in the case of a mapplet, but it is transparent in the case of a reusable transformation. We cannot use COBOL Source Qualifier, Joiner, or Normalizer transformations in a mapplet, whereas we can make them reusable transformations. Mapplet: one or more transformations. Reusable transformation: only one transformation.

Define the Informatica repository.

The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version. Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.

Informatica repository: the Informatica repository is at the center of the Informatica suite. You create a set of metadata tables within the repository database that the Informatica applications and tools access. The Informatica Client and Server access the repository to save and retrieve metadata.

What are the types of metadata stored in the repository?

The following types of metadata are stored in the repository: database connections, global objects, mappings, mapplets, multidimensional metadata, reusable transformations, sessions and batches, shortcuts, source definitions, target definitions, and transformations. Source definitions: definitions of database objects (tables, views, synonyms) or files that provide source data. Target definitions: definitions of database objects or files that contain the target data. Multi-dimensional metadata: target definitions that are configured as cubes and dimensions. Mappings: a set of source and target definitions along with transformations containing business logic that you build into the transformation; these are the instructions that the Informatica Server uses to transform and move data. Reusable transformations: transformations that you can use in multiple mappings. Mapplets: a set of transformations that you can use in multiple mappings. Sessions and workflows: sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow; each session corresponds to a single mapping.

What is the PowerCenter repository?

Standalone repository: a repository that functions individually, unrelated and unconnected to other repositories. Global repository (PowerCenter only): the centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository; the global repository can contain common objects to be shared throughout the domain through global shortcuts. Local repository (PowerCenter only): a repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders.

How can you work with a remote database in Informatica? Did you work directly using remote connections?
You can work with a remote database, but you have to configure the connection details:
FTP connection details
IP address
User authentication
To work with a remote data source you need to connect to it through a remote connection. However, it is not preferable to work with the remote source directly through a remote connection; instead, bring that source onto the local machine where the Informatica Server resides. If you work directly with the remote source, session performance decreases because only a limited amount of data can be passed across the network in a given time.

What is incremental aggregation?
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture the changes, you can configure the session to process only those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.

What are the scheduling options to run a session?
You can schedule a session to run at a given time or interval, or you can run the session manually. The scheduling options are:
Run only on demand: the server runs the session only when the user starts it explicitly.
Run once: the Informatica Server runs the session only once at a specified date and time.
Run every: the Informatica Server runs the session at regular intervals, as configured.
Customized repeat: the Informatica Server runs the session at the dates and times specified in the Repeat dialog box.

What is tracing level and what are the types of tracing level?
Tracing level represents the amount of information that the Informatica Server writes to a log file. The types of tracing level are Normal, Verbose, Verbose Init and Verbose Data.

What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside the data source, i.e. you need to build it as a DLL to access it in your mapping, and no database connection is needed.

Explain about recovering sessions?
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration. Use one of the following methods to complete the session:
Run the session again if the Informatica Server has not issued a commit.
Truncate the target tables and run the session again if the session is not recoverable.
Consider performing recovery if the Informatica Server has issued at least one commit.

If a session fails after loading 10,000 records into the target, how can you load from the 10,001st record when you run the session the next time?
As explained above, the Informatica Server has three methods for recovering sessions. Use perform recovery to load the records from the point where the session failed.

Explain about perform recovery?
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001. By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session so that the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.

How to recover a standalone session?

A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.
To recover a session using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.
To recover a session using pmcmd:
1. From the command line, stop the session.
2. From the command line, start recovery.

How can you recover a session in a sequential batch?
If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.
To recover sessions in sequential batches configured to stop on failure:
1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.
If you do not clear Perform Recovery, the next time you run the session the Informatica Server attempts to recover the previous session. If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.

How to recover sessions in concurrent batches?
If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch:
1. Copy the failed session using Operations-Copy Session.
2. Drag the copied session outside the batch so that it becomes a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.

How can you complete unrecoverable sessions?
Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning.

Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery might result in inconsistent data.

What are the circumstances under which the Informatica Server results in an unrecoverable session?
The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails.
Perform recovery is disabled in the Informatica Server configuration.
The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.

If I make any modifications to my table in the back end, are they reflected in the Informatica warehouse, Mapping Designer or Source Analyzer?
No. Informatica is not directly concerned with the back-end database; it displays the information that is stored in the repository. If you want back-end changes to be reflected in the Informatica screens, you have to import the objects again from the back end through a valid connection and replace the existing definitions with the imported ones. (Another view: yes, it will be reflected once you re-import and refresh the mapping.)

After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three ports directly to a target?
If you drag three heterogeneous sources and populate the target without any join, you are producing a Cartesian product. Without a join, not only heterogeneous sources but even homogeneous sources will show the same problem. If you do not want to use joins at the Source Qualifier level, you can add the joins separately.

What is data cleansing?
This is nothing but polishing of data. For example, one subsystem may store Gender as M and F while another stores it as MALE and FEMALE; we need to polish (clean) this data before it is added to the data warehouse. Another typical example is addresses: the customer address maintained by each subsystem can be different, so we might need an address-cleansing tool to bring customer addresses into a clean and consistent form.

What is a time dimension? Give an example.
In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup and week lookup are not merged into a single table. In a dimensional data model (star schema), these tables are merged into a single table called the TIME DIMENSION, for performance and for slicing data. This dimension helps to find the sales done on a daily, weekly, monthly and yearly basis, and we can do trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
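As an illustration only (the table and column names below are assumptions, not taken from any particular project), such a time dimension might look like this:

CREATE TABLE time_dim (
    date_key      NUMBER PRIMARY KEY,   -- surrogate key, e.g. 20240101
    calendar_date DATE,
    week_of_year  NUMBER(2),
    month_name    VARCHAR2(9),
    quarter_no    NUMBER(1),
    year_no       NUMBER(4)
);

The fact table then joins to this one table on date_key, and sales can be rolled up by grouping on week_of_year, month_name, quarter_no or year_no.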

What is the difference between the Informatica Repository Server and the Informatica Server?
Informatica Repository Server: manages connections to the repository from client applications.
Informatica Server: extracts the source data, performs the data transformations, and loads the transformed data into the target.

Discuss the advantages and disadvantages of the star and snowflake schemas?
A star schema consists of a single fact table surrounded by dimension tables; in a snowflake schema the dimension tables are connected to further sub-dimension tables. In a star schema the dimension tables are denormalized, while in a snowflake schema the dimension tables are normalized. A star schema is commonly used for report generation; a snowflake schema is often used for cubes. The advantage of the snowflake schema is that the normalized tables are easier to maintain and it saves storage space. The disadvantage is that it reduces the effectiveness of navigation across the tables because of the larger number of joins between them.

What are the main advantages and purpose of using the Normalizer transformation in Informatica?
The Normalizer transformation is used mainly with COBOL sources, where the data is often stored in denormalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.

How to read rejected or bad data from the bad file and reload it to the target?
Correct the rejected data and send it to the target relational tables using the load order utility. Find the rejected data by using the column indicator and row indicator in the bad file.

How do you transfer data from the data warehouse to a flat file?
You can write a mapping with the flat file as a target using a DUMMY_CONNECTION. A flat file target is built by pulling a source into the target space using the Warehouse Designer tool.

At most, how many transformations can be used in a mapping?
There are 22 transformations (Expression, Joiner, Aggregator, Router, Stored Procedure, etc.); you can find them on the Informatica transformation toolbar. There is no hard limit on the number of transformations you can use in a mapping, but from a performance point of view, using too many transformations will reduce session performance.

My view: if that many transformations are needed in a mapping, it is better to move some of the logic into a stored procedure. Always remember when designing a mapping: less is more; design with the least number of transformations that can do the most work.

What is the difference between normal load and bulk load? What is the difference between PowerMart and PowerCenter? When do we go for an unconnected Lookup transformation?
Bulk load is faster than normal load. With bulk load the Informatica Server bypasses the database log file, so we cannot roll back the transactions; bulk load is also called direct loading.
Normal load: normal load writes information to the database log file, so it can be used for recovery if needed. When the source is a text file and you are loading data into a table, you should use normal load only, otherwise the session will fail.
Bulk mode: bulk load does not write information to the database log file, so if recovery is needed nothing can be done; in return, bulk load is considerably faster than normal load.
Rule of thumb: for a small number of rows use normal load; for large volumes of data use bulk load.

What is a junk dimension?
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.

Can we look up a table from a Source Qualifier transformation (unconnected lookup)?
No, we cannot, for two reasons:
1) Unless you connect the output of the Source Qualifier to another transformation or to a target, the field will not be included in the generated query.
2) The Source Qualifier does not have variable fields to use in an expression.

How to get the first 100 rows from a flat file into the target?

One approach: in the Workflow Manager, link the task to the session (task -----> (link) session), double-click the link and use the source success rows session variable in the link condition (for example, stop when source success rows = 100); the session should then stop automatically.
Another option: copy the first 100 records to a new file and load that. Just add this Unix command in the session properties --> Components --> Pre-session Command:
head -100 <source file path> > <new file name>
and mention the new file name and path in the Session --> Source properties.
1. Use the test load option if you only want it for testing.
2. Put a counter/Sequence Generator in the mapping and filter on it.

Can we modify the data in a flat file?
Just open the text file with Notepad and change whatever you want (but the data type should remain the same).

What is the difference between a summary filter and a details filter?
Summary filter: applied to a group of records that contain common values (i.e. after grouping).
Detail filter: applied to each and every record in the database.

What is the difference between a view and a materialized view?
Materialized views are schema objects that can be used to summarize, precompute, replicate and distribute data, e.g. to construct a data warehouse. A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data.
View: only the SELECT query is stored in the database. Whenever you select from the view, the stored query is executed; effectively you are calling a stored query. If you want to reuse a query repeatedly, or it is complex, you store it in the database as a view. A materialized view stores the data as well, like a table, so storage parameters are required. In the case of a materialized view we can perform DML, but the reverse is not generally true for a simple view.
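A minimal Oracle sketch of the difference (the EMP table and the view names are purely illustrative assumptions):

-- Ordinary view: only the query text is stored; it runs every time the view is queried
CREATE VIEW dept_sal_v AS
    SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno;

-- Materialized view: the result rows are stored like a table and refreshed on demand
CREATE MATERIALIZED VIEW dept_sal_mv
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
AS
    SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno;

Querying dept_sal_v re-executes the aggregation each time, while dept_sal_mv reads the precomputed rows until it is refreshed.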

Compare the Data Warehousing top-down approach with the bottom-up approach?
One view: the bottom-up approach is best, because in a 3-tier architecture the data tier is the bottom one; at software-integration time bottom-up works well, but at implementation time top-down works well.
Top-down: ODS --> ETL --> Data warehouse --> Data mart --> OLAP
Bottom-up: ODS --> ETL --> Data mart --> Data warehouse --> OLAP
In the top-down approach we first build the data warehouse and then the data marts; this needs more cross-functional skills, is more time-consuming and is more costly. In the bottom-up approach we first build the data marts and then the data warehouse; the first data mart built serves as a proof of concept for the others, so it takes less time and costs less. There is nothing wrong with either approach; it all depends on your business requirements and what is already in place at your company, and many organizations use a hybrid approach. For more information, read about Kimball vs. Inmon.

Discuss which is better among incremental load, normal load and bulk load?
It depends on the requirement. In general, incremental load can be better because it takes only the data that is not already available in the target. If the database supports the bulk load option from Informatica, then using bulk load for the initial loading of the tables is recommended. Depending on the requirement, we should choose between normal and incremental loading strategies.

What is the difference between connected and unconnected stored procedures?
Unconnected: the unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
Connected: the flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.
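For example, a connected Stored Procedure transformation could call a simple database function like the hypothetical one below (a sketch only; the name and the 20% rule are assumptions). An input port such as SAL would be mapped to the input parameter, and the return value would be connected to an output port feeding the next transformation.

CREATE OR REPLACE FUNCTION get_bonus (p_sal IN NUMBER)
RETURN NUMBER
IS
BEGIN
    -- bonus is 20% of salary; the rule is purely illustrative
    RETURN p_sal * 0.2;
END get_bonus;
/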

Differences between Informatica 6.2 and Informatica 7.0
Version 7.0 introduced the Custom transformation and the Union transformation, as well as the flat file lookup condition.
Features in 7.1 are:
1. Union and Custom transformations
2. Lookup on flat files
3. Grid: servers running on different operating systems can coexist in the same server grid
4. We can use pmcmdrep
5. We can export independent and dependent repository objects
6. We can move mappings into any web application
7. Version control
8. Data profiling

What are the differences between Informatica PowerCenter versions 6.2 and 7.1, and also between versions 6.2 and 5.1?
Graphical enhancements and XML file support. The main difference between Informatica 5.1 and 6.1 is that 6.1 introduced the Repository Server and, in place of the Server Manager (5.1), the Workflow Manager and Workflow Monitor. In version 7.x you have the option of looking up (Lookup) on a flat file and you can write to an XML target, plus versioning, LDAP authentication and support for 64-bit architectures.

What is the difference between the Informatica PowerCenter Server, the Repository Server and the repository?
The repository is nothing but a set of tables created in a database; it stores all the metadata of the Informatica objects. The Repository Server is the component that communicates with the repository, i.e. the database: all the metadata is retrieved from the database through the Repository Server, and all the client tools communicate with the database through the Repository Server.

The Informatica Server is responsible for running the workflows, tasks and so on; it also communicates with the database through the Repository Server.
PowerCenter Server: performs the extraction from the source and loads the data into the target.
Repository Server: takes care of the connection between the PowerCenter client and the repository.
Repository: the place where all the metadata information is stored; the Repository Server and the PowerCenter Server access the repository for managing the data.

How to create the staging area in your database?
Creating staging tables/areas is the work of the data modeler/DBA; the tables are created with ordinary DDL such as "create table <tablename> ...". They are usually given a name that identifies them as staging, for example dwc_tmp_asset_eval, where tmp indicates a temporary (staging) table.
A staging area in a DW is used as a temporary space to hold all the records from the source system. It should be more or less an exact replica of the source systems, except for the load strategy, where we use truncate-and-reload options. So create it using the same layout as your source tables, or use the Generate SQL option in the Warehouse Designer tab.

What do the Expression and Filter transformations do in the Informatica Slowly Growing Target wizard?
The Expression transformation detects and flags the rows from the source; the Filter transformation filters out the rows that are not flagged and passes the flagged rows to the Update Strategy transformation. The Expression checks whether the primary key already exists and calculates a new flag; based on that flag, the Filter transformation filters the data. In general, you can use the Expression transformation to calculate values in a single row before you write to the target; for example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers.

Briefly explain the versioning concept in PowerCenter 7.1?
PowerCenter 7.x provides 22 transformations whereas 6.x had only 17; among the additions, Lookup gained support for flat files. When you create a version of a folder referenced by shortcuts, all shortcuts continue to reference their original object in the original version. They do not automatically update to the current folder version.

For example, if you have a shortcut to a source definition in the Marketing folder, version 1.0.0, and you then create a new folder version, 1.5.0, the shortcut continues to point to the source definition in version 1.0.0. Maintaining versions of shared folders can result in shortcuts pointing to different versions of the folder. Though shortcuts to different versions do not affect the server, they might prove more difficult to maintain. To avoid this, you can recreate shortcuts pointing to earlier versions, but this solution is not practical for much-used objects. Therefore, when possible, do not version folders referenced by shortcuts.

How to join two tables without using the Joiner transformation?
It is possible to join two or more tables by using the Source Qualifier, provided the tables have a relationship. When you drag and drop the tables, you get a Source Qualifier for each table. Delete all the Source Qualifiers and add one common Source Qualifier for all of them. Right-click the Source Qualifier, choose Edit, click the Properties tab, and in the SQL Query property you can write your own SQL. This can be done with the Source Qualifier, but there are some limitations.
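As an illustration, assuming the standard Oracle demo tables EMP and DEPT (related on DEPTNO), the SQL override written in the common Source Qualifier might look like this sketch; keep the select list in the same order as the Source Qualifier ports.

SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, DEPT.DNAME
FROM   EMP, DEPT
WHERE  EMP.DEPTNO = DEPT.DEPTNO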

Identifying bottlenecks in various components of Informatica and resolving them.

Can Informatica be used as a cleansing tool? If yes, give examples of transformations that can implement a data cleansing routine.
Yes, we can use Informatica for cleansing data; sometimes we use staging to cleanse the data, and depending on performance we can also use an Expression transformation to cleanse data. For example, if field X has some values and some NULLs and is mapped to a NOT NULL target column, inside an Expression we can assign a space or some constant value to avoid session failure. If the input data is in one format and the target is in another, we can change the format in the Expression, and we can assign default values in the target to represent a complete set of data.
Informatica can be used as a cleansing tool; usually the Expression transformation is used for cleaning data, using the data-cleansing functions and other functions available in Informatica. For example: assign default values for NOT NULL fields; remove spaces from flat file sources using LTRIM and RTRIM; if a date has a stray character in between, remove it using REPLACECHR; use SUBSTR to remove extra characters. There are many more functions and uses to be explored.

How do you decide whether you need to do aggregations at the database level or at the Informatica level?
It depends on the requirement. If you have a powerful database, you can create an aggregation table or view at the database level; otherwise it is better to use Informatica. Here is why you might still use Informatica: Informatica is a third-party tool, so it may take more time to process aggregations compared to the database, but Informatica has an option called incremental aggregation, which updates the current values with current values + new values, so there is no need to process the entire data set again and again, as long as nobody deletes the cache files (if that happens, the entire aggregation has to be executed in Informatica again). The database does not have an incremental aggregation facility.
Informatica is basically an integration tool; it all depends on the source you have and your requirements. If you have an EMS queue, a flat file or any source other than an RDBMS, you need Informatica to do any kind of aggregate function. If your source is an RDBMS, you are usually not only doing the aggregation in Informatica: there is business logic behind it, and you may need to do other things such as looking up against a table or joining the aggregate result with the actual source. If the question is whether to do it at the mapping level or at the database level, it is usually better to do the aggregation at the database level by using a SQL override in the Source Qualifier, if aggregation is the main purpose of your mapping; it definitely improves performance.

How do we estimate the depth of the session scheduling queue? Where do we set the maximum number of concurrent sessions that Informatica can run at a given time? How do we estimate the number of partitions that a mapping really requires? Is it dependent on the machine configuration?
It depends on the Informatica version: Informatica 6 supports only 32 partitions, whereas Informatica 7 supports 64 partitions.

Suppose a session is configured with a commit interval of 10,000 rows and the source has 50,000 rows. Explain the commit points for source-based commit and target-based commit. Assume appropriate values wherever required.

Source-based commit commits the data into the target based on the commit interval, so for every 10,000 source rows it commits into the target. Target-based commit commits the data into the target based on the buffer size of the target, i.e. it commits whenever the buffer fills. If we assume the buffer holds 6,000 rows, it commits the data roughly every 6,000 rows.

We are using an Update Strategy transformation in a mapping. How can we know whether the insert, update, reject or delete option has been selected during the running of a session in Informatica?
Operation   Constant     Numeric value
Insert      DD_INSERT    0
Update      DD_UPDATE    1
Delete      DD_DELETE    2
Reject      DD_REJECT    3

If you are using an Update Strategy in your mapping, there is no option to check or uncheck for these operations; when you have to perform any of the DML operations, you have to code it in the Update Strategy expression manually, so there is no checkbox to inspect. If you have used DD_UPDATE or DD_REJECT, you can only find out by querying the target table; if rows were rejected, you can see it in the session log.

What is the procedure to write a query that lists the three highest employee salaries?
SELECT sal FROM (SELECT sal FROM my_table ORDER BY sal DESC) WHERE ROWNUM < 4;

Which objects are required by the Debugger to create a valid debug session?
Initially the session itself should be valid. The sources, targets, lookups and expressions should be available, and at least one breakpoint should be set for the Debugger to debug your session. (Note: we can actually create a valid debug session even without a single breakpoint, but we have to give valid database connection details for the sources, targets and lookups used in the mapping, and it should contain valid mapplets, if there are any in the mapping.)

What is the limit to the number of sources and targets you can have in a mapping?
There is one formula: number of blocks = 0.9 * (DTM buffer size / block size) * number of partitions,

where number of blocks = (sources + targets) * 2.
To my knowledge there is no hard restriction on the number of sources or targets inside a mapping. The real question is: if you make N tables participate in processing at the same time, what is the state of your database? From an organizational point of view it is never encouraged to use a large number of tables at a time, as it reduces database and Informatica Server performance.

Which is better among connected lookup and unconnected lookup transformations in Informatica or any other ETL tool?
It is not easy to say which is better out of connected and unconnected lookups; it depends on experience and on the requirement. When you compare the two, a connected lookup can return multiple values while an unconnected lookup returns one value. A connected lookup is in the same pipeline as the source and supports dynamic caching; an unconnected lookup does not have that facility, but in some special cases we use unconnected lookups, for example when the output of one lookup feeds the input of another lookup. If your source is well defined you can use a connected lookup; if the source is not well defined or comes from a different database, you can go for an unconnected lookup. That is how we use them.

Asking the question is easy, but it all depends on the requirements. A dynamic cache is usually not preferred because of its performance cost; you use a dynamic cache only when you want to track changes to the target table records within that particular run. When you want to use the same lookup table many times in the same mapping, you would obviously go for an unconnected lookup rather than creating the same connected lookup many times. As said earlier, each has its own advantages and disadvantages, and it depends on the requirement.

In dimensional modeling, is the fact table normalized or denormalized, in the case of a star schema and in the case of a snowflake schema?
Star schema: denormalized dimensions. Snowflake schema: normalized dimensions. In dimensional modeling, in a star schema a single fact table is surrounded by a group of dimension tables comprising denormalized data.

In a snowflake schema, a single fact table is surrounded by a group of dimension tables comprising normalized data.
The star schema (sometimes referred to as a star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts. The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management systems in organizations today, implementing multi-dimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources are likely relational databases. Another reason for using a star schema is its ease of understanding. Fact tables in a star schema are mostly in third normal form (3NF), while dimension tables are in denormalized second normal form (2NF). If you want to normalize the dimension tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise: you need complex queries, and business users cannot easily understand the meaning of the data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy; that is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product-category table and a product-manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
Other views: fact tables are always denormalized, irrespective of whether it is a snowflake or a star schema, since fact tables are essentially OLAP tables. In general, in a star schema the fact table is normalized and the dimension tables are denormalized; in a snowflake schema the fact table is normalized and the dimension tables are also normalized, which requires more joins, so the star schema is often the better way of representing the data.

What is the difference between the IIF and DECODE functions?
You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following shows how you can use DECODE instead of IIF:

DECODE( TRUE,
SALES > 0 and SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS )

What are variable ports? List two situations when they can be used.
There are mainly three kinds of ports: input, output and variable. An input port means data is flowing into the transformation; an output port is used when data is mapped to the next transformation; a variable port is used when mathematical calculations or intermediate values are required. For example, if you are trying to calculate a bonus from the EMP table:

BONUS = SAL * 0.2
TOTAL_SAL = SAL + COMM + BONUS
A variable port is used to break a complex expression into simpler pieces, and also to store intermediate values.

How does the server recognize the source and target databases?
By using an ODBC connection if it is relational, and an FTP connection if it is a flat file; you can verify this in the connection settings in the session properties for both sources and targets.

How to retrieve the records from a rejected file? Explain with syntax or an example.
During the execution of a workflow, all the rejected rows are stored in bad files (under the directory where your Informatica Server is installed, e.g. C:\Program Files\Informatica PowerCenter 7.1\Server). These bad files can be imported as a flat file source, and then through a simple mapping we can load the records in the desired format.

How to look up data on multiple tables?
If you want to look up data on multiple tables at a time, join the tables you want and then look up the joined result; Informatica supports lookups on joined tables. When you create a Lookup transformation, Informatica asks for a table name and lets you choose Source, Target, Import or Skip. Click Skip and then use the SQL override property in the Properties tab to join the two tables for the lookup.

What the earlier answers say is correct. To be more specific, if the two tables are relational, you can use the SQL lookup override option to join them in the lookup properties; you cannot join a flat file and a relational table. For example, the default lookup query is a SELECT of the lookup table's columns from the lookup table; you can extend this query by adding the columns of the second table with a table qualifier and a WHERE clause. If you want to use an ORDER BY, put -- at the end so that the generated ORDER BY is commented out.

What is the procedure to load the fact table? Give it in detail.
Based on the requirements for your fact table, choose the sources and data and transform them based on your business needs. For the fact table you need a primary key, so use a Sequence Generator transformation to generate a unique key and pipe it to the target (fact) table along with the foreign keys from the source tables.
We can also use the two wizards (the Getting Started wizard and the Slowly Changing Dimension wizard) to load the fact and dimension tables; using these wizards we can create different types of mappings according to the business requirements and load into the star schema (fact and dimension tables).
Usually, source records are looked up against the records in the dimension tables. Dimension tables are called lookup or reference tables, and all the possible values are stored there; for example, for products, all the existing product IDs will be in the product dimension. When data from the source is looked up against the dimension table, the corresponding keys are sent to the fact table. This is not a fixed rule; it may vary as per your requirements and methods. Sometimes only an existence check is done and the product ID itself is sent to the fact table.

What is the use of incremental aggregation? Explain briefly with an example.
It is a session option. When the Informatica Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally. We use it for performance.

How to delete duplicate rows in a flat file source? Is there any option in Informatica?
Use a Sorter transformation, which has a "distinct" option; alternatively you can use a dynamic lookup or an Aggregator for this.

How to use mapping parameters, and what is their use?
Mapping parameters and variables make mappings more flexible and avoid creating multiple mappings; they also help in adding incremental data. Mapping parameters and variables have to be created in the Mapping Designer by choosing the menu option Mappings > Parameters and Variables, entering a name for the variable or parameter preceded by $$, and choosing the type (parameter/variable) and data type. Once defined, the variable/parameter can be used in any expression, for example in the source filter property of the Source Qualifier transformation: just enter the filter condition. Finally, create a parameter file to assign a value to the variable/parameter and configure the session properties.

However, the final step (the parameter file) is optional: if the parameter file is not present, the session uses the initial value that was assigned at the time of creating the variable.
A follow-up question: with mapping parameters and variables, the variable value is saved to the repository after the completion of the session, and the next time you run the session the server takes the saved variable value from the repository and continues from there. For example, I ran a session and at the end it stored a value of 50 in the repository; the next time I run the session I want it to start with the value of 70, not with 51. How can this be done?
One way: after running the mapping, in the Workflow Manager right-click the session and open its persistent values; there you will find the last value stored in the repository for the mapping variable. Remove it, put in the value you want, and run the session.
Another way: the session would normally take 51, but you can override the saved variable in the repository by defining the value in a parameter file. If there is a parameter file for the mapping variable, the session uses the value in the parameter file, not the saved value + 1 from the repository; in other words, higher preference is given to the value in the parameter file. For this example, assign the mapping variable the value 70 in the parameter file.
A mapping parameter represents a constant value that you define before running the session, and it returns the same value throughout. A mapping variable represents a value that can change throughout the session.

Can anyone comment on the significance of Oracle 9i in Informatica when compared to Oracle 8 or 8i? I mean, how is Oracle 9i advantageous compared to Oracle 8 or 8i when used with Informatica?
Oracle 8i did not allow user-defined data types, but 9i does; BLOB and CLOB handling is better supported in 9i; moreover, list partitioning is available only in 9i.

Can we use an Aggregator (an active transformation) after an Update Strategy transformation?

You can use an Aggregator after an Update Strategy. The problem is that once you perform the update strategy, say you flagged some rows to be deleted, and you then perform an Aggregator transformation over all rows using, say, the SUM function, the deleted rows will still affect (be subtracted from) the aggregation.

Why are dimension tables denormalized in nature?
Because in data warehousing historical data should be maintained. Maintaining historical data means that, for example, an employee's details such as where he previously worked and where he is working now should all be kept in the same table; if we enforced a strict primary key on the natural key, it would not allow a second record with the same employee ID. So to maintain historical data in data warehousing we use surrogate keys (for example, an Oracle sequence for the key column). Because the dimensions maintain historical data, they are denormalized; a "duplicate" here is not an exact duplicate record, but another record kept for the same employee number.
(A note to readers: please give answers only if you are confident about them; otherwise people who do not know the answer may take a wrong answer as correct and fail in an interview.)
Regarding why dimension tables are denormalized, this is what my project manager explained: the attributes in a dimension table are used over and over again in queries. For efficient query performance it is best if the query picks up an attribute from the dimension table and goes directly to the fact table, without going through intermediate tables. If we normalized the dimension table, we would create such intermediate tables, and that would not be efficient.
Yes, that is correct. Apart from this, we maintain hierarchies in these tables, and maintaining a hierarchy is pretty important in a DWH environment. For example, if a child and a parent are kept in different tables, one has to join or query both tables every time to get the parent-child relation; if we keep both child and parent in the same table, we can always refer to them immediately. Similarly, if we have a hierarchy like county > city > state > territory > division > region > nation and keep a different table for each level, it wastes database space and we need to query all these tables every time. That is why we maintain the hierarchy in dimension tables, and based on the business we decide whether to maintain it in the same table or in different tables.
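Continuing the product example mentioned earlier, here is a rough sketch (all table and column names are illustrative assumptions) of a snowflaked versus a denormalized product dimension:

-- Snowflaked (normalized): the category sits in its own table, so queries need an extra join
CREATE TABLE category_dim (
    category_key  NUMBER PRIMARY KEY,
    category_name VARCHAR2(50)
);
CREATE TABLE product_dim_snowflake (
    product_key   NUMBER PRIMARY KEY,
    product_name  VARCHAR2(50),
    category_key  NUMBER REFERENCES category_dim (category_key)
);

-- Denormalized (star): the hierarchy is flattened into one wide table, one join from the fact table
CREATE TABLE product_dim_star (
    product_key   NUMBER PRIMARY KEY,
    product_name  VARCHAR2(50),
    category_name VARCHAR2(50)
);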

In a sequential batch, how can we stop a single session?
We can stop it using the pmcmd command, or in the monitor right-click that particular session and select Stop; this stops the current session and the sessions after it.

How do you handle decimal places while importing a flat file into Informatica?
While getting the data from a flat file (importing the flat file definition), Informatica asks for the precision: just enter it there. The flat file wizard helps in configuring the properties of the file, so select the numeric column and enter the precision and the scale. Precision includes the scale; for example, if the number is 98888.654, enter precision 8, scale 3 and width 10 for a fixed-width flat file. While importing a flat file definition, just specify the scale for a numeric data type. In the mapping, the flat file source supports only the number data type (no decimal or integer); the Source Qualifier associated with that source will have a decimal data type for that number port (source number port -> SQ -> decimal data type; integer is not supported), so the decimals are handled there.

If your workflow is running slowly in Informatica, where do you start troubleshooting and what steps do you follow?
When the workflow is running slowly, you have to find the bottlenecks, in this order: target, source, mapping, session, system. A workflow may also be slow for other reasons, for example alpha characters in decimal data, or insufficient string lengths; check these with the SQL override.

If you have four lookup tables in the workflow, how do you troubleshoot to improve performance?
Use a shared cache: when a workflow has multiple lookup tables, use a shared cache. There are many ways to improve a mapping that has multiple lookups:

1) We can create an index on the lookup table if we have permissions (staging area).
2) Divide the lookup mapping into two: (a) dedicate one to inserts (source minus target): these are new rows, and since only the new rows come into the mapping, the process is fast; (b) dedicate the second one to updates (source intersect target): these are existing rows, and only the rows that already exist come into the mapping.
3) We can increase the cache size of the lookup.

Can anyone explain error handling in Informatica with examples, so that it is easy to explain in an interview?
Go to the session log file; there you will find information about the session initiation process, the errors encountered and the load summary. By examining the errors encountered during the session run, you can resolve them.
You can also create some generalized transformations to handle errors and use them in your mappings; for example, create one generalized transformation for data type errors and include it in your mapping, so you know where the errors are occurring. You can also set up bad files and reject files, which you can use to see the error records, and there are ways to reload bad/rejected records using Informatica tools.

What is the procedure or what are the steps for implementing versioning if you are already on version 7.x? Any gotchas or precautions?
For version control in the ETL layer using Informatica, after doing anything in Designer or Workflow Manager do the following:
1. First save the changes or the new implementation.
2. Then, in the Navigator window, right-click the specific object you are working on. In the pop-up menu, near the bottom, you will find Versioning > Check In. A window opens; enter a comment about what you have done, such as "modified this mapping", then click OK.

How do I import VSAM files from source to target? Do I need a special plug-in?
As far as I know, you can use the PowerExchange tool to convert the VSAM file into Oracle tables and then do the mapping to the target table as usual.

In the Mapping Designer we also have a direct option to import files from VSAM. Navigation: Sources => Import from file => file from COBOL.

Difference between normalization and the Normalizer transformation?
Normalizer: a transformation used mainly for COBOL sources; it changes rows into columns and columns into rows.
Normalization: the process of removing redundancy and inconsistency.

What is an IQD file?
IQD stands for Impromptu Query Definition. This file is mainly used with the Cognos Impromptu tool: after creating an IMR (report), we save the IMR as an IQD file, which is then used while creating a cube in PowerPlay Transformer; for the data source type we select Impromptu Query Definition.

What is data merging, data cleansing, sampling?
Cleansing: to identify and remove redundancy and inconsistency.
Sampling: taking a sample of the data while sending the data from source to target.

How to import an Oracle sequence into Informatica?
Create a procedure (or function) and reference the sequence inside it, then call it in Informatica with the help of a Stored Procedure transformation.
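A minimal sketch of such a wrapper (the sequence and function names are assumptions); the function can then be imported through a Stored Procedure transformation and its return value connected to the surrogate key port:

CREATE OR REPLACE FUNCTION next_customer_key
RETURN NUMBER
IS
    v_key NUMBER;
BEGIN
    -- pull the next value from the existing Oracle sequence
    SELECT customer_seq.NEXTVAL INTO v_key FROM dual;
    RETURN v_key;
END next_customer_key;
/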

Could anyone please tell me the steps required for a Type 2 dimension (versioned data) mapping, and how we can implement it?
Go to the Mapping Designer and open the Slowly Changing Dimension wizard (under the Mappings menu). A window appears where you give the mapping name, the source table, the target table and the type of SCD; select Type 2 and click Finish, and the SCD Type 2 mapping is created. Then go to the Warehouse Designer and generate the target table, validate the mapping in the Mapping Designer, save it to the repository, and run the session in the Workflow Manager. Later, update the source table and rerun the session; you will see the history (the difference) in the target table.
A manual approach:
1. Determine whether the incoming row is (1) a new record, (2) an updated record or (3) a record that already exists in the table, using two Lookup transformations, and split the mapping into three separate flows using a Router transformation.
2. For case (1), create a pipeline that inserts all such rows into the table.
3. For case (2), create two pipelines from the same source: one that updates the old record and one that inserts the new version.

Without using the Update Strategy transformation and session options, how can we update our target table?
In the session properties there are options such as Insert, Update as Update, Update as Insert and Update else Insert; using these we can easily solve it. By default, all rows in a session are flagged as insert; you can change this in the session general properties with "Treat source rows as: Update", so that all incoming rows are flagged as update and the rows in the target table can be updated. If your database is Teradata, we can also do it with a TPump or MultiLoad external loader. The update override in the target properties is used for updating the target table based on a non-key column, e.g. update by ENAME, which is not a key column in the EMP table; but if you use an Update Strategy or session-level properties, the target should have a primary key.

Two relational tables are connected to a Source Qualifier transformation. What are the possible errors that can be thrown?
The only two possibilities I know of: both tables should have a primary key/foreign key relationship, and both tables should be available in the same schema or the same database.

What is the best way to show metadata (number of rows at source, target and each transformation level, error-related data) in a report format?
When your workflow has completed, go to the Workflow Monitor, right-click the session and look at the transformation statistics; there we can see the number of rows at the source and target. In the session log we can see the errors related to the data.

You can also select these details from the repository tables; for example, you can use the view REP_SESS_LOG to get this data.

If you had to split the source-level key going into two separate tables, one as a surrogate key and the other as a primary key, and since Informatica does not guarantee that the keys are loaded in order into those tables, what are the different ways you could handle this type of situation?
Use a foreign key relationship between the two tables.

How to append records to a flat file in Informatica? In DataStage we have the options (i) overwrite the existing file and (ii) append to the existing file.
This is not available in Informatica v7, but it is reported to be included in version 8.0, where you can append to a flat file.

What are partition points?
Partition points mark the thread boundaries in a source pipeline and divide the pipeline into stages.

What are cost-based and rule-based approaches, and what is the difference?
Cost-based and rule-based approaches are optimization techniques used in databases when we need to optimize a SQL query. Oracle basically provides two optimizers (there is a third, but it has disadvantages and is rarely used). Whenever you process a SQL query in Oracle, the Oracle engine internally reads the query and decides the best possible way of executing it; in this process, Oracle follows these optimization techniques:
1. Cost-based optimizer (CBO): if a SQL query can be executed in two different ways (say path 1 and path 2 for the same query), the CBO calculates the cost of each path, analyses which path has the lower execution cost, and executes that path, thereby optimizing the query execution.
2. Rule-based optimizer (RBO): this follows a fixed set of rules for executing a query, and depending on the rules that apply, the optimizer runs the query.
Use: if the table you are trying to query has already been analyzed, Oracle will go with the CBO; if the table has not been analyzed, Oracle follows the RBO and will initially do a full table scan.
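For example, to let the cost-based optimizer be used, the table has to be analyzed first; a sketch with an illustrative table name:

-- Older style
ANALYZE TABLE sales_fact COMPUTE STATISTICS;

-- Preferred on later Oracle releases
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SALES_FACT');
END;
/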

What is a mystery dimension?
A mystery dimension is used to maintain "mystery" data (miscellaneous, hard-to-classify attributes) in your project. The following article by Ralph Kimball explains the idea.

Mystery Dimensions
Ralph Kimball

Often, we data warehouse architects drive a fact table's design from a specific data source. A typical complex example might be a set of records describing investment transactions. A recent example I studied had more than 50 fields in the raw data, and the end users assured me that all the data was relevant and valuable. Because every record in the data represented an investment transaction, and all the investments were somewhat similar, I hoped that the data source would generate only one fact table, where the grain was the individual transaction. But the 50 fields intimidated me. What on earth was all that stuff? Investment transactions are good examples of complex, messy data. The complexity isn't the database designer's fault. These transactions are complex because there are a lot of context parameters and many special parameters describing modern financial investments. When a design challenge such as this confronts me, I try to stand back from the details and perform a kind of triage.

Find the Obvious Dimension-Related Fields
For the first step of triage, I find fields in the source data that are obviously parts of dimensions. Timestamps are straightforward. Maybe four separate timestamps describe our investment transaction. Each of these can be a time dimension, where we ask a single underlying calendar dimension to play four roles. We can accomplish this task by creating four views on the single underlying calendar table. I discussed the details of this approach in my DBMS column "Data Warehouse Role Models" (August 1997; see Resources). Other straightforward dimension-related fields in our investment transaction include account numbers, account types, portfolio numbers, transaction types and codes, customer names and numbers, broker names and numbers, and location-specific information. A typical raw source data record is likely to be a kind of flat record containing both keys for these entities as well as descriptive text such as account type and customer name. In the case of the 50 investment transaction fields I've described, we can quickly identify no fewer than 20 of the fields as dimension related. We need to place a lot of redundant textual information in conventional dimensions, but after the dust clears, there are still 12 independent dimensions, of which four were roles the time dimension played. (See Figure 1.)

Find the Fact-Related Fields
The second step of the triage is to look for the numeric measurements. Anything that is a floating-point number or a scaled integer (such as a currency value) is likely to be a measurement. If the value varies in a seemingly random way between records and takes on a very large number of different values, then it is almost surely a measurement. In the case of the 50 investment transaction fields, 20 of the fields clearly fit the characteristics of measurements. But five of the fields turned out to be cumulative measures that are not appropriate to the grain of an individual transaction. We excuse these five fields from the design and keep the remaining 15 fields, which we model as facts. Now you may be thinking, what kind of weird transaction could possibly have 15 simultaneous facts? That's a good question, because none of the source data transaction

records actually had all 15 facts. Certain kinds of transactions gave rise to one set of facts and other transactions gave rise to an overlapping, but different set of facts. Most important, there was no disjoint partitioning of all the transactions that would separate the clumps of facts into nice groups. There were many transaction types and many investment account types; the pattern of measurements across these types and accounts was too complex to describe or neatly segment. In this sense, we could vindicate the raw data's design because it had to be flexible enough to handle many different investment transaction situations, including future types of investments that the records had not yet described.

Decide What to Do With the Rest
So far we have accounted for 40 of the 50 fields in our original data. But there are still 10 mystery fields left over. (See Figure 1.) These fields aren't obvious textual dimension attributes or obvious foreign key values, so they may not feel like dimensions. The fields do not appear to be numeric measurements. When the fields are present, they seem to take on a small range of discrete values. Some of them are designated as codes, but no one is entirely sure of their significance. At this point, I ask an obligatory, but pointless question: If we don't know what the field means, why don't we leave it out of the design? The answer, of course, is that someone may need it, so we will leave it in. Actually, in spite of this frustrating third step of the triage, we have proceeded correctly. The value of the triage approach is to quickly identify the easy choices (in this case the obvious dimensions and facts), and to isolate a hopefully small subset of difficult data elements that require individual attention.
FIGURE 1 The logical progression of transforming a complex single data source into its corresponding dimensional model.

Also, perhaps at this point you are thinking that if we had a proper enterprise data model, then all these problems would have been sorted out and we wouldn't have to pursue such an ad hoc approach. Well, I couldn't agree more. If an enterprise data model is a model of real data, then I am its biggest fan. In that case, this article probably describes a specific episode in building that very useful enterprise data model. But if the enterprise data model describes a kind of abstract, ideal data world, describing how data should be if only it were designed correctly, then I have very little patience. Idealized enterprise data models are only of marginal use when we try to take real data and deliver it to end users on a tight budget and time frame. Idealized enterprise data models aren't populated with data.

Transform Mystery Fields Into Mystery Dimensions

Returning to our problem of 10 rogue fields that seem to be neither dimensions nor facts, we may be tempted to just leave them in the fact table. This is almost certainly a bad idea. Our goal should be to make these fields into dimensions. Many of the codes or alphanumeric fields would otherwise take up too much room, and we could drastically compress them if we could make them into dimensions.

Another easy approach is to just make 10 more dimensions, one for each mystery field. While this does place these low cardinality codes and textual values in dimension tables where we can easily index and constrain upon them, we now have 22 dimensions in our design, and that should raise a warning flag. More important, many of these fields may be mildly or strongly correlated with each other, even if we are not completely sure what they mean. We need to make a significant effort to find the correlated mystery fields and group them together into a smaller number of new dimensions. Grouping correlated fields together has a couple of attractive benefits. First, it will be interesting to browse the correlated fields against each other. Maybe some of them will turn out to have hierarchical relationships. These relationships can be revealed when the fields are compressed into dimension tables where only the unique combinations are presented. Second, the number of new dimensions required will be reduced.
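To make this grouping step concrete, here is a minimal SQL sketch of packaging two correlated mystery fields into one new dimension. The staging table and field names (stage_investment_txn, field_x, field_y) are assumptions for illustration only.

CREATE TABLE mystery_dim AS
  SELECT ROW_NUMBER() OVER (ORDER BY field_x, field_y) AS mystery_key,  -- surrogate key
         field_x,
         field_y
  FROM  (SELECT DISTINCT field_x, field_y
         FROM   stage_investment_txn);

Only the unique combinations survive, each with its own surrogate key; the fact table then carries mystery_key instead of the two raw fields, and browsing the compressed table reveals any hierarchical relationship between them.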

When Should Two Fields Be in a Single Dimension?

We are almost within reach of our final goal. We have separated off the obvious original dimensions and facts with our triage decisions. We have decided that all that remains is to package the rest of the mystery fields into a few more dimensions. Should we just make one huge mystery dimension for all these remaining fields? That would seem to solve a number of problems. All the fields would go away, to be replaced by a single key. But this approach is likely to produce a dimension with as many records as the fact table itself. If the dimension contains several uncorrelated fields, then there will be very few repeated values for the whole dimension record, and every transaction would produce a new mystery dimension record.

The secret to this last step of the design is to group the mystery fields together into correlated groups. Each of these correlated groups becomes a new dimension. It is wise to be flexible when searching for these correlations. Suppose FieldX has 100 discrete values and FieldY has 1,000 discrete values. The key question is: How many unique FieldX + FieldY combinations exist in the data? If there are exactly 1,000 such combinations, then FieldX is a hierarchical parent of FieldY and they should absolutely be in the same dimension table. If the number of FieldX + FieldY combinations approaches 100,000, then the two fields are virtually independent and we would gain very little by placing them in the same dimension. But the situation is rarely so extreme. The number of FieldX + FieldY combinations might be 5,000 or 10,000. Even this correlation is pretty interesting, and the two fields should be part of the same dimension. To discover this case, you may have to comb the data, counting combinations of values in order to figure out what to do.

Finally, try to keep perspective. If you have five uncorrelated fields, but they each have only three values, then it would be reasonable to package them all in a single mystery dimension. Yes, we end up with the Cartesian product of the fields, but there are only 3^5 = 243 possible combinations, a small and convenient mystery dimension. Ultimately, you should not be striving for mathematical elegance; rather, you should be making pragmatic packaging decisions that best fit your data and your tools.
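One way to do this combing is a quick profiling query against the staging data. This is only a sketch; the table and field names are assumptions.

SELECT COUNT(DISTINCT field_x)                    AS x_values,
       COUNT(DISTINCT field_y)                    AS y_values,
       COUNT(DISTINCT field_x || '|' || field_y)  AS xy_combinations
FROM   stage_investment_txn;

If xy_combinations is close to y_values, the two fields are strongly correlated (possibly hierarchical) and belong in one dimension; if it approaches x_values * y_values, they are effectively independent and gain little from sharing a dimension.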

What is the difference between Informatica 7.1 and Ab Initio?
In Ab Initio there is the concept of the co-operating system, which runs the job in a parallel fashion; Informatica does not have this. There are several differences between Informatica and Ab Initio: Ab Initio supports three kinds of parallelism while Informatica supports one; Ab Initio has no built-in scheduling option (we schedule manually or with PL/SQL scripts), whereas Informatica contains four scheduling options; Ab Initio runs on its co-operating system while Informatica does not; ramp-up time is much quicker in Ab Initio compared with Informatica; and Ab Initio is considered more user-friendly than Informatica.

Can I start and stop a single session in a concurrent batch?
Yes, sure. Just right-click on the particular session and use the recovery option, or use event-wait and event-raise tasks.

I want to prepare a questionnaire. The details are as follows: 1. Identify a large company/organization that is a prime candidate for a DWH project (for example, a telecommunication company, an insurance company, or a bank). 2. Give at least four reasons for selecting the organization. 3. Prepare a questionnaire consisting of at least 15 non-trivial questions to collect requirements/information about the organization; this information is required to build the data warehouse. Can you please tell me what those 15 questions should be, say for a telecom company?

First of all, meet your sponsors and prepare a BRD (business requirement document) about their expectations from this data warehouse (the main aim comes from them). For example, they may need a customer billing process. Then go to the business management team: they can ask for metrics out of the billing process for their use, such as monthly usage, billing metrics, sales organization and rate plan, to perform sales rep, channel performance and rate plan analysis. So your dimension tables can be: Customer (customer id, name, city, state, etc.), Sales rep (sales rep number, name, id), Sales org (sales org id), Bill dimension (bill #, bill date, number), Rate plan (rate plan code). And the fact table can be Billing details (bill #, customer id, minutes used, call details, etc.). You can follow a star or snowflake schema in this case, depending on the granularity of your data.

What is the difference between a cached lookup and an uncached lookup? Can I run the mapping without starting the Informatica server?
When you configure the Lookup transformation as a cached lookup, it stores all the lookup table data in the cache when the first input record enters the transformation; the SELECT statement executes only once and the values of each input record are compared with the values in the cache. In an uncached lookup, the SELECT statement executes for each input record entering the Lookup transformation, and it has to connect to the database every time a new record enters.

What is the difference between stop and abort?
Stop: if the session you want to stop is part of a batch, you must stop the batch; if the batch is part of a nested batch, stop the outermost batch.
Abort: you can issue the abort command; it is similar to the stop command except that it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.
In other words, with Stop, the data query from the source databases is stopped immediately, but whatever data has already been loaded into the buffers continues to be transformed and loaded. Abort is the same as Stop, but the maximum time allowed for the buffered data is 60 seconds: the PowerCenter Server handles the abort command for the Session task like the stop command, except that it has a timeout period of 60 seconds, after which it kills the DTM process and terminates the session.
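To make the telecom example above concrete, here is a minimal DDL sketch of that billing star schema. All table and column names are illustrative assumptions, not a prescribed design.

CREATE TABLE customer_dim  (customer_key   NUMBER PRIMARY KEY,
                            customer_id    VARCHAR2(20),
                            name           VARCHAR2(100),
                            city           VARCHAR2(50),
                            state          VARCHAR2(50));

CREATE TABLE rate_plan_dim (rate_plan_key  NUMBER PRIMARY KEY,
                            rate_plan_code VARCHAR2(20));

CREATE TABLE bill_dim      (bill_key       NUMBER PRIMARY KEY,
                            bill_number    VARCHAR2(20),
                            bill_date      DATE);

CREATE TABLE billing_fact  (bill_key       NUMBER REFERENCES bill_dim,
                            customer_key   NUMBER REFERENCES customer_dim,
                            rate_plan_key  NUMBER REFERENCES rate_plan_dim,
                            minutes_used   NUMBER,
                            billed_amount  NUMBER(12,2));

The fact table carries only foreign keys and measures; the descriptive attributes live in the dimension tables.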

What about rapidly changing dimensions? Can you analyze this with an example?
Rapidly changing dimensions are those in which values change continuously, making them very difficult to maintain. Here is one of the best real-world examples, which I found on a website while browsing; I am sure you will like it. The person described the rapidly changing dimension like this: "I'm trying to model a retailing case. I have an SKU dimension of around 150,000 unique products, which is already an SCD Type 2 for some attributes. In addition, I want to track changes of the sales and purchase price. However, these prices change almost daily for quite a lot of these products, leading to a huge dimension table that requires continuous updates." A better option would be to shift those attributes into the fact table as facts, which solves the problem.

How do you write a filter condition to get all the records of employees hired between any two given dates?

The target file has duplicate records even though the source tables have single records, when selecting data from 5 different SQL tables using a user-defined join. Generate the SQL in the Source Qualifier transformation and try to run the same query in TOAD.

Can anyone tell why we populate the time dimension only with scripts and not with a mapping?

How do we load from a PL/SQL script into an Informatica mapping?
You can use the Stored Procedure transformation. There you can specify the PL/SQL procedure name; when we run the session containing this transformation, the PL/SQL procedure gets executed.

In a workflow, can we send multiple emails?
Yes, we can send multiple emails in a workflow.

What is the difference between a materialized view and a data mart? Are they the same?

What is incremental loading in Informatica (that is, loading only updated information from the source)? How and where do you use it in Informatica?

You got it correctly: incremental loading means updating old rows and inserting newly arrived rows. For this we use the Update Strategy transformation. In every real-time data warehouse project this incremental loading is important, and so is the Update Strategy transformation; if you want detailed information on Update Strategy, refer to the answer I gave to another question on it.
The above answer is only partly correct. Incremental loading can be done in three ways using transformations: 1. the Aggregator transformation, 2. the Dynamic Lookup transformation, 3. the Update Strategy transformation. In the mapping we can use either an Aggregator transformation or a Dynamic Lookup transformation together with an Update Strategy or Filter transformation to update or insert the newly captured records. At the session level, the Properties tab has an option called "Incremental Aggregation"; if you enable this property, the session processes only the newly arrived records from the source.

1) How do you load the data (refresh load, quarterly load, hourly load, etc.)?
2) What is a lookup? (Is it right or wrong to say a lookup is a join?)
A Lookup transformation is not a join operation. In simple terms, you check in a table whether any column satisfies the condition you specify; if it does, one or more columns of the matching row are used in the mapping. Please follow this example carefully by imagining the structures of the tables. Example: take a source table REWARDS, which contains the empids of employees who got rewards.

Import the already existing EMP table into a Lookup transformation, then look up (match) the empid fields of both tables. If you find a match, the lookup has succeeded, and you can pass the required columns of the matched rows on for further processing. Here, take the sal attribute and, using an Expression transformation, increment the salary by Rs 1000.
The Lookup transformation is mainly used when the Informatica server does not have enough information of its own. It is used to look up data in a relational table, synonym or view; the Informatica server queries the lookup table based on the lookup ports in the transformation. The main uses of the Lookup transformation are: 1. get related values; 2. perform calculations; 3. update slowly changing dimensions (here we use the target table as the lookup table and connect to the target table again). It is often used to find current values such as currency exchange rates or stock prices.

What is the architecture of a data warehousing project? What is the flow?
1) The basic step of data warehousing starts with data modelling, i.e. the creation of dimensions and facts. 2) The data warehouse starts with the collection of data from source systems such as OLTP, CRM, ERPs, etc. 3) The cleansing and transformation process is done with an ETL (Extraction, Transformation, Loading) tool. 4) By the end of the ETL process, the target databases (dimensions and facts) are ready with data that satisfies the business rules. 5) Finally, with the use of reporting (OLAP) tools, we can get the information that is used for decision support.

How do you load the time dimension?
We can use SCD Type 1/2/3 to load any dimension based on the requirement. You can load the time dimension manually by writing scripts in PL/SQL to populate the time dimension table with values for a period. For example, if my business data covers 5 years from 2000 to 2004, I load all the dates from 1-1-2000 to 31-12-2004, around 1,825 records, which can be done quickly with a script.
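A minimal sketch of such a script in Oracle SQL, assuming a hypothetical TIME_DIM table with date_key, cal_date, cal_year, cal_month and day_of_month columns:

INSERT INTO time_dim (date_key, cal_date, cal_year, cal_month, day_of_month)
SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),   -- surrogate key in YYYYMMDD form
       d,
       EXTRACT(YEAR  FROM d),
       EXTRACT(MONTH FROM d),
       EXTRACT(DAY   FROM d)
FROM  (SELECT DATE '2000-01-01' + LEVEL - 1 AS d      -- one row per calendar day
       FROM   dual
       CONNECT BY LEVEL <= DATE '2004-12-31' - DATE '2000-01-01' + 1);
COMMIT;

Additional attributes (quarter, day of week, holiday flags) can be added as extra expressions in the SELECT list.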

In an update strategy, which gives more performance, a target table or a flat file, and why?
Pros of a flat file target: loading, sorting and merging operations will be faster, as there is no index concept and the data will be in ASCII mode. Cons: there is no concept of updating existing records in a flat file, and because there are no indexes, lookups against it will be slower. (We use stored procedures for populating and maintaining databases in our mappings.)

How do you create a single Lookup transformation using multiple tables?
Write an override SQL query and adjust the lookup ports as per that SQL query.

Why did you use Update Strategy in your application?
Update Strategy is used to drive the data to be inserted, updated or deleted depending on some condition. You can do this at the session level too, but there you cannot define any condition. For example, if you want to do both update and insert in one mapping, you create two flows and mark one as insert and one as update depending on some condition. Refer to Update Strategy in the Transformation Guide for more information.
Update Strategy is one of the most important Informatica transformations. The basic thing to understand is that it is the essential transformation for performing DML operations on already populated targets (i.e. targets that contain records before this mapping loads data). It is used to perform DML operations: insertion, update, deletion and rejection. When records come to this transformation, depending on the requirement we can decide whether to insert, update or reject the rows flowing through the mapping. For example, take an input row: if it already exists in the target (we find this with a Lookup transformation), update it; otherwise insert it. We can also specify conditions from which we derive which update strategy to use, e.g. iif(condition, DD_INSERT, DD_UPDATE): if the condition is satisfied, do DD_INSERT, otherwise do DD_UPDATE. DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT are called the decode options, which perform the respective DML operations.
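As a sketch of the multi-table lookup override mentioned a little earlier: the override is just a SELECT whose columns line up with the lookup ports. The table and column names here are assumptions.

SELECT c.customer_id   AS CUSTOMER_ID,
       c.customer_name AS CUSTOMER_NAME,
       a.account_type  AS ACCOUNT_TYPE
FROM   customers c,
       accounts  a
WHERE  a.customer_id = c.customer_id

The lookup condition in the transformation is then defined on CUSTOMER_ID, just as it would be for a single-table lookup.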

Returning to the update strategy constants: instead of the constant names, the numeric codes 0, 1, 2 and 3 can be used for insertion, updating, deletion and rejection respectively (DD_INSERT = 0, DD_UPDATE = 1, DD_DELETE = 2, DD_REJECT = 3).

Why did you use a stored procedure in your ETL application?
Using a stored procedure has the following advantages: 1. it checks the status of the target database; 2. it drops and recreates indexes; 3. it determines whether enough space exists in the database; 4. it performs specialized calculations. A stored procedure in Informatica is also useful for imposing complex business rules.

How can you improve the performance of the Aggregator transformation?
By using a Sorter transformation before the Aggregator transformation. We can improve Aggregator performance in the following ways: 1. send sorted input; 2. increase the aggregator cache size, i.e. the index cache and the data cache; 3. pass only the input/output ports you need through the transformation, i.e. reduce the number of input and output ports.

1. Can you explain one critical mapping? 2. As a performance issue, which one is better: a connected Lookup transformation or an unconnected one?
It depends on your data and the type of operation you are doing. If you need to look up a value for all the rows, or for most of the rows coming out of the source, then go for a connected lookup; if not, go for an unconnected lookup, especially in conditional cases. For example, we have to get a value for the field 'customer' from the ORDER table or from the CUSTOMER_DATA table on the basis of the following rule: if customer_name is null then customer = customer_data.customer_id,

otherwise customer = order.customer_name. So in this case we would go for an unconnected lookup.

1. How do you load the same record twice into a target table? Give the syntax. 2. How do you get a particular record from a table in Informatica? 3. How do you create a primary key only on odd numbers? 4. How do you get the records starting with a particular letter, like A, in Informatica?
Declare the target table twice in the mapping and move the output to both target instances. Don't choose the Select Distinct option and run the session again; you will find duplicate rows in your target. But my query is different: suppose I have 10 records in the source, I want the same 10 records to be loaded twice so that I get all records twice, i.e. 20 records, and in only a single target table. Use this syntax: insert into table1 select * from table1 (table1 is the name of the table).

When we create a target as a flat file and the source as Oracle, how can we make the first row of the flat file contain the column names?
You can use a pre-session SQL statement, but this is a hard-coded method; if you change the column names or add extra columns to the flat file, you will have to change the statement. To get the column names in the flat files, you need to change the Informatica server setup: under Informatica Server setup > Configuration there is an option you need to check, "Output Metadata for target flat files".

How many types of dimensions are available in Informatica?
Hi, there are 3 types of dimensions: 1. star schema, 2. snowflake schema, 3. galaxy schema. I think there are 3 types of dimension tables: 1. stand-alone,

2. local, 3. global.
There are 3 types of dimensions available, to my knowledge: 1. general dimensions, 2. conformed dimensions, 3. junk dimensions.
To everyone who is answering, please don't make fun of this question. Someone gave the answer "no"; someone gave the answer "star schema, snowflake schema, etc." How can a schema come under a type of dimension? ANSWER: One major classification we use in real-time modelling is Slowly Changing Dimensions. Type 1 SCD: if you load an updated row of a previously existing row, the previous data is replaced, so we lose historical data. Type 2 SCD: here we add a new row for the updated data, so we have both current and past records, which agrees with the data warehousing principle of maintaining historical data. Type 3 SCD: here we add new columns. The most widely used is Type 2 SCD. We also have one more type of dimension, the CONFORMED DIMENSION: a dimension that carries the same meaning across different star schemas is called a conformed dimension, e.g. the time dimension, which has the same meaning wherever it appears.

How can you say that the Union transformation is an active transformation?
We can merge records from multiple source qualifier queries in a Union transformation at the same time; it is not like an Expression transformation (which works row by row), so we can say it is active.

The Union transformation is an active transformation because it changes the number of rows passing through the pipeline. It normally has multiple input groups, unlike other transformations. Before the Union transformation was implemented (i.e. before 7.0), the rule of thumb based on the number of rows was right, but now it is not the exact benchmark for determining an active transformation.

When do you use an unconnected lookup and a connected lookup? What is the difference between a dynamic and a static lookup, and why and when do we use each type?
Use a connected lookup when the lookup is needed only once in the flow, and an unconnected lookup to avoid multiple lookups of the same table. With a dynamic lookup cache, you can insert rows into the cache as you pass them to the target; a static lookup cache cannot be inserted into or updated. With a cached lookup, you cache all the lookup data at the start of the session; with an uncached lookup, you query the database for the lookup value for each record that needs the lookup. Building the cache adds to the session run time, but it saves time overall because Informatica does not need to connect to the database every time it needs a lookup. Decide based on how many rows in your mapping need a lookup; also remember that the lookup cache uses space, so select only the columns that are needed.

How many types of facts are there, and what are they?
There are additive facts, semi-additive facts, non-additive facts, accumulating facts, factless facts, periodic fact tables and transaction fact tables. Factless facts: facts without any measures. Additive facts: fact data that can be added/aggregated. Non-additive facts: facts that cannot be meaningfully added. Semi-additive facts: only some columns of data can be added. Periodic facts: store one row per transaction that happened over a period of time. Accumulating facts: store a row for the entire lifetime of an event.

How do we load data by using a period dimension?

How do we do unit testing in Informatica? How do we load data in Informatica?
Unit testing is of two types: 1. quantitative testing and 2. qualitative testing. Steps:

1. First validate the mapping. 2. Create a session on the mapping and then run the workflow. Once the session has succeeded, right-click on the session and go to the statistics tab; there you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing. Once the rows are successfully loaded, we go for qualitative testing. Steps: 1. Take the DATM (the document in which all business rules are mapped to the corresponding source columns) and check whether the data was loaded into the target table according to the DATM. If any data was not loaded according to the DATM, go and check the code and rectify it. This is called qualitative testing. This is what a developer does in unit testing.

Why and where do we use a factless fact table?
Hi, I am not sure, but you can confirm this with other people: a factless fact is nothing but non-additive measures, e.g. a temperature recorded in a fact table as Moderate, Low or High; such things are called non-additive measures. More precisely, factless fact tables are fact tables with no facts or measures (numerical data); they contain only the foreign keys of the corresponding dimensions. Such fact tables are required to avoid flaking of levels within a dimension and to define them as a separate cube connected to the main cube.

What transformation can you use in place of a lookup?
You can use the Joiner transformation by setting it as an outer join on either master or detail. In a lookup we can use only the first or last matching value; if the lookup has more than one matching record and we need all matching records, in that situation we can use a master or detail outer join instead of a lookup (according to the logic).

How can I get distinct values while mapping in Informatica during insertion?

You can add an Aggregator before the insert and group by the fields that need to be distinct. There are two methods to get distinct values: if the sources are databases, we can go for a SQL override in the Source Qualifier by changing the default SQL query, or simply tick the check box called "Select Distinct"; and if the sources are heterogeneous, i.e. from different file systems, then we can use the Sorter transformation and, in the transformation properties, tick the "Distinct" check box (the same idea as in the Source Qualifier) to get distinct values.

What is change data capture?
Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time the insert, update or delete operations occur in the source tables, and the change data is stored inside the database in change tables. The change data thus captured is then made available to the target systems in a controlled manner.

How can we store previous session logs?
We can do it using $PMSessionLogCount (specify the number of runs of the session log to save).

How do you call a stored procedure from the Workflow Monitor in Informatica 7.1?

How do you use the unconnected lookup, i.e. where is the input taken from and where is the output linked? What condition is to be given?
The unconnected lookup is used just like a function call: in an expression output/variable port, or in any place where an expression is accepted (like the condition in an Update Strategy), call the unconnected lookup with something like :LKP.lkp_abc(input_port), where lkp_abc is the name of the unconnected lookup (please check the exact syntax). Give the input value just as we pass parameters to functions, and it will return the output after looking it up.

How do you define the Informatica server?
The Informatica server is the main server component in the Informatica product family. It is responsible for reading the data from various source systems, transforming the data according to the business rules, and loading the data into the target tables.

How do we do complex mapping using flat files / a relational database?

How do you move a mapping from one database to another?
Do you mean migration between repositories? There are 2 ways of doing this. 1. Open the mapping you want to migrate, go to the File menu, select 'Export Objects' and give a name; an XML file will be generated. Connect to the repository where you

want to migrate and then select File menu > 'Import Objects' and select the XML file name. 2. Connect to both repositories. Go to the source folder, select the mapping name from the object navigator and select 'Copy' from the 'Edit' menu. Now go to the target folder and select 'Paste' from the 'Edit' menu; be sure you open the target folder. You can also do it this way: connect to both repositories, open the respective folders, keep the destination repository active, and from the navigator panel just drag and drop the mapping into the work area. It will ask whether to copy the mapping; say YES, and it's done. If we go by the direct meaning of your question, there is no need for a new mapping for a new database; you just need to change the connections in the Workflow Manager to run the mapping on another database.

My source has 1000 rows. I have brought 300 records into my ODS, so next time I want to load the remaining records, i.e. I need to load from the 301st record. Whenever I start the workflow again it loads from the beginning. How do we solve this problem?
You can do a lookup on the target table and check for the rows already present there; hence the first 300 records will not be reloaded to the target. You can also use a variable to store the row number of the final row you loaded in the target, and next time use this variable to load the rest of the data. You can use a Filter transformation and set a condition there such as rownum > 300. If your source is a flat file, there is an option to specify from which row to start reading the data (actually used to skip headers); I don't remember exactly where this option is, so try the source definition in the Source Analyzer or the session properties. You can also do it with a Sequence Generator transformation by changing the RESET option in its Properties tab. Thanks for your reply, but the problem here is that even though you are storing the row number in a variable, whenever you start the session the second time it will again read from the beginning of the table, which is a waste of server resources. So my point is: is there any solution by which the server can read from the exact point where it stopped the previous time? The exact solution to the problem is that the session properties have a check box option "recover to load target"; when it is enabled, the remaining rows are loaded. You can also solve this problem by using a mapping variable, where you need to mention the start value and the end value;

when you end your session, the variable takes the next value as the start value for the next run. As you said, if the first value is 1 and you ended at 300, then next time it will take 301 as the start value. Got it!

What is the difference between PowerCenter 6 and PowerCenter 7?
1) You can look up flat files in Informatica 7.x, but you cannot look up flat files in Informatica 6.x. 2) The External Stored Procedure transformation is not available in Informatica 7.x, but it was included in Informatica 6.x; conversely, the Custom transformation is not available in 6.x. The main differences are the version control available in 7.x, session-level error handling in 7.x, and the XML enhancements for data integration in 7.x.

What is the difference between PowerCenter 7 and PowerCenter 8?

In PowerCenter 7.1.2, how can we take a backup of a versioned repository such that I get a non-versioned repository when I restore it? I need a way to get a non-versioned repository from a versioned one. Creating a new (non-versioned) repository and then copying all the folders is an option, but is there any better way?

I was working in SQL Server; now I have a golden opportunity to work in Informatica. I have lots of (silly) questions to build my career, so please guide me properly; I will ask lots of questions. What is the process flow of Informatica?
Informatica is an ETL tool, used for the extraction, transformation and loading of data. This tool is used to extract the data from different databases, after which we can do the required transformations, such as data type conversions, aggregations, ordering, filtering and so on. After that we can load the transformed data into our database, which will be used for business decisions.

Is a fact table normalized or de-normalized?
A fact table is always normalized, since there is no redundancy! Well, a fact table is always a DENORMALIZED table: it consists of data from the dimension tables (their primary keys), and the fact table has foreign keys and measures. The main idea of a DW is de-normalizing the data for faster access by the reporting tool, so if you are building a DW, 90% of it has to be de-normalized, and of course the fact table has to be de-normalized. I read the above comments and I am confused, so we should ask Kimball to settle it. Here is the comment (August 3, 2005): Fable: Dimensional models are fully denormalized.

Fact: Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized, with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized design with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.

How can we join 3 databases, like a flat file, Oracle and DB2, in Informatica?
You have to use two Joiner transformations: the first one joins two tables, and the next one joins the third with the result of the first joiner.

Can we eliminate duplicate rows by using Filter and Router transformations? If so, explain in detail.
We can eliminate the duplicate rows by checking the Distinct option in the properties of the transformation. You can use a SQL query for uniqueness if the source is relational, but if the source is a flat file then you should use a Sorter or Aggregator transformation.

What is tracing level?
It is the level of information stored in the session log. The option appears on the Properties tab of transformations. By default it is "Normal"; it can be Verbose Initialisation, Verbose Data, Normal or Terse.

There are 3 departments in the dept table, one with 100 people, the 2nd with 5, the 3rd with around 30, and so on. I want to display those deptno values where more than 10 people exist.
select count(*), deptno from dept group by deptno having count(*) >= 10;
If you want to perform it through Informatica, fire the same query in the SQL override of the Source Qualifier transformation and make a simple pass-through mapping. Otherwise, you can also do it by using an Aggregator to count rows per deptno followed by a Filter/Router transformation with the condition count >= 10.

How is the Union transformation an active transformation?
An active transformation is one that can change the number of rows, i.e. the input rows and output rows might not match. The number of rows coming out of a Union transformation might not match the incoming rows.

Active transformation: a transformation that changes the number of rows reaching the target. Source (100 rows) ---> active transformation ---> target (< or > 100 rows). Passive transformation: a transformation that does not change the number of rows reaching the target. Source (100 rows) ---> passive transformation ---> target (100 rows). Union transformation: in the Union transformation we may combine the data from two or more sources. Assume Table-1 contains 10 rows and Table-2 contains 20 rows; if we combine the rows of Table-1 and Table-2, we get a total of 30 rows in the target, so it is definitely an active transformation. For an active transformation, the number of records passing through the transformation and their row IDs may differ; it depends on the row IDs as well. Another view is that it is a kind of passive transformation responsible for merging the data coming from different sources; the Union transformation functions very similarly to the UNION ALL statement in Oracle. Since the Union transformation may change the number of incoming rows, it is definitely an active type. By contrast, a Lookup can in no way change the number of rows passing through it; the transformation just looks at the reference table, and the number of records increases or decreases only through the transformations that follow the Lookup transformation.

Why is the Sorter transformation an active transformation?
It sorts data in ascending or descending order according to a specified field; it can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct, in which case it will not return all the rows. It is a type of active transformation responsible for sorting the data in ascending or descending order according to the specified key; the port on which the sorting takes place is called the sort key port. Properties: Distinct (eliminates duplicates), Case Sensitive (valid for strings), Null Treated Low (null values are given the lowest priority). If a transformation has the Distinct option then it will be an active one, because an active transformation is one that changes the number of output records: Distinct filters the duplicate rows, which in turn decreases the number of output records compared with the input records.

One more thing is"An active transformation can also behave like a passive" How do we analyse the data at database level? Data can be viewed using Informatica's designer tool. If you want to view the data on source/target we can preview the data but with some limitations. We can use data profiling too. what is the difference between constraind base load ordering and target load plan Constraint based load ordering example: Table 1---Master Tabke 2---Detail If the data in table1 is dependent on the data in table2 then table2 should be loaded first.In such cases to control the load order of the tables we need some conditional loading which is nothing but constraint based load In Informatica this feature is implemented by just one check box at the session level. Target load order comes in the designer property..Click mappings tab in designer and then target load plan.It will show all the target load groups in the particular mapping. You specify the order there the server will loadto the target accordingly. A target load group is a set of source-source qulifier-transformations and target. Where as constraint based loading is a session proerty. Here the multiple targets must be generated from one source qualifier. The target tables must posess primary/foreign key relationships. So that, the server loads according to the key relation irrespective of the Target load order plan. If you have only one source, its loading into multiple target means you have to use Constraint based loading. But the target tables should have key relationships between them. If you have multiple source qualifiers, it has to be loaded into multiple target, you have to use Target load order. how u will create header and footer in target using informatica? If you are focus is about the flat files then one can set it in file properties while creating a mapping or at the session level in session properties you can always create a header and a trailer in the target file using an aggregator transformation.

Take the number of records as a count in the Aggregator transformation. Create three separate files in a single pipeline: one will be your header and another your trailer, both coming from the Aggregator, and the third will be your main file. Concatenate the header and the main file in a post-session command using a shell script.

How do you export mappings to the production environment?
In the Designer, go to the main menu and you will see the export/import options. Import the exported mapping into the production repository with the replace option.

Can anyone tell me how to run SCD Type 1? It creates two target instances in the mapping window but there is only one table in the Warehouse Designer (the target), so if we create one new table in the target it gives an error.
Hi, have you implemented it using the wizards? If so, create the target with the name you gave the target (table) in the wizard. Don't create the target again for the second instance; it is just a virtual copy of the same target. That is, in the Warehouse Designer create and execute the target definition once and run the session containing the mapping again. Define the source and target locations in the general properties of the session and treat rows as Data Driven. Check this once and let me know. Bye, Mayee.

When do we use a dynamic cache and when do we use a static cache in connected and unconnected Lookup transformations?
We use a dynamic cache only for a connected lookup. We use a dynamic cache to check whether the record already exists in the target table or not, and depending on that we insert, update or delete the records using an Update Strategy. The static cache is the default cache in both connected and unconnected lookups; if you select a static cache on the lookup table in Informatica, it won't update the cache and the rows in the cache remain constant. We use this to check the results and also to update slowly changing records. (Mayee)

Can you tell me how to go about SCDs and their types, and where do we mostly use them?
The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, it applies to cases where an attribute of a record varies over time. An example: Christina is a customer of ABC Inc. She first lived in Chicago, Illinois, so the original entry in the customer lookup table has the record (Customer Key: 1001, Name: Christina, State: Illinois). At a later date, in January 2003, she moved to Los Angeles, California. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem. There are in general three ways to solve this type of problem, categorized as follows. In a Type 1 Slowly Changing Dimension, the new information simply overwrites the original information; in other words, no history is kept.

In our example, recall that the customer table originally has the record (Customer Key: 1001, Name: Christina, State: Illinois). After Christina moved from Illinois to California, the new information replaces the old record, and we have (Customer Key: 1001, Name: Christina, State: California). Advantages: this is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information. Disadvantages: all history is lost; by applying this methodology it is not possible to trace back in history. For example, in this case the company would not be able to know that Christina lived in Illinois before. Usage: about 50% of the time. When to use Type 1: a Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

In a Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information; both the original and the new record will be present, and the new record gets its own primary key. In our example, recall we originally have (Customer Key: 1001, Name: Christina, State: Illinois). After Christina moved from Illinois to California, we add the new information as a new row, giving (Customer Key: 1001, Name: Christina, State: Illinois) and (Customer Key: 1005, Name: Christina, State: California). Advantages: this allows us to accurately keep all historical information. Disadvantages: it causes the size of the table to grow fast; in cases where the number of rows is very high to start with, storage and performance can become a concern, and it necessarily complicates the ETL process. Usage: about 50% of the time. When to use Type 2: a Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

In a Type 3 Slowly Changing Dimension, there are two columns for the attribute of interest, one holding the original value and one holding the current value, plus a column that indicates when the current value became active. In our example, recall we originally have (Customer Key: 1001, Name: Christina, State: Illinois). To accommodate Type 3, we now have the columns Customer Key, Name, Original State, Current State and Effective Date. After Christina moved from Illinois to California, the original information gets updated, and we have (Customer Key: 1001, Name: Christina, Original State: Illinois, Current State: California, Effective Date: 15-JAN-2003), assuming the effective date of change is January 15, 2003. Advantages: this does not increase the size of the table, since the new information is applied as an update, and it allows us to keep some part of history. Disadvantages: Type 3 cannot keep all history when an attribute is changed more than once; for example, if Christina later moves to Texas on December 15, 2003, the California information will be lost. Usage: Type 3 is rarely used in actual practice. When to use Type 3: a Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.

What is a view? How is it related to data independence? What are the different types of views, and what is a materialized view?

A view is a combination of one or more tables. A view does not store the data; it just stores the query. When we execute the query, it fetches the data from the underlying tables and presents the result as the view. The types are ordinary views and materialized views. By definition, a view is just a query that is parsed and stored, so whenever the view is referred to in a query it can be executed with no loss of time for parsing. Through views we can hide the complex and long names of tables, and, having created a view once, we can use it in many places. A materialized view, introduced in Oracle 8, actually stores data like a table.

I have a situation here loading a table in Informatica. I have 5 temporary tables as sources. They look like this (K = key, N = null, X/Y/Z = values):
Table1:
K1 X N N N N
K2 X N N N N
Table2:
K1 N X N N N
K2 N X N N N
The other 3 tables are laid out the same way. But any of the tables can contain duplicates, like:
K1 X N N N N
K1 Y N N N N
This kind of record should be errored out. Because of this, we can't use a plain aggregator/group-by, as we are not sure which of the duplicates should be removed.

How can we obtain this functionality in Informatica?
Do the following. Map 1: source instance S1 ---> Source Qualifier ---> Aggregator transformation ---> T1 (target). In the Aggregator transformation, create a count output port and group by the key port. Now take your original ports, with the duplicate-key records eliminated, into Map 2.

Map 2: source instance (T1) ---> Source Qualifier ---> take your original ports, with the duplicate-key records eliminated, and now build the design your requirement calls for (a SQL sketch of the duplicate check appears after the next answer).

What are the different types of transformations available in Informatica, and which are the most used among them?
Mainly there are two types of transformations. 1] Active transformation: an active transformation can change the number of rows that pass through it from source to target, i.e. it eliminates rows that do not meet the condition in the transformation. 2] Passive transformation: a passive transformation does not change the number of rows that pass through it, i.e. it passes all rows through. Transformations can also be connected or unconnected. A connected transformation is connected to other transformations or directly to the target table in the mapping. An unconnected transformation is not connected to other transformations in the mapping; it is called within another transformation and returns a value to that transformation. The transformations available in Informatica are: 1. Source Qualifier, 2. Expression, 3. Filter, 4. Joiner, 5. Lookup, 6. Normalizer, 7. Rank, 8. Router, 9. Sequence Generator, 10. Stored Procedure, 11. Sorter, 12. Update Strategy, 13. Aggregator, 14. XML Source Qualifier, 15. Advanced External Procedure, 16. External Procedure, 17. Custom transformation. Which transformations are used most depends on the requirement; in our project we mostly use the Source Qualifier, Aggregator, Joiner and Lookup transformations. Thanks -- afzal
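Returning to the duplicate-key scenario above, the check performed by the Aggregator in Map 1 corresponds to this SQL sketch (stage_table1 and k_key are assumed names):

SELECT k_key,
       COUNT(*) AS key_count
FROM   stage_table1
GROUP  BY k_key
HAVING COUNT(*) > 1;

Keys returned by this query are routed to an error table; all remaining rows carry on into Map 2.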

1] In a certain mapping there are four targets, tg1, tg2, tg3 and tg4. tg1 has a primary key; tg2 has a foreign key referencing tg1's primary key; tg3 has a primary key that tg2 and tg4 reference as a foreign key; tg2 also has a foreign key referencing the primary key of tg4. In which order will Informatica load the targets? 2] How can I detect an Aggregator transformation causing low performance?
T1 and T3, being master tables with no foreign key references to other tables, will be loaded first. Then T4 will be loaded, as its master table T3 has already been loaded. At the end T2 will be loaded, as all the master tables it refers to (T1, T3 and T4) have already been loaded. To optimize the Aggregator transformation you can use the following options: use incremental aggregation; sort the input before you perform aggregation; avoid using an Aggregator transformation after an Update Strategy, since it can be confusing.

How can we eliminate duplicate rows from a flat file?
Keep an Aggregator between the Source Qualifier and the target and group by the key field; it will eliminate the duplicate records. In other words, before loading to the target, use an Aggregator transformation and make use of the group-by function to eliminate the duplicates on those columns. (Nanda) You can also use a Sorter transformation: when you configure the Sorter to treat output rows as distinct, it configures all ports as part of the sort key and therefore discards duplicate rows compared during the sort operation. If you want to delete duplicate rows in flat files you can also go for a Rank transformation or an Oracle external procedure transformation: select all the group-by ports and select one field for the rank, and the duplicates are then easily identified. In short, to eliminate duplicates in flat files we have the Distinct property in the Sorter transformation; if we enable that property it automatically removes duplicate rows.

How do we view and generate metadata reports in Informatica (for a particular session, a report that shows the source table, source column and the related target table and target column)?

What is partitioning? Where can we use partitions? What are the advantages? Is it necessary?
In Informatica we can tune performance at 5 different levels: source level, target level, mapping level, session level and network level. To tune performance at the session level we go for partitioning, and there are 4 types of partitioning: pass-through, hash, round-robin and key range. Pass-through is the default.

Within hash partitioning there are two types: user-defined keys and auto keys. Round-robin cannot be applied at the source level; it can be used at some transformation level. Key range can be applied at both the source and target levels.

My source table has 1000 records. I want to load records 501 to 1000 into my target table; how can you do this?
select * from (select col1, col2, rownum rn from table_name) where rn > (select max(rownum) - n from table_name);
(where n is the number of trailing rows you want, 500 here). (Nanda) In DB2 we would write the statement as FETCH FIRST 500 ROWS ONLY. In Informatica we can do it by using a Sequence Generator and filtering out the rows once the sequence exceeds 500. You can also override the SQL query in the Workflow Manager, like:
select * from tab_name where rownum <= 1000 minus select * from tab_name where rownum <= 500;
This will work fine; try it and get back to me if you have any issues.

What is a surrogate key? In which situation did you use it in your project? Explain with an example.
A surrogate key is a system-generated/artificial key or sequence number; in other words, a surrogate key is a substitute for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult. In my project, I felt that the primary reason for surrogate keys was to record the changing context of the dimension attributes (particularly for SCDs); another reason for them being integers is that integer joins are faster. A surrogate key is a unique identifier for each row and can be used as a primary key in the DWH; the DWH does not depend on primary keys generated by OLTP systems for internally identifying records. When a new record is inserted into the DWH, primary keys are automatically generated; such keys are called surrogate keys. Advantages: 1. a flexible mechanism for handling SCDs; 2. we can save substantial storage space with integer-valued surrogate keys.

What is the difference between a stored procedure (at the DB level) and the Stored Procedure transformation (at the Informatica level)? And why should we use the Stored Procedure transformation?

First of all, stored procedures (at the DB level) are a series of SQL statements, stored and compiled on the server side. In Informatica, the Stored Procedure transformation calls those same stored procedures that are stored in the database. Stored procedures are used to automate time-consuming tasks that are too complicated for standard SQL statements; if you don't want to use a stored procedure, you have to create an Expression transformation and do all the coding in it.

What is the difference between STOP and ABORT at the Informatica session level?
Stop: we can restart the session. Abort: we can't simply restart the session; we should truncate everything downstream in the pipeline and then start the session again.

If the workflow has 5 sessions running sequentially and the 3rd session has failed, how can we run again from only the 3rd to the 5th session?
If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session. To recover a session in a concurrent batch: 1. copy the failed session using Operations > Copy Session; 2. drag the copied session outside the batch to be a standalone session; 3. follow the steps to recover a standalone session; 4. delete the standalone copy. Alternatively, you can start from the 3rd session by right-clicking on that session and selecting "Start workflow from this task"; this ensures that the workflow starts running from the 3rd session and continues to the last one.

How can we join tables if the tables have no primary and foreign key relation and no matching ports to join on?
Without common fields or ports in two different tables it is not possible to join them directly; you need at least one common field or port to join two different tables. Without a common column or common data type we can join two sources using dummy ports: 1. add one dummy port to each source; 2. in an Expression transformation assign '1' to each port; 3. use a Joiner transformation to join the sources on the dummy ports (use them in the join condition). In other words, just add 2 dummy ports in the Joiner transformation with the same datatype (Integer), one on the detail side and the other on the master side, assign both ports a default value of 1 (I guess you know how to assign a default value), and then use these 2 ports in the join condition. This ensures that all the records are matched; it is basically a Cartesian product, the same as a SELECT query without a WHERE clause in Oracle.

What are the measure objects?
Aggregate calculations like sum, avg, max and min: these are the measure objects.

How do you load data from PeopleSoft HRM to PeopleSoft ERM using Informatica?
The following are necessary: 1. a PowerConnect license; 2. import the source and target from PeopleSoft using ODBC connections; 3. define a connection under "Application Connection Browser" for the PeopleSoft source/target in Workflow Manager, select the proper connection (PeopleSoft with Oracle, Sybase, DB2 or Informix) and execute it like a normal session.

What is meant by EDW?
It is a big, centralized data warehouse, the old style of warehouse: a single enterprise data warehouse (EDW) with no associated data marts or operational data store (ODS) systems. If the warehouse is built across a particular vertical of the company, it is sometimes called the enterprise data warehouse for that vertical and is limited to it; for example, if the warehouse is built across the sales vertical, it is termed the EDW for the sales hierarchy. EDW means Enterprise Data Warehouse, a centralized DW for the whole organization. This approach follows Inmon, which relies on having a single centralized warehouse, whereas the Kimball approach is to have separate data marts for each vertical/department. Advantages of having an EDW:

1. A global view of the data. 2. A single point of source of data for all the users across the organization. 3. The ability to perform consistent analysis on a single data warehouse. The drawbacks to overcome are the time it takes to develop and the management required to build a centralised database.

What is a hash table in Informatica?
Hash partitioning is a type of partitioning supported by Informatica in which the hash user keys are specified. Hash partitions are somewhat similar to database partitions; they allow the user to partition the data fetched from the source, which is handy while handling partitioned tables. In hash partitioning, the Informatica Server uses a hash function to group rows of data among partitions; the server groups the data based on a partition key. Use hash partitioning when you want the Informatica Server to distribute rows to the partitions by group; for example, you need to sort items by item ID, but you do not know how many items have a particular ID number.

What properties should be noted when we connect a flat file source definition to a relational database target definition?
1. Whether the file is fixed-width or delimited. 2. The size of the file: if it can be processed without performance issues then a normal load will work; if it is huge (in GB) then N-way partitions can be specified on the source side and the target side. 3. The file reader, source file name, etc. We can also increase the line sequential buffer length beyond the default 1024 bytes to read more bytes of data per line.

How do you load the time dimension?
The time dimension is generally loaded manually using PL/SQL, shell scripts, Pro*C, etc.

Can Informatica load heterogeneous targets from heterogeneous sources?
Yes, Informatica can load data from heterogeneous sources to heterogeneous targets.

While running multiple sessions in parallel that load data into the same table, the throughput of each session becomes very low and almost the same for each session. How can we improve the performance (throughput) in such cases?
This is largely handled by the database being used. While loading operations on a table are in progress, the table is locked. If we try to load the same table through different partitions on Oracle 9i, we can run into ROWID errors; a patch can be applied to resolve that issue.

How do you check the source for the latest records to be loaded into the target? For example, some records were loaded yesterday and the file has been populated with more records today; how do you find the records populated today?
Keep a timestamp in the target table, and from that you can get the status of the daily loaded records. You can also check the session properties in the Workflow Monitor to find newly updated records. Another option: a) create a lookup on the target table from the Source Qualifier based on the primary key; b) use an expression to evaluate the primary key returned by the target lookup (for a new source record, the lookup's primary-key port returns null); trap this with DECODE and proceed. You can also use the incremental-loading concept: store the last run date (SYSDATE) in a variable each time you run the session and use it the next time you load the target. Alternatively, use a lookup with the target table as the lookup table, check for existence, and use session-level properties: select "Treat source rows as update" and, in the mapping tab's target view, select Insert and "Update as insert". This ensures that only new records get inserted and old records remain unchanged.

What is the mapping for unit testing in Informatica? Are there any other kinds of testing in Informatica, and how do we do them as ETL developers? How do the testing people test, and are there specific tools for testing?
In Informatica there is no dedicated unit-testing feature, but there are two ways to test a mapping: 1. Data sampling: set the data sampling properties for the session in Workflow Manager for a specified number of rows and test the mapping. 2. Use the Debugger and test the mapping on sample records.
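One common way to implement the incremental-load idea above is a source qualifier SQL override driven by a mapping variable. A sketch, assuming an Oracle source; the table, column and $$LAST_RUN_DATE names are illustrative:

    SELECT *
    FROM   src_orders
    WHERE  last_updated_ts > TO_DATE('$$LAST_RUN_DATE', 'MM/DD/YYYY HH24:MI:SS')

Inside the mapping, an expression port calling SETMAXVARIABLE($$LAST_RUN_DATE, SESSSTARTTIME) can advance the variable, and the Informatica server persists its value in the repository for the next run.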

In real time, which is better: a star schema or a snowflake schema? And the surrogate key will be linked to which columns in the dimension table?
In real time, mostly the star schema is implemented because it takes less query time. A surrogate key exists in each dimension table of a star schema, and this surrogate key is referenced as a foreign key in the fact table.

How can you delete duplicate rows without using a dynamic lookup? Is there any other way, using a lookup, to delete the duplicate rows?
For example, suppose the source table Emp_Name has two columns, Fname and Lname, and contains duplicate rows. In the mapping, create an Aggregator transformation. Edit the Aggregator, select the Ports tab, select Fname, check the GroupBy box and uncheck the output (O) port; do the same for Lname. Then create two new ports, uncheck their input (I) ports, and in the expression of the first new port enter Fname and in the second enter Lname. Close the Aggregator and link it to the target table; only one row per (Fname, Lname) combination reaches the target.

What is the exact use of the 'Online' and 'Offline' server connect options in the Workflow Monitor? The system hangs with the 'Online' option when Informatica is installed on a personal laptop.
When the repository and the PMSERVER are both up, the Workflow Monitor always connects online. When PMSERVER is down but the repository is still up, we are prompted for an offline connection, with which we can only monitor workflows that ran previously.

What is the Rank transformation? Where can we use it?
The Rank transformation is used to find top or bottom ranked rows. For example, if we have a sales table with several employees selling the same product and we need to find the first 5 or 10 employees who sell the most, we can use a Rank transformation. It arranges records in order and selects the TOP or BOTTOM records, comparable to using START WITH and CONNECT BY PRIOR clauses in SQL. It is an active transformation that identifies the top and bottom values based on a numeric port; by default it creates a RANKINDEX port to hold the calculated rank.

Can batches be copied/stopped from the Server Manager?
Yes, we can stop batches using the Server Manager or the pmcmd command.

What real-time problems generally come up while running mappings or transformations? Can anybody explain with an example?
You may encounter connection failures; other than that, not much, because the server catches syntax errors and invalid mappings at design and validation time.
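Relating back to the duplicate-removal question above: what the Aggregator does there is essentially a GROUP BY, and if the source is relational the duplicates can also be removed directly in SQL. A sketch in Oracle syntax, reusing the Emp_Name table from the example:

    -- keep one row per (fname, lname) combination and delete the rest
    DELETE FROM emp_name e
    WHERE  e.rowid NOT IN (SELECT MIN(rowid)
                           FROM   emp_name
                           GROUP BY fname, lname);

The equivalent of the Aggregator's output is simply SELECT fname, lname FROM emp_name GROUP BY fname, lname.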

What will happen if you are using an Update Strategy transformation and your session is configured for "insert"? What are the types of external loader available with Informatica? If you have a Rank transformation with a rank index of top 10 but pass it only 5 records, what will its output be?
1) You can set the following update strategy options: Insert - insert a row into the target table; Delete - delete a row from the target table; Update - with three choices: "Update as update" (update each row flagged for update if it exists in the target), "Update as insert" (insert each row flagged for update), and "Update else insert" (update the row if it exists, otherwise insert it). If you are using an Update Strategy transformation in a mapping, then in the session properties you have to set "Treat source rows as" to Data Driven; if you select Insert, Update or Delete instead, the Informatica server will not consider the Update Strategy transformation when performing any database operations. Alternatively, you can skip the Update Strategy transformation in the mapping and just use the session-level options: set "Treat source rows as" to Update with the "Update else insert" option, which does the same job, but be sure the target table has a primary key.
2) External loaders: for Oracle, SQL*Loader; for Teradata, TPump and MultiLoad.
3) If you pass only 5 rows to the Rank transformation, it will rank just those 5 records based on the rank port.

How do you get two targets, T1 containing distinct values and T2 containing duplicate values, from one source S1?
One suggestion is to use a Filter transformation for the target with no duplicates and load the other target directly from the source. But how do you separate the distinct values using a Filter, where there is no distinct option? Please make it clearer.

Where is the cache stored in Informatica?
The cache is stored in the Informatica server's memory; overflow data is stored on disk in cache files, which are automatically deleted after the session completes successfully. If you want to keep that data, you have to use a persistent cache.
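As a side note on the "Update else insert" option described above: on the database side this corresponds to an upsert, which can be sketched in Oracle SQL with MERGE (the staging and target table names here are hypothetical):

    MERGE INTO tgt_customer t
    USING stg_customer s
       ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN
       UPDATE SET t.name = s.name, t.city = s.city
    WHEN NOT MATCHED THEN
       INSERT (customer_id, name, city)
       VALUES (s.customer_id, s.name, s.city);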

If you want to create indexes after the load process, which transformation do you choose? a) Filter b) Aggregator c) Stored Procedure d) Expression
This is usually not done at the mapping (transformation) level but at the session level. Create a Command task that executes a shell script (on Unix) or any other script containing the CREATE INDEX commands and place that task in the workflow after the session, or run it as a post-session command. (Of the listed options, a Stored Procedure transformation configured to run post-load is the one closest to doing it inside the mapping.)

In a Joiner transformation, you should specify the source with fewer rows as the master source. Why?
In a Joiner transformation the Informatica server reads all the records from the master source and builds index and data caches based on the master rows; after building the caches it reads records from the detail source and performs the join, comparing each detail row against the master cache. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join.

What happens if you try to create a shortcut to a non-shared folder?
It only creates a copy of the object.

1) What are the various test procedures used to check whether the data is loaded in the backend, the performance of the mapping, and the quality of the data loaded in Informatica? 2) What are the common problems developers face during ETL development?
1) Check the workflow status in the Workflow Monitor and verify that the number of records in the source equals the number of records actually loaded. 2) Check how long the workflow takes to succeed. 3) Check the session logs for the data loaded. If you want to know the performance of a mapping at the transformation level, select "Collect performance data" in the session properties; at run time you can see it in the Performance tab of the monitor or read it from a file. The PowerCenter Server names the file session_name.perf and stores it in the same directory as the session log; if there is no session-specific log directory, it saves the file in the default log files directory. The quality of the data loaded depends on the quality of the data in the source; if cleansing is required, perform the data cleansing operations in Informatica, and the final data will be clean if this is followed.
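A minimal sketch of the post-load index idea from the question above: a SQL script that a post-session command or Command task could run (the table names, index names and connection details are hypothetical):

    -- post_load_indexes.sql, invoked e.g. via: sqlplus etl_user/secret@dwh @post_load_indexes.sql
    CREATE INDEX idx_sales_cust ON sales_fact (customer_key);
    CREATE BITMAP INDEX idx_sales_status ON sales_fact (order_status);
    EXIT;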

What is a transaction?
A transaction can be defined as a DML operation: an insertion, modification or deletion of data performed by users, analysts or applications. Another answer given: a transaction is nothing but moving from one window to another window during a process.

What is polling?
Polling displays updated information about the session in the monitor window; the monitor window shows the status of each session when you poll the Informatica server.

How do we remove the staging area?
This question is not really logically correct. The staging area is just a set of intermediate tables; you can create or maintain these tables in the same database as your DWH or in a different database. They are used to store data from the source, which is then cleaned, transformed and put through business logic. Once the source data has gone through that process, the data from staging is populated to the final fact table through a simple one-to-one mapping.

Can anybody write a session parameter file that changes the source and target for every session, i.e. a different source and target for each session run?
You define a parameter file and in it declare two parameters, one for the source and one for the target, for example:

[folder_name.WF:workflow_name.ST:s_session_name]
$Src_file=c:\program files\informatica\server\bin\abc_source.txt
$tgt_file=c:\targets\abc_targets.txt

If the source or target is a relational database, you can even give an overridden SQL at the session level as a parameter; just make sure the SQL is on a single line.

Informatica live interview questions - here are some of the interview questions I could not answer; anybody can help by giving answers for the others as well, thanks in advance:
Explain grouped cross tab.
Explain reference cursor.
What are parallel queries and query hints?
What are metadata and the system catalog?
What is a factless fact schema?

What is a conformed dimension?
Which kind of index is preferred in a DWH?
Why do we use a DSS database for OLAP tools?

A conformed dimension is a dimension that is shared by two or more fact tables. Factless means a fact table without measures that contains only foreign keys; there are two types of factless fact tables, event-tracking tables and coverage tables. Bitmap indexes are preferred in data warehousing. Metadata is data about data; everything is stored there, for example mappings, sessions, privileges and other details - in Informatica we can see the metadata in the repository. The system catalog is used in Cognos; it also contains data, tables, privileges, predefined filters and so on, and reports are generated using this catalog. One answer given was that a grouped cross tab is a type of report in Cognos in which three measures have to be assigned to get the result.

I doubt that answer about the grouped cross tab, where you said three measures are to be specified, which I feel is wrong. I think a grouped cross tab has only one measure, but the side and row headers are grouped, like this:

            India           China
            Mah     Goa     XYZ     PQR
    2000    20K     30K     45K     34K
    2001    39K     60K     55K     66K

Here the cross tab is grouped on Country and then State; similarly we could go further and drill the year down to quarters. This is what is known as a grouped cross tab.

A reference cursor is a cursor that is not declared against a fixed query in the declaration section; it is opened in the executable section, where the table name can be supplied dynamically, so the cursor can fetch data from that table.
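To make the reference-cursor answer concrete, a small PL/SQL sketch (Oracle; the EMP table and ENAME column are only illustrative):

    DECLARE
      TYPE rc_type IS REF CURSOR;
      rc      rc_type;
      v_table VARCHAR2(30) := 'EMP';     -- table name decided at run time
      v_name  VARCHAR2(100);
    BEGIN
      OPEN rc FOR 'SELECT ename FROM ' || v_table;   -- cursor opened in the executable section
      LOOP
        FETCH rc INTO v_name;
        EXIT WHEN rc%NOTFOUND;
        DBMS_OUTPUT.PUT_LINE(v_name);
      END LOOP;
      CLOSE rc;
    END;
    /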
A grouped cross tab is a single report that contains a number of cross-tab blocks based on the grouped items. Here the countries are the group items:

    INDIA       Bangalore   Hyderabad   Chennai
      M1        542         255         45
      M2        542         458         254

    USA         LA          Chicago     Washington DC
      M1        457         458         7894
      M2        875         687         64

    PAKISTAN    Lahore      Karachi     Islamabad
      M1        578         4785        548
      M2        5876        546         556

The rest of the answers were given by friends earlier. DSS stands for Decision Support System. The purpose of a DWH is to provide users with data through which they can make their critical business decisions; a DSS database is nothing but a DWH. OLAP tools obviously use data from a DWH, which is transformed to generate reports; these reports are used by users and analysts to extract strategic information that helps in decision making.

Partitioning and bitmap indexing (when to use it, and how bitmap indexing affects performance):
Bitmap indexing is an indexing technique used to tune the performance of SQL queries. The default index type is the B-tree index, which suits high-cardinality (normalized) data. You can use bitmap indexes for de-normalized data or low-cardinality columns; a common rule of thumb is that the number of distinct values should be less than about 4% of the total rows. If a table satisfies that condition, bitmap indexes will optimize query performance on it.

Other discussion topics: the Kimball vs. Inmon approaches, the concepts of the ODS and the information factory, and the challenges of real-time load processing vs. batch.

How do you create a custom transformation? Can you give a real-time example of where exactly you used one, and explain why you used a custom transformation there?

Where do we use the MQ Series source qualifier and the application multi-group source qualifier? An example would help understanding.
It is the same idea as a Source Qualifier, but only for the MQ Series product from IBM, which is used for data integration across different types of databases, providing them as a single source of data. We use an MQSeries source qualifier when we have an MQ messaging system (a queue) as the source. When there is a need to extract data from a queue, which will basically contain messages in XML format, we use a JMS or an MQ source qualifier depending on the messaging system: if you have a TIBCO EMS queue, use a JMS source, a JMS source qualifier and an XML parser; if you have an MQ Series queue, use an MQ source qualifier associated with a flat file or a COBOL file.

What is meant by a junk attribute in Informatica?
A dimension is called a junk dimension if it contains attributes that are rarely changed or modified. For example, in the banking domain we can pull four attributes from the Overall_Transaction_master table into a junk dimension: tput flag, tcmp flag, del flag and advance flag. Another way to look at it: in the requirement-collection phase, all the attributes likely to be used in any dimension are gathered; while creating a dimension we use all the related attributes from the gathered list, and at the end a dimension is created from the leftover attributes, which is usually called the junk dimension, its attributes being the junk attributes.

Can anyone explain incremental aggregation with an example?
When you use an Aggregator transformation, it creates index and data caches to store (1) the group-by columns and (2) the aggregate columns.

Incremental aggregation is used when we have historical data in place that will be used in the aggregation. It uses a cache containing the historical data: for each group-by column value already present in the cache, the incoming data value is added to the corresponding data cache value and the row is output; when an incoming value has no match in the index cache, new values for the group-by and output ports are inserted into the cache. Incremental aggregation is used specifically to tune the performance of the Aggregator: it captures the changes each time (incrementally) you run the session and applies the aggregate function only to the changed rows, not to the entire source. This improves performance because you are not reading the entire source every time you run the session.

Difference between Rank and Dense Rank?
Rank:
1
2 <-- 2nd position
2 <-- 3rd position
4
5
The same rank is assigned to equal totals/numbers, and the next rank follows the position (golf usually ranks this way; it is sometimes called gold ranking).
Dense Rank:
1
2 <-- 2nd position
2 <-- 3rd position
3
4
The same ranks are assigned to equal totals/numbers/names, and the next rank follows the serial number.
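The same distinction can be seen directly in SQL with the analytic functions of the same names; a sketch assuming a hypothetical EMP table with a SAL column:

    SELECT ename,
           sal,
           RANK()       OVER (ORDER BY sal DESC) AS rnk,        -- gaps after ties: 1, 2, 2, 4, 5
           DENSE_RANK() OVER (ORDER BY sal DESC) AS dense_rnk   -- no gaps:         1, 2, 2, 3, 4
    FROM   emp;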

Somebody, can you explain these points to me: 1) the differences between using native and ODBC server-side database connections; 2) why registering a server to the repository is necessary; 3) the rules associated with transferring and sharing objects between folders; 4) the rules associated with transferring and sharing objects between repositories.
1> A native connection is one provided by the same vendor as the database for that tool; for example, Oracle Warehouse Builder has its own driver to connect to an Oracle database and does not use an ODBC driver, so the connection is faster and performance is better. ODBC is basically a third-party driver, like the Microsoft driver for Oracle, which can be used by any tool to connect to Oracle.
2> Registering a server to a repository is necessary because the sessions use this server to run; if we have multiple servers, we can assign different servers to different sessions.

1) Which mapping properties can be overridden at the Session task level?
You can override any properties other than the sources and targets themselves (make sure the source and target exist in your database if they are relational; if the source is a flat file, you can override its properties). You can override the SQL for a relational source, the session log, the DTM buffer size, cache sizes and so on.
2) What types of permissions are needed to run and schedule workflows?
You need execute permission on the folder to run or schedule a workflow; you may have read and write, but you need execute permission as well.

What is the hierarchy in a DWH?
Data sources ---> Data acquisition ---> Warehouse ---> Front-end tools ---> Metadata management ---> Data warehouse operation management

What is the exact meaning of domain?
In general, a domain is a particular environment, or a name that identifies one or more IP addresses, for example: gov - government agencies, edu - educational institutions, org - non-profit organizations, mil - military, com - commercial business, net - network organizations, ca - Canada, th - Thailand, in - India. In Informatica, a domain means a central global repository (GDR) along with the local repositories (LDR) registered to it; this is possible only in PowerCenter, not PowerMart.

Can anyone explain real-time complex mappings or complex transformations in Informatica, especially in the sales domain?
The most complex logic we use is denormalization. There is no Denormalizer transformation in Informatica, so we have to use an Aggregator followed by an Expression. Apart from this, most of the complexity sits in Expression transformations involving lots of nested IIFs and DECODE statements; other sources of complexity are the Union transformation and the Joiner.

How do you create a mapping using multiple Lookup transformations?
Use multiple lookups in the mapping; use an unconnected lookup if the same lookup repeats multiple times.

If the source has duplicate records and we have two targets, T1 for unique values and T2 only for duplicate values, how do we pass the unique values to T1 and the duplicate values to T2 in a single mapping?
Use the following sequence to get the result.

source ---> SQ ---> expression ---> sorter (with the "select distinct" option enabled) ---> T1, and in parallel ---> aggregator (with group by enabled and a COUNT function) ---> T2.
If you want only the duplicates in T2, you can follow this sequence: ---> aggregator (enable group by and write DECODE(COUNT(col),1,1,0)) ---> filter (condition is 0) ---> T2.
Another option: take two source instances; in the first, embed DISTINCT in the source qualifier and connect it to target T1, and in the second source instance simply write a query to fetch the duplicate records and connect it to target T2. (If you use the aggregator as suggested above, you will get the distinct records as well as the duplicates in the second target.)
That said, this is not quite the right approach. There is a better practice for identifying duplicates. Normally when you ask someone how to identify a duplicate record in Informatica, they say "use an Aggregator transformation", but that only gives you a count; it does not really identify which record is the duplicate. If the source is an RDBMS, you can simply write a query: SELECT ... FROM ... GROUP BY <key fields> HAVING COUNT(*) > 1. But what if the source is a flat file? You could use an aggregator to get the count, then filter and make sure the rows reach the T1 and T2 targets appropriately, but the easiest way is this: use a Sorter transformation and sort on the key fields by which you want to find the duplicates, then use an Expression transformation. For example, with two key fields:

Sorter: field1 (ascending/descending), field2 (ascending/descending).
Expression transformation ports, in this order:
field1 (input/output)
field2 (input/output)
v_field1_curr = field1 (variable)
v_field2_curr = field2 (variable)
v_dup_flag = IIF(v_field1_curr = v_field1_prev AND v_field2_curr = v_field2_prev, TRUE, FALSE) (variable)
o_dup_flag = IIF(v_dup_flag = TRUE, 'Duplicate', 'Not Duplicate') (output)
v_field1_prev = v_field1_curr (variable)
v_field2_prev = v_field2_curr (variable)
Then use a Router transformation: route rows with o_dup_flag = 'Duplicate' to T2 and 'Not Duplicate' to T1. Informatica evaluates row by row, so once the rows are sorted they arrive in order and the expression can compare the current row with the previous one. Hope that is clear.

What are the enhancements made in Informatica 7.1.1 compared to version 6.2.2?
1. Union and Custom transformations
2. Lookup on flat files
3. The pmcmd command
4. Export of independent and dependent repository objects
5. Version control
6. Data profiling
7. Support for 64-bit architecture
8. LDAP authentication

What is the difference between COM and DCOM?

What is the difference between PowerCenter and PowerMart?

What is the procedure for creating independent data marts with Informatica 7.1?
PowerCenter can have multiple repositories, whereas PowerMart has a single (desktop/local) repository. PowerCenter repositories can additionally be linked to a global repository to share objects between users. In summary:

                              PowerCenter             PowerMart
    No. of repositories       n                       1
    Applicability             global repository       local (desktop) repository
    ERP support               available               not available
    High-end warehouses       supported               not supported
    Low- and mid-range WH     supported               supported

What are the Lookup transformation and the Update Strategy transformation? Explain with an example.
A Lookup transformation is used to look up data in a relational table, view, synonym or flat file. The Informatica server queries the lookup table based on the lookup ports used in the transformation and compares the lookup port values with the lookup table column values according to the lookup condition. Using a lookup we can get a related value, perform a calculation, or update a slowly changing dimension. There are two types of lookups: connected and unconnected.
The Update Strategy transformation is used to control how rows are flagged for insert, update, delete or reject. To define how rows are flagged for a session, the setting can be Insert, Delete, Update or Data Driven. For Update we have three options:
Update as update
Update as insert
Update else insert
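Conceptually, a connected lookup against a relational table behaves like a left outer join between the incoming rows and the lookup table; a sketch in SQL with hypothetical staging and dimension tables:

    SELECT s.customer_id,
           s.order_amount,
           d.customer_key          -- NULL when the lookup finds no match
    FROM   stg_orders s
    LEFT OUTER JOIN dim_customer d
           ON d.customer_id = s.customer_id;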

What logic would you implement to load the data into one fact table from 'n' dimension tables?
To load data into one fact table from more than one dimension table, first create the fact table and the dimension tables, then load the individual dimensions using sources and transformations (Aggregator, Sequence Generator, Lookup) in the Mapping Designer; for the fact table, connect each dimension's surrogate key to the corresponding foreign key and carry the required columns from the dimensions into the fact. The dimension tables are loaded before the fact table because they contain the data the fact table refers to; loading from the dimensions to the fact is then straightforward - treat the dimension tables as sources and the fact table as the target. A follow-up question remains, though: with 'n' dimensions, do we use one dimension as a source and the other dimensions as lookup tables, or is there some other logic?

Can I use the session bulk loading option and still recover the session?
No. If the session is configured for bulk mode it does not write recovery information to the recovery tables, so bulk loading will not support recovery as required. In a bulk load no redo log entries are created (they are in a normal load), which is also why the bulk load improves session performance.

How do you configure a mapping in Informatica?
You should configure the mapping with the least number of transformations and expressions needed to do the most work possible, and minimize the amount of data moved by deleting unnecessary links between transformations. For transformations that use a data cache (such as Aggregator, Joiner, Rank and Lookup transformations), limit the connected input/output or output ports; this reduces the amount of data the transformations store in the data cache. You can also perform the following tasks to optimize the mapping: configure single-pass reading, optimize datatype conversions, eliminate transformation errors, optimize transformations, and optimize expressions.
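Returning to the fact-load question above: whichever way the surrogate keys are obtained (lookups in the mapping or joins in the source qualifier), the end result is equivalent to a SQL statement like the following sketch, with hypothetical staging, dimension and fact tables:

    INSERT INTO sales_fact (date_key, product_key, customer_key, quantity, amount)
    SELECT d.date_key,
           p.product_key,
           c.customer_key,
           s.quantity,
           s.amount
    FROM   stg_sales s
    JOIN   dim_date     d ON d.cal_date     = s.sale_date
    JOIN   dim_product  p ON p.product_code = s.product_code
    JOIN   dim_customer c ON c.customer_id  = s.customer_id;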

What is the difference between a dimension table and a fact table, and what are the different kinds of dimension and fact tables?
A fact table contains measurable data, has fewer columns and many rows, and contains a primary key; the different types of facts are additive, non-additive and semi-additive. A dimension table contains textual descriptions of the data, has many columns and fewer rows, and also contains a primary key.

What is a worklet, what is it used for, and in which situations can we use it?
A set of workflow tasks is called a worklet; workflow tasks include 1) Timer, 2) Decision, 3) Command, 4) Event Wait, 5) Event Raise, 6) Email, and so on. A worklet is essentially a reusable set of tasks: if a certain set of tasks has to be reused in many workflows, we put it in a worklet. To execute a worklet it has to be placed inside a workflow; the use of a worklet in a workflow is similar to the use of a mapplet in a mapping. A worklet may contain more than one task and can be used in other workflows.

What are mapping parameters and variables, and in which situations can we use them?

If we need to change certain attributes of a mapping every time the session is run, it would be very difficult to edit the mapping each time, so we use mapping parameters and variables and define their values in a parameter file; then we only need to edit the parameter file to change the attribute values, which keeps the process simple. Mapping parameter values remain constant; if we need to change a parameter value we must edit the parameter file. The value of a mapping variable, however, can be changed with a variable function: if we need to increment a value by one after every session run, we use a mapping variable, whereas with a mapping parameter we would have to manually edit the value in the parameter file after every run. In short, mapping parameters keep a constant value throughout the session, whereas mapping variable values change and the Informatica server saves them in the repository and uses them the next time you run the session.

Explain the use of the Update Strategy transformation.
It is used to flag source records as INSERT, DELETE, UPDATE or REJECT for the target database; the default flag is Insert. This is a must for incremental data loading, for maintaining history data and for maintaining the most recent changed data.

What is meant by a complex mapping?
A complex mapping is one that involves more logic and more business rules. For example, in my bank project I was involved in building a data warehouse; the bank has many customers who relocate after taking loans, and maintaining both the previous and the current addresses was difficult, so I used SCD Type 2. That is a simple example of a complex mapping.

I have a requirement where the column names in a table (Table A) should appear as rows of the target table (Table B), i.e. converting columns to rows. Is it possible through Informatica? If so, how? Suppose the data in the tables is as follows:

Table A (key_1 char(3)) values:
1
2
3

Table B (bkey_a char(3), bcode char(1)) values:
1 T
1 A
1 G
2 A
2 T
2 L
3 A

and the output required is:
1, T, A
2, A, T, L
3, A

The SQL query in the source qualifier should be:

select key_1,
       max(decode(bcode, 'T', bcode, null)) t_code,
       max(decode(bcode, 'A', bcode, null)) a_code,
       max(decode(bcode, 'L', bcode, null)) l_code
from   a, b
where  a.key_1 = b.bkey_a
group  by key_1;

Another suggestion was to use a Sequence Generator transformation.

If a session fails after loading 10,000 records into the target, how can you load from the 10,001st record when you run the session the next time in Informatica 6.1?
Use the performance recovery option. Running the session in recovery mode will work, but the target load type should be normal; if it is bulk, recovery will not work as expected. Alternatively, set the number of initial rows to skip in the session's source properties so that processing resumes at the 10,001st record.
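As a side note on the columns-to-rows requirement above: if the database is Oracle 11g or later, a similar combined output can also be produced with string aggregation rather than DECODE pivoting - a sketch, reusing the A and B tables from the example:

    select a.key_1,
           listagg(b.bcode, ', ') within group (order by b.bcode) as codes
    from   a
    join   b on b.bkey_a = a.key_1
    group  by a.key_1;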

How to perform a "Loop Scope / Loop condition" in an Informatica program ? Give me few examples . can we run a group of sessions without using workflow manager ya Its Posible using pmcmd Command with out using the workflow Manager run the group of session. as per my knowledge i give the answer. what is difference between lookup cashe and unchashed lookup? Can i run the mapping with out starting the informatica server? the difference between cache and uncacheed lookup iswhen you configure the lookup transformation cache lookup it stores all the lookup table data in the cache when the first input record enter into the lookup transformation, in cache lookup the select statement executes only once and compares the values of the input record with the values in the cachebut in uncache lookup the the select statement executes for each input record entering into the lookup transformation and it has to connect to database each time entering the new record
