
INFORMATICA TRANSFORMATIONS

Transformations are nothing but business logic.
* Active -- the number of input rows is not equal to the number of output rows
* Passive -- the number of input rows is equal to the number of output rows

Transformation Name      Type                No of Ports
Source Qualifier         Active/Connected    2 -- I/p and O/p
Filter                   Active/Connected    2 -- I/p and O/p
Sorter                   Active/Connected    3 -- I/p, O/p and K (Key)
Expression               Passive/Connected   3 -- I/p, O/p and V (Variable)
Aggregator               Active/Connected    3 -- I/p, O/p and V (Variable)
Joiner                   Active/Connected    3 -- I/p, O/p and M (Master)
Union                    Active/Connected    2 -- I/p and O/p
Rank                     Active/Connected    4 -- I/p, O/p, V (Variable) and R (Rank)
Sequence Generator       Passive/Connected   1 -- O/p (2 types: Next Val and Current Val)
Router                   Active/Connected    2 -- I/p and O/p
Normaliser               Active/Connected    2 -- I/p and O/p
Update Strategy          Active/Connected    2 -- I/p and O/p
Lookup                   Active/Connected    4 -- I/p, O/p, R (Return) and L (Lookup)

1) Source Qualifier:
Type: Active/Connected
No of ports: 2 (I/p and O/p)
Properties:
* SQL Query -- using this we can write our own SQL; this is called the SQL override
* User Defined Joins -- we can write join conditions for RDBMS sources
* Source Filter -- filters the source data; only for RDBMS sources
* Tracing Level -- traces the source and target data; the default tracing level is Normal. The levels are Normal (mostly used), Terse, Verbose Initialization and Verbose Data
* Distinct -- eliminates the duplicates
* Pre SQL -- queries that execute before the transformation
* Post SQL -- queries that execute after the transformation (Pre and Post SQL are mostly used with PL/SQL)
* It is the Distinct property that makes the Source Qualifier an active transformation
* Every active transformation by default acts as a passive transformation
Check List:
* We can extract data from both source types (RDBMS and flat files)
* Eliminates the duplicates
* Extracting data from multiple RDBMS sources is possible
* Extracting data from multiple flat files is not possible
* We can sort the data (ORDER BY) only for RDBMS sources
* The Source Qualifier properties mostly work only for RDBMS sources (all the remaining transformations work for both RDBMS and flat files)
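For illustration, a minimal SQL override combining a user-defined join, a source filter and sorting might look like the sketch below. The EMP and DEPT tables and their columns are illustrative, not taken from this document:

    SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, DEPT.DNAME
    FROM   EMP, DEPT
    WHERE  EMP.DEPTNO = DEPT.DEPTNO   -- user defined join
    AND    EMP.SAL > 1000             -- source filter
    ORDER  BY EMP.EMPNO               -- sorting, RDBMS sources only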

Working with Flat Files (.txt, .csv, .xml):
* We can use flat files on both sides, as source and as target
* No ODBC connection is required
* Delimited -- data is separated by a special character (for example a comma)
* Fixed Width -- data is separated by position (fixed-size columns)
* Fixed width performs better than delimited, because delimited has to check for the comma again and again
* From which source can we extract the data fastest? A flat file (no ODBC connection is required)
* Which target is suggested? RDBMS
* Which source is suggested? A flat file
* If we use a flat file as a target, the values are overwritten every time
* We get the flat files from the client via FTP or a remote server
* Flat file path in real time: 1) Home/Informatica/server/Infa_shared/Source 2) IP address

2) Filter Transformation:
Type: Active/Connected
No of ports: 2 (I/p and O/p)
Properties:
* The Source Qualifier filters only RDBMS sources; using the Filter we can filter the data of both RDBMS and flat files
* It is not possible to use the IN function
* For multiple conditions use AND, OR, NOT (see the sketch below)
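Since IN is not available in the filter condition, multiple values are tested with OR. A minimal sketch, assuming an illustrative DEPTNO port:

    -- not possible:  DEPTNO IN (10, 20)
    -- use instead:
    DEPTNO = 10 OR DEPTNO = 20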

3) Sorter Transformation:
Type: Active/Connected
No of ports: 3 (I/p, O/p and K (Key))
Properties:
* Distinct -- it is this property that makes the Sorter an active transformation
* In sorting, NULL values are treated as the highest values
* Cache file -- applicable to several transformations; it is stored on the Informatica server
* Performance is low: for every sort the Sorter creates a cache file (temporary space)
* The cache file is created at execution time and deleted after the transformation has executed
* The default cache file size is 2 GB
Check List:
* Using this we can sort the data in ASC or DESC order
* Works for all sources, RDBMS and flat files
* If the source is a flat file, use the Sorter transformation for sorting; if it is RDBMS, the Source Qualifier can sort instead

Source Qualifier vs Filter:
* Source Qualifier -- only for RDBMS sources; writing an SQL query is possible; removes the duplicates
* Filter -- both source types (RDBMS and flat files); SQL query not possible; removing duplicates not possible

Source Qualifier vs Sorter:
* Source Qualifier -- sorts only RDBMS data; cannot create cache files; performance is higher
* Sorter -- sorts both (RDBMS and flat files); creates cache files; performance is lower
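To illustrate the comparison above: for an RDBMS source, a Sorter with Distinct checked does roughly what this SQL, pushed into the Source Qualifier, would do (EMP and its columns are illustrative):

    SELECT DISTINCT DEPTNO, SAL
    FROM   EMP
    ORDER  BY SAL DESC   -- the Sorter key port, descending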


4) Expression Transformation:
Type: Passive/Connected
No of ports: 3 (I/p, O/p and V (Variable))
Properties: NA
Check List:
* For implementing data scrubbing, use the Expression transformation
* We can do calculations
* Works only with single-row functions
* Works for both source types (RDBMS and flat files)
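A small data-scrubbing sketch with single-row functions; the port names are illustrative:

    -- output port O_ENAME: trim and standardise the name
    O_ENAME = UPPER(LTRIM(RTRIM(ENAME)))
    -- output port O_SAL: replace a NULL salary with 0
    O_SAL = IIF(ISNULL(SAL), 0, SAL)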

5) Aggregator Transformation:
Type: Active/Connected
No of ports: 3 (I/p, O/p and V (Variable))
Properties:
* The Aggregator creates 2 types of cache files, each around 2 GB in size:
  Data cache -- contains the aggregate calculation values
  Index cache -- contains the group by port information
* Performance is lower, because creating the cache file means the sorting is done here
* We can increase the performance by using the Sorted Input option (sort the data before the Aggregator)
Check List:
* In the Aggregator the default sorting is ascending
* Can we use an Aggregator instead of an Expression? Yes, but then why do we always use the Expression? The Aggregator creates cache files, so to keep the performance high, always use the Expression when single-row logic is enough

Expression vs Aggregator:
* Expression -- supports only single-row functions; won't create cache files; performance is higher
* Aggregator -- supports both single- and multi-row functions; creates cache files; performance is lower
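A typical Aggregator setup, sketched with illustrative port names: group on DEPTNO and aggregate SAL in an output port.

    -- group by port: DEPTNO
    -- output port TOTAL_SAL, aggregate expression:
    TOTAL_SAL = SUM(SAL)
    -- with Sorted Input checked, the rows must already arrive
    -- sorted on DEPTNO (for example from an upstream Sorter)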

Session Target Properties:
* Target load type -- the default is Bulk (in real time)
  Normal -- rows are loaded one by one; it takes more time to load; performance is lower; tracing errors is easy
  Bulk -- all values are loaded at a time into the target; it takes less time to load; performance is higher; tracing errors is difficult
* In how many ways can you load the values into the target in Informatica? 2 ways, Bulk and Normal mode; normally Bulk mode is used in our project

Types of Sources:
* Homogeneous -- extracting the data through the same ODBC connection, e.g. 2 RDBMS sources (or) 2 Teradata sources
* Heterogeneous -- extracting the data through different ODBC connections, e.g. 1 RDBMS and 1 flat file (or) 1 RDBMS and 1 Teradata

6) Joiner Transformation:
Type: Active/Connected
No of ports: 3 (I/p, O/p and M (Master))
Properties:
* 2 types of tables in the Joiner:
  Master -- contains fewer records; choose the smaller table as master to increase the performance
  Detail -- contains more records
* 2 types of cache files:
  Index cache -- contains the join condition
  Master (data) cache -- contains the master table data
* The Joiner sorts the data itself (the default sorting is ascending), so the performance is low
* To increase the performance, use the Sorted Input option in the Joiner
Check List:
* Extracts data from multiple sources
* Works for both source types (homogeneous and heterogeneous)
* Using this we can implement data merging (mostly used)
* It will eliminate the duplicates
* To extract data from 2 flat files, use the Joiner
* The Joiner creates 2 types of cache files
* Using the Source Qualifier we can join data only from the same RDBMS source (different sources are not possible), and only one join condition is possible there; in the Joiner we can write multiple join conditions
* Which type of source is used in your project? Heterogeneous
* 1 common column is enough to write the join condition
* If 2 tables have no relationship, is it possible to load the values using the Source Qualifier? Yes; no primary key/foreign key relationship is needed, but a common column is required
* The default join in the Source Qualifier is the equi join (which belongs to the inner join); it is also possible to write a left outer join in the Source Qualifier
* In the Joiner we have to use exactly 1 Master and 1 Detail table
* We cannot connect a Sequence Generator to a Joiner: the Joiner eliminates duplicates, but the Sequence Generator must not drop rows (it needs all the rows)
* Reader error -- check the source; Writer error -- check the target
* Joiner -- data merging; Expression -- data cleansing & data scrubbing; Aggregator -- data aggregations

Joiner vs Union vs Source Indirect Method:
* Joiner -- sources: e.g. 5 tables, but with different structures
* Union -- sources: e.g. 5 tables with the same structure (same condition for flat files)
* Source Indirect Method -- sources: e.g. 5 tables with the same structure (same condition for flat files); this third case can also be done with a Union transformation, but the source indirect method is easier

Types of Joins in the Joiner Transformation: (these are Informatica joins, not SQL joins)
* Normal (or) Equi (or) Simple Join -- the default join; gives the matched records (or rows) from both tables (mostly used)
* Master Outer Join -- gives the matched rows from both tables & the unmatched rows from the detail table
* Detail Outer Join -- gives the matched rows from both tables & the unmatched rows from the master table
* Full Outer Join -- gives the matched & unmatched rows from both tables
(See the SQL sketch below.)
How to increase the performance in the Joiner:
* Use the Sorted Input option in the Joiner
* Select as master the table having the smaller number of records

Source Qualifier vs Joiner:
* Source Qualifier -- only for homogeneous sources; cannot create cache files; performance is higher
* Joiner -- both source types (homogeneous & heterogeneous); creates cache files; performance is lower
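In SQL terms, the four join types correspond roughly to the following (M = master table, D = detail table; the KEY column is illustrative):

    -- Normal / Equi:  D INNER JOIN M        ON D.KEY = M.KEY
    -- Master Outer:   D LEFT OUTER JOIN M   ON D.KEY = M.KEY   (all detail rows kept)
    -- Detail Outer:   D RIGHT OUTER JOIN M  ON D.KEY = M.KEY   (all master rows kept)
    -- Full Outer:     D FULL OUTER JOIN M   ON D.KEY = M.KEY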

7) Union Transformation:
Type: Active/Connected
No of ports: 2 (I/p and O/p)
Properties:
* Is Partitionable -- it is this property that makes the Union an active transformation; otherwise it would be a passive one
Check List:
* Like the Joiner, we can implement the data merging process with it
* We can extract the data from multiple sources, RDBMS and flat files (but the sources must have the same structure)
* Works on the basis of the UNION ALL set operator (see the sketch after this section)
* Won't eliminate the duplicates
* In the Union there is no need to write any condition; in the Joiner we have to write the condition
* The Union won't create cache files
* The Union transformation automatically creates one group by default, the o/p group (it contains only o/p ports)
* You have 5 sources; how many groups do you have to create? 4, because one group is already there by default
* Upstream -- before the Union transformation; Downstream -- after the Union transformation

Union Transformation Groups (2 types):
* I/p (or) User Defined group -- contains the source information; we can create any number of i/p groups; for each source we have to create 1 i/p group
* O/p (or) Built-in group -- created by default; through it we make the connections to the target; contains only one o/p group

Limitations:
* We cannot connect a Sequence Generator or an Update Strategy to the Union transformation

Joiner vs Union:
* Joiner -- extracts data from multiple sources with different structures; removes duplicates; we have to write the join condition; creates cache files; performance is lower; N sources need N-1 Joiner transformations
* Union -- extracts data from multiple sources with the same structure; won't remove duplicates; no need to write any condition; doesn't create cache files; performance is higher; N sources need only 1 Union transformation

Performance Tuning:
* Use a Filter transformation near the Union; after the Filter, use an Expression
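The Union behaves like SQL UNION ALL, not UNION, so duplicates survive. A rough SQL analogy, assuming two illustrative tables EMP_US and EMP_UK with identical structure:

    SELECT EMPNO, ENAME, SAL FROM EMP_US
    UNION ALL   -- keeps duplicates, like the Union transformation
    SELECT EMPNO, ENAME, SAL FROM EMP_UK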

8) Rank Transformation:
Type: Active/Connected
No of ports: 4 (I/p, O/p, V (Variable) and R (Rank))
Properties: NA
Check List:
* Using this we can get the top & bottom values
* Supports the number data type & the string data type
* Creates cache files
* Where the V port appears: Expression -- single-row only; Aggregator -- both single & multiple rows; Rank -- single-row only
* Why is there a group by port in the Rank transformation? For finding the ranks department wise
Rank Index:
* The Rank transformation automatically creates one o/p port called rank index (it gives the ranks)
Dense Ranking:
* Finding the ranks group wise
* For implementing dense ranking we have to use the group by port in the Rank transformation

9) Sequence Generator Transformation:
Type: Passive/Connected
No of ports: 1 O/p (2 types: Next Val and Current Val)
Properties:
* Next Val & Current Val
* The largest possible End Value is a 19-digit number
Check List:
* Using this we can generate the sequence numbers (meaning surrogate keys, system generated ones)
* Using this we can develop the SCD mappings (a must for SCD2 & SCD3)
* We cannot pass any data through it
* How do we find the ranks without a Rank transformation? Using a Sequence Generator
* Without a Sequence Generator, how do we generate the serial numbers? Using SQL sequences & the Stored Procedure transformation
Limitations:
* We cannot connect Sequence Generator values before a Joiner, Union or Update Strategy (always use it after them)

10) Router Transformation:
Type: Active/Connected
No of ports: 2 (I/p & O/p)
Properties:
* User Defined group -- contains the condition-matched data & we can write the conditions; we can edit/delete these groups
* Default group -- contains the condition-unmatched data & we cannot write any conditions; it is not possible to edit/delete this group
* While creating a new group, one group will be created by default as DEFAULT1; it contains the unmatched rows
Check List:
* Using this we can route/load the data (or values) into multiple targets (see the sketch after this section)
* The Router works for only 1 source
* If we have multiple sources then we have to go for the Source Qualifier, Joiner or Union

Filter vs Router:
* Filter -- only 1 condition is possible (use AND for multiple conditions); routes the data to only 1 target; cannot create any groups
* Router -- multiple conditions are possible; routes the data to multiple targets; creating groups is possible
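A Router sketch with two user-defined groups plus the default group; the DEPTNO port and the target names are illustrative:

    -- group DEPT10 (condition DEPTNO = 10)  -> target TGT_DEPT10
    -- group DEPT20 (condition DEPTNO = 20)  -> target TGT_DEPT20
    -- group DEFAULT1 (all unmatched rows)   -> target TGT_OTHERS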

11) Normaliser Transformation:
Type: Active/Connected
No of ports: 2 (I/p & O/p)
Properties: NA
Check List:
* This transformation is not used in real time; it comes up mostly for interview purposes
* Using this we can convert single rows into multiple rows (data pivoting)
* Extracting data from COBOL files is also possible
* If your source is a COBOL file then we have to use a Normaliser transformation (it is used by default); here it acts as the Source Qualifier (with the Source Qualifier we cannot import a COBOL file as a source)
Occurrence:
* How many times do we want to repeat the column?

Normaliser vs Normalization:
* Normaliser -- the process of converting a single row into multiple rows
* Normalization -- the process of removing the duplicates

12) Update Strategy Transformation:
Type: Active/Connected
No of ports: 2 (I/p & O/p)
Properties:
* The default value in the Update Strategy transformation is Insert
  DD_INSERT 0
  DD_UPDATE 1
  DD_DELETE 2
  DD_REJECT 3
Check List:
* How (or in what way) are we loading the values? Data driven
* Using this we can design the SCD2 mappings
* IIF -- this is the Informatica IIF, not the SQL one (see the sketch after this section)
* The session properties offer an alternative to the update option used in the Update Strategy transformation
* Without using the Update Strategy transformation, is it possible to insert (or) update the values? Yes, by checking the options in the session target properties
  1st method: Insert / Update as update / Update as insert / Update else insert / Delete
  2nd method: Treat source rows as
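A typical data-driven expression inside the Update Strategy, as used in SCD-style mappings. The port names are illustrative; LKP_EMPNO is assumed to come from an upstream Lookup:

    -- row not found in the target -> insert, otherwise update
    IIF(ISNULL(LKP_EMPNO), DD_INSERT, DD_UPDATE)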

13) Lookup Transformation:
Type: Active/Connected
No of ports: 4 (I/p, O/p, R (Return) and L (Lookup)); the other R port in Informatica is in the Rank transformation
Properties:
* Lookup SQL Override (supported only for source data)
* Lookup Source Filter
* Contains cache files; the default is static
Check List:
* Using this we can look up the values in the source (or) the target
* It returns NULL (or) NOT NULL
* We must use the Update Strategy transformation (after the Lookup) in a lookup mapping
* 2 types of lookup: Connected (physically connected) and Unconnected (logically connected)
* Use the R port only in an unconnected lookup
* While creating a Lookup in a mapping, the default is the target (looking up the source is also possible, based on your requirement)
* Which transformation is going to occupy the most memory? The Lookup (compared to all the other transformations in Informatica)
* The costliest transformation in Informatica -- the Lookup
* Is it possible to look up values in 2 targets? Refer

1) Connected Lookup:
* Creates 3 types of cache files:
  Static -- the cache file is created only once we have loaded all the values into the target; performance is higher and the data in the cache file cannot be refreshed
  Dynamic -- as each row is inserted into the target, one row is also inserted into the cache file; performance is lower and the data in the cache file can be refreshed
  Persistent -- contains bulk data & the cache file is stored permanently; performance is higher compared to static & dynamic

2) Unconnected Lookup:
* Connected logically, and the R port must be used compulsorily (see the sketch after this section)
* In which situation do we use an unconnected lookup? When the returned values must be reusable (the same lookup is used in multiple places); in a connected lookup the values are not reusable

Connected vs Unconnected Lookup:
* Connected -- returns multiple port values; values are not reusable; 3 cache file types are present
* Unconnected -- returns only 1 port value (the R port); values are reusable; only 2 cache file types are present (no dynamic)
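An unconnected lookup is called from an expression through the :LKP reference. A minimal sketch; the lookup name LKP_GET_DNAME and the ports are illustrative:

    -- output port in an Expression transformation;
    -- the R (return) port of LKP_GET_DNAME supplies the value
    O_DNAME = :LKP.LKP_GET_DNAME(DEPTNO)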
