
Problem Description
Create a change data capture (CDC) approach that identifies the changes, and only the changes, in the source.

Solution Description

Change Data Capture in Informatica :
------------------------------------
Change data capture (CDC) is a technique for identifying the changes, and only the changes, in the source. Building an ETL application without CDC is a costly omission that usually forces rework later. Two methods of implementing CDC are described below.

Scenario #01: Change detection using timestamp on source rows ::
----------------------------------------------------------------
In this typical scenario the source rows carry two extra columns, say row_created_time and last_modified_time.
   row_created_time   : time at which the record was first created
   last_modified_time : time at which the record was last modified

1. In the mapping, create a mapping variable $$LAST_ETL_RUN_TIME of datetime data type.
2. Evaluate the expression SetMaxVariable($$LAST_ETL_RUN_TIME, SESSSTARTTIME); this step stores the time at which the session started in $$LAST_ETL_RUN_TIME.
3. Use $$LAST_ETL_RUN_TIME in the WHERE clause of the source SQL. During the first run (the initial seed) the mapping variable holds its default value, so the query pulls all records from the source, for example:
   select * from employee where last_modified_time > '01/01/1900 00:00:00'
4. Assume the session runs on 01/01/2010 00:00:00 for the initial seed.
5. When the session next runs on 02/01/2010 00:00:00, the SQL becomes:
   select * from employee where last_modified_time > '01/01/2010 00:00:00'
   so it pulls only the records that changed between successive runs.

Scenario #02: Change detection using load_id or run_id ::
---------------------------------------------------------
In this scenario the source rows have a column, say load_id, holding a positive running number. The load_id is updated whenever the record is updated.

1. In the mapping, create a mapping variable $$LAST_READ_LOAD_ID of integer data type.
2. Evaluate the expression SetMaxVariable($$LAST_READ_LOAD_ID, load_id); the maximum load_id read is stored in the mapping variable.
3. Use $$LAST_READ_LOAD_ID in the WHERE clause of the source SQL. During the first run (the initial seed) the mapping variable holds its default value, so the query pulls all records from the source, for example:
   select * from employee where load_id > 0;
   Assuming all records in the initial seed have load_id = 1, the mapping variable stores 1 in the repository.
4. Assume the session next runs after five loads into the source. The SQL becomes:
   select * from employee where load_id > 1;
   which limits the source read to the records changed after the initial seed.
5. Consecutive runs keep updating the stored load_id and pull each delta in sequence.
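
The load_id steps can be tried outside Informatica with a small simulation. The following is only a sketch in Python against an in-memory SQLite database, not an Informatica mapping: the table layout (emp_id, name, load_id), the sample rows, and the helper name pull_delta are assumptions made for illustration. The watermark variable stands in for $$LAST_READ_LOAD_ID, and the max() call mimics what SetMaxVariable persists between runs.

```python
import sqlite3


def pull_delta(conn, last_read_load_id):
    """Read rows above the watermark and return them with the new watermark.

    Hypothetical helper: it plays the role of the source qualifier filter
    plus SetMaxVariable($$LAST_READ_LOAD_ID, load_id).
    """
    rows = conn.execute(
        "SELECT emp_id, name, load_id FROM employee "
        "WHERE load_id > ? ORDER BY emp_id",
        (last_read_load_id,),
    ).fetchall()
    # Keep the largest load_id seen so far; unchanged if no rows were read.
    new_watermark = max([last_read_load_id] + [row[2] for row in rows])
    return rows, new_watermark


# Assumed sample source table for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, load_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", 1), (2, "Bob", 1)],
)

# Initial seed: the default watermark of 0 pulls every row.
watermark = 0
seed_rows, watermark = pull_delta(conn, watermark)

# Later loads touch the source: one update and one insert, each with a higher load_id.
conn.execute("UPDATE employee SET name = 'Robert', load_id = 2 WHERE emp_id = 2")
conn.execute("INSERT INTO employee VALUES (3, 'Cy', 3)")

# Next run: only the rows changed since the seed are read.
delta_rows, watermark = pull_delta(conn, watermark)
```

The timestamp scenario follows the same shape, with last_modified_time as the filter column and the session start time, rather than the largest value read, stored as the new watermark.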

