
Strategies for Incremental Load

05/11/2008

Hi-Tech ISU

Manish Gupta
manish.g@tcs.com
Requirement

The requirement is to migrate data from Table A in the source database to Table B in
the data warehouse.

Assumptions

Table A will receive both inserts and updates. These changes must be reflected in
Table B.

Solution

• One initial load will move all the historical records from Table A to Table B.
• During subsequent incremental loads, records that are new in Table A will be
inserted into Table B, and records that were updated in Table A will be updated
in Table B.

TCS Public
Strategies

Four strategies can be used for data migration during incremental loads:

1. Complete Refresh of Data
The approach is to empty the data warehouse table (Table B) and copy the entire
source data (Table A) into it again on every load.
Pros: This is the easiest approach, as no logic is required to find delta records. Also,
the same code is used for the initial load and for every later load.
Cons: This is not a practical approach for large-volume databases, because every
load moves the full table no matter how few records have changed.
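
A complete refresh could be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for both the source database and the data warehouse, and the table names table_a and table_b, the columns id and name, and the function complete_refresh are hypothetical.

```python
import sqlite3

def complete_refresh(conn):
    """Empty table_b and reload it with every row of table_a."""
    with conn:  # delete and reload are committed together, or rolled back together
        conn.execute("DELETE FROM table_b")
        conn.execute("INSERT INTO table_b (id, name) SELECT id, name FROM table_a")

# Assumption: one SQLite connection plays the role of source and warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 'alpha'), (2, 'beta');
""")

complete_refresh(conn)  # initial load

# The source changes: one update and one insert.
conn.execute("UPDATE table_a SET name = 'beta2' WHERE id = 2")
conn.execute("INSERT INTO table_a VALUES (3, 'gamma')")

complete_refresh(conn)  # later load, exactly the same code
rows = conn.execute("SELECT id, name FROM table_b ORDER BY id").fetchall()
```

The same function serves the initial load and every later load, which is the main attraction of this strategy; the cost is that every row is rewritten each time.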

2. Compare all records in both database and data warehouse and write the Deltas
The approach is to compare each field in the source (Table A) with the corresponding
field in the data warehouse (Table B), identify the changes, and insert/update the
records in the data warehouse (Table B).
Pros: None. This approach may be necessary if the source table (Table A) has no
column that identifies delta records, but even then a complete refresh of data is a
better option.
Cons: This is not a practical approach for large-volume databases, because every
record on both sides must be read and compared on every load.
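
The comparison can be sketched as below. Assumptions, none of which come from the original: both tables sit in one in-memory SQLite database, they share a key column id, and the names table_a, table_b, name, city and compare_and_apply are hypothetical. Only inserts and updates are handled, matching the assumptions stated earlier.

```python
import sqlite3

def compare_and_apply(conn):
    """Compare every row of table_a with table_b and write only the differences."""
    query = "SELECT id, name, city FROM {}"
    source = {row[0]: row for row in conn.execute(query.format("table_a"))}
    target = {row[0]: row for row in conn.execute(query.format("table_b"))}

    # New keys become inserts; existing keys with any differing field become updates.
    inserts = [row for key, row in source.items() if key not in target]
    updates = [row for key, row in source.items()
               if key in target and row != target[key]]

    with conn:
        conn.executemany(
            "INSERT INTO table_b (id, name, city) VALUES (?, ?, ?)", inserts)
        conn.executemany(
            "UPDATE table_b SET name = ?, city = ? WHERE id = ?",
            [(name, city, key) for key, name, city in updates])
    return len(inserts), len(updates)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO table_a VALUES (1, 'alpha', 'Pune'), (2, 'beta', 'Delhi'),
                               (3, 'gamma', 'Mumbai');
    INSERT INTO table_b VALUES (1, 'alpha', 'Pune'), (2, 'beta', 'Noida');
""")

first_run = compare_and_apply(conn)   # one insert (id 3), one update (id 2)
second_run = compare_and_apply(conn)  # nothing left to write
```

Note that both tables are read in full on every run even when nothing has changed, which is why the approach scales poorly.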

3. Identify and Process Delta records
The approach is to identify new/updated records in the source table (Table A) and
write only those records to the data warehouse table (Table B).
Pros: This can be the best approach if delta records (new and updated) can be
identified, as only a few records are processed in each subsequent load.
Cons: There can be a few problems in implementing this strategy:
• There might not be any column in the source data (Table A) that identifies delta
records.
• Logic needs to be implemented to identify delta records.
• There will be separate code for the initial load and the delta load.
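
One common way to identify delta records is a last-modified timestamp in the source table combined with a stored high-water mark. The sketch below assumes such a column exists, which the original does not guarantee; the column last_modified, the tables table_a and table_b, the function load_delta, and the use of in-memory SQLite for both sides are all hypothetical. Timestamps are ISO-8601 strings, so they compare correctly as text.

```python
import sqlite3

def load_delta(conn, last_watermark):
    """Copy rows changed after last_watermark to table_b; return the new watermark."""
    new_watermark = conn.execute(
        "SELECT COALESCE(MAX(last_modified), ?) FROM table_a", (last_watermark,)
    ).fetchone()[0]
    with conn:
        # INSERT OR REPLACE inserts new keys and overwrites existing ones.
        conn.execute(
            "INSERT OR REPLACE INTO table_b (id, name, last_modified) "
            "SELECT id, name, last_modified FROM table_a "
            "WHERE last_modified > ? AND last_modified <= ?",
            (last_watermark, new_watermark),
        )
    return new_watermark

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT);
    INSERT INTO table_a VALUES (1, 'alpha', '2008-11-01'), (2, 'beta', '2008-11-02');
    -- Initial load: separate code that copies everything.
    INSERT INTO table_b SELECT id, name, last_modified FROM table_a;
""")
watermark = "2008-11-02"  # highest last_modified covered by the initial load

# The source changes after the initial load.
conn.execute(
    "UPDATE table_a SET name = 'beta2', last_modified = '2008-11-04' WHERE id = 2")
conn.execute("INSERT INTO table_a VALUES (3, 'gamma', '2008-11-05')")

watermark = load_delta(conn, watermark)
rows = conn.execute("SELECT id, name FROM table_b ORDER BY id").fetchall()
```

Only the two changed rows are read and written by the delta load. The watermark would normally be persisted between runs; here it is simply kept in a variable.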

4. Real Time Load
The approach is to load an intermediate table with delta records only, as soon as
records are inserted or updated in the source table (Table A). The intermediate table
can be loaded by a trigger on the source table (Table A) or by some other mechanism.
Pros: This is the best and easiest approach, as only the intermediate table needs to
be processed.
Cons: Tables similar to the source table (Table A) need to be created at the source.
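
A trigger-based intermediate table might look like the sketch below. It is an illustration only: SQLite stands in for the source database and the warehouse, and the names table_a_delta, trg_a_insert, trg_a_update and process_delta are hypothetical. Other databases use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    -- Intermediate table with the same columns as the source table.
    CREATE TABLE table_a_delta (id INTEGER PRIMARY KEY, name TEXT);

    -- Every insert or update on table_a records the latest row image.
    CREATE TRIGGER trg_a_insert AFTER INSERT ON table_a
    BEGIN
        INSERT OR REPLACE INTO table_a_delta (id, name) VALUES (NEW.id, NEW.name);
    END;
    CREATE TRIGGER trg_a_update AFTER UPDATE ON table_a
    BEGIN
        INSERT OR REPLACE INTO table_a_delta (id, name) VALUES (NEW.id, NEW.name);
    END;
""")

def process_delta(conn):
    """Apply the captured delta rows to table_b, then clear the intermediate table."""
    with conn:
        applied = conn.execute("SELECT COUNT(*) FROM table_a_delta").fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO table_b (id, name) "
            "SELECT id, name FROM table_a_delta")
        conn.execute("DELETE FROM table_a_delta")
    return applied

conn.execute("INSERT INTO table_a VALUES (1, 'alpha'), (2, 'beta')")
first_batch = process_delta(conn)   # two new rows captured by the insert trigger

conn.execute("UPDATE table_a SET name = 'beta2' WHERE id = 2")
conn.execute("INSERT INTO table_a VALUES (3, 'gamma')")
second_batch = process_delta(conn)  # one update and one insert

rows = conn.execute("SELECT id, name FROM table_b ORDER BY id").fetchall()
```

The load job never scans table_a; it reads only table_a_delta, which holds at most one row per changed key.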
