Data Change Capture Process: Full Load vs Incremental Load ClearPeaks Blog
www.clearpeaks.com/blog/etl/data-change-capture-process-full-load-vs-incremental-load
1/21/13
Determine which changes to capture:
In this case the source tables are modified every day with respect to the previous day, so the process has to determine which changes to capture in order to keep the data up to date. The change to be captured is the value column of the fact table: the process compares the value of the incoming row with the value of the existing row. Find an example below:
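The comparison above can be sketched as follows. This is a minimal illustration, not the post's actual implementation: the fact table is modelled as a dict keyed by a hypothetical (date_key, product_key) primary key, with the measure as the value.

```python
from datetime import date

# Rows already in the warehouse, keyed by (date_key, product_key).
# Key and column names are illustrative assumptions.
existing = {
    (date(2011, 6, 1), "P1"): 100,
    (date(2011, 6, 2), "P1"): 250,
}
# Today's extract from the data source.
incoming = {
    (date(2011, 6, 2), "P1"): 300,  # same key, new value -> change to capture
    (date(2011, 6, 3), "P1"): 120,  # key not in the warehouse yet
}

# A change is captured only when the key already exists and the value differs.
changed = {key: value for key, value in incoming.items()
           if key in existing and existing[key] != value}
```

Here `changed` ends up holding only the row for 2011-06-02, whose value moved from 250 to 300.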
Design a method to identify changes:
The method in this case checks whether the incoming row already exists in the database; if it does, the existing value is replaced with the value of the incoming row. The check compares the primary-key (or unique-key) columns of the fact table against the incoming data.

Determine which changes should be updates and which should be inserts:
Based on the result of the primary-key (or unique-key) comparison, each row is labelled with a flag:
> If the row does not exist -> FLAG=Insert
> If the row exists -> FLAG=Update

Take a look at the time stamping of the rows where you want to make the changes:
According to the requirements and the analysis of the data sources, modifications can occur up to a maximum of the last 3 months. The process therefore only checks for changes in the data of the last 3 months, avoiding a scan of all the data: the change data capture process only considers incoming fact rows whose date falls within the 3 months preceding the last capture date. This picture illustrates the diagram:
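The flagging and 3-month lookback steps above can be sketched together. This is a hedged illustration under assumed names (`flag_rows`, the row layout, and a 90-day approximation of "3 months" are ours, not the post's):

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=90)  # "last 3 months", approximated as 90 days

def flag_rows(incoming, existing_keys, last_capture_date):
    """Label each incoming row Insert or Update, skipping rows outside the window."""
    cutoff = last_capture_date - LOOKBACK
    flagged = []
    for key, row_date in incoming:
        if row_date < cutoff:
            continue  # older than the 3-month window: never re-checked
        flag = "Update" if key in existing_keys else "Insert"
        flagged.append((key, flag))
    return flagged

rows = [("P1", date(2011, 6, 2)),   # key already in the warehouse -> Update
        ("P2", date(2011, 6, 3)),   # new key -> Insert
        ("P3", date(2010, 1, 1))]   # outside the window -> skipped entirely
result = flag_rows(rows, {"P1"}, date(2011, 6, 30))
# result == [('P1', 'Update'), ('P2', 'Insert')]
```

The early `continue` is what keeps the process from having to check all of the historical data on every run.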
In conclusion, we can deduce that incremental loading is more efficient than a full reload unless the operational data sources change dramatically; thus incremental loading is generally preferable. However, the development of ETL jobs for incremental loading is poorly supported by existing ETL tools: currently, ETL programmers have to create separate jobs for the initial load and the incremental load. Since incremental-load jobs are considerably more complex, their development is more costly and error-prone. To overcome this obstacle in this scenario, we proposed the Change Data Capture (CDC) technique.
1.
Stephen Coleman says: June 29, 2011 at 6:35 pm This is a great overview of the importance of the incremental load when considering the performance costs of truncating/inserting large datasets. ETL tools such as Oracle Warehouse Builder have the ability to set table loading to insert/update, which supports both full load and incremental load with the same ETL routines. The key to supporting this is a table created in the staging layer that joins to the source tables based upon the update or create dates of the records.
2.
Harold Jackson says: June 29, 2011 at 9:37 pm I'm coming into this discussion a little late, but the answer seems obvious. Why would any solution that requires the complete data set to be loaded over and over even be considered? The data from both of these applications resides in Oracle databases. The first thing that I would investigate is employing Oracle's replication technology to replicate the transactional data to the staging tables in real time. Every insert, update or delete is immediately captured. The I/O load will be light and the latency between systems will be low. Finally, the data will be fresher than waiting for some big process to finish uploading a whole new copy of the data.
3.
Daniel says: June 30, 2011 at 11:42 am Thanks for your comment Stephen. We have a specialized ETL team and we have applied the CDC technique for several customers using different tools.
4.
Daniel says: June 30, 2011 at 2:36 pm Thanks for your comment Harold. One solution to achieve this could be the Oracle GoldenGate tool. It provides high-speed data replication between heterogeneous platforms and allows you to capture and deliver real-time change data to data warehouses.