Executive Overview
Today's integration project teams face a daunting challenge: while data volumes are growing exponentially, the need for timely and accurate business intelligence is also constantly increasing. Batches for data warehouse loads used to be scheduled daily to weekly; today's businesses demand information that is as fresh as possible. Because the value of this real-time business data decreases as it gets older, low latency of data integration is essential for the business value of the data warehouse. At the same time, the concept of business hours is vanishing for a global enterprise, as data warehouses are in use 24 hours a day, 365 days a year. This means that the traditional nightly batch windows are becoming harder to accommodate, and interrupting or slowing down sources is not acceptable at any time during the day. Finally, integration projects have to be completed in shorter release timeframes, while fully meeting functional, performance, and quality specifications on time and within budget. These processes must be maintainable over time, and the completed work should be reusable for further, more cohesive, integration initiatives.
Conventional Extract, Transform, Load (ETL) tools closely intermix data transformation rules with integration process procedures, requiring the development of both data transformations and data flow. Oracle Data Integrator (ODI) takes a different approach to integration by clearly separating the declarative rules (the "what") from the actual implementation (the "how"). With ODI, declarative rules describing mappings and transformations are defined graphically, through a drag-and-drop interface, and stored independently from the implementation. ODI automatically generates the data flow, which can be fine-tuned if required. This innovative approach for declarative design has also been applied to ODI's framework for Changed Data Capture (CDC).
ODI's CDC moves only changed data to the target systems and can be integrated with Oracle GoldenGate, thereby enabling the kind of real-time integration that businesses require. This technical brief describes several techniques available in ODI to adjust data latency from scheduled batches to continuous real-time integration.
Introduction
The conventional approach to data integration involves extracting all data from the source system and then integrating the entire set, possibly using an incremental strategy, in the target system. This approach, which is suitable in most cases, can be inefficient when the integration process requires real-time data integration. In such situations, the amount of data involved makes data integration impossible in the given timeframes.
Basic solutions, such as filtering records according to a timestamp column or changed flag, are possible, but they might require modifications in the applications. In addition, they usually cannot guarantee that all changes are taken into account. ODI's Changed Data Capture identifies and captures data as it is being inserted, updated, or deleted from datastores, and it makes the changed data available for integration processes.
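The limitation of timestamp filtering can be illustrated with a small sketch. It is not ODI code: the `orders` table, its `last_modified` column, and the `changed_since` helper are hypothetical names chosen for the example, and SQLite stands in for the source system. The filter query picks up the update and the insert, but the deleted row leaves nothing behind for the filter to find.

```python
import sqlite3

def changed_since(conn, last_run):
    """Return rows whose last_modified is later than the previous extraction."""
    cur = conn.execute(
        "SELECT id, status, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY id",
        (last_run,),
    )
    return cur.fetchall()

# Hypothetical source table; the application must maintain last_modified itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, last_modified INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "NEW", 100), (2, "NEW", 100), (3, "NEW", 100)],
)

last_run = 100  # logical time of the previous extraction

# Three changes happen after the previous extraction.
conn.execute("UPDATE orders SET status = 'SHIPPED', last_modified = 110 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")  # leaves no row to filter on
conn.execute("INSERT INTO orders VALUES (4, 'NEW', 120)")

delta = changed_since(conn, last_run)
print(delta)  # the delete of order 2 does not appear in the result
```

The sketch also shows why such a filter can require application changes: every writer has to set `last_modified` correctly for the query to work at all.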
Oracle has various solutions for different real-time data integration use cases. Query offloading, high availability/disaster recovery, and zero-downtime migrations can be handled through the Oracle GoldenGate product, which provides heterogeneous, non-intrusive, and highly performant changed data capture, routing, and delivery. To provide zero- to low-latency loads, ODI offers several alternatives for real-time data warehousing through its CDC mechanisms, including the integration with Oracle GoldenGate. This integration also provides seamless operational reporting. Data federation and data service use cases are covered by Oracle Data Service Integrator (ODSI).
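| | Mini-Batch | Micro-Batch | Real-Time |
|---|---|---|---|
| Description | Data is loaded incrementally using intra-day loads. | Source changes are captured and accumulated to be loaded in intervals. | Source changes are captured and immediately applied to the DW. |
| Latency | Hourly or higher | 15 min and higher | Sub-second |
| Capture | Filter Query | CDC | CDC |
| Initialization | Pull | Push, then Pull | Push |
| Target Load | Low impact, load frequency is tuneable | Low impact, load frequency is tuneable | Low impact, load frequency is tuneable |
| Source Load | Queries at peak times necessary | Some to none depending on CDC technique | Some to none depending on CDC technique |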
See also: Real-Time Data Warehousing: Challenges and Solutions by Justin Langseth (http://dssresources.com/papers/features/langseth/langseth02082004.html)
ODI supports each of the described data warehouse load architectures with its modular Knowledge Module architecture. Knowledge Modules enable integration designers to separate the declarative rules of data mapping from the selection of a best-practice mechanism for data integration. Batch and Mini-Batch strategies can be defined by selecting Load Knowledge Modules (LKM) for the appropriate incremental load from the sources. Micro-Batch and Real-Time strategies use the Journalizing Knowledge Modules (JKM) to select a CDC mechanism to immediately access changes in the data sources. Mapping logic can be left unchanged when switching KM strategies, so that a change in loading patterns and latency does not require a rewrite of the integration logic.
Methods for Tracking Changes using CDC
ODI has abstracted the concept of CDC into a journalizing framework with a JKM and journalizing infrastructure at its core. By isolating the physical specifics of the capture process from the processing of detected changes, it is possible to support a number of different techniques that are represented by individual JKMs:
Database Triggers
[Figure: trigger-based CDC. Diagram labels: Source, J$, ODI Load, Target.]
JKMs based on database triggers define procedures that are executed inside the source database when a table change occurs. Because trigger mechanisms are widely available in databases, trigger-based JKMs exist for a wide range of sources such as Oracle DB, IBM DB2/400 and UDB, Informix, Microsoft SQL Server, Sybase, and others. The disadvantage is the limited scalability and performance of trigger procedures, which makes them best suited to use cases with light to medium loads.
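The trigger technique can be sketched in a few lines. This is a simplified illustration and not the code an ODI JKM generates: SQLite stands in for the source database, and the `orders` table, the `j_orders` journal table, its columns, and the 'I'/'D' flag values are assumptions made for the example. Each trigger runs inside the source database and records the key of the changed row in the journal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT);

-- Hypothetical, simplified journal table: one row per captured change.
CREATE TABLE j_orders (
    jrn_seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    jrn_flag TEXT NOT NULL,       -- 'I' = insert or update, 'D' = delete
    order_id INTEGER NOT NULL
);

CREATE TRIGGER trg_orders_ins AFTER INSERT ON orders
BEGIN
    INSERT INTO j_orders (jrn_flag, order_id) VALUES ('I', NEW.order_id);
END;

CREATE TRIGGER trg_orders_upd AFTER UPDATE ON orders
BEGIN
    INSERT INTO j_orders (jrn_flag, order_id) VALUES ('I', NEW.order_id);
END;

CREATE TRIGGER trg_orders_del AFTER DELETE ON orders
BEGIN
    INSERT INTO j_orders (jrn_flag, order_id) VALUES ('D', OLD.order_id);
END;
""")

# Ordinary application writes; the triggers fill the journal as a side effect.
conn.execute("INSERT INTO orders VALUES (1, 'NEW')")
conn.execute("INSERT INTO orders VALUES (2, 'NEW')")
conn.execute("UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1")
conn.execute("DELETE FROM orders WHERE order_id = 2")

journal = conn.execute(
    "SELECT jrn_flag, order_id FROM j_orders ORDER BY jrn_seq"
).fetchall()
print(journal)
```

Every write to `orders` now costs an additional write to `j_orders` in the same transaction, which is the scalability cost described above.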
Database Log-Facilities
[Figure: log-based CDC. Diagram labels: Source, Log, Oracle Streams, J$, ODI Load, Target.]
Some databases provide APIs and utilities to process table changes programmatically. Oracle DB provides the Streams interface to process log entries and store them in separate tables. Such log-based JKMs have better scalability than trigger-based mechanisms, but still require changes to the source database. ODI also supports log-based CDC on DB2/400 using its journals.
Non-invasive CDC through Oracle GoldenGate
Figure 3: GoldenGate-based CDC (diagram labels: Source, Log, GoldenGate, Staging, J$, Real-Time Reporting, ODI Load, Target)
Oracle GoldenGate provides a CDC mechanism that can process source changes non-invasively by processing log files of completed transactions and storing these captured changes into external Trail Files independent of the database. Changes are then reliably transferred to a staging database. The JKM uses the metadata managed by ODI to generate all Oracle GoldenGate configuration files, and processes all GoldenGate-detected changes in the staging area. These changes will be loaded into the target data warehouse using ODI's declarative transformation mappings. This architecture enables separate real-time reporting on the normalized staging area tables in addition to loading and transforming the data into the analytical data warehouse tables.
Oracle CDC Adapters
ODI provides separate CDC adapters that cover legacy platforms such as Microsoft SQL Server, DB2/390, VSAM CICS, VSAM Batch, IMS/DB and Adabas. These adapters achieve their performance by capturing changes directly from the database logs.
Source databases supported for ODI CDC
[Table: source databases by CDC technique.
Column headings: Database; JKM Oracle GoldenGate; Log-based CDC; Database Log Facilities; Oracle CDC Adapters; Trigger-based CDC.
Database entries: Oracle; MS SQL Server; Sybase ASE; DB2/UDB; DB2/400; DB2/390; Informix, Hypersonic DB; Teradata, Enscribe, MySQL, SQL/MP, SQL/MX; DB2/390, VSAM CICS, VSAM Batch, IMS/DB, Adabas.
Table notes: Requires customization of Oracle GoldenGate configuration generated by JKM. MySQL support is planned for Oracle GoldenGate 11g.]
Publish-and-Subscribe Model
The ODI journalizing framework uses a publish-and-subscribe model. This model works in three steps:
1. An identified subscriber, usually an integration process, subscribes to changes that might occur in a datastore. Multiple subscribers can subscribe to these changes.
2. The Changed Data Capture framework captures changes in the datastore and then publishes them for the subscriber.
3. The subscriber (an integration process) can process the tracked changes at any time and consume these events. Once consumed, events are no longer available for this subscriber.
ODI processes datastore changes in two ways:
- Regularly in batches (pull mode): for example, it processes new orders from the Web site every five minutes and loads them into the operational datastore (ODS).
- In real time (push mode) as the changes occur: for example, when a product is changed in the enterprise resource planning (ERP) system, it immediately updates the on-line catalog.
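[Figure: publish-and-subscribe model. A change to the Orders datastore (Order #5A32) is captured and published by CDC; Integration Process 1 and Integration Process 2 each subscribe, consume the change, and load Target 1 and Target 2 respectively.]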
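The three steps of the publish-and-subscribe model can be sketched with a small in-memory journal. This is a toy model written for illustration, not ODI's implementation; the `Journal` class, its method names, and the subscriber names are hypothetical. It shows that each subscriber consumes its own copy of a change, and that a consumed change is no longer available to that subscriber.

```python
from collections import defaultdict

class Journal:
    """Toy change journal with per-subscriber consumption (not ODI code)."""

    def __init__(self):
        self._subscribers = set()
        self._pending = defaultdict(list)  # subscriber name -> unconsumed changes

    def subscribe(self, name):
        # Step 1: a subscriber registers its interest in the datastore.
        self._subscribers.add(name)

    def publish(self, change):
        # Step 2: a captured change is published for every subscriber.
        for name in self._subscribers:
            self._pending[name].append(change)

    def consume(self, name):
        # Step 3: the subscriber takes its pending changes; once consumed,
        # they are no longer available to this subscriber.
        changes = self._pending[name]
        self._pending[name] = []
        return changes

journal = Journal()
journal.subscribe("process_1")
journal.subscribe("process_2")
journal.publish(("I", "Order #5A32"))

first = journal.consume("process_1")   # process_1 sees the new order
again = journal.consume("process_1")   # nothing left for process_1
other = journal.consume("process_2")   # process_2 still has its own copy
print(first, again, other)
```

In pull mode a subscriber would call `consume` on a schedule; in push mode something would invoke the subscriber as soon as `publish` runs. The journal itself is the same in both cases.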
Processing the Changes
ODI employs a powerful declarative design approach, Extract-Load, Transform (E-LT), which separates the rules from the implementation details. Its out-of-the-box integration interfaces use and process the tracked changes.
Developers define the declarative rules for the captured changes within the integration processes in the ODI Designer graphical user interface, without having to code. With the ODI Designer, customers declaratively specify set-based maps between sources and targets, and then the system automatically generates the data flow from the set-based maps.
The technical processes required for processing the changes captured are implemented in ODI's Knowledge Modules. Knowledge Modules are scripted modules that contain database and application-specific patterns. The runtime then interprets these modules and optimizes the instructions for targets.
Ensuring Data Consistency
Changes frequently involve several datastores at one time. For example, when an order is created, updated, or deleted, it involves both the orders table and the order lines table. When processing a new order line, the new order to which this line is related must be taken into account. ODI provides a mode of tracking changes, called Consistent Set Changed Data Capture, for this purpose. This mode allows you to process sets of changes that guarantee data consistency.
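One way to picture a consistent set is a single window applied to the changes of all related datastores at once. The sketch below illustrates that idea only; it does not describe how ODI implements Consistent Set Changed Data Capture. The sequence numbers, the `consistent_window` function, and the key format are assumptions made for the example.

```python
# Changes captured from two related datastores, in commit order.
# Each entry: (sequence number, datastore, key). All values are hypothetical.
captured = [
    (1, "orders", "A1"),
    (2, "order_lines", "A1-1"),
    (3, "orders", "B2"),
    (4, "order_lines", "B2-1"),
]

def consistent_window(changes, last_consumed, upper_bound):
    """Select changes from all datastores within one shared sequence window."""
    window = [c for c in changes if last_consumed < c[0] <= upper_bound]
    orders = [key for _, store, key in window if store == "orders"]
    lines = [key for _, store, key in window if store == "order_lines"]
    return orders, lines

# First run: consume everything up to sequence 2, for both tables together.
orders_1, lines_1 = consistent_window(captured, last_consumed=0, upper_bound=2)

# Second run: continue from sequence 2 up to sequence 4.
orders_2, lines_2 = consistent_window(captured, last_consumed=2, upper_bound=4)

# In each run, every order line arrives together with its parent order.
for orders, lines in ((orders_1, lines_1), (orders_2, lines_2)):
    assert all(line.split("-")[0] in orders for line in lines)
print(orders_1, lines_1, orders_2, lines_2)
```

If the two tables were consumed independently at different moments, an order line could be picked up in a run that does not yet include its order, which is the inconsistency this mode is meant to prevent.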
Customer Use Case: Overstock.com
Overstock.com is using both GoldenGate and ODI to load customer transaction data into a real-time data warehouse. GoldenGate is used to capture changes from its retail site, while ODI is used for complex transformations in its data warehouse. Overstock.com has seen the benefits of leveraging customer data in real time. When the company sends out an e-mail campaign, rather than waiting one, two, or three days, it immediately sees whether consumers are clicking in the right place, whether the e-mail is driving consumers to the site, and whether those customers are making purchases. By accessing the data in real time using Oracle GoldenGate, Overstock.com can immediately track customer behavior, profitability of campaigns, and ROI. The retailer can now analyze customer behavior and purchase history to target marketing campaigns and service in real time and provide better service to its customers.
Conclusion
Integrating data and applications throughout the enterprise, and presenting a consolidated view of them, is a complex proposition. Not only are there broad disparities in data structures and application functionality, but there are also fundamental differences in integration architectures. Some integration needs are data oriented, especially those involving large data volumes. Other integration projects lend themselves to an event-oriented architecture for asynchronous or synchronous integration. Changes tracked by Changed Data Capture constitute data events. The ability to track these events and process them regularly in batches or in real time is key to the success of an event-driven integration architecture. ODI provides rapid implementation and maintenance for all types of integration projects.
Best Practices for Real-time Data Warehousing
May 2010
Oracle Corporation, World Headquarters, 500 Oracle Parkway, Redwood Shores, CA 94065, U.S.A.
Worldwide Inquiries: Phone: +1.650.506.7000, Fax: +1.650.506.7200
oracle.com
0510
Copyright 2010, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.