Sie sind auf Seite 1von 9

Oracle CDC tables from BODS

created by Siji Mathew on 9-Jul-2013 11:41 AM, last modified by Siji Mathew on 9-Jul-2013 12:01 PM
Version 1
inShare0
Tweet

ORACLE CDC
Whenever a change is made in the source database, the information is written into
database logs first and later, when it is convenient for the database, in the real database
file itself. This is required to ensure maximum performance. And when you need to
know what rows got changed in the source, wouldn't it be perfect to just browse the
database logs for that? The information is there, you just can't read it unless the
database vendor opened up support for that. Like Oracle did in version 10...
This document aims to give an introduction to utilizing Oracle CDC within Data
Integrator. The part A explains Oracle CDC and what is needed for its setup; part B
details how Data Integrator uses this functionality. It is assumed that the CDC section of
the Data Integrator Technical Manual will be consulted as well.
Part A
Data extraction is an integral part of all data warehousing. Data is often extracted on a
nightly basis from the transactional
systems and transported to the data warehouse. Typically, all the data in the data
warehouse is refreshed with data extracted from the source system. But this involves
the extraction and transportation of huge volumes of data and is very expensive in both
resources and time.
Since the data extraction takes place daily, it would be much more efficient to extract
and transport only the data that has changed
since the last extraction. However, in most source systems, it is extremely difficult, if not
impossible, to identify and extract only the recently changed data.
Beyond the challenge of identifying the recently changed data, many extraction,
transformation, and loading (ETL) environments involve one source system feeding
data into multiple target systems. It is also a challenge to sync up change data from one
source to many targets.
The Oracle Change Data Capture (CDC) feature simplifies the process of identifying the
change data since the last extraction. If you are a DBA or a developer involved in a data
integration project that involves the regular extraction and transportation of a large
amount of data from one system to another system or systems, consider CDC to help
reduce your ETL time.

The CDC feature was introduced in Oracle9i Database. CDC helps identify the data in
the source system that has changed since the last extraction. With CDC, data extraction
takes place at the same time the INSERT,UPDATE, or DELETE operations occur in the
source tables, and the change data is stored inside the database in change tables. The
change data, thus captured, is then made available to the target systems in a controlled
manner, using database views.
CDC creates the subscriber view to control accessibility of the change table data to the
target systems. A target system is any
application that wants to use the data captured from the source system.
The Publisher and the Subscriber
The CDC architecture is based on the publisher/subscriber model. The publisher
captures the change data and makes it available to the subscribers. The subscribers
utilize the change data obtained from the publisher. Usually, a CDC system has one
publisher and many subscribers.
The publisher first identifies the source tables from which the change data needs to be
captured. It then captures the change data
and stores it in specially created change tables. Then it allows the subscribers
controlled access to the change data.
The subscriber needs to know what change data it is interested in. A subscriber may not
be interested in all the data that is published by the publisher. The subscriber creates a
subscriber view to access the change data to which it has been granted access by the
publisher.

Setting Up a CDC Environment


The CDC feature is part of the Oracle9i (and later) Database software; you don't need to
install any software other than Oracle9i
Database to use the CDC feature. To make use of Oracle Database's CDC feature,do
the following:

Enable Java (the Java virtual machine JVM) in the database. Refer to the
Oracle9i Database Java Developer's Guide for
detailed instructions on configuring Oracle JVM. When creating a database using
Database Configuration Assistant (DBCA), you can choose to install Oracle JVM.

If synchronous CDC should be used (Oracle 9i supports this trigger based


approach only), enable system triggers. When
you create a database, system triggers are enabled by default.

If asynchronous mode is used, the database has to be brought into archive log

mode

In the command prompt type sqlplus system/password as


sysdba
shutdown immediate;
startup mount;
alter system set log_archive_dest_1 =
'D:\app\unameit\product\11.1.0\db_1\admin\sample\adump';
alter database archivelog;
alter database open;

For the purposes of this article, I will use a sales application example. The source
application is a transactional system that records the information on each sale. The
tables for the transactional application are owned by schema owner SALES_OP.

A publisher is a database user who creates and maintains the change tables. As
mentioned earlier, the responsibility of the publisher is to identify and extract the change
data and make it available to the subscribers. To be able to do that, the following two
database roles must be granted to the publisher:

SELECT_CATALOG_ROLE

o
o
o

EXECUTE_CATALOG_ROLE
For simplicity, let the user SALES_OP assume the role of the publisher.
Since SALES_OP owns the source table, it is easier to designate this user as the
publisher. However, you can designate any other database user as the publisher. In
situations where tables owned by multiple database users constitute the source tables,
it is better to designate another user as the publisher. In such a situation, the publisher
needs the SELECT privilege on the source tables.The database user SALES_OP owns
the source table PRICE_LIST, which contain the price list of the items sold.
Then create script:

GRANT CONNECT, RESOURCE, SELECT_CATALOG_ROLE,


EXECUTE_CATALOG_ROLE
TO sales_op IDENTIFIED BY sales_op;
CREATE TABLE sales_op.PRICE_LIST (

PRODUCT_ID VARCHAR2(20) PRIMARY KEY,


UOM VARCHAR2(20) NOT NULL,

PRICE_PER_UOM NUMBER(10,2) NOT NULL);

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

Once your database is ready for CDC, you can follow this
step-by-step approach to set up a CDC environment
Identify the source tables.
Set up a publisher.
Create change tables.
Set up a subscriber.
Subscribe to the source tables, and activate the subscription.
Set up the CDC window.
Prepare a subscriber view.
Access data from the change tables.
Drop the subscriber view, and purge the CDC window.
Repeat steps 6 through 9 to see new data.
But actually, Data Integrator Designer provides a wizard for that as shown in the next
chapter.
Create a CDC datastore
Create a CDC enabled datastore

Open this datastore and - if no table has been imported yet - a wizard will be brought up
to create the publisher via Designer. You can start the same wizard manually by clicking
on "New" in the popup menu. But make sure the datastore workspace screen is set the
"external metadata".
First, we have to choose the mode of the publishing. This can be either
Synchronous: Is trigger based, only option for Oracle 9i
Asynchronous Hotlog: RedoLog reading will be done on the source database
Asynchronous AutoLog: RedoLogs are sent by the Oracle Archiver to another
machine and processed there. If this connection would fail, the Archiver cannot continue
writing the archive log files and if not resolved quickly, the entire source database will be
blocked: Archiver stuck.
So for Oracle 10, in most cases Asynchronous HotLog will be used, for Oracle 9i

Synchronous is the only option.

After choosing the table to publish...

...we can define the CDC tablename and the content. Thedefault will typically be fine,
the same owner and the tablename CDC__[sourcetablename]

In the last screen (if Asynchronous was chosen) you add this change table to an
existing or a new change set. A change set is used synchronizing one transaction

across multiple CDC tables.

In case you get an ORA-01031: insufficient permissions, I am still analyzing that one. If
you grant all 100 system privileges of the DBA role to the user directly, it is working. If
you just grant the dba role it is not.
Creating the Job
Every morning, the changes made in the source tables must be extracted and made
available to the data warehouse application.
The dataflow to accomplish that is very simple. We take all the changed records and
feed them into the Map_CDC transform to convert the rows into records with the correct
OPCode.

The source CDC reader does have some special flags. The CDC subscription name is
an arbitrary name under which the database knows the DI objects. For this subscription

the database will store what records have been read already and if the flag enable
checkpoint" is set, will provide us with the new records only. So there is no need to deal
with from to dates for the delta load interval, based on the subscription name and the
table we will get new row only.

And that's it. From now on you will get the changed records of this table only, no
duplicates, very fast.
Just to mention, there is one important limitation. You can have one CDC source table
per dataflow only. So if you join two source tables into one target table, you have to
build two dataflows

Das könnte Ihnen auch gefallen