A: Before refreshing a set of tables from source to target, the subscription must
be stopped if currently in mirroring status.
Once the subscription is stopped, a number of tables can be selected for a
REFRESH operation.
A: CDC can REFRESH a set of selected tables as one operation, but will only
process a REFRESH for one table at a time within a single subscription. To
perform parallel refresh, multiple subscriptions can be used.
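As a rough illustration of spreading tables over several subscriptions for parallel refresh, the sketch below balances tables by row count. The table names, row counts, and the function assign_tables are hypothetical and not part of CDC; the actual table mappings would still be created per subscription in Management Console.

```python
import heapq

def assign_tables(table_rows, subscription_count):
    """Greedy split: place the largest remaining table into the lightest subscription."""
    # Heap entries are (total_rows_assigned, subscription_index).
    heap = [(0, i) for i in range(subscription_count)]
    heapq.heapify(heap)
    plan = [[] for _ in range(subscription_count)]
    # Largest tables first; the name breaks ties so the result is deterministic.
    for table, rows in sorted(table_rows.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        plan[idx].append(table)
        heapq.heappush(heap, (total + rows, idx))
    return plan

# Hypothetical row counts, for illustration only.
tables = {
    "ORDERS": 50_000_000,
    "ORDER_LINES": 120_000_000,
    "CUSTOMERS": 5_000_000,
    "PRODUCTS": 200_000,
}
plan = assign_tables(tables, 2)
for number, names in enumerate(plan, start=1):
    print(f"subscription {number}: {names}")
```

Because each subscription refreshes one table at a time, the elapsed time is bounded by the heaviest subscription, which is why the sketch balances by size rather than by table count.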
A: CDC can support REFRESH on any tables which are supported for mirroring.
A: Practically, yes. The CDC source engine REFRESH query may cause the
source database to perform significant read I/O during the REFRESH. The
database may take many hours to perform a "table scan" operation to provide
rows. Scanning large tables can cause significant disk I/O contention for
databases which have changing data, or maintenance operations which are
active during the REFRESH operation.
A: Customers use database extract / import, backup / restore, or INSERT INTO
... SELECT * FROM (remote database linked table) operations instead of
REFRESH when these options are more suitable.
Q: Does CDC use bulk extract utilities to obtain rows for REFRESH?
A: CDC uses a SQL query to obtain rows during REFRESH. CDC does not use
bulk extract.
A: CDC queries the source database tables directly, and does not have an option
to redirect queries to backup, standby or tables stored on different databases.
The source table mapped for a subscription is the one queried.
Q: Does CDC require source database tables to be quiescent during the
Refresh?
A: CDC has "REFRESH while active" logic that allows REFRESH during periods
where the source database is processing changes (Insert, Update, Delete) to
tables involved in the REFRESH operation.
A: CDC uses very little CPU on the source system during standard REFRESH
operations. The majority of the processing is disk read I/O on the tables involved
in the REFRESH.
A: Oracle specific: CDC uses a transactional read-only snapshot query on the
table being refreshed, which causes undo space in the source database to be
held for the duration of the query. The DBA should review this, as very large
tables (>50M rows) which are changed frequently (>100 changes per second)
can require larger amounts of undo space.
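To give the DBA a starting point for that review, the following back-of-envelope sketch multiplies the expected refresh duration by the change rate. The read rate and the average undo bytes per change are assumptions that must be measured on the actual system, and the result only covers changes to the refreshed table itself, so it should be treated as a rough lower bound rather than a sizing rule.

```python
def estimate_undo_bytes(table_rows, rows_read_per_sec, changes_per_sec, undo_bytes_per_change):
    """Rough undo volume generated while the refresh snapshot query is open."""
    # How long the snapshot query is expected to stay open.
    refresh_seconds = table_rows / rows_read_per_sec
    # Undo generated by concurrent changes during that window.
    return refresh_seconds * changes_per_sec * undo_bytes_per_change

# 50M rows and 100 changes per second are the thresholds mentioned above;
# 10,000 rows/sec and 500 bytes per change are assumed values.
estimate = estimate_undo_bytes(50_000_000, 10_000, 100, 500)
print(f"about {estimate / 1_000_000:.0f} MB of undo")
```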
Q: When does the target table get truncated for the refresh?
A: The CDC source sends a "START_REFRESH" message before selecting rows
from the table. The CDC target truncates the table when it receives the
"START_REFRESH" message. The CDC source then sends data records for the
target table, which are applied until completion or error.
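The sequence above can be pictured with a small simulation. This is only a sketch of the ordering (truncate on START_REFRESH, then apply rows); the "ROW" message kind, the tuple layout, and apply_refresh are hypothetical and do not reflect the real CDC wire protocol.

```python
def apply_refresh(target_rows, messages):
    """Apply a refresh message stream to target_rows (a list), in place."""
    started = False
    for kind, payload in messages:
        if kind == "START_REFRESH":
            # The target truncates only when the start message arrives.
            target_rows.clear()
            started = True
        elif kind == "ROW":
            if not started:
                raise RuntimeError("data record received before START_REFRESH")
            target_rows.append(payload)
        else:
            raise ValueError(f"unknown message kind: {kind}")
    return len(target_rows)

target = [("stale", 1), ("stale", 2)]
stream = [
    ("START_REFRESH", None),
    ("ROW", ("a", 1)),
    ("ROW", ("b", 2)),
    ("ROW", ("c", 3)),
]
applied = apply_refresh(target, stream)
print(applied, target)
```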
Q: Are there any recommendations for maintenance procedures or other
operations to be commenced before a REFRESH?
A: CDC will query the source database leading to increased disk read I/O for
those tables which will be part of the REFRESH. CDC will apply these changes
to the target database tables. If you have database maintenance procedures
such as backup, re-index, or other disk intensive operations scheduled, these
may cause some level of disk I/O contention. The DBA team should review the
opportunity to schedule the REFRESH of large tables when it best suits the
source and target databases.
Q: How is data loaded into a DB2 UDB LUW database by the CDC DB2
target engine REFRESH?
A: CDC DB2 uses the DB2 bulk load utility to INSERT the refreshed rows into the
target database. This behavior can be changed to use a JDBC SQL based
INSERT operation if the customer decides not to use bulk load. Note however
that bulk load is by far the fastest method of loading refresh data into a target
database in most cases.
Q: How is data loaded into an Oracle database by the CDC Oracle target
replication engine during a REFRESH?
A: CDC Oracle uses the Oracle OCI DirectPathLoad bulk loader API to INSERT
the refreshed rows into the target database. The OCI DirectPathLoad API avoids
staging bulk load files on disk by utilizing in-memory loading. This behavior can
be changed to use a JDBC SQL based INSERT operation if the customer
decides not to use bulk load. Note however that bulk load is by far the fastest
method of loading refresh data into a target database in most cases.
Q: How is data loaded into a Teradata database by the CDC Teradata target
replication engine during a REFRESH?
A: CDC Teradata uses the Teradata FASTLOAD bulk load utility to INSERT the
refreshed rows into the target database. This behavior can be changed to use a
JDBC SQL based INSERT operation if the customer decides not to use bulk
load. Note however that bulk load is by far the fastest method of loading refresh
data into a target database in most cases.
Q: How is data loaded into DB2/z databases by the CDC target replication
engine during a REFRESH?
A: CDC uses DB2-CLI API with SQL based batch INSERT operations to load the
target table.
Q: How is data loaded into DB2/400 (iSeries, IBM i) databases by the CDC
target replication engine during a REFRESH?
A: CDC DB2/400 populates the target table file directly using native DB2/400 I/O
operations which avoid the SQL libraries. This method has the highest
performance for loading data into the database.
Q: How is data loaded into other databases by the CDC target replication
engine during a REFRESH?
A: CDC uses the native database bulk load utility to INSERT the refreshed rows
into the target database. This behavior can be changed to use a JDBC SQL
based INSERT operation if the customer decides not to use bulk load. Note
however that bulk load is by far the fastest method of loading refresh data into a
target database in most cases.
A: The order in which individual tables are refreshed is based on the group
order, which is set via Management Console. If all tables have the same group
order, they are refreshed in the order in which they are stored in the CDC
metadata.
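The ordering rule can be expressed as a stable sort: group order first, metadata order as the tie-breaker. The table names and group numbers below are made up for illustration, and refresh_order is not a CDC function.

```python
def refresh_order(tables):
    """tables: list of (name, group_order) given in CDC metadata order."""
    # sorted() is stable, so tables with equal group order keep metadata order.
    return [name for name, _ in sorted(tables, key=lambda entry: entry[1])]

metadata_order = [
    ("ORDERS", 2),
    ("CUSTOMERS", 1),
    ("AUDIT_LOG", 2),
    ("PRODUCTS", 1),
]
order = refresh_order(metadata_order)
print(order)
```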
Q: How does CDC support REFRESH for tables with referential integrity?
A: Use the Table Group order facility in Management Console to arrange the
REFRESH order of the tables so that it stays within the constraints imposed on
them.
Note: at least as of 6.3, refresh of tables with RI is believed not to be supported
with the default configuration; the user would have to set some system
parameters. See JIRA JUDB-1275 and JORA-1174.
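One way to derive group order numbers from foreign key relationships is a topological sort. The sketch assumes parent tables should be refreshed before the child tables that reference them; the table names and the function group_orders are hypothetical, and the resulting numbers would still have to be entered in Management Console by hand.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def group_orders(parents_of):
    """parents_of maps each child table to the set of parent tables it references."""
    sorter = TopologicalSorter(parents_of)
    # static_order() yields every table after all of its parents.
    return {table: position for position, table in enumerate(sorter.static_order(), start=1)}

foreign_keys = {
    "ORDER_LINES": {"ORDERS", "PRODUCTS"},
    "ORDERS": {"CUSTOMERS"},
}
orders = group_orders(foreign_keys)
for table, position in sorted(orders.items(), key=lambda kv: kv[1]):
    print(position, table)
```

A cycle in the foreign keys raises graphlib.CycleError, which signals that no single ordering satisfies the constraints.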
A: CDC will attempt to use the native bulk load interface supported by a particular
database platform and release. Bulk load operations are typically not logged,
and many databases have implemented shortcuts to load data into tables
faster than SQL based INSERT operations can.
Q: When does CDC switch from bulk load to the SQL method?
When table mapping is set for Live Audit, CDC does not bulk load as the audit
table needs to be appended to and not re-loaded.
The presence of LOB columns will cause CDC target to use JDBC loader on
some platforms such as SQL Server where the bulk load interface does not
support LOB. Other platforms such as Oracle support LOB during OCI
DirectPathLoad.
The JDBC apply will be used if user exits are configured as CDC cannot know if
the user exit is referencing data in the target table which would necessitate that
rows be inserted in transactions and therefore immediately visible to the user exit
code.
The JDBC apply will be selected if target columns have non-ASCII character
names on platforms such as SQL Server where the bulk load interface has such
limitations.
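The rules above can be summarized as a small decision function. This is a sketch of the logic as described here, not CDC's implementation; the Mapping fields, the platform key "sqlserver", and choose_load_method are hypothetical names, and the platform sets contain only the example given in the text.

```python
from dataclasses import dataclass

@dataclass
class Mapping:
    platform: str
    live_audit: bool = False
    has_lob_columns: bool = False
    has_user_exit: bool = False
    has_non_ascii_column_names: bool = False

# Platforms whose bulk interface has the stated limitation (SQL Server is the
# only example named in the text; other platforms may differ).
NO_LOB_IN_BULK = {"sqlserver"}
NO_NON_ASCII_NAMES_IN_BULK = {"sqlserver"}

def choose_load_method(mapping):
    """Return "sql" for the JDBC apply or "bulk" for the native bulk loader."""
    if mapping.live_audit:
        return "sql"  # the audit table is appended to, not re-loaded
    if mapping.has_user_exit:
        return "sql"  # rows must be visible to the user exit immediately
    if mapping.has_lob_columns and mapping.platform in NO_LOB_IN_BULK:
        return "sql"
    if mapping.has_non_ascii_column_names and mapping.platform in NO_NON_ASCII_NAMES_IN_BULK:
        return "sql"
    return "bulk"

print(choose_load_method(Mapping("oracle", has_lob_columns=True)))
print(choose_load_method(Mapping("sqlserver", has_lob_columns=True)))
```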
For medium tables which comfortably fit on DASD, the LOAD method is typically
more optimal than using SQL based INSERTS.
For very large tables, the significant disk resource requirement of staging the
entire LOAD file may not match existing resource availability, and may not
perform significantly better than the default SQL based REFRESH.
CDC for DB2/z and CDC for DB2/400 do not drop indexes prior to load.
CDC for DB2 UDB by default does not drop indexes, but can be configured by
system parameter to optionally drop and re-create indexes when bulk loading.
CDC for Oracle will drop indexes prior to load and recreate them afterwards.
CDC for SQL Server and CDC for Sybase will drop indexes prior to load and
recreate them afterwards when using bulk loader.
A: CDC may drop indexes on the target table prior to loading rows depending on
the load method available. When REFRESH completes, CDC will recreate any
indexes it had previously dropped one at a time until all indexes are recreated.
CDC then moves on to the next table to REFRESH. To optimize the loading of
multiple tables, and especially those with many indexes, manually drop indexes
except the primary key on the target tables prior to REFRESH. When CDC has
reported that the table has finished its REFRESH operation, manually re-create
the indexes outside of the CDC product. While some databases provide for
parallel index recreation, this may result in CPU and I/O bottlenecks on the target
database.
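The manual procedure (save index definitions, drop secondary indexes, load, re-create) can be sketched as follows. SQLite is used only as a self-contained stand-in; the catalog query and DDL differ on DB2, Oracle, SQL Server and other targets, the INSERT stands in for the CDC REFRESH, and the table and index names are made up.

```python
import sqlite3

def secondary_indexes(conn, table):
    """Return (name, ddl) for explicitly created indexes on the table."""
    # In SQLite, implicit indexes (e.g. for constraints) have a NULL sql column.
    cursor = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? AND sql IS NOT NULL",
        (table,),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer)")

# 1. Save the definitions, then drop the secondary indexes before the load.
saved = secondary_indexes(conn, "orders")
for name, _ in saved:
    conn.execute(f'DROP INDEX "{name}"')

# 2. Stand-in for the REFRESH of the table.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "a", 10.0), (2, "b", 20.0)])

# 3. Re-create the indexes once the load has finished.
for _, ddl in saved:
    conn.execute(ddl)
conn.commit()

restored = [name for name, _ in secondary_indexes(conn, "orders")]
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(restored, row_count)
```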
A: The CDC target engine writes an entry to the DM_BOOKMARK table when
changing records in the target database. When the CDC recursion prevention
feature is enabled, the CDC log scraper detects when a transaction contains a
change to the DM_BOOKMARK table, and discards these transactions which
originated from CDC. The only time CDC does not write an entry to the
DM_BOOKMARK table during replication is during REFRESH bulk load
operations which are not logged and therefore would not be replicated by CDC.
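The filtering idea can be illustrated with a few lines of code. The transaction and change structures below are invented for the example; only the DM_BOOKMARK table name comes from the text, and the real log scraper works on database log records rather than Python dictionaries.

```python
BOOKMARK_TABLE = "DM_BOOKMARK"

def drop_cdc_transactions(transactions):
    """Keep only transactions that do not touch the bookmark table."""
    return [
        tx for tx in transactions
        if not any(change["table"] == BOOKMARK_TABLE for change in tx)
    ]

log = [
    # Applied by the CDC target engine: includes a bookmark update.
    [{"table": "ORDERS", "op": "INSERT"}, {"table": "DM_BOOKMARK", "op": "UPDATE"}],
    # Made by a local application: no bookmark change, so it is replicated.
    [{"table": "ORDERS", "op": "UPDATE"}],
]
to_replicate = drop_cdc_transactions(log)
print(to_replicate)
```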
Q: What is "Differential Refresh" and how does it work?
A: The standard REFRESH method truncates the target table before bulk loading
rows. The differential method, available as of CDC 6.2, keeps the target table
online during the REFRESH operation.
User interface:
- Management Console option on refresh
- Command line
- Table by table
- User initiated
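The answer above does not spell out the comparison logic. One plausible way to keep a target online is to compare rows by key and apply only the differences, which is sketched below under that assumption; the dictionaries stand in for tables keyed by primary key, and differential_refresh is a hypothetical name, not a CDC command.

```python
def differential_refresh(source, target):
    """Make target match source by applying only the differing rows, in place."""
    stats = {"insert": 0, "update": 0, "delete": 0}
    # Remove rows that no longer exist on the source.
    for key in list(target):
        if key not in source:
            del target[key]
            stats["delete"] += 1
    # Add missing rows and correct rows whose contents differ.
    for key, row in source.items():
        if key not in target:
            target[key] = row
            stats["insert"] += 1
        elif target[key] != row:
            target[key] = row
            stats["update"] += 1
    return stats

source = {1: ("alice", 10), 2: ("bob", 25), 4: ("dana", 7)}
target = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 5)}
stats = differential_refresh(source, target)
print(stats)
```

Unlike the standard method, no truncate takes place, so unchanged rows stay readable on the target throughout.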
A: CDC source and target (in a future release) may automatically detect and
support the case where during a REFRESH operation on a table, primary keys
are modified by an UPDATE statement on the table. The scenario of UPDATE
changing a primary key value is rare and has only ever been reported by
customers a few times. The workaround is to refresh such tables when the
affected database tables are idle, that is, when no UPDATE on key columns is
being performed.
A: CDC source and target (in a future release) may support refreshing a subset
of the table via a WHERE clause that can be specified to reduce the amount of
data replicated during REFRESH. This is useful for tables with many partitions,
where only the newest data needs to be refreshed due to an operational issue
related to data which was newly replicated / in-scope.
A: CDC target (in a future release) may contain multiple parallel threads to
improve refresh performance for large tables.