InfoSphere Data Replication Technical Enablement, CDL, IBM
Q Replication
Performance Tuning & Best Practice
Information Management
Objectives
Abstract
– Q Replication is a high-performance log-capture / transaction-replay replication technology that uses IBM WebSphere® MQ to transmit and stage data between source and target database systems.
– Q Replication can be used in various business scenarios, such as offload reporting, real-time warehousing, data synchronization, information integration, high availability, and active/active solutions. Its performance is key to success in these usage scenarios.
– This presentation describes the performance challenges that Q Replication faces, and the best practices to achieve optimal performance.
Objectives
– Understand the components that may impact Q Replication performance
– Learn how to identify Q Replication performance bottlenecks
– Learn the key performance monitoring areas and tuning parameters
– Learn best practices to achieve optimal performance, for example in application design
Outline
Q Replication overview
Performance bottleneck analysis
Tuning source database
Tuning Q Capture
Tuning MQ
Tuning Q Apply
Tuning target database
Application design considerations
Q Replication scalability
Q Replication Overview
Q Replication
Technical characteristics
– Asynchronous replication – not limited by geographical distance
– Application oriented – can replicate a subset of tables
– Log-based change data capture – lowest impact on the source system
– Only changed data is delivered – minimal data processing
– Data is transferred and staged in IBM WebSphere® MQ queues – excellent data recovery capability
– Target data is always transactionally consistent – target data is available at any time
– Parallel data apply – high performance
– Runs as a service – no batch window
Q Replication Architecture
[Diagram: source and target database systems (SOURCE1/SOURCE2, TARGET1/TARGET2) connected through WebSphere MQ; Capture reads the DB log at the source.]
A Capture program reads changed data from the database recovery log and puts the data directly into WebSphere MQ queues.
WebSphere MQ delivers the data to a target system where an Apply program runs.
The Apply program pulls the data from the queues and applies it to the target tables.
Usage Scenarios
Log-based, real-time capture – asynchronous – changed data only
…
– Information Integration
– Synchronization
– Distribution and Consolidation
2) One target system for:
– Offload Query/Reporting
– Real-time BI (Dynamic Warehouse)
– Audit
Customers
– DB2® Advanced Enterprise Server Edition (LUW)
• Free when replicating with another DB2 LUW server (v9.7)
• Free when replicating with two other DB2 LUW servers (v10)
– InfoSphere® Warehouse (LUW)
• Free when replicating with another DB2 LUW server (v9.7)
• Free when replicating with two other DB2 LUW servers (v10)
Performance Bottleneck Analysis
Performance In Q Replication
[Diagram: end-to-end data flow between Site A (source) and Site B (target)]
1. Applications execute SQL; DB2 transactions are written to the database recovery log.
2. The Q Capture log reader thread (logrdr) reads captured DB2 transactions from the log (DB2 for z/OS: IFI 306; DB2 for LUW: the logRead API); monster transactions may be spilled to a spill file.
3. The Q Capture publisher thread MQPUTs transactions (1 per MQ message) into the send/transmit queues and issues MQCMIT.
4. The MQ channel transmits messages over TCP/IP to the receive queue(s) at the target.
5. The Q Apply browser thread browses/MQGETs messages from the receive queue.
6. Q Apply agents apply the transactions to the user tables via SQL.
7. The Q Apply pruner thread deletes applied messages from the receive queue.
Both queue managers persist messages through MQ buffer pools, MQ page sets, and the MQ recovery log.
Q Replication Latencies
Symptom matrix – each column is a scenario observed via the CAPMON/APPLYMON monitor tables (column 1 is the healthy baseline):

Metric                               1      2      3      4      5      6      7      8      9      10     11
Latency
  CAPTURE_LATENCY                    Small  Big    Big    Big    Big    Small  Small  Small  Small  Small  Small
  QLATENCY                           Small  Small  Small  Small  Small  Big    Big    Big    Big    Big    Big
  APPLY_LATENCY                      Small  Small  Small  Small  Small  Small  Small  Small  Small  Big    Big
  DBMS_TIME (*)                      Small  Small  Small  Small  Small  Small  Small  Small  Small  Small  Big
CPU time
  LOGREAD_API_TIME                   Normal Big    Normal Normal Normal Normal Normal Normal Normal Normal Normal
  LOGRDR_SLEEPTIME                   Normal Busy   Busy   Normal Normal Normal Normal Normal Normal Normal Normal
  MQPUT_TIME                         Normal Normal Normal Normal Big    Normal Normal Normal Normal Normal Normal
  MQGET_TIME                         Normal Normal Normal Normal Normal Normal Big    Normal Big    Normal Normal
  APPLY_SLEEP_TIME                   Normal Normal Normal Normal Normal Normal Normal Normal Normal Busy   Busy
Memory / storage
  Capture memory (CURRENT_MEMORY)    Normal Normal Normal Full   Full   Full   Full   Full   Full   Full   Full
  XMITQDEPTH                         Small  Small  Small  Small  Small  Jam    Jam    Jam    Jam    Jam    Jam
  Receive queue depth (QDEPTH)       Small  Small  Small  Small  Small  Small  Jam    Jam    Jam    Jam    Jam
  Apply memory (CURRENT_MEMORY)      Normal Normal Normal Normal Normal Normal Normal Normal Normal Full   Full
  DONEMSG row count                  Normal Normal Normal Normal Normal Normal Normal Normal Big    Normal Normal
15 © 2013 IBM Corporation
Key monitor columns
At the SOURCE database – IBMQREP_CAPQMON (one row per send queue)
– MQ_MESSAGES: total number of messages put into MQ by Q Capture
– MQPUT_TIME: time spent on MQPUT calls
– XMITQDEPTH: number of messages currently in the MQ transmit queue
At the TARGET database – IBMQREP_APPLYMON (one row per receive queue)
– OLDEST_TRANS: Q Replication synchronization point – all source transactions prior to this timestamp have been applied
– ROWS_APPLIED: total number of rows applied to the target database
– CURRENT_MEMORY: the amount of memory used by Q Apply to read transactions
– MQGET_TIME: time spent on MQGET calls
– QDEPTH: number of messages currently in the MQ receive queue
– END2END_LATENCY: average end-to-end latency for all transactions applied in this monitor interval – between source DB commit and target DB commit
– Breakdown by components:
• CAPTURE_LATENCY: latency spent in Capture – between source DB commit and source MQ commit
• QLATENCY: latency spent in MQ – between source MQ commit and target MQGET
• APPLY_LATENCY: latency spent in Apply – between target MQGET and target DB commit
• DBMS_TIME: latency spent in the target database for SQL processing
– APPLY_SLEEP_TIME: total sleep time of all apply agents in this monitor interval
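The latency columns compose additively: END2END_LATENCY is roughly CAPTURE_LATENCY + QLATENCY + APPLY_LATENCY, so the first diagnostic step is finding which component dominates. A minimal sketch of that step (the plain dict and millisecond units are illustrative assumptions, not the exact APPLYMON schema):

```python
def dominant_latency(row):
    """Return (component_name, share_of_total) for the largest of the
    three components that make up END2END_LATENCY."""
    components = {
        "CAPTURE_LATENCY": row["CAPTURE_LATENCY"],  # source DB commit -> source MQ commit
        "QLATENCY": row["QLATENCY"],                # source MQ commit -> target MQGET
        "APPLY_LATENCY": row["APPLY_LATENCY"],      # target MQGET -> target DB commit
    }
    total = sum(components.values()) or 1  # avoid division by zero on an idle system
    name = max(components, key=components.get)
    return name, components[name] / total

# Example monitor interval (values in ms): a dominant QLATENCY points
# at MQ transmission or a jammed receive queue, per the matrix above.
name, share = dominant_latency(
    {"CAPTURE_LATENCY": 40, "QLATENCY": 820, "APPLY_LATENCY": 90})
```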
Tuning Source Database
Description
– Performance issue in the source database when merging and returning log records to Q Capture
Typical causes
– Log files are not accessible (e.g., they have been moved to tape by automatic archival)
– Poor log file I/O performance
– Log read contention inside the source database
– Performance issues when merging log records from various members
Key symptoms – used to identify the bottleneck
– Log read latency is increasing
– Log reader API time is big
Reference symptoms
– Capture latency and end-to-end latency are increasing
– Log read throughput is low
– Q Capture used memory is low
Database log
– Use separate disk storage for the database log
– Use disk striping to improve performance
– Adjust the log archival strategy to keep log files on disk until they have been captured
Database parameters (LUW)
– logbufsz
• A value between 64 and 128 pages should be adequate
• Do not set this value to more than 512 pages, to avoid performance degradation
• Do not set this value to more than 35% of the database heap size
Database parameters (z/OS)
– CACHEDYN = YES
– DEALLCT = NOLIMIT
Tuning Q Capture
Description
– Performance issue in the Q Capture log reader thread when requesting source log records via the log read API and/or constructing source transactions in internal memory
Typical causes
– Q Capture spills monster transactions to disk
– Q Capture job priority is too low
Key symptoms – used to identify the bottleneck
– Log read latency is increasing
– Log reader API time is low
– Log reader sleep time is low (busy)
Reference symptoms
– Capture latency and end-to-end latency are increasing
– Log read throughput is low
– Q Capture used memory is low
Description
– Performance issue in the Q Capture publisher thread when constructing MQ messages and/or publishing them into MQ
– This bottleneck is seldom observed.
Typical causes
– LOB columns whose values need to be fetched from source tables
– Expensive row filtering
– Too many queues for the Q Capture publisher thread to handle
– Too many columns subscribed
– Q Capture job priority is too low
Key symptoms – used to identify the bottleneck
– Log read latency is normal
– Capture latency is increasing
– MQPUT time cost is normal
Reference symptoms
– Capture latency and end-to-end latency are increasing
– Capture throughput is low
– Q Capture used memory is full
– Q Capture publisher thread is busy
SLEEP_INTERVAL (Q Capture)
Description
– Defined in the IBMQREP_CAPPARMS control table
– How long the Q Capture log reader thread sleeps when
• It reaches the end of log (EOL)
• Or it reaches the end of an IFI 306 call scope
• Or memory usage would exceed MEMORY_LIMIT
– This parameter can be changed dynamically using the Q Capture “chgparms” command
Default value
– 500 milliseconds (0.5 second)
Tuning recommendations
– If the workload volume is high
• Usually no need to tune this parameter, since the log reader thread is continually reading the log
– If the workload volume is low
• A big SLEEP_INTERVAL reduces CPU usage, but results in higher replication latency
• A small SLEEP_INTERVAL reduces replication latency, but results in higher CPU usage
MEMORY_LIMIT (Q Capture)
Description
– Defined in the IBMQREP_CAPPARMS control table
– The amount of memory Q Capture uses to build transactions
– At most 32,000 transactions are buffered
• For OLTP workloads with small transactions, Q Capture may be unable to use memory up to MEMORY_LIMIT
– If there is a monster transaction that is bigger than MEMORY_LIMIT, Q Capture spills the transaction data into a spill file
• LUW: spilled to a disk file in CAPTURE_PATH
• z/OS: spilled to VIO or to the file specified by the CAPSPILL DD
– This parameter can be changed dynamically using the Q Capture “chgparms” command on LUW, but cannot be changed dynamically on z/OS
Default value
– LUW: 500 megabytes
– z/OS: 0 – Q Capture calculates the memory allocation based on region size
Tuning recommendations
– If IBMQREP_CAPMON.TRANS_SPILLED > 0
• Increase MEMORY_LIMIT to avoid transaction spilling
• This is usually needed for large transactions or batch processing
Best practice
– Avoid monster transactions in application design
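The TRANS_SPILLED check above can be sketched as a simple sizing rule. The doubling step and the cap are illustrative assumptions, not product guidance; the real adjustment should be validated against actual spill behavior:

```python
def suggest_capture_memory_limit(memory_limit_mb, trans_spilled, cap_mb=4096):
    """Return a suggested MEMORY_LIMIT (MB) for Q Capture. If any
    transaction spilled to a spill file during the monitor interval
    (IBMQREP_CAPMON.TRANS_SPILLED > 0), grow the limit until spilling
    stops; otherwise keep the current value."""
    if trans_spilled > 0:
        return min(memory_limit_mb * 2, cap_mb)  # grow, but stay bounded
    return memory_limit_mb
```

Starting from the LUW default of 500 MB with spills observed, the sketch suggests 1000 MB for the next interval.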
COMMIT_INTERVAL (Q Capture)
Description
– Defined in the IBMQREP_CAPPARMS control table
– How often the Q Capture publisher thread issues MQCMIT to commit MQPUT operations
– The Q Capture publisher issues MQCMIT when
• COMMIT_INTERVAL is reached
• Or 128 source transactions have been put on send/xmit queues since the last MQCMIT
– This parameter can be changed dynamically using the Q Capture “chgparms” command
Default value
– 500 milliseconds (0.5 second)
Tuning recommendations
– If the workload transaction rate (TPS) is high
• Usually no need to tune this parameter, since the publisher thread frequently issues MQCMIT due to the 128-source-transaction condition
– If the workload transaction rate (TPS) is low
• A big COMMIT_INTERVAL reduces CPU usage, but results in higher replication latency
• A small COMMIT_INTERVAL reduces replication latency, but results in higher CPU usage
TRANS_BATCH_SZ (Q Capture)
Description
– Defined on the Q Capture command line
– How many source transactions are packaged by the Q Capture publisher thread into a single MQ message
– The purpose is to avoid too-small MQ messages
– This single MQ message will be processed as a single source transaction by Q Apply at the target side, resulting in possibly more transaction dependencies
– This parameter cannot be changed dynamically
Default value
– 1 (no batching)
Tuning recommendations
– For OLTP systems with small transaction sizes and high TPS
• The throughput of a single MQ channel is poor when messages are small. In this situation, increasing TRANS_BATCH_SZ may create bigger messages and improve MQ channel throughput.
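The batching effect on message size can be estimated with simple arithmetic. The per-message overhead constant below is an assumed illustrative figure, not a product value:

```python
def batched_message_size(avg_txn_bytes, trans_batch_sz, header_bytes=300):
    """Rough average MQ message size when TRANS_BATCH_SZ source
    transactions are packed into one message. header_bytes is an
    assumed fixed per-message overhead for headers and framing."""
    return header_bytes + avg_txn_bytes * trans_batch_sz

# For an OLTP workload with 400-byte transactions, batching 8 of them
# turns many tiny messages into one larger one.
size = batched_message_size(400, 8)
```

Larger messages amortize per-message channel costs, at the price of coarser-grained transactions on the apply side.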
MAX_MESSAGE_SIZE (Q Capture)
Description
– Defined in the IBMQREP_SENDQUEUES control table
– Determines the maximum size of the MQ messages that Q Capture will publish into queues
– For transactions whose size is larger than MAX_MESSAGE_SIZE
• The transaction is split into multiple messages
• The transaction is broken at a row boundary. This requires that no single row exceed MAX_MESSAGE_SIZE
• For LOB columns, if LOB_SEND_OPTION='S', LOB columns are sent in separate messages and segmented as necessary
– This parameter can be changed dynamically using the Q Capture “reinitq” command
Default value
– 64 kilobytes
Tuning recommendations
– If there are many transactions bigger than MAX_MESSAGE_SIZE
• This can be concluded if IBMQREP_CAPQMON.MQ_MESSAGES is much bigger than IBMQREP_CAPQMON.TRANS_PUBLISHED
• Increase MAX_MESSAGE_SIZE to reduce the number of segmented MQ messages
CHANGED_COLS_ONLY (Q Capture)
Description
– Defined in the IBMQREP_SUBS control table
– Determines whether Q Capture will publish non-key columns that have not changed
• 'Y': Q Capture publishes only the key columns and changed non-key columns. This mode is also called “column suppression”
• 'N': Q Capture publishes all subscribed columns
– This option impacts UPDATE operations only
– This parameter can be changed dynamically using the Q Capture “reinit” command
Default value
– 'Y'
Tuning recommendations
– For tables that have many columns and only a few are updated
• CHANGED_COLS_ONLY='Y' lets Q Capture send out only key and changed data, resulting in smaller MQ messages and saving network bandwidth
• At the target side, CONFLICT_ACTION cannot be 'F', since the received row data is incomplete
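The savings from column suppression can be counted directly. A minimal sketch (counting column values published per UPDATE; byte sizes would scale the same way):

```python
def update_payload_cols(total_cols, key_cols, changed_cols, changed_cols_only):
    """Number of column values Q Capture publishes for one UPDATE.
    With 'Y' (column suppression) only the key plus the changed
    non-key columns go out; with 'N' every subscribed column does."""
    if changed_cols_only == "Y":
        return key_cols + changed_cols
    return total_cols

# A 100-column table with a 1-column key where 2 columns change:
# suppression shrinks the published row from 100 values to 3.
suppressed = update_payload_cols(100, 1, 2, "Y")
full = update_payload_cols(100, 1, 2, "N")
```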
Tuning WebSphere MQ
Description
– Performance issue in the source MQ queue manager when Q Capture puts MQ messages into send/transmit queues
Typical causes
– Poor MQ logging performance
– MQ I/O contention (e.g., at a checkpoint)
– MQ job priority is too low
Key symptoms – used to identify the bottleneck
– Log read latency is normal
– Capture latency is increasing
– MQPUT time cost is big
Reference symptoms
– Capture latency and end-to-end latency are increasing
– Capture throughput is low
– Q Capture used memory is full
– Depth of the send/transmit queue is small
Description
– Performance issue in the target MQ queue manager when putting MQ messages into receive queues from channels and reading/getting them out to Q Apply
Typical causes
– Poor MQ logging performance
– MQ I/O contention (e.g., at a checkpoint)
– MQ job priority is too low
– The target MQ read-ahead feature is not enabled
– Target MQ buffer pool shortage
Key symptoms – used to identify the bottleneck
– Q latency is big
– Apply latency is normal
– MQGET time cost is big
– Depth of the receive queue is big
Reference symptoms
– End-to-end latency is increasing
– Capture performance (latency and throughput) is also impacted
MQ Buffer Pools (MQ)
z/OS only
MQ performance is greatly impacted by buffer pool shortage
– MQ accounting will show long elapsed times for MQPUT/MQGET operations when there are lots of messages accumulated in transmit or receive queues
– In such cases, MQPUT/MQGET involves disk I/O operations
– A buffer pool page is written to the disk page set
• At an MQ checkpoint or shutdown
• At the 15% free threshold – causes asynchronous writes
• At 5% free pages – causes synchronous writes
– Currently the maximum buffer pool size is limited
• Usually 1 GB – limited by the 31-bit MQ address space
Tuning recommendations
– Allocate a buffer pool as large as possible in advance
– Enable the MQ read-ahead feature, which can greatly improve MQPUT/MQGET performance when the MQ buffer pool is full
BATCHSZ (MQ)
Description
– The number of messages that are committed by the queue manager between two sync points.
– A channel batch is committed when:
• BATCHSZ messages have been sent, or
• The transmission queue is empty and BATCHINT has been exceeded
– A big BATCHSZ reduces the number of MQ commits
Default value
– 50
Tuning recommendations
– If the workload transaction rate (TPS) is low
• Usually no need to tune this parameter.
– If the workload transaction rate (TPS) is high
• Use a bigger value for BATCHSZ, usually between 50 and 640, to reduce the number of commits and improve channel throughput, with a trade-off of more memory consumed for uncommitted messages.
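The commit-rate effect of BATCHSZ is just a division when the channel is busy (BATCHINT never triggers). A minimal sketch under that assumption:

```python
def channel_commits_per_second(msgs_per_second, batchsz):
    """Approximate channel sync points per second on a busy channel:
    one commit per BATCHSZ messages (assumes the transmission queue
    never drains, so the BATCHINT path is not taken)."""
    return msgs_per_second / batchsz

# At 10,000 msgs/s, raising BATCHSZ from 50 to 640 cuts commits from
# 200/s to about 16/s, at the cost of more uncommitted messages in memory.
low = channel_commits_per_second(10_000, 50)
high = channel_commits_per_second(10_000, 640)
```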
Tuning Q Apply
Description
– Performance issue in the Q Apply browser thread when getting MQ messages from receive queues and constructing transactions in internal memory
Typical causes
– Poor Q Apply performance when calculating transaction dependencies
– Long waits due to synchronization between different consistency groups
– Q Apply job priority is too low
Key symptoms – used to identify the bottleneck
– Q latency is big
– Apply latency is normal
– MQGET time cost is small
– Depth of the receive queue is big
Reference symptoms
– End-to-end latency is increasing
– Capture performance (latency and throughput) is also impacted
Description
– Performance issue in the Q Apply agent threads when applying data changes to the target database
– This bottleneck is seldom observed.
Typical causes
– Q Apply job priority is too low
Key symptoms – used to identify the bottleneck
– Apply latency is big
– Target DB2 response time is normal
– Depth of the receive queue is big
– Q Apply used memory is full
Reference symptoms
– End-to-end latency is increasing
– Q latency is big
– Capture performance (latency and throughput) is also impacted
Description
– Performance issue in the Q Apply pruner/housekeeping thread when pruning applied transactions from the IBMQREP_DONEMSG table and/or receive queues.
– This bottleneck is seldom observed.
Typical causes
– Poor performance when deleting records from the IBMQREP_DONEMSG table.
– Poor performance when deleting MQ messages from receive queues.
– Q Apply job priority is too low
Key symptoms – used to identify the bottleneck
– Q Apply latency is normal
– Q Apply throughput is normal or equal to Q Capture throughput
– Row count of IBMQREP_DONEMSG is big
Reference symptoms
– Depth of the receive queue is big
– Q Apply used memory is normal
NUM_APPLY_AGENTS (Q Apply)
Description
– Defined in the IBMQREP_RECVQUEUES control table
– The number of Q Apply agent threads that concurrently apply transactions to the target database for a single receive queue (consistency group).
– Q Apply agents fetch independent transactions from the internal WORKQ and apply them in parallel
– This parameter can be changed dynamically using the Q Apply “reinitq” command
Default value
– 16 agents
Tuning recommendations
– More agents may increase apply throughput, but may cause more contention at the target database.
– Start tuning from the default value (16 agents)
– If the agent threads are busy (IBMQREP_APPLYMON.APPLY_SLEEP_TIME is nearly 0), increase the number of agent threads
– If there is lock contention in the target database between Q Apply agents, decrease the number of agent threads to avoid unnecessary CPU usage
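The parallelism available to the agents depends on how many transactions are mutually independent. A simplified model of the dependency rule (transactions touching disjoint target rows can run on different agents; this greedy wave grouping is an illustration, not Q Apply's actual scheduler, and it preserves commit order between waves):

```python
def parallel_waves(transactions):
    """Group transactions (each a list of touched row keys, given in
    commit order) into 'waves'. All transactions within one wave touch
    disjoint keys and could be applied by separate agents in parallel;
    a transaction that conflicts with the current wave starts the next
    one, so dependent work never jumps ahead of its predecessor."""
    waves = []  # each wave: (set_of_keys_touched, [transaction_indexes])
    for i, keys in enumerate(transactions):
        if waves and waves[-1][0].isdisjoint(keys):
            waves[-1][0].update(keys)   # independent: join the current wave
            waves[-1][1].append(i)
        else:
            waves.append((set(keys), [i]))  # conflict: start a new wave
    return [members for _, members in waves]

# Transactions 0 and 1 touch different rows (parallel); transaction 2
# conflicts with 0 on key 1, so it must wait for the first wave.
waves = parallel_waves([[1], [2], [1, 3], [4]])
```

Workloads that repeatedly update the same rows (the "hot row" pattern) collapse into many single-transaction waves, which is why extra agents cannot help there.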
MEMORY_LIMIT (Q Apply)
Description
– Defined in the IBMQREP_RECVQUEUES control table
– The amount of memory that a Q Apply program can use as a buffer to process transactions from one receive queue (consistency group).
– When the used memory exceeds MEMORY_LIMIT, the Q Apply browser thread stops reading more messages from the receive queue
– If a single transaction is bigger than MEMORY_LIMIT, Q Apply applies the transaction in serial mode
– This parameter can be changed dynamically using the Q Apply “reinitq” command
Default value
– 32 megabytes
Tuning recommendations
– A bigger MEMORY_LIMIT does not always result in better performance
– Start tuning from the default value (32 MB)
– Usually, full memory (IBMQREP_APPLYMON.MEM_FULL_TIME is big) indicates that the agents or the target database do not perform well. Increasing browser memory cannot solve such issues.
– If the agent threads are idle, the target database response time is normal, and the browser memory is full, try increasing MEMORY_LIMIT
MAXAGENTS_CORRELID (Q Apply)
Description
– Defined in the IBMQREP_RECVQUEUES control table
– The maximum number of Q Apply agent threads that can concurrently apply transactions belonging to the same correlation ID for a single receive queue (consistency group).
– This is designed for batch processing. One batch job has a single correlation ID, and users can use MAXAGENTS_CORRELID to control the degree of parallelism for each batch job.
– This parameter can be changed dynamically using the Q Apply “reinitq” command
Default value
– 0 (no limit – use as many agents as possible)
Tuning recommendations
– Start tuning from the default value (same effect as NUM_APPLY_AGENTS)
– Check the values of IBMQREP_APPLYMON.DEADLOCK_RETRIES and IBMQREP_APPLYMON.JOB_DEPENDENCIES. If they are high, try decreasing MAXAGENTS_CORRELID
Tuning Target Database
Description
– Performance issue in the target database when executing SQL statements from Q Apply.
Typical causes
– Lack of indexes that can be utilized when executing the SQL statements from Q Apply
– Poor performance when accessing Q Apply control tables (such as IBMQREP_DONEMSG)
– Poor performance inside the target database (such as lock contention, I/O contention, etc.)
Key symptoms – used to identify the bottleneck
– Target DB2 response time is big
Reference symptoms
– Apply latency is big
– Depth of receive queue is big
– Q Apply used memory is full
– End-to-end latency is increasing
– Capture performance (latency and throughput) is impacted
– Q latency is big
General configuration
– The target database should be tuned in a similar way as the source database to optimize performance
Special configuration
– Unique indexes
• Unique indexes at the target side are necessary for Q Apply to generate effective SQL statements, and are very important to Q Apply performance.
– Lock size
• Configure target table spaces to use row-level locking to avoid lock contention.
Special considerations for tuning Q Apply control tables
– IBMQREP_DONEMSG
• This is a HOT table introduced by Q Replication
• There are two additional operations (1 insert + 1 delete) on this table for each replicated source transaction
• The table is by default defined with the APPEND ON and VOLATILE keywords. Users should periodically run REORG against the table space and index space.
Application Design Considerations
During application design, special consideration is required in database and workload design to guarantee the functionality and performance of Q Replication
– SEQUENCE and IDENTITY columns
– Unique indexes on the target table
– Triggers
– RI constraints
– Large Object (LOB) data types
– Big transactions
– Long-running transaction jobs
– Hot rows
– Non-logged operations
– Multi-row update statements
The full list of application considerations is documented at:
– http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.swg.im.iis.repl.qrepl.doc/topics/iiyrqplnconstructs.html
Large Object (LOB) Data Types
Non-logged LOB
– Q Capture fetches the LOB column value from the table when publishing row changes
– Performance is very poor
Logged LOB
– In-lined LOB
• Supported since DB2 for z/OS v10 and DB2 for LUW v9.7
• Q Capture fetches the LOB column value directly from DB2 recovery log records
• Users should define LOBs as inline in DB2 whenever possible
– Not in-lined LOB
• Q Capture fetches the LOB column value from the table when publishing row changes
• Performance is very poor
• Since DB2 for LUW v10.1, Q Capture fetches LOB column values directly from DB2 recovery log records, whether or not they are in-lined
Best practice
– Avoid using LOBs, or use in-lined LOBs as much as possible
Big Transactions
Best practice
– Avoid very big transactions as much as possible; for example, force a commit after a fixed number (such as 1,000) of row changes
– Use the Q Capture “warntxsz” parameter to detect the existence of very large transactions, and tune the MEMORY_LIMIT parameters
– Some customers choose to exclude certain very large transactions (e.g., a purge job that deletes GBs of data in a single DB2 transaction) from replication by using the IBMQREP_IGNTRAN table, and manually execute the job at each site
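The "commit after a fixed number of row changes" practice can be sketched as a chunked purge loop. `delete_batch` and `commit` are hypothetical callbacks standing in for the application's actual database calls:

```python
def purge_in_chunks(row_ids, delete_batch, commit, chunk_size=1000):
    """Delete rows in chunks of chunk_size, committing after each
    chunk, so no single replicated transaction grows into a monster
    transaction that spills out of Q Capture's memory."""
    for start in range(0, len(row_ids), chunk_size):
        delete_batch(row_ids[start:start + chunk_size])  # hypothetical DB call
        commit()                                         # hypothetical DB call
```

A 2,500-row purge then replicates as three modest transactions instead of one, keeping each well under MEMORY_LIMIT.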
Hot Row
Q Replication Scalability
[Diagram: scale-out configurations – multiple Q Capture and Q Apply instances, each pair connected through its own send and receive queues]
References