Applies to:
SAP NetWeaver 7.0 or higher on DB2 10.1 or higher for Linux, UNIX, and Windows.
Summary
Multiple improvements have been made to the DB2 High Availability Disaster Recovery (HADR) feature. DB2
Version 10.1 for Linux, UNIX, and Windows supports multiple standbys providing customers with true
database disaster recovery (DR) capability along with high availability (HA). IBM DB2 BLU Acceleration is
also supported for HADR environments as of DB2 10.5 FP4. This paper describes the new features related
to HADR as well as provides examples to enable single or multiple standbys in an SAP environment to
achieve DR in HA systems.
Author Bio
Ali Mehedi is a Software Developer at IBM with years of experience in test tool development, DB2
administration, SAP NetWeaver installation, configuration, and maintenance, and DB2 for LUW and SAP
integration. He is a certified DB2 for LUW DBA with extensive SAP BASIS experience in Windows, AIX,
and Linux environments.
Since joining SAP in 2005, Edgar Maniago, a Software Engineer, has been a member of the IBM SAP
Integration and Support Center located in the Toronto IBM Lab. He currently tests, develops, and integrates
new features of DB2 with SAP. Through his role in SAP Development Support and as a Customer Advocate
for IBM, Edgar assists SAP consultants and customers with activities such as troubleshooting and
performance optimization.
Catherine Vu is a member of the IBM SAP Integration and Support team, which plays a critical role in certifying
every DB2 Fix Pack and every new major DB2 release with SAP applications before their general availability.
In addition, she is responsible for providing development support to customers running SAP on IBM DB2 for
Linux, UNIX, and Windows. Before joining the IBM SAP Integration and Support team, Catherine gained many
years of experience in DB2 while working in the DB2 Development team and the Technical Enablement team.
Table of Contents
1 Introduction ...................................................................................................................................................... 4
2 Planning ........................................................................................................................................................... 5
2.1 References ................................................................................................................................................ 5
2.2 Technology................................................................................................................................................ 5
2.2.1 IBM Tivoli System Automation for Multiplatforms (SA MP) ................................................................................. 5
2.2.2 HADR synchronization modes ............................................................................................................................ 5
2.2.3 Multiple standbys ................................................................................................................................................ 6
2.2.4 Log spooling ........................................................................................................................................................ 7
2.2.5 HADR replay delay.............................................................................................................................................. 7
2.2.6 Automatic failover: DB2 Automatic Client Reroute (ACR) vs. virtual IP address ................................................. 7
2.2.7 DB2 HADR for DB2 BLU Acceleration ................................................................................................................ 8
2.2.8 DB2 LOAD with COPY YES for BLU tables ........................................................................................................ 8
2.3 Hardware and operating system requirements ....................................................................................... 10
2.4 DB2 database requirements ................................................................................................................... 11
3 Preparation .................................................................................................................................................... 12
3.1 Configuration of the test system ............................................................................................................. 12
3.1.1 Hardware and operating system in the test systems ......................................................................................... 13
3.1.2 Required software downloads ........................................................................................................................... 13
3.2 Basic network setup ................................................................................................................................ 13
3.3 File system setup .................................................................................................................................... 13
3.4 Operating system users and groups ....................................................................................................... 14
4 Installing the standby ..................................................................................................................................... 15
4.1 Exporting the file systems ....................................................................................................................... 15
4.2 Turning on DB2 log archiving.................................................................................................................. 15
4.3 Taking a backup of the primary............................................................................................................... 16
4.4 Performing a homogeneous system copy using SWPM......................................................................... 16
4.5 Configuring ports ..................................................................................................................................... 21
4.6 Restoring the database from a backup ................................................................................................... 21
4.7 Configuring databases for HADR............................................................................................................ 22
4.8 Performing HADR checks ....................................................................................................................... 24
4.9 Starting HADR......................................................................................................................................... 24
4.10 Checking the HADR status using the db2pd tool.................................................................................. 25
5 Enabling automatic failover using SA MP..................................................................................... 27
5.1 Installing the SA MP software and license .............................................................................................. 27
5.2 Setting up the HADR cluster ................................................................................................................... 28
5.2.1 Creating the cluster configuration file ................................................................................................................ 28
5.2.2 Creating the database cluster ........................................................................................................................... 30
5.2.3 Displaying the database cluster ........................................................................................................................ 31
5.2.4 Enabling the SAP system with virtual database host name and IP address .................................... 34
5.3 HADR micro-outage feature test using the Graceful Maintenance Tool (GMT) ..................................... 34
5.3.1 GMT Configuration ............................................................................................................................................ 35
5.3.2 Micro-failover test .............................................................................................................................................. 36
5.3.4 Testing a disaster scenario ............................................................................................................................... 40
6 Installing the auxiliary standby database instance ....................................................................... 42
6.1 Mounting file systems.............................................................................................................................. 42
6.2 Updating port configurations ................................................................................................................... 42
6.3 Performing a homogeneous system copy using SWPM......................................................................... 42
6.4 Configuring the HADR auxiliary standby database................................................................................. 42
7 Failover scenarios.......................................................................................................................................... 47
7.1 Failover scenario #1: The primary is down ............................................................................................. 47
7.2 Failover scenario #2: Both the primary and principal standby are down ................................................ 49
7.3 Failover scenario #3: The principal standby is down .............................................................................. 54
8 Miscellaneous troubleshooting in an SA MP environment ............................................................................ 56
8.1 HADR congestion.................................................................................................................................... 56
8.2 Manual creation and deletion of an SA MP cluster ................................................................................. 57
8.3 SA MP cluster resource group ................................................................................................................ 59
8.4 Collection of traces.................................................................................................................................. 59
8.5 HADR simulator ...................................................................................................................................... 60
8.6 Split-brain condition................................................................................................................................. 60
9 Conclusion ..................................................................................................................................................... 61
10 Related Content ........................................................................................................................................... 62
Copyright........................................................................................................................................................... 63
1 Introduction
First introduced in DB2 Version 8.2, the DB2 High Availability Disaster Recovery (HADR) database
replication feature provides protection against database outages and site failures. In an HADR environment,
the transaction logs from a source database, called the primary, are shipped via TCP/IP and replayed to a
target database, called the standby. If the primary is offline or is lost due to a disaster, the standby can be
made available as the new primary using a procedure called HADR failover.
HADR failover can be automated using IBM Tivoli System Automation for Multiplatforms (SA MP). This
allows applications, such as SAP, to continue with zero disruption to user activities under ideal conditions.
Starting with DB2 10.1, DB2 HADR supports multiple standby databases. This makes IBM DB2 capable of
providing a complete Disaster Recovery (DR), High Availability (HA), and Continuous Availability solution in a
single, easily manageable feature. The following figure illustrates a DB2 HADR cluster that contains
multiple standby databases, along with automatic failover implemented through SA MP:
Starting with DB2 10.5 Fix Pack 4, the DB2 HADR feature is also supported for databases containing
column-organized (BLU) tables. IBM DB2 with BLU Acceleration is optimized for the SAP environment. The
greater performance of DB2 BLU Acceleration combined with the improved HA and DR capabilities of DB2
HADR makes DB2 an ideal RDBMS for an SAP BW environment.
This document will cover the implementation of DB2 HADR by adding a principal and auxiliary standby to an
existing SAP NetWeaver (ABAP) system on AIX, Solaris SPARC or Linux. Furthermore, it describes the
implementation of automatic failover using SA MP as well as recovery from several failover scenarios.
2 Planning
DB2 HADR does not require a brand-new SAP installation. The version-specific installation guide, for
example, for SAP NetWeaver 7.31, can be found at http://help.sap.com/nw731/.
In general, SAP NetWeaver installation guides can be found at http://service.sap.com/instguidesnw → <your
SAP NetWeaver main release> → Installation → Installation – SAP NetWeaver Systems.
2.1 References
The following documents should be reviewed before reading this paper:
SAP Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR)
http://service.sap.com/sap/support/notes/1612105
2.2 Technology
The DB2 HADR feature comes with many configuration and performance tuning options for various business
needs.
[Table: HADR synchronization modes — for each SYNC mode, its definition, when the standby acknowledges log receive, and the resulting data protection level.]
Note: SAP recommends using NEARSYNC, as it provides adequate data protection without significant performance
impact.
Depending on which synchronization mode is chosen, HADR can have a performance impact on the primary
due to network latency during HADR log shipping. Better performance can be achieved by decreasing the
distance between the primary and the standby servers, and by having them connected with a high
performance LAN backbone. However, this introduces the risk of losing both the primary and the standby
during a wide scale natural disaster such as flood, fire, etc.
Increasing the distance, on the other hand, increases the network latency which causes performance
degradation and a longer failover time. Therefore, with a single standby, there is a tradeoff between the
database performance and the degree of DR capability. The HADR multiple standby feature solves this
problem. The principal standby is used to achieve HA during outages and the auxiliary standby is to be used
for DR purposes only.
Note: It is recommended to have the primary and the principal standby in the same building with a high performance LAN
connection for faster log shipping and quicker failover during micro-outages. The auxiliary standby is only to be
used for DR purposes and is recommended to be in a different location, preferably in a different city or country.
There is no restriction on the maximum distance between the primary and the standby servers. The auxiliary
standby is forced to use SUPERASYNC synchronization mode which has no synchronous dependency on
replication to the standby. Therefore, there is minimal performance impact for having multiple standbys.
Note: SAP recommends using HADR_SPOOL_LIMIT along with HADR_REPLAY_DELAY to accommodate the logs
accumulated during the replay delay period.
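For example, a delayed-replay auxiliary standby might be configured as follows. This is a hedged sketch: the database name AHA matches this paper's test system, but the delay and spool values are illustrative assumptions, not recommendations.

```shell
# Hedged sketch (values are illustrative assumptions):
# delay log replay on the auxiliary standby by one hour, and allow
# up to 30000 4-KB pages of log spooling to absorb the logs that
# accumulate during the replay delay period.
db2 update db cfg for AHA using HADR_REPLAY_DELAY 3600
db2 update db cfg for AHA using HADR_SPOOL_LIMIT 30000
```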
2.2.6 Automatic failover: DB2 Automatic Client Reroute (ACR) vs. virtual IP address
If there is a change in HADR role, that is, the standby has to take over the primary during an outage, all the
clients can reconnect to the new primary automatically by using one of the following options:
1. Using a virtual IP (VIP): The VIP is bound to the primary server's network interfaces. After a
takeover, the virtual IP is bound to the network interfaces of the standby server (the new primary
server).
2. Using the DB2 Automatic Client Reroute (ACR) feature: The client is configured to know the two
database servers. If the database client cannot connect to the configured primary server, the
database client tries to connect to the configured standby (alternate) server.
Note: SAP recommends using the virtual IP address option with SA MP. More details can be found in SAP Note
1568539, DB6: HADR - Virtual IP or Automatic Client Reroute
(http://service.sap.com/sap/support/notes/1568539).
Automatic failover is not supported between the primary and an auxiliary standby. Auxiliary standbys are to be
used for DR purposes only. A manual takeover must be issued on one of the auxiliary standbys to switch it to the
primary.
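As an illustration of option 2, ACR can be enabled by registering the standby as the alternate server on the primary database. This hedged sketch uses this paper's host names and the sapdb2AHA port (5912); adapt the values to your own system.

```shell
# Hedged sketch: register the standby (saplxvm08) as the alternate
# server for database AHA on the primary. Clients that have connected
# at least once cache this alternate and retry it when the primary
# becomes unreachable.
db2 update alternate server for database AHA using hostname saplxvm08 port 5912
```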
2.2.7 DB2 HADR for DB2 BLU Acceleration
As of DB2 10.5 Fix Pack 4, the DB2 HADR feature can be used with databases containing BLU (column-
organized) tables. Except for Reads on Standby (RoS), all HADR features including multiple standbys are
supported for BLU tables without any additional requirements or settings.
2.2.8 DB2 LOAD with COPY YES for BLU tables
The COPY YES option creates a mini backup image of the loaded data in the specified shared directory; the
standby reads from this backup and replays the changes. Therefore, the shared directory must be
accessible to the standby database. In case of a failover or after a database restore, this backup can be
used to roll forward to the end of logs, which also includes the data loaded into the primary using the LOAD
command.
Note: The LOAD command with the COPY NO option is not supported for HADR environments. Only LOAD with the COPY
YES option is supported in an HADR environment. If data is loaded into tables without the COPY YES option in the
HADR primary, the changes will not be propagated to the standby. Moreover, the table will be marked as unusable
in all standbys as it becomes inconsistent with the table that is in the primary. The
DB2_LOAD_COPY_NO_OVERRIDE registry variable can be set on the primary database to enable a load operation
with the COPY NO option to be converted to a load operation with the COPY YES option. More information can be
found at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0011761.html?lang=no.
The HADR state is not affected by the LOAD operation. The db2 list utilities command
can be used to display load progress in the primary.
Example:
db2 list utilities show detail
ID = 5
Type = LOAD
Database Name = D01
Member Number = 0
Description = [LOADID: 27862.2014-10-21-16.40.39.069004.0 (4;4)]
[*LOCAL.db2d01.141021204038] OFFLINE LOAD Unknown file type AUTOMATIC INDEXING
REPLACE COPY YES SAPD01./BIC/FICBLU01-ALI
Start Time = 10/21/2014 16:40:39.082853
State = Executing
Invocation Type = User
Progress Monitoring:
Phase Number = 1
Description = SETUP
Total Work = 0 bytes
Completed Work = 0 bytes
Start Time = 10/21/2014 16:40:39.082859
Phase Number = 3
Description = LOAD
Total Work = 0 rows
Completed Work = 0 rows
Start Time = Not Started
Phase Number = 4
Description = BUILD
Total Work = 2 indexes
Completed Work = 0 indexes
Start Time = Not Started
The standby is updated after the load is complete on the primary. The standby's db2diag.log file shows
entries like the following, indicating the start and completion of the load operation while data is being
loaded on the primary into the table /BIC/FICBLU01-ALI.
Example:
2014-10-15-16.48.58.682164-240 I20140A498 LEVEL: Warning
PID : 4915624 TID : 27765 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01
APPHDL : 0-175 APPID: *LOCAL.DB2.141015193921
HOSTNAME: sapaix11
EDUID : 27765 EDUNAME: db2agent (D01) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:548
DATA #1 : String, 74 bytes
Starting to restore a load copy.
SAPD01./BIC/FICBLU01-ALI.20141015160659
Note: As long as the standby has access to the shared directory where the mini backup from LOAD with COPY YES is
located, LOAD will complete even if a failover happens right after data is loaded into the primary.
If data is loaded into a table on the primary using the LOAD command without the COPY YES option or with the
COPY NO option, the table will be marked as unavailable on the standby.
Example:
SELECT COUNT(*) AS COUNT FROM SAPD01./BIC/FICBLU01-ALI
COUNT
-----------
SQL1477N For table "SAPD01./BIC/FICBLU01-ALI" an object "130" in table space
"20" cannot be accessed. SQLSTATE=55019
db2diag.log:
APPHDL : 0-8 APPID: *LOCAL.DB2.140904011422
HOSTNAME: sapaix11
EDUID : 18507 EDUNAME: db2redow (D01) 0
FUNCTION: DB2 UDB, data management, sqldMarkObjInErr, probe:1
MESSAGE : ADM5571W The "DATA" object with ID "130" in table space "20" for
table "TBSPACEID=20.TABLEID=130" is being marked as unavailable.
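As the earlier note mentions, this situation can be prevented by setting the DB2_LOAD_COPY_NO_OVERRIDE registry variable so that COPY NO loads are converted to COPY YES loads. A hedged sketch follows; the shared directory path is an illustrative assumption and must be readable by all standbys.

```shell
# Hedged sketch: convert any LOAD ... COPY NO into a COPY YES load
# that writes its copy image to a directory shared with the standbys.
# The path below is an illustrative assumption.
db2set DB2_LOAD_COPY_NO_OVERRIDE="COPY YES TO /db2/shared/loadcopy"
# registry changes generally take effect after an instance restart
db2stop
db2start
```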
If a failover happens before the data load is complete on the primary, the tablespace containing the table will
be marked as Restore Pending on the standby when it becomes the new primary. The table will be marked as
Load Pending on the old primary, which is now the standby.
db2 list tablespaces show detail
Tablespace ID = 3
Name = D01#FACTI
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0100
Detailed explanation:
Restore pending
Tablespace ID = 4
Name = D01#FACTD
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0100
Detailed explanation:
Restore pending
To resolve this issue, the load operation must be terminated using the TERMINATE option from the database
host where it was started.
Example:
db2 LOAD FROM <filename>.ixf OF IXF TERMINATE INSERT INTO <schema name>."<table
name>" COPY YES TO <shared directory>
Note: If the LOAD command with the COPY YES option is used, this situation can be avoided in an HADR environment.
To recover from a Restore Pending state, the tablespace must be restored from a tablespace level backup.
Tablespace level backup is not supported for HADR. Therefore, to take a tablespace level backup, HADR must be
disabled. Once the tablespace is restored, the standby must be refreshed with a new copy (using a full database
backup) of the primary.
2. A TCP/IP interface must be available between the HADR host machines, and a high-speed, high-
capacity network is recommended. The network bandwidth required for HADR log shipping between
the primary and the principal standby depends on the amount of logs generated in the primary per
second during peak time. The minimum bandwidth can be easily calculated using the following
paper: http://scn.sap.com/docs/DOC-56040
Note: The primary ships logs to the principal standby and all the active auxiliary standbys simultaneously.
Therefore, the required network bandwidth is multiplied by the number of active standbys. The HADR simulator
tool described in the following link can also be used to determine the maximum network shipping rate between
systems:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator
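The bandwidth rule above can be sketched as a small calculation. The peak log rate must be measured on your own system (for example, following the methodology in the paper linked above); the figures and the 20% headroom factor here are illustrative assumptions.

```python
# Hedged sketch: estimating the minimum HADR log-shipping bandwidth.
# The peak log generation rate and the headroom factor are assumptions
# for illustration; measure your own peak rate as described in the
# referenced paper.

def min_bandwidth_mb_per_sec(peak_log_rate_mb_per_sec, active_standbys, headroom=1.2):
    """The primary ships logs to the principal standby and to every active
    auxiliary standby simultaneously, so the requirement scales with the
    number of active standbys; headroom covers bursts and TCP overhead."""
    return peak_log_rate_mb_per_sec * active_standbys * headroom

# Example: 8 MB/s peak log generation, one principal plus one auxiliary standby
print(min_bandwidth_mb_per_sec(8.0, 2))  # 19.2 MB/s
```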
2.4 DB2 database requirements
1. The versions of the database systems for the primary and standby must be identical; for example,
both must be either 10.1 or 10.5.
2. During a rolling fix pack update, the modification level (for example, the fix pack level) of the
database system for the standby can be temporarily higher than that of the primary in order to test
the new level. Both databases should be on the same DB2 version and fix pack level for normal
operations.
For more information about software and hardware requirements, see the following links:
3 Preparation
For the purpose of this exercise, we will enable HADR for an existing SAP NetWeaver system with a
distributed installation.
3.1.1 Hardware and operating system in the test systems
All database hosts are on separate hardware with identical configuration and on the same operating system
level.
Example:
For the test systems saplxvm07, saplxvm08, and saplxvm09, the OS and hardware configuration is
compared using /proc/meminfo, /proc/cpuinfo and /etc/SuSE-release files. Each system has 4
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz CPUs, 8 GB of RAM, and the following operating system:
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
Adding static IP address-to-hostname mappings in the hosts file removes the system's DNS servers as a
single point of failure. In case of a DNS failure, the clustered systems can still resolve the addresses of the
other machines via the /etc/hosts file. From each host, ping all other hosts to check communication.
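A minimal sketch of such host entries and the connectivity check follows; the IP addresses are placeholders, not the lab's actual addresses.

```shell
# Hedged sketch: static entries for the three test hosts in /etc/hosts.
# The 192.0.2.x addresses are documentation placeholders; substitute
# the real addresses of your systems.
cat >> /etc/hosts <<'EOF'
192.0.2.7  saplxvm07
192.0.2.8  saplxvm08
192.0.2.9  saplxvm09
EOF

# From each host, verify name resolution and reachability of the others.
for h in saplxvm07 saplxvm08 saplxvm09; do ping -c 1 "$h"; done
```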
3.4 Operating system users and groups
All SAP- and DB2-related user IDs and group IDs from the primary and the SAP Application Server must also
be available and free to use on the standby servers.
Example:
The following command is used to collect group IDs and user IDs on the hosts saplxvm07 and saplxvm06.
The same IDs will be used in the standby and auxiliary standby servers.
id ahaadm
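The check can be repeated for all relevant users and groups in one pass. In this sketch, ahaadm, db2aha, and dbahaadm appear in this paper; any additional user names (for example sapadm) are assumptions to be adapted to your installation.

```shell
# Hedged sketch: record numeric user and group IDs on the primary
# (saplxvm07) and the SAP application server (saplxvm06) so the same
# values can be reused on the standby hosts. "sapadm" is an assumed
# example; list the users that exist on your system.
for u in ahaadm db2aha sapadm; do id "$u"; done
getent group dbahaadm
```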
4 Installing the standby
To perform the steps in this section, Section 3 must have been completed.
/sapmnt/AHA/exe saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)
/sapmnt/AHA/global saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)
/sapmnt/AHA/profile saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)
The following commands must be executed on host saplxvm06 to confirm the export and allow access:
exportfs -a
exportfs
The following lines can be added to the /etc/fstab file on the standby host saplxvm08 so that the
directories are automatically remounted after a system restart:
saplxvm06:/sapmnt/AHA/exe /sapmnt/AHA/exe nfs defaults 0 0
saplxvm06:/sapmnt/AHA/global /sapmnt/AHA/global nfs defaults 0 0
saplxvm06:/sapmnt/AHA/profile /sapmnt/AHA/profile nfs defaults 0 0
The following command can be used to mount all directories mentioned in /etc/fstab.
mount -a
Example:
saplxvm07:db2aha > db2 update db cfg for AHA using LOGARCHMETH1
DISK:/db2/AHA/log_archive/
After enabling log archiving, a complete offline backup must be taken in order to take the database out of the
backup pending state. To reduce production downtime during the offline backup, the backup can be split to
multiple files. Using the COMPRESS option during backup will increase the backup duration as it adds
compression time to the regular backup time.
Example:
saplxvm07:db2aha > db2 backup db aha to /db2/db2aha/backup, /db2/db2aha/backup,
/db2/db2aha/backup, /db2/db2aha/backup
4.3 Taking a backup of the primary
An online or offline backup of the primary database is required, and the backup image will be used to create
the standby.
Example:
saplxvm07:db2aha > db2 backup db aha online to
/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup
Note: The synchronization mode corresponds to the DB2 database configuration parameter HADR_SYNCMODE. The HADR
local and remote service names correspond to the database configuration parameters HADR_LOCAL_SVC and
HADR_REMOTE_SVC, respectively.
A homogeneous system copy creates all users, sets up the environment, installs the database software,
creates the instance on the standby, and then prompts the user to restore the database from a backup. At this
point, the backup taken in section 4.3 on the primary server must be restored on the standby server.
Note: To set up HADR, the standby must be in rollforward pending state. Therefore, SWPM is no longer required for the
HADR setup and must be exited after restoring the database.
Example:
The following screens show SWPM HADR-related settings:
Figure 3: Start screen of SWPM
Figure 4: SWPM screen for IBM Tivoli System Automation for Multiplatforms (SA MP) for High
Availability installation options
Note: The cluster configuration file is used to create a cluster for the automatic failover, which will be explained later in
this document. In figure 4, the Generate cluster configuration files checkbox is not selected because SWPM will be
exited after restoring the database and you will not reach the step to create the cluster configuration file. For a new
installation of the primary, if this option is selected, the cluster configuration file (cluster_config.xml) will be
generated in the directory /tmp.
Figure 5: DB2 High Availability Disaster Recovery options in SWPM
Note: The two port numbers will be assigned to the database configuration parameters HADR_LOCAL_SVC and
HADR_REMOTE_SVC. This can be changed later.
Figure 6: SWPM message window to restore database for homogeneous system copy
Note: As described earlier, at this stage, SWPM must be stopped by clicking the Cancel button.
Figure 7: Stop SWPM.
Note: As described earlier, SWPM is no longer needed for the HADR setup. Stop it by clicking the Stop button.
Note: The port numbers configured for AHA_HADR_1 and AHA_HADR_2 are used for the primary (saplxvm07) and the
standby (saplxvm08) servers' HADR local and remote service name database configuration parameters
(HADR_LOCAL_SVC, HADR_REMOTE_SVC). The same port number can be used for both parameters.
Example:
saplxvm07:db2aha 51> cd /db2/AHA
saplxvm07:db2aha 52> ls -lrt
total 32
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata4
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata3
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata2
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata1
drwxr-xr-x 3 db2aha dbahaadm 4096 May 13 16:45 db2aha
drwxr-xr-x 3 db2aha dbahaadm 4096 May 13 16:46 log_dir
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 15:40 log_archive
drwxr-xr-x 27 db2aha dbahaadm 4096 Jul 4 14:00 db2dump
The following directories are created (or mounted) on the standby host with the same ownership and
permissions as on the primary host above.
saplxvm08:db2aha 60> cd /db2/AHA
saplxvm08:db2aha 61> ls -lrt
total 32
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 db2aha
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata4
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata3
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata2
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata1
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:51 log_dir
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:51 log_archive
drwxr-xr-x 4 db2aha dbahaadm 4096 Jun 13 14:22 db2dump
The backup taken in section 4.3 of this document is restored in the standby host using the following
command:
saplxvm08:db2aha 31> db2 restore db aha from
/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup
DB20000I The RESTORE DATABASE command completed successfully.
Note: DB2 HADR requires the standby to be in rollforward pending mode. Therefore, after restoring the database, it is not
necessary to execute the ROLLFORWARD DATABASE command.
Example:
saplxvm08:db2aha 23> db2 get db cfg for aha|grep -i Rollforward
Rollforward pending = DATABASE
Note: During the database instance installation on the standby (saplxvm08), the parameter SAPDBHOST in the SAP
DEFAULT.PFL profile was changed to the host name of the standby host (saplxvm08). Later, when setting up the
database virtual host, you should change the values of SAPDBHOST and j2ee/dbhost to a virtual host name. For
now, they should be changed back to the primary host (saplxvm07).
Example:
saplxvm08:db2aha 34> grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL
SAPDBHOST = saplxvm07
j2ee/dbhost = saplxvm07
UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm08;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_2;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;
UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;
UPDATE DB CFG FOR AHA USING HADR_SYNCMODE NEARSYNC;
UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;
UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;
UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;
Note: The service names AHA_HADR_1 and AHA_HADR_2 and their port numbers are defined in the /etc/services file
(see section 4.5). The actual port numbers (5951 and 5952) can also be used instead of the service names.
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
Both the primary and the standby must be deactivated and reactivated for the changes to take effect. This
will require a production system downtime.
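The restart of both databases can be sketched as follows. This is an outline under the assumptions of this example (database AHA, instance owner db2aha), not a verified procedure:

```shell
# Outline only: restart both databases so that the static HADR
# configuration changes take effect. Run as instance owner db2aha.
# On the primary host (saplxvm07):
db2 deactivate db aha
db2 activate db aha
# On the standby host (saplxvm08):
db2 deactivate db aha
db2 activate db aha
```

In practice, all applications must be disconnected from the primary before it can be deactivated.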
1. The /etc/services file on both the standby and primary host contains the same port numbers.
Example:
saplxvm07:~ # cat /etc/services | grep -i aha
sapdb2AHA 5912/tcp
AHA_HADR_1 5951/tcp # DB2 HADR log shipping
AHA_HADR_2 5952/tcp # DB2 HADR log shipping
sapmsAHA 3600/tcp # SAP System Message Server Port
DB2_db2aha 60006/tcp
DB2_db2aha_1 60007/tcp
DB2_db2aha_2 60008/tcp
DB2_db2aha_3 60009/tcp
DB2_db2aha_4 60010/tcp
DB2_db2aha_END 60011/tcp
2. The DB2 licenses on both the primary and the standby are valid and not trial licenses. Use the
db2licm -l command to verify. SA MP is not supported with DB2 temporary licenses. Apply a
valid license using the db2licm -a <license file name> command as user db2aha.
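A minimal sketch of the license check and activation (the license file name below is a placeholder, not a file from this example):

```shell
# Sketch: verify the installed DB2 license as user db2aha.
db2licm -l
# If only a trial or temporary license is listed, apply a permanent
# license file (the path and file name here are illustrative):
db2licm -a /tmp/db2ese.lic
```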
3. The database manager configuration Parameter SVCENAME is defined as sapdb2<SID> on both the
primary and the standby hosts.
Example:
saplxvm08:db2aha 23> db2 get dbm cfg | grep -i svcename
TCP/IP Service name (SVCENAME) = sapdb2AHA
Example:
saplxvm07:db2aha 59> db2 connect to aha user sapaha using ******
Database Connection Information
Database server = DB2/LINUXX8664 10.5.0
SQL authorization ID = SAPAHA
Local database alias = AHA
DB20000I The START HADR ON DATABASE command completed successfully.
HADR is now enabled and the standby will begin to replay the logs to catch up to the primary.
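For reference, the start sequence that leads to the message above can be sketched as follows (the standby must always be started before the primary; database and host names are those of this example):

```shell
# Sketch: start HADR, standby first, then primary, as user db2aha.
# On the standby host (saplxvm08):
db2 start hadr on db aha as standby
# On the primary host (saplxvm07):
db2 start hadr on db aha as primary
```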
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 05/21/2014 14:47:57.497164 (1400698077)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 3
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000056
LOG_HADR_WAIT_ACCUMULATED(seconds) = 1.464
LOG_HADR_WAIT_COUNT = 36460
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000007.LOG, 4393, 2666673961
STANDBY_LOG_FILE,PAGE,POS = S0000007.LOG, 4390, 2666660069
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000007.LOG, 4390, 2666660069
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 05/22/2014 00:38:17.000000 (1400733497)
STANDBY_LOG_TIME = 05/22/2014 00:37:21.000000 (1400733441)
STANDBY_REPLAY_LOG_TIME = 05/22/2014 00:37:21.000000 (1400733441)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 05/22/2014 00:42:17.000000 (1400733737)
READS_ON_STANDBY_ENABLED = N
Refer to the following IBM Knowledge Center page for more details on the above values:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0011729.html
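When checking these values regularly, a small script can reduce the output to the fields that matter most. The following is a convenience sketch, not part of the official procedure; the here-document is a shortened stand-in for real output, which on a live system would be produced by db2pd -db aha -hadr:

```shell
# Sketch: extract the key monitoring fields from saved db2pd -hadr output.
# The sample below is a shortened stand-in for real output; on a live
# system replace it with:  db2pd -db aha -hadr > /tmp/hadr_status.txt
cat > /tmp/hadr_status.txt <<'EOF'
                            HADR_ROLE = PRIMARY
                           HADR_STATE = PEER
                  HADR_CONNECT_STATUS = CONNECTED
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
EOF
# Report role, state, and connection status (the trailing space in the
# last pattern excludes HADR_CONNECT_STATUS_TIME):
awk -F' = ' '/HADR_ROLE|HADR_STATE|HADR_CONNECT_STATUS /{gsub(/^ +/,"",$1); print $1": "$2}' /tmp/hadr_status.txt
```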
5. Enabling automatic failover using SA MP
The steps in section 4 describe how to enable HADR, but not automatic failover. Therefore, if the primary
goes down, a manual takeover operation must be performed from the standby. All applications must be
manually redirected to the standby, the new primary. This can be done by changing the db2cli.ini and
SAP profile (see section 5.2.4).
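Such a manual takeover can be sketched as follows, run as user db2aha on the standby host (the PEER WINDOW ONLY clause is an optional safeguard mentioned here for illustration, not part of the setup described in section 4):

```shell
# Sketch: manual takeover when the primary is down and no automatic
# failover is configured. Run on the standby (saplxvm08).
db2 takeover hadr on db aha by force peer window only
# PEER WINDOW ONLY succeeds only if the standby is still within the
# configured peer window, which limits the risk of transaction loss.
# A plain BY FORCE takeover also works outside the peer window but
# may lose in-flight transactions.
```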
To enable automatic failover, SA MP must be installed and configured on both the primary and the standby
hosts.
2. Install SA MP.
saplxvm07: ./prereqSAM
prereqSAM: All prerequisites for the ITSAMP installation are met on operating
system
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
saplxvm07: ./installSAM --noliccheck
4. Install the HA scripts on the primary and the standby by running the db2cptsa command.
saplxvm07:/db2/db2aha/db2_software/install/tsamp # ./db2cptsa
DBI1110I The DB2 High Availability (HA) scripts for the IBM Tivoli
System Automation for Multiplatforms (SA MP) were successfully
updated in /usr/sbin/rsct/sapolicies/db2.
Explanation:
User response:
No action is required.
Example:
saplxvm07:~ # tar -xzvf samp_scripts_633_20140317.tgz
samp_scripts/
samp_scripts/sampdbcmd
samp_scripts/startdb
samp_scripts/startj2eedb
samp_scripts/stopdb
samp_scripts/stopj2eedb
samp_scripts/sapdb2cluster.sh
The sapdb2cluster.sh script is used to set up and create the cluster using the configuration file with all the
required parameters. The script has the following options:
1. Create, Show or Edit Database Configuration
2. Create Database Cluster
3. Show Database Cluster State
4. Delete Database Cluster
For further information, refer to the SAP installation guide IBM DB2 High Availability Solution: IBM Tivoli
System Automation for Multiplatforms that can be found on SAP Service Marketplace at
http://service.sap.com/instguidesnw.
By default, the configuration information is saved in the sapdb2cluster.conf file and the log is saved in the
sapdb2cluster.log file.
Select option 1 - Create, Show or Edit Database Configuration, which displays values from the current
configuration file if it exists in the current directory, or prompts you for new values.
Example:
General Cluster Configuration
[4] TSA_DOMAIN_NAME = sap_ahadb2
[5] TSA_TIEBREAKER_IP_ADDRESS = 9.26.166.1
[6] TSA_DISK_HEARTBEAT = [OFF]
[7] TSA_REMOTE_CMD = ssh
The parameter TSA_DISK_HEARTBEAT enables SA MP Disk Heartbeat and is defined by the accessibility of
the raw disks, logical volumes (LVID), multipath devices (MPATH), or physical volumes (PVID). This allows
TSA to distinguish between a network failure and a node failure. Refer to the following link for more
information:
http://www-01.ibm.com/support/knowledgecenter/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/sampugdiskheartbeat.html?cp=SSRM2X_4.1.0%2F0-4-5-1-1&lang=en
The parameter DB2_HOSTNAME_LIST takes the primary host name and the standby host name separated by a
comma. DB2_HA_HOSTNAME assigns the virtual host name, which must not already be in use.
It is also required to supply a valid SA MP license location. In our example, the license file /root/sam32.lic
is assigned to variable TSA_LICENSE_FILE.
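Taken together, the parameters discussed above could appear in sapdb2cluster.conf roughly as follows. This is a hypothetical excerpt assembled from the values shown in this example, not a complete configuration file:

```shell
# Hypothetical sapdb2cluster.conf excerpt based on this example.
TSA_DOMAIN_NAME=sap_ahadb2
TSA_TIEBREAKER_IP_ADDRESS=9.26.166.1
TSA_REMOTE_CMD=ssh
DB2_HOSTNAME_LIST=saplxvm07,saplxvm08
DB2_HA_HOSTNAME=saplxvmsap
TSA_LICENSE_FILE=/root/sam32.lic
```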
Executing db2haicu at node saplxvm07 : OK
The above output also describes the steps performed during cluster creation. The script configures both
systems for HADR and generates the /tmp/cluster_config.xml configuration file on both systems. The
script then uses the configuration file to execute the db2haicu command on both the primary and the standby
to create the cluster. The output of the db2haicu command is stored in /tmp/cluster_config.log by the
script.
Example:
The following is an example of the cluster_config.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="db2ha.xsd"
clusterManagerName="TSA" version="2.2">
<ClusterDomain domainName="sap_ahadb2">
<Quorum quorumDeviceProtocol="network" quorumDeviceName="9.26.166.1"/>
<PhysicalNetwork physicalNetworkName="db2network" physicalNetworkProtocol="ip">
<Interface interfaceName="eth0" clusterNodeName="saplxvm07">
<IPAddress baseAddress="9.26.166.199" subnetMask="255.255.254.0"
networkName="db2network"/>
</Interface>
<Interface interfaceName="eth0" clusterNodeName="saplxvm08">
<IPAddress baseAddress="9.26.166.200" subnetMask="255.255.254.0"
networkName="db2network"/>
</Interface>
<LogicalSubnet baseAddress="9.26.166.0" subnetMask="255.255.254.0"
networkName="db2network"/>
</PhysicalNetwork>
<ClusterNode clusterNodeName="saplxvm07"/>
<ClusterNode clusterNodeName="saplxvm08"/>
</ClusterDomain>
<FailoverPolicy>
<HADRFailover />
</FailoverPolicy>
<DB2PartitionSet>
<DB2Partition dbpartitionnum="0" instanceName="db2aha" />
</DB2PartitionSet>
<HADRDBSet>
<HADRDB databaseName="AHA"
localInstance="db2aha"
remoteInstance="db2aha"
localHost="saplxvm07"
remoteHost="saplxvm08" />
<VirtualIPAddress baseAddress="9.26.166.97" subnetMask="255.255.254.0"
networkName="db2network" />
</HADRDBSet>
</DB2Cluster>
Figure 8: lssam output of SA MP cluster configuration for HADR
Note: With option 3, the lssam command is executed by the script to collect the status output.
Term                                     Description
db2_db2aha_db2aha_AHA-rg_group-equ       Equivalency of the database resource group db2_db2aha_db2aha_AHA-rg
db2_db2aha_saplxvm07_0-rg_group-equ      Equivalency of the database instance resource group for host saplxvm07
db2_db2aha_saplxvm08_0-rg_group-equ      Equivalency of the database instance resource group for host saplxvm08
Use ifconfig -a on the primary host to check that the virtual IP address is linked to the primary host IP
address.
Example:
saplxvm07:~/samp_scripts # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:1A:98:28
inet addr:9.26.166.199 Bcast:9.26.167.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:166592 errors:0 dropped:581 overruns:0 frame:0
TX packets:106491 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:169945146 (162.0 Mb) TX bytes:18069161 (17.2 Mb)
The virtual IP address 9.26.166.97 is bound to the alias interface eth0:0 on the primary host. The
standby host IP configuration remains unchanged.
Note: During a failover scenario, this IP link is removed and a new link between the standby host and the virtual IP is
created in the standby host.
5.2.4. Enabling the SAP system with virtual database host name and IP address
This step has already been completed by sapdb2cluster.sh.
On saplxvm06, the SAPDBHOST and j2ee/dbhost in the default profile and hostname in the
db2cli.ini file are replaced by the virtual host name. In the following example, the virtual host name is
saplxvmsap:
saplxvm06:db2aha 34> grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL
SAPDBHOST = saplxvmsap
j2ee/dbhost = saplxvmsap
saplxvm06:~ # cat /usr/sap/AHA/SYS/global/db6/db2cli.ini
; Comment lines start with a semi-colon.
[AHA]
Database=AHA
protocol=tcpip
hostname=saplxvmsap
servicename=5912
[COMMON]
diagpath=/usr/sap/AHA/SYS/global/db6/db2dump
Once the database cluster is created, all connections to the database must be refreshed to pick up the
change. The virtual host (saplxvmsap) is reflected as the database host in the SAP system.
Figure 9: Dashboard screen in the DBA Cockpit (Web Dynpro user interface)
5.3 HADR micro-outage feature test using the Graceful Maintenance Tool (GMT)
The micro-outage feature of SAP on IBM DB2 for LUW pauses SAP applications for a short period so that a
controlled failover can be performed without stopping any SAP ABAP application server. This allows
administrators to perform certain database maintenance tasks without significant downtime. The GMT makes
optimal use of the micro-outage feature and provides an easy way to perform a controlled, graceful HADR
failover.
The GMT can be downloaded from SAP Note 1530812 - DB6: Graceful Maintenance Tool and can be used
from the primary or the standby host as root user. The GMT requires the ABAP routines attached to SAP
Note 1907533 - ABAP Routines for Graceful Maintenance Tool (GMT) and SAP Note 1443426 - DB6:
Graceful Cluster Switch.
Example:
saplxvm07:~/samp_scripts # tar xzvf gmt_scripts_633_20140317.tgz
gmt_scripts/
gmt_scripts/exitDB2Restart.sh
gmt_scripts/exitFPActivate.sh
gmt_scripts/exitNoOp.sh
gmt_scripts/exitResumeBtcExternal.sh
gmt_scripts/exitSuspendBtcExternal.sh
gmt_scripts/sapdb2gmt.sh
After extracting the scripts, running sapdb2gmt.sh opens the menu-driven Database Graceful Maintenance
Tool, which displays the database configuration parameters and prompts:
Press Enter to Exit or select a number to edit a parameter (e.g. 1 for SAP_SID):
After the configuration has been completed, Option 2 can be used to check all prerequisites for micro failover
using the GMT.
Clean quiesce file : OK
Checking SAP DBSL feature prerequisites: : OK
Checking for transactions running for more than 60 seconds : WARNING
*** WARNING: Found Long Running Transactions (Tue Jul 22 11:10:53 EDT 2014) ***
COMMENT APPL_NAME AGENT_ID UOW_START_TIME STATUS RUNTIME
SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_RE
---------- ------------ -------- ------------------- ------------ ----------- ---
--------- -------------------- ----------- ------ --------------
LONGRUNNER DB2ATS 3354 2014-07-22-11.00.00 LOCKWAIT 652
DB2AHA saplxvm07 -Task: -
1 record(s) selected.
WARNING: Long running transactions are active which might be canceled (rolled
back)!
*******************************************************************************
The tool displays a list of active transactions and asks for confirmation to proceed.
If Yes is selected, the script proceeds and waits at the step, Waiting for the Quiesce file
(current: 12 s; timeout: 65 s) until the SAP applications are paused.
Example:
Warnings occurred. Do you want to proceed with graceful maintenance? [Yes|No]:
Yes
Note: This requires downtime. All connections to the database will be closed.
************** Open Connections (Tue Jul 22 11:46:10 EDT 2014) ***************
APPL_NAME AGENT_ID AUTHID UOW_START_TIME STATUS
SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_REPORT
-------------------- -------- ---------- ------------------- -------------------- -
----------- -------------------- ----------- ------------------------------
DB2ATS 3354 DB2AHA 2014-07-22-11.00.00 LOCKWAIT
DB2AHA saplxvm07 -Task: -
dw.sapAHA_DVEBMGS01 27 SAPAHA 2014-07-22-11.44.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 28 SAPAHA 2014-07-22-11.42.27 UOWWAIT
SAPSYS saplxvm06 SPO SAPLSPOA
dw.sapAHA_DVEBMGS01 29 SAPAHA 2014-07-22-11.43.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 32 SAPAHA 2014-07-22-11.44.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 34 SAPAHA 2014-07-22-00.01.03 UOWWAIT
DDIC saplxvm06 BTC SAPMSSY2
dw.sapAHA_DVEBMGS01 35 SAPAHA 2014-07-22-11.43.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 37 SAPAHA 2014-07-22-11.44.10 UOWWAIT
SAPSYS saplxvm06 DIA CL_ABSTRACT_SAML_PROTOCOL=====
dw.sapAHA_DVEBMGS01 39 SAPAHA 2014-07-22-11.39.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
9 record(s) selected.
*******************************************************************************
If all connections cannot be closed within 400 seconds, the tool asks the user again whether to wait or to
continue:
Example:
Total Timeout is now 240 (Maximum allowed 400)
Do you want to wait 60 seconds longer? [Yes|No]: No
If No is selected, the tool continues closing connections and asks whether the remaining transactions may be
forcefully rolled back.
The cluster should reflect the changes and can be displayed using the lssam command.
Example:
saplxvm07:ahaadm 5> lssam
Online IBM.ResourceGroup:db2_db2aha_db2aha_AHA-rg Nominal=Online
|- Online IBM.Application:db2_db2aha_db2aha_AHA-rs
|- Offline IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm07
'- Online IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm08
'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs
|- Offline IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm07
'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm08
Online IBM.ResourceGroup:db2_db2aha_saplxvm07_0-rg Nominal=Online
'- Online IBM.Application:db2_db2aha_saplxvm07_0-rs
'- Online IBM.Application:db2_db2aha_saplxvm07_0-rs:saplxvm07
Online IBM.ResourceGroup:db2_db2aha_saplxvm08_0-rg Nominal=Online
'- Online IBM.Application:db2_db2aha_saplxvm08_0-rs
'- Online IBM.Application:db2_db2aha_saplxvm08_0-rs:saplxvm08
Online IBM.Equivalency:db2_db2aha_db2aha_AHA-rg_group-equ
|- Online IBM.PeerNode:saplxvm08:saplxvm08
'- Online IBM.PeerNode:saplxvm07:saplxvm07
Online IBM.Equivalency:db2_db2aha_saplxvm07_0-rg_group-equ
'- Online IBM.PeerNode:saplxvm07:saplxvm07
Online IBM.Equivalency:db2_db2aha_saplxvm08_0-rg_group-equ
'- Online IBM.PeerNode:saplxvm08:saplxvm08
Online IBM.Equivalency:db2network
|- Online IBM.NetworkInterface:eth0:saplxvm08
'- Online IBM.NetworkInterface:eth0:saplxvm07
The virtual IP link from the old primary is removed and a new link is created in the new primary host
(saplxvm08).
saplxvm08:~ # ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:76:2C:2C
inet addr:9.26.166.97 Bcast:9.26.167.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
During the cluster switch, the changes can be monitored live using the lssam -top command as user root
from any of the hosts.
Figure 10: Cluster status after the primary is down (lssam output)
After a successful takeover, the primary (saplxvm07) will have switched roles with the standby (saplxvm08).
Once the test is complete, the file /db2/db2aha/sqllib/adm/db2sysc.backup must be moved back to
/db2/db2aha/sqllib/adm/db2sysc. Once the file is moved back, the old primary (saplxvm07) is
automatically brought back up and activated as the new standby. This may take several minutes. The status
can be monitored using the lssam -top command.
Note: Moving the db2sysc file is done only to simulate a disaster scenario and is not recommended outside of a test.
6. Installing the auxiliary standby database instance
As described earlier, the auxiliary standby is for DR purposes only and should be used to protect data from
widespread disasters. Adding an auxiliary standby is similar to adding a principal standby, except for minor
changes to the DB2 database configuration. The following sections show how to add the first auxiliary
standby.
Example:
saplxvm09:~ # mount | grep AHA
saplxvm06:/sapmnt/AHA/exe on /sapmnt/AHA/exe type nfs (rw,addr=9.26.166.198)
saplxvm06:/sapmnt/AHA/global on /sapmnt/AHA/global type nfs (rw,addr=9.26.166.198)
saplxvm06:/sapmnt/AHA/profile on /sapmnt/AHA/profile type nfs (rw,addr=9.26.166.198)
Note: The homogeneous system copy changes the SAPDBHOST and j2ee/dbhost variables to saplxvm09 in the SAP
default profile /sapmnt/AHA/profile/DEFAULT.PFL. This needs to be manually changed back to the virtual
host, saplxvmsap. The auxiliary standby is also in rollforward pending mode just like the principal standby.
UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;
UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST
saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2;
UPDATE DB CFG FOR AHA USING HADR_SYNCMODE SUPERASYNC;
UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;
UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;
UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;
saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
Note: The HADR_TARGET_LIST parameter lists the other two HADR hosts and their port numbers as host:port pairs.
The order matters: the first host in the list is the principal standby, the second host is the first auxiliary
standby, and so on.
The HADR_TARGET_LIST database configuration parameter also needs to be updated on the primary and the
principal standby.
Example:
saplxvm07:db2aha 190> db2 "UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST
saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3"
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
SQL1363W One or more of the parameters submitted for immediate modification
were not changed dynamically. For these configuration parameters, the database
must be shutdown and reactivated before the configuration parameter changes
become effective.
saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
Note: To convert from a single standby to multiple standbys, as the above message indicates, the database must be
deactivated and reactivated. This requires system downtime. It is recommended to quiesce the SAP system to
close the connections temporarily instead of stopping it.
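The quiesce-based restart can be sketched as follows. This is an outline under the assumptions of this example (database AHA, run as a user with quiesce authority such as the instance owner db2aha), not a verified procedure:

```shell
# Outline: close connections temporarily, restart the database so the
# new HADR_TARGET_LIST takes effect, then let applications reconnect.
db2 connect to aha
db2 "quiesce database immediate force connections"
db2 connect reset
db2 deactivate db aha
db2 activate db aha
db2 connect to aha
db2 unquiesce database
db2 connect reset
```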
Once HADR is successfully activated, the db2pd -hadr command on the primary host will list all of the
standbys:
Example:
saplxvm07:db2aha 200> db2pd -d aha -hadr
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 12/03/2013 16:14:20.709721 (1386105260)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 17
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_REPLAY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 12/03/2013 16:19:51.000000 (1386105591)
READS_ON_STANDBY_ENABLED = N
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
LOG_STREAM_ID = 0
HADR_STATE = REMOTE_CATCHUP
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm09
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 12/03/2013 16:14:21.118983 (1386105261)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 16
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_REPLAY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = N
For more information on the DB2 HADR multiple standby database feature, see the IBM Knowledge Center:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0059994.html
7 Failover scenarios
As mentioned earlier, the auxiliary standbys are only for disaster recovery (DR) purposes and automatic
failover is not supported between the primary and auxiliary standbys.
The HADR setup in the example of this paper has a primary, a principal standby, and an auxiliary standby.
The database configuration parameter HADR_TARGET_LIST on the primary, the principal standby,
and the auxiliary standby is set as follows:
saplxvm07:db2aha 71> db2 get db cfg for aha | grep HADR_TARGET_LIST
HADR target list (HADR_TARGET_LIST) =
saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3
With these settings, if the primary (on host saplxvm07) is down, the standby host saplxvm08 will become the
new primary host. The database on host saplxvm07 will be the principal standby, and saplxvm09 will be the
auxiliary standby. Once a failover happens, the goal should always be to bring back the failed host and
return to the original configuration. The following subsections show failover scenarios and how to recover
from them.
To test these failover scenarios, a DB2 software failure is simulated by killing the db2sysc process, which is
the main DB2 process. This is similar to a kernel panic and causes DB2 to become inaccessible.
Note: During the test scenario, workload is generated using SAP transaction SGEN, and the cluster is monitored using
lssam -top.
Example:
saplxvm07:db2aha 64> ps -ef | grep -i db2sysc
db2aha 4428 4426 2 10:18 ? 00:00:45 db2sysc 0
db2aha 32216 28174 0 10:48 pts/0 00:00:00 grep -i db2sysc
saplxvm07:db2aha 65> which db2sysc
/db2/db2aha/sqllib/adm/db2sysc
saplxvm07:db2aha 66> mv /db2/db2aha/sqllib/adm/db2sysc
/db2/db2aha/sqllib/adm/db2sysc.backup
saplxvm07:db2aha 66>
saplxvm07:db2aha 66> kill -9 4428
saplxvm07:db2aha 66>
When the primary (on host saplxvm07) goes down, the standby (on host saplxvm08) will automatically take
over. The SAP workload is temporarily interrupted but is automatically failed over to the standby database
and continues without stopping the application. The following screenshots show changes in the cluster
during the failover:
Figure 12: Cluster status during failover
As shown in the above figure, the cluster resource for host saplxvm07 is in the Pending Online state while TSA
tries to restart DB2. However, because DB2 cannot be started (db2sysc was renamed), failover occurs, as
shown in the figure below:
Figure 13: Cluster status after the primary went down and the standby took over
As shown in the figure above, failover has occurred and the database is up, as the resource
IBM.Application:db2_db2aha_db2aha_AHA-rs is set to Online.
The resource group IBM.ResourceGroup:db2_db2aha_saplxvm07_0-rg has the status Pending Online.
This indicates that SA MP cannot start DB2 on saplxvm07 (since db2sysc has been renamed). Once DB2 can
be started on saplxvm07, SA MP automatically detects this and assigns saplxvm07 as the principal
standby.
The example below shows renaming db2sysc back to make DB2 accessible again.
Example:
saplxvm07:db2aha 74> mv /db2/db2aha/sqllib/adm/db2sysc.backup
/db2/db2aha/sqllib/adm/db2sysc
saplxvm07:db2aha 81> db2start
07/23/2014 11:34:10 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
After DB2 becomes accessible on host saplxvm07, SA MP assigns it as the principal standby. The auxiliary
standby remains as it is. Once the failed system is brought back up and rejoins the cluster, it will be
in HADR catchup state. All the logs from the current primary (saplxvm08) must be replayed for the
system to reach PEER state. To make saplxvm07 the primary again, an HADR takeover can be performed
using the Graceful Maintenance Tool (GMT) as described in section 5.3.2 of this document.
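Outside of the GMT, the role switch itself is a single command; a sketch, run as user db2aha on saplxvm07 once it has reached PEER state:

```shell
# Sketch: graceful (non-forced) takeover to make saplxvm07 the
# primary again. Requires primary and standby to be in PEER state.
db2 takeover hadr on db aha
```

The GMT adds value on top of this by pausing the SAP applications around the switch so that no application server needs to be stopped.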
7.2 Failover scenario #2: Both the primary and principal standby are down
The following scenario describes a situation where both the primary and the principal standby are
unavailable. This is similar to a disaster recovery situation where the auxiliary standby must be brought
online.
When the primary (on host saplxvm07) and the principal standby are both unavailable, all applications will
be stopped as no automatic failover is available. A manual takeover must be initiated from the auxiliary
standby database.
The following figure shows the SA MP resources when both the primary and the principal standby are down.
Figure 14: Cluster status after both the primary and the standby went down
The following steps must be performed to make the auxiliary standby the new primary and to start SAP:
1. Stop the SAP central instance and all application servers.
2. The parameter SAPDBHOST in the SAP profile /sapmnt/AHA/profile/DEFAULT.PFL needs to be
updated to point to the auxiliary standby host saplxvm09. Currently, it points to the virtual host name
saplxvmsap.
3. The parameter hostname in the CLI driver file db2cli.ini needs to be updated to point to the auxiliary
standby host saplxvm09 instead of the virtual host saplxvmsap.
The TAKEOVER HADR command is executed on the auxiliary standby to make host saplxvm09 the new
primary. The BY FORCE option must be used because neither the primary nor the principal standby is
available.
saplxvm09:db2aha 11> db2pd -db aha -hadr | grep HADR_ROLE
HADR_ROLE = STANDBY
saplxvm09:db2aha 12> db2 TAKEOVER HADR ON DATABASE AHA BY FORCE
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
saplxvm09:db2aha 13> db2pd -db aha -hadr | grep HADR_ROLE
HADR_ROLE = PRIMARY
saplxvm09:db2aha 17> db2pd -db aha -hadr | grep HADR_STATE
HADR_STATE = DISCONNECTED
Note: The host saplxvm09 is now the primary, and all applications connect directly to this host. The SA MP cluster
and automatic failover are not in effect. Note that because the auxiliary standby is for DR purposes only and
must use the HADR SUPERASYNC synchronization mode, the failover may come at the cost of losing in-flight
transactions during this kind of widespread disaster. The takeover operation may take longer depending on the
amount of log data to be replayed from the buffer, as well as from disk if log spooling is used.
Once the old primary, host saplxvm07, and the old principal standby, host saplxvm08, are brought back
up, HADR is not active on those hosts:
saplxvm07:db2aha 102> db2pd -db aha -hadr
Database AHA not activated on database member 0 or this database name cannot be found
in the local database directory.
Option -hadr requires -db <database> or -alldbs option and active database.
The following steps can be performed to bring the principal standby and the old primary back into the
HADR cluster.
1. At this point, the old primary on host saplxvm07 cannot be activated because the auxiliary standby has
become the new primary through a forced takeover. The database on saplxvm07 still has the HADR role
PRIMARY, so any attempt to activate it results in an error because a primary is already running.
saplxvm07:db2aha 108> db2 activate db aha
SQL1776N The command cannot be issued on an HADR database. Reason code = "6".
saplxvm07:db2aha 109> db2 ? SQL1776N
6
2. The principal standby on host saplxvm08 can be activated since it is still a standby:
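The activation command itself is not shown above; it would look roughly like this on saplxvm08:

```shell
# Sketch: activate the principal standby so it reconnects to the new
# primary on saplxvm09. Run as user db2aha on saplxvm08.
db2 activate db aha
```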
The changes are reflected in the new primary, on host saplxvm09, as shown in the db2pd output below:
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm09
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm07
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 13:36:31.238696 (1406136991)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 239
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 17
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000049
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_REPLAY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 07/23/2014 13:41:54.000000 (1406137314)
READS_ON_STANDBY_ENABLED = N
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
LOG_STREAM_ID = 0
HADR_STATE = REMOTE_CATCHUP
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm09
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 13:19:45.058717 (1406135985)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 37
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 3
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000049
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_REPLAY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = N
At the moment, host saplxvm09 is the primary, host saplxvm07 is the principal standby, and host
saplxvm08 is the auxiliary standby. The cluster and automatic failover are still not active. The following
steps can be performed to return to the original setup with the primary on host saplxvm07, the principal
standby on host saplxvm08, and the auxiliary standby on host saplxvm09.
1. The SAP central instance and application servers must be stopped to make changes to the SAP profile
variables.
2. The cluster configuration must be deleted using option 4 - Delete Database Cluster of
sapdb2cluster.sh to avoid interference (SQL1770N).
3. The TAKEOVER HADR command is executed on the standby host saplxvm07.
5. The parameter Hostname in the CLI driver file needs to be updated to point to the virtual host
saplxvmsap instead of saplxvm09:
6. Now that the original HADR configuration has been restored, option 2 - Create Database Cluster of
sapdb2cluster.sh can be used to create the cluster and enable automatic failover.
Note: The cluster configuration script generates the cluster_config.xml file in the /tmp directory and executes the
db2haicu -f /tmp/cluster_config.xml command on the principal standby and the primary hosts. Error
messages are logged in the sapdb2cluster.log, /tmp/cluster_config.log, and db2diag.log files.
For errors related to the configuration, it is recommended to check the /tmp/cluster_config.xml file format
and values. More details can be found in the IBM Knowledge Center at
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0052800.html?lang=en
7. Now the SAP central instance and application servers can be started.
If the principal standby is down, it is simply out of the HADR cluster, and applications are not affected. Once
the standby is activated again, it rejoins the cluster.
Example:
saplxvm08:db2aha 68> db2pd -db aha -hadr
Database AHA not activated on database member 0 or this database name cannot be found
in the local database directory.
Option -hadr requires -db <database> or -alldbs option and active database.
HADR_ROLE = STANDBY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 0
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 15:18:49.837590 (1406143129)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 0
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 0
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 1.446530
LOG_HADR_WAIT_ACCUMULATED(seconds) = 230.194
LOG_HADR_WAIT_COUNT = 1093
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904
STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)
STANDBY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)
STANDBY_REPLAY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 07/23/2014 15:16:10.000000 (1406142970)
READS_ON_STANDBY_ENABLED = N
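The db2pd -hadr output shown above is a flat list of KEY = VALUE pairs, with one section per standby starting at each HADR_ROLE line. For monitoring scripts, a small parser can turn this into dictionaries. The following Python sketch is a hypothetical helper, not part of DB2; it assumes the output format shown in the examples in this paper:

```python
def parse_hadr_output(text):
    """Split db2pd -hadr output into one dict per standby section.

    A new section starts at each HADR_ROLE line; values keep their raw
    string form, so the caller decides how to convert them.
    """
    sections = []
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip banners and blank lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "HADR_ROLE":
            sections.append({})  # start of a new standby section
        if sections:
            sections[-1][key] = value
    return sections


# Abbreviated sample in the same format as the output above.
sample = """\
HADR_ROLE = PRIMARY
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
HADR_STATE = PEER
HADR_ROLE = PRIMARY
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
HADR_STATE = REMOTE_CATCHUP
"""

for s in parse_hadr_output(sample):
    print(s["STANDBY_ID"], s["HADR_SYNCMODE"], s["HADR_STATE"])
```

Such a helper makes it straightforward to alert, for example, whenever the principal standby leaves the PEER state.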
Note: Once HADR has been set up properly with multiple standbys, it is recommended to exercise different failover
scenarios and develop documented procedures to follow during an actual disaster.
8 Miscellaneous troubleshooting in an SA MP
environment
This section introduces some useful commands for operations within an SA MP environment.
Note: In this case, the HADR_CONNECT_STATUS_TIME shows the congestion start time.
While in a CONGESTED state and with a synchronization mode of SYNC, NEARSYNC, or even ASYNC, transactions
on the primary are stopped. With SUPERASYNC, transactions continue while the cluster is in a CONGESTED
state.
Note: To solve this issue, it is recommended to increase DB2_HADR_BUF_SIZE on the standby (as well as on the
primary in case it becomes the standby after a takeover). By default, DB2_HADR_BUF_SIZE is twice the size of
the primary's LOGBUFSZ, which is also the minimum value.
As mentioned earlier, HADR_SPOOL_LIMIT can be used to avoid HADR congestion. HADR_SPOOL_LIMIT allows
the standby to spool logs to disk if the buffer is full. This means that the primary can continue processing transactions
without having to wait for the standby to flush the logs from the buffer. This is especially effective if congestion
occurs during peak times.
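The interplay between the receive buffer and the spool described above can be sketched as a simplified model (an illustration of the behavior, not actual DB2 internals): incoming log pages fill the standby's receive buffer first, overflow spills to the spool on disk up to HADR_SPOOL_LIMIT, and only when both are full does the primary stall, i.e., congestion occurs.

```python
def receive_log_pages(pages, buf_free, spool_free):
    """Simplified model of standby log intake with spooling.

    Returns (buffered, spooled, congested): pages go to the in-memory
    receive buffer first, overflow goes to the spool on disk, and any
    remainder means the primary must wait (congestion).
    """
    buffered = min(pages, buf_free)
    spooled = min(pages - buffered, spool_free)
    congested = pages - buffered - spooled > 0
    return buffered, spooled, congested


# Buffer of 2048 pages and spool limit of 1000 pages, as in the example below.
print(receive_log_pages(500, 2048, 1000))   # fits entirely in the buffer
print(receive_log_pages(2500, 2048, 1000))  # overflow spills to the spool
print(receive_log_pages(4000, 2048, 1000))  # buffer and spool full: congestion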
Example:
saplxvm07:db2aha 61> db2 get db cfg for aha | grep -i LOGBUFSZ
Log buffer size (4KB) (LOGBUFSZ) = 1024
saplxvm07:db2aha 63> db2 get db cfg for aha | grep -i HADR_SPOOL_LIMIT
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
Note: DB2 HADR-related registry variables can be checked using the db2set command. Unless the registry variables
are set to a user-defined value, the default values are used for each variable.
Example:
saplxvm07:db2aha 58> db2set -lr | grep -i hadr
DB2_HADR_BUF_SIZE
DB2_HADR_NO_IP_CHECK
DB2_HADR_PEER_WAIT_LIMIT
DB2_HADR_SOSNDBUF
DB2_HADR_SORCVBUF
DB2_HADR_ROS
saplxvm07:db2aha 50> db2 get db cfg for aha | grep LOGBUFSZ
Log buffer size (4KB) (LOGBUFSZ) = 1024
saplxvm07:db2aha 51> db2set DB2_HADR_BUF_SIZE=3072
saplxvm07:db2aha 52> db2set | grep HADR
DB2_HADR_BUF_SIZE=3072
More on the DB2 HADR log shipping method can be found under the following link:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20log%20shipping
For DB2 Version 10.5 and higher, db2fodc also provides an option to collect congestion-related traces
automatically. The automatic congestion trace can be turned on and off using the following commands:
Example:
saplxvm07:db2aha 54> db2fodc -hadr -db AHA -detect
"db2fodc": Starting detection ...
db2fodc:
Hostname: saplxvm07 HADR congestion detect iteration: 1
saplxvm07:db2aha 50> db2fodc -detect off
"db2fodc": Stopping all FODC detections. Note that it can take up to 60
seconds to stop all detections.
More on the DB2 HADR automatic congestion detection tool can be found at
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/r0060632.html?lang=en
Example:
saplxvm08:db2aha 57> db2haicu -f /tmp/cluster_config.xml
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).
You can find detailed diagnostic information in the DB2 server diagnostic log file
called db2diag.log. Also, you can use the utility called db2pd to query the status of
the cluster domains you create.
For more information about configuring your clustered environment using db2haicu, see
the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in
the DB2 Information Center.
db2haicu determined the current DB2 database manager instance is 'db2aha'. The
cluster configuration that follows will apply to this instance.
db2haicu is collecting information on your current setup. This step may take some
time as db2haicu will need to activate all databases for the instance to discover all
paths ...
Creating domain 'sap_ahadb2' in the cluster ...
Creating domain 'sap_ahadb2' in the cluster was successful.
Configuring quorum device for domain 'sap_ahadb2' ...
Configuring quorum device for domain 'sap_ahadb2' was successful.
Adding network interface card 'eth0' on cluster node 'saplxvm07' to the network
'db2network' ...
Adding network interface card 'eth0' on cluster node 'saplxvm07' to the network
'db2network' was successful.
Adding network interface card 'eth0' on cluster node 'saplxvm08' to the network
'db2network' ...
Adding network interface card 'eth0' on cluster node 'saplxvm08' to the network
'db2network' was successful.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.
HADR database 'AHA' has been determined to be valid for high availability. However,
the database cannot be added to the cluster from this node because db2haicu detected
this node is the standby for HADR database 'AHA'. Run db2haicu on the primary for
HADR database 'AHA' to configure the database for automated failover.
All cluster configurations have been completed successfully. db2haicu exiting ...
As instructed in the output above, db2haicu is then run on the primary host to complete the configuration:
You can find detailed diagnostic information in the DB2 server diagnostic log file
called db2diag.log. Also, you can use the utility called db2pd to query the status of
the cluster domains you create.
For more information about configuring your clustered environment using db2haicu, see
the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in
the DB2 Information Center.
db2haicu determined the current DB2 database manager instance is 'db2aha'. The
cluster configuration that follows will apply to this instance.
db2haicu is collecting information on your current setup. This step may take some
time as db2haicu will need to activate all databases for the instance to discover all
paths ...
Configuring quorum device for domain 'sap_ahadb2' ...
Configuring quorum device for domain 'sap_ahadb2' was successful.
Network adapter 'eth0' on node 'saplxvm07' is already defined in network 'db2network'
and cannot be added to another network until it is removed from its current network.
Network adapter 'eth0' on node 'saplxvm08' is already defined in network 'db2network'
and cannot be added to another network until it is removed from its current network.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.
Adding HADR database 'AHA' to the domain ...
Adding HADR database 'AHA' to the domain was successful.
All cluster configurations have been completed successfully. db2haicu exiting ...
8.3 SA MP cluster resource group
The rgreq command can be used to start, stop, cancel, lock, unlock, or move an SA MP resource group.
Example:
The following command is used to unlock the resource group db2_db2aha_db2aha_AHA-rg:
Note: Because of APAR IC98315: VIRTUAL IP RESOURCE (IBM.SERVICEIP) CANNOT BE FOUND, one of the
resource groups in the DB2 HADR cluster configuration may remain locked after a graceful cluster switch in DB2
10.5 GA. The issue has been resolved in DB2 10.5 FP1.
saplxvm07:~ # lssam T
Traces can also be collected for a particular resource manager (RM) using the lssrc tool.
lssrc -ls IBM.RecoveryRM
lssrc -ls IBM.GblResRM
lssrc -ls IBM.StorageRM
To collect a trace, first find out where the trace files are located using the following command:
saplxvm07:~ # lssrc -ls IBM.RecoveryRM | grep trace_summary
/var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary -> spooling not enabled
saplxvm07:~ # lssrc -ls IBM.GblResRM | grep trace_summary
/var/ct/sap_ahadb2/log/mc/IBM.GblResRM/trace_summary -> spooling not enabled
The following commands can be used to format the traces into more readable text and store them in the
specified location /tmp:
saplxvm07:~ # rpttr -odtic /var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary >
/tmp/RecoveryRM_trace.out
saplxvm07:~ # rpttr -odtic /var/ct/sap_ahadb2/log/mc/IBM.GblResRM/trace_summary >
/tmp/GblResRM_trace.out
The samlog command is a handy tool that can be used to collect, format, merge, and display SA MP-related
logs.
Example:
saplxvm07:~ # samlog -t 15m | more
samlog called at 2014-09-25 15:30:35 on saplxvm07 with options
System time offset between local host and saplxvm08 is +5.29 seconds. You may adjust
system times in cluster.
saplxvm07 0.00 IBM.RecoveryRM trace_summary, IBM.GblResRM trace_summary
saplxvm08 +5.29 IBM.RecoveryRM trace_summary, IBM.GblResRM trace_summary
-------------------------------------------------------------------------
A list of IBM Tivoli System Automation command references can be found using the following link:
http://www-
01.ibm.com/support/knowledgecenter/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/samprgcharmcmds.html?lang
=en
8.5 HADR simulator
The DB2 HADR simulator can help plan, measure, and diagnose an HADR environment quickly and
efficiently. The tool can be downloaded for free from the IBM developerWorks wiki page:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator?section=Introduction
The wiki page also provides a detailed description of how to use the simulator.
The TAKEOVER HADR command is issued on the standby with the PEER WINDOW ONLY option and the
primary is not brought down before the peer window expires.
After a forced takeover, the HADR-related configuration parameters (hadr_remote_host,
hadr_remote_inst, and hadr_remote_svc) are automatically updated on the new primary and
its standbys, including the old primary. If the primary is not shut down before a forced takeover from
the standby, a split-brain condition might result, because the automatic reconfiguration does not take
place until the old primary is shut down and restarted as a standby.
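A monitoring script can check whether the peer window is still open before issuing a takeover by comparing the epoch timestamp that db2pd prints in parentheses after PEER_WINDOW_END against the current time. The following Python helper is a sketch of that check (a hypothetical helper; the field format is taken from the sample db2pd output in this paper):

```python
import re
import time

def peer_window_open(peer_window_end_field, now=None):
    """Return True if the peer window has not yet expired.

    `peer_window_end_field` is the raw value db2pd prints for
    PEER_WINDOW_END, e.g. "07/23/2014 13:41:54.000000 (1406137314)";
    the epoch seconds in parentheses are compared against `now`.
    """
    match = re.search(r"\((\d+)\)", peer_window_end_field)
    if match is None:
        return False  # NULL or unexpected format: treat the window as closed
    end_epoch = int(match.group(1))
    if now is None:
        now = time.time()
    return now < end_epoch


field = "07/23/2014 13:41:54.000000 (1406137314)"
print(peer_window_open(field, now=1406137000))  # still inside the peer window
print(peer_window_open(field, now=1406138000))  # peer window already expired
```

Treating a NULL or malformed field as a closed window errs on the safe side: the script refuses to assume the peer window is open.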
To avoid a split-brain situation during a forced takeover, the standby sends a disabling message, also called a
poison pill, to the primary. The primary is shut down and cannot be reactivated unless the poison pill is
cleared by a START HADR command. More information on proper takeover scenarios in a multiple standby
HADR configuration can be found at the following link:
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0059999.html?lang=en
Note: Once a split-brain condition is encountered, HADR must be configured from scratch using a database backup from
the host that has the most up-to-date logs. In such a situation, it is recommended to contact SAP support.
9 Conclusion
The DB2 HADR feature delivers a complete Disaster Recovery (DR), High Availability (HA), and Continuous
Availability solution, providing customers with greater data protection with minimal performance impact.
HADR also comes with a variety of configuration options to satisfy different business needs.
For example, SYNC mode can be used for guaranteed database log shipping, as opposed to SUPERASYNC
mode, which has virtually no performance impact because the primary does not wait for an acknowledgment after
logs are shipped. Log spooling helps achieve higher performance, a reduced chance of congestion, and greater
data protection.
Automatic failover using SA MP, along with SAP's Cluster Setup Tool and Graceful Maintenance Tool, makes it
easy for customers to monitor and maintain the HADR cluster.
With HADR support for IBM DB2 BLU Acceleration, DB2 HADR can now be used in SAP BW environments
with DB2 column-organized tables, providing essential high performance as well as crucial DR and HA
capabilities. Moreover, DB2 HADR and BLU Acceleration are provided as DB2 features and can be enabled
out of the box.
With improvements to the DB2 LOAD command, customers can now take advantage of the faster LOAD
operation without compromising data replication.
Under ideal conditions, the tools and processes described in this paper can be used to implement an SAP
Business Suite system that is continuously available, protected from widespread disasters, and able to
confine database maintenance to micro-outages, effectively achieving zero downtime.
10 Related Content
Note 1555903 - DB6: Supported DB2 Database Features
Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR)
Note 1746101 - DB6: High Availability with SAP on DB2 using SA MP
Note 1443426 - DB6: Graceful Cluster Switch
Note 960843 - DB6: Cluster Setup Tool
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR
Copyright
2014 SAP SE or an SAP SE affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any
form or for any purpose without the express permission of SAP SE.
The information contained herein may be changed without prior notice.
Some software products marketed by SAP SE and its distributors contain proprietary software components
of other software vendors. National product specifications may vary.
These materials are provided by SAP SE and its affiliated companies ("SAP Group") for informational
purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for
errors or omissions with respect to the materials. The only warranties for SAP Group products and
services are those that are set forth in the express warranty statements accompanying such products and
services, if any. Nothing herein should be construed as constituting an additional warranty.
SAP and other SAP products and services mentioned herein as well as their respective logos are
trademarks or registered trademarks of SAP SE in Germany and other countries.
Please see
http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark
for additional trademark information and notices.