
Enabling Database High Availability Using DB2 HADR and IBM Tivoli SA MP in an SAP Environment

Applies to:
SAP NetWeaver 7.0 or higher on DB2 10.1 or higher for Linux, UNIX, and Windows.

Summary
Multiple improvements have been made to the DB2 High Availability Disaster Recovery (HADR) feature. DB2
Version 10.1 for Linux, UNIX, and Windows supports multiple standbys, providing customers with true
database disaster recovery (DR) capability along with high availability (HA). IBM DB2 BLU Acceleration is
also supported in HADR environments as of DB2 10.5 Fix Pack 4. This paper describes the new features
related to HADR and provides examples showing how to enable single or multiple standbys in an SAP
environment to achieve DR in HA systems.

Authors: Ali Mehedi, Catherine Vu, Edgar Maniago


Company: IBM Canada Inc. and SAP Canada Inc.
Created on: 21 November 2014

Author Bio
Ali Mehedi is a Software Developer at IBM with years of experience in test tool development, DB2
administration, SAP NetWeaver installation, configuration, and maintenance, and DB2 for LUW and SAP
integration. He is a certified DBA for DB2 for LUW and is experienced with SAP BASIS in Windows, AIX,
and Linux environments.

Since joining SAP in 2005, Edgar Maniago, a Software Engineer, has been a member of the IBM SAP
Integration and Support Center located in the Toronto IBM Lab. He currently tests, develops, and integrates
new features of DB2 with SAP. Through his role in SAP Development Support and as a Customer Advocate
for IBM, Edgar assists SAP consultants and customers with activities such as troubleshooting and
performance optimization.

Catherine Vu is a member of the IBM SAP Integration and Support team, which plays a critical role in
certifying every DB2 Fix Pack and every new major DB2 release with SAP applications before their general
availability. In addition, she is responsible for providing development support to SAP customers running IBM
DB2 for Linux, UNIX, and Windows. Before joining the IBM and SAP Integration and Support team, Catherine
gained many years of experience in DB2 while working in the DB2 Development team and the Technical
Enablement team.

Table of Contents
1 Introduction ...................................................................................................................................................... 4
2 Planning ........................................................................................................................................................... 5
2.1 References ................................................................................................................................................ 5
2.2 Technology................................................................................................................................................ 5
2.2.1 IBM Tivoli System Automation for Multiplatforms (SA MP) ................................................................................. 5
2.2.2 HADR synchronization modes ............................................................................................................................ 5
2.2.3 Multiple standbys ................................................................................................................................................ 6
2.2.4 Log spooling ........................................................................................................................................................ 7
2.2.5 HADR replay delay.............................................................................................................................................. 7
2.2.6 Automatic failover: DB2 Automatic Client Reroute (ACR) vs. virtual IP address ................................................. 7
2.2.7 DB2 HADR for DB2 BLU Acceleration ................................................................................................................ 8
2.2.8 DB2 LOAD with COPY YES for BLU tables ........................................................................................................ 8
2.3 Hardware and operating system requirements ....................................................................................... 10
2.4 DB2 database requirements ................................................................................................................... 11
3 Preparation .................................................................................................................................................... 12
3.1 Configuration of the test system ............................................................................................................. 12
3.1.1 Hardware and operating system in the test systems ......................................................................................... 13
3.1.2 Required software downloads ........................................................................................................................... 13
3.2 Basic network setup ................................................................................................................................ 13
3.3 File system setup .................................................................................................................................... 13
3.4 Operating system users and groups ....................................................................................................... 14
4 Installing the standby ..................................................................................................................................... 15
4.1 Exporting the file systems ....................................................................................................................... 15
4.2 Turning on DB2 log archiving.................................................................................................................. 15
4.3 Taking a backup of the primary............................................................................................................... 16
4.4 Performing a homogeneous system copy using SWPM......................................................................... 16
4.5 Configuring ports ..................................................................................................................................... 21
4.6 Restoring the database from a backup ................................................................................................... 21
4.7 Configuring databases for HADR............................................................................................................ 22
4.8 Performing HADR checks ....................................................................................................................... 24
4.9 Starting HADR......................................................................................................................................... 24
4.10 Checking the HADR status using the db2pd tool.................................................................................. 25
5. Enabling automatic failover using SA MP..................................................................................................... 27
5.1 Installing the SA MP software and license .............................................................................................. 27
5.2 Setting up the HADR cluster ................................................................................................................... 28
5.2.1 Creating the cluster configuration file ................................................................................................................ 28
5.2.2 Creating the database cluster ........................................................................................................................... 30
5.2.3 Displaying the database cluster ........................................................................................................................ 31
5.2.4. Enabling the SAP system with virtual database host name and IP address .................................................... 34
5.3 HADR micro-outage feature test using the Graceful Maintenance Tool (GMT) ..................................... 34
5.3.1 GMT Configuration ............................................................................................................................................ 35
5.3.2 Micro-failover test .............................................................................................................................................. 36

5.3.4 Testing a disaster scenario ............................................................................................................................... 40
6. Installing the auxiliary standby database instance ....................................................................................... 42
6.1 Mounting file systems.............................................................................................................................. 42
6.2 Updating port configurations ................................................................................................................... 42
6.3 Performing a homogeneous system copy using SWPM......................................................................... 42
6.4 Configuring the HADR auxiliary standby database................................................................................. 42
7 Failover scenarios.......................................................................................................................................... 47
7.1 Failover scenario #1: The primary is down ............................................................................................. 47
7.2 Failover scenario #2: Both the primary and principal standby are down ................................................ 49
7.3 Failover scenario #3: The principal standby is down .............................................................................. 54
8 Miscellaneous troubleshooting in an SA MP environment ............................................................................ 56
8.1 HADR congestion.................................................................................................................................... 56
8.2 Manual creation and deletion of an SA MP cluster ................................................................................. 57
8.3 SA MP cluster resource group ................................................................................................................ 59
8.4 Collection of traces.................................................................................................................................. 59
8.5 HADR simulator ...................................................................................................................................... 60
8.6 Split-brain condition................................................................................................................................. 60
9 Conclusion ..................................................................................................................................................... 61
10 Related Content ........................................................................................................................................... 62
Copyright........................................................................................................................................................... 63

1 Introduction
First introduced in DB2 Version 8.2, the DB2 High Availability Disaster Recovery (HADR) database
replication feature provides protection against database outages and site failures. In an HADR environment,
the transaction logs from a source database, called the primary, are shipped via TCP/IP and replayed to a
target database, called the standby. If the primary is offline or is lost due to a disaster, the standby can be
made available as the new primary using a procedure called HADR failover.
HADR failover can be automated using IBM Tivoli System Automation for Multiplatforms (SA MP). This
allows applications, such as SAP, to continue with zero disruption to user activities under ideal conditions.
Starting with DB2 10.1, DB2 HADR supports multiple standby databases. This makes IBM DB2 capable of
providing a complete Disaster Recovery (DR), High Availability (HA) and Continuous Availability solution in a
single and easily manageable feature. The following figure illustrates a DB2 HADR cluster that contains
multiple standby databases, along with automatic failover implemented through SA MP:

Figure 1: DB2 HADR Topology with Multiple Standbys

Starting with DB2 10.5 Fix Pack 4, the DB2 HADR feature is also supported for databases containing
column-organized (BLU) tables. IBM DB2 with BLU Acceleration is optimized for the SAP environment. The
greater performance of DB2 BLU Acceleration combined with the improved HA and DR capabilities of DB2
HADR makes DB2 an ideal RDBMS for an SAP BW environment.
This document will cover the implementation of DB2 HADR by adding a principal and auxiliary standby to an
existing SAP NetWeaver (ABAP) system on AIX, Solaris SPARC or Linux. Furthermore, it describes the
implementation of automatic failover using SA MP as well as recovery from several failover scenarios.

2 Planning
DB2 HADR does not require a brand new SAP installation. The version-specific installation guide, for
example, for SAP NetWeaver 7.31 can be found at http://help.sap.com/nw731/ .
In general, SAP NetWeaver installation guides can be found at http://service.sap.com/instguidesnw -> <your
SAP NetWeaver main release> -> Installation -> Installation - SAP NetWeaver Systems.

2.1 References
The following documents should be reviewed before reading this paper:

IBM DB2 High availability
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0006354.html

SAP Note 1555903 - DB6: Supported DB2 Database Features
http://service.sap.com/sap/support/notes/1555903

SAP Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR)
http://service.sap.com/sap/support/notes/1612105

SAP Note 960843 - DB6: High Availability for DB2 using SA MP
http://service.sap.com/sap/support/notes/960843

SAP Note 1851832 - DB6: DB2 10.5 Standard Parameter Settings
http://service.sap.com/sap/support/notes/1851832

2.2 Technology
The DB2 HADR feature comes with many configuration and performance tuning options for various business
needs.

2.2.1 IBM Tivoli System Automation for Multiplatforms (SA MP)


IBM Tivoli SA MP is a high-availability cluster solution that provides several monitoring mechanisms to detect
system failures and a set of rules to initiate the correct action without any user intervention. The set of rules
is called a policy, which describes the relationships between applications and resources. This provides SA
MP with extensive up-to-date information about the system landscape so that it can restart the resource on
the current node or move the database instance to another cluster node.
Since DB2 V9.1, SAP has been partnering with IBM to provide SAP customers with a free two-node license
of SA MP for the IBM DB2 database server. SAP has also published the installation guide IBM DB2 High
Availability Solution: IBM Tivoli System Automation for Multiplatforms on how to set up a database cluster
using SA MP. You can find the latest version of this guide on SAP Service Marketplace at
http://service.sap.com/instguidesnw.

2.2.2 HADR synchronization modes


To ensure that logs are shipped to the standby, the primary must wait for an acknowledgement (ACK) from
the standby before it can commit. Depending on the network bandwidth and the distance between the
primary server and the standby server, this can have a significant performance impact on the workload. On
the other hand, if the primary does not wait for an ACK, the standby will be out of sync with the primary,
which introduces a potential risk of data loss in case of an outage.
The HADR synchronization mode, defined by the database configuration parameter HADR_SYNCMODE,
controls the degree of protection against transaction loss during log shipping. The following table contains
the available synchronization options:

SYNC mode: SYNC
Standby ACK for log receive: Yes
Definition: Transactions are committed on the primary when the database logs are written to disk on the
primary and the standby and when the primary has received an ACK from the standby.
Data protection: Logs are guaranteed to be stored in both the primary and the standby. Therefore, this mode
provides the greatest protection against transaction loss.

SYNC mode: NEARSYNC
Standby ACK for log receive: Yes
Definition: Transactions are committed on the primary when the database logs are written to disk on the
primary and received in memory on the standby and when the primary has received an ACK from the standby.
Data protection: Possibility of data loss if both the primary and the standby failed at the same time. The
transactions that are in memory on the standby are lost.

SYNC mode: ASYNC
Standby ACK for log receive: No
Definition: Transactions are committed on the primary when the database logs are written to disk on the
primary and sent to TCP/IP successfully.
Data protection: Possibility of data loss if both the transaction logs on the primary and the commit record(s)
in-flight to the standby are lost.

SYNC mode: SUPERASYNC
Standby ACK for log receive: No
Definition: Transactions are committed on the primary when the database logs are written to disk on the
primary.
Data protection: Possibility of data loss in the standby if a failover operation is required while there are
missing log records.

Table 1: DB2 HADR Synchronization Modes

Note: SAP recommends using NEARSYNC because this mode provides adequate data protection without a significant
performance impact.
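
The synchronization mode that is currently configured can be verified with the db2 get db cfg command. The
following sketch assumes the sample database name AHA that is used later in this paper:

Example:
db2 get db cfg for AHA | grep HADR_SYNCMODE
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC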

2.2.3 Multiple standbys


Starting with DB2 10.1, DB2 HADR supports multiple standbys. The first standby database is called the
principal standby. Any additional standby database is called an auxiliary standby. The transaction logs from
the primary are shipped via TCP/IP and replayed on all standbys.

Note: A maximum of three standbys is allowed.

Depending on which synchronization mode is chosen, HADR can have a performance impact on the primary
due to network latency during HADR log shipping. Better performance can be achieved by decreasing the
distance between the primary and the standby servers, and by having them connected with a high-performance
LAN backbone. However, this introduces the risk of losing both the primary and the standby during a
wide-scale natural disaster such as a flood or a fire.
Increasing the distance, on the other hand, increases the network latency, which causes performance
degradation and a longer failover time. Therefore, with a single standby, there is a tradeoff between database
performance and the degree of DR capability. The HADR multiple standby feature solves this problem: the
principal standby is used to achieve HA during outages, and the auxiliary standby is to be used for DR
purposes only.

Note: It is recommended to have the primary and the principal standby in the same building with a high-performance LAN
connection for faster log shipping and quicker failover during micro-outages. The auxiliary standby is only to be
used for DR purposes and should be located elsewhere, preferably in a different city or country. There is no
restriction on the maximum distance between the primary and the standby servers. The auxiliary standby is forced
to use the SUPERASYNC synchronization mode, which has no synchronous dependency on replication to the
standby. Therefore, there is minimal performance impact for having multiple standbys.

2.2.4 Log spooling


Logs that are sent to the standby are first stored in a memory area called the HADR log receive buffer, whose
size is controlled by the DB2 registry variable DB2_HADR_BUF_SIZE. If the standby is slow in replaying the
received logs, this buffer might become full, causing the primary to be blocked because it is waiting for an
ACK from the standby.
The database configuration parameter HADR_SPOOL_LIMIT defines the maximum amount of log data that can
be written to disk on the standby if it falls behind in log replay. This can be used to improve HADR
performance while providing better data protection. If the standby falls behind while replaying the logs and
HADR_SPOOL_LIMIT is defined, up to that amount of log data is written to disk on the standby without the
primary having to wait for an ACK. The standby can then read these logs and replay them when it is able to
do so. This feature is useful for dealing with sudden spikes of workload on the primary during peak business
hours. However, setting HADR_SPOOL_LIMIT to a high value or to unlimited (-1) can lead to a longer takeover
time because the standby has to read and apply all the logs that have not yet been applied.
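
As an illustration, the following command sets the spool limit to 100,000 4-KB pages (roughly 400 MB of
spooled log data) for the sample database AHA used later in this paper; a value of -1 means unlimited
spooling:

Example:
db2 update db cfg for AHA using HADR_SPOOL_LIMIT 100000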

2.2.5 HADR replay delay


Another way to add extra protection against accidental human errors is the HADR_REPLAY_DELAY database
configuration parameter. If it is set, the standby waits for the duration defined by HADR_REPLAY_DELAY
before replaying the logs it has received. This is useful if a user accidentally deletes data or a database
object from the primary: the deleted data or object can still be recovered from the standby as long as the
change has not yet been replayed there, that is, within the replay delay time.

Note: SAP recommends using HADR_SPOOL_LIMIT along with HADR_REPLAY_DELAY to accommodate the logs
accumulated during the replay delay period.
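
For illustration, the following command delays log replay on a standby of the sample database AHA by one
hour. Note that HADR_REPLAY_DELAY can only be set on a standby that uses the SUPERASYNC
synchronization mode:

Example:
db2 update db cfg for AHA using HADR_REPLAY_DELAY 3600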

2.2.6 Automatic failover: DB2 Automatic Client Reroute (ACR) vs. virtual IP address
If there is a change in the HADR role, that is, the standby has to take over as the primary during an outage, all
clients can reconnect to the new primary automatically by using one of the following options:
1. Using a virtual IP address (VIP): The VIP is bound to the primary server's network interfaces. After a
takeover, the virtual IP is bound to the network interfaces of the standby server (the new primary
server).

2. Using the DB2 Automatic Client Reroute (ACR) feature: The client is configured to know the two
database servers. If the database client cannot connect to the configured primary server, the
database client tries to connect to the configured standby (alternate) server.
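
For option 2, the server-side part of the ACR configuration is a single command on the primary that registers
the standby as the alternate server. The following sketch assumes the host names and the SVCENAME port
(5912) that are used later in this paper:

Example:
db2 update alternate server for database AHA using hostname saplxvm08 port 5912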

Note: SAP recommends using the virtual IP address option with SA MP. More details can be found in SAP Note
1568539, DB6: HADR - Virtual IP or Automatic Client Reroute
(http://service.sap.com/sap/support/notes/1568539).
Automatic failover is not supported between the primary and an auxiliary standby. Auxiliary standbys are to be
used for DR purposes only. A manual takeover must be issued on one of the auxiliary standbys to switch it to the
primary.

2.2.7 DB2 HADR for DB2 BLU Acceleration
As of DB2 10.5 Fix Pack 4, the DB2 HADR feature can be used with databases containing BLU (column-
organized) tables. Except for Reads on Standby (RoS), all HADR features, including multiple standbys, are
supported for BLU tables without any additional requirements or settings.

2.2.8 DB2 LOAD with COPY YES for BLU tables


DB2 10.5 Fix Pack 4 also supports the DB2 LOAD command with the COPY YES option for BLU tables.
Therefore, the LOAD command can be used in an HADR environment with the COPY YES option and changes
will be propagated to the standby databases.
Example:
db2 LOAD FROM <filename>.ixf OF IXF SAVECOUNT 20000 INSERT INTO <schema
name>."<table name>" COPY YES TO <shared directory>

The COPY YES option creates a mini backup image of the loaded data in the specified shared directory, and
the standby reads from this backup to replay the changes. Therefore, the shared directory must be
accessible by the standby database. In case of a failover or after a database restore, this backup can be
used to roll forward to the end of logs, which also includes the data that was loaded on the primary using the
LOAD command.

Note: The LOAD command with the COPY NO option is not supported in HADR environments. Only LOAD with the COPY
YES option is supported in an HADR environment. If data is loaded into tables on the HADR primary without the
COPY YES option, the changes will not be propagated to the standby. Moreover, the table will be marked as
unusable on all standbys because it becomes inconsistent with the table on the primary. The
DB2_LOAD_COPY_NO_OVERRIDE registry variable can be set on the primary database to enable a load operation
with the COPY NO option to be converted to a load operation with the COPY YES option. More information can be
found at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0011761.html?lang=no.
The HADR state is not affected by the LOAD operation. The db2 list utilities command can be used to
display the load progress on the primary.

Example:
db2 list utilities show detail

ID = 5
Type = LOAD
Database Name = D01
Member Number = 0
Description = [LOADID: 27862.2014-10-21-16.40.39.069004.0 (4;4)]
[*LOCAL.db2d01.141021204038] OFFLINE LOAD Unknown file type AUTOMATIC INDEXING
REPLACE COPY YES SAPD01./BIC/FICBLU01-ALI
Start Time = 10/21/2014 16:40:39.082853
State = Executing
Invocation Type = User
Progress Monitoring:
Phase Number = 1
Description = SETUP
Total Work = 0 bytes
Completed Work = 0 bytes
Start Time = 10/21/2014 16:40:39.082859

Phase Number [Current] = 2


Description = ANALYZE
Total Work = 41282171 rows
Completed Work = 3949572 rows
Start Time = 10/21/2014 16:40:39.399368

Phase Number = 3
Description = LOAD
Total Work = 0 rows
Completed Work = 0 rows
Start Time = Not Started

Phase Number = 4
Description = BUILD
Total Work = 2 indexes
Completed Work = 0 indexes
Start Time = Not Started

The standby is updated after the load is complete on the primary. The db2diag.log file on the standby shows
the following messages, which indicate the start and completion times of the load copy restore for the table
/BIC/FICBLU01-ALI that was loaded on the primary.
Example:
2014-10-15-16.48.58.682164-240 I20140A498 LEVEL: Warning
PID : 4915624 TID : 27765 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01
APPHDL : 0-175 APPID: *LOCAL.DB2.141015193921
HOSTNAME: sapaix11
EDUID : 27765 EDUNAME: db2agent (D01) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:548
DATA #1 : String, 74 bytes
Starting to restore a load copy.
SAPD01./BIC/FICBLU01-ALI.20141015160659

2014-10-15-16.52.59.701675-240 I26278A449 LEVEL: Warning


PID : 4915624 TID : 27765 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01
APPHDL : 0-175 APPID: *LOCAL.DB2.141015193921
HOSTNAME: sapaix11
EDUID : 27765 EDUNAME: db2agent (D01) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:1142
MESSAGE : Load copy restore completed successfully.

Note: As long as the standby has access to the shared directory where the mini backup from LOAD with COPY YES is
located, LOAD will complete even if a failover happens right after data is loaded into the primary.

If data is loaded into a table in the primary using the LOAD command without COPY YES option or with COPY
NO option, the table will be marked as unavailable in the standby.
Example:
SELECT COUNT(*) AS COUNT FROM SAPD01."/BIC/FICBLU01-ALI"
COUNT
-----------
SQL1477N For table "SAPD01./BIC/FICBLU01-ALI" an object "130" in table space
"20" cannot be accessed. SQLSTATE=55019

db2diag.log:

2014-09-03-21.18.24.500558-240 E1950A551 LEVEL: Warning


PID : 14025052 TID : 18507 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01

APPHDL : 0-8 APPID: *LOCAL.DB2.140904011422
HOSTNAME: sapaix11
EDUID : 18507 EDUNAME: db2redow (D01) 0
FUNCTION: DB2 UDB, data management, sqldMarkObjInErr, probe:1
MESSAGE : ADM5571W The "DATA" object with ID "130" in table space "20" for
table "TBSPACEID=20.TABLEID=130" is being marked as unavailable.

If a failover happens before the data load is complete on the primary, the tablespace containing the table will
be marked as Restore Pending on the standby when it becomes the new primary. The table will be marked as
Load Pending on the old primary, which is now the standby.
db2 list tablespaces show detail
Tablespace ID = 3
Name = D01#FACTI
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0100
Detailed explanation:
Restore pending

Tablespace ID = 4
Name = D01#FACTD
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0100
Detailed explanation:
Restore pending

To resolve this issue, the load operation must be terminated using the TERMINATE option from the database
host where it was started.
Example:
db2 LOAD FROM <filename>.ixf OF IXF TERMINATE INSERT INTO <schema name>."<table
name>" COPY YES TO <shared directory>

Note: If the LOAD command with the COPY YES option is used, this situation can be avoided in an HADR environment.
To recover from a Restore Pending state, the tablespace must be restored from a tablespace level backup.
Tablespace level backup is not supported for HADR. Therefore, to take a tablespace level backup, HADR must be
disabled. Once the tablespace is restored, the standby must be refreshed with a new copy (using a full database
backup) of the primary.

2.3 Hardware and operating system requirements


1. The operating system on the primary and the standby must have the same version, including
patches.

2. A TCP/IP interface must be available between the HADR host machines, and a high-speed, high-
capacity network is recommended. The network bandwidth required for HADR log shipping between
the primary and the principal standby depends on the amount of log data generated on the primary per
second during peak time. The minimum bandwidth can be easily calculated using the following
paper: http://scn.sap.com/docs/DOC-56040

Note: The primary ships logs to the principal standby and all the active auxiliary standbys simultaneously.
Therefore, the required network bandwidth is multiplied by the number of active standbys. The HADR simulator
tool described in the following link can also be used to determine the maximum network shipping rate between
systems:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator

2.4 DB2 database requirements
1. The versions of the database systems for the primary and standby must be identical; for example,
both must be either 10.1 or 10.5.

2. During a rolling fix pack update, the modification level (for example, the fix pack level) of the
database system for the standby can be temporarily higher than that of the primary in order to test
the new level. Both databases should be on the same DB2 version and fix pack level for normal
operations.
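
To verify that both servers are on the same level, the db2level command can be run on each host. The output
below is abridged and illustrative only; the exact code release and informational tokens depend on the
installed Fix Pack:

Example:
db2level
DB21085I This instance or install (instance name, where applicable: "db2aha") uses "64"
bits and DB2 code release "SQL10054" ... Informational tokens are "DB2 v10.5.0.4", ...,
and Fix Pack "4".
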
For more information about software and hardware requirements, see the following links:

System requirements for DB2 high availability disaster recovery (HADR)
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0011759.html

Installation and storage requirements for high availability disaster recovery (HADR)
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0012543.html

System requirements for IBM DB2 for Linux, UNIX, and Windows
http://www-01.ibm.com/support/docview.wss?uid=swg27038033

3 Preparation
For the purpose of this exercise, we will enable HADR for an existing SAP NetWeaver system with a
distributed installation.

3.1 Configuration of the test system


The test system has the following setup where the SAP Central Services Instance (ASCS) and the SAP
Primary Application Server (Central Instance) reside on host saplxvm06. The database instance resides on
host saplxvm07.
The goal is to enable the HADR feature by adding a standby database on host saplxvm08 (IP address
9.26.166.200). Furthermore, HADR will be configured for multiple standby databases by adding an
auxiliary standby on host saplxvm09 (IP address 9.26.166.201). By the end of this exercise the system
topology should look as shown in the following figure:

Figure 2: Topology of the HADR Test System

3.1.1 Hardware and operating system in the test systems
All database hosts are on separate hardware with identical configuration and on the same operating system
level.

Example:

For the test systems saplxvm07, saplxvm08, and saplxvm09, the OS and hardware configuration is
compared using /proc/meminfo, /proc/cpuinfo and /etc/SuSE-release files. Each system has 4
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz CPUs, 8 GB of RAM, and the following operating system:
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2

3.1.2 Required software downloads


1. DB2 10.5 for Linux, UNIX, and Windows
Download from SAP Service Marketplace at https://service.sap.com/swdc.
Refer to SAP Note 1851853 - DB6: Using DB2 10.5 with SAP Applications and SAP Note 1260217 -
DB6: Software Components Contained in DB2 License from SAP for supported new features. IBM
Tivoli SA MP for Linux is included in the DB2 10.5 for Linux, UNIX, and Windows image.

2. SAP NetWeaver 7.3 - Including Enhancement Package 1, Support Package Stack 09


Download from SAP Service Marketplace at https://service.sap.com/swdc. Find more information at
SAP Support Portal at http://help.sap.com/nw731.

3. SAP Software Provisioning Manager (SWPM) 1.0 SP04 or higher


Download from SAP Service Marketplace at http://service.sap.com/sltoolset -> Software Logistics
Toolset 1.0 -> Software Provisioning Manager. See SAP Note 1680045 - Release Note for Software
Provisioning Manager 1.0 for more information.

3.2 Basic network setup


Make sure that all hosts are able to communicate with each other via TCP/IP. Add the appropriate IP
addresses to the hostname mappings in the /etc/hosts file of each host.
Example:
Sample content of /etc/hosts file (on saplxvm06, saplxvm07, saplxvm08, and sapxlvm09):
saplxvm06:~ # cat /etc/hosts | grep -i saplxvm
9.26.166.198 saplxvm06.torolab.ibm.com saplxvm06
9.26.166.199 saplxvm07.torolab.ibm.com saplxvm07
9.26.166.200 saplxvm08.torolab.ibm.com saplxvm08
9.26.166.201 saplxvm09.torolab.ibm.com saplxvm09

Adding static IP addresses to hostname mappings in the hosts file removes the system's DNS servers as a
single point of failure. In case of a DNS failure, the clustered systems can still resolve the addresses of the
other machines via the /etc/hosts file. From each host, ping all other hosts to check communication.
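
A quick connectivity check, for example from saplxvm07 to the standby host, can be done with three ICMP
echo requests:

Example:
saplxvm07:~ # ping -c 3 saplxvm08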

3.3 File system setup


DB2 HADR does not require shared storage devices. To avoid a single point of failure and data loss, the
primary and all the standbys should have their own storage. It is recommended to have the same file system
structure and storage devices for the primary and all the standbys. This reduces the probability of the
standby falling behind during log replay or log spooling. It is also recommended to have the database log
directory and database tablespace containers in separate file systems for each database.

3.4 Operating system users and groups
All SAP and DB2 related user IDs and group IDs from the primary and the SAP application server must also
be available and free to use on the standby server.
Example:
The following command is used to collect group IDs and user IDs on the hosts saplxvm07 and saplxvm06.
The same IDs will be used on the standby and auxiliary standby servers.
id ahaadm
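
On the test system, the output of this command looks similar to the following (illustrative; the exact group
membership depends on the installation):

uid=301(ahaadm) gid=390(sapsys) groups=390(sapsys),1000(sapinst),402(dbahactl)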

Group Name     Group ID (GID)   Description
dbahaadm       401              SYSADM (system administrator) authority
dbahactl       402              SYSCTRL and SYSMON authority
dbahamnt       403              SYSMAINT and SYSMON authority
dbahamon       404              SYSMAINT and SYSMON authority
sapsys         390              SAP System Services group
sapinst        1000             Common group used by SWPM for all SAP system users
Table 2: Groups required for an SAP NetWeaver installation

User Name      User ID (UID)    Description
ahaadm         301              SAP system administrator to run the SAP Central Services Instance
db2aha         302              SAP database administrator
sapaha         303              SAP ABAP database connect user
sapahadb       304              SAP Java database connect user
sapadm         305              The user sapadm is used for the SAP Host Agent
daaadm         306              SAP Diagnostics Agent administrator
Table 3: Users required for an SAP NetWeaver installation

4 Installing the standby
To perform the steps in this section, Section 3 must have been completed.

4.1 Exporting the file systems


The directories /sapmnt/<SID>/exe, /sapmnt/<SID>/profile, and /sapmnt/<SID>/global from
the SAP Application Server must be mounted on the standby host.
Example:
On the host saplxvm06, the following lines must be added to the /etc/exports file:

/sapmnt/AHA/exe saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)

/sapmnt/AHA/global saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)

/sapmnt/AHA/profile saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check)
saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)

The following commands must be executed on host saplxvm06 to confirm the export and allow access:

exportfs -a
exportfs

The following lines can be added to the /etc/fstab file on the standby host saplxvm08 so that the
directories are automatically remounted after a system restart:
saplxvm06:/sapmnt/AHA/exe /sapmnt/AHA/exe nfs defaults 0 0
saplxvm06:/sapmnt/AHA/global /sapmnt/AHA/global nfs defaults 0 0
saplxvm06:/sapmnt/AHA/profile /sapmnt/AHA/profile nfs defaults 0 0

The following command can be used to mount all directories mentioned in /etc/fstab.
mount -a

4.2 Turning on DB2 log archiving


DB2 requires log archiving to be turned on for the HADR setup. See the following link to the IBM Knowledge
Center for options to turn on log archiving:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0051344.html

Example:
saplxvm07:db2aha > db2 update db cfg for AHA using LOGARCHMETH1
DISK:/db2/AHA/log_archive/

After enabling log archiving, a complete offline backup must be taken in order to take the database out of the
backup pending state. To reduce production downtime during the offline backup, the backup can be split into
multiple files. Using the COMPRESS option during the backup will increase the backup duration because it adds
compression time to the regular backup time.
Example:
saplxvm07:db2aha > db2 backup db aha to /db2/db2aha/backup, /db2/db2aha/backup,
/db2/db2aha/backup, /db2/db2aha/backup

4.3 Taking a backup of the primary
An online or offline backup of the primary database is required, and the backup image will be used to create
the standby.
Example:
saplxvm07:db2aha > db2 backup db aha online to
/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup

4.4 Performing a homogeneous system copy using SWPM


To create the standby, it is recommended to perform a homogeneous system copy (database copy method)
using the SAP Software Provisioning Manager (SWPM). A custom installation must be selected to be able to
manually enter the same user IDs and group IDs that are used on the primary server. SWPM also provides a
few DB2 HADR-specific installation options. Specifically, IBM Tivoli System Automation for Multiplatforms for
DB2 must be selected with HADR (High Availability Disaster Recovery) as the cluster type. You are also
given the option to select the HADR synchronization mode and the HADR local and remote service names
(or port numbers).

Note: The synchronization mode corresponds to the DB2 database configuration parameter HADR_SYNCMODE. The
HADR local and remote service names correspond to the database configuration parameters HADR_LOCAL_SVC
and HADR_REMOTE_SVC, respectively.

A homogeneous system copy creates all users, sets up the environment, installs the database software,
creates the instance on the standby, and then prompts the user to restore the database from a backup. At
this point, the backup taken in section 4.3 on the primary server must be restored on the standby server.

Note: To set up HADR, the standby must be in rollforward pending state. Therefore, SWPM is no longer required for the
HADR setup and must be exited after restoring the database.

Example:
The following screens show SWPM HADR-related settings:

Figure 3: Start screen of SWPM

Figure 4: SWPM screen for IBM Tivoli System Automation for Multiplatforms (SA MP) for High
Availability installation options

Note: The cluster configuration file is used to create a cluster for the automatic failover, which will be explained later in
this document. In figure 4, the Generate cluster configuration files checkbox is not selected because SWPM will be
exited after restoring the database and you will not reach the step to create the cluster configuration file. For a new
installation of the primary, if this option is selected, the cluster configuration file (cluster_config.xml) will be
generated in the directory /tmp.

Figure 5: DB2 High Availability Disaster Recovery options in SWPM

Note: The two port numbers will be assigned to the database configuration parameters HADR_LOCAL_SVC and
HADR_REMOTE_SVC. This can be changed later.

Figure 6: SWPM message window to restore database for homogeneous system copy

Note: As described earlier, at this stage, SWPM must be stopped by clicking the Cancel button.

Figure 7: Stop SWPM.

Note: As described earlier, SWPM is no longer needed for the HADR setup. Stop it by clicking the Stop button.

4.5 Configuring ports


The database connection ports in the /etc/services file must be the same as in the primary host
(saplxvm07).
Example:
sapdb2AHA 5912/tcp
AHA_HADR_1 5951/tcp # DB2 HADR log shipping
AHA_HADR_2 5952/tcp # DB2 HADR log shipping
sapmsAHA 3600/tcp # SAP System Message Server Port
DB2_db2aha 60006/tcp
DB2_db2aha_1 60007/tcp
DB2_db2aha_2 60008/tcp
DB2_db2aha_3 60009/tcp
DB2_db2aha_4 60010/tcp
DB2_db2aha_END 60011/tcp

Note: The port numbers configured for AHA_HADR_1 and AHA_HADR_2 are used for the primary (saplxvm07) and the
standby (saplxvm08) servers' HADR local and remote service name database configuration parameters
(HADR_LOCAL_SVC, HADR_REMOTE_SVC). The same port number can be used for both parameters.

4.6 Restoring the database from a backup


Before restoring the database, all required file systems must be created or mounted on the standby host
(saplxvm08), matching those of the primary host (saplxvm07).

Example:
saplxvm07:db2aha 51> cd /db2/AHA
saplxvm07:db2aha 52> ls -lrt
total 32
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata4
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata3
drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata2

drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata1
drwxr-xr-x 3 db2aha dbahaadm 4096 May 13 16:45 db2aha
drwxr-xr-x 3 db2aha dbahaadm 4096 May 13 16:46 log_dir
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 15:40 log_archive
drwxr-xr-x 27 db2aha dbahaadm 4096 Jul 4 14:00 db2dump

The following directories are created (or mounted) on the standby host with the same ownership and
permission as in the primary host above.
saplxvm08:db2aha 60> cd /db2/AHA
saplxvm08:db2aha 61> ls -lrt
total 32
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 db2aha
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata4
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata3
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata2
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata1
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:51 log_dir
drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:51 log_archive
drwxr-xr-x 4 db2aha dbahaadm 4096 Jun 13 14:22 db2dump

The backup taken in section 4.3 of this document is restored in the standby host using the following
command:
saplxvm08:db2aha 31> db2 restore db aha from
/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup
DB20000I The RESTORE DATABASE command completed successfully.

Note: DB2 HADR requires the standby to be in rollforward pending mode. Therefore, after restoring the database, it is not
necessary to execute the ROLLFORWARD DATABASE command.

Example:
saplxvm08:db2aha 23> db2 get db cfg for aha|grep -i Rollforward
Rollforward pending = DATABASE

Note: During the database instance installation on the standby (saplxvm08), the parameter DBHOST in SAP
DEFAULT.PFL was changed to the host name of the standby host (saplxvm08). You should change the value of
SAPDBHOST and j2ee/dbhost to a virtual host name later on when setting up the database virtual host. For
now, it should be changed to the primary host (saplxvm07).

Example:
saplxvm08:db2aha 34> grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL
SAPDBHOST = saplxvm07
j2ee/dbhost = saplxvm07

4.7 Configuring databases for HADR


To enable HADR, update the database configuration parameters for both the primary and the
standby as shown in the following examples:
Example:
On host saplxvm07, as user db2aha, the following sample script is executed:
saplxvm07:db2aha 11> cat primary_hadr_cfg.sql
UPDATE DB CFG FOR AHA USING HADR_LOCAL_HOST saplxvm07;
UPDATE DB CFG FOR AHA USING HADR_LOCAL_SVC AHA_HADR_1;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm08;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_2;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;
UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;
UPDATE DB CFG FOR AHA USING HADR_SYNCMODE NEARSYNC;
UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;
UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;
UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;

saplxvm07:db2aha 52> db2 -z primary_hadr_cfg.sql.log -tvf primary_hadr_cfg.sql

saplxvm07:db2aha 55> db2 get db cfg for aha | grep HADR


HADR database role = PRIMARY
HADR local host name (HADR_LOCAL_HOST) = saplxvm07
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_1
HADR remote host name (HADR_REMOTE_HOST) = saplxvm08
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_2
HADR instance name of remote server (HADR_REMOTE_INST) = db2aha
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) =
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Note: The port numbers for the service names AHA_HADR_1 and AHA_HADR_2 are defined in the /etc/services file (see
section 4.5). The actual port numbers (5951 and 5952) can also be used instead of the service names.

On host saplxvm08, as user db2aha, the following script is executed:


saplxvm08:db2aha 25> cat standby_hadr_cfg.sql
UPDATE DB CFG FOR AHA USING HADR_LOCAL_HOST saplxvm08;
UPDATE DB CFG FOR AHA USING HADR_LOCAL_SVC AHA_HADR_2;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm07;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_1;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;
UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;
UPDATE DB CFG FOR AHA USING HADR_SYNCMODE NEARSYNC;
UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;
UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;
UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;

saplxvm08:db2aha 26> db2 -z standby_hadr_cfg.sql.out -tvf standby_hadr_cfg.sql

saplxvm08:db2aha 53> db2 get db cfg for aha | grep HADR


HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = saplxvm08
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_2
HADR remote host name (HADR_REMOTE_HOST) = saplxvm07
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1
HADR instance name of remote server (HADR_REMOTE_INST) = db2aha
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) =
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0

HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Both the primary and the standby must be deactivated and reactivated for the changes to take effect. This
will require a production system downtime.

4.8 Performing HADR checks


Before starting HADR, check the following:

1. The /etc/services file on both the standby and primary host contains the same port numbers.

Example:
saplxvm07:~ # cat /etc/services | grep -i aha
sapdb2AHA 5912/tcp
AHA_HADR_1 5951/tcp # DB2 HADR log shipping
AHA_HADR_2 5952/tcp # DB2 HADR log shipping
sapmsAHA 3600/tcp # SAP System Message Server Port
DB2_db2aha 60006/tcp
DB2_db2aha_1 60007/tcp
DB2_db2aha_2 60008/tcp
DB2_db2aha_3 60009/tcp
DB2_db2aha_4 60010/tcp
DB2_db2aha_END 60011/tcp

2. The DB2 licenses on both the primary and the standby are valid and not trial licenses. Use the
db2licm -l command to verify. SA MP is not supported with DB2 temporary licenses. Apply a
valid license using the db2licm -a <license file name> command as user db2aha.
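
The following db2licm -l output is illustrative; the product edition and license type shown depend on the
license that is installed:

Example:
saplxvm07:db2aha > db2licm -l
Product name:                     "DB2 Enterprise Server Edition"
License type:                     "Restricted"
Expiry date:                      "Permanent"
Product identifier:               "db2ese"
Version information:              "10.5"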

3. The database manager configuration parameter SVCENAME is defined as sapdb2<SID> on both the
primary and the standby hosts.

Example:
saplxvm08:db2aha 23> db2 get dbm cfg | grep -i svcename
TCP/IP Service name (SVCENAME) = sapdb2AHA

4. User sap<SID> is able to connect to the database using a valid password.

Example:
saplxvm07:db2aha 59> db2 connect to aha user sapaha using ******
Database Connection Information
Database server = DB2/LINUXX8664 10.5.0
SQL authorization ID = SAPAHA
Local database alias = AHA

4.9 Starting HADR


HADR must be started on the standby first and then on the primary using the START HADR command.
Example:
On the standby host saplxvm08:
saplxvm08:db2aha 35> db2 deactivate db aha
DB20000I The DEACTIVATE DATABASE command completed successfully.
saplxvm08:db2aha 36> db2 start hadr on db aha as standby

DB20000I The START HADR ON DATABASE command completed successfully.

On the primary host saplxvm07:


saplxvm07:db2aha 51> db2 deactivate db aha
DB20000I The DEACTIVATE DATABASE command completed successfully.
saplxvm07:db2aha 52> db2 start hadr on db aha as primary
DB20000I The START HADR ON DATABASE command completed successfully.

HADR is now enabled and the standby will begin to replay the logs to catch up to the primary.

4.10 Checking the HADR status using the db2pd tool


The HADR status can be checked using the db2pd tool. The following example shows an output of db2pd
from the primary.
Example:
saplxvm07:db2aha 75> db2pd -d AHA -HADR

Database Member 0 -- Database AHA -- Active -- Up 0 days 09:50:28 -- Date 2014-05-22-00.38.20.793285

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 05/21/2014 14:47:57.497164 (1400698077)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 3
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000056
LOG_HADR_WAIT_ACCUMULATED(seconds) = 1.464
LOG_HADR_WAIT_COUNT = 36460
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000007.LOG, 4393, 2666673961
STANDBY_LOG_FILE,PAGE,POS = S0000007.LOG, 4390, 2666660069
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000007.LOG, 4390, 2666660069
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 05/22/2014 00:38:17.000000 (1400733497)
STANDBY_LOG_TIME = 05/22/2014 00:37:21.000000 (1400733441)
STANDBY_REPLAY_LOG_TIME = 05/22/2014 00:37:21.000000 (1400733441)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 05/22/2014 00:42:17.000000 (1400733737)
READS_ON_STANDBY_ENABLED = N

Refer to the following IBM Knowledge Center page for more details on the above values:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0011729.html

5. Enabling automatic failover using SA MP
The steps in section 4 describe how to enable HADR, but not automatic failover. Therefore, if the primary
goes down, a manual takeover operation must be performed from the standby. All applications must then be
manually redirected to the standby, which becomes the new primary. This can be done by changing the
db2cli.ini file and the SAP profiles (see section 5.2.4).
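
For reference, a manual takeover is issued on the standby with the TAKEOVER HADR command; the BY FORCE
option is only used if the old primary can no longer be reached. A sketch for the sample database AHA:

Example:
db2 takeover hadr on db AHA
db2 takeover hadr on db AHA by force
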
To enable automatic failover, SA MP must be installed and configured on both the primary and the standby
hosts.

5.1 Installing the SA MP software and license


SWPM provides the option to install SA MP. SA MP must be installed on the primary and the standby hosts.
Example:
The following steps need to be performed to install and configure SA MP:
1. Check the DB2 image that can be downloaded from SAP Service Marketplace for the included SA
MP software.
saplxvm07: cd <DB2-DVD-Mount-Point>/LINUXX86_64/ESE/disk1/db2/linuxamd64/tsamp/
saplxvm07: ls
db2cktsa installSAM integration license Linux prereqSAM README uninstallSAM

2. Install SA MP.
saplxvm07: ./prereqSAM
prereqSAM: All prerequisites for the ITSAMP installation are met on operating
system
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
saplxvm07: ./installSAM --noliccheck

3. Install the SA MP license as described in SAP Note 816773.


saplxvm07: samlicm -i ~/sam32.lic
saplxvm07: samlicm -s
Product: IBM Tivoli System Automation for Multiplatforms 3.2
Creation date: Wed 19 Aug 2009 12:00:01 AM EDT
Expiration date: Thu 31 Dec 2037 12:00:01 AM EST

4. Install the HA scripts on the primary and the standby by running the db2cptsa command.

saplxvm07:/db2/db2aha/db2_software/install/tsamp # ./db2cptsa
DBI1110I The DB2 High Availability (HA) scripts for the IBM Tivoli
System Automation for Multiplatforms (SA MP) were successfully
updated in /usr/sbin/rsct/sapolicies/db2.

Explanation:

You need DB2 HA scripts to use SA MP with the DB2 HA feature.

These DB2 HA scripts are located at /usr/sbin/rsct/sapolicies/db2. The


DB2 installer detects whether these DB2 HA scripts need to be installed
or updated.

The DB2 installer successfully updated the DB2 HA scripts.

User response:

No action is required.

5.2 Setting up the HADR cluster


SAP provides a tool called Cluster Setup Tool to easily set up the HADR cluster for automatic failover. The
tool is included in the DB2 media provided by SAP and is located in <DB2-DVD-Mount-
Point>/LINUXX86_64/SA_MP/scripts/. Instructions for the tool are provided in <DB2-DVD-Mount-
Point>/LINUXX86_64/DB6_SAMP_InstGuide.pdf.
The latest version of the Cluster Setup Tool can be downloaded from SAP Note 960843 - DB6: High
Availability for DB2 using SA MP: http://service.sap.com/sap/support/notes/960843

Example:
saplxvm07:~ # tar -xzvf samp_scripts_633_20140317.tgz
samp_scripts/
samp_scripts/sampdbcmd
samp_scripts/startdb
samp_scripts/startj2eedb
samp_scripts/stopdb
samp_scripts/stopj2eedb
samp_scripts/sapdb2cluster.sh

The sapdb2cluster.sh script is used to set up and create the cluster using the configuration file with all the
required parameters. The script has the following options:
1. Create, Show or Edit Database Configuration
2. Create Database Cluster
3. Show Database Cluster State
4. Delete Database Cluster

For further information, refer to the SAP installation guide IBM DB2 High Availability Solution: IBM Tivoli
System Automation for Multiplatforms that can be found on SAP Service Marketplace at
http://service.sap.com/instguidesnw.

5.2.1 Creating the cluster configuration file


The sapdb2cluster.sh script must be run as root user from the primary or the standby host.
Example:
saplxvm07:~/samp_scripts # ./sapdb2cluster.sh -l sapdb2cluster.log -f
sapdb2cluster.conf

By default, the configuration information is saved in the sapdb2cluster.conf file and the log is saved in the
sapdb2cluster.log file.
Select option 1 - Create, Show or Edit Database Configuration, which displays the values from the current
configuration file if it exists in the current directory, or prompts you for new values.
Example:
General Cluster Configuration

[1] SAP_SID = AHA


[2] SAP_CI_HOSTNAME = saplxvm06
[3] SAP_CI_INST_NR =

[4] TSA_DOMAIN_NAME = sap_ahadb2
[5] TSA_TIEBREAKER_IP_ADDRESS = 9.26.166.1
[6] TSA_DISK_HEARTBEAT = [OFF]
[7] TSA_REMOTE_CMD = ssh

Database Cluster Configuration

[8] DB2_HOSTNAME_LIST = saplxvm07


saplxvm08
[9] DB2_CLUSTER_TYPE = HADR
[10] DB2_INST_DIR = /db2/db2aha/db2_software
[11] DB2_DB2INSTANCE = db2aha
[12] DB2_HA_HOSTNAME = saplxvmsap
[13] DB2_HA_IP_ADDRESS = 9.26.166.97
[14] DB2_NETWORK_INTERFACE_LIST = eth0:9.26.166.199:255.255.254.0:saplxvm07
eth0:9.26.166.200:255.255.254.0:saplxvm08
[15] DB2_HA_IP_MASK = 255.255.254.0
[16] TSA_LICENSE_FILE = /root/sam32.lic
[17] TSA_USER_LIST = ahaadm
db2aha
[18] DB2_HADR_SYNC_MODE = NEARSYNC
[19] DB2_HADR_PORTS = AHA_HADR_1:5951/tcp
AHA_HADR_2:5952/tcp
[20] TSA_USER_GROUP_NAME = sagrp
[21] TSA_USER_GROUP_ID = 222
[22] DB2_COMM_PORTS = DB2_db2aha:60006/tcp
DB2_db2aha_1:60007/tcp
DB2_db2aha_2:60008/tcp
DB2_db2aha_3:60009/tcp
DB2_db2aha_4:60010/tcp
DB2_db2aha_END:60011/tcp
sapdb2AHA:5912/tcp
[23] DB2_GROUP_LIST = dbahaadm:401:true
dbahactl:402:true
dbahamnt:403:true
dbahamon:404:true
sapsys:390:true
sapinst:1000:true
[24] DB2_USER_LIST =
ahaadm:301:/home/ahaadm:true:/bin/csh:true:sapsys:sapinst:dbahactl
db2aha:302:/db2/db2aha:false:/bin/csh:true:dbahaadm:sapinst
sapaha:303:/home/sapaha:true:/bin/csh:true:dbahamon:dialout:video
sapadm:305:/home/sapadm:true:/bin/false:true:sapsys:dialout:video

Edit Database configuration


Press Enter to Exit or select a number to edit a parameter (e.g. 1 for SAP_SID):

The parameter TSA_DISK_HEARTBEAT enables the SA MP disk heartbeat, which is based on the accessibility of raw disks, logical volumes (LVID), multipath devices (MPATH), or physical volumes (PVID). This allows TSA to distinguish between a network failure and a node failure. Refer to the following link for more information: http://www-01.ibm.com/support/knowledgecenter/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/sampugdiskheartbeat.html?cp=SSRM2X_4.1.0%2F0-4-5-1-1&lang=en
The parameter DB2_HOSTNAME_LIST takes the primary host name and the standby host name separated by a comma. DB2_HA_HOSTNAME is used to assign the virtual host name. The virtual host name must not be in use.

It is also required to supply a valid SA MP license file location. In our example, the license file /root/sam32.lic is assigned to the variable TSA_LICENSE_FILE.

The remaining parameters should be assigned automatically from the system.
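Before creating the cluster, it can be useful to verify that the virtual host name resolves and that the virtual IP address is not yet in use. The following is only a minimal sketch using standard Linux tools with the values from the example configuration above; these checks are not part of the Cluster Setup Tool itself:

saplxvm07:~ # host saplxvmsap          # should resolve to the virtual IP 9.26.166.97
saplxvm07:~ # ping -c 2 9.26.166.97    # should not answer before the cluster is created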

5.2.2 Creating the database cluster


Once the cluster configuration file has been generated, the cluster can be created by using option 2 - Create
Database Cluster. Logs are reported in the file sapdb2cluster.log.
Example:
Read configuration file
Check general configuration
Check database cluster configuration
Create cluster domain and nodes
Create SA MP domain
Check for SA MP software on cluster nodes
Check for SA MP on node saplxvm07 : OK
Check for SA MP on node saplxvm08 : OK
Prepare cluster nodes : OK
Create cluster domain sap_ahadb2 : OK
Create database cluster resources
Create database cluster (HADR)
Check DB2 cluster : OK
Prepare DB2 cluster
Grant SA MP control to ahaadm,db2aha on all nodes
Create user group sagrp on node saplxvm07 : OK
Grant SA MP control on node saplxvm07 : OK
Create user group sagrp on node saplxvm08 : OK
Grant SA MP control on node saplxvm08 : OK
Replicate DB2 HADR ports : OK
Disable DB2 Fault Monitor : OK
Configuring SAP for HA DB2 cluster
Modifying User Environment : OK
Replace startdb/stopdb scripts : OK
Update SAP profiles with virtual database hostname : OK
Update thin client configuration with virtual database hostname : OK
Setup HADR for DB2 cluster
Start database servers
Start database server on node saplxvm07 : OK
Start database server on node saplxvm08 : OK
Configure DB2 servers for HADR
Check HADR database roles for DB2 cluster : OK
Configure database server saplxvm07 for HADR : OK
Configure database server saplxvm08 for HADR : OK
Start database server saplxvm08 as STANDBY : OK
Start database server saplxvm07 as PRIMARY : OK
Activate databases
Activate database AHA on node saplxvm08 : OK
Activate database AHA on node saplxvm07 : OK
Wait for HADR cluster to become Peer state : OK
Check database cluster configuration : OK
Generate db2haicu configuration file (/tmp/cluster_config.xml) : OK
Execute db2haicu
Copying config file to cluster node saplxvm07 : OK
Copying config file to cluster node saplxvm08 : OK
Removing virtual IP address from cluster nodes : OK
Executing db2haicu at node saplxvm08 : OK

Executing db2haicu at node saplxvm07 : OK

Action finished. Press Enter to continue ...

The above output also describes the steps performed during cluster creation. The script configures both
systems for HADR and generates the /tmp/cluster_config.xml configuration file on both systems. The
script then uses the configuration file to execute the db2haicu command on both the primary and the standby
to create the cluster. The output of the db2haicu command is stored in /tmp/cluster_config.log by the
script.
Example:
The following is an example of the cluster_config.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="db2ha.xsd"
clusterManagerName="TSA" version="2.2">
<ClusterDomain domainName="sap_ahadb2">
<Quorum quorumDeviceProtocol="network" quorumDeviceName="9.26.166.1"/>
<PhysicalNetwork physicalNetworkName="db2network" physicalNetworkProtocol="ip">
<Interface interfaceName="eth0" clusterNodeName="saplxvm07">
<IPAddress baseAddress="9.26.166.199" subnetMask="255.255.254.0"
networkName="db2network"/>
</Interface>
<Interface interfaceName="eth0" clusterNodeName="saplxvm08">
<IPAddress baseAddress="9.26.166.200" subnetMask="255.255.254.0"
networkName="db2network"/>
</Interface>
<LogicalSubnet baseAddress="9.26.166.0" subnetMask="255.255.254.0"
networkName="db2network"/>
</PhysicalNetwork>
<ClusterNode clusterNodeName="saplxvm07"/>
<ClusterNode clusterNodeName="saplxvm08"/>
</ClusterDomain>
<FailoverPolicy>
<HADRFailover />
</FailoverPolicy>
<DB2PartitionSet>
<DB2Partition dbpartitionnum="0" instanceName="db2aha" />
</DB2PartitionSet>
<HADRDBSet>
<HADRDB databaseName="AHA"
localInstance="db2aha"
remoteInstance="db2aha"
localHost="saplxvm07"
remoteHost="saplxvm08" />
<VirtualIPAddress baseAddress="9.26.166.97" subnetMask="255.255.254.0"
networkName="db2network" />
</HADRDBSet>
</DB2Cluster>

5.2.3 Displaying the database cluster


Option 3 - Show Database Cluster State will display the cluster status.
Example:

Figure 8: lssam output of SA MP cluster configuration for HADR

Note: With option 3, the lssam command is executed by the script to collect the status output.

The following terms help describe the lssam output:

Term            Description
Peer Domain     A cluster of servers or nodes
Resource        Fixed or floating hardware or software
Resource Group  A virtual group of resources
Equivalency     A fixed set of resources of the same class that provide the same functionality
Nominal State   The desired state of a resource. It can be online or offline. If changed, SA MP will bring a resource online or shut it down.
The cluster consists of three resource groups and three equivalency resource groups with the same
functionalities. Equivalency resource groups allow SA MP to select any resource with the same functionality
to perform an operation in case of a failure. The following table is an example of resource groups and their
equivalency resource groups.
Example:

Resource Group                         Description
db2_db2aha_db2aha_AHA-rg               The database instance resource group; it consists of the primary and the standby instances, the application resource db2_db2aha_db2aha_AHA, and the service IP resource db2ip_9_26_166_97.
db2_db2aha_saplxvm07_0-rg              Database instance resource group for the host saplxvm07
db2_db2aha_saplxvm08_0-rg              Database instance resource group for the host saplxvm08
db2_db2aha_db2aha_AHA-rg_group-equ     Equivalency database instance resource group equivalent to db2_db2aha_db2aha_AHA-rg
db2_db2aha_saplxvm07_0-rg_group-equ    Equivalency database instance resource group for the host saplxvm07
db2_db2aha_saplxvm08_0-rg_group-equ    Equivalency database instance resource group for the host saplxvm08

In the above lssam output, the entry Online IBM.ResourceGroup:db2_db2aha_db2aha_AHA-rg Nominal=Online means that the resource group db2_db2aha_db2aha_AHA-rg is online and its nominal state is also online. Note that for the resource group db2_db2aha_db2aha_AHA-rg, both the standby and the primary resource applications use the same service IP, but only one of them is online. The lsrpdomain and the lsrpnode commands can be used to check the domain status.
Example:
saplxvm07:~/samp_scripts # lsrpdomain
Name OpState RSCTActiveVersion MixedVersions TSPort GSPort
sap_ahadb2 Online 3.1.4.4 No 12347 12348
saplxvm07:~/samp_scripts # lsrpnode
Name OpState RSCTVersion
saplxvm07 Online 3.1.4.4
saplxvm08 Online 3.1.4.4

Use ifconfig -a on the primary host to check that the virtual IP address is linked to the primary host IP
address.
Example:
saplxvm07:~/samp_scripts # ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:1A:98:28
inet addr:9.26.166.199 Bcast:9.26.167.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:166592 errors:0 dropped:581 overruns:0 frame:0
TX packets:106491 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:169945146 (162.0 Mb) TX bytes:18069161 (17.2 Mb)

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:1A:98:28


inet addr:9.26.166.97 Bcast:9.26.167.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

The link is defined by eth0:0 in the sample output above where 9.26.166.97 is the virtual IP address. The
standby host IP configuration remains unchanged.

Note: During a failover, this IP link is removed from the primary host and a new link to the virtual IP address is created on the standby host.

For more information on SA MP clusters and resources, refer to


https://www.ibm.com/developerworks/tivoli/library/tv-tivoli-system-automation/

5.2.4 Enabling the SAP system with virtual database host name and IP address
This step has already been completed by sapdb2cluster.sh.
On saplxvm06, the SAPDBHOST and j2ee/dbhost in the default profile and hostname in the
db2cli.ini file are replaced by the virtual host name. In the following example, the virtual host name is
saplxvmsap:
saplxvm06:db2aha 34> grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL
SAPDBHOST = saplxvmsap
j2ee/dbhost = saplxvmsap
saplxvm06:~ # cat /usr/sap/AHA/SYS/global/db6/db2cli.ini
; Comment lines start with a semi-colon.
[AHA]
Database=AHA
protocol=tcpip
hostname=saplxvmsap
servicename=5912
[COMMON]
diagpath=/usr/sap/AHA/SYS/global/db6/db2dump

Once the database cluster is created, all connections to the database must be refreshed to pick up the
change. The virtual host (saplxvmsap) is reflected as the database host in the SAP system.

Figure 9: Dashboard screen in the DBA Cockpit (Web Dynpro user interface)

5.3 HADR micro-outage feature test using the Graceful Maintenance Tool (GMT)
The micro-outage feature of SAP on IBM DB2 for LUW can be used to pause SAP applications for a short period in order to perform a controlled failover without having to stop any SAP ABAP application servers. This allows administrators to perform certain database maintenance tasks without any significant downtime. The GMT makes optimal use of the micro-outage feature and provides an easy way to perform a controlled, graceful HADR failover.
The GMT can be downloaded from SAP Note 1530812 - DB6: Graceful Maintenance Tool and can be used
from the primary or the standby host as root user. The GMT requires the ABAP routines attached to SAP
Note 1907533 - ABAP Routines for Graceful Maintenance Tool (GMT) and SAP Note 1443426 - DB6:
Graceful Cluster Switch.
Example:
saplxvm07:~/samp_scripts # tar xzvf gmt_scripts_633_20140317.tgz
gmt_scripts/
gmt_scripts/exitDB2Restart.sh
gmt_scripts/exitFPActivate.sh
gmt_scripts/exitNoOp.sh
gmt_scripts/exitResumeBtcExternal.sh
gmt_scripts/exitSuspendBtcExternal.sh
gmt_scripts/sapdb2gmt.sh

The script (sapdb2gmt.sh) offers the following three self-explanatory options:


1. Create, Show or Edit GMT Configuration
2. Check Graceful Prerequisites
3. Init Graceful Maintenance Mode
Example:
saplxvm07:~/samp_scripts/gmt_scripts # ./sapdb2gmt.sh -l sapdb2gmt.log -f
sapdb2gmt.conf
./sapdb2gmt.sh version 6.33 started on Fri May 23 14:58:13 EDT 2014
---Graceful Maintenance Tool (GMT) for SAP running on DB2 LUW (Version 6.33)---

1 - Create, Show or Edit GMT Configuration


2 - Check Graceful Prerequisites
3 - Init Graceful Maintenance Mode

e - Exit

Input:

5.3.1 GMT Configuration


The Graceful Maintenance Tool must be configured before it can be used. Option 1 is used to configure and generate the configuration file (sapdb2gmt.conf). Logs are stored in the sapdb2gmt.log file.
Example:
Input: 1

Show GMT configuration

General System Configuration

[1] SAP_SID = AHA


[2] TSA_REMOTE_CMD = ssh

Database Configuration

[3] DB2_INST_DIR = /db2/db2aha/db2_software


[4] DB2_DB2INSTANCE = db2aha

Database Graceful Maintenance Tool

[5] DB2_GMT_TIMEOUT = 100


[6] DB2_GMT_CMD = CLUSTER_FAILOVER
[7] DB2_GMT_AS_HOST = saplxvm06
[8] DB2_GMT_AS_NR = 1
[9] DB2_GMT_CLIENT = 001
[10] DB2_GMT_USER = DDIC

Edit GMT configuration

Press Enter to Exit or select a number to edit a parameter (e.g. 1 for SAP_SID):

SAP RFC Configuration Parameters

Parameter         Description                                              Value (example)
DB2_GMT_AS_HOST   Host name of the SAP primary application server          saplxvm06
DB2_GMT_AS_NR     Instance number of the SAP primary application server    1
DB2_GMT_USER      User for RFC calls                                       DDIC
DB2_GMT_CLIENT    Client to use for RFC calls                              001

After the configuration has been completed, option 2 can be used to check all prerequisites for a micro-failover using the GMT.

5.3.2 Micro-failover test


Option 3 of GMT (3 - Init Graceful Maintenance Mode) initiates a DB2 HADR cluster failover.
Example:
Read configuration file
Check general configuration
Check database configuration : OK

Starting graceful maintenance mode


Enter Parameter for Graceful Maintenance Mode

[1] SAP_SID = AHA


[2] TSA_REMOTE_CMD = ssh
[3] DB2_GMT_TIMEOUT = 100
[4] DB2_GMT_SAP_BTC_GRACE_PERIOD = 60
[5] DB2_GMT_CMD = CLUSTER_FAILOVER
[6] DB2_GMT_SAP_COMM_MODE = SAPEVT
[7] DB2_GMT_SAP_EVENT_ACTIVATE = SAP_DBA_GMT_ACTIVATE
[8] DB2_GMT_SAP_EVENT_BTC_SUSPEND = SAP_DBA_GMT_SUSPEND_BATCH_JOBS
[9] DB2_GMT_SAP_EVENT_BTC_RESUME = SAP_DBA_GMT_RESUME_BATCH_JOBS
[10] DB2_GMT_SAP_SCRIPT_BTC_SUSPEND =
[11] DB2_GMT_SAP_SCRIPT_BTC_RESUME =

Check Cluster Prerequisites


Checking DB2 HADR Peer State
Checking DB2 HADR Peer State : OK
Checking DB2 HADR Peer State : OK
Check General Prerequisites

Clean quiesce file : OK
Checking SAP DBSL feature prerequisites: : OK
Checking for transactions running for more than 60 seconds : WARNING

*** WARNING: Found Long Running Transactions (Tue Jul 22 11:10:53 EDT 2014) ***
COMMENT APPL_NAME AGENT_ID UOW_START_TIME STATUS RUNTIME
SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_RE
---------- ------------ -------- ------------------- ------------ ----------- ---
--------- -------------------- ----------- ------ --------------
LONGRUNNER DB2ATS 3354 2014-07-22-11.00.00 LOCKWAIT 652
DB2AHA saplxvm07 -Task: -
1 record(s) selected.

WARNING: Long running transactions are active which might be canceled (rolled
back)!
*******************************************************************************

Checking SAP Stack Type : OK


Checking for Java Connections : OK
Checking database connection as db2aha : OK
Checking R3trans connection as ahaadm : OK
Checking ABAP functions (SAP Note 1907533) : OK

Warnings occurred. Do you want to proceed with graceful maintenance? [Yes|No]:


Yes

The tool displays a list of active transactions and asks for confirmation to proceed.
If Yes is selected, the script proceeds and waits at the step, Waiting for the Quiesce file
(current: 12 s; timeout: 65 s) until the SAP applications are paused.
Example:
Warnings occurred. Do you want to proceed with graceful maintenance? [Yes|No]:
Yes

Suspend SAP batch jobs


Create event SAP_DBA_GMT_SUSPEND_BATCH_JOBS via sapevt : OK
Suspend external batch jobs via exit script : OK [skipped]
Waiting grace period (31 s)

Note: This requires downtime. All connections to the database will be closed.

After automatically creating the quiesce file /usr/sap/AHA/SYS/global/db6_dbsl_quiesce_def_connections, the script displays a list of database connections and gives you the option to either forcefully close the connections and roll back all transactions, or to wait for the transactions to complete and the connections to be closed from the application side.
After the cluster switch, the standby on host saplxvm08 becomes the new primary, while the old primary on host saplxvm07 becomes the new standby.
Example:
Waiting grace period (0 s) : OK
Enable micro outage feature of DBSL
Create event SAP_DBA_GMT_ACTIVATE via sapevt : OK
Waiting for quiesce file (current: 3 s; timeout: 65 s) : OK
Closing database connections (9) : OK [skipped]

************** Open Connections (Tue Jul 22 11:46:10 EDT 2014) ***************
APPL_NAME AGENT_ID AUTHID UOW_START_TIME STATUS
SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_REPORT
-------------------- -------- ---------- ------------------- -------------------- -
----------- -------------------- ----------- ------------------------------
DB2ATS 3354 DB2AHA 2014-07-22-11.00.00 LOCKWAIT
DB2AHA saplxvm07 -Task: -
dw.sapAHA_DVEBMGS01 27 SAPAHA 2014-07-22-11.44.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 28 SAPAHA 2014-07-22-11.42.27 UOWWAIT
SAPSYS saplxvm06 SPO SAPLSPOA
dw.sapAHA_DVEBMGS01 29 SAPAHA 2014-07-22-11.43.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 32 SAPAHA 2014-07-22-11.44.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 34 SAPAHA 2014-07-22-00.01.03 UOWWAIT
DDIC saplxvm06 BTC SAPMSSY2
dw.sapAHA_DVEBMGS01 35 SAPAHA 2014-07-22-11.43.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 37 SAPAHA 2014-07-22-11.44.10 UOWWAIT
SAPSYS saplxvm06 DIA CL_ABSTRACT_SAML_PROTOCOL=====
dw.sapAHA_DVEBMGS01 39 SAPAHA 2014-07-22-11.39.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
9 record(s) selected.
*******************************************************************************

Do you want to wait 60 seconds longer? [Yes|No]: Yes


Total Timeout is now 240 (Maximum allowed 400)

Closing database connections (6)

If not all connections can be closed within 400 seconds, the tool again prompts the user either to wait or to continue:
Example:
Total Timeout is now 240 (Maximum allowed 400)

Closing database connections (5) : OK [skipped]


\n ************** Open Connections (Tue Jul 22 11:48:30 EDT 2014) ***************
APPL_NAME AGENT_ID AUTHID UOW_START_TIME STATUS
SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_REPORT
-------------------- -------- ---------- ------------------- -------------------- -
----------- -------------------- ----------- ------------------------------
DB2ATS 3354 DB2AHA 2014-07-22-11.00.00 LOCKWAIT
DB2AHA saplxvm07 -Task: -
dw.sapAHA_DVEBMGS01 32 SAPAHA 2014-07-22-11.44.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 34 SAPAHA 2014-07-22-00.01.03 UOWWAIT
DDIC saplxvm06 BTC SAPMSSY2
dw.sapAHA_DVEBMGS01 35 SAPAHA 2014-07-22-11.43.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
dw.sapAHA_DVEBMGS01 37 SAPAHA 2014-07-22-11.44.10 UOWWAIT
SAPSYS saplxvm06 DIA CL_ABSTRACT_SAML_PROTOCOL=====
dw.sapAHA_DVEBMGS01 39 SAPAHA 2014-07-22-11.39.27 UOWWAIT
SAPSYS saplxvm06 DIA SAPMSSY2
6 record(s) selected.
*******************************************************************************

Do you want to wait 60 seconds longer? [Yes|No]: No

If No is selected, the tool continues closing connections and prompts whether the remaining transactions can be forcefully rolled back.
Example:
Do you want to wait 60 seconds longer? [Yes|No]: No

Do you want to force applications (rollback of running transactions)?


[Yes|No|Continue]: Yes

Execute DB2 QUIESCE for AHA: : OK


Waiting for DB2 QUIESCE (current: 1 s; timeout: 10 s; connections: 0): OK
Execute Database Cluster Failover (HADR) : OK
Execute DB2 UNQUIESCE: : OK
Disable micro outage feature of DBSL : OK
Clean quiesce file : OK
Checking database connection as db2aha : OK
Checking R3trans connection as ahaadm : OK
Resume SAP batch jobs
Create event SAP_DBA_GMT_RESUME_BATCH_JOBS via sapevt : OK
Resume external batch jobs via exit script : OK [skipped]

Graceful Maintenance Mode Start : Tue Jul 22 11:42:24 EDT 2014


Graceful Maintenance Mode End : Tue Jul 22 11:50:05 EDT 2014

Action finished. Press Enter to continue ...

The cluster should reflect the changes and can be displayed using the lssam command.
Example:
saplxvm07:ahaadm 5> lssam
Online IBM.ResourceGroup:db2_db2aha_db2aha_AHA-rg Nominal=Online
|- Online IBM.Application:db2_db2aha_db2aha_AHA-rs
|- Offline IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm07
'- Online IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm08
'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs
|- Offline IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm07
'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm08
Online IBM.ResourceGroup:db2_db2aha_saplxvm07_0-rg Nominal=Online
'- Online IBM.Application:db2_db2aha_saplxvm07_0-rs
'- Online IBM.Application:db2_db2aha_saplxvm07_0-rs:saplxvm07
Online IBM.ResourceGroup:db2_db2aha_saplxvm08_0-rg Nominal=Online
'- Online IBM.Application:db2_db2aha_saplxvm08_0-rs
'- Online IBM.Application:db2_db2aha_saplxvm08_0-rs:saplxvm08
Online IBM.Equivalency:db2_db2aha_db2aha_AHA-rg_group-equ
|- Online IBM.PeerNode:saplxvm08:saplxvm08
'- Online IBM.PeerNode:saplxvm07:saplxvm07
Online IBM.Equivalency:db2_db2aha_saplxvm07_0-rg_group-equ
'- Online IBM.PeerNode:saplxvm07:saplxvm07
Online IBM.Equivalency:db2_db2aha_saplxvm08_0-rg_group-equ
'- Online IBM.PeerNode:saplxvm08:saplxvm08
Online IBM.Equivalency:db2network

|- Online IBM.NetworkInterface:eth0:saplxvm08
'- Online IBM.NetworkInterface:eth0:saplxvm07

The virtual IP link on the old primary is removed and a new link is created on the new primary host (saplxvm08).
saplxvm08:~ # ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:76:2C:2C
inet addr:9.26.166.97 Bcast:9.26.167.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

During the cluster switch, the changes can be monitored live using the lssam -top command as user root
from any of the hosts.

5.3.3 Testing a disaster scenario


If SA MP fails to connect to the primary, it will first try to start the primary before failing over to the standby.
Therefore, force stopping the primary will not cause an automatic failover. An automatic failover will be
triggered if the primary host is unplugged or the OS is stopped. The following example shows how to test a
disaster scenario and trigger an automatic failover. In the example, the main DB2 engine process, db2sysc,
will be renamed to simulate SA MP being unable to restart DB2 on the primary, causing SA MP to initiate a
failover. This is similar to a kernel panic that will kill the db2sysc process, but the host is still available.
Example:
1. From a command line window, lssam -top is issued to monitor the cluster status live.
2. To bring down the primary without shutting down the server, the
/db2/db2aha/sqllib/adm/db2sysc file is renamed to /db2/db2aha/sqllib/adm/db2sysc_backup
and the db2sysc process is killed on the primary host. The rename is necessary, otherwise SA MP
will just restart DB2 on the primary and a failover will not be triggered.
saplxvm07:db2aha 54> mv /db2/db2aha/sqllib/adm/db2sysc
/db2/db2aha/sqllib/adm/db2sysc_backup
saplxvm07:db2aha 55> db2_kill -9 db2sysc
Application ipclean: Removing DB2 engine and client IPC resources for
db2aha.

3. The changes are reflected live in the lssam output:

Figure 10: Cluster status after the primary is down (lssam output)
After a successful takeover, the primary (saplxvm07) will switch roles with the standby (saplxvm08). Once
the test is complete, the /db2/db2aha/sqllib/adm/db2sysc_backup must be moved back to
/db2/db2aha/sqllib/adm/db2sysc. Once the file is moved back, the old primary (saplxvm07) will be
automatically brought back up and activated as the new standby. It may take several minutes. The status
can be monitored using the lssam -top command.
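For reference, restoring the original setup is simply the rename in reverse; a minimal sketch using the same paths as above. After the move, SA MP restarts DB2 on saplxvm07 on its own, which can again be watched with lssam -top:

saplxvm07:db2aha> mv /db2/db2aha/sqllib/adm/db2sysc_backup /db2/db2aha/sqllib/adm/db2sysc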

Figure 11: Cluster status after a successful takeover (lssam output)

Note: Moving the db2sysc file is only performed to simulate a disaster scenario and is not recommended.

6. Installing the auxiliary standby database instance
As described earlier, the auxiliary standby is for DR purposes only and should be used to protect data from
wide spread disasters. Adding an auxiliary standby is similar to adding a principal standby except for minor
changes to the DB2 database configuration. The following sections show how to add the first auxiliary
standby.

6.1 Mounting file systems


The directories /sapmnt/<SID>/exe, /sapmnt/<SID>/profile, and /sapmnt/<SID>/global from the SAP
application server must be mounted on the auxiliary standby host.

Example:
saplxvm09:~ # mount | grep AHA
saplxvm06:/sapmnt/AHA/exe on /sapmnt/AHA/exe type nfs (rw,addr=9.26.166.198)
saplxvm06:/sapmnt/AHA/global on /sapmnt/AHA/global type nfs (rw,addr=9.26.166.198)
saplxvm06:/sapmnt/AHA/profile on /sapmnt/AHA/profile type nfs (rw,addr=9.26.166.198)

6.2 Updating port configurations


A new port number (AHA_HADR_3:5953) is defined in the /etc/services file of the primary, the standby, and
all auxiliary standby servers. This port number will be used for the newly added auxiliary standby server
(saplxvm09) HADR local service name database configuration parameter.
Example:
sapdb2AHA 5912/tcp
AHA_HADR_1 5951/tcp # DB2 HADR log shipping
AHA_HADR_2 5952/tcp # DB2 HADR log shipping
AHA_HADR_3 5953/tcp # DB2 HADR log shipping
sapmsAHA 3600/tcp # SAP System Message Server Port
DB2_db2aha 60006/tcp
DB2_db2aha_1 60007/tcp
DB2_db2aha_2 60008/tcp
DB2_db2aha_3 60009/tcp
DB2_db2aha_4 60010/tcp
DB2_db2aha_END 60011/tcp

6.3 Performing a homogeneous system copy using SWPM


Sections 4.4 Homogeneous system copy using SWPM, 4.5 Configuring ports, and 4.6 Restoring the
database from a backup of this document must be completed on the new auxiliary host (saplxvm09).

Note: The homogeneous system copy changes the SAPDBHOST and j2ee/dbhost variables to saplxvm09 in the SAP
default profile /sapmnt/AHA/profile/DEFAULT.PFL. This needs to be manually changed back to the virtual
host, saplxvmsap. The auxiliary standby is also in rollforward pending mode just like the principal standby.
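After correcting the profile, a quick check (reusing the grep shown in section 5.2.4) confirms that the default profile points to the virtual host again:

saplxvm06:~ # grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL
SAPDBHOST = saplxvmsap
j2ee/dbhost = saplxvmsap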

6.4 Configuring the HADR auxiliary standby database


On the new auxiliary standby host (saplxvm09), as user db2aha, the following sample script is executed to
configure the auxiliary standby database for HADR:
saplxvm09:db2aha 81> cat auxiliary_standby_hadr_cfg.sql
UPDATE DB CFG FOR AHA USING HADR_LOCAL_HOST saplxvm09;
UPDATE DB CFG FOR AHA USING HADR_LOCAL_SVC AHA_HADR_3;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm07;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_1;
UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;

UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;
UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST
saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2;
UPDATE DB CFG FOR AHA USING HADR_SYNCMODE SUPERASYNC;
UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;
UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;
UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;

saplxvm09:db2aha 37> db2 -z auxiliary_hadr_cfg.sql.log -tvf auxiliary_standby_hadr_cfg.sql

saplxvm09:db2aha 55> db2 get db cfg for aha | grep HADR


HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = saplxvm09
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_3
HADR remote host name (HADR_REMOTE_HOST) = saplxvm07
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1
HADR instance name of remote server (HADR_REMOTE_INST) = db2aha
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) =

saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Note: The HADR_TARGET_LIST parameter is where the other two HADR server host names and their port numbers are
listed in pairs. The order means that the first host in the list is the principal standby and the second host is the
auxiliary standby, and so on.

The HADR_TARGET_LIST database configuration parameter also needs to be updated in the primary and the
standby.

Example:
saplxvm07:db2aha 190> db2 "UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST
saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3"
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
SQL1363W One or more of the parameters submitted for immediate modification
were not changed dynamically. For these configuration parameters, the database
must be shutdown and reactivated before the configuration parameter changes
become effective.

saplxvm07:db2aha 70> db2 get db cfg for aha | grep HADR


HADR database role = PRIMARY
HADR local host name (HADR_LOCAL_HOST) = saplxvm07
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_1
HADR remote host name (HADR_REMOTE_HOST) = saplxvm08
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_2
HADR instance name of remote server (HADR_REMOTE_INST) = db2aha
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) =

saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3

HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

saplxvm08:db2aha 95> db2 "UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST


saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3"
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
SQL1363W One or more of the parameters submitted for immediate modification
were not changed dynamically. For these configuration parameters, the database
must be shutdown and reactivated before the configuration parameter changes
become effective.

saplxvm08:db2aha 55> db2 get db cfg for aha | grep HADR


HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = saplxvm08
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_2
HADR remote host name (HADR_REMOTE_HOST) = saplxvm07
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1
HADR instance name of remote server (HADR_REMOTE_INST) = db2aha
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) =

saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Note: To convert from a single standby to multiple standbys, as the above message indicates, the database must be deactivated and reactivated. This requires a downtime of the system. It is recommended to quiesce the SAP system to close the connections temporarily instead of stopping it.
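One possible way to keep SAP work processes from reconnecting while the databases are deactivated is to quiesce the database on the primary first and unquiesce it afterwards. This is only a sketch of such a sequence; in an SAP system the GMT described in section 5.3 is usually the more convenient way to pause the application:

saplxvm07:db2aha> db2 connect to aha
saplxvm07:db2aha> db2 quiesce database immediate force connections
saplxvm07:db2aha> db2 connect reset
(perform the deactivate, start hadr, and activate steps shown in the example below)
saplxvm07:db2aha> db2 connect to aha
saplxvm07:db2aha> db2 unquiesce database
saplxvm07:db2aha> db2 connect reset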

Example:

saplxvm09:db2aha 41> db2 deactivate db aha


DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm08:db2aha 98> db2 deactivate db aha


DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm07:db2aha 192> db2 deactivate db aha


DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm09:db2aha 42> db2 start hadr on db aha as standby


DB20000I The START HADR ON DATABASE command completed successfully.

saplxvm08:db2aha 99> db2 activate db aha


DB20000I The ACTIVATE DATABASE command completed successfully.

saplxvm07:db2aha 193> db2 activate db aha


DB20000I The ACTIVATE DATABASE command completed successfully.

Once HADR is successfully activated, the db2pd hadr command on the primary host will list all of the
standbys:
Example:
saplxvm07:db2aha 200> db2pd -d aha -hadr

Database Member 0 -- Database AHA -- Active -- Up 0 days 00:01:49 -- Date 2013-12-03-16.16.07.536459

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 12/03/2013 16:14:20.709721 (1386105260)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 17
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_REPLAY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 12/03/2013 16:19:51.000000 (1386105591)
READS_ON_STANDBY_ENABLED = N

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
LOG_STREAM_ID = 0
HADR_STATE = REMOTE_CATCHUP

HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm09
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 12/03/2013 16:14:21.118983 (1386105261)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 16
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_REPLAY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = N

For more information on DB2's HADR multiple standby database feature, see the IBM Knowledge Center:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0059994.html

7 Failover scenarios
As mentioned earlier, the auxiliary standbys are only for disaster recovery (DR) purposes and automatic
failover is not supported between the primary and auxiliary standbys.
The HADR setup in the example of this paper has a primary, a principal standby, and an auxiliary standby.
The database configuration parameter HADR_TARGET_LIST on the primary, the principal standby, and the auxiliary standby is set as follows:
saplxvm07:db2aha 71> db2 get db cfg for aha | grep HADR_TARGET_LIST
HADR target list (HADR_TARGET_LIST) =
saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3

saplxvm08:db2aha 56> db2 get db cfg for aha | grep HADR_TARGET_LIST


HADR target list (HADR_TARGET_LIST) =
saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3

saplxvm09:db2aha 58> db2 get db cfg for aha | grep HADR_TARGET_LIST


HADR target list (HADR_TARGET_LIST) =
saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2

With these settings, if the primary (on host saplxvm07) is down, the standby host saplxvm08 will become the
new primary host. The database on host saplxvm07 will be the principal standby, and saplxvm09 will be the
auxiliary standby. Once a failover happens, the goal should always be to bring back the failed host and
return to the original configuration. The following subsections show failover scenarios and how to recover
from them.
To test these failover scenarios, a DB2 software failure is simulated by killing the db2sysc process, which is
the main DB2 process. This is similar to a kernel panic and causes DB2 to become inaccessible.

Note: During the test scenario, workload is generated using SAP transaction SGEN, and the cluster is monitored using lssam -top.

Example:
saplxvm07:db2aha 64> ps -ef | grep -i db2sysc
db2aha 4428 4426 2 10:18 ? 00:00:45 db2sysc 0
db2aha 32216 28174 0 10:48 pts/0 00:00:00 grep -i db2sysc
saplxvm07:db2aha 65> which db2sysc
/db2/db2aha/sqllib/adm/db2sysc
saplxvm07:db2aha 66> mv /db2/db2aha/sqllib/adm/db2sysc
/db2/db2aha/sqllib/adm/db2sysc.backup
saplxvm07:db2aha 66>
saplxvm07:db2aha 66> kill -9 4428
saplxvm07:db2aha 66>

7.1 Failover scenario #1: The primary is down


The following example describes the scenario when the primary is down. This scenario simulates a failure
when DB2 is inaccessible.

When the primary (on host saplxvm07) goes down, the standby (on host saplxvm08) will automatically take
over. The SAP workload is temporarily interrupted but is automatically failed over to the standby database
and continues without stopping the application. The following screenshots show changes in the cluster
during the failover:

Figure 12: Cluster status during failover

As shown in the above figure, the cluster resource for host saplxvm07 is in pending online state as TSA tries
to restart DB2. However, since it is inaccessible (db2sysc was renamed, so it cannot start DB2), failover
occurs, as shown in the figure below:

Figure 13: Cluster status after the primary went down and the standby took over

As shown in the figure above, failover has occurred and the database is up: the resource group db2_db2aha_db2aha_AHA-rg is Online and the application resource IBM.Application:db2_db2aha_db2aha_AHA is Online.

Furthermore, the output shows that IBM.Application:db2_db2aha_db2aha_AHA is running on host saplxvm08, and that the VIP 9.26.166.97 is bound to the network interface on host saplxvm08.

The resource group IBM.ResourceGroup:db2_db2aha_saplxvm07_0-rg has the status Pending Online. This indicates that SA MP cannot start DB2 on saplxvm07 (since db2sysc has been renamed). Once DB2 can be started on saplxvm07, SA MP automatically detects this and assigns saplxvm07 as the principal standby.
The example below shows how the db2sysc file is moved back to make DB2 accessible again.
Example:
saplxvm07:db2aha 74> mv /db2/db2aha/sqllib/adm/db2sysc.backup
/db2/db2aha/sqllib/adm/db2sysc
saplxvm07:db2aha 81> db2start
07/23/2014 11:34:10 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.

After DB2 becomes accessible on host saplxvm07, SA MP assigns it as the principal standby. The auxiliary
standby remains as it is. Once the failed system is brought back up and is back in the cluster, the system will
be in HADR catch up state. All the logs from the current primary (saplxvm08) must be replayed for the
system to be in PEER state. To make saplxvm07 the primary again, HADR takeover can be performed
using the Graceful Maintenance Tool (GMT) as described in section 5.3.2 of this document.
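Before switching the roles back, you can confirm that the re-added standby has reached PEER state and has no replay backlog. A minimal sketch based on the db2pd output shown earlier (the grep pattern is illustrative only):

saplxvm08:db2aha> db2pd -db aha -hadr | grep -E "HADR_STATE|STANDBY_RECV_REPLAY_GAP"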

7.2 Failover scenario #2: Both the primary and principal standby are down
The following scenario describes a situation where both the primary and the principal standby are
unavailable. This is similar to a disaster recovery situation where the auxiliary standby must be brought
online.
When the primary (on host saplxvm07) and the principal standby are both unavailable, all applications will
be stopped as no automatic failover is available. A manual takeover must be initiated from the auxiliary
standby database.
The following figure shows the SA MP resources when both the primary and the principal standby are down.

Figure 14: Cluster status after both the primary and the standby went down

The following steps must be performed to make the auxiliary standby the new primary and to start SAP:
1. Stop the SAP central instance and all application servers.

2. The parameter SAPDBHOST in the SAP profile /sapmnt/AHA/profile/DEFAULT.PFL needs to be
updated to point to the auxiliary standby host saplxvm09. Currently, it is pointing to the virtual host name
saplxvmsap. The new values should look as follows:

saplxvm06:~ # grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL


SAPDBHOST = saplxvm09
j2ee/dbhost = saplxvm09

3. The parameter Hostname in the CLI driver file db2cli.ini needs to be updated to point to the auxiliary
standby host saplxvm09 instead of the virtual host saplxvmsap:

saplxvm06:ahaadm 92> cat /sapmnt/AHA/global/db6/db2cli.ini


; Comment lines start with a semi-colon.
[AHA]
Database=AHA
Protocol=tcpip
Hostname=saplxvm09
Servicename=5912
[COMMON]
Diagpath=/usr/sap/AHA/SYS/global/db6/db2dump

4. The takeover HADR command is executed on the auxiliary standby to make the host saplxvm09 the new primary. The BY FORCE option must be used because the primary and the principal standby are not available.
saplxvm09:db2aha 11> db2pd -db aha -hadr | grep HADR_ROLE
HADR_ROLE = STANDBY
saplxvm09:db2aha 12> db2 TAKEOVER HADR ON DATABASE AHA BY FORCE
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
saplxvm09:db2aha 13> db2pd -db aha -hadr | grep HADR_ROLE
HADR_ROLE = PRIMARY
saplxvm09:db2aha 17> db2pd -db aha -hadr | grep HADR_STATE
HADR_STATE = DISCONNECTED

5. The SAP central instance and application servers can be started.

Note: The host saplxvm09 is now the primary and all applications connect directly to this host. The SA MP cluster and automatic failover are not in effect. Because the auxiliary standby is for DR purposes only and is forced to use the HADR SUPERASYNC synchronization mode, the failover may come at the cost of losing in-flight transactions during this kind of widespread disaster. The takeover operation may take longer depending on the amount of log data to be replayed from the buffer, as well as from disk if log spooling is used.
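The remaining replay work on the auxiliary standby can be estimated from the same db2pd fields shown earlier in this paper; again only a sketch with an illustrative grep pattern:

saplxvm09:db2aha> db2pd -db aha -hadr | grep -E "STANDBY_RECV_REPLAY_GAP|STANDBY_SPOOL_PERCENT"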

Once the old primary, host saplxvm07, and the old principal standby, host saplxvm08, are brought back
up, HADR is not active on those hosts:
saplxvm07:db2aha 102> db2pd -db aha -hadr
Database AHA not activated on database member 0 or this database name cannot be found
in the local database directory.
Option -hadr requires -db <database> or -alldbs option and active database.

saplxvm08:db2aha 65> db2pd -db aha -hadr


Database AHA not activated on database member 0 or this database name cannot be found
in the local database directory.
Option -hadr requires -db <database> or -alldbs option and active database.

The following steps can be performed to include the principal standby and the old primary back into the
HADR cluster.
1. At the moment, the old primary on host saplxvm07 cannot simply be activated, because the auxiliary standby has become the new primary through forced takeover. The database on saplxvm07 still has the HADR role of primary, so any attempt to activate it results in an error since a primary is already running.
saplxvm07:db2aha 108> db2 activate db aha
SQL1776N The command cannot be issued on an HADR database. Reason code = "6".
saplxvm07:db2aha 109> db2 ? SQL1776N
6   This database is an old primary database. It cannot be started
    because the standby has become the new primary through forced
    takeover.

Therefore, HADR must be started on the old primary as a standby:


saplxvm07:db2aha 111> db2 start hadr on db aha as standby
DB20000I The START HADR ON DATABASE command completed successfully.

2. The principal standby on host saplxvm08 can be activated since it is still a standby:

saplxvm08:db2aha 67> db2 activate db aha


DB20000I The ACTIVATE DATABASE command completed successfully.

The changes are reflected in the new primary, on host saplxvm09, as shown in the db2pd output below:

saplxvm09:db2aha 18> db2pd -db aha -hadr


Database Member 0 -- Database AHA -- Active -- Up 0 days 02:53:57 -- Date 2014-07-23-
13.38.18.130220

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm09
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm07
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 13:36:31.238696 (1406136991)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 239
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 17
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000049
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000

LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_REPLAY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 07/23/2014 13:41:54.000000 (1406137314)
READS_ON_STANDBY_ENABLED = N

HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
LOG_STREAM_ID = 0
HADR_STATE = REMOTE_CATCHUP
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm09
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 13:19:45.058717 (1406135985)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 37
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 3
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000049
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_REPLAY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = N

At the moment, host saplxvm09 is the primary, host saplxvm07 is the principal standby, and host
saplxvm08 is the auxiliary standby host. The cluster and automatic failover are still not active. The following
steps can be performed to go back to the original setup with the primary on host saplxvm07, the principal
standby on host saplxvm08, and the auxiliary standby host saplxvm09.
1. The SAP central instance and application servers must be stopped to make changes to the SAP profile
variables.
2. The cluster configuration must be deleted using option 4 - Delete Database Cluster of
sapdb2cluster.sh to avoid interference (SQL1770N).
3. The takeover HADR command is executed in the standby host saplxvm07.

saplxvm07:db2aha 125> db2pd -db aha -hadr | grep HADR_ROLE


HADR_ROLE = STANDBY
saplxvm07:db2aha 126> db2 TAKEOVER HADR ON DATABASE AHA
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
saplxvm07:db2aha 127> db2pd -db aha -hadr | grep HADR_ROLE
HADR_ROLE = PRIMARY

4. The parameter SAPDBHOST in the SAP profile, /sapmnt/AHA/profile/DEFAULT.PFL needs to be


updated to point to the virtual host saplxvmsap instead of saplxvm09. The new values should look as
follows:
saplxvm06:~ # grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL
SAPDBHOST = saplxvmsap
j2ee/dbhost = saplxvmsap

5. The parameter Hostname in the CLI driver file needs to be updated to point to the virtual host
saplxvmsap instead of the saplxvm09:

saplxvm06:ahaadm 92> cat /sapmnt/AHA/global/db6/db2cli.ini


; Comment lines start with a semi-colon.
[AHA]
Database=AHA
Protocol=tcpip
Hostname=saplxvmsap
Servicename=5912
[COMMON]
Diagpath=/usr/sap/AHA/SYS/global/db6/db2dump

6. Now that the original HADR configuration has been restored, option 2 - Create Database Cluster of
sapdb2cluster.sh can be used to create the cluster and enable automatic failover.

Note: The cluster configuration script generates the cluster_config.xml file in the /tmp directory and executes the db2haicu -f /tmp/cluster_config.xml command on the principal standby and the primary hosts. Error messages are logged in the sapdb2cluster.log, /tmp/cluster_config.log, and db2diag.log files.

For errors related to the configuration, it is recommended to check the /tmp/cluster_config.xml file format and values. More details can be found in the IBM Knowledge Center at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0052800.html?lang=en

7. Now the SAP central instance and application servers can be started.

7.3 Failover scenario #3: The principal standby is down

If the principal standby is down, it simply drops out of the HADR cluster. The applications are not affected. The standby only needs to be activated to rejoin the cluster.
Example:
saplxvm08:db2aha 68> db2pd -db aha -hadr

Database AHA not activated on database member 0 or this database name cannot be found
in the local database directory.

Option -hadr requires -db <database> or -alldbs option and active database.

saplxvm08:db2aha 70> db2 activate db aha


DB20000I The ACTIVATE DATABASE command completed successfully.
saplxvm08:db2aha 71> db2pd -db aha -hadr

Database Member 0 -- Database AHA -- Standby -- Up 0 days 00:00:08 -- Date 2014-07-23-15.18.54.335704

HADR_ROLE = STANDBY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 0
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = saplxvm07
PRIMARY_INSTANCE = db2aha
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = saplxvm08
STANDBY_INSTANCE = db2aha
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 15:18:49.837590 (1406143129)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 0
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 0
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 1.446530
LOG_HADR_WAIT_ACCUMULATED(seconds) = 230.194
LOG_HADR_WAIT_COUNT = 1093
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904
STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)
STANDBY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)

STANDBY_REPLAY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 240
PEER_WINDOW_END = 07/23/2014 15:16:10.000000 (1406142970)
READS_ON_STANDBY_ENABLED = N

Note: Once HADR has been set up properly with multiple standbys, it is recommended to exercise different failover
scenarios and develop documented procedures to follow during an actual disaster.

8 Miscellaneous troubleshooting in an SA MP
environment
This section introduces some useful commands for operations within an SA MP environment.

8.1 HADR congestion


DB2 HADR works by sending database logs via TCP/IP from the primary to the standby. On the standby, the logs are stored in a buffer whose size is controlled by the DB2 registry variable DB2_HADR_BUF_SIZE. If the standby cannot keep up with the amount of log data being shipped by the primary, the buffer might fill up, in which case the primary can no longer ship logs. The HADR connection is then in a CONGESTED state.
Example:
saplxvm07:db2aha 59> db2pd -db aha -hadr | grep -i HADR_CONNECT_STATUS
HADR_CONNECT_STATUS = CONGESTED
HADR_CONNECT_STATUS_TIME = 07/23/2014 15:18:49.835990 (1406143129)

Note: In this case, the HADR_CONNECT_STATUS_TIME shows the congestion start time.

While in a CONGESTED state and with a synchronization mode of SYNC, NEARSYNC, or even ASYNC, transactions
on the primary are stopped. With SUPERASYNC, transactions continue while the cluster is in a CONGESTED
state.

Note: To solve this issue, it is recommended to increase DB2_HADR_BUF_SIZE on the standby (as well as on the primary in case it becomes the standby after a takeover). By default, DB2_HADR_BUF_SIZE is twice the size of the primary's LOGBUFSZ, which is also the minimum value.

As mentioned earlier, HADR_SPOOL_LIMIT can be used to avoid HADR congestion. HADR_SPOOL_LIMIT allows
the standby to spool logs to disk if the buffer is full. This means that the primary can continue with transactions
without having to wait for the standby to flush out the logs from the buffer. This is especially effective if congestion
is occurring during peak time.
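A minimal sketch of raising the spool limit on the standby; the value of 25000 4-KB pages is illustrative only, and the parameter should also be set on the primary so that it remains effective after a role switch:

saplxvm08:db2aha> db2 "UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 25000"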

Example:
saplxvm07:db2aha 61> db2 get db cfg for aha | grep -i LOGBUFSZ
Log buffer size (4KB) (LOGBUFSZ) = 1024
saplxvm07:db2aha 63> db2 get db cfg for aha | grep -i HADR_SPOOL_LIMIT
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

Note: DB2 HADR-related registry variables can be checked using the db2set command. Unless the registry variables
are set to a user-defined value, the default values are used for each variable.

Example:
saplxvm07:db2aha 58> db2set -lr | grep -i hadr
DB2_HADR_BUF_SIZE
DB2_HADR_NO_IP_CHECK
DB2_HADR_PEER_WAIT_LIMIT
DB2_HADR_SOSNDBUF
DB2_HADR_SORCVBUF
DB2_HADR_ROS
saplxvm07:db2aha 50> db2 get db cfg for aha | grep LOGBUFSZ
Log buffer size (4KB) (LOGBUFSZ) = 1024
saplxvm07:db2aha 51> db2set DB2_HADR_BUF_SIZE=3072
saplxvm07:db2aha 52> db2set | grep HADR
DB2_HADR_BUF_SIZE=3072

More on the DB2 HADR log shipping method can be found under the following link:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20log%20shipping
For DB2 Version 10.5 and higher, db2fodc also provides an option to collect congestion-related traces
automatically. The automatic congestion trace can be turned on and off using the following commands:
Example:
saplxvm07:db2aha 54> db2fodc -hadr -db AHA -detect
"db2fodc": Starting detection ...

db2fodc HADR congestion detect rules:


iteration=1 sleeptime=0(sec) triggercount=10 interval=30(sec) duration=-
1(hour)

db2fodc:
Hostname: saplxvm07 HADR congestion detect iteration: 1
saplxvm07:db2aha 50> db2fodc -detect off
"db2fodc": Stopping all FODC detections. Note that it can take up to 60
seconds to stop all detections.

More on the DB2 HADR automatic congestion detection tool can be found at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/r0060632.html?lang=en

8.2 Manual creation and deletion of an SA MP cluster


In case of an error during cluster creation using SAP's Cluster Setup Tool sapdb2cluster.sh, the DB2 tool db2haicu can be used instead. The XML configuration file cluster_config.xml can be used with db2haicu to manually create the cluster. An example of the file was provided earlier in section 5.2.2. The cluster must be created on the standby as user db2<sid> first, followed by the primary, as shown in the example below.

Example:
saplxvm08:db2aha 57> db2haicu -f /tmp/cluster_config.xml
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file
called db2diag.log. Also, you can use the utility called db2pd to query the status of
the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see
the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in
the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2aha'. The
cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some
time as db2haicu will need to activate all databases for the instance to discover all
paths ...
Creating domain 'sap_ahadb2' in the cluster ...
Creating domain 'sap_ahadb2' in the cluster was successful.

Configuring quorum device for domain 'sap_ahadb2' ...
Configuring quorum device for domain 'sap_ahadb2' was successful.
Adding network interface card 'eth0' on cluster node 'saplxvm07' to the network
'db2network' ...
Adding network interface card 'eth0' on cluster node 'saplxvm07' to the network
'db2network' was successful.
Adding network interface card 'eth0' on cluster node 'saplxvm08' to the network
'db2network' ...
Adding network interface card 'eth0' on cluster node 'saplxvm08' to the network
'db2network' was successful.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.
HADR database 'AHA' has been determined to be valid for high availability. However,
the database cannot be added to the cluster from this node because db2haicu detected
this node is the standby for HADR database 'AHA'. Run db2haicu on the primary for
HADR database 'AHA' to configure the database for automated failover.
All cluster configurations have been completed successfully. db2haicu exiting ...

saplxvm07:db2aha 57> db2haicu -f /tmp/cluster_config.xml


Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file
called db2diag.log. Also, you can use the utility called db2pd to query the status of
the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see
the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in
the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2aha'. The
cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some
time as db2haicu will need to activate all databases for the instance to discover all
paths ...
Configuring quorum device for domain 'sap_ahadb2' ...
Configuring quorum device for domain 'sap_ahadb2' was successful.
Network adapter 'eth0' on node 'saplxvm07' is already defined in network 'db2network'
and cannot be added to another network until it is removed from its current network.
Network adapter 'eth0' on node 'saplxvm08' is already defined in network 'db2network'
and cannot be added to another network until it is removed from its current network.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.
Adding HADR database 'AHA' to the domain ...
Adding HADR database 'AHA' to the domain was successful.
All cluster configurations have been completed successfully. db2haicu exiting ...

saplxvm07:db2aha 49> db2haicu -delete


Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

Removing HADR database 'AHA' from the domain ...


Removing HADR database 'AHA' from the domain was successful.
Removing DB2 database partition '0' from the cluster ...
Removing DB2 database partition '0' from the cluster was successful.
All cluster configurations have been completed successfully. db2haicu exiting ...
saplxvm08:db2aha 52> db2haicu -delete > /tmp/db2haicu_delete.txt

8.3 SA MP cluster resource group
The rgreq command can be used to start, stop, cancel, lock, unlock, or move an SA MP resource group.
Example:
The following command is used to unlock the resource group db2_db2aha_db2aha_AHA-rg:

saplxvm08:~ # rgreq -o unlock db2_db2aha_db2aha_AHA-rg

Note: Because of APAR IC98315: VIRTUAL IP RESOURCE (IBM.SERVICEIP) CANNOT BE FOUND, one of the
resource groups in the DB2 HADR cluster configuration may remain locked after a graceful cluster switch in DB2
10.5 GA. The issue has been resolved in DB2 10.5 FP1.
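A move request follows the same syntax and can be used, for example, to move the resource group to the other
cluster node. The following is a sketch only: in a DB2 HADR cluster such a move typically triggers an HADR role
switch, so it should only be issued while the HADR pair is in PEER state, and the Graceful Maintenance Tool is
normally the preferred way to perform a planned switch.

Example:
# Request that SA MP move the resource group to the other cluster node
# (typically results in an HADR takeover):
rgreq -o move db2_db2aha_db2aha_AHA-rg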

8.4 Collection of traces


The lssam command with the -T option can be used to write trace messages to the screen.

saplxvm07:~ # lssam -T

Traces can also be collected for a particular resource manager (RM) using the lssrc tool.
lssrc -ls IBM.RecoveryRM
lssrc -ls IBM.GblResRM
lssrc -ls IBM.StorageRM

To collect a trace, first find out where the trace files are located using the following commands:
saplxvm07:~ # lssrc -ls IBM.RecoveryRM | grep trace_summary
/var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary -> spooling not enabled
saplxvm07:~ # lssrc -ls IBM.GblResRM | grep trace_summary
/var/ct/sap_ahadb2/log/mc/IBM.GblResRM/trace_summary -> spooling not enabled

The following commands can be used to format the traces into more readable text and store the output in the
specified location /tmp:
saplxvm07:~ # rpttr -odtic /var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary >
/tmp/RecoveryRM_trace.out
saplxvm07:~ # rpttr -odtic /var/ct/sap_ahadb2/log/mc/IBM.GblResRM/trace_summary >
/tmp/GblResRM_trace.out

The samlog command is a handy tool that can be used to collect, format, merge, and display SA MP-related
logs.
Example:
saplxvm07:~ # samlog -t 15m | more
samlog called at 2014-09-25 15:30:35 on saplxvm07 with options
System time offset between local host and saplxvm08 is +5.29 seconds. You may adjust system times in cluster.
saplxvm07 0.00 IBM.RecoveryRM trace_summary, IBM.GblResRM trace_summary
saplxvm08 +5.29 IBM.RecoveryRM trace_summary, IBM.GblResRM trace_summary
-------------------------------------------------------------------------

A list of IBM Tivoli System Automation command references can be found using the following link:
http://www-01.ibm.com/support/knowledgecenter/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/samprgcharmcmds.html?lang=en

8.5 HADR simulator
The DB2 HADR simulator can help plan, measure, and diagnose an HADR environment quickly and
efficiently. The tool can be downloaded for free from the IBM developerWorks wiki page:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator?section=Introduction.
The wiki page also provides a detailed description of how to use the simulator.

8.6 Split-brain condition


If both databases in an HADR cluster become primaries independently, the HADR connection is lost and
applications can connect to both databases. The databases will be inconsistent and this condition is referred
to as an HADR split-brain. The following situations can lead to a split-brain condition:
- The standby becomes the new primary, and the original primary is brought back up using the START HADR
  command with the AS PRIMARY BY FORCE option.
- The TAKEOVER HADR command is issued on the standby with the PEER WINDOW ONLY option, and the primary
  is not brought down before the peer window expires.

After a forced takeover, the HADR-related configuration parameters (hadr_remote_host,
hadr_remote_inst, and hadr_remote_svc) are automatically updated on the new primary and its standbys,
including the old primary. However, if the old primary is not shut down before the forced takeover from the
standby, a split-brain condition can result, because the automatic reconfiguration does not take place until the
old primary is shut down and restarted as a standby.
To avoid a split-brain condition during a forced takeover, the standby sends a disabling message, also called a
poison pill, to the primary. The primary is shut down and cannot be reactivated until the poison pill is
cleared by a START HADR command. More information on proper takeover scenarios in a multiple standby
HADR configuration can be found at the following link:
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0059999.html?lang=en
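For illustration, the following sketch shows such a forced takeover with the PEER WINDOW ONLY option and the
subsequent reintegration of the old primary, using the example database AHA from this paper; the exact steps
required to reintegrate the old primary depend on how far it has diverged.

Example:
# On the standby, perform a forced takeover that honors the peer window:
db2 takeover hadr on db AHA by force peer window only
# Once the old primary host is available again, restart the database there as a
# standby; this START HADR command also clears the poison pill:
db2 start hadr on db AHA as standby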

Note: Once a split-brain condition is encountered, HADR must be configured from scratch using a database backup from
the host that has the most up-to-date logs. In such a situation, it is recommended to contact SAP support.

9 Conclusion
The DB2 HADR feature delivers a complete Disaster Recovery (DR), High Availability (HA), and Continuous
Availability solution, providing customers with greater data protection at minimal performance impact.
HADR also comes with a variety of configuration options to satisfy different business needs.
For example, SYNC mode can be used for guaranteed database log shipping, whereas SUPERASYNC mode has
virtually no performance impact on the primary because the primary does not wait for an acknowledgement after
log data is shipped. Log spooling helps achieve higher performance, a reduced chance of congestion, and greater
data protection.
Automatic failover using SA MP, together with SAP's Cluster Setup Tool and Graceful Maintenance Tool, makes it
easy for customers to monitor and maintain the HADR cluster.
With HADR support for IBM DB2 BLU Acceleration, DB2 HADR can now be used in SAP BW environments
with DB2 column-organized tables, providing high performance as well as crucial DR and HA capabilities.
Moreover, DB2 HADR and BLU Acceleration are provided as DB2 features and can be enabled out of the box.
With improvements to the DB2 LOAD command, customers can now take advantage of the faster LOAD
operation without compromising data replication.
Under ideal conditions, the tools and processes described in this paper can be used to implement an SAP
Business Suite system that is continuously available, protected from widespread disasters, and able to use
micro-outages to perform database maintenance with virtually zero downtime.

10 Related Content
Note 1555903 - DB6: Supported DB2 Database Features
Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR)
Note 1746101 - DB6: High Availability with SAP on DB2 using SA MP
Note 1443426 - DB6: Graceful Cluster Switch
Note 960843 - DB6: Cluster Setup Tool
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR

Copyright
© 2014 SAP SE or an SAP SE affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any
form or for any purpose without the express permission of SAP SE.
The information contained herein may be changed without prior notice.
Some software products marketed by SAP SE and its distributors contain proprietary software components
of other software vendors. National product specifications may vary.
These materials are provided by SAP SE and its affiliated companies (SAP SE Group) for informational
purposes only, without representation or warranty of any kind, and SAP SE Group shall not be liable for
errors or omissions with respect to the materials. The only warranties for SAP SE Group products and
services are those that are set forth in the express warranty statements accompanying such products and
services, if any. Nothing herein should be construed as constituting an additional warranty.
SAP SE and other SAP SE products and services mentioned herein as well as their respective logos are
trademarks or registered trademarks of SAP SE in Germany and other countries.
Please see
http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark
for additional trademark information and notices.

