
IBM

SAP R/3 on DB2 for OS/390: Disaster Recovery


Jan Baisden, Judy Ruby-Brown, Stephanie Schmidt

International Technical Support Organization

http://www.redbooks.ibm.com

SG24-5343-00

October 1999
Take Note!

Before using this information and the product it supports, be sure to read the general information in
Appendix C, “Special Notices” on page 97.

First Edition (October 1999)

This edition applies to SAP R/3 on DB2 for OS/390, SAP R/3 Release 4.0B, OS/390 Release 2.6 (5645-001), AIX
Release 4.3.1 (5765-603), and IBM DATABASE 2 Server for OS/390 Version 5.1 (DB2 for OS/390) Release 4.1
(5655-DB2), and to all subsequent releases and modifications until otherwise indicated in new editions or
Technical Newsletters.

Comments may be addressed to:


IBM Corporation, International Technical Support Organization
Dept. HYJ Mail Station P099
522 South Road
Poughkeepsie, New York 12601-5400

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any
way it believes appropriate without incurring any obligation to you.

© Copyright International Business Machines Corporation 1999. All rights reserved.


Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is
subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents

Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The Team That Wrote This Redbook . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments Welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Chapter 1. Introduction to SAP R/3 . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Overview of SAP R/3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 SAP R/3's Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 SAP R/3's Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Database Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 OS/390 Features for SAP R/3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 DB2 for OS/390 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 SAP R/3 on DB2 for OS/390 Structure . . . . . . . . . . . . . . . . . . . . . 9
1.6 Network Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.1 Physical Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.2 Communication Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Chapter 2. Disaster Recovery Planning . . . . . . . . . . . . . . . . . . . . . . . 15


2.1 Risk Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Design Process of a Disaster Recovery Solution . . . . . . . . . . . . . . . 17
2.3 Scope of the Recovery Solution . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Data Backup and Recovery Processes . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Data Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.2 Data Backup Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.3 Data Transport and Secure Storage . . . . . . . . . . . . . . . . . . . . 24
2.4.4 Readiness of the Recovery Site . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Managing and Operating the Recovery Site . . . . . . . . . . . . . . . . . . 27
2.6 Description of the Recovery Configuration . . . . . . . . . . . . . . . . . . . 28
2.6.1 Distance between Prime Site and Recovery Site . . . . . . . . . . . . 28
2.6.2 Configuration Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6.3 Network Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6.4 Workload Distribution across Two Sites . . . . . . . . . . . . . . . . . . 30

Chapter 3. Hardware, Software and Communications Configurations . . . . . 33


3.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 Database Server on S/390 . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.2 Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.3 Peripheral Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 OS/390 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 AIX and Windows NT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.3 SAP R/3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.4 Complementary Software . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.1 Communication between Database Server and Application Server . 41
3.3.2 Communication - Application Server to Presentation Servers . . . . . 42

Chapter 4. Backup/Recovery Considerations in Disaster Recovery . . . . . . 45



4.1 Utilities of the DB2 for OS/390 Environment . . . . . . . . . . . . . . . . . . 45
4.1.1 The COPY Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.2 OS/390 DFSMSdss (Data Set Dump/Restore) . . . . . . . . . . . . . . . 48
4.2 Backup of Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.1 Backup of Application Servers on RS/6000 . . . . . . . . . . . . . . . . 49
4.2.2 Backup of Application Servers on RS/6000 SP . . . . . . . . . . . . . . 49
4.2.3 Backup of Application Servers on Windows NT . . . . . . . . . . . . . 50
4.2.4 Transport and Correction System . . . . . . . . . . . . . . . . . . . . . 50
4.3 Recovery Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4 DB2 for OS/390 Point-in-time Disaster Recovery . . . . . . . . . . . . . . . 51
4.4.2 Point-in-Time Recovery Using DB2 Conditional Restart . . . . . . . . 53
4.4.3 Preparing for Disaster Recovery . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Recovery of Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.5.1 Recovery of Application Servers on RS/6000 . . . . . . . . . . . . . . . 58
4.5.2 Recovery of Application Servers on RS/6000 SP . . . . . . . . . . . . . 58
4.5.3 Recovery of Application Servers on Windows NT . . . . . . . . . . . . 59
4.5.4 Installing New Application Servers . . . . . . . . . . . . . . . . . . . . . 59

Chapter 5. Restarting from Remote Locations . . . . . . . . . . . . . . . . . . . 61


5.1 Components at the Remote Site . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.1 Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.2 Database Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.3 Connectivity Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.4 Communications Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.5 Peripheral Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.6 Workstations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Steps to Recover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Remote Site Recovery from Disaster at a Local Site . . . . . . . . . . . . . 64
5.3.1 Steps to Recover (Non-Data Sharing) . . . . . . . . . . . . . . . . . . . 64
5.3.2 Steps to Recover (Data Sharing Only) . . . . . . . . . . . . . . . . . . . 72
5.4 Advanced Disaster Recovery Planning . . . . . . . . . . . . . . . . . . . . . 85
5.4.1 Backup and Vaulting Procedures . . . . . . . . . . . . . . . . . . . . . . 85
5.4.2 Using a Tracker Site for Disaster Recovery . . . . . . . . . . . . . . . 85
5.4.3 Geographically Dispersed Parallel Sysplex (GDPS) . . . . . . . . . . . 87
5.4.4 Asynchronous Remote Copy (XRC) Procedures . . . . . . . . . . . . . 88

Appendix A. Items Needed at the Recovery Site . . . . . . . . . . . . . . . . . . 91


A.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.3 Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A.4 Plans and Manuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

Appendix B. Sample Backup and Vaulting Plans . . . . . . . . . . . . . . . . . 95

Appendix C. Special Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

Appendix D. Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . 99


D.1 International Technical Support Organization Publications . . . . . . . . . 99
D.2 Redbooks on CD-ROMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
D.3 Other Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

How to Get ITSO Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101


IBM Redbook Fax Order Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103



Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

ITSO Redbook Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

Figures

1. SAP R/3 Basis and Application Layers . . . . . . . . . . . . . . . . . . . . 2


2. SAP R/3 Three-Tier Architecture . . . . . . . . . . . . . . . . . . . . . . . . 3
3. SAP R/3 Transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4. Structure of SAP R/3 on DB2 for OS/390 . . . . . . . . . . . . . . . . . . . 9
5. ICLI Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6. Examples of Physical Connections . . . . . . . . . . . . . . . . . . . . . . . 12
7. Disaster Frequency by Type . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8. On-Site Recovery versus Disaster Recovery . . . . . . . . . . . . . . . . . 17
9. Trade-offs in Disaster Recovery Solution Design . . . . . . . . . . . . . . 20
10. Data Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
11. Network Connections in Normal Operation . . . . . . . . . . . . . . . . . . 42
12. Network Connections in Post-Disaster Situations . . . . . . . . . . . . . . 43



Tables

1. Actions to Take When LOAD Is Interrupted . . . . . . . . . . . . . . . . . . 83


2. Actions to Take When REORG Is Interrupted . . . . . . . . . . . . . . . . . 83
3. Vaulting Procedure 24 Hours Data Currency . . . . . . . . . . . . . . . . . 95
4. Vaulting Procedure 30 Minutes Data Currency . . . . . . . . . . . . . . . 95



Preface

This redbook will help you plan and install a disaster recovery solution for SAP
R/3 on DB2 for OS/390. It is one in a series of redbooks about the SAP R/3 on
DB2 for OS/390 solution and the associated environment.

Many companies are migrating their critical business applications from the
legacy mainframe environment to SAP R/3. As these companies depend on
information and processes managed by SAP R/3, the availability of the SAP R/3
system, even in the case of a disaster, is essential.

In the context of this book, a disaster is defined as an extended service
interruption of the information technology (IT) services of an organization that
cannot be corrected within an acceptable predetermined time frame, and that
necessitates the use of alternate equipment or an alternate site for recovery.

In this redbook we discuss general issues of disaster recovery, backup and
restore processes for SAP R/3 on DB2 for OS/390, and the disaster recovery
process for SAP R/3 on DB2 for OS/390.

The Team That Wrote This Redbook


This redbook was produced by a team of specialists from around the world
working at the International Technical Support Organization, Poughkeepsie
Center.

Jan Baisden is a Senior Market Support Representative at the International
Technical Support Organization, Poughkeepsie Center. He concentrates on ERP
Solutions and particularly on AIX support of S/390 uses in those solutions.
Before joining the ITSO five years ago, Jan worked in the International Systems
Center in Gaithersburg, Maryland, supporting projects in Latin America and Asia.

Judy Ruby-Brown is a Certified Consulting I/T Specialist for IBM's Advanced
Technical Support in the Dallas Systems Center. She has worked at IBM for 25
years. For the last 10 years, she has supported DB2 for OS/390 at the Dallas
Systems Center. Her focus has been on disaster recovery, high availability, and
DB2 Data Sharing in the Parallel Sysplex. She has presented on these subjects
at many professional conferences and has been a reviewer for other redbooks.

Stephanie Schmidt is a Business Recovery Consultant with IBM Global Services
in Germany. She develops Business Recovery strategies for customers running
SAP R/3 on AIX. Before joining Business Recovery Services, she was a Services
Specialist for SAP R/3 on AIX and was involved in planning and implementing
SAP R/3. Stephanie has concentrated on systems management, data backup,
and high availability.

Thanks to the following people for their invaluable contributions to this project:

International Technical Support Organization, Poughkeepsie Center
Dave Benin
Rich Conway
Vasilis Karras
Frank Kyne
Gregor Neaga
Ken Trowell
Bill White

IBM Poughkeepsie
Mary Ellen Cowles
Mary Ann Ritosa

International Technical Support Organization, San Jose Center
Alison Pate

IBM Germany
Namik Hrle
Andreas Maier

IBM United Kingdom
David Clitherow

IBM Dallas Systems Center
Lee Siegmund

IBM Santa Teresa Laboratory
Jeanne Kays

Comments Welcome
Your comments are important to us!

We want our redbooks to be as helpful as possible. Please send us your
comments about this or other redbooks in one of the following ways:
• Fax the evaluation form found in “ITSO Redbook Evaluation” on page 107 to
the fax number shown on the form.
• Use the online evaluation form found at http://www.redbooks.ibm.com/
• Send your comments in an Internet note to redbook@us.ibm.com



Chapter 1. Introduction to SAP R/3

This chapter provides a general description of SAP R/3 and an overview of the
architecture of that system. The specific solution SAP R/3 on DB2 for OS/390 is
then explained in terms of the SAP R/3 architecture. Since DB2 for OS/390 is
such an important part of this solution, features of DB2 for OS/390 are introduced
as a foundation for later chapters. For more details about SAP R/3 on DB2 for
OS/390, see Implementing SAP R/3 in an OS/390 Environment Using AIX and
Windows NT Application Servers, SG24-4945, SAP R/3 on DB2 for OS/390:
Planning Guide SAP R/3 Release 4.0B, SC33-7962, and SAP R/3 on DB2 for
OS/390: Connectivity Guide, SC33-7965.

1.1 Overview of SAP R/3


SAP R/3's suite of client/server data processing products is based on the
concept of combining all the business activities and technical processes of a
company into a single, integrated software solution. The power of SAP software
lies in real-time integration, linking a company's business processes and
applications, and supporting immediate responses to change throughout the
organization, on a departmental, divisional, or global scale. Its applications
cover a wide variety of areas, including financial, asset management, controlling,
production planning, project system, quality assurance, and human resources.

1.1.1 SAP R/3's Architecture


SAP R/3 is designed around software services rather than hardware platforms.
Note the distinction between software services, which logically have no
dependency on hardware, and servers, which are machines. There are three
categories of services:
• Presentation Services:
SAP R/3 graphical interfaces on Windows, OS/2, Macintosh, Motif, or Java
platforms.
• Application Services:
SAP R/3 application logic running on one or more systems, including batch
and interactive SAP programs. SAP R/3 also provides monitoring utilities.
• Database Services:
Vendor-provided database systems. SAP R/3 uses the database systems to
store data from various application servers.

An application service is designed and implemented in layers, isolating the SAP
R/3 application logic from the operating system-dependent services. A
middleware layer, called the basis layer, communicates with the operating
system and the network. Figure 1 on page 2 illustrates the layering of the
application service.



Figure 1. SAP R/3 Basis and Application Layers

The client/server architecture employed by SAP R/3 removes many configuration
details about the network and individual machines in use by SAP R/3 from
administration tasks. In addition, SAP has been careful to implement this
architecture at the level of work items and services. An application task can be
scheduled on any application server within the same SAP system. Further, this
application task can use services on the server it is on and from other servers in
the SAP system. The SAP R/3 system routes these requests to the appropriate
server.

Services are provided by administrative units called instances that group
together components of SAP R/3. When first installed, the SAP R/3 system has a
central instance, which has services such as dialog, update, enqueue, batch,
message, gateway, and spool. After installation, these services can be moved or
replicated to other application servers in order to balance workloads.

This architecture allows a more dynamic approach to managing workloads,
because customers can organize their SAP R/3 systems into tiers. Some
installations have the application services, database service, and presentation
service on the same machine. This is a single-tier system. Usually, though, the
presentation service is moved to workstations, making the SAP R/3 system a
two-tier system. Others wish to optimize database performance; they place the
database service on a system separate from the other services. This is a
three-tier system. Figure 2 on page 3 illustrates the three-tiered architecture of
SAP R/3.



Figure 2. SAP R/3 Three-Tier Architecture
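The tier terminology described above can be made concrete with a small sketch. The Python fragment below is illustrative only: the function name count_tiers and all host names are hypothetical and are not part of SAP R/3. It assumes that service categories running on exactly the same set of machines form one tier.

```python
def count_tiers(placement):
    """Return the number of tiers implied by a service-to-hosts placement.

    placement maps each service category to the set of machines running it.
    Categories that share exactly the same machines are counted as one tier.
    """
    required = {"presentation", "application", "database"}
    if set(placement) != required:
        raise ValueError(f"placement must define exactly: {sorted(required)}")
    distinct_groups = {frozenset(hosts) for hosts in placement.values()}
    return len(distinct_groups)


# Hypothetical layouts matching the three cases described in the text.
single = {"presentation": {"host1"}, "application": {"host1"}, "database": {"host1"}}
two = {"presentation": {"pc1", "pc2"}, "application": {"host1"}, "database": {"host1"}}
three = {"presentation": {"pc1", "pc2"}, "application": {"aix1", "nt1"}, "database": {"os390"}}

for name, layout in (("single", single), ("two", two), ("three", three)):
    print(name, count_tiers(layout))
```

The SAP R/3 on DB2 for OS/390 configuration discussed in this book corresponds to the third layout: workstations, AIX or Windows NT application servers, and an OS/390 database server.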

SAP R/3 customers can use SAP-supplied utilities to add more machines for
application and presentation services to the existing SAP R/3 system. Thus, SAP
R/3 can support centralized or decentralized computing with its distributed
client/server architecture.

This architecture is implemented with a group of services designed to provide
simple, consistent interfaces to SAP R/3 programs and support the largest
possible number of operating systems, databases and networks.

In multi-tier configurations, TCP/IP is usually used for network communication.
As explained in 1.5, “SAP R/3 on DB2 for OS/390 Structure” on page 9, SAP R/3
on DB2 for OS/390 is a three-tier structure in which DB2 for OS/390 and the
OS/390 operating system supply database services. The network communication
between the application services (running on AIX or Windows NT) and DB2 for
OS/390 is handled by a special high-speed protocol. There are three different
types of this high-speed protocol, as explained in 1.6.2, “Communication
Protocols” on page 13. Note that communication between application servers
and presentation servers is still done through TCP/IP.

1.1.2 SAP R/3's Applications


SAP AG provides a variety of software packages for handling applications with
SAP R/3. The application software packages (and the common SAP AG
abbreviations for them) presently available are:
SD Sales & Distribution
MM Materials Management
PP Production Planning
QA Quality Assurance
PM Plant Maintenance
HR Human Resources
FI Financial Accounting
CO Controlling
AM Fixed Assets Management
PS Project System
OC Office and Communication

These applications are contained within one database, but not all are used by
any one company. Some companies use only one of the applications; others
have several SAP R/3 systems for different applications. It is obvious that the
way an SAP R/3 system is managed depends on the criticality of the applications
used by that company.

In the Industry Solutions (IS) area, SAP AG makes a broad range of specific
industry solutions available. These are based on standard SAP R/3 applications.
To see SAP AG offerings for industries, consult the SAP Web site at:
http://www.sap-ag.de
This should lead to a page for “Industry Solutions” currently found at:
http://www.sap-ag.de/products/industry/index.htm
For any specific application package, SAP AG should be consulted regarding
availability on a specific hardware or operating system configuration.

1.2 Database Integrity


Two principles that are basic to database integrity are locking and the concept of
a unit of work.

Locking prevents concurrent users from accessing inconsistent data. For
example, by getting a lock for a database record before modifying it, you prevent
more than one user transaction from modifying the same record at the same
time.



Locking in SAP R/3 is done by the Enqueue Server that runs as a process in the
central instance of the SAP Application Server. It is the application's
responsibility to request an enqueue (ENQ) for a data object before accessing it.
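The enqueue idea can be illustrated with a toy lock table. This is a minimal sketch in Python, not the SAP Enqueue Server: the class EnqueueTable, its method names, and the object keys are hypothetical. It shows only the principle that a second transaction is refused a lock on an object that another transaction already holds.

```python
class EnqueueTable:
    """Toy in-memory lock table: at most one owner per locked object."""

    def __init__(self):
        self._owners = {}  # object key -> id of the transaction holding the lock

    def enqueue(self, obj_key, owner):
        """Try to lock obj_key for owner; return True on success."""
        current = self._owners.get(obj_key)
        if current is not None and current != owner:
            return False  # another transaction already holds the lock
        self._owners[obj_key] = owner
        return True

    def dequeue_all(self, owner):
        """Release every lock held by owner."""
        for key in [k for k, o in self._owners.items() if o == owner]:
            del self._owners[key]


locks = EnqueueTable()
first = locks.enqueue(("ORDERS", 4711), "txn-A")   # txn-A obtains the lock
second = locks.enqueue(("ORDERS", 4711), "txn-B")  # txn-B is refused
locks.dequeue_all("txn-A")                         # txn-A releases its locks
third = locks.enqueue(("ORDERS", 4711), "txn-B")   # txn-B now succeeds
print(first, second, third)
```

As in the text, the sketch relies on the application asking for the lock before it touches the data; nothing in the table prevents access by a program that skips the enqueue call.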

Changes to a database made by an application are not visible to other
applications until the changes are committed. The application can commit its
changes explicitly, or the commit can occur implicitly when the application
terminates. If the application terminates abnormally, then all changes to the
database made since the last commit point are “rolled back.” The processing
between the start of the application and the commit point is called a logical unit
of work (LUW). Database integrity is maintained by ensuring that all changes to
a database made during a unit of work are either committed or rolled back;
there can be no partial updates.
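The all-or-nothing behavior of a unit of work can be demonstrated with any transactional database. The sketch below uses SQLite through Python's standard sqlite3 module purely as a stand-in for DB2 for OS/390; the table name account, the function names, and the simulated failure are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as a single unit of work."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated abnormal termination")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()    # both updates become permanent together
    except Exception:
        conn.rollback()  # changes since the last commit point are undone
        raise


def balances(conn):
    """Return the current balances as {id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


transfer(conn, 1, 2, 30)
after_commit = balances(conn)
try:
    transfer(conn, 1, 2, 50, fail_midway=True)
except RuntimeError:
    pass
after_rollback = balances(conn)
print(after_commit, after_rollback)
```

Both snapshots show account 1 at 70 and account 2 at 30: the debit made by the failed transfer is rolled back, so no partial update survives.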

Each dialog step of a business transaction in SAP R/3 can be processed on a
different work process in the application server host, and would therefore use a
different ICLI thread to communicate with DB2. The database changes made in
a dialog step are all committed at the end of the dialog step. This is acceptable
to some applications, but most business transactions consist of multiple dialog
steps. In other words, the SAP R/3 LUW is expected to consist of the complete
business transaction, while DB2 treats each dialog step as a discrete unit of
work. The SAP R/3 system has special Update Services to manage this
difference in scope; see Figure 3 on page 6.



• Steps 1, 3, and 4 intend to update the application database; the
actual writing is to VBLOG (or the set of tables used for VBLOG:
VBHDR, VBMOD, and VBDATA).
• Step 2 is user interaction; no database processing occurs.
• Step 4 performs an insert with no (additional) database input.
• The SAP R/3 Enqueue Server has the responsibility to lock out other users'
access to affected application data from the time the intent to update
is recognized until the SAP R/3 COMMIT WORK is complete.
• Note the difference in the DB2 units of work (DB2 uw1-uw4) and the
SAP R/3 LUW (the entire dialog).
• This is a common technique used in SAP R/3 programming; however,
it is not the only way dialogs function.

Figure 3. SAP R/3 Transaction

An SAP R/3 LUW starts when the transaction starts. As the transaction changes
data, all updates are consolidated through VB Protocol entries in a database
table called VBLOG (or VB tables that replace VBLOG: VBHDR, VBMOD, and
VBDATA). At the end of each dialog step, these updates to VBLOG are committed
to DB2. In the last dialog step of the business transaction, an SAP COMMIT
WORK is executed by the ABAP/4 program.

The execution of the transaction now moves to the update task, which processes
all of the entries in VBLOG (or the VB tables) for this SAP LUW. All of the
changes to all of the databases modified by this SAP LUW are made at this time,
in a single DB2 unit of work. The entries in the VBLOG (or the VB tables) are
deleted in this same unit of work.
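The deferred-update technique described above can be sketched in a few lines. The example below is a simplified model, not SAP code: SQLite (Python's standard sqlite3 module) stands in for DB2, the tables vblog and app_data are reduced to a few hypothetical columns, and the function names dialog_step and update_task are invented for illustration. Each dialog step commits only log entries; the update task later applies them and deletes them in one database unit of work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_data (item TEXT PRIMARY KEY, value TEXT)")
conn.execute(
    "CREATE TABLE vblog (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " luw_id TEXT, item TEXT, value TEXT)"
)
conn.commit()


def dialog_step(conn, luw_id, changes):
    """Record intended changes in vblog; one database unit of work per step."""
    conn.executemany(
        "INSERT INTO vblog (luw_id, item, value) VALUES (?, ?, ?)",
        [(luw_id, item, value) for item, value in changes],
    )
    conn.commit()


def update_task(conn, luw_id):
    """Apply all logged changes of one LUW and delete them, in one unit of work."""
    try:
        rows = conn.execute(
            "SELECT item, value FROM vblog WHERE luw_id = ? ORDER BY seq", (luw_id,)
        ).fetchall()
        for item, value in rows:
            conn.execute(
                "INSERT OR REPLACE INTO app_data (item, value) VALUES (?, ?)",
                (item, value),
            )
        conn.execute("DELETE FROM vblog WHERE luw_id = ?", (luw_id,))
        conn.commit()
    except Exception:
        conn.rollback()
        raise


dialog_step(conn, "luw-1", [("order-1", "created")])
dialog_step(conn, "luw-1", [("stock-7", "reserved")])
visible_before = conn.execute("SELECT COUNT(*) FROM app_data").fetchone()[0]
update_task(conn, "luw-1")  # corresponds to processing after COMMIT WORK
visible_after = conn.execute("SELECT COUNT(*) FROM app_data").fetchone()[0]
pending = conn.execute("SELECT COUNT(*) FROM vblog").fetchone()[0]
print(visible_before, visible_after, pending)
```

Before the update task runs, the application table is still empty even though two dialog steps have been committed; afterward both changes are visible and the log entries are gone.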

After a failure, DB2 will recover the database to a consistent state, rolling back
those units of work that were “in flight” at the time of failure, and committing
those units of work that had completed. Note that this database state, even
though consistent from the DB2 point of view, may have incomplete SAP LUWs.
The application server host must be restarted to process the VBLOG data,
backing out changes for business transactions that are not complete (that is,
have not executed a COMMIT WORK).
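The restart behavior can be pictured as a filter over the logged update requests. The following Python sketch is a simplification under stated assumptions: it supposes that each log entry carries the identifier of its SAP LUW and that the set of LUWs that reached COMMIT WORK is known. The function name restart_cleanup and the data layout are hypothetical and do not describe the actual VBLOG format.

```python
def restart_cleanup(vblog_entries, committed_luws):
    """Split logged update requests found after a restart.

    vblog_entries: list of (luw_id, change) tuples read from the log table.
    committed_luws: set of LUW ids whose transaction executed COMMIT WORK.
    Returns (to_apply, to_discard), each preserving the original order.
    """
    to_apply = [entry for entry in vblog_entries if entry[0] in committed_luws]
    to_discard = [entry for entry in vblog_entries if entry[0] not in committed_luws]
    return to_apply, to_discard


entries = [
    ("luw-1", "create order"),
    ("luw-2", "reserve stock"),
    ("luw-1", "post invoice"),
]
to_apply, to_discard = restart_cleanup(entries, committed_luws={"luw-1"})
print(to_apply)
print(to_discard)
```

In this example the entries of luw-1 are kept for the update task, while the entry of luw-2, whose business transaction never completed, is backed out.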

1.3 OS/390 Features for SAP R/3


OS/390 is an integrated enterprise server operating system environment. It
incorporates into one product an open communication server, distributed data
and file services, Parallel Sysplex support, object-oriented programming, and
open application interfaces.

OS/390 continues to build on the classic strengths of MVS: reliability,
continuous availability, serviceability, data integrity, workload management, and
security. OS/390 gives you a scalable system that supports massive transaction
volumes and large numbers of users with high performance, as well as
advanced system and network management.

Through its support of UNIX interfaces via the OS/390 UNIX System Services,
OS/390 becomes a database server for SAP R/3, allowing you to profit from
client/server technology benefits such as distributed processing and extensive
scalability. SAP R/3 application programs and user data, including data and
process models, are stored on the database server. SAP R/3 uses DB2 for
OS/390 as the database server, which can manage large amounts of data on
behalf of many users.

IBM adds an OS/390 UNIX System Services program, called Integrated Call Level
Interface (ICLI), which passes DB2 data to the SAP R/3 Database Interface
(DBIF). Communication services support TCP/IP for general access to the
Internet.

The strengths that OS/390 and System/390 bring to the SAP R/3 environment
include:
• Reliability, availability, and serviceability
SAP R/3 customers need continuous data availability and integrity. OS/390
reliability and availability are unsurpassed, and OS/390 has a history of
unmatched security and integrity. SAP R/3 benefits from these underlying
characteristics.
• Scalability
The System/390 platform ranges from small uniprocessors to 10-way
processors to Parallel Sysplex environments, which allow you to connect up
to 32 OS/390 systems. The platform can thus support thousands of users.
The architecture of the System/390 I/O subsystem and the OS/390 operating
system allow data to be transferred into memory from many devices
simultaneously, allowing the processing of data requests for many users at
high data rates. The requests may require accessing data residing in
multiple-terabyte repositories.
• System management
OS/390 has many system management capabilities, providing data security,
strong operations tools, and the ability to manage diverse workloads.
System/390 has proven procedures and tools to manage systems in a very
efficient way.
• Cost of ownership
System/390 is acknowledged by consultants such as IDC, GartnerGroup,
Xephon, ITG, and others as having one of the lowest overall costs of
ownership in a client/server environment. CMOS technology and software
pricing actions have drastically reduced the cost of System/390 enterprise
computing.

1.4 DB2 for OS/390 Features


DB2 for OS/390 is engineered to deliver the high performance and high levels of
availability, integrity, and security needed for your business applications. The
strengths DB2 for OS/390 brings to the SAP R/3 environment include:
• Continuous operation and high availability
DB2 for OS/390 can operate for long periods without interruption. With data
sharing, work can be transferred between DB2 for OS/390 subsystems within
a Parallel Sysplex to handle a planned or unplanned outage. Online
reorganization provides greater availability during database unload and
reload processes. See the redbook, High Availability Considerations: SAP
R/3 on DB2 for OS/390, SG24-2003, for more detail on high availability
planning.
• Data sharing in a Parallel Sysplex environment
DB2 for OS/390 exploits the Parallel Sysplex environment through data
sharing, which allows applications running on more than one DB2 for OS/390
subsystem to read from and write to the same set of data concurrently.
• High data integrity
DB2 for OS/390 provides high data integrity through capabilities such as a
sophisticated lock manager and integration with IBM system security
products. DB2 for OS/390 also protects data from subsystem, media, and
application failures with integrated recovery schemes.
• Very large database support
DB2 for OS/390 works with the System/390 I/O subsystem to allow the rapid
parallel processes needed for very large database backup, reorganization,
and recovery of data. The maximum table space size with DB2 for OS/390
V6 is sixteen terabytes (TB).



• Database and system administration aids
To help database administrators manage their database environments, DB2
for OS/390 offers an integrated set of tools and functions, including flexible
security mechanisms, an extensive set of logging and recovery utilities, trace
facilities for tuning, and functions and tools to monitor and tune subsystems.
DB2 for OS/390 can use hardware compression to drive down the cost of
data storage.
• Other features
In addition to the preceding items, several features have been added to DB2
for OS/390 to increase the usability for applications such as SAP R/3. The
most important of these is a feature that allows dynamic caching of SQL
statements, saving interpretation overhead.

DB2 UDB for OS/390 Version 6 has become available since this redbook was
begun. This version has several features that particularly apply to
backup/restore processing, such as parallelism in the COPY and RECOVER
utilities, fast log apply, and parallel index build. We discuss those features in
Chapter 4, “Backup/Recovery Considerations in Disaster Recovery” on page 45
and Chapter 5, “Restarting from Remote Locations” on page 61 when the
features are particularly important in the steps we have recommended. You
should be aware, however, that the practical experience and work that was the
basis for this book was done with DB2 for OS/390 V5.

1.5 SAP R/3 on DB2 for OS/390 Structure


The SAP R/3 on DB2 for OS/390 implementation is a three-tier structure. (Note
that when application servers on OS/390 are available, the physical structure is a
two-tier landscape, but the logical design is three tiers. That is, the application
servers on OS/390 are logically separated from the database server which may
be on the same machine or within the same Parallel Sysplex.) Presentation
services run on workstations connected to application services that are running
on AIX, Windows NT, Sun Solaris, or (in SAP 4.6) S/390. The SAP R/3 database
server runs on an OS/390 system connected to AIX, Windows NT, or Sun Solaris
application servers over a network connection. The use of OS/390 application
servers does not require a network connection. We do not discuss Sun Solaris
application servers in this redbook; we had no hardware that would allow us to
gain experience. The different options regarding network connectivity are
explained in 1.6.1, “Physical Connection” on page 12.

Figure 4. Structure of SAP R/3 on DB2 for OS/390

The SAP R/3 system is comprised of all of the hardware and software
components used in the SAP R/3 on DB2 for OS/390 solution, including the
database, application, and presentation server(s) and the services that they
provide.

Figure 4 on page 9 shows important components for SAP R/3 on DB2 for
OS/390:
DBIF The Database Interface (DBIF) of SAP R/3 has been modified to
support DB2 for OS/390. The DBIF resides on the application server
and is responsible for accepting SQL statements from the
applications. DBIF then forwards the SQL to the Database Service
Layer (DBSL).
DBSL The Database Service Layer (DBSL) of SAP R/3 has the responsibility
of adapting SQL to the specific requirements of a DBMS (in this case,
DB2 for OS/390). Additionally, the DBSL forwards the adapted SQL to
the appropriate communication software (in this case, ICLI).
ICLI For communication with the DB2 for OS/390 database service, the
DBSL uses a component called the Integrated Call Level Interface
(ICLI). ICLI consists of a client and server component, which allows
AIX and Windows NT application servers to access an OS/390
database server remotely across a network. The DBSL uses only a
subset of database functions and ICLI delivers exactly that subset.
The server component of ICLI is a program based on OS/390 UNIX
System Services; the client component is a Keep Alive executable
along with a program that either resides in an AIX shared library
(ibmiclic.o) or a Windows NT dynamic link library. The ICLI
components are provided as a part of OS/390 UNIX System Services;
users should consult SAP notes to determine the ICLI level (and IBM
service identifiers) they require. Figure 5 on page 11 shows how an
ICLI connection between application server and OS/390 database
server is established.

Figure 5. ICLI Architecture

1.6 Network Connectivity


The three-tier structure that the implementation of SAP R/3 on DB2 for OS/390
uses requires a high-speed, high-bandwidth communication connection between
the database server and the application server. SAP R/3 on DB2 for OS/390:
Connectivity Guide, SC33-7965 is now available; it explains possible hardware
and software configurations.

On the current release of OS/390, the options for network connectivity can be
divided into two separate areas:
• Physical connection
• Network protocol

The most advanced connectivity products now available are Gigabit Ethernet and
the software/hardware products provided in OSA Express.

1.6.1 Physical Connection


For the physical connections between the application server and the database
server, three types of connection hardware are supported:
• ESCON channel
• FDDI LAN
• Gigabit Ethernet LAN (GbE)

Note
All information about FDDI LANs in this chapter also applies to Gigabit
Ethernet (GbE) LANs.

The ESCON channel option is now available for a connection between OS/390
and AIX or OS/390 and Windows NT. Figure 6 shows examples of possible
physical connections.

Figure 6. Examples of Physical Connections

1.6.2 Communication Protocols
Three communication protocols are available for communication between the
SAP R/3 work processes on the application server and the database server:
• TCP/IP
• High-Speed UDP
• Enhanced ESCON

Both High-Speed UDP and Enhanced ESCON are UDP-based protocols. Neither
is recommended now that TCP/IP has become available.

1.6.2.1 TCP/IP
With TCP/IP (Transmission Control Protocol/Internet Protocol) support, the
flexibility of SAP R/3 configurations is considerably improved. You can use any
network supported by IBM TCP/IP that meets the SAP R/3 speed requirements.

This allows you to use the same protocol for channel connections, FDDI, GbE, or
other connectivity hardware. Thus you may have less expensive secondary
connections (such as using FDDI as a backup for a channel connection) without
changing the controlling software or protocol.

TCP/IP can be used over an ESCON channel (if an ESCON channel adapter is
installed in the application server). TCP/IP can also be used over GbE LANs or
FDDI LANs through an OSA-2 adapter on S/390 (if the appropriate LAN adapter is
installed on the application server). No special device drivers are required;
however, when you install the LAN adapter or the channel adapter you may be
instructed to install a supplied device driver rather than one of those on your AIX
installation medium.

TCP/IP also has expandability advantages over the other two protocols. As new
devices, architectures, and features are created, TCP/IP contains the required
support earlier than specialized software.

1.6.2.2 High-Speed UDP


IBM has implemented a high-speed protocol named High-Speed UDP that can
use both OSA-2 and ESCON connections. This support allows Windows NT
application servers, in addition to AIX servers, to be connected to the same
OS/390 system through an OSA-2 feature and an FDDI LAN.

The High-Speed UDP protocol provides efficient communications for applications
such as SAP R/3 through its short OS/390 instruction path. The protocol uses
UDP as its method for transferring data packets. Over ESCON, this support uses
the HPDT MPC channel protocol, which requires special device drivers for AIX.
Over an FDDI LAN, this support uses the HPDT MPC mode to communicate with
the OSA-2 feature, which sends IP packets over the FDDI LAN. Thus, there are
no special requirements for application servers connected to an FDDI LAN.

Note that while High-Speed UDP is based on UDP, you will still have to configure
the normal full-function TCP/IP stack for it to work. TCP/IP is necessary during
the installation process for SAP R/3 since FTP is used. During normal operation
of the SAP R/3 system, TCP/IP is used for performance monitoring and CCMS
remote job submission. The logical file system support in OS/390 UNIX System
Services allows several different back-end stacks to coexist. The OS/390 system
transparently directs standard inbound and outbound data to the correct
back-end stack. You will also have to define a VTAM Transport Resource List
Entry (TRLE) to exploit High-Speed UDP support.

This protocol is not recommended now that TCP/IP is implemented; it can be
anticipated that support will be withdrawn in a future release.

1.6.2.3 Enhanced ESCON


Enhanced ESCON support provides high-performance communications between
an OS/390 system and an AIX system that is attached through the ESCON
channel interface. Typically, this connection will be between an OS/390 and an
RS/6000 SP processor. In such cases, only a subset of the RS/6000 SP nodes
running SAP R/3 applications need to be connected to the System/390 through
the ESCON channel. The SAP R/3 applications on remaining RS/6000 SP nodes
access OS/390 by using the ESCON-connected nodes as gateways (which are
accessed through the High Performance Switch).

The Enhanced ESCON support provides an OS/390 internal communications
protocol that is roughly compatible with the standard AF_INET UDP protocol for
ESCON-connected RS/6000s.

Enhanced ESCON support consists of the following parts:


• An OS/390 physical file system (PFS) that provides enhanced ESCON
communications through a new I/O device driver for the ESCON channel
interface. This new PFS is called the AF_UEINT PFS.
• A complementary network interface driver for AIX that supports both the
existing CLAW and the new Enhanced ESCON communication interfaces.

This protocol is not recommended now that TCP/IP is implemented; it can be
anticipated that support will be withdrawn in a future release.

Chapter 2. Disaster Recovery Planning

This chapter discusses disaster recovery planning in general. Not all topics
mentioned here will be covered in the rest of the manual.

A disaster recovery solution is driven by an organization's business
requirements and the associated information technology (IT) environments. The
disaster recovery design describes, in broad terms, the overall characteristics
and major elements of the solution. Several main aspects of the design drive
the specific technology elements of the solution. This chapter discusses these
main aspects, which are:
• Risk management
• Design process for a disaster recovery solution
• Scope of the recovery solution
• Data backup and recovery processes
• Management and operation of the recovery site
• Recovery site configuration

It may be that reduced levels of service are acceptable, for some or all of the
critical applications, in the event of a disaster. Service levels for disaster mode
need to be negotiated, defined and then formally documented in a service level
agreement. Statements should be included to cover aspects such as recovery
targets, availability, capacity, and performance.

2.1 Risk Management


Although it is impossible to define a complete list of all types of disasters,
several categories can be identified:
• Local site disasters are events that are limited to a specific area, room, or
location of a building (for example, the computer room). This type of
disaster can be the result of:
− Fire
− Flooding
− Catastrophic machine failure
− Sabotage
− Power failure
• Site disasters affect the whole building and can be caused by such events as:
− Bombings
− Explosions
− Fire
− Flood
− Power outages
• Area disasters generally affect the vicinity or area where the building is
located. This area may cover a radius of several miles and can be caused
by:
− Bombings
− Earthquakes
− Environmental contamination
− Explosions
− Outbreaks of disease
− Plane crashes
− Volcanic eruptions
− Wind or snow storms
− Terrorist attacks

In some of the above situations, the IT equipment may still be intact and usable,
but simply inaccessible. With preplanning, you may be able to run the data
center from a remote location for a short period of time.

Recent statistics on the most common types of disasters that occur show that
hardware failures and natural disasters such as hurricanes are the most
common causes of disasters (see Figure 7).

Figure 7. Disaster Frequency by Type. Source: IBM Business Recovery Services. This
data is based on 202 disaster incidents since 1991.

It stands to reason that the more preventive measures are put in place, the less
chance there is of a situation resulting in a disaster to an organization.
However, no matter how sophisticated these preventive measures are, there will
always be some risk of an outage. Reducing the risk of disasters further means
increasing the amount spent on preventing them.

Certain measures should be taken to help prevent disasters from affecting your
organization or to minimize their impact when they are unavoidable. The best
way to determine which measures your organization should include is to conduct
a risk analysis to determine where the organization's major vulnerabilities are.
A risk analysis identifies major threats to customers as they relate to the target
city site and the probability of those threats.

The minimum preventive measures are good on-site recovery procedures to help
avoid having routine problems escalate into a disaster. Examples are spare
hardware, regular backups, and a skilled operations staff. In addition, you may
have to consider things like building fortification, improved fire protection,
rigorous access control and other operational procedures and corporate policies.
The time and cost involved in implementing and maintaining such measures can
be justified through comparison with the time and cost involved in dealing with
the situation at the time of a disaster.

If, in spite of the preventive measures, a potentially disastrous incident occurs,
you will have to determine if it requires on-site recovery or disaster recovery
procedures, as shown in Figure 8. The situation is not necessarily obvious, so
the decision criteria have to be carefully prepared as part of planning and
implementing a disaster recovery solution.

Figure 8. On-Site Recovery versus Disaster Recovery

2.2 Design Process of a Disaster Recovery Solution


Disaster recovery is a systems management discipline that ensures that
processes and procedures are in place to restore computer operations after the
declaration of a disaster by executive management. These processes and
procedures should be documented and tested, and ready to be implemented
during an extended system outage.

In order to develop a disaster recovery plan, the following steps are needed:

1. Conduct Environment Analysis (EA)
The Environment Analysis is an evaluation of the technical environment and
underlying network infrastructure to understand what resources, processes
and tools are present in the current or planned environment. The EA
addresses, from a technical perspective, how a system would be replaced. It
takes into account the technical operating environment, evaluating the
following key areas:
• Hardware and software configuration
• The production environment
• Backup and recovery procedures
• Backup services for critical data including system, application and
network data
• Off-site storage and vaulting of critical data and documents
2. Conduct Business Impact Analysis (BIA)
The Business Impact Analysis is a process that determines critical business
functions and associated critical resources for the enterprise. This is done
by:
• Identifying all the business functions within a particular business unit
• Identifying all applications that support those functions
• Determining the critical point in the business cycle of each process
• Determining whether the function is critical to the business
• Estimating the potential loss
• Establishing a recovery window
• Ranking the business functions
3. Prioritize and finalize recovery strategy
This step defines a strategy that addresses the overall characteristics and
components of an implementation, such as:
• Personnel requirements and responsibilities
• Managing and operating the recovery site
• Description of the recovery configuration
• Network and site interconnection
• Data backup and recovery processes
• Application recovery
• Estimated recovery time
• Strategy for testing
4. Organize and document the plan
This step develops a detailed plan and set of procedures to be followed at
the time management declares a disaster. The plan should
meet the recovery requirements identified in the EA and BIA to recover the
environment at the selected recovery site. The plan should consist of:
• Disaster definition
• Team responsibility
• Contact information
• Critical documentation
• Unique procedures
• Media plan
• Recovery site inventory
• Backup/recovery process
• Implementation plan
• Test plan
• Maintenance plan
• Relocation/migration plan
5. Test recovery plan
The disaster recovery plan is exercised to ensure that it will function as
desired in the event of a disaster. All portions of the plan need to be
exercised on a regular basis simulating, as closely as possible, the actual
production environment.
6. Maintain recovery plan
To ensure that the recovery plan is up-to-date, periodic reviews and
revisions are necessary.
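
The ranking that closes the Business Impact Analysis (step 2) can be illustrated with a short Python sketch. It is not part of any IBM or SAP tool; the class, the field names, the sort order, and every figure in it are assumptions made up for the example. It simply orders business functions by criticality, recovery window, and estimated loss.

```python
from dataclasses import dataclass

@dataclass
class BusinessFunction:
    name: str
    critical: bool               # is the function critical to the business?
    loss_per_hour: float         # estimated potential loss, arbitrary units (invented)
    recovery_window_hours: int   # how long an outage can be tolerated (invented)

def rank(functions):
    """Critical functions first, then shortest recovery window, then largest loss."""
    return sorted(
        functions,
        key=lambda f: (not f.critical, f.recovery_window_hours, -f.loss_per_hour),
    )

functions = [
    BusinessFunction("reporting", False, 1.0, 72),
    BusinessFunction("order entry", True, 50.0, 3),
    BusinessFunction("payroll", True, 5.0, 24),
]
ranked = [f.name for f in rank(functions)]
print(ranked)
```

A real analysis would weigh more factors, such as the critical point in the business cycle; the sort key here is only one possible choice.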

This book focuses on disaster recovery technology and therefore covers the
technical aspects of the recovery strategy (step 3). It assumes that the business
requirements and their associated IT requirements have been defined, and it
does not cover testing in detail.

The overall aim is to develop a disaster recovery solution that exactly matches
the requirements of the business. However, it is important to remember that
some compromises may be needed, as requirements may conflict. The four
main factors that need to be traded off in any solution are:
• Type of disasters that need to be covered
• Amount of data that can be lost
• Speed of recovery
• Overall costs
(See Figure 9 on page 20). Throughout the design process, you need to be
aware of these factors and balance them to develop a solution that best meets
the business needs at an acceptable cost. This trade-off triangle is discussed in
more detail in Fire in the Computer Room - What Now?, SG24-4211.

Figure 9. Trade-offs in Disaster Recovery Solution Design

2.3 Scope of the Recovery Solution


The scope of the disaster recovery solution is defined by the “risk analysis” and
“business impact analysis” exercises, and the subsequent analysis to determine
the associated IT requirements. This provides the main input to the solution
design process. The definition of scope should include:
• What types of disaster are included and excluded
• The applications that must be recovered
• The sequence in which applications must be recovered
• The maximum recovery timing for each application
• The data that must be recovered
• The currency of the data once it is recovered

The scope influences all of the other major design elements such as the location
of the recovery site, the ownership of the recovery site, the configuration of the
recovery facilities, and the processes for maintaining or recovering data at the
recovery site.

2.4 Data Backup and Recovery Processes


Possibly the most important step in the design process is the definition of the
processes that will be used to back up, and subsequently recover, all of the
critical data.

The backup and recovery methodology must be married with the business
objectives. The business requirement drives the solution. For example, a
requirement to have service reestablished within three hours dictates different
backup and recovery options from those possible when the application must be
operational within 24 hours. The methods used to back up the data, the way it is
transported and stored off-site, and the techniques for recovering the data, will
all place restrictions and requirements on the technical specification of the
remote site.

There are many different processes for backing up and recovering data and
there are several hardware/software products and features that support the
different processes. All processes and products have their relative merits, and
most organizations use a combination of approaches to cover their critical data.
There are certain key factors that influence which is the best option. Among
these factors are:
• Type and amount of data
• Frequency of backups
• Speed of recovery
• Level of currency after recovery

2.4.1 Data Categories


Given predetermined targets for speed of recovery and currency of data, the
other major considerations when designing a data backup and recovery strategy
are placement, usage, and volatility. Critical data can be grouped into three
general categories based on the volume and frequency of change, as shown in
Figure 10 on page 22. The categories are:
• System
• Infrastructure
• Application
This includes both data and software, since programs are also data in the
context of backup and recovery. A more detailed discussion of these categories
can be found in Fire in the Computer Room - What Now?, SG24-4211.

System data consists largely of the system software that is normally purchased
from vendors and tailored. It is typically isolated on specific disk volumes and is
changed only when service is applied or new releases are installed.

Infrastructure data includes subsystem data supporting the applications, stored
in Database Management Systems (DBMS). This type of data is often seen as
system data, but changes more frequently than system data, and therefore it
needs to be treated separately.

Application data includes all of the data that is needed to run the applications. It
is the most volatile, the most valuable, and the most challenging to recreate.
This data is typically spread across numerous volumes; it is common to find data
from multiple applications residing on the same volume, and sometimes also
co-residing with noncritical data. When considering backup and recovery
options, it may be necessary to subdivide this data into “DBMS managed data”
and “non-DBMS managed data.” The speed and currency targets are usually
most stringent for application data, and may also be different for individual
applications. A variety of processes may need to be employed, as frequent
backups will be required to enable fast recovery and minimal data loss.

In an SAP R/3 on DB2 for OS/390 environment, system data consists of the
OS/390, AIX and/or Windows NT environment. The infrastructure data consists of
the DB2 for OS/390 system and the SAP R/3 executables on the application
servers; the application data consists of the tables.

Figure 10. Data Categories

Before designing the backup/recovery processes and selecting the products, it is
essential that the recovery requirements are defined in detail, for each data
category and individual application. It is also important to consider any
relationships between data sets within an application, applications, and even
different data categories.

If there are dependencies, then backup and recovery of this data must be
coordinated, to ensure that data integrity is maintained. In an SAP R/3 on DB2
for OS/390 environment, that means that the backups of all components must be
at the same level; for example, the backup of the application servers must be at
the same release as the SAP R/3 objects stored in the
database.

For DB2 for OS/390, the data contained in tables used by SAP R/3 must also be
at the same level as the SAP R/3 objects.
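
The requirement that all component backups be at the same level can be pictured as a simple check. The Python sketch below is illustrative only: the `ComponentBackup` record, the component names, and the release labels are assumptions for the example, not an SAP or DB2 interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentBackup:
    component: str      # what was backed up (names are illustrative)
    sap_release: str    # SAP R/3 release level captured by this backup

def inconsistent_components(backups, expected_release):
    """Return the components whose backup is not at expected_release."""
    return [b.component for b in backups if b.sap_release != expected_release]

backups = [
    ComponentBackup("application server executables", "4.0B"),
    ComponentBackup("SAP R/3 objects in DB2", "4.0B"),
    ComponentBackup("SAP R/3 table data in DB2", "4.0A"),  # deliberately out of step
]
mismatched = inconsistent_components(backups, "4.0B")
print(mismatched)
```

Any component reported by such a check would need to be backed up again, or restored from a different backup set, before a coordinated recovery is attempted.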

2.4.2 Data Backup Options


As discussed in 2.4.1, “Data Categories” on page 21, there are several different
backup options. They provide different levels of recovery capability (speed and
data loss), and have different levels of impact on the primary applications.
Copies must be made of the data, and where loss of data is to be minimized,
copies of the logs also are made. The following sections briefly describe these
options.

2.4.2.1 Point-in-Time Copies


Point-in-time copies back up the data as it exists at a specific instant.
Traditionally, this has meant disallowing updates for the duration of the copy. It
can be accomplished by:
• Stopping the DBMS and performing volume dumps
• Stopping the application
• Performing various types of image copies for DB2 for OS/390 while read
access is allowed to the data

Volume Dumps: There are two basic types of point-in-time copies: logical and
physical.
1. Logical copies back up individual or multiple data sets, irrespective of
placement or the physical device format. Backup can take longer, but
recovery is more flexible; for example, the data can be recovered to a
different device type.
2. Physical copies back up data at a disk volume level and are usually faster
than logical copies. However, data must be recovered to a similar
compatible volume and this method also does not guarantee consistency
within a data set that spans several volumes.

If volume dumps are used, the procedure for point-in-time recovery in an SAP
R/3 on DB2 for OS/390 environment is described in Database Administration
Experiences: SAP R/3 on DB2 for OS/390, SG24-2078. The procedure requires
that DB2 for OS/390 be stopped, so that data is consistent. Information about
stopping DB2 for OS/390 is found in 4.4.1.1, “Establishing a Point of Consistency”
on page 51.

Concurrent Copies: Point-in-time copies can also be created with little impact
to applications. The combination of DFSMS/MVS and the 3990 Storage
Controllers supports a concurrent copy function, which can be used for all data. It
can be invoked by specifying the keyword CONCURRENT on the DB2 for OS/390
COPY utility. A very small outage (seconds) occurs while the extents for all the
named table spaces are marked. Then DB2 starts the table spaces for normal
read/write access as the data sets are dumped. Tracking and registration of the
copies occurs through updates to a DB2 system catalog table.

Virtual concurrent copy (or “SnapShot”) routines are also invoked by the
CONCURRENT keyword of the DB2 for OS/390 COPY utility. A very small outage
(seconds) occurs while the data sets are “snapped” to certain work data sets.
Then DB2 for OS/390 starts the data sets for normal read/write access as the
data sets are dumped. Tracking and registration of the copies occur through
updates to a DB2 for OS/390 system catalog table.
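
The behavior described above, a brief pause to mark the data followed by normal read/write access while the copy is produced, can be pictured with a toy copy-on-write model in Python. This is a conceptual sketch only. The class and method names are invented, and it does not describe how DFSMS/MVS, the storage controller, or SnapShot implement the function internally.

```python
class SnapshotCopy:
    """Toy copy-on-write snapshot of a page-addressed data set."""

    def __init__(self, pages):
        self.live = pages        # the data set that applications keep updating
        self.preserved = {}      # original images of pages changed since the snapshot
        self.active = False

    def start(self):
        # The brief "outage": only marking happens here, no data is moved.
        self.preserved = {}
        self.active = True

    def write(self, page_no, data):
        # Save the point-in-time image the first time a page is updated.
        if self.active and page_no not in self.preserved:
            self.preserved[page_no] = self.live[page_no]
        self.live[page_no] = data

    def dump(self):
        # Build the copy from preserved images where they exist, else live pages.
        copy = [self.preserved.get(i, page) for i, page in enumerate(self.live)]
        self.active = False
        return copy

snap = SnapshotCopy(["a0", "b0", "c0"])
snap.start()
snap.write(1, "b1")      # updates arrive while the dump is still pending
snap.write(1, "b2")
point_in_time = snap.dump()
print(point_in_time, snap.live)
```

The copy reflects the instant at which `start()` ran, even though the live data has moved on.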

Invocation: Concurrent Copy or SnapShot may be invoked natively, that is,
through a DFDSS batch job, to perform the same functions. The disadvantage is
that much of the DB2 recording performed by the COPY utility becomes the user's
responsibility, that is:
• Ensuring the DFDSS copies physically finish
• Copying the SnapShot data to tape
• Locating the tapes for recovery and restoring them to DASD when they are
required.
In addition, because DB2 for OS/390 has no record of the copies, there is
opportunity for manual errors to occur.

2.4.2.2 Backing Up with No Application Impact


A third form of copy can be made where there is no impact on the application.
Since the COPY utility operates while DB2 is active, updates can occur at the
same time a copy is made if this COPY operand is used:
• SHRLEVEL CHANGE
The image copy is “fuzzy” and logs must be applied to bring the table spaces to
consistency, both with respect to each table space and with other objects in the
DBMS.
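
Why a “fuzzy” copy needs the log can be shown with a small Python simulation. It is a conceptual sketch, not DB2 code: the page names, the values, and the log layout are assumptions for the example. Pages are copied one at a time while updates continue, and recovery reapplies every logged change made since the copy began.

```python
def fuzzy_copy_and_recover():
    live = {"p1": "v0", "p2": "v0", "p3": "v0"}
    log = []                       # (sequence number, page, new value)

    def update(page, value):
        log.append((len(log) + 1, page, value))
        live[page] = value

    copy_start = len(log)          # log position when the copy began
    image = {}
    image["p1"] = live["p1"]       # p1 is copied ...
    update("p1", "v1")             # ... and then changes
    update("p2", "v1")             # p2 changes before it is copied
    image["p2"] = live["p2"]
    image["p3"] = live["p3"]
    update("p3", "v1")             # a change made after the copy finished

    fuzzy = dict(image)            # old p1 mixed with new p2: no consistent point
    # Roll forward: reapply every logged change made since the copy began.
    for seq, page, value in log:
        if seq > copy_start:
            image[page] = value
    return fuzzy, image, live

fuzzy, recovered, live = fuzzy_copy_and_recover()
print(fuzzy, recovered)
```

The image copy alone matches no state the data was ever in; only after the log records are applied does it agree with the live data.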

2.4.2.3 Incremental Copies
Incremental copies back up only the changed parts of the data and only work in
conjunction with an earlier full copy of the data. Again, options are available for
both DBMS data and non-DBMS data. Incremental copies are fast if only a small
portion of the data has changed, but where significant portions of the data have
changed, incremental copies may take longer than a full copy. Merging the
incremental copies with prior full copies involves more operational procedures
than full copies do.
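
The relationship between a full copy and later incremental copies can be sketched in Python as follows. The page-numbered dictionaries and the function names are assumptions for illustration; they are not the format or interface of any DB2 utility.

```python
def incremental_copy(current, changed_pages):
    """Back up only the pages flagged as changed since the last copy."""
    return {n: current[n] for n in changed_pages}

def merge(full_copy, incrementals):
    """Apply incremental copies, oldest first, on top of a full copy."""
    merged = dict(full_copy)
    for inc in incrementals:
        merged.update(inc)
    return merged

full = {0: "a0", 1: "b0", 2: "c0", 3: "d0"}          # earlier full copy
day1 = {0: "a0", 1: "b1", 2: "c0", 3: "d0"}          # data after day 1
day2 = {0: "a0", 1: "b1", 2: "c0", 3: "d2"}          # data after day 2
inc1 = incremental_copy(day1, changed_pages={1})
inc2 = incremental_copy(day2, changed_pages={3})
restored = merge(full, [inc1, inc2])
print(restored == day2)
```

Each incremental holds a single page here, which is why it is fast to take; the price is that recovery needs the full copy and every incremental, applied in order.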

2.4.2.4 Copying DBMS Log Data


In order to minimize data loss, for DBMS data a copy of the log data can be used
to forward recover from a base point-in-time copy of the data. Log data is
continuously written to active logs. When the log is filled, a switch occurs to the
next active log. An offload process is initiated to produce the DB2 archive log
(normally two copies for safety) while the data remains available on the active
log. The archive log can be copied and taken off site where it can be used for
disaster recovery.

The minimum amount of data loss that could occur is that which resides on the
last active log at the local site.
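
The log switch and offload cycle described above can be modeled with a few lines of Python. This is a simplified illustration, not DB2 internals: the class, the number of active logs, and their capacity are invented for the example, and the offload is treated as instantaneous.

```python
class LogManager:
    """Toy model of active logs that are offloaded to archive logs when full."""

    def __init__(self, active_logs=2, capacity=3):
        self.capacity = capacity
        self.active = [[] for _ in range(active_logs)]
        self.current = 0
        self.archives = []         # each entry stands for one archive log

    def write(self, record):
        log = self.active[self.current]
        log.append(record)
        if len(log) == self.capacity:
            # Offload the full log, then switch to the next active log.
            self.archives.append(list(log))
            self.current = (self.current + 1) % len(self.active)
            self.active[self.current] = []   # reused; its contents were archived earlier

    def at_risk(self):
        """Records that exist only on the current active log."""
        return list(self.active[self.current])

mgr = LogManager()
for n in range(1, 8):
    mgr.write(f"rec{n}")
print(mgr.archives, mgr.at_risk())
```

After seven records, six are on archive logs that could have been copied off site, and one remains only on the current active log. That remainder corresponds to the minimum data loss mentioned above.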

RRDF: For DB2 for OS/390 it is possible to maintain a real-time copy of log data
at a remote site. As log blocks are written at the prime site, they are also
transmitted to the recovery site and vaulted. The data loss for DBMS data can
be almost eliminated. One product that performs this function is Remote
Recovery Data Facility (RRDF), distributed by E-Net Software. Once at the
recovery site, the log records are normally stored as RRDF archives until they
are needed, when they are converted to DB2 archive logs. More information
about RRDF may be obtained at:

http://www.ubiquity.com.au/content/production/suppliers/enet/rrdf01.htm

2.4.2.5 Data Mirroring


It is also possible to create and maintain disk volume-level mirror copies of data
and logs at a remote site. This option is described under “disk-to-disk remote
copy” in 2.4.3, “Data Transport and Secure Storage.”

2.4.3 Data Transport and Secure Storage


Taking backups of data for disaster recovery is of no use unless the backup data
is transported off-site and stored securely. The data can be copied to tape
cartridges for manual transport or transmitted electronically, although electronic
transmission is relatively expensive for large data volumes. With either method,
the data could be transported directly to the recovery site, or to a third secure
remote site, which is commonly referred to as a “data vault.”

The most appropriate option will depend on the required recovery speed, data
currency and the cost. With all electronic transfer options, there are
performance and distance considerations.

There are three basic options for electronic data transfer:


• Processor-to-processor connection
This requires a network connection. Software is needed to manage the
transmission, reception, and storage of the data.
This type of connection is rarely used as a complete backup solution;
obviously an entire database could not be stored in a processor. However,
the logical capability of a processor-to-processor connection can be powerful
in combination with disk hardware and DB2 for OS/390 features.
• Processor-to-remote device
Backup data is written directly to tape or disk devices, which are located
remotely and are channel attached to the primary processor. Enterprise
System Connection (ESCON) channels, channel extender devices, or both
can be used depending on the distance to the remote site. The technologies
usually impose certain distance or performance limitations.
A remote device can be connected to a central-site processor using ESCON
or Fiber Connection (FICON) technology. With either, as the distance
between the sites increases, the data rate achievable decreases.
Using ESCON the reduction in aggregate data rate is proportional to distance
up to a distance of 9 km. At that distance, much more significant data rate
decrease occurs (known as “droop”). Connections may exist up to a
distance of 43 km.
Using FICON, the reduction in aggregate data rate is proportional to distance
with no “droop” up to the maximum distance of 100 km. For more
on FICON planning considerations, see FICON Planning Guide,
SG24-5445, a redbook planned for availability in 1999 but not generally
available when we wrote this redbook.
• Disk-to-disk remote copy
The 3990 Model 006 Storage Controller and the 2105 Enterprise Storage
Server provide functions for creating and maintaining copies of entire
“primary” disk volumes on corresponding remote “secondary” disk volumes.
There are special remote copy connections between the primary and
secondary 3990s, which enable all updates to data on primary volumes to be
sent to the remote site and mirrored on the secondary volumes.
There are two implementations of remote copy:
• Peer-to-Peer Remote Copy (PPRC) is a synchronous product; that is, I/O at
the central site processor is not complete until both the primary and
secondary DASD operations are complete. PPRC is based on disk-to-disk
connections. Note that the ESCON connections used for PPRC are dedicated;
they cannot be used for other processor-to-DASD I/O functions.
• Extended Remote Copy (XRC) is an asynchronous product; with XRC the
processor at the central site does not wait for the completion of remote I/O.
XRC does not use a direct processor-to-processor connection, but since the
processor at the secondary site is connected to the DASD at the primary
site, the features of XRC are provided by software (System Data Mover -
SDM) running on the remote processor. XRC is based on
processor-to-remote device connections. Note that the ESCON or FICON
connections used for XRC are not dedicated; they can be used for other
processor-to-DASD I/O functions.

Information about PPRC and XRC can be found in Remote Copy Administrator's
Guide and Reference, SC35-0169. This redbook also contains considerations for
choosing between PPRC and XRC. Planning considerations for both PPRC and
XRC can be found in Planning for IBM Remote Copy, SG24-2595. PPRC and XRC
operations with the 2105 Enterprise Storage Server are explained in
Implementing the Enterprise Storage Server in Your Environment, SG24-5420.
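
To get a feel for why distance matters more for synchronous PPRC than for asynchronous XRC, the following Python sketch estimates the signal propagation time that a synchronous write must absorb. It assumes that a signal travels through optical fiber at roughly 200,000 km/s (about 5 microseconds per km, one way) and that each write needs a given number of round trips. Both figures and the function name are illustrative assumptions, not values taken from the remote copy documentation; controller, protocol, and queuing overheads are ignored.

```python
# Rough propagation-delay estimate for a synchronous remote write.
# Assumption: signal speed in fiber is about 200,000 km/s (5 us per km).
FIBER_US_PER_KM = 5.0


def sync_write_penalty_us(distance_km: float, round_trips: int = 1) -> float:
    """Return the added propagation delay per write, in microseconds."""
    if distance_km < 0 or round_trips < 1:
        raise ValueError("distance must be >= 0 and round_trips >= 1")
    # Each round trip crosses the distance twice (out and back).
    return 2 * distance_km * FIBER_US_PER_KM * round_trips
```

Under these assumptions, calling `sync_write_penalty_us(43)` gives the minimum extra delay per round trip at the ESCON distance limit. With XRC the application does not wait for this delay, because the remote I/O completes asynchronously.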

2.4.4 Readiness of the Recovery Site
In addition to transporting and securing backup data off-site, there are additional
steps that can be taken to reduce the probability of data loss and to reduce the
recovery time. This is sometimes referred to as “improving the readiness.”
Different options can be selected for individual applications. The levels of
readiness for DB2 for OS/390 can be categorized as follows:
• None
No provision is made for disaster recovery.
• Periodic backup
Point-in-time copies of backup data are securely stored off-site.
This solution can be a location where there is a tape vault and where
equipment can be made ready for recovery of the local OS/390 environment,
including DB2 for OS/390. The DASD volumes are restored from tape and no
special DB2 procedures are required to restart the environment. This type of
solution can involve data loss of one day to a week. Backups are normally
taken during an outage at the local site.
• Ready-to-roll-forward
In addition to periodic backups, archive logs are also stored off-site and
recovery will be to the last log data set received.
This solution can be a location where there is a tape vault and where
equipment can be made ready for recovery of the local OS/390 environment.
Compared to the previous environment, loss of data is normally up to one
day. The backups can be taken without an outage to the users. Procedures
for recovery are more complex.
• Ready-to-roll-forward with RRDF
This solution is essentially the same as the previous item. The exception is
that there is very little data loss. In addition, the RRDF archives are normally
not converted to DB2 archives until they are needed. DB2 recovery procedures
are very similar to those of ready-to-roll-forward. The leading disaster
recovery vendors are likely to perform this service.
• Roll-forward
A shadow copy of the data is maintained on disk at a recovery site by
periodically applying the copy logs.
For DB2 for OS/390 this solution is called a DB2 tracker site. It has at least
one OS/390 system on which an exact mirror of the local site DB2 is
activated when it is desired to apply archive logs. Data loss can be less
than 24 hours, depending on how many tracker cycles are run per day. DB2
recovery procedures are very similar to those of ready-to-roll-forward.
• Realtime-roll-forward
Similar to roll-forward, except that updates are transmitted electronically and
applied in real time, but asynchronously.
DB2 for OS/390 does not have an inherent ability to perform this task. XRC,
described previously, can provide this capability when all the data and DB2
logs are mirrored. Based on the distance between the two sites and the
bandwidth capacity, asynchronous delay can be a few milliseconds to
several seconds. DB2 procedures for restart are simple.
• Realtime-remote-update
The shadow copy of the data is synchronously updated prior to sending the
transaction response or completing a program or task on the primary
system.
For DB2 for OS/390, this environment would be found in a Geographically
Dispersed Parallel Sysplex (GDPS). It is a combination of PPRC-licensed
DASD and controllers and automation provided through contracted services.
The minimum configuration is a Base Sysplex (sysplex timers are required).
DB2 for OS/390 can be a single subsystem or part of a data sharing group.
There need be no data loss. DB2 restart procedures are simple.

If your recovery site is owned by a third party, the agreement with the third-party
supplier must include the level of readiness.
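
The data-loss figures quoted for the ready-to-roll-forward and roll-forward levels follow from how often log data leaves the primary site. As a rough sketch, assuming that shipments (or tracker cycles) are evenly spaced over the day, the worst-case exposure is one full interval. The function name is hypothetical, and transport and processing delays are ignored.

```python
def worst_case_loss_hours(shipments_per_day: int) -> float:
    """Upper bound on lost work when logs are shipped at even intervals.

    A disaster just before the next shipment loses one full interval.
    """
    if shipments_per_day < 1:
        raise ValueError("at least one shipment per day is required")
    return 24.0 / shipments_per_day
```

One shipment per day gives an exposure of up to 24 hours, which matches the figure given for ready-to-roll-forward; four tracker cycles per day would reduce it to 6 hours under the same assumption.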

2.5 Managing and Operating the Recovery Site


In order to decide the best way to manage and operate the recovery site, it is
important to consider all activities that are performed by operations and support
personnel. You should consider day-to-day operation, testing exercises, and
actual disaster situations. If live systems are going to be running at the recovery
site for data backup and readiness or due to split workload, then these will need
special consideration.

The standard “operations” activities include tasks such as:


• System startup and shutdown
• System monitoring
• Application control
• Problem tracking, determination, and fixing
• Tape and print handling

Wherever possible, manual operations activities should be automated or, better
still, eliminated. This reduces costs, minimizes the risk of human error, and can
make disaster recovery easier and faster. There are several hardware and
software products that enable automated and remote operations.

System startup and shutdown activities and all system console interactions can
now be largely automated and handled remotely. Automated Tape Libraries can
be used to almost totally eliminate manual tape handling. There will, however,
always be manual activities that need to be performed. You must define what
skills and how many people are needed to manage the recovery site.

When designing and implementing a remote recovery site, there are three basic
options for managing and operating it:
• Operations personnel at both sites
With operations personnel at both sites, recovery is usually easier and
faster, but this will typically need more people and therefore will be more
costly.
• Recovery site unmanned and operated remotely
The recovery site could be remotely operated, on a day-to-day basis, from
the prime site. In the event of a disaster, operations personnel would have
to be transported to the recovery center or a backup operations command
center. The travel time may be on the critical path of the recovery.
Alternatively, the prime site and the recovery site could both be remotely
operated from a third secure operations command center site. Of course,
the operations center could be a single point of failure for both sites and
special backup arrangements have to be considered. Among these special
backup considerations may be redundant communications facilities, ability to
operate independently of the command center, and remote startup facilities
for applications servers.
• No operations at the recovery site
If no workload is running at the recovery site on a day-to-day basis, then
operations personnel will only be required in the event of a disaster and for
testing. In this scenario, they would still need to travel to the recovery
center and operate locally, as described above.
This is the general approach if the recovery site is owned by a supplier and
is shared with other companies.

2.6 Description of the Recovery Configuration


A key element of the design is the physical recovery configuration. This
encompasses the recovery site, the actual facilities that it houses, any
interconnections between the primary and recovery sites, and the way in which
the network will be connected or switched to the recovery site.

2.6.1 Distance between Prime Site and Recovery Site


The location of the recovery site is a significant decision that affects many other
aspects of the overall solution. If an existing company-owned site is going to be
used, then the recovery solution may need to be tailored to fit certain limitations.
If, however, a new site can be selected (a new company-owned site or a supplier
site), then the location will be dependent on a number of factors, such as:
• Maximum hardware connectivity distance
• Transport time for personnel
• Usage of the recovery site for Q/A or test activities (in times when recovery
operation is not being used)
• Site security features available

As the distance between the prime site and the recovery site increases, so does
the protection against environmental disasters.

If, however, very fast recovery or minimal data loss or both are required, then
the distance between the prime site and the recovery site may need to be short.
There are technical distance limitations associated with some data backup and
recovery methodologies that support advanced data readiness solutions. The
limits frequently change as new hardware solutions become available. At the
time this book was published, the PPRC limitation was 43 km with ESCON
connections, 100 km with FICON connections. For more information on PPRC
consult Planning for IBM Remote Copy, SG24-2595.

The cost of bandwidth also increases significantly as distance increases. This is
a major consideration when high-capacity connections are required for
transmitting large volumes of data.

If the disaster recovery plan involves having personnel working at the recovery
site, then there are special considerations. First, the time taken to assemble and
transport the initial recovery teams can significantly affect the recovery speed.
Second, long-term relocation of personnel can be a problem if significant
distances are involved.

If the business depends on physical interaction between the central IT site and
other locations, this also is a recovery consideration. One such interaction is
print output distribution.

2.6.2 Configuration Overview


Based on the IT requirements of the critical business processes, a suitable
technical configuration must be designed to deliver the required level of service
for critical applications. If the recovery site is company-owned, you should also
consider splitting workload across the sites. This is discussed in more detail in
2.6.4, “Workload Distribution across Two Sites” on page 30. It also needs to be
considered when designing the recovery configuration.

Sufficient processing power, storage, and disk capacity must be provided, but the
design must also cover all other hardware and software components that are
required to run the critical applications. Special consideration must be given to
dependencies on specific device types and features, microprogram levels,
software levels, and so on. The same basic approaches that are used for
planning, designing, and managing the prime site configuration should be used
for the recovery configuration. In particular, allowance should be made for
future growth of the business applications when planning the capacity and
performance aspects of the configuration.

As SAP R/3 on DB2 for OS/390 is a multi-tier configuration, the capacity planning
for the backup hardware must be done for all components. The database server
on S/390 is the central part of the installation and therefore must be planned
carefully, whereas the application servers are more or less replaceable. This
means that you can have fewer application servers in your backup site than in
your production site. Moreover, you can have application servers on different
platforms or you can install new application servers in the case of a disaster.
Performance is an extremely important consideration when planning the backup
configuration. For more information about hardware configurations see 3.1,
“Hardware” on page 35.

As well as defining what resources are required, you must also define when the
resources are actually needed, with consideration to the sequence of events in
the recovery process. The most critical applications may start running alongside
the restore processes for the less critical data, and this may actually generate a
temporary peak capacity requirement. Some applications may not be recovered
and fully operational for several days or even weeks, and certain resources
could be provided later.
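
The temporary peak described above can be made visible by laying the recovery steps on a timeline and summing the resources in use in each hour. The sketch below does this for a few hypothetical phases; the phase names, hours, and capacity units are invented for illustration only.

```python
from typing import List, Tuple

# (name, start_hour, end_hour, capacity_units) -- all values hypothetical
Phase = Tuple[str, int, int, int]


def peak_demand(phases: List[Phase]) -> Tuple[int, int]:
    """Return (hour, units) of the first hour with the highest combined demand."""
    if not phases:
        raise ValueError("no phases given")
    last_hour = max(end for _, _, end, _ in phases)
    best_hour, best_units = 0, 0
    for hour in range(last_hour):
        # A phase is active during the half-open interval [start, end).
        units = sum(u for _, start, end, u in phases if start <= hour < end)
        if units > best_units:
            best_hour, best_units = hour, units
    return best_hour, best_units


phases = [
    ("restore critical data", 0, 12, 40),
    ("run critical applications", 12, 96, 50),
    ("restore less critical data", 12, 36, 40),
]
print(peak_demand(phases))
```

With these example values the peak is 90 units starting at hour 12, when the critical applications run alongside the restore of less critical data, even though the steady-state requirement afterwards is only 50 units.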

2.6.3 Network Connectivity


There are two aspects to network design for disaster recovery. The first is the
need to connect or switch the production network into the recovery site, in the
event of a disaster or for testing. The second is the possible network connection
between the prime site and the recovery site, to allow data transfer. Although
these are logically separate, depending on the requirements, they may be
satisfied by a single network solution.

In an SAP R/3 on DB2 for OS/390 environment we must furthermore plan the
connection between database and application server in the disaster recovery
site. As explained in 1.6, “Network Connectivity” on page 11, the three types of
connection hardware supported are ESCON channel, FDDI LAN and Fast Ethernet
LAN. It is possible to have a different connection in the backup site but you
should consider complexity and restore time when choosing the hardware
connection for the backup site. See 3.3, “Communications” on page 41 for more
information about communication configuration.

It is possible to design a network infrastructure with full resilience and no single
point of failure. In the event of a disaster, network access can be immediately
routed to the remote site. This usually involves duplication of many resources
and therefore has an associated cost. A lower-cost option is to use limited
duplicate hardware and a dial-up or switchable circuit bandwidth option. In the
event of a disaster, there is a time delay involved in switching or reconfiguring
the network to the remote site. For most disaster recovery solutions (apart from
the very advanced data readiness options), the network setup delay would not
affect the overall critical path.

When designing for ongoing data transfer between the two sites on a day-to-day
basis, you must consider the bandwidth requirements and profiles of the types of
data that will be transmitted. Requirements can be diverse, such as remote
operations messaging, and extended channel connections for remote backup
devices.
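
A first-order bandwidth estimate for such day-to-day transfers can be derived from the volume of data and the time window available. The sketch below assumes decimal units (1 GB = 8,000 megabits) and a utilization factor that reserves headroom on the link; the function name and the default utilization of 0.7 are illustrative assumptions, not recommendations.

```python
def required_mbit_per_s(volume_gb: float, window_hours: float,
                        utilization: float = 0.7) -> float:
    """Nominal link speed needed to move volume_gb within window_hours.

    utilization is the fraction of the nominal link rate assumed to be
    usable for payload (an assumption, not a measured value).
    """
    if volume_gb < 0 or window_hours <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid input")
    megabits = volume_gb * 8000.0   # 1 GB = 8000 Mbit in decimal units
    seconds = window_hours * 3600.0
    return megabits / seconds / utilization
```

For example, moving 45 GB in one hour on a fully usable link (`utilization=1.0`) requires 100 Mbit/s. Bursty profiles, such as archive logs that are shipped in large blocks, need to be sized for the peak window and not for the daily average.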

2.6.4 Workload Distribution across Two Sites


If your company owns a second site to be used for disaster recovery, you will
also need to decide how best to allocate all of your workload across the two
sites. A full treatment of the issues involved in this decision is beyond the scope
of this book, but the rest of this section describes the main options in general terms.

The simplest option is to run the entire workload at the prime site and ensure
that there is sufficient resource ready at the recovery site. In many cases,
however, a business cannot justify an idle recovery site dedicated to disaster
recovery.

Another option is to separate the noncritical workload and run it at the recovery
site. In the event of a disaster at the prime site, the noncritical workload is
displaced and the critical workload is recovered using the resources at the
recovery site. Usually, all workload eventually becomes critical as an outage
continues, and therefore some provision for recovery of the noncritical workload
must also be made.

If it is not possible to define critical and noncritical workload, at least test and
quality-assurance systems can be run at the recovery site. These systems are
usually made inactive in disaster situations. Note that this possibility provides a
convenient method of practicing for disaster; as a part of normal test and quality
assurance functions, the resources to be used in disaster recovery are
exercised.

A third option is to split the workload, so that there is a mixture of critical
workload, noncritical workload, and (optionally) spare disaster recovery capacity
at both sites. In practice, this can be a very complex option. It can be difficult to
get a suitable match of workloads, such that recovery from any disaster scenario
can be easily achieved.

When considering a split of workload, give special consideration to application
interactions. Also give careful consideration to connectivity requirements.
Weight your solution in favor of simplicity. Recovering from a real disaster has
the greatest chance of success when the least amount of complexity exists.

Chapter 3. Hardware, Software and Communications Configurations

This chapter describes the options you have for substituting the components
of an SAP R/3 on DB2 for OS/390 environment in the case of a disaster. There
are several factors that must be considered when planning the disaster recovery
configuration for your environment. These factors are:
• Technical feasibility
When planning the disaster recovery configuration you must be aware of
technical constraints. As an example, consider the connection from an
application server to the database server: communication over a LAN such
as standard Ethernet or token-ring is technically possible, but the speed
and load requirements of SAP R/3 rule out the use of those connections.
The recovery site must be able to function for critical applications at
reasonable speeds and with minimal differences from the central site.
Otherwise, there is a risk that users will hold transactions or fall back
on manual procedures, and the recovery site becomes useless.
The technical feasibility in an SAP R/3 on DB2 for OS/390 environment is
discussed in this chapter. While the user of SAP R/3 may choose to declare
certain business functions noncritical, it is not possible to separate SAP R/3
data by application. Therefore, in this redbook we will assume that all
data from the DB2 for OS/390 database server must be recovered. This
means that the recovery site needs to have essentially the same capacity as
is maintained at the local site.
• Performance needed in the case of a disaster
This factor takes into account that many companies can do their business
with less capacity in the first period of a post-disaster scenario. For
example, they can define certain key users that must have access to the
systems within a defined time (such as 24 hours) while the rest of the users
must regain access after a longer period (such as five days or even after the
whole environment is recovered to the original place). It is essential that the
key users are able to perform the critical business of the company. This
chapter also discusses the role of performance when planning the disaster
recovery configuration (for example, using a smaller S/390 system or fewer
application servers).
• Recovery time
The requirements for recovery time for an SAP R/3 on DB2 for OS/390
environment can range from minutes to several days. This means that some
companies might need a high availability solution spread over two sites
using mirrored DASD (see High Availability Considerations: SAP R/3 on DB2
for OS/390, SG24-2003), whereas others can recover their environment from
tape on dedicated backup hardware (for example from a vendor) in the case
of a disaster. In this redbook we focus on recovery of the DB2 database
server in a ready-to-roll-forward environment, using tape archives and image
copies. In our experience, this solution is the most commonly employed.
Backup is achieved without disruption and there is no requirement for
bandwidth or hardware on a daily basis at the recovery site. Hardware costs
are moderate compared to those that use mirrored DASD. Transport is
assumed to be manual. In 5.4, “Advanced Disaster Recovery Planning” on
page 85, we give some hints regarding how the recovery time can be
reduced by adding more resources or changing procedures.
The environments described in 2.4.4, “Readiness of the Recovery Site” on
page 26 are listed in order of fastest to slowest recovery:
1. Realtime-remote-update: DASD mirroring using PPRC and GDPS
2. Realtime-roll-forward: DASD mirroring using XRC
3. Periodic backup: backup copy on tape, no archived log
4. Roll-forward: tracker site - apply archives to shadowed data
5. Ready-to-roll-forward: backup copy on tape and archive logs off-site
6. Ready-to-roll-forward: backup copy on tape and RRDF archives
Readers may argue for a different order; nevertheless, we believe a
reasonable case can be made for the one shown.
Rationale: Use of a tracker site may not provide faster recovery than
ready-to-roll-forward if logs are taken only once a day and/or if heavy
logging is the rule. On the other hand, if DB2 UDB for OS/390 V6 is used, it
may be considerably faster, due to two features:
1. The ability to perform log-only recovery for indexes
2. The parallelization of the log apply phase of recovery
The RRDF solutions will take longer to recover because they will probably pass
more log data than the item preceding them in the list. Ready-to-roll-forward
scenarios generally minimize the log data passed by forcing a log switch
immediately after the backup copies have been made.
• Data currency
When a disaster occurs and the disaster recovery plan is invoked, the data
will typically be restored to a level that existed at some time before the
disaster. As a result, there will be data transactions that will not be
recovered at this point. The loss of data can be reduced with techniques
such as electronic vaulting. When using the ready-to-roll-forward approach,
the data loss can be expected to be up to 24 hours.
The environments described in 2.4.4, “Readiness of the Recovery Site” on
page 26 are listed in order of most to least currency:
1. Realtime-remote-update: DASD mirroring using PPRC and GDPS
2. Realtime-roll-forward: DASD mirroring using XRC
3. Ready-to-roll-forward: backup copy on tape and RRDF archives
4. Roll-forward: tracker site - apply archives to shadowed data
5. Ready-to-roll-forward: backup copy on tape and archive logs off-site
6. Periodic backup: backup copy on tape, no archived log
Readers may argue for a different order; nevertheless, we believe a
reasonable case can be made for the one shown.
Rationale: The tracker site will lose less data if logs are transported multiple
times a day. If electronic vaulting of archives is performed, environments of
item 4 and item 5 will have the same currency.
• Costs
It is obvious that the costs relate directly to the performance you need and
the recovery time in a post-disaster scenario. It is the objective of the
business impact analysis to determine the break-even point between these
factors. Costs will only be discussed in general in this chapter, since they
vary widely from environment to environment.
The environments described in 2.4.4, “Readiness of the Recovery Site” on
page 26 are listed here in order of most to least cost:
1. Realtime-remote-update: DASD mirroring using PPRC and GDPS
2. Realtime-roll-forward: DASD mirroring using XRC
3. Ready-to-roll-forward: backup copy on tape and RRDF archives
4. Roll-forward: Tracker site - apply archives to shadowed data
5. Ready-to-roll-forward: backup copy on tape and archive logs off-site
6. Periodic backup: backup copy on tape, no archived log
Readers may argue for a different order; nevertheless, we believe a
reasonable case can be made for the one shown.
Rationale: The RRDF solutions require network bandwidth for shipping log
records to a second site and licensing fees for the product, but the data can
be vaulted. The tracker site requires an active OS/390 and DB2 for OS/390
at the recovery site, but has no intrinsic need for network connections.
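
The three orderings given in this list (recovery speed, data currency, and cost) can be combined to shortlist candidate environments. The following Python sketch encodes the orderings exactly as they are listed above and returns the options that satisfy a minimum rank for speed and currency, cheapest first. The short labels and the function name are our own shorthand for illustration; the rankings are relative positions only and carry no absolute times or prices.

```python
from typing import List

# Orderings copied from the three lists in this section.
# SPEED and CURRENCY: position 1 is best. COST: position 1 is most expensive.
SPEED = ["PPRC/GDPS", "XRC", "periodic backup", "tracker site",
         "ready-to-roll-forward", "RRDF"]
CURRENCY = ["PPRC/GDPS", "XRC", "RRDF", "tracker site",
            "ready-to-roll-forward", "periodic backup"]
COST = ["PPRC/GDPS", "XRC", "RRDF", "tracker site",
        "ready-to-roll-forward", "periodic backup"]


def candidates(max_speed_rank: int, max_currency_rank: int) -> List[str]:
    """Return the options meeting both rank limits, cheapest first."""
    ok = [o for o in COST
          if SPEED.index(o) + 1 <= max_speed_rank
          and CURRENCY.index(o) + 1 <= max_currency_rank]
    # COST runs from most to least expensive, so reverse it.
    return list(reversed(ok))
```

For example, `candidates(6, 4)` accepts any recovery speed but requires a currency rank of 4 or better, and therefore returns the tracker site as the cheapest qualifying option. The business impact analysis would then supply the actual limits.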

3.1 Hardware
In this section we discuss the hardware configuration that is needed at the
recovery site to recover an SAP R/3 on DB2 for OS/390 environment. The
configuration described here will support recovery from tape backups and
archive logs for the DB2 for OS/390 database server. This means the recovery
time might range up to 24 hours depending on the availability of the recovery site
and the amount of data that must be restored.
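
The dependency of recovery time on the amount of data can be approximated with simple arithmetic. The sketch below estimates only the raw tape restore time from the data volume, the number of tape drives used in parallel, and a sustained throughput per drive. All three inputs and the function name are hypothetical; tape mounts, log apply, and index rebuilds are not included, so the result is a lower bound.

```python
def restore_hours(data_gb: float, drives: int, mb_per_s_per_drive: float) -> float:
    """Lower bound on restore time, in hours, for a parallel tape restore."""
    if data_gb < 0 or drives < 1 or mb_per_s_per_drive <= 0:
        raise ValueError("invalid input")
    total_mb = data_gb * 1000.0   # decimal units: 1 GB = 1000 MB
    return total_mb / (drives * mb_per_s_per_drive) / 3600.0
```

As an example under these assumptions, 360 GB restored with four drives at a sustained 10 MB/s each takes at least 2.5 hours.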

If your business needs faster recovery even in the case of a disaster and
ready-to-roll-forward (recovery from tape) is not an option for you, please refer to
High Availability Considerations: SAP R/3 on DB2 for OS/390, SG24-2003.

3.1.1 Database Server on S/390


DB2 is a self-defining subsystem; that is, all objects, definitions, authorizations,
plans, and packages are defined within its structures (called the DB2 Catalog
and Directory). This means that the local DB2 must be recreated at the recovery
site. It is not possible for you to recover one DB2's data using another.

The premise of DB2 for OS/390 disaster recovery is that the local environment is
recreated at the recovery site. This means that the local OS/390 and all
subsystems that are used at the local site are recovered in the event of disaster.
This can include such subsystems as DFSMS, Security Server, integrated catalog
facility (ICF) catalogs, tape management subsystem, and ancillary subsystems.
Your OS/390 capacity planning personnel normally take care of calculating the
capacity and arranging the configuration at the recovery site to support those
business-critical applications that you need to run at the recovery site.

For SAP R/3 on DB2 for OS/390 we assume that only SAP R/3 data is stored on
the affected DB2 for OS/390. From the standpoint of the DB2 for OS/390
subsystem(s) that are the database servers for SAP R/3, all data will be
recovered and used in a post-disaster environment. Therefore, the DASD used
by DB2 should have similar capacity at both sites. The implementation of
DFSMS may allow greater flexibility in DASD devices. Tape drives used for
archives must be able to read the archive logs created at the local site. Your
hardware staff will know which devices are necessary.

If your local processor supports hardware compression and the equipment at the
recovery site does not, the compression is simulated by OS/390 software at the
recovery site. Performance will suffer as a result, but the recoveries can be
performed.

Up to now, we have discussed DB2 for OS/390 in general, and have not
considered more than one subsystem. From now on we will divide our DB2 for
OS/390 recovery discussion into two environments: data sharing and non-data
sharing. Each procedure will be described under its own topic; if you do not use
DB2 data sharing, then you can skip topics that address it.

3.1.1.1 DB2 Data Sharing


If your enterprise supports a Parallel Sysplex with DB2 data sharing, your
OS/390 staff will recreate the Parallel Sysplex environment for you. The exact
configuration may not be the same at the recovery site. You need a separate
Coupling Facility and Sysplex timer unless one of the following is true:
• You use an Integrated Coupling Migration Facility (ICMF) LPAR serving as a
Coupling Facility (CF). It executes the Coupling Facility Control Code (CFCC)
but simulates the coupling links which are used to connect the OS/390 LPAR
to the CF. The OS/390 subsystem on which all DB2s are started serves the
same function as the Sysplex timer.
• Your vendor runs VM/ESA 2.3. OS/390 can run as a VM guest. Links are
simulated similarly to ICMF.
For various reasons ICMF would suffice for testing but would not perform well in
an actual disaster.

There is no option to disable DB2 data sharing for disaster recovery. All DB2
members must be started and must complete restart in order to release all
retained locks. Then all but one member can be stopped and recovery can
complete using the same procedures as are used in a non-data sharing
environment, described in 5.3, “Remote Site Recovery from Disaster at a Local
Site” on page 64.

The Coupling Facility Resource Management policy (known as CFRM) must be
different at the recovery site, since the CF serial number changes. While even a
one-member data sharing group always uses the SCA and LOCK structures, it
does not use the Group Buffer Pool (GBP) structures.
Note: During restart of the group, the GBPs are allocated, but will not be used.
Their sizes can be minimal. You still require the same amount of CF storage for
the SCA and LOCK structures, but this is usually less than 100 MB.

Shared DASD is required for the following items of all DB2 for OS/390 members:
• Active logs
• Bootstrap data sets
• DB2 data
• DASD archive logs
Tape drives used for image copies and archives should be accessible to all
members, as well.

Processors used for OS/390 support very flexible configurations. If sufficient
capacity exists at the recovery site, all DB2s may be started under one OS/390.

Since we are considering business resumption of the SAP R/3 database server,
the same capacity is also needed following recovery of the data of the data
sharing group. This means the same amount of DB2 CF storage, processor
capacity, and the same number of DB2 members. If some of the SAP R/3
business functions are not to be implemented initially at the recovery site (for
example, Financials, but not Human Resources), then you might only need to have
one DB2 member active.

3.1.2 Application Servers


Planning the disaster recovery configuration for the application servers means
determining the following factors:
• Number of application servers
• Configuration of application servers
• Architecture of application servers

The number of application servers in your post-disaster environment depends on
the number of application servers you have in your production environment and
the performance needed after the disaster. If your disaster plan is to run fewer
applications or with degraded performance, you can reduce the number of dialog
and update servers.

Since the central instance is essential for the SAP R/3 on DB2 for OS/390
environment, it must be treated with special focus. That means that there must
be a detailed disaster recovery plan, including the description of backup
hardware and backup/recovery procedures specifically for the central instance.
The dialog instances can be treated in the same way as the central instance or
they can be reinstalled in the case of a disaster. For more information about
backup and recovery procedures of application servers see Chapter 4,
“Backup/Recovery Considerations in Disaster Recovery” on page 45.

Generally the configuration of the backup application servers should be similar to
that of the production servers with respect to memory, number of CPUs, internal
and external disks, and communication adapters.

In accordance with performance considerations, the memory and the number of
CPUs can differ. This means they can be either more or less. As the workload
between application servers can easily be balanced, the overall objective is to
have adequate performance. This makes the planning for the application
servers very flexible. For example, you can have more application server
machines, each with less memory and a reduced number of CPUs, to provide the
same performance as in your normal operation.
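
The trade-off between the number of machines and the size of each machine can be expressed as a simple calculation. The sketch below assumes that application server capacity can be described by a single additive number per machine (in arbitrary units) and that the workload balances evenly; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def servers_needed(required_capacity: float, capacity_per_server: float) -> int:
    """Smallest number of equal servers that covers the required capacity."""
    if required_capacity < 0 or capacity_per_server <= 0:
        raise ValueError("invalid capacity")
    return math.ceil(required_capacity / capacity_per_server)
```

If production runs four servers of 1000 units each, matching that capacity with smaller machines of 600 units each takes seven machines; if the disaster plan accepts half the capacity, four of the smaller machines are enough.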

The final configuration depends on several circumstances, for example whose
hardware you use for recovery (your own or from a vendor) and when the
servers are obtained (availability and costs of certain features).

The disk space needed is determined by the production servers and can be
reduced for the backup servers only if the production data is mirrored. Normally,
data is not mirrored in a post-disaster environment.

When planning the backup site's network infrastructure, normally the same
architecture is used as in the prime site. If the architecture is different at the
backup site (for example, because an existing site with existing network
infrastructure is used) you may need different communication adapters in your
backup servers.

Of course the communication adapter must support the physical connection
between the application server and database server, for example ESCON
channel, FDDI LAN, or Fast Ethernet LAN.

The normal disaster recovery plan for application servers is to take backups
from the servers and to restore them on the same architecture. Generally it is
possible to choose a different architecture for your recovery site from the one
you have in normal operation. This means you can decide to have your
application servers running on stand-alone RS/6000 machines instead of on
RS/6000 SP, or you can choose Windows NT instead of AIX.

It is not possible to create a backup from an RS/6000 machine and restore it on a
PC Server running Windows NT, or vice versa. However, in case of a disaster,
you can install new application servers on another platform, since there is no
vital data on the application servers. To shorten the recovery time you can also
prepare backup tapes for the appropriate platform that are used only in the case
of a disaster. You must describe the procedure in your disaster recovery plan,
especially if you have made special adjustments to the SAP R/3 profiles.

Choosing another architecture for the disaster recovery configuration adds
complexity and should only be considered under special circumstances, such as
using capacities already in place or offered by a supplier, thereby avoiding the
costs for new machines. If you choose to use another architecture, a detailed
recovery plan and careful testing are imperative.

If you choose to take backups of your RS/6000 machines for disaster recovery,
please keep in mind that at the moment there are two general architectures of
RS/6000: PCI-bus and Micro Channel. Backups made from one architecture are
not easily transferable to the other architecture. Therefore, you should generally
choose the architecture of your production server for your backup server. This
is no problem in an RS/6000 SP environment; Parallel System Support Programs
(PSSP) control uses the correct drivers for the components.

Take special care if you use ESCON channels, since ESCON channel hardware is
supported only by certain application servers.

Backup procedures for application servers are described in 4.2, “Backup of
Application Servers” on page 49, and restore procedures are described in 4.5,
“Recovery of Application Servers” on page 58.

3.1.3 Peripheral Equipment


In most SAP R/3 on DB2 for OS/390 environments, an enterprise requires a
larger configuration than a database server and one or more application servers.
For example, the following peripheral equipment may be needed to support all
business processes:
• Tape drives and libraries
• Printers
• Telephones, fax machines, copy machines
• Archiving systems

Which peripheral systems are needed is determined by the environment analysis.
These systems must be included in the disaster recovery plan.

In Chapter 4, “Backup/Recovery Considerations in Disaster Recovery” on
page 45 we describe the procedures to recover an SAP R/3 on DB2 for OS/390
environment from tape. In such an environment, it is essential to have the type
of tape drive that can read the tapes; certain types of tapes can only be read by
certain types of tape drives.

The number of tape drives must also be considered. If it is possible to restore
data in parallel, adding tape drives, controllers and channels to the disaster
recovery configuration will lead to a shorter recovery time but higher costs for
the recovery site.
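
The trade-off between the number of tape drives and the recovery time can be
estimated with a rough calculation. The following Python sketch assumes that the
restore jobs are independent of each other and that the duration of each job is
known. The function name, the greedy assignment and the sample durations are
illustrative assumptions, not measured values or part of any product.

```python
import heapq

def estimate_restore_hours(job_hours, drives):
    """Estimate elapsed restore time when jobs run in parallel on `drives` tape drives.

    Longest-job-first greedy assignment: each job goes to the drive that
    becomes free first. This is an approximation, not an optimal schedule.
    """
    if drives < 1:
        raise ValueError("at least one tape drive is required")
    finish_times = [0.0] * drives      # time at which each drive becomes free
    heapq.heapify(finish_times)
    for hours in sorted(job_hours, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + hours)
    return max(finish_times)

# Hypothetical restore job durations in hours (illustration only).
jobs = [4.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
for n in (1, 2, 4):
    print(n, "drive(s):", estimate_restore_hours(jobs, n), "hours")
```

With these sample values the estimate drops from 16 hours on one drive to 8
hours on two drives and 4 hours on four drives. Beyond that point the longest
single job limits the elapsed time, so additional drives add cost without
shortening the recovery.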

3.2 Software
All SAP R/3 applications are stored in the database. This means that restoring
the DB2 for OS/390 data is the main part of the disaster recovery procedure.

Besides DB2 for OS/390, there are SAP R/3 executables on the application
servers that need to be available. The backup and recovery procedures are
described in Chapter 4, “Backup/Recovery Considerations in Disaster Recovery”
on page 45. In this section we list the software you need at your recovery site.

3.2.1 OS/390
Since DB2 for OS/390 ready-to-roll-forward disaster recovery means recreating
the local environment at the recovery site, your OS/390 staff must provide all the
software used at the local site on which DB2 for OS/390 depends.

Your DBA staff may use tools to manage data or for performance monitoring for
the DB2 for OS/390 environment. In addition to the specific vendor product
libraries, some of the functions are stored in DB2 tables, so you must make
provision for their copy and restoration.

Consult your software suppliers regarding the requirements for licenses to
enable running at the recovery site. In the case where the recovery site has a
function in normal operation (for example, the recovery site may operate as a
quality assurance or a test facility) you should have licenses for software that
runs there.

3.2.2 AIX and Windows NT


AIX or Windows NT is needed to run the application servers. The normal
disaster recovery plan for application servers is to take backups from the
servers and to restore them on the disaster recovery machines. The backup
includes the operating system environment of AIX or Windows NT. You should
have installation media at your backup site to ensure your ability to make
updates for special peripheral equipment that may not have been in your original
environment.

The installation media are essential when your disaster recovery plan includes
installation of new application servers or if you need to install new application
server operating system software.

3.2.3 SAP R/3
The backup of the SAP R/3 environment is included in the backups of the
application servers and the database server, but you should have the SAP R/3
installation package including the CD-ROMs and manuals at your disaster
recovery site to be able to install new application and presentation servers.

To work with the R/3 system, a license is required. The license is installed
from the central instance. To install it, you must obtain a license key, which
depends on a system-specific keycode, from SAP AG. The license key itself is
stored in the database.

Because the license depends on the hardware of the central instance you need a
new license on the backup machine. The license for the backup machine can be
installed in two ways:
1. You can install a temporary license key after recovery; this allows you to
work with the R/3 system for four weeks. The temporary license is
independent of the hardware and can be used during tests as well as in a
post-disaster scenario. If, in the case of a disaster, you need to run your
SAP R/3 system longer than four weeks on the backup machine, you have
enough time to get the license key from SAP AG; include getting this key as
an item in your disaster recovery plan.
2. If you recover on dedicated hardware in the case of a disaster you can
install a stationary license key. It is possible to install several licenses (for
different host candidates running the message server). The R/3 system will
search for the current license. The license key must be updated for new
releases.

For more information about license keys see R/3 Installation on UNIX DB2 for
OS/390, Material Number 51002659.

3.2.4 Complementary Software


In most SAP R/3 on DB2 for OS/390 environments, complementary software is
used to perform the following tasks:
• Systems management
• Performance management
• Backup and recovery
• Printing
• Archiving
When planning your disaster recovery configuration you must consider which of
these products you need in a post-disaster situation. There might be products
you do not need, others might be needed after a defined period, but others might
be essential.

Backup and recovery tools are normally needed to recover your data at the
recovery site, so they must be included in your disaster recovery plan.

In some environments archiving tools are only needed once a month, but in
other environments archiving is done continuously and without the archiving you
might run out of disk space. In the latter case you should also include the
archiving tools in your disaster recovery plan.

These examples show that the planning for subsidiary software products must
also be done carefully and cannot be ignored. The processes and their
associated procedures and tools must be analyzed in order to assess which
ones are required to run and support the business applications to the required
service levels.

All of the essential components need to be defined and passed as a requirement
to the solution design phase. The basic rule is that if a component is needed to
run or support a business application at the primary site, then it must be
restored at the recovery site. In practice, most of the components are data, and
therefore it is a matter of making sure that they are backed up and restored
along with all of the other critical data.

3.3 Communications
In a multi-tier environment like SAP R/3 on DB2 for OS/390, communication
between the components is the basis for all operations. Therefore, you must
plan the disaster recovery for communications carefully.

3.3.1 Communication between Database Server and Application Server


The communication between the database server and the application servers is
essential for running an SAP R/3 system.

The basis for this communication is the physical connection. As described in
1.6.1, “Physical Connection” on page 12, three types of connection are
supported:
• ESCON channel
• FDDI LAN
• Gigabit Ethernet LAN
In a normal disaster recovery plan the same physical connection is used at the
recovery site as at the primary site. This means the machines (S/390, RS/6000
and PC Server) are equipped with the appropriate features and adapters and the
cabling is in place.

Three communication protocols are available for communication between SAP
R/3 work processes on the application server and the database server:
• TCP/IP
• High-Speed UDP
• Enhanced ESCON
Because the definitions of these protocols are stored in operating system files,
they are restored with the appropriate operating system.

Choosing another physical connection or communication protocol for the
disaster recovery configuration adds complexity and should only be considered
under special circumstances, such as using capacities already in place or
offered by a supplier and avoiding the costs for new resources. If you choose to
use another physical connection or communication protocol, a detailed recovery
plan and careful testing are imperative.

3.3.2 Communication - Application Server to Presentation Servers
Companies using SAP R/3 on DB2 for OS/390 normally have one central data
center where the SAP R/3 system runs. The users can be at the same location
as the data center (local users) or they can be spread geographically (remote
users). The geographic spread may be across national boundaries; it could even
involve a global network. Figure 11 shows the network connections in normal
operation.

Figure 11. Network Connections in Normal Operation

To be protected from area disasters, most companies decide to have a
disaster recovery site at a secure distance from the primary site. This means
that in the case of a disaster, the users' network connection to the SAP R/3
system must be redirected to the disaster recovery site.

The network topology should provide alternate paths between host locations
(primary site and disaster recovery site) and all remote user locations. Note that
local users to the primary site are remote users to the recovery site. In the case
of a disaster, there must be provision for connections of primary site users to the
recovery site, and operating instructions for any different procedures they
perform must be provided. Figure 12 on page 43 shows the network
connections in a post-disaster situation.

Figure 12. Network Connections in Post-Disaster Situations

Redundant paths can be configured so that no intervention is needed, or
paths can be switched in the case of a disaster. If path switching is required,
it should take place automatically or under central control.

Redundant paths are more expensive than switching, but they offer a
faster recovery time.

As the switching of the network can be done in parallel to the SAP R/3 restore,
you must calculate the duration of restoring your SAP R/3 on DB2 for OS/390
environment and the switching time to decide if switching the network in the
case of a disaster is an option for you. The overall objective is to meet the
required recovery time determined in the business impact analysis.
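
The decision described above can be written as a simple check. The following
Python sketch assumes that the network switching and the SAP R/3 restore start
at the same time and are fully independent; the function name and the figures
are hypothetical and serve only as an illustration.

```python
def meets_recovery_objective(restore_hours, switch_hours, required_hours):
    """Return True if the recovery site is usable within the required time.

    Network switching runs in parallel to the SAP R/3 restore, so the
    elapsed time is the longer of the two activities, not their sum.
    """
    elapsed = max(restore_hours, switch_hours)
    return elapsed <= required_hours

# Hypothetical figures in hours (illustration only).
print(meets_recovery_objective(restore_hours=20, switch_hours=8, required_hours=24))
print(meets_recovery_objective(restore_hours=20, switch_hours=30, required_hours=24))
```

In the first case the switching time is hidden behind the restore and switching
is an option. In the second case the network becomes the limiting factor, which
would argue for redundant paths.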

Chapter 4. Backup/Recovery Considerations in Disaster Recovery

This chapter describes in detail the backup procedures for disaster recovery for
the components of an SAP R/3 on DB2 for OS/390 environment, such as DB2 for
OS/390 and the application servers on the different platforms. The backup that
will be described is restricted to that required for a ready-to-roll-forward
environment (tape image copies and archive logs).

As is the case with any software package, backup copies of system and
application software must be available at the recovery site. Backup and
recovery requirements for an SAP R/3 on DB2 for OS/390 installation are somewhat
different from traditional software packages in that the SAP R/3 application
programs reside in the SAP R/3 database. Those program modules that are
resident are backed up as a matter of course when the SAP R/3 database is
copied. What remains are files and executables that reside outside of the SAP
R/3 database, and include:
• System software (OS/390, AIX, Windows NT, DB2 for OS/390)
• DB2 for OS/390 Catalog and Directory
• Central Instance/Application Servers
• Transport and Correction System

Important
To provide for effective recovery in the event of failure, any restoration must
guarantee that every component in the SAP R/3 environment is logically
consistent with the others in terms of the content and structure at a given
point in time.

For more information on maintaining backups and planning for recovery, please
refer to the following publications:
• BC SAP Database Administration Guide: DB2 for OS/390, Material Number
51001015
• DB2 UDB for OS/390 V6 Administration Guide, SC26-9003
• DB2 UDB for OS/390 V6 Utility Guide and Reference, SC26-9015
• R/3 Installation on UNIX DB2 for OS/390, Material Number 51002659
• Database Administration Experiences: SAP R/3 on DB2 for OS/390, SG24-2078

4.1 Utilities of the DB2 for OS/390 Environment


DB2 for OS/390 provides a number of utilities that will help you maintain the
integrity, accuracy, and usability of your SAP R/3 system if a disaster occurs:
QUIESCE Establishes a point of consistency or quiesce point for
one or more tablespace(s) and/or partition(s), and
records it in the DB2 for OS/390 catalog. Use to
establish a recovery point for point-in-time recovery.
(It is not possible to use QUIESCE for this purpose
with SAP R/3 4.5A, because SAP R/3 transactions
unavoidably fail with timeouts during QUIESCE; see
4.4.1.1, “Establishing a Point of Consistency” on
page 51 under the QUIESCE utility.) The quiesce point
is the current log RBA or, if you are running in a
data sharing environment, the current Log Record
Sequence Number (LRSN).
RECOVER TABLESPACE Recovers data to currency or to a prior point in time.
RECOVER INDEX (DB2 UDB for OS/390 Version 6 and above). Recovers
indexes using image copies to currency or to a prior
point in time.
REBUILD INDEX Rebuilds indexes from the referenced table.
COPY Generates either a full or incremental copy of a
tablespace or data set. A COPY done with the FULL
YES option results in a complete copy; a COPY done
with the FULL NO option results in a copy of only
those pages that have changed since the last COPY
was done.
The COPY utility is run with either SHRLEVEL CHANGE
or SHRLEVEL REFERENCE. SHRLEVEL CHANGE
allows read and update access to the tablespace
being copied, and bears functional equivalence to an
SAP R/3 DB2 for AIX online image copy. SHRLEVEL
REFERENCE prevents update access to the tablespace
being copied, and bears functional equivalence to an
SAP R/3 DB2 for AIX offline image copy.
Note: One difference in the latter is that an SAP R/3
DB2 for AIX offline image copy is taken after the SAP
R/3 system is shut down; in the SAP R/3 OS/390
implementation, DB2 for OS/390 must be running,
although the SAP R/3 application servers may or may
not be shut down.
(DB2 UDB for OS/390 Version 6 and above). COPY
also produces an image copy of an index.

These utilities are more fully described in DB2 UDB for OS/390 V6 Utility Guide
and Reference, SC26-9015.

4.1.1 The COPY Utility


In the DB2 for OS/390 environment, the COPY utility operates at the tablespace,
partition, or data set level. When you use COPY for backup, DB2 for OS/390
maintains a record of COPY activity in the DB2 for OS/390 catalog, and
coordinates restoration when the RECOVER utility is used. Multiple image
copies can be created during the same invocation of COPY, one for the local site
and one for the recovery site, which is intended to be taken off-site for disaster
recovery. There are several options available when using the COPY utility:
COPY FULL YES Performs a full image copy. This provides a complete
copy of the entire tablespace, partition, or data set, and
may be used in either a Recover to Currency or a
point-in-time recovery. Depending on circumstances,
additional items may be needed to perform the desired
recovery. These items may include incremental copies
and the log. A complete description is in BC SAP
Database Administration Guide: DB2 for OS/390,
51001015.
COPY FULL NO Produces an incremental image copy. An incremental
image copy contains copies of only those pages that
have changed since the last image copy.
A user-selected percentage of pages may optionally be
specified, in which case the image copy will only be
taken if the user-selected threshold is exceeded. Image
copies are used for either a point-in-time recovery or
recover to currency, subject to restrictions listed in DB2
for OS/390 V5 Administration Guide.
Also, the MERGECOPY utility is used to combine the
results of incremental image copies, or to combine the
results of a full image copy with one or more
incremental image copies.
COPY CONCURRENT The concurrent option invokes DFSMS concurrent copy.
Specific hardware and software requirements for
concurrent copy are documented in DB2 for OS/390 V5
Administration Guide.
If you have SnapShot-enabled DASD, specifying the
CONCURRENT operand dynamically produces a
virtual concurrent copy. It is documented in
Implementing DFSMSdss SnapShot and Virtual
Concurrent Copy, SG24-5268.
In other words, CONCURRENT will produce either a
concurrent copy or a virtual concurrent copy, depending
on the capability of your hardware.
If you attempt this type of backup without the necessary
hardware and software in place, you may receive a
message similar to the following:
DSNU409I - DSNUBBUM NO HARDWARE SUPPORT FOR TABLESPACE
dbname.tsname
COPY INDEX With DB2 UDB for OS/390 Version 6, you can copy an
index. The copy can be made with either SHRLEVEL
option (CHANGE or REFERENCE). An incremental image
copy (FULL NO) of an index cannot be produced.

A major advantage of the COPY utility is that DB2 for OS/390 records backup
activity in the DB2 for OS/390 catalog, and uses this information in the event that
recovery is required.
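
The relationship between full and incremental image copies can be illustrated
with a small model. The following Python sketch is a conceptual illustration
only: it represents a tablespace as a dictionary of page numbers, and all
function names are hypothetical. It does not reproduce the DB2 for OS/390
utilities or the format of an image copy.

```python
def full_copy(tablespace):
    """Model of COPY FULL YES: every page is copied."""
    return dict(tablespace)

def incremental_copy(tablespace, changed_pages):
    """Model of COPY FULL NO: only pages changed since the last copy."""
    return {page: tablespace[page] for page in changed_pages}

def merge_copies(full, *incrementals):
    """Model of merging a full copy with later incrementals, oldest first."""
    merged = dict(full)
    for inc in incrementals:
        merged.update(inc)      # newer page images replace older ones
    return merged

# Hypothetical tablespace with four pages.
ts = {1: "a0", 2: "b0", 3: "c0", 4: "d0"}
base = full_copy(ts)

ts[2] = "b1"                                    # page 2 changes
inc1 = incremental_copy(ts, changed_pages=[2])

ts[4] = "d1"                                    # page 4 changes
inc2 = incremental_copy(ts, changed_pages=[4])

merged = merge_copies(base, inc1, inc2)
print(merged == ts)
```

In this model each incremental copy holds a single page, while the merged
result matches the current state of all four pages. This is why incremental
copies are smaller but always need the preceding full copy for a recovery.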

4.1.1.1 COPY with Read/Write Access Using SHRLEVEL CHANGE


The COPY utility, when invoked with the SHRLEVEL CHANGE option, will permit
read/write access to the tablespace being copied. This option of the COPY utility
is functionally equivalent to an online image copy in a DB2 for AIX environment.

Important
When the SHRLEVEL CHANGE option of COPY is used, the COPY can be
“fuzzy” with change activity that can include partial units of recovery. If this
image copy is later required for a recovery operation, log activity will be
applied by DB2 for OS/390 to bring the tablespace to consistency.

A SHRLEVEL CHANGE copy is valid for point-in-time recovery only if you
recover to a quiesce point. You can also use a SHRLEVEL CHANGE copy with
the conditional restart technique for recovery to a prior point in time.

4.1.1.2 COPY - No Write Access: SHRLEVEL REFERENCE


When invoked with the SHRLEVEL REFERENCE option, COPY prohibits update
access to the tablespace, but read access is allowed for users. This type of
COPY operation is equivalent to an offline image copy in a DB2 for AIX
environment, with the notable exception that users can continue read access
during COPY execution. In a DB2 for AIX implementation, concurrent query
access is not possible during an offline backup.

Important
When the SHRLEVEL REFERENCE option of COPY is used, the point of
consistency for the tablespace is at the point in time that the COPY begins,
since DB2 for OS/390 will not permit modification to the tablespace for the
duration of the COPY. As with the DB2 for AIX implementation, a tablespace
copied in this manner is logically consistent at the time that the copy began.

Point-in-time recovery in an SAP R/3 environment effectively requires
recovery of all tablespaces in the subsystem to a point of consistency.
SHRLEVEL REFERENCE copies are not appropriate for recovery of multiple
tablespaces to a point in time. Although each copy is consistent (there are
no partially committed units of recovery), the multiple SHRLEVEL REFERENCE
copies may be inconsistent with one another. For example, if a unit of recovery
updates Table1 and Table2, and Table1 is copied before the update is
committed while Table2 is copied after the update is committed, the copy of
Table1 is inconsistent with the copy of Table2 when both copies are used for
point-in-time recovery.
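
The Table1/Table2 example can be traced with a small simulation. The following
Python sketch is a conceptual model under simplifying assumptions (each table
is reduced to a single value, and a copy is a snapshot of committed data); it
is not DB2 for OS/390 code.

```python
# Committed state of two tables that are updated by one unit of recovery.
committed = {"Table1": "old", "Table2": "old"}

# The SHRLEVEL REFERENCE copy of Table1 is taken before the update is committed.
copy_table1 = committed["Table1"]

# The unit of recovery updates both tables and commits.
committed["Table1"] = "new"
committed["Table2"] = "new"

# The SHRLEVEL REFERENCE copy of Table2 is taken after the commit.
copy_table2 = committed["Table2"]

# Each copy is consistent on its own, but restoring both copies without
# applying the log yields a state that never existed in the database.
restored = {"Table1": copy_table1, "Table2": copy_table2}
print(restored)
```

The restored state contains the old value of Table1 together with the new value
of Table2, so only half of the unit of recovery is reflected.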

4.1.2 OS/390 DFSMSdss (Data Set Dump/Restore)


Using OS/390 DFSMSdss (hereafter referred to as DF/DSS) can also provide a
viable method for point-in-time recovery by restoring an SAP R/3 environment at
the device level. Every device involved in the SAP R/3 system must be copied
and restored.
Note: This should be done when DB2 for OS/390 is down, or when all objects
are stopped. Provision must also be made for system data sets (such as the
system catalog) that could be updated. There is a logical implication that all
system activity (non-SAP as well as SAP R/3) is stopped.

In this redbook, we do not assume this method is used to back up the DB2 for
OS/390 data. It can be used effectively for backing up program libraries.

4.2 Backup of Application Servers
There are different disaster recovery procedures for SAP R/3 application servers.
The type of recovery influences the type of backup you choose to implement.

Once set up, the application server changes only when the profiles are adjusted
or new releases are installed. For that reason you only need to make a backup
of the application servers after installation and after changes. Most SAP R/3
users decide to take a backup after changes and also at least once a month, to
ensure that there is always a backup that is not older than one month.
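
The backup rule described above can be expressed as a simple check. The
following Python sketch is an illustration only; the function name is
hypothetical, and the assumption that "one month" means 30 days is ours.

```python
from datetime import date, timedelta

MAX_BACKUP_AGE = timedelta(days=30)   # assumption: "one month" taken as 30 days

def backup_due(last_backup, last_change, today):
    """Return True if a new application server backup should be taken.

    A backup is due if the server changed after the last backup (profile
    adjustment or new release) or if the last backup is older than the limit.
    """
    changed_since_backup = last_change > last_backup
    too_old = today - last_backup > MAX_BACKUP_AGE
    return changed_since_backup or too_old

# Hypothetical dates (illustration only).
print(backup_due(date(1999, 9, 1), date(1999, 8, 15), date(1999, 9, 20)))
print(backup_due(date(1999, 9, 1), date(1999, 9, 10), date(1999, 9, 20)))
print(backup_due(date(1999, 9, 1), date(1999, 8, 15), date(1999, 10, 15)))
```

The first call reports that no backup is due, the second that a backup is due
because of a change, and the third that a backup is due because of its age.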

4.2.1 Backup of Application Servers on RS/6000


To back up an application server running on RS/6000 you need a backup of two
components:
• AIX environment (rootvg)
• SAP R/3 environment (the directories /usr/sap/<SID>, /sapmnt/<SID>
and /usr/sap/trans or the mounts and links to those directories when they
reside on other machines)

The backup of the rootvg can easily be made by using SMIT (smit mksysb) or
the command mksysb.

The backup of the SAP R/3 environment can be made in several ways. You
can create a tar tape containing the filesystems, you can back up the whole
volume group in which the filesystems reside, or you can use the SAP tool
backoffl. Which of these procedures you use depends on your company's
strategies.

If you use ADSM to back up the application servers, this needs special attention.
The ADSM server must then be included in the disaster recovery strategy and
must be in place before you restore the SAP R/3 environment.

In many installations the SAP R/3 environment is included in the rootvg. Then
you only need to make the backup from the rootvg.

4.2.2 Backup of Application Servers on RS/6000 SP


Many SAP R/3 on DB2 for OS/390 customers have several application servers
installed on an RS/6000 SP. To back up an RS/6000 SP environment you need a
backup of three components:
• Images of all RS/6000 SP nodes
• SAP R/3 environment on SP nodes
• Control workstation

In most cases the images of the RS/6000 SP nodes reside on external disks of
the control workstation. This means that a complete backup of the control
workstation's environment is necessary. To achieve this you need to run mksysb
from the control workstation to back up the operating environment. You also
need to run tar to back up all volume groups to have backups of the images.

The SAP R/3 environment can be backed up as described in 4.2.1, “Backup of
Application Servers on RS/6000.” In many installations the SAP R/3 environment
is included in the rootvg of the nodes. Then the backup of the SAP R/3
environment is included in the mksysb backup of the node images.

If you use ADSM to back up the node images or the SAP R/3 environment, this
needs special attention. The ADSM server must then be included in the disaster
recovery strategy and must be in place before you restore the SP and SAP R/3
environment.

4.2.3 Backup of Application Servers on Windows NT


Windows NT is supplied with a very basic backup/recovery system. However,
most companies that use the product choose to install a backup/recovery system
from a different software company. If you use Windows NT as the operating
system for your application servers, you should investigate backup/recovery
software available.

As with AIX, you should build backups of all logical disks required for your
application server operation.

4.2.4 Transport and Correction System


In a post-disaster scenario the focus is to restore the productive SAP R/3
systems. Therefore, the Transport and Correction System is not needed in most
companies. When you have a need for your development and quality assurance
systems in your specific environment after a disaster, you need to back up the
Transport and Correction System frequently.

When you want to restore your complete SAP R/3 landscape to a specific point
in time you need to back up the Transport and Correction directories that reside
on one of the application servers at the same time as the database. This
ensures that all corrections and transports made up to this time can be found in
the Transport and Correction System and can be processed correctly.

4.3 Recovery Alternatives


Although DB2 for AIX and DB2 for OS/390 both allow recovery, the operation for
SAP R/3 is different.

Database Recovery: A DB2 for AIX system is backed up and recovered at either
the database level or the tablespace level. In contrast, the SAP R/3 database in
a DB2 for OS/390 environment is neither backed up nor recovered at the DB2 for
OS/390 database level. Rather, image copies are taken at the tablespace,
partition, or data set level. Recovery of the DB2 for OS/390 databases that make
up an SAP R/3 database, therefore, is not done at the database level, which is
also in contrast to the DB2 for AIX implementation.

Tablespace Recovery: Both a DB2 for AIX system and a DB2 for OS/390 system
can be backed up and recovered at the tablespace level. DB2 for OS/390
provides an additional level of granularity by allowing backup and recovery to
occur at the partition level. Refer to DB2 for OS/390 V5 Utility Guide and
Reference for details.

4.4 DB2 for OS/390 Point-in-time Disaster Recovery
A point-in-time recovery is defined as a logically consistent restoration of a
logically related set of objects to a specific moment of time. For example, if you
are required to do a point-in-time recovery to 9:30 a.m., September 16, your
result should reflect the exact state of every logically related database
component as of that time.

Attention
In R/3, unless you definitely know otherwise, you should assume that there is
only one logically related set of objects: all the R/3 databases and the
associated DB2 subsystem catalog and directory.

Important
The use of ADSM is not an alternative at this point, since ADSM does not
support the backup of DB2 for OS/390 tablespaces.

4.4.1.1 Establishing a Point of Consistency


Point-in-time recovery is done to a point of consistency or quiesce point, and a
quiesce point can be established in one of three ways:
1. Using the ARCHIVE LOG command
2. Using the QUIESCE utility
This utility is not usable with SAP R/3 4.5A; see the discussion that follows.
3. Using the STOP DB2 command
All three options will provide a quiesce point, but there are significant
differences between the options.

ARCHIVE LOG Command


This DB2 for OS/390 command performs the following
functions:
1. Truncates the current active log data sets
2. Starts an asynchronous task to off-load the data sets
3. Archives previous active log data sets not yet archived
4. Returns control to the user (immediately)
ARCHIVE LOG MODE(QUIESCE) Command
If the MODE(QUIESCE) option is used, DB2 for OS/390 will
attempt to establish a system-wide point of consistency by
suspension of all user update activity prior to the off-load
process (that is, when all active update users have
reached a commit point). User update activity is not
allowed to start during the time the command is running.
The TIME parameter allows you to override the default
timeout value by extension (or reduction) of the length of
time that DB2 for OS/390 has to complete the command,
and the WAIT option is used to direct DB2 for OS/390 to wait
for quiesce processing to complete before returning
control to the invoking console.
The ARCHIVE LOG MODE(QUIESCE) command, upon successful
completion, will post the system-wide quiesce point
(current log RBA) to the bootstrap data set (BSDS). If it
fails to achieve the system-wide quiesce within the time
allotted, it produces no archive log.
QUIESCE Utility This utility is used to establish a quiesce point for one or
more tablespaces.
Because of the internal relationships maintained between
tables in SAP R/3, a point-in-time recovery plan for an SAP
R/3 system must have a single quiesce point for all
tablespaces. However, the QUIESCE utility has a limit of
1165 tablespaces that can be specified in a single
command. As noted earlier, the SAP R/3 database
consists of approximately 7200 tablespaces. Therefore, the
QUIESCE utility cannot be used to establish a single
quiesce point for point-in-time recovery. In DB2 UDB for
OS/390 Version 6 the limitation is removed; this will not
remove the restriction for SAP R/3 since the timeout of
transactions (described in the following) still occurs.
QUIESCE will cause SAP R/3 transactions to fail with
timeout errors. Unlike ARCHIVE LOG MODE(QUIESCE), there
is no way to make the QUIESCE utility time out before
the SAP R/3 transactions do.
STOP DB2 Command This DB2 for OS/390 command stops the DB2 for OS/390
subsystem, and has two options:
1. MODE(QUIESCE)
2. MODE(FORCE)
The quiesce point with either option is stored in the
bootstrap data set (BSDS).
The STOP DB2 MODE(QUIESCE) command will stop DB2 for
OS/390 when active units of recovery complete. This will
establish a quiesce point for the DB2 for OS/390
subsystem. STOP DB2 MODE(QUIESCE) is the default.
The STOP DB2 MODE(FORCE) command will also stop DB2 for
OS/390 and establish a quiesce point for the subsystem,
but it will cancel all DB2 for OS/390 threads and roll back
any uncommitted activity.

To decide which method you will use to establish your quiesce point, you must
first evaluate the characteristics and the corresponding options available with
each quiesce point alternative in terms of your recovery objectives. In summary,
each alternative establishes a quiesce point:
• ARCHIVE LOG MODE(QUIESCE) will establish a system-wide quiesce point and
record it in the bootstrap data set where it can be accessed by the PRINT
LOG MAP utility (DSNJU004). You also have control over the command
timeout value, and can thus control the effect that this command has on user
update processing.
• ARCHIVE LOG produces an archive log without consistency. When such a log
is used in disaster recovery, consistency to the last completed unit of work is
achieved during restart at the recovery site. There is no effect on local
users when the archive is produced.

• The QUIESCE utility is not a viable alternative due to the limit of 1165
tablespaces in a single command and due to causing SAP transactions to fail
with timeouts. It would be possible to stop the SAP R/3 system and produce
a quiesce point by invoking the utility multiple times. We concluded that this
would not be practical.
• STOP DB2 will establish a quiesce point in the bootstrap data set, and it will
stop the subsystem.
As previously stated, we do not consider this method, since we do not wish
to incur any outage in developing our point of consistency.
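
The tablespace-limit argument against the QUIESCE utility can be checked with
simple arithmetic. The following Python sketch is ours and is not part of any
DB2 or SAP tool; it uses only the two figures quoted above (a limit of 1165
tablespaces per invocation and approximately 7200 SAP R/3 tablespaces), and the
function name is hypothetical.

```python
import math

QUIESCE_LIMIT = 1165    # tablespaces per QUIESCE invocation (figure from the text)
SAP_TABLESPACES = 7200  # approximate SAP R/3 tablespace count (figure from the text)


def quiesce_invocations(tablespaces, limit=QUIESCE_LIMIT):
    """Return the minimum number of QUIESCE invocations needed (hypothetical helper)."""
    return math.ceil(tablespaces / limit)


if __name__ == "__main__":
    print(quiesce_invocations(SAP_TABLESPACES))
```

With these figures at least seven invocations are needed, each producing its own
quiesce point, which is why a single system-wide quiesce point cannot be
obtained this way.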

Finally, since the process of taking a quiesce point may affect user access for
the duration of the operation, you must decide the best time to take it. You must
also decide which has the higher priority: uninterrupted user access, or the
establishment of the point-in-time recovery quiesce point itself. For flexibility
and to minimize the impact to SAP R/3 users, you should consider the use of the
ARCHIVE LOG command. Additional information may be found in BC SAP Database
Administration Guide: DB2 for OS/390 (51001015).

For disaster recovery, you will predefine the time and consistency point to which
you wish to recover. This point is likely to be based on a business relationship
within the applications (that is, the logical end of a business process).

The next section, 4.4.2, “Point-in-Time Recovery Using DB2 Conditional Restart,”
describes an alternative, where an inconsistent point is created without
disruption to local users. It is made a consistent point at the recovery site
through log truncation.

4.4.2 Point-in-Time Recovery Using DB2 Conditional Restart


The scenario requires a DB2 conditional restart, a DB2 function that many users
exercise only rarely. Its key advantage is an “almost free” establishment of a
point of consistency.

More explanation of these procedures can be found in BC SAP Database
Administration Guide: DB2 for OS/390, SAP Material Number 51001015, in the
chapter “Database Management”, topic “Backup and Recovery Options”,
subtopic “Recovery with Conditional Restart”.

You should also be aware that it is important to frequently obtain OSS note
83000 from SAP. This note is updated with recommendations on
backup/recovery and is the master reference on that topic from SAP developers.

At a high level, the scenario may be defined as follows:


1. Identify a set of candidate points of consistency.
2. Select that candidate point of consistency which best meets your
requirements.
3. Make that best candidate point of consistency into a true point of
consistency.
This is the point at which you will do the conditional restart. The conditional
restart will make your candidate point of consistency the true point of
consistency on the DB2 Log.
4. Recover all tablespaces to the true point of consistency.
The conditional restart will position you to recover all of your tablespaces.
Because of the conditional restart, you will use a RECOVER to currency.
5. Now RECOVER or REBUILD all indexes on all of the tables as the indexes
must be synchronized with the data in the recovered tablespaces.

The first three steps listed are new and will receive the major part of our
attention here. The remainder of this scenario is described in 5.3, “Remote
Site Recovery from Disaster at a Local Site” on page 64.

Identify a Set of Candidate Points of Consistency: Consider a list that contains
many items (or rows). Each list item has two entries in columns: the first
column is a timestamp, and the second column is the DB2 Log RBA associated
with that time. The list can be quite long (that is, showing many timestamps).
This list of timestamps is our set of candidate points of consistency.

The list of candidate points of consistency might have an entry for each hour in
the day or for each minute in the day. For each entry in the list, we have a
timestamp and the corresponding DB2 Log RBA. This allows you to map a
specific time to a DB2 Log RBA. For data sharing users it is a Log Record
Sequence Number (LRSN) that serves the same purpose.

How do you build a list of timestamps and the associated Log RBAs? You start
by defining a dummy database and tablespace. This will be a real DB2 database
and tablespace, but there will be no activity against the dummy tablespace. SAP
R/3 will not know about this tablespace.

Once the dummy tablespace is defined, you will initiate a user-developed
procedure that will periodically QUIESCE that dummy tablespace. Since you will
allow no activity against the dummy tablespace, the QUIESCE will be very fast.
The QUIESCE will cause the Log RBA and the timestamp to be entered into
SYSIBM.SYSCOPY. The entries within SYSIBM.SYSCOPY for the dummy
tablespace make up our list of candidate points of consistency. If you do the
QUIESCE each hour, there will be an entry for the dummy tablespace in
SYSIBM.SYSCOPY each hour.

You should be aware that the use of the dummy tablespace is for convenience
only; you can also look for checkpoint records that are stored with a time and
RBA. Most installations will have checkpoint records at 10-15 minute intervals,
so those records should provide usable references that map RBAs to a specific
time.

Select the Candidate Point of Consistency That Best Meets Your Requirements:
You must decide the point to which you want to recover if a disaster occurs.
This part cannot be automated. Suppose you determine that 6:00 PM on a given
date will be your system-wide recovery time; that is, you want to take your
system back to that date and time.

You have one more task. Query SYSIBM.SYSCOPY for the dummy tablespace
entry before 6:00 PM (or look for checkpoint records at that time as noted
previously). Once you determine that entry from the list, note the DB2 Log RBA.
This can be provided from the local site by adding it to other recovery
information which will be sent to the recovery site, either in list form or through
creation of a separate data set.
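
The selection just described can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not a supplied tool: it assumes the
SYSIBM.SYSCOPY entries for the dummy tablespace have already been extracted
into (timestamp, Log RBA) pairs, the function name is hypothetical, and the
timestamps and RBA values are invented for the example.

```python
from datetime import datetime


def latest_candidate(candidates, target):
    """Return the newest (timestamp, rba) pair at or before target, or None."""
    eligible = [c for c in candidates if c[0] <= target]
    return max(eligible, key=lambda c: c[0]) if eligible else None


# Illustrative values only; real entries would come from SYSIBM.SYSCOPY.
CANDIDATES = [
    (datetime(1999, 10, 1, 16, 0), "00007A2C1000"),
    (datetime(1999, 10, 1, 17, 0), "00007A3F8000"),
    (datetime(1999, 10, 1, 18, 30), "00007A51C000"),
]

if __name__ == "__main__":
    print(latest_candidate(CANDIDATES, datetime(1999, 10, 1, 18, 0)))
```

The sketch treats an entry taken exactly at the target time as acceptable; use a
strict comparison if your procedure requires an entry strictly before it.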

Where do you stand now? You have the Log RBA (or LRSN) of the time to which
you wish to recover. You are now ready to make that Log RBA, which relates to
a candidate point of consistency, a true point of consistency.

4.4.2.1 What to Expect at the Recovery Site


Though this process will be covered in more detail in 5.3, “Remote Site
Recovery from Disaster at a Local Site” on page 64, we have chosen to
complete the picture by providing a brief summary.

There is probably data inconsistency at the Log RBA you identified. You are
running an active SAP R/3 system and it is likely that at the time you have
identified, there was work in process (or in-flight units of recovery). However,
you can make that Log RBA a true point of consistency.

By doing a DB2 conditional restart, you can make the Log RBA you identified
into a point of consistency. You will use the CHANGE LOG INVENTORY DB2
utility to create a conditional restart control record using the following
statement:
CRESTART CREATE,FORWARD=YES,BACKOUT=YES,ENDRBA=XXXX
where XXXX is the Log RBA of the point of consistency you determined from your
SYSIBM.SYSCOPY query. For more information on the CHANGE LOG
INVENTORY DB2 utility, see DB2 UDB for OS/390 V6 Utility Guide and Reference,
SC26-9015.
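
If the Log RBA is passed to the recovery site in a data set, the control
statement can be generated rather than typed by hand. The Python sketch below
only formats the statement quoted above; the function name is hypothetical, and
the check that the RBA is written as 1 to 12 hexadecimal digits is our
assumption, so confirm any further rules for ENDRBA in the Utility Guide for
your release.

```python
import re


def crestart_statement(end_rba):
    """Format the CRESTART statement quoted in the text for a given Log RBA."""
    rba = end_rba.strip().upper()
    # Assumption: the RBA is written as 1 to 12 hexadecimal digits.
    if not re.fullmatch(r"[0-9A-F]{1,12}", rba):
        raise ValueError("RBA must be 1 to 12 hexadecimal digits: %r" % end_rba)
    return "CRESTART CREATE,FORWARD=YES,BACKOUT=YES,ENDRBA=" + rba


if __name__ == "__main__":
    # The RBA value is invented for the example.
    print(crestart_statement("00007a3f8000"))
```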

The conditional restart will cause DB2 to truncate the log at your true point of
consistency. Log entries beyond that point will be disregarded. Additionally,
DB2 will remove from SYSLGRNGX and SYSCOPY any entries that occurred after
the true point of consistency.

Recover All Tablespaces to the True Point of Consistency: After the conditional
restart, this will be a recovery to currency and not a recovery to an RBA
(recovery to an RBA is common in most point-in-time recovery scenarios).

Recover/Rebuild All Indexes on the Tables That Have Been Reset to the Prior
Point of Consistency: The indexes must be made consistent with the data.

Important
Recovery time can be significantly reduced with a procedure that identifies
those tablespaces and indexes that actually changed after the RBA noted.
Recovery of only those tablespaces and indexes is necessary. You should
investigate the possibility of writing a program that performs a DB2 log scan
to find such tablespaces (remembering to recover the tablespaces and all the
indexes that exist on tables in those tablespaces). Some vendors provide
DB2 log analysis programs that have this function.
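
The filtering step of such a program might look like the following Python
sketch. It assumes that a log scan (not shown, and not something this sketch
implements) has already produced records of the form (RBA, database name,
tablespace name); that record layout, the function name, and all sample values
are hypothetical.

```python
def changed_after(records, end_rba):
    """Return the set of (database, tablespace) names with log records past end_rba."""
    return {(db, ts) for rba, db, ts in records if rba > end_rba}


# Illustrative records; RBAs are integers parsed from hexadecimal text.
RECORDS = [
    (0x00007A3F0000, "DBSAP01", "TSA"),
    (0x00007A3F9000, "DBSAP01", "TSB"),
    (0x00007A400000, "DBSAP02", "TSC"),
    (0x00007A410000, "DBSAP01", "TSB"),
]

if __name__ == "__main__":
    for name in sorted(changed_after(RECORDS, 0x00007A3F8000)):
        print(name)
```

The result lists tablespaces only; as noted above, all indexes on tables in
those tablespaces must be recovered or rebuilt as well.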

Execute Transaction SM13 on the SAP R/3 System: After the SAP R/3 Central
Instance is started, execute transaction SM13 to review aborted updates.
Resolve all aborted updates before the SAP R/3 system is opened for productive
use.
Note: Transaction SM13 should be executed as part of your daily activities.

Point-in-time Recovery Summary: The main benefit of a point-in-time recovery
using DB2 conditional restart is that creating the list of candidate points of
consistency has effectively no impact on users. The time required to actually
recover the tablespaces and indexes can be hours, depending on the frequency
of image copy and the amount of DB2 log processing required to bring them to
the point of consistency.

Since this scenario contains a conditional restart, anyone using it must first
practice it. An improperly done conditional restart usually results in the failure
of the disaster recovery attempt.

4.4.3 Preparing for Disaster Recovery


In the case of a total loss of a DB2 for OS/390 system, you cannot recover on
another DB2 for OS/390 system at a recovery site. You must recreate the same
DB2 for OS/390 system at the recovery site. To do this, you must regularly back
up the data sets and the log for recovery. As with all data recovery operations,
the objectives of disaster recovery are to lose as little data, workload processing
(updates), and time as possible.

DB2 for OS/390 has an installation option called SITE TYPE, and it is intended for
disaster recovery. The two choices are LOCALSITE and RECOVERYSITE. It is
designed to allow DB2 for OS/390 to call for the relevant image copy at the
correct site without unnecessary operational intervention. We assume the local
site is LOCALSITE and the site where the recovery is performed is
RECOVERYSITE.

Following is a list of essential disaster recovery elements and the steps you
need to take to create them. For ease of use, we assume that all data sets are
cataloged and will be tracked using an ICF catalog.
• Image copies
1. Make copies of all the SAP R/3 tablespaces, any vendor tools that are
used by your DBA group, and the DB2 catalog/directory, preferably in
that order. We assume they are made at least daily to assure adequate
performance at the recovery site.
Use the COPY utility to make copies for the local subsystem and
additional copies for disaster recovery. They can be made with one
invocation of the COPY utility, by specifying the COPYDDN option to
produce the copy for the local site and the RECOVERYDDN option to
produce a copy for the recovery site.
Do not produce the copies by invoking COPY twice.
2. Send the image copies to the recovery site.
3. Record this activity at the recovery site when the image copies are
received.
• Archive logs
1. Make copies of the archive logs for the recovery site.
Use the ARCHIVE LOG command to archive all current DB2 active log data
sets.
There is an exposure if you take COPY 2 of the archive to the recovery
site. If the first copy of an archive becomes unreadable, then the second
copy is requested. DB2 will wait indefinitely until the second copy is
mounted, which can create logistical problems. A secondary problem is
that under certain unusual circumstances, COPY 2 may not be produced,
and you would be missing a log range, without which recovery would
terminate. The exposure to this situation should be considered slight.
Some DB2 for OS/390 users therefore take COPY 2 to the recovery site,
instead of copying the archives, and accept the risk. If you do, and if
you are on DB2 UDB for OS/390 Version 6, you can set another
installation parameter at the recovery site so that DB2 calls initially
for COPY 2. Although this approach carries a risk of recovery failure at
the remote site, its operational simplicity at both sites makes it
attractive.
2. Use the print log map utility to create a report of the bootstrap data set
(BSDS). This is optional, but handy.
3. Send the archive copy, the BSDS report, and any additional information
about the archive log to the recovery site.
4. Determine the point of consistency, unless you plan to recover to the end
of the last archive log that arrives at the recovery site.
There are four techniques to accomplish this. See 4.4.1.1, “Establishing
a Point of Consistency” on page 51 and 4.4.2, “Point-in-Time Recovery
Using DB2 Conditional Restart” on page 53 for a description of these
techniques.
5. Record this activity at the recovery site when the archive copy and the
report are received.
• Integrated catalog facility (ICF) catalog backups
1. Back up all DB2 related integrated catalog facility catalogs with the
IDCAMS EXPORT command on a daily basis.
2. Synchronize the backups with the cataloging of image copies and
archives. If the EXPORT occurs following the copies, all the latest volumes
will be cataloged.
3. Use the IDCAMS LISTCAT command to create a list of all the DB2 entries.
4. Send the IDCAMS backup and list to the recovery site.
5. Record this activity at the recovery site when the EXPORT backup and list
are received.
• DB2 libraries
1. Back up DB2 libraries to tape when they are changed. Include the
SMP/E load, distribution, and target libraries, as well as the most recent
user applications and DBRMs. It is possible that the OS/390 staff may
provide this service at the same time the other critical system libraries
are backed up.
2. Document your backups.
3. Send backups and corresponding documentation to the recovery site.
4. Record activity at the recovery site when the library backup and
documentation are received.
• Script for recovery procedures
This may be included in a dump of critical libraries, but be certain you do not
forget it.

For disaster recovery to be successful, all copies and reports must be updated
and sent to the recovery site daily. Data will be up to date through the last
archive sent. Once you establish your copy procedure and have it operating,
you must prepare to recover your data at the recovery site. See 5.3, “Remote
Site Recovery from Disaster at a Local Site” on page 64 for step-by-step
instructions on the disaster recovery process.
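
Because every element must arrive daily, the “record this activity at the
recovery site” steps lend themselves to a simple completeness check. The Python
sketch below is one possible form; the item labels are our own shorthand for
the elements listed in this section, and the function name is hypothetical.

```python
# Labels are ours; they name the elements listed in this section that are
# sent to the recovery site daily. The BSDS report is described in the text
# as optional, so drop it from the set if you do not produce it.
DAILY_ITEMS = {
    "image copies",
    "archive log copy",
    "BSDS report",
    "ICF catalog EXPORT backup",
    "LISTCAT list",
}


def missing_items(received, required=DAILY_ITEMS):
    """Return the required items not yet recorded as received, sorted by name."""
    return sorted(set(required) - set(received))


if __name__ == "__main__":
    print(missing_items({"image copies", "archive log copy"}))
```

DB2 library backups are left out of the daily set because the text calls for
them only when the libraries change.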

4.5 Recovery of Application Servers


Generally, there are two ways to recover application servers:
• Restore application servers from tape on the same architecture
• Install new application servers on the same or a different architecture
Installing new application servers in the case of a disaster adds complexity and
should only be considered when you must restore on a different architecture or
your tapes are unusable.

4.5.1 Recovery of Application Servers on RS/6000


To recover an application server on a standalone RS/6000 machine, you need the
mksysb tape. Initialize (“boot”) the machine from this tape.

If the SAP R/3 environment is not stored in the rootvg, you need to recover the
SAP R/3 environment according to the backup procedure. This means you need
to recover it with a tool such as tar, backoffl, or others.

After restoring all data you must adjust some parameters, for example the
network environment, paging space, and date/time.

4.5.2 Recovery of Application Servers on RS/6000 SP


To recover the application servers on RS/6000 SP you must first reinstall the
control workstation. This is done by initializing (“booting”) the machine with the
mksysb tape. After that you must restore the control workstation's filesystems
including the node images in the appropriate way. After restoring the node
images on the control workstation, the nodes can easily be reinstalled using
PSSP.

If the SAP R/3 environment is not stored in the rootvg of the nodes, you need to
recover the SAP R/3 environment according to the backup procedure. This
means you need to recover it with a tool such as tar, backoffl, or others.

After restoring all data you must adjust some parameters, for example the
network environment, paging space, and date/time. You must also recognize
that the names and IP addresses of application servers may have to be
changed (for example, when the application servers are on a different LAN) and
that they may use different network servers (name servers and gateways), so the
network connections must be reconfigured.

Network reconfiguration may require some SAP R/3 parameter changes if the
name of the central instance application server or the database server
differs from the name used in normal operation.

4.5.3 Recovery of Application Servers on Windows NT
As was discussed in 4.2.3, “Backup of Application Servers on Windows NT” on
page 50, the backup/recovery system supplied with Windows NT may not contain
the features you need for a comprehensive backup system for your application
servers. If you use Windows NT, you should investigate backup/recovery
software available from software vendors for that purpose.

4.5.4 Installing New Application Servers


As the SAP R/3 application server does not contain any vital data, you can also
reinstall one or more of the application servers in the case of a disaster. If you
have appropriately skilled people, this does not take much longer than the
recovery of an application server, but it is more complex than restoring systems
from tape.

The following steps must be performed to install an application server:


1. Install the operating system
2. Adapt the operating system environment
3. Set up the file systems
4. Install an R/3 instance
5. Adjust the SAP R/3 parameters in the SAP R/3 profiles

For the detailed installation procedure see R/3 Installation on UNIX DB2 for
OS/390, Material Number 51002659 and Implementing SAP R/3 in an OS/390
Environment Using AIX and Windows NT Application Servers, SG24-4945.

This procedure should also be described in your disaster recovery plan.

To reduce the recovery time, you can make a backup of the application server
environment on your recovery machines after the first test and use the backup
tapes to restore in the case of a disaster. Of course, you need to create new
tapes when a new release is installed.

Chapter 5. Restarting from Remote Locations

This chapter addresses restart actions when a central site becomes unavailable
and a remote site must assume those functions.

If possible, the remote site should be used to perform quality assurance or
applications test functions, to ensure that hardware and software are maintained at
the same levels as the central site. If this is not possible (for example, when the
remote site is owned by a separate company), frequent rehearsals should be
scheduled to be certain that resources at the site are adequate and procedures
are correct.

5.1 Components at the Remote Site


This section describes which components are needed at the remote site to
recover the SAP R/3 environment.

5.1.1 Application Servers


The type of application servers at the remote location should be the same as at
the central site, so that operational procedures are familiar. System
maintenance, both on hardware and software, should be performed frequently.

The number of application servers depends on the disaster recovery strategy.


During the time that the remote site is being used, some applications may not be
kept in operation. If these “non-essential” applications do not operate, the
application load will be decreased from normal production, so the number of
application server machines can be less than at the central site.

Planning Note
In this case the application server load characteristics at the recovery site
will be different from the central site. You should review your decisions
regarding the number of SAP R/3 batch, dialog, and update processes that
are assigned to the individual application server machines.

In some cases, application server machines may be brought to the remote site
only when disaster use is necessary. If that is true in your case, remember that
system copies of central site application servers will be required. However,
some configuration time will be necessary in addition to restoring the copy:
1. TCP/IP system names and IP addresses will be different (to allow the central
site machines to be in the network as repair is done). SAP R/3 definitions
that specify these names and addresses will require modification.
2. Hardware configurations may be slightly different.
3. Connectivity information (such as gateway names and name servers) will be
different.
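
The name changes in item 1 are mechanical and can be scripted ahead of time. The
Python sketch below rewrites host names in profile-style text; it assumes
“name = value” lines with “#” marking comments, the profile parameter names in
the sample are used purely for illustration, and every host name is invented.
Treat it as a starting point to be tested in a rehearsal, not as a complete list
of the SAP R/3 definitions that need changing.

```python
def remap_hosts(profile_text, host_map):
    """Replace host names that appear as whole values of 'name = value' lines."""
    out = []
    for line in profile_text.splitlines():
        name, sep, value = line.partition("=")
        # Leave comment lines and lines without '=' untouched.
        if sep and not line.lstrip().startswith("#") and value.strip() in host_map:
            line = name.rstrip() + " = " + host_map[value.strip()]
        out.append(line)
    return "\n".join(out)


# Hypothetical sample: central-site names on the left, recovery-site names on the right.
SAMPLE = "SAPDBHOST = prodmvs\nrdisp/mshost = prodci\n# rdisp/mshost = prodci\nother = keep"
HOST_MAP = {"prodmvs": "drmvs", "prodci": "drci"}

if __name__ == "__main__":
    print(remap_hosts(SAMPLE, HOST_MAP))
```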

5.1.2 Database Server
The database server hardware and software should mirror that at the central
site. The exception to this is when applications are not deemed essential during
a disaster recovery scenario (see 5.1.1, “Application Servers” on page 61).
Then the database server at the remote site can have less processor power than
the server at the central site, since the application load is less.

The database server processor should be of the same architecture as at the
central site. This will ensure that procedures and operations are not markedly
different from familiar ones. For the same reason, operating system levels
should be the same at the two sites.

The enterprise must recognize that the disk space requirement will not be
lessened; all SAP R/3 tables and views must be kept available.

5.1.3 Connectivity Features


The connectivity from application server to the database server should be of the
same type as at the central site, and the same protocol should be used. For
example, if FDDI connections are used with TCP/IP for application server to
database server communications at the central site, an FDDI LAN and TCP/IP
software should be available at the remote site. The hardware and software
support items for these functions must also be provided at the recovery site. In
the preceding example, application servers and the database server would
require TCP/IP software, and the machines would require FDDI adapters to
connect to the FDDI LAN.

5.1.4 Communications Facilities


The remote site must have adequate communications facilities so that network
users have the same access to this location that they have to the central site.
This consideration usually means high-speed communications lines, but it may
also imply that users at the central site whose normal communication is through
a LAN either have LAN extensions to the remote site or have LAN connections to
high-speed communications lines that allow communications to the remote site.

You should recognize that these alternate communications facilities require
communications specialists to plan the configuration for a disaster scenario. As
with all the component items in this section, success will be proportional to the
time spent planning for remote site use and the effort expended in practicing the
recovery plan.

5.1.5 Peripheral Equipment


Obviously, all equipment necessary for normal operation (such as tape drives for
backup processing) must be provided at the remote site.

Consideration must be given to the printing function. If local printers were
available at the central site, the remote site must either have similar
printers or have an access path to the central site printers. If network
printers are involved, the communications concerns discussed in 5.1.4,
“Communications Facilities” apply to the communications facilities used to
connect to the printers.

5.1.6 Workstations
Workstations should be placed at the remote site for checking that SAP R/3 is
available. User problems may need to be verified, or SAP R/3 transactions may
be required to indicate recovery status, plans to return to normal operation, or
planned availability of non-essential applications.

5.2 Steps to Recover


The steps described in 5.3, “Remote Site Recovery from Disaster at a Local Site”
on page 64 are published in DB2 UDB for OS/390 V6 Administration Guide,
SC26-9003, under the same title. That guide was chosen because it was the most
current “official” publication at the time this redbook was written. The scenario is
complex and should be implemented by staff with strong DB2 skills. It is
presented with a few of our changes and comments. Our comments are boxed
with the words “Redbook Recommendation.” The boxes allow you to separate
our comments from the published scenario.

Important
Where this scenario differs from that of a later release of DB2 for OS/390, or
a later PTF, use the most current one.

The assumption of the original scenario is that you will be recovering to the end
of the last archive log you have off-site. While this method gives you the most
current data, it is unlikely to make “business” sense.

We have assumed that you intend to use the point-of-consistency you developed
at the local site as described in 4.4.1.1, “Establishing a Point of Consistency” on
page 51. We also assume you have the results of the query of
SYSIBM.SYSCOPY for the dummy tablespace, and the log RBA or ENDLRSN to
which you wish to recover your environment. The method used to truncate the
DB2 log is the same, a conditional restart. The process of determining the
ENDRBA or ENDLRSN is different for each end point: the end of the archive log,
as described in DB2 UDB for OS/390 V6 Administration Guide, SC26-9003, or of
the point-of-consistency, which is described in the following pages.

Redbook Main Differences


To enable you to reference the scenario in DB2 UDB for OS/390 V6
Administration Guide, SC26-9003, more easily, we are identifying our major
areas of change:
• We have created two procedures: one for data sharing and the other for
single DB2 operation.
• We have changed the truncation point of the logs, from the end of the
archive log to the point-of-consistency developed in 4.4.1.1, “Establishing
a Point of Consistency” on page 51.
• We have interspersed -TERM UTILITY statements with the DB2 catalog and
directory recovery job steps.

5.3 Remote Site Recovery from Disaster at a Local Site
The procedures in this scenario differ from other recovery procedures in that the
hardware at your local DB2 site cannot be used to recover data. This scenario
bases recovery on the latest available archive log and assumes that all copies
and reports have arrived at the recovery site as specified in 4.4.3, “Preparing for
Disaster Recovery” on page 56.

Redbook Difference
While the point-of-consistency method described in 4.4.2, “Point-in-Time
Recovery Using DB2 Conditional Restart” on page 53 will not necessarily use
the most current archive log, that log will be used as a base point to identify
the archive log that contains our point-of-consistency.

For data sharing users, begin at 5.3.2, “Steps to Recover (Data Sharing Only)” on
page 72.

5.3.1 Steps to Recover (Non-Data Sharing)


1. If an integrated catalog facility catalog does not already exist, run job
DSNTIJCA to create a user catalog.
2. Use the access method services IMPORT command to import the integrated
catalog facility catalog.
3. Restore DB2 libraries, such as DB2 reslibs, SMP libraries, user program
libraries, user DBRM libraries, CLISTs, SDSNSAMP (or where the installation
jobs are), JCL for user-defined tablespaces, and so on.
4. Use IDCAMS DELETE NOSCRATCH to delete all catalog and user objects.
(Because step 2 imports a user ICF catalog, the catalog reflects data sets
that do not exist on DASD.) Obtain a copy of installation job DSNTIJIN. This
job creates DB2 VSAM and non-VSAM data sets. Change the volume serial
numbers in the job to volume serial numbers that exist at the recovery site.
Comment out the steps that create DB2 non-VSAM data sets, if these data
sets already exist. Run DSNTIJIN.
Redbook Recommendation
Even though we are focusing in this section on recovery of the DB2
catalog and directory, we must DELETE NOSCRATCH all DB2 objects. They
include the DB2 catalog and directory, and user (SAP R/3) objects,
whether they are DB2-managed (STOGROUP) or user-managed (IDCAMS
DEFINEs). The ICF catalog that has just been imported contains the
registry of all the image copies and archive logs that will be used.
However, it also contains information about the volumes which contain all
DB2 tables, and that data is not on DASD at the moment. DELETE
NOSCRATCH merely makes the ICF catalog reflect the true (non-existent)
state of the DASD at the moment. If this is not done, it is likely that
volume mounts will be issued via SVC at the end of DB2 restart for
volumes on which the data resided at the local site and you will have to
bring DB2 down in order to correct the problem. Failure to DELETE
NOSCRATCH the DB2 objects is a common error.

5. Recover the BSDS:

a. Use the access method services REPRO command to restore the contents
of the two BSDS data sets (allocated in the previous step). The most
recent BSDS image will be found in the first file (file number one) on the
latest archive log tape.
Redbook Recommendation
You can use ISPF 3.4 to display the BSDS, using
hlq.......Bnnnnnnn
The highest value of nnnnnnn is the BSDS you want.
Now use the print log map utility (DSNJU004) to list the current BSDS
contents. You already know the LOGRBA of the point-of-consistency
to which you wish to recover from a report you have received from
the quiesce of the dummy tablespace. You must now find the archive
log on which it is recorded.
• Look at the ARCHIVE LOG COPY 1 DATA SETS section (or that of
COPY 2). The most current archive logs are listed at the end of
the section, along with the STARTRBA and ENDRBA ranges of
each log. Find the range that contains your LOGRBA.
• Record the archive log that contains the point to which you wish
to recover your DB2 database (data set name, STARTRBA, and
ENDRBA).
• Now REPRO the BSDS corresponding to the archive log in the
previous bullet to the BSDS data sets. You may be overwriting
the first BSDS you restored.
Note: When you eventually truncate the log to your point-in-time
LOGRBA, you must use the BSDS that corresponds to that log.
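
The range lookup in this recommendation can be sketched in Python as follows.
The sketch assumes you have transcribed the archive log entries from the print
log map report into (data set name, STARTRBA, ENDRBA) records; it does not parse
the report itself, and the type, function, data set names, and RBA values are
all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class ArchiveLog(NamedTuple):
    dsname: str
    start_rba: int
    end_rba: int


def find_archive_log(logs: Sequence[ArchiveLog], rba: int) -> Optional[ArchiveLog]:
    """Return the archive log whose STARTRBA..ENDRBA range contains rba, or None."""
    for log in logs:
        if log.start_rba <= rba <= log.end_rba:
            return log
    return None


# Illustrative entries, as they might be copied from the report by hand.
LOGS = [
    ArchiveLog("HLQ.ARCHLOG1.A0000101", 0x00007A000000, 0x00007A2FFFFF),
    ArchiveLog("HLQ.ARCHLOG1.A0000102", 0x00007A300000, 0x00007A5FFFFF),
]

if __name__ == "__main__":
    print(find_archive_log(LOGS, int("00007A3F8000", 16)))
```

A result of None means the LOGRBA lies outside every archive log in the BSDS you
restored, which suggests that a different BSDS or a later archive is needed.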

b. Use the change log inventory utility (DSNJU003) to register this latest
archive log tape data set in the archive log inventory of the BSDS just
restored. This is necessary since the BSDS image on an archive log
tape does not reflect the archive log data set residing on that tape.
Redbook Recommendation
This is the first mention of the change log inventory utility. Various
change log inventory control statements are developed through the
remaining substeps of this step. Though they are explained in this
procedure separately, you can run them all in one batch invocation of
change log inventory.

c. Use the change log inventory utility to adjust the active logs:
1) Use the DELETE option of the change log inventory utility (DSNJU003)
to delete all active logs in the BSDS. Use the BSDS listing produced
in the step above to determine the active log data set names.
2) Use the NEWLOG statement of the change log inventory utility
(DSNJU003) to add the active log data sets to the BSDS. Do not
specify a STARTRBA or ENDRBA value in the NEWLOG statement. This
indicates to DB2 that the new active logs are empty.
If you are using dual BSDSs, make sure both of them are included in the
jobs.

d. If you are using the DB2 distributed data facility, run the change log
inventory utility with the DDF statement to update the LOCATION and
LUNAME values in the BSDS.
e. Use the print log map utility (DSNJU004) to list the new BSDS contents
and ensure that the BSDS correctly reflects the active and archive log
data set inventories. In particular, ensure that:
• All active logs show a status of NEW and REUSABLE
• The archive log inventory is complete and correct (for example, the
start and end RBAs should be correct).
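The active log adjustment in step 5c can be sketched as a small Python helper that writes the control statements for one change log inventory run. It is a sketch under assumptions: the data set names are hypothetical, and the DSNAME, COPY1, and COPY2 keywords are assumed to be spelled as in the DSNJU003 description in the Utility Guide and Reference, which you should check before use.

```python
from typing import Dict, List


def active_log_statements(active_logs: Dict[str, List[str]]) -> List[str]:
    """Build DSNJU003 statements that empty and re-add the active logs.

    active_logs maps a copy keyword ("COPY1" or "COPY2") to the active log
    data set names read from the print log map listing.
    """
    statements: List[str] = []
    # First remove every active log entry from the BSDS.
    for names in active_logs.values():
        for name in names:
            statements.append(f"DELETE DSNAME={name}")
    # Then add them back without STARTRBA/ENDRBA so DB2 treats them as empty.
    for copy, names in active_logs.items():
        for name in names:
            statements.append(f"NEWLOG DSNAME={name},{copy}")
    return statements


# Hypothetical data set names for illustration only.
logs = {
    "COPY1": ["DB2P.LOGCOPY1.DS01", "DB2P.LOGCOPY1.DS02"],
    "COPY2": ["DB2P.LOGCOPY2.DS01", "DB2P.LOGCOPY2.DS02"],
}
for line in active_log_statements(logs):
    print(line)
```

Generating the statements from the listing, rather than typing them, makes it less likely that an active log is deleted but not added back.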
6. Optionally, you can restore archive logs to DASD. Archive logs are typically
stored on tape, but restoring them to DASD could speed later steps. Since
the archive logs are listed as cataloged in the BSDS, DB2 allocates them
using the integrated catalog and not the unit or volser specified in the BSDS.
If you are using dual BSDSs, remember to update both copies.
Redbook Recommendation
You do not need to update the BSDS if you restore archives to DASD.
1. Uncatalog the tape archive
2. Allocate (catalog) the DASD archive with the same name
3. Use IEBGENER to restore the tape archive to DASD
Do not change the block size from its value at the local site or the
recovery will likely fail. Hint: Set the block size locally at 24576 in the
DSNZPARM parameter BLKSIZE on the DSNTIPA panel, so that you can
get two blocks per track. Otherwise the restoration will consume about
twice as much DASD as you have planned.
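The block size hint can be checked with a little arithmetic. The sketch below assumes 3390 geometry, where a block may be at most 27,998 bytes if two blocks are to share a track; the 1 GB archive size and the comparison block size of 28672 are hypothetical values chosen for illustration.

```python
import math

# Assumption: 3390 geometry, where a block must not exceed 27,998 bytes
# for two blocks to fit on one track.
HALF_TRACK_3390 = 27998


def blocks_per_track(blksize: int) -> int:
    """Return 2 if two blocks of this size fit on a 3390 track, else 1.

    Only the one-versus-two distinction is modeled; smaller block sizes
    that would fit three or more blocks per track are not considered.
    """
    return 2 if blksize <= HALF_TRACK_3390 else 1


def tracks_needed(data_bytes: int, blksize: int) -> int:
    """Tracks needed to hold data_bytes written in blocks of blksize bytes."""
    blocks = math.ceil(data_bytes / blksize)
    return math.ceil(blocks / blocks_per_track(blksize))


archive_bytes = 1024 ** 3  # hypothetical 1 GB archive log
for blksize in (24576, 28672):  # 28672 is an assumed larger block size
    print(blksize, blocks_per_track(blksize), tracks_needed(archive_bytes, blksize))
```

Under these assumptions the larger block size needs well over one and a half times as many tracks, because each track then holds a single block and the rest of the track is wasted.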

7. Use the DSN1LOGP utility to determine which transactions were in process
at the point-of-consistency. Use the following job control language:
//SAMP     EXEC PGM=DSN1LOGP
//SYSPRINT DD SYSOUT=*
//SYSSUMRY DD SYSOUT=*
//* File 1 on the tape is the BSDS copy, so LABEL selects file 2.
//ARCHIVE  DD DSN=last-archive,DISP=(OLD,KEEP),UNIT=TAPE,
//            LABEL=(2,SL),VOL=SER=volser1
//SYSIN    DD *
 STARTRBA(yyyyyyyyyyyy) SUMMARY(ONLY)
/*
Where yyyyyyyyyyyy is the STARTRBA of the last complete checkpoint before
the point-of-consistency LOGRBA.
Redbook Recommendation
A complete checkpoint has a STARTRBA and an ENDRBA “pair.” It is
possible for the point-of-consistency to lie between them. If that is the
case, choose the next most current pair whose ENDRBA is less than the
point-of-consistency LOGRBA. You will have to look in the Checkpoint
Section of the print log map report, which gives both the starting and
ending RBAs for each pair.
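The checkpoint selection rule in this recommendation can be sketched as follows. The Checkpoint tuple and the RBA values are hypothetical; the sketch assumes you have copied the STARTRBA and ENDRBA pairs from the Checkpoint Section of the print log map report.

```python
from typing import NamedTuple, Optional, Sequence


class Checkpoint(NamedTuple):
    """One checkpoint pair from the print log map report."""
    start_rba: int
    end_rba: int


def last_complete_checkpoint(
    checkpoints: Sequence[Checkpoint], poc_rba: int
) -> Optional[Checkpoint]:
    """Return the most recent checkpoint whose ENDRBA is below poc_rba.

    A checkpoint that the point-of-consistency falls inside is skipped,
    because its ENDRBA is not less than poc_rba.
    """
    complete = [c for c in checkpoints if c.end_rba < poc_rba]
    return max(complete, key=lambda c: c.end_rba, default=None)


# Hypothetical checkpoint pairs, oldest first.
checkpoints = [
    Checkpoint(0x100000, 0x100F00),
    Checkpoint(0x200000, 0x200F00),
    Checkpoint(0x300000, 0x300F00),
]

# The point-of-consistency lies inside the second pair, so the first is used.
chosen = last_complete_checkpoint(checkpoints, 0x200800)
if chosen is not None:
    print("STARTRBA for DSN1LOGP:", format(chosen.start_rba, "012X"))
```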

DSN1LOGP gives a report. For sample output and information about how to
read it, see Section 3 of DB2 UDB for OS/390 V6 Utility Guide and Reference,
SC26-9015.

Note whether any utilities were executing at the point-of-consistency. You
will have to determine the appropriate recovery action to take on each
tablespace involved in a utility job.
If DSN1LOGP showed that utilities are inflight (PLAN=DSNUTIL), you need
SYSUTILX to identify the utility status and determine the recovery approach.
See 5.3.2.1, “What to Do about Utilities in Progress” on page 82.
8. Modify DSNZPxxx parameters:
Redbook Recommendation
It is easier if you have created a new DSNZPARM, giving it a new name,
at the local site. Then you can skip this step now.

a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB for
OS/390 V6 Installation Guide, GC26-9008.
b. To defer processing of all databases, select Databases to Start
Automatically from panel DSNTIPB. You are presented with panel
DSNTIPS. Type DEFER in the first field and ALL in the second, then press
Enter. You are returned to DSNTIPB.
c. To specify where you are recovering, select Operator Functions from
panel DSNTIPB. You are presented with panel DSNTIPO. Type
RECOVERYSITE in the SITE TYPE field. Press Enter to continue.
d. To optionally specify which archive log to use, select Operator Functions
from panel DSNTIPB. You are presented with panel DSNTIPO. Type YES
in the READ ARCHIVE COPY 2 field if you are using dual archive logging
and want to use the second copy of the archive logs. Press Enter to
continue. (This applies to DB2 UDB for OS/390 Version 6 or above.)
Redbook Recommendation
To enable fast log apply for recovery and restart, select Active Log
Data Set Parameters from panel DSNTIPB. You are presented with
panel DSNTIPL. Type 100 in the LOG APPLY STORAGE field to
reserve 100 MB storage in the DBM1 address space and press Enter.
(This applies to DB2 UDB for OS/390 Version 6 or above.)

e. Reassemble DSNZPxxx using job DSNTIJUZ (produced by the CLIST
started in the first step).
At this point you have the log, but the tablespaces have not been recovered.
With DEFER ALL, DB2 assumes that the tablespaces are unavailable, but does
the necessary processing to the log. This step also handles the units of
recovery in process.
9. Use the change log inventory utility to create a conditional restart control
record. In most cases, you can use this form of the CRESTART statement:
CRESTART CREATE,ENDRBA=nnnnnnnnn000,FORWARD=YES,
BACKOUT=YES
where nnnnnnnnn000 equals the next highest control interval from the ENDRBA
of the point-of-consistency. That is, if the point-of-consistency is 123456, the
ENDRBA used in the CRESTART record is 124000.
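The rounding rule can be sketched in Python. The sketch assumes that RBAs are hexadecimal and that a log control interval is 4 KB (hex 1000), which is consistent with the 123456 to 124000 example; it also assumes that an RBA already on a boundary is moved to the following boundary so that the record at the point-of-consistency is kept. The function name is hypothetical.

```python
CI_SIZE = 0x1000  # assumed 4 KB log control interval; boundaries end in hex 000


def crestart_endrba(poc_rba: int) -> str:
    """Return the next control interval boundary above poc_rba as 12 hex digits."""
    next_ci = (poc_rba // CI_SIZE + 1) * CI_SIZE
    return format(next_ci, "012X")


endrba = crestart_endrba(0x123456)
print(f"CRESTART CREATE,ENDRBA={endrba},FORWARD=YES,BACKOUT=YES")
```

For the point-of-consistency 123456 the helper returns 000000124000, matching the value quoted above.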

Redbook Recommendation
This change log inventory control statement can be included in the job
described in Step 5b on page 65.

DB2 discards any log information in the bootstrap data set and the active
logs with an RBA greater than or equal to nnnnnnnnn000 as listed in the
CRESTART statements above.
Use the print log map utility to verify that the conditional restart control
record that you created in the previous step is active.
10. Enter the command START DB2 ACCESS(MAINT).
Even though DB2 marks all tablespaces for deferred restart, log records are
written so that in-abort and inflight units of recovery are backed out.
In-commit units of recovery are completed, but no additional log records are
written at restart to cause this. This happens when the original redo log
records are applied by the RECOVER utility.
At the primary site, DB2 probably committed or aborted the inflight units of
recovery, but you have no way of knowing.
During restart, DB2 accesses two tablespaces that result in DSNT501I,
DSNT500I, and DSNL700I resource unavailable messages, regardless of
DEFER status. The messages are normal and expected. You can ignore
them.
The return code accompanying the message might be one of the following,
although other codes are possible:
00C90081 This return code occurs if there is activity against the object
during restart as a result of a unit of recovery or pending
writes. In this case the status shown as a result of -DISPLAY
is STOP,DEFER.
00C90094 Since the tablespace is currently only a defined VSAM data
set, it is in an unexpected state to DB2.
00C900A9 This code indicates that an attempt was made to allocate a
deferred resource.
11. Resolve the indoubt units of recovery.
The RECOVER utility, which you will soon invoke, will fail on any tablespace
that has indoubt units of recovery. Because of this, you must resolve
indoubt units of recovery first.
Determine the proper action to take (commit or abort) for each unit of
recovery. To resolve indoubt units of recovery see “Resolving Indoubt
Threads” in DB2 UDB for OS/390 V6 Administration Guide, SC26-9003. From
an install SYSADM authorization ID, enter the RECOVER INDOUBT command for
all affected transactions.
If you attempt this from an MVS console, you will receive messages resulting
from an attempt to do authorization checking when no tables exist yet.
Redbook Recommendation
After you identify any indoubt units of recovery, it is safe to issue RECOVER
INDOUBT ACTION(ABORT). The rationale is that you have already lost work;
the loss of one more unit of work is not significant.

12. To recover the catalog and directory, follow these instructions:
The RECOVER function includes: RECOVER TABLESPACE, RECOVER INDEX, or
REBUILD INDEX (DB2 UDB for OS/390 Version 6 and above). If you have an
image copy of an index, use RECOVER INDEX. If you do not have an image
copy of an index, use REBUILD INDEX to reconstruct the index from the
recovered tablespace.
a. Recover DSNDB01.SYSUTILX. This must be a separate job step.
b. Recover all indexes on SYSUTILX. This must be a separate job step.
c. Your recovery strategy for an object depends on whether a utility was
running against it at the time the latest archive log was created. To
identify the utilities that were running, you must recover SYSUTILX.
You cannot restart a utility at the recovery site that was interrupted at
the disaster site. You must use the TERM command to terminate it. The
TERM UTILITY command can be used on any object except
DSNDB01.SYSUTILX.
Determine which utilities were executing and the tablespaces involved by
following these steps:
1) Enter the DISPLAY UTILITY(*) command and record the utility and the
current phase.
2) Run the DIAGNOSE utility with the DISPLAY SYSUTILX statement. The
output consists of information about each active utility, including the
tablespace name (in most instances). It is the only way to correlate
the object name with the utility. Message DSNU866I gives
information on the utility, while DSNU867I gives the database and
tablespace name in USUDBNAM and USUSPNAM respectively.
See 5.3.2.1, “What to Do about Utilities in Progress” on page 82 for
information on how to recover catalog and directory tablespaces on
which utilities were running.
d. Recover the rest of the catalog and directory objects starting with DBD01,
in the order shown in the description of the RECOVER utility in Section 2
of DB2 UDB for OS/390 V6 Utility Guide and Reference, SC26-9015.

Redbook Recommendation
The preceding list is excerpted from the DB2 UDB for OS/390 V6
Utility Guide and Reference, SC26-9015, for informational purposes.
You should consult it for specific instructions.
In addition we have incorporated utility termination in the recovery
steps, in order to simplify the process. We have implemented a
-TERM step on the utilities up through those for SYSLGRNX. That is,
there is a -TERM step for DSNDB01.DBD01 with appropriate condition
codes to guarantee the execution of the following RECOVER step. In
that way the recover will be certain to run. Following the recovery of
DSNDB01.SYSLGRNX, all utilities may safely be terminated (but not
before).
1. DSNDB01.DBD01
• -TERM UTILITY on DBD01 (job step 1)
• Recover DSNDB01.DBD01 (job step 2)
• Rebuild all indexes on SYSUTILX (job step 2)
2. DSNDB06.SYSCOPY
• -TERM UTILITY on SYSCOPY (job step 3)
• Recover DSNDB06.SYSCOPY (job step 4)
• Rebuild all IBM-defined indexes on SYSCOPY (job step 4)
3. DSNDB01.SYSLGRNX
• -TERM UTILITY on SYSLGRNX (job step 5)
• Recover DSNDB01.SYSLGRNX (job step 6)
• Rebuild all indexes on SYSLGRNX (job step 6)
4. -TERM UTIL(*) (job step 7)

Note: It is now safe to terminate all utilities, since appropriate
updates can now be made to SYSCOPY and SYSLGRNX if they
are required (as a result of the termination).
The rest of the RECOVER statements can be run in one job step if
desired.
1. DSNDB06.SYSDBAUT
2. All IBM-defined indexes on SYSDBAUT
3. DSNDB06.SYSUSER
4. DSNDB06.SYSDBASE
5. All IBM-defined indexes on SYSDBASE and SYSUSER
Note: Following recovery of SYSDBASE and its indexes, you can
rebuild all indexes on the rest of the objects using REBUILD
INDEX(ALL). The latter statement results in simple JCL.
6. Other catalog and directory tablespaces and indexes. The
remaining catalog tablespaces in DSNDB06 are SYSGROUP,
SYSGPAUT, SYSPLAN, SYSPKAGE, SYSSTATS, SYSSTR, and
SYSVIEWS. Most indexes are listed in DB2 for OS/390 V5 SQL
Reference, SC26-8966. (Use DB2 UDB SQL Reference Vol 1 V6,
SC09-2847 and DB2 UDB SQL Reference Vol 2 V6, SC09-2848 for
DB2 UDB for OS/390 Version 6.) One index not listed there is
DSNVTH01. There are two remaining directory tablespaces,
DSNDB01.SCT02, which has index SYSIBM.DSNSCT02, and
DSNDB01.SPT01, which has indexes SYSIBM.DSNSPT01 and
SYSIBM.DSNSPT02.
7. All user-defined indexes on the catalog (if you plan to use them,
otherwise, you can omit this step).

13. Use any method desired to verify the integrity of the DB2 catalog and
directory. Selected catalog queries in member DSNTESQ of data set
DSN610.SDSNSAMP can be used after the work file database is defined and
initialized.
14. Define and initialize the work file database.
a. Define temporary work files. Use installation job DSNTIJTM as a model.
b. Issue the command -START DATABASE work-file-database to start the work
file database.
15. If you use data definition control support, recover the objects in the data
definition control support database.
16. If you use the resource limit facility, recover the objects in the resource limit
control facility database.
17. Modify DSNZPxxx to restart all databases:
Redbook Recommendation
It is easier to have this DSNZPARM prepared at the local site. It only
differs from the one you just used in that all databases are to be
automatically restarted.

a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB
for OS/390 V6 Installation Guide, GC26-9008.
b. From panel DSNTIPB select Databases to Start Automatically. You are
presented with panel DSNTIPS. Type RESTART in the first field, ALL in the
second and press Enter. You are returned to DSNTIPB.
c. (For DB2 UDB for OS/390 Version 6 and above). You must keep SITE
TYPE as RECOVERYSITE until all user objects are recovered, so do not
change this parameter now.
d. Reassemble DSNZPxxx using job DSNTIJUZ (produced by the CLIST
started in the first step).
18. Stop and start DB2.
Note: Do not use ACCESS(MAINT), because you want other users to perform
the next recoveries.
19. Make a full image copy of the catalog and directory.
20. Recover user tablespaces (any tools tablespaces as well as SAP R/3 on DB2
for OS/390 tablespaces). See 5.3.2.1, “What to Do about Utilities in
Progress” on page 82 for information on how to recover tablespaces on
which utilities were running. You cannot restart a utility at the recovery site
that was interrupted at the disaster site. You have already terminated
utilities running against user tablespaces in step 12d on page 69.
a. Issue the SQL query
SELECT * FROM SYSIBM.SYSTABLEPART WHERE STORTYPE='E';
to determine which, if any, of your tablespaces are user-managed. To
allocate user-managed tablespaces, use the access method services
DEFINE CLUSTER command.
b. If your user tablespaces are STOGROUP-defined, and if the volume serial
numbers at the recovery site are different from those at the local site,
use ALTER STOGROUP to change them in the DB2 catalog.

c. Recover all user tablespaces and index spaces from the appropriate
image copies. If you do not copy your indexes, use the REBUILD INDEX
utility to reconstruct the indexes. (Applies to DB2 UDB for OS/390
Version 6 and above.)
d. Start all user tablespaces and index spaces for read or write processing
by issuing the command -START DATABASE with the ACCESS(RW) option.
e. Resolve any remaining check pending states that would prevent COPY
execution.
f. Run select queries with known results.
Redbook Recommendation
You can stop and start DB2 at this time with the DSNZPARM you use at
the local site.

21. Make full image copies of all tablespaces.


22. Finally, compensate for lost work since the point-of-consistency by rerunning
online transactions and batch jobs.

5.3.2 Steps to Recover (Data Sharing Only)


For users of data sharing, see Chapter 6 of DB2 UDB for OS/390 Data Sharing:
Planning and Administration, SC26-9007, for the specific disaster recovery
procedures that apply to data sharing.
1. Clean out old information from the coupling facility, if you have information in
your coupling facility from a practice operation. If you do not have old
information in the coupling facility, you can omit this step.
a. Enter the following MVS command to display the structures for this data
sharing group:
D XCF,STRUCTURE,STRNAME=grpname*
b. For group buffer pools and the lock structure, enter the following
command to force the connections off those structures:
SETXCF FORCE,CONNECTION,STRNAME=strname,CONNAME=ALL
Connections for the SCA are not held at termination, so there are no
SCA connections to force off.
c. Delete all the DB2 coupling facility structures by using the following
command for each structure:
SETXCF FORCE,STRUCTURE,STRNAME=strname
This step is necessary to clean out old information that exists in the
coupling facility from your practice startup when you installed the group.
2. If an integrated catalog facility catalog does not already exist, run job
DSNTIJCA to create a user catalog.
3. Use the access method services IMPORT command to import the integrated
catalog facility catalog.
4. Restore DB2 libraries, such as DB2 reslibs, SMP libraries, user program
libraries, user DBRM libraries, CLISTs, SDSNSAMP (or wherever the installation
jobs are), JCL for user-defined tablespaces, and so on.
5. Use IDCAMS DELETE NOSCRATCH to delete all catalog and user objects.
(Because step 3 imports a user ICF catalog, the catalog reflects data sets
that do not exist on DASD.) Obtain a copy of installation job DSNTIJIN. This
job creates DB2 VSAM and non-VSAM data sets. Change the volume serial
numbers in the job to volume serial numbers that exist at the recovery site.
Comment out the steps that create DB2 non-VSAM data sets, if these data
sets already exist. Run DSNTIJIN.
Redbook Recommendation
Even though we are focusing in this section on recovery of the DB2
catalog and directory, we must DELETE NOSCRATCH all DB2 objects: the
DB2 catalog and directory as well as user (SAP R/3) objects, whether
DB2-managed (STOGROUP) or user-managed (using IDCAMS DEFINE). The ICF catalog that
has just been imported contains the registry of all the image copies and
archive logs that will be used. However, it also contains information
about the volumes that contain all DB2 tables, and that data is not on
DASD at the moment. DELETE NOSCRATCH merely makes the ICF catalog
reflect the true (non-existent) state of the DASD at the moment. If this is
not done, it is likely that volume mounts will be issued via SVC at the end
of DB2 restart for volumes on which the data resided at the local site and
you will have to bring DB2 down in order to correct the problem. Failure
to use DELETE NOSCRATCH for the DB2 objects is a common error.

6. Recover the BSDS:


a. Use the access method services REPRO command to restore the contents
of the two BSDS data sets (allocated in the previous step). The most
recent BSDS image will be found in the first file (file number 1) on the
latest archive log tape. Do this for each member of the data sharing
group.

Redbook Recommendation
You can use ISPF 3.4 to display the BSDS, using
hlq.......Bnnnnnnn
The highest value of nnnnnnn is the BSDS you want.
Now use the print log map utility (DSNJU004) with the GROUP option to
list the current contents of all the BSDS data sets of all the DB2
members. You already know the LRSN of the point-of-consistency to
which you wish to recover from a report you have received from the
quiesce of the dummy tablespace. You must now find the archive log
on each member in which it is recorded.
• Look at the ARCHIVE LOG COPY 1 DATA SETS section (or that of
COPY 2). The most current archive logs are listed at the end of
the section, along with the STARTLRSN and ENDLRSN ranges of
each log. Find the range that contains your LRSN.
• Record the archive log that contains the point to which you wish
to recover your DB2 database (data set name, STARTRBA,
ENDRBA, STARTLRSN, ENDLRSN).
• Now REPRO the BSDS corresponding to the archive log in the
previous bullet to the BSDS data sets. You may be overwriting
the first BSDS you restored for each DB2 member.
Note: When you eventually truncate the log to your point-in-time
LRSN, you must use the BSDS that corresponds to that log.
Running DSNJU003 is critical for data sharing groups. Group
buffer pool checkpoint information is stored in the BSDS and
needs to be included from the most recent archive log.

b. Do this for each member of the data sharing group:
Use the change log inventory utility (DSNJU003) to register this latest
archive log tape data set in the archive log inventory of the BSDS just
restored. This is necessary since the BSDS image on an archive log
tape does not reflect the archive log data set residing on that tape.
Redbook Recommendation
This is the first mention of the change log inventory utility. Various
change log inventory control statements are developed through the
rest of the subscripts of this step. Though they are explained in this
procedure separately, you can run them all in one batch invocation of
change log inventory for each DB2 member. That is, if you have
three members of the data sharing group, you will run three change
log inventory utility jobs.

After these archive logs are registered, use the print log map utility
(DSNJU004) with the GROUP option to list the contents of all the BSDSs. You
get output that includes the start and end LRSN and RBA values for the
latest active log data sets (shown as NOTREUSABLE).
Note: If there is a discrepancy among the print log map reports as to
the number of members in the group, record the one that shows the
highest number. (This is an unlikely occurrence.) This is the DB2 that
must be started first.

c. Use the change log inventory utility to adjust the active logs for each
member of the group:
1) Use the DELETE option of the change log inventory utility (DSNJU003)
to delete all active logs in the BSDS. Use the BSDS listing produced
in the step above to determine the active log data set names.
2) Use the NEWLOG statement of the change log inventory utility
(DSNJU003) to add the active log data sets to the BSDS. Do not
specify a STARTRBA or ENDRBA value in the NEWLOG statement. This
indicates to DB2 that the new active logs are empty.
If you are using dual BSDSs, make sure both of them are included in the
jobs.
d. If you are using the DB2 distributed data facility, run the change log
inventory utility with the DDF statement to update the LOCATION and
LUNAME values in the BSDS.
e. Use the print log map utility (DSNJU004) to list the new BSDS contents
and ensure that the BSDS correctly reflects the active and archive log
data set inventories. In particular, ensure that:
• All active logs show a status of NEW and REUSABLE
• The archive log inventory is complete and correct (for example, the
start and end RBAs should be correct).
7. Optionally, you can restore archive logs to DASD. Archive logs are typically
stored on tape, but restoring them to DASD could speed later steps. Since
the archive logs are listed as cataloged in the BSDS, DB2 allocates them
using the integrated catalog and not the unit or volser specified in the BSDS.
If you are using dual BSDSs, remember to update both copies.
Redbook Recommendation
You do not need to update the BSDS if you restore archives to DASD.
1. Uncatalog the tape archive
2. Allocate (catalog) the DASD archive with the same name
3. Use IEBGENER to restore the tape archive to DASD
Do not change the block size from its value at the local site or the
recovery will likely fail. Hint: Set the block size locally at 24576 in
DSNZPARM parameter BLKSIZE on DSNTIPA panel, so that you can get
two blocks per track. Otherwise, the restoration will consume about
twice as much DASD as you have planned.

8. Use the DSN1LOGP utility to determine which transactions were in process
at the point-of-consistency. Use the following job control language for each
DB2 member:
//SAMP     EXEC PGM=DSN1LOGP
//SYSPRINT DD SYSOUT=*
//SYSSUMRY DD SYSOUT=*
//* File 1 on the tape is the BSDS copy, so LABEL selects file 2.
//ARCHIVE  DD DSN=last-archive,DISP=(OLD,KEEP),UNIT=TAPE,
//            LABEL=(2,SL),VOL=SER=volser1
//SYSIN    DD *
 STARTRBA(yyyyyyyyyyyy) SUMMARY(ONLY)
/*

Where yyyyyyyyyyyy is the STARTRBA of the last complete checkpoint before
the point-of-consistency LRSN (not LOGRBA) from the previous print log
map.
DSN1LOGP gives a report. For sample output and information about how to
read it, see Section 3 of DB2 UDB for OS/390 V6 Utility Guide and Reference,
SC26-9015.
Redbook Recommendation
A complete checkpoint has a STARTRBA and an ENDRBA “pair.” It is
possible for the point-of-consistency to lie between them. If that is the
case, choose the next most current pair whose ENDRBA is less than the
point-of-consistency LRSN. You will have to look in the Checkpoint
Section of the print log map report, which gives both the RBAs and
LRSNs for each pair.

Note whether any utilities were executing at the point-of-consistency LRSN.
You will have to determine the appropriate recovery action to take on each
tablespace involved in a utility job.
If DSN1LOGP showed that utilities are inflight (PLAN=DSNUTIL), you need
SYSUTILX to identify the utility status and determine the recovery approach.
See 5.3.2.1, “What to Do about Utilities in Progress” on page 82.
9. Modify DSNZPxxx parameters:
Redbook Recommendation
It is easier if you have created a new DSNZPARM, giving it a new name,
at the local site. Then you can skip this step now.

a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB
for OS/390 V6 Installation Guide, GC26-9008.
b. To defer processing of all databases, select Databases to Start
Automatically from panel DSNTIPB. You are presented with panel
DSNTIPS. Type DEFER in the first field and ALL in the second, then press
Enter. You are returned to DSNTIPB.
c. To specify where you are recovering, select Operator Functions from
panel DSNTIPB. You are presented with panel DSNTIPO. Type
RECOVERYSITE in the SITE TYPE field. Press Enter to continue.
d. To optionally specify which archive log to use, select Operator Functions
from panel DSNTIPB. You are presented with panel DSNTIPO. Type YES
in the READ ARCHIVE COPY 2 field if you are using dual archive logging
and want to use the second copy of the archive logs. Press Enter to
continue. (This applies to DB2 UDB for OS/390 Version 6 and above.)
Redbook Recommendation
To enable fast log apply for recovery and restart, select Active Log
Data Set Parameters from panel DSNTIPB. You are presented with
panel DSNTIPL. Type 100 in the LOG APPLY STORAGE field to
reserve 100 MB storage in the DBM1 address space and press Enter.
(This applies to DB2 UDB for OS/390 Version 6 or above.)

e. Reassemble DSNZPxxx using job DSNTIJUZ (produced by the CLIST
started in the first step).

At this point you have the log, but the tablespaces have not been recovered.
With DEFER ALL, DB2 assumes that the tablespaces are unavailable, but does
the necessary processing to the log. This step also handles the units of
recovery in process.
10. Use the change log inventory utility to create a conditional restart control
record. In most cases, you can use this form of the CRESTART statement:
CRESTART CREATE,ENDLRSN=nnnnnnnnnnnn,FORWARD=YES,BACKOUT=YES
where nnnnnnnnnnnn equals the point-of-consistency LRSN.
Redbook Recommendation
This change log inventory control statement can be included in the job
described in Step 6 on page 73.
Use the same CRESTART control statement for each DB2 member, as all
DB2 members must have their logs truncated at the same point.
Note: DB2 members of the group which have been shut down normally
so long ago that their logs are not needed for recovery are an exception
in data sharing. They are not restarted at the recovery site.
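Because every member must be truncated at the same point, it can help to generate the control statement once and reuse it in each member's change log inventory job. The following Python sketch shows the idea; the member names, the LRSN value, and the function name are hypothetical.

```python
from typing import Dict, Iterable


def crestart_for_group(members: Iterable[str], poc_lrsn: int) -> Dict[str, str]:
    """Return the identical CRESTART statement keyed by member name."""
    statement = (
        f"CRESTART CREATE,ENDLRSN={poc_lrsn:012X},FORWARD=YES,BACKOUT=YES"
    )
    return {member: statement for member in members}


# Hypothetical member names and LRSN for illustration only.
jobs = crestart_for_group(["DB1G", "DB2G", "DB3G"], 0xB1C2D3E4F5A6)
for member, statement in jobs.items():
    print(member, statement)
```

Building the statement in one place avoids the risk of typing a different ENDLRSN into one member's job.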

DB2 discards any log information in the bootstrap data set and the active
logs with an LRSN greater than nnnnnnnnnnnn as listed in the CRESTART
statements above.
Use the print log map utility to verify that the conditional restart control
record that you created in the previous step is active.
11. Start one DB2 with ACCESS(MAINT). DB2 will prompt you to start each
additional DB2 subsystem in the group.
If there is a discrepancy among the print log map reports as to the number
of members in the group, record the one that shows the highest number.
(This is an unlikely occurrence.) This is the DB2 that must be started first.
Redbook Recommendation
A group restart will be performed following the truncation of all the
members' logs. Expect to see at least one of these messages:
DSNR021I csect-name DB2 SUBSYSTEM MUST PERFORM GROUP RESTART
FOR PEER MEMBERS
DSNR022I csect-name DB2 SUBSYSTEM HAS COMPLETED GROUP RESTART
FOR PEER MEMBERS
If you do not see either message, stop now and perform step 1 on page 72 in this
procedure to force existing structures and connections from the CF from
prior tests. Then you must redo step 10, which created the CRESTART
record, before restarting DB2 again. Failure to delete structures prior to
restart is a common cause of disaster recovery failure for data sharing
users.

Even though DB2 marks all tablespaces for deferred restart, log records are
written so that in-abort and inflight units of recovery are backed out.
In-commit units of recovery are completed, but no additional log records are
written at restart to cause this. This happens when the original redo log
records are applied by the RECOVER utility.
At the primary site, DB2 probably committed or aborted the inflight units of
recovery, but you have no way of knowing.

During restart, DB2 accesses two tablespaces which result in DSNT501I,
DSNT500I, and DSNL700I resource unavailable messages, regardless of
DEFER status. The messages are normal and expected and you can ignore
them.
The return code accompanying the message might be one of the following,
although other codes are possible:
00C90081 This return code occurs if there is activity against the object
during restart as a result of a unit of recovery or pending
writes. In this case the status shown as a result of -DISPLAY
is STOP,DEFER.
00C90094 Since the tablespace is currently only a defined VSAM data
set, it is in an unexpected state to DB2.
00C900A9 This code indicates that an attempt was made to allocate a
deferred resource.
Redbook Recommendation
You will observe the addition of data sets to LPL during forward and
backward phases of DB2 restart. All data sets that were being shared at
the time will be set to GRECP exception condition. This is not a problem
because the RECOVER utility, which you will soon invoke, will remove
this exception status.

12. Resolve the indoubt units of recovery.


The RECOVER utility, which you will soon invoke, will fail on any tablespace
that has indoubt units of recovery. Because of this, you must resolve them
first.
Determine the proper action to take (commit or abort) for each unit of
recovery. To resolve indoubt units of recovery see “Resolving Indoubt
Threads” in DB2 UDB for OS/390 V6 Administration Guide, SC26-9003. From
an install SYSADM authorization ID, enter the RECOVER INDOUBT command for
all affected transactions.
If you attempt this from an MVS console, you will receive messages resulting
from an attempt to do authorization checking when no tables exist yet.
Redbook Recommendation
After you identify any indoubt units of recovery, it is safe to issue RECOVER
INDOUBT ACTION(ABORT). The rationale is that you have already lost work;
the loss of one more unit of work is not significant. Since this command
is member-specific, you must issue it on each member for which an
indoubt unit of recovery exists.

13. If you are going to run single-system data sharing at the recovery site, stop
all DB2s but one by using the STOP DB2 command with MODE(QUIESCE).
14. To recover the catalog and directory, follow these instructions:
The RECOVER function includes: RECOVER TABLESPACE, RECOVER INDEX, or
REBUILD INDEX. If you have an image copy of an index, use RECOVER INDEX. If
you do not have an image copy of an index, use REBUILD INDEX to reconstruct
the index from the recovered tablespace.
a. Recover DSNDB01.SYSUTILX. This must be a separate job step.
b. Recover all indexes on SYSUTILX. This must be a separate job step.

c. Your recovery strategy for an object depends on whether a utility was
running against it at the time the latest archive log was created. To
identify the utilities that were running, you must recover SYSUTILX.
You cannot restart a utility at the recovery site that was interrupted at
the disaster site. You must use the TERM command to terminate it. The
TERM UTILITY command can be used on any object except
DSNDB01.SYSUTILX.
Redbook Recommendation
TERM UTIL does not have group scope if a utility is active. The utilities
in process at the point-of-consistency were categorized as “ACTIVE”
at the local site. Can they be terminated by one member?
Surprisingly, yes. Because TERM is able to obtain an internal lock, it
considers the utility to be in a stopped status, and therefore, all
utilities can be terminated from any DB2 member.

Determine which utilities were executing and the tablespaces involved by
following these steps:
1) Enter the DISPLAY UTILITY(*) command and record the utility and the
current phase.
2) Run the DIAGNOSE utility with the DISPLAY SYSUTIL statement. The
output consists of information about each active utility, including the
tablespace name (in most instances). It is the only way to correlate
the object name with the utility. Message DSNU866I gives
information on the utility, while DSNU867I gives the database and
tablespace name in USUDBNAM and USUSPNAM respectively.
See 5.3.2.1, “What to Do about Utilities in Progress” on page 82 for
information on how to recover catalog and directory tablespaces on
which utilities were running.
d. Recover the rest of the catalog and directory objects starting with DBD01,
in the order shown in the description of the RECOVER utility in Section 2
of DB2 UDB for OS/390 V6 Utility Guide and Reference, SC26-9015.

Redbook Recommendation
The preceding list is excerpted from the DB2 UDB for OS/390 V6
Utility Guide and Reference, SC26-9015, for informational purposes.
You should consult it for specific instructions.
In addition we have incorporated utility termination in the recovery
steps, in order to simplify the process. We have implemented a
-TERM step on the utilities up through those for SYSLGRNX. That is,
there is a -TERM step for DSNDB01.DBD01 with appropriate condition
codes to guarantee the execution of the following RECOVER step. In
that way the recover will be certain to run. Following the recovery of
DSNDB01.SYSLGRNX, all utilities may safely be terminated (but not
before).
1. DSNDB01.DBD01
• -TERM UTILITY on DBD01 (job step 1)
• Recover DSNDB01.DBD01 (job step 2)
• Rebuild all indexes on SYSUTILX (job step 2)
2. DSNDB06.SYSCOPY
• -TERM UTILITY on SYSCOPY (job step 3)
• Recover DSNDB06.SYSCOPY (job step 4)
• Rebuild all IBM-defined indexes on SYSCOPY (job step 4)
3. DSNDB01.SYSLGRNX
• -TERM UTILITY on SYSLGRNX (job step 5)
• Recover DSNDB01.SYSLGRNX (job step 6)
• Rebuild all indexes on SYSLGRNX (job step 6)
4. -TERM UTIL(*) (job step 7)

Note: It is now safe to terminate all utilities, since appropriate
updates can now be made to SYSCOPY and SYSLGRNX if they
are required (as a result of the termination).
The rest of the RECOVER statements can be run in one job step if it is
desired.
1. DSNDB06.SYSDBAUT
2. All IBM-defined indexes on SYSDBAUT
3. DSNDB06.SYSUSER
4. DSNDB06.SYSDBASE
5. All IBM-defined indexes on SYSDBASE and SYSUSER
Note: Following recovery of SYSDBASE and its indexes, you can
rebuild all indexes on the rest of the objects using REBUILD
INDEX(ALL). The latter statement results in simple JCL.
6. Other catalog and directory tablespaces and indexes. The
remaining catalog tablespaces in DSNDB06 are SYSGROUP,
SYSGPAUT, SYSPLAN, SYSPKAGE, SYSSTATS, SYSSTR, and
SYSVIEWS. Most indexes are listed in DB2 for OS/390 V5 SQL
Reference, SC26-8966. (Use DB2 UDB SQL Reference Vol 1 V6,
SC09-2847 and DB2 UDB SQL Reference Vol 2 V6, SC09-2848 for
DB2 UDB for OS/390 Version 6.) One index not listed there is
DSNVTH01. There are two remaining directory tablespaces,
DSNDB01.SCT02, which has index SYSIBM.DSNSCT02, and
DSNDB01.SPT01, which has indexes SYSIBM.DSNSPT01 and
SYSIBM.DSNSPT02.
7. All user defined indexes on the catalog (if you plan to use them,
otherwise, you can omit this step).
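The job-step ordering in the recommendation above can be sketched as a short Python script. This is only an illustration of the sequence, not part of the tested procedure. The function name build_plan and the tuple layout are assumptions made for this example. The per-object -TERM UTILITY entries carry a placeholder, because the command takes a utility ID, which you obtain from the DISPLAY UTILITY output. The object pairings follow the list exactly as it appears above.

```python
# Sketch of the job-step ordering from the recommendation above.
# build_plan and the tuple layout are illustrative assumptions.

# (table space to recover, table space whose indexes are rebuilt in the
# same job step), in the order the list above gives them.
EARLY_OBJECTS = [
    ("DSNDB01.DBD01", "DSNDB01.SYSUTILX"),   # pairing as listed in the source
    ("DSNDB06.SYSCOPY", "DSNDB06.SYSCOPY"),
    ("DSNDB01.SYSLGRNX", "DSNDB01.SYSLGRNX"),
]


def build_plan(objects=EARLY_OBJECTS):
    """Return (job_step, action) pairs: TERM first, then RECOVER and REBUILD."""
    plan = []
    step = 0
    for space, index_space in objects:
        step += 1
        # Terminate only the utilities that were running on this object.
        plan.append((step, f"-TERM UTILITY(<utility-id active on {space}>)"))
        step += 1
        plan.append((step, f"RECOVER TABLESPACE {space}"))
        plan.append((step, f"REBUILD INDEX(ALL) TABLESPACE {index_space}"))
    # A blanket termination is safe only after SYSLGRNX has been recovered.
    plan.append((step + 1, "-TERM UTILITY(*)"))
    return plan


if __name__ == "__main__":
    for job_step, action in build_plan():
        print(f"job step {job_step}: {action}")
```

Keeping the termination and the recovery of each object in adjacent job steps mirrors the intent stated above: the RECOVER step is certain to run because nothing is still registered against the object.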

15. Use any method desired to verify the integrity of the DB2 catalog and
directory. Selected catalog queries in member DSNTESQ of data set
DSN610.SDSNSAMP can be used after the work file database is defined and
initialized.
16. Define and initialize the work file database for each DB2 member.
a. Define temporary work files. Use installation job DSNTIJTM as a model.
b. Issue the command -START DATABASE work-file-database to start the work
file database.
17. If you use data definition control support, recover the objects in the data
definition control support database.
18. If you use the resource limit facility, recover the objects in the resource limit
control facility database.
19. Modify DSNZPxxx to restart all databases:
Redbook Recommendation
It is easier to have this DSNZPARM prepared at the local site. It only
differs from the one you just used in that all databases are to be
automatically restarted.

a. Run the DSNTINST CLIST in UPDATE mode. See Section 2 of DB2 UDB
for OS/390 V6 Installation Guide, GC26-9008.
b. From panel DSNTIPB select Databases to Start Automatically. You are
presented with panel DSNTIPS. Type RESTART in the first field, ALL in the
second and press Enter. You are returned to DSNTIPB.
c. (For DB2 UDB for OS/390 Version 6 and above). You must keep SITE
TYPE as RECOVERYSITE until all user objects are recovered, so do not
change this parameter now.
d. Reassemble DSNZPxxx using job DSNTIJUZ (produced by the CLIST
started in the first step).
20. Stop and start DB2 (one member).
Note: Do not use ACCESS(MAINT), because you want other users to perform
the next recoveries.
21. Make a full image copy of the catalog and directory.
22. Recover user tablespaces. See 5.3.2.1, “What to Do about Utilities in
Progress” on page 82 for information on how to recover tablespaces on
which utilities were running. You cannot restart a utility at the recovery site
that was interrupted at the disaster site. You have already terminated any
utilities running against user tablespaces in item 14d on page 79.
a. Issue the SQL query
SELECT * FROM SYSIBM.SYSTABLEPART WHERE STORTYPE='E';
to determine which, if any, of your tablespaces are user-managed. To
allocate user-managed tablespaces, use the access method services
DEFINE CLUSTER command.
b. If your user tablespaces are STOGROUP-defined, and if the volume serial
numbers at the recovery site are different from those at the local site,
use ALTER STOGROUP to change them in the DB2 catalog.

c. Recover all user tablespaces and index spaces from the appropriate
image copies. If you do not copy your indexes, use the REBUILD INDEX
utility to reconstruct the indexes. (Applies to DB2 UDB for OS/390
Version 6 and above.)
d. Start all user tablespaces and index spaces for read or write processing
by issuing the command -START DATABASE with the ACCESS(RW) option.
e. Resolve any remaining check pending states that would prevent COPY
execution.
f. Run select queries with known results.
23. Stop and start DB2.
Redbook Recommendation
You can stop and start DB2 at this time with the DSNZPARM you use at
the local site.

24. Make full image copies of all tablespaces.


25. Finally, compensate for lost work since the last archive was created by
rerunning online transactions and batch jobs.

5.3.2.1 What to Do about Utilities in Progress


If any utility jobs were running after the last time that the log was off-loaded
before the disaster, you might need to take some additional steps. After
restarting DB2, only the following utilities need to be terminated with the TERM
UTILITY command:
• CHECK INDEX
• MERGECOPY
• MODIFY
• QUIESCE
• RECOVER
• RUNSTATS
• STOSPACE

It is preferable to allow the RECOVER utility to reset pending states. However, it
is occasionally necessary to use the REPAIR utility to reset them. Do not start
the tablespace with ACCESS(FORCE) since FORCE resets any page set exception
conditions described in “Data Base Page Set Control records” in appendix X of
DB2 UDB for OS/390 V6 Administration Guide, SC26-9003 (Volume 2).

For the following utility jobs, take the actions indicated.

Important
Remember that any inline image copy taken with COPY SPEC must be at the
recovery site or you will have to recover to a prior point in time.

CHECK DATA Terminate the utility and run it again after recovery is complete.
COPY After you enter the TERM command, DB2 places a record in the
SYSCOPY catalog table indicating that the COPY utility was
terminated. This makes it necessary for you to make a full image
copy. When you copy your environment at the completion of the
disaster recovery scenario, you fulfill that requirement.

LOAD Find the options you specified in Table 1 on page 83, and take
the specified actions.

Table 1. Actions to Take When LOAD Is Interrupted

LOAD options specified      What to do
--------------------------  ------------------------------------------------
LOG YES                     If the RELOAD phase completed, then recover to
                            the current time, and recover the indexes.
                            If the RELOAD phase did not complete, then
                            recover to a prior point in time. The SYSCOPY
                            record inserted at the beginning of the RELOAD
                            phase contains the RBA or LRSN.

LOG NO COPY SPEC            If the RELOAD phase completed, then the
                            tablespace is complete after you recover it to
                            the current time. Recover the indexes.
                            If the RELOAD phase did not complete, then
                            recover the tablespace to a prior point in
                            time. Recover the indexes.

LOG NO COPY SPEC SORTKEYS   If the BUILD or SORTBLD phase completed, then
                            recover to the current time, and recover the
                            indexes.
                            If the BUILD or SORTBLD phase did not complete,
                            then recover to a prior point in time. Recover
                            the indexes.

LOG NO                      Recover the tablespace to a prior point in
                            time. You can use TOCOPY to do this.

To avoid extra loss of data in a future disaster situation, run
QUIESCE on tablespaces before invoking LOAD. This enables
you to recover a tablespace using TORBA instead of TOCOPY.
REORG (user tablespace) For a user tablespace, find the options you specified in
Table 2, and take the specified actions.

Table 2. Actions to Take When REORG Is Interrupted

REORG options specified     What to do
--------------------------  ------------------------------------------------
LOG YES                     If the RELOAD phase completed, then recover to
                            the current time, and recover the indexes.
                            If the RELOAD phase did not complete, then
                            recover to the current time to restore the
                            tablespace to the point before REORG began.
                            Recover the indexes.

LOG NO                      If the RELOAD phase completed, then recover to
                            a prior point in time. You can use TOCOPY or
                            TORBA to do this.
                            If the RELOAD phase did not complete, then
                            recover to the current time to restore the
                            tablespace to the point before REORG began.
                            Recover the indexes.

LOG NO COPY SPEC            If the RELOAD phase completed, then the
                            tablespace is complete after you recover it to
                            the current time. Recover the indexes.
                            If the RELOAD phase did not complete, then
                            recover to the current time to restore the
                            tablespace to the point before REORG began.
                            Recover the indexes.

LOG NO COPY SPEC SORTKEYS   If the BUILD or SORTBLD phase completed, then
                            recover to the current time, and recover the
                            indexes.
                            If the BUILD or SORTBLD phase did not complete,
                            then recover to the current time to restore the
                            tablespace to the point before REORG began.
                            Recover the indexes.

SHRLEVEL CHANGE             If the SWITCH phase completed, terminate the
                            utility. Recover the tablespace to the current
                            time. Recover the indexes.
                            If the SWITCH phase did not complete, recover
                            the tablespace to the current time. Recover the
                            indexes.

SHRLEVEL REFERENCE          Same as for SHRLEVEL CHANGE.

REORG (DB2 tablespace) For a catalog or directory tablespace, follow these
instructions:
Tablespaces with links cannot use online REORG. For those
tablespaces that can use online REORG, find the options you
specified in Table 2 on page 83, and take the specified actions.
If you have no image copies from immediately before REORG
failed, use this procedure:
1. From your DISPLAY UTILITY and DIAGNOSE output,
determine what phase REORG was in and which tablespace it
was reorganizing when the disaster occurred.
2. Run RECOVER on the catalog and directory in the order
shown in Section 2 of DB2 UDB for OS/390 V6 Utility Guide
and Reference, SC26-9015. Recover all tablespaces to the
current time, except the tablespace that was being
reorganized. If the RELOAD phase of the REORG on that
tablespace had not completed when the disaster occurred,
recover the tablespace to the current time. Because REORG
does not generate any log records prior to the RELOAD
phase for catalog and directory objects, the RECOVER to
current restores the data to the state it was in before the
REORG.
If the RELOAD phase completed, do the following:
a. Run DSN1LOGP against the archive log data sets from
the disaster site.
b. Find the begin-UR log record for the REORG that failed in
the DSN1LOGP output.
c. Run RECOVER with the TORBA option on the tablespace
that was being reorganized. Use the URID of the
begin-UR record as the TORBA value.
3. Recover or rebuild all indexes.
If you have image copies from immediately before REORG failed,
run RECOVER with the option TOCOPY to recover the catalog and
directory, in the order shown in Section 2 of DB2 UDB for OS/390
V6 Utility Guide and Reference, SC26-9015. Recommendation:
Make full image copies of the catalog and directory before you
run REORG on them.
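
The decisions in Table 1 and Table 2 can be written down as a lookup, which is convenient when many interrupted utilities have to be reviewed at the recovery site. The following Python sketch is one possible encoding under stated assumptions: every name in it (ACTIONS, recovery_action, and so on) is invented for this example, the option labels are copied from the tables, and only the tablespace action is modeled. Index recovery, the catalog and directory REORG procedure, CHECK DATA, and COPY are not covered.

```python
# Illustrative encoding of Table 1 (LOAD) and Table 2 (REORG on user
# tablespaces). All names are assumptions made for this sketch.

CURRENT = "recover to the current time"
PRIOR = "recover to a prior point in time"
TERMINATE = "terminate the utility"

# Utilities that only need TERM UTILITY after restart (list in 5.3.2.1).
TERMINATE_ONLY = frozenset(
    ["CHECK INDEX", "MERGECOPY", "MODIFY", "QUIESCE",
     "RECOVER", "RUNSTATS", "STOSPACE"]
)

# (utility, options) -> (deciding phase, action if completed, action if not)
ACTIONS = {
    ("LOAD", "LOG YES"): ("RELOAD", CURRENT, PRIOR),
    ("LOAD", "LOG NO COPY SPEC"): ("RELOAD", CURRENT, PRIOR),
    ("LOAD", "LOG NO COPY SPEC SORTKEYS"): ("BUILD or SORTBLD", CURRENT, PRIOR),
    ("LOAD", "LOG NO"): (None, PRIOR, PRIOR),
    ("REORG", "LOG YES"): ("RELOAD", CURRENT, CURRENT),
    ("REORG", "LOG NO"): ("RELOAD", PRIOR, CURRENT),
    ("REORG", "LOG NO COPY SPEC"): ("RELOAD", CURRENT, CURRENT),
    ("REORG", "LOG NO COPY SPEC SORTKEYS"): ("BUILD or SORTBLD", CURRENT, CURRENT),
    ("REORG", "SHRLEVEL CHANGE"): ("SWITCH", CURRENT, CURRENT),
    ("REORG", "SHRLEVEL REFERENCE"): ("SWITCH", CURRENT, CURRENT),
}


def recovery_action(utility, options, phase_completed):
    """Return the tablespace action for a utility interrupted by the disaster."""
    if utility in TERMINATE_ONLY:
        return TERMINATE
    # A KeyError here means the combination is not covered by the tables.
    _phase, if_completed, if_not = ACTIONS[(utility, options)]
    return if_completed if phase_completed else if_not
```

For example, recovery_action("REORG", "LOG NO", True) yields the prior point in time action, which is the one REORG case in Table 2 where a completed RELOAD phase forces you back to TOCOPY or TORBA.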

5.4 Advanced Disaster Recovery Planning


Given that you have successfully executed the procedure in the previous section,
can you shorten the recovery time or improve your currency or both? Some of
the procedures, which were identified initially in Chapter 2, “Disaster Recovery
Planning” on page 15, are described in more detail in the following sections,
though the discussion can still be termed “general.” To implement them you
may require knowledge that can be found in other redbooks, which are listed in
the bibliography.

5.4.1 Backup and Vaulting Procedures


You have two alternatives for vaulting:
1. Manual vaulting
Manual vaulting is the most economical way to store your media off-site. It
is the method we have assumed is used for our disaster recovery in this
redbook.
The disadvantage of manual vaulting is the reduced data currency and the
longer recovery time as you normally ship tapes to the recovery site and
start the recovery from tape in the case of a disaster.
2. Electronic vaulting
The next “step up” involves electronic vaulting for backups and archive logs.
It is more expensive than manual vaulting because of the network costs.
Data loss is reduced to that of the current active log, though recovery can
take longer due to the extra log that must be passed. We assume you will
choose a later point-of-consistency in order to achieve consistency with SAP
R/3 and DB2 for OS/390. The recovery procedure used will be that of 5.3,
“Remote Site Recovery from Disaster at a Local Site” on page 64.
Most users vault the Remote Primary image copies and COPY 2 of the
archive log. In DB2 UDB for OS/390 Version 6 you can specify that COPY 2
of the archive is requested first.

5.4.2 Using a Tracker Site for Disaster Recovery


This section describes a different method for disaster recovery from that
described in 5.3, “Remote Site Recovery from Disaster at a Local Site” on
page 64. Unlike the point-of-consistency recovery we described in 5.3, “Remote
Site Recovery from Disaster at a Local Site” on page 64, use of a tracker site
requires that the point of truncation be the end of one or more archive logs.
Though the steps are similar to those you know, we refer you to DB2 UDB for
OS/390 V6 Administration Guide, SC26-9003, for the details on the procedure. In
this redbook, we will excerpt only the description and function of the tracker site.

Overview of the method: A DB2 tracker site is a separate DB2 subsystem or data
sharing group that exists solely for the purpose of keeping shadow copies of
your primary site's data. No independent work can be run on the tracker site.

From the primary site, you transfer the BSDS and the archive logs, then the
tracker site runs periodic LOGONLY recoveries to keep the shadow data
up-to-date. If a disaster occurs at the primary site, the tracker site becomes the
takeover site. Because the tracker site has been shadowing the activity on the
primary site, you do not have to constantly ship image copies. The takeover time
for the tracker site can also be faster because DB2 recovery does not have to use
image copies.

The following topics are included in this section:


5.4.2.1, “Characteristics of a Tracker Site”
5.4.2.2, “Setting Up a Tracker Site”
5.4.2.3, “Establishing a Recovery Cycle at the Tracker Site” on page 87
5.4.2.4, “Maintaining the Tracker Site” on page 87
5.4.2.5, “The Disaster Happens: Making the Tracker Site the Takeover Site”
on page 87

5.4.2.1 Characteristics of a Tracker Site


Because the tracker site must use only the logs from the primary site for
recovery, you must not update the catalog and directory or the data at the
tracker site. The tracker site DB2 disallows updates. In summary:
• The following SQL statements are not allowed at a tracker site:
− GRANT or REVOKE
− DROP, ALTER, or CREATE
− UPDATE, INSERT, or DELETE
Dynamic read-only SELECT statements are allowed.
• The only online utilities that are allowed are REPORT, DIAGNOSE, RECOVER,
and REBUILD. Recovery to a prior point in time is not allowed.
• BIND is not allowed.
• TERM UTIL is allowed only for RECOVER, REPORT, and DIAGNOSE.
• The START DATABASE command is not allowed when LPL or GRECP status
exists for the object of the command. It is not necessary to use START
DATABASE to clear LPL or GRECP conditions, because you are going to be
running RECOVER jobs that clear the conditions.
• The START DATABASE command with ACCESS(FORCE) is not allowed.
• Down-level detection is disabled.
• Log archiving is disabled.
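
The restrictions above lend themselves to a simple pre-check before work is submitted to a tracker site. The Python sketch below is a minimal illustration and makes several assumptions: the function names and return strings are invented here, SQL statements are classified only by their leading keyword, and a statement that the list does not mention is reported as such rather than guessed at. DB2 itself enforces these rules. The sketch only restates them.

```python
# Sketch of the tracker-site restrictions listed above.
# Function names and return values are assumptions made for this example.

DISALLOWED_SQL = frozenset(
    ["GRANT", "REVOKE", "DROP", "ALTER", "CREATE", "UPDATE", "INSERT", "DELETE"]
)
ALLOWED_UTILITIES = frozenset(["REPORT", "DIAGNOSE", "RECOVER", "REBUILD"])


def sql_status(statement):
    """Classify an SQL statement by its leading keyword."""
    words = statement.split()
    if not words:
        raise ValueError("empty SQL statement")
    verb = words[0].upper()
    if verb in DISALLOWED_SQL:
        return "not allowed"
    if verb == "SELECT":
        # Only dynamic read-only SELECT is stated to be allowed.
        return "allowed"
    return "not covered by the list"


def utility_allowed(utility, to_prior_point=False):
    """Return True if the online utility may run at a tracker site."""
    name = utility.upper()
    if name == "RECOVER" and to_prior_point:
        return False  # recovery to a prior point in time is not allowed
    return name in ALLOWED_UTILITIES
```

A call such as utility_allowed("RECOVER", to_prior_point=True) returns False, which reflects the reason for the restriction: the tracker site must follow the primary site's log and nothing else.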

5.4.2.2 Setting Up a Tracker Site


To set up the tracker site:
1. Create a mirror image of your primary DB2 subsystem or data sharing
group. This process is described in steps 1 through 4 of the disaster
recovery procedures in this chapter (as well as in DB2 UDB for OS/390 V6
Administration Guide). The reference cited includes such items as creating
catalogs and restoring DB2 libraries.
2. Modify the subsystem parameters as follows:
• Set the TRKSITE subsystem parameter to YES.
• Optionally, set the SITETYP parameter to RECOVERYSITE if the full image
copies this site will be receiving are created as remote site copies.
3. Use the access method services command DEFINE CLUSTER to allocate data
sets for all user-managed tablespaces that you will be sending from the
primary site. Similarly, allocate data sets for any user-managed indexes that
you want to rebuild during recovery cycles. The main reason to rebuild
indexes for recovery cycles is for running queries on the tracker site. If you
do not require indexes, you do not have to rebuild them for recovery cycles.
For nonpartitioning indexes on very large tables, you can include indexes for
LOGONLY recovery during a recovery cycle, which can reduce the amount of
time it takes to activate the disaster site.
4. Send full image copies of all the primary site's DB2 data to the tracker site.
5. Tailor the installation job DSNTIJIN to create DB2 catalog data sets.

Important: Do not attempt to start the tracker site when you are setting it up.

5.4.2.3 Establishing a Recovery Cycle at the Tracker Site


When the tracker site has full image copies of all the data at the primary site,
you periodically send the archive logs and BSDSs from the primary site to the
tracker site and recover data from the log. See DB2 UDB for OS/390 V6
Administration Guide for details on how this can be accomplished.

5.4.2.4 Maintaining the Tracker Site


It is recommended that the tracker site and primary site be at the same
maintenance level to avoid unexpected problems. Between recovery cycles, you
can apply maintenance as you normally do, by stopping and restarting the DB2
or DB2 member.

If a tracker site fails, you can restart it normally.

Because bringing up your tracker site as the takeover site destroys the tracker
site environment, you should save your complete tracker site prior to takeover
site testing. The tracker site can then be restored after the takeover site testing,
and the tracker site recovery cycles can be resumed.

5.4.2.5 The Disaster Happens: Making the Tracker Site the Takeover Site

If a disaster occurs at the primary site, the tracker site must become the
takeover site. See DB2 UDB for OS/390 V6 Administration Guide for details on
how this can be accomplished. After the takeover site is restarted, run
RECOVER jobs for log data or image copies that were en route when the disaster
occurred.

5.4.3 Geographically Dispersed Parallel Sysplex (GDPS)


Geographically Dispersed Parallel Sysplex (GDPS) is a multisite application that
provides the capability to manage:
• The remote copy configuration and storage subsystems
• Automated Parallel Sysplex tasks
• Failure recovery
This management is from a single point of control. For information about GDPS,
the following IBM Position Papers can be consulted:
• Geographically Dispersed Parallel Sysplex: The S/390 Multi-site Application
Availability Solution (Executive Summary), GF22-5114
• Geographically Dispersed Parallel Sysplex: The S/390 Multi-site Application
Availability Solution (Detail), GF22-5063
One way of obtaining these papers is through the following Web pages:
www.s390.ibm.com/marketing/gf225114.html
www.s390.ibm.com/marketing/gf225063.html

5.4.4 Asynchronous Remote Copy (XRC) Procedures


The premise of XRC, as applied to DB2 installations, is that all DB2 objects, the
active logs, and BSDS data sets are shadowed through DASD devices at two
physical locations. Program libraries are also shadowed, either with the DB2
elements, or with other OS/390 system libraries. The System Data Mover (SDM)
software recognizes a source and target device. The source device is a volume
on either a 3990 Model 6 Control Unit or a 2105 Enterprise Storage Server (ESS);
the target device is a volume on either a 3990 Model 6 Control Unit, an ESS, or
RVA. They can be kept synchronized through timestamps obtained through
DFSMSdss associated with a “consistency group.” Copying is performed by the
system data mover (SDM) software on a track basis. The consistency group
provides assurance that all the tracks associated with the group will be
synchronized with the earliest time stamp of any of them. The synchronization is
performed by the SDM software.

How current will the data be? It depends on the bandwidth across which the
data flows and the distance it must travel. It can be from a few milliseconds to
several seconds in a well-tuned XRC system, with channel extenders. Due to the
latter, there is no distance limit at which the receiving control units must be
placed. For devices connected through ESCON directors, the current distance
limit is 43 km. Since the writes to the data are asynchronous, there is no
performance penalty paid at the local site.

The active system data mover, which is receiving tracks, is usually found at the
recovery site. The tracks are placed on devices of the same VOLSERs as exist
at the local site. There is no DB2 subsystem active. The data is not accessible
until a failure occurs. A consistency group can consist of a single DB2
subsystem or a DB2 data sharing group. In the latter case, a coupling facility
must be in place at the recovery site by the time the first DB2 is started. All DB2
subsystem and user data is expected to be in the same consistency group.
Since data sharing adds complexity, we describe the two procedures in
separate sections.

5.4.4.1 Steps to Recover (Non-Data Sharing)


Most users test their recoveries by simply breaking the connections between the
primary site processor and the local DASD, then verifying that recovery
procedures function correctly.
1. None of the XRC-specific tasks necessary to bring the DASD volumes to a
point of consistency are described here. Please refer to Remote Copy
Administrator's Guide and Reference for a detailed description of the following:
• XSUSPEND - occurs as a result of breaking the connections between
control units. (This activity produces a timestamp which assures
replication integrity of the consistency group).
• XEND - operator command to terminate XRC.
• XRECOVER - operator command to change the secondary volumes to
primary volumes.
2. IPL the OS/390 image.
3. The DSNZPARM member should be that in use at the local site. There is no
DEFER ALL parameter, since there will be no conditional restart.

4. If you are using the DB2 distributed data facility, run the change log
inventory utility with the DDF statement to update the LOCATION and
LUNAME values in the BSDS.
5. Start DB2.
Note: This is a normal restart, exactly the procedure that would be used if a
non-disaster DB2 or system crash occurred at the local site.
6. Following restart you will see DSNR002I. DB2 is now ready for new work.

5.4.4.2 Steps to Recover (Data Sharing)


Most users test their recoveries by simply breaking the connections between the
primary site processor and the local DASD, then verifying that recovery
procedures function correctly.
1. None of the XRC-specific tasks necessary to bring the DASD volumes to a
point of consistency are described here. Refer to Remote Copy
Administrator's Guide and Reference, SC35-0169, for a detailed description of
the following:
• XSUSPEND - occurs as a result of breaking the connections between
control units. (This activity produces a timestamp that assures
replication integrity of the consistency group.)
• XEND - operator command to terminate XRC.
• XRECOVER - operator command to change the secondary volumes to
primary volumes.
2. IPL the OS/390 images.
3. If data sharing members are in the same Parallel Sysplex, you must force all
DB2 structures in the coupling facilities at the recovery site. This is
necessary since the coupling facility is not synchronized with the DASD.
Important
Failure to force these structures is the most common reason for failure in
this disaster recovery procedure.

a. Enter the following MVS command to display the structures for this data
sharing group:
D XCF,STRUCTURE,STRNAME=grpname*
b. For group buffer pools and the lock structure, enter the following
command to force the connections off those structures:
SETXCF FORCE,CONNECTION,STRNAME=strname,CONNAME=ALL
Connections for the SCA are not held at termination, so there are no
SCA connections to force off.
c. Delete all the DB2 coupling facility structures by using the following
command for each structure:
SETXCF FORCE,STRUCTURE,STRNAME=strname
4. The DSNZPARM members used should be those in use at the local site.
There is no DEFER ALL parameter, since the DB2 restart is not conditional.
5. If you are using the DB2 distributed data facility, run the change log
inventory utility with the DDF statement to update the LOCATION and
LUNAME values in the BSDS.
6. Start all DB2 members of the data sharing group.

Note: This is a normal restart, exactly the procedure that would be used if a
non-disaster DB2 or system crash occurred at the local site.
Redbook Recommendation
A group restart will be performed since the structures were forced.
Expect to see at least one of these messages:
DSNR021I csect-name DB2 SUBSYSTEM MUST PERFORM GROUP RESTART
FOR PEER MEMBERS
DSNR022I csect-name DB2 SUBSYSTEM HAS COMPLETED GROUP RESTART
FOR PEER MEMBERS
If you do not see this, stop now and perform step 3c on page 89 in this
procedure to force existing structures and connections from the CF.
Failure to delete structures prior to restart is a common cause of disaster
recovery failure for data sharing users.

7. You will observe the addition of pages to the logical page list (LPL) during
forward and backward phases of DB2 restart. All data sets that were being
shared at the time will be set to GRECP exception condition.
8. After the restart(s) has completed, DSNR002I is displayed.
9. Issue the DISPLAY DATABASE(*) SPACE(*) RESTRICT LIMIT(*) command to list
the objects in GRECP/LPL.
10. Issue the following commands to recover the DB2 catalog and directory:
START DATABASE(DSNDB01) SPACE(*)
START DATABASE(DSNDB06) SPACE(*)
11. To recover from GRECP/LPL, issue the START commands in either of these
forms:
START DATABASE(*) SPACE(*)
This will recover any objects in GRECP/LPL
START DATABASE(data base name) SPACE(*)
This command can be issued for different databases on different members
for better recovery performance, if desired. The data base name is obtained
from the command issued in step 9. This recovery can take from a few
minutes to hours (if many thousands of data sets are in GRECP/LPL status).
It depends on the number of data sets being shared and the amount of log
data that must be passed since the last DB2 member checkpoints.
12. DB2 is now available for new work.
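
Because forgetting to force the coupling facility structures is the most common reason this procedure fails, some installations may prefer to generate the commands of step 3 from the structure names returned by the display command instead of typing them. The Python sketch below shows one way to do that. It is an assumption-laden illustration: the function name is invented, the structure names in the usage example are hypothetical, and the SCA structure is recognized by a name ending in _SCA, which you should verify against your own display output.

```python
# Sketch: build the MVS commands of step 3 from a list of structure names.
# The function name and the example structure names are hypothetical.

def force_commands(structure_names):
    """Return SETXCF commands: force connections first, then delete structures."""
    commands = []
    for name in structure_names:
        # Step 3b: SCA connections are not held at termination, so there
        # is nothing to force off for that structure.
        if not name.upper().endswith("_SCA"):
            commands.append(
                f"SETXCF FORCE,CONNECTION,STRNAME={name},CONNAME=ALL"
            )
    for name in structure_names:
        # Step 3c: delete every DB2 structure, including the SCA.
        commands.append(f"SETXCF FORCE,STRUCTURE,STRNAME={name}")
    return commands


if __name__ == "__main__":
    for command in force_commands(["GRP1_LOCK1", "GRP1_SCA", "GRP1_GBP0"]):
        print(command)
```

All connection commands are emitted before any structure command so that the output can be issued in order, following steps 3b and 3c.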

Appendix A. Items Needed at the Recovery Site

To ensure a fast and smooth recovery, it is essential to have all the items
needed to recover the SAP R/3 on DB2 for OS/390 environment at the recovery
site. Some of the items might be kept at the recovery site, while others might be
brought to the recovery site in the case of a disaster. Of course, the latter must
be kept or stored outside the primary site to ensure that they are not affected or
damaged during the disaster. Everything at the primary site must be considered
to be lost or damaged in the case of the disaster and cannot be part of a
disaster recovery plan.

A.1 Hardware
All hardware necessary to recover your SAP R/3 on DB2 for OS/390 environment
must be installed at the recovery site or must be brought to the recovery site in
the case of a disaster. This includes:
• Database server (S/390)
• Application servers (RS/6000, RS/6000 SP or PC Server)
• Network infrastructure (cabling, hubs etc.)
• Peripherals (tape drives, printers etc.)

The configuration of the recovery hardware depends on the configuration at your
primary site and must be planned carefully to ensure that all components meet
the requirements (such as performance and disk space) and are compatible.

A.2 Software
To restore the data of the SAP R/3 on DB2 for OS/390 environment you need
backups of all data categories:
• System
• Infrastructure
• Application

Which tapes you need depends on your backup and restore procedures. When you
use electronic vaulting, some of the data might already be on DASD. Based on
the architecture of DB2 for OS/390 and SAP R/3, you need the following backups:
• OS/390 environment
• DB2 for OS/390 environment
• SAP R/3 tablespaces
• Archive logs
• JCL to recover DB2 for OS/390 environment
• Operating system environment of the application servers (AIX or Windows
NT)
• SAP R/3 environment of the application servers
• Transport and Correction System
• Complementary software as needed

Besides the backups, which contain your vital data and therefore are not
replaceable, you should have installation media for the software you use at the
recovery site. There are several reasons why you might need installation media
at the recovery site:
• In the case of a disaster you might operate from the recovery site for a long
period. Within this time it might be necessary to install additional features.
• A backup tape might be damaged and you might be able to recover this data
with the installation media.
• There might have been changes in your primary environment or in the
recovery environment between the last test and the disaster that can be
fixed with the installation media.
• If your disaster recovery plan is to reinstall application servers, the
installation media for the operating system and for SAP R/3 are imperative.

The following installation media should be at the recovery site in the case of a
disaster:
• OS/390
• DB2 for OS/390
• AIX and/or Windows NT
• SAP R/3 on DB2 for OS/390 CDs
• Communication software
• Complementary software

Make sure that the installation media have the required software level and that
there is the appropriate hardware to read the media at the recovery site.
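
The two lists in this section lend themselves to a simple checklist that can be run during a disaster recovery test. The following Python sketch is an illustration only: the item names are copied from the lists above, while the function name and the sample inventory are hypothetical. Site-specific complementary software is left out because it differs for every installation.

```python
# Backups named in this section (complementary software is site-specific).
REQUIRED_BACKUPS = [
    "OS/390 environment",
    "DB2 for OS/390 environment",
    "SAP R/3 tablespaces",
    "Archive logs",
    "JCL to recover DB2 for OS/390 environment",
    "Operating system environment of the application servers",
    "SAP R/3 environment of the application servers",
    "Transport and Correction System",
]

# Installation media named in this section.
REQUIRED_MEDIA = [
    "OS/390",
    "DB2 for OS/390",
    "AIX and/or Windows NT",
    "SAP R/3 on DB2 for OS/390 CDs",
    "Communication software",
]


def missing_items(required, on_hand):
    """Return the required items that are not in the inventory (case-insensitive)."""
    have = {item.casefold() for item in on_hand}
    return [item for item in required if item.casefold() not in have]


if __name__ == "__main__":
    # Hypothetical inventory taken at the recovery site.
    inventory = ["OS/390", "db2 for os/390", "Communication software"]
    for item in missing_items(REQUIRED_MEDIA, inventory):
        print("Missing installation media:", item)
```

A check like this only confirms that an item is recorded as present. It does not confirm that the media are readable or at the required software level, which still has to be verified on the recovery hardware.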

A.3 Skills
In the event of a disaster, the assumption that some skilled personnel will
survive is key to the recovery effort. Depending on how your recovery site is
managed and operated, the availability of skilled personnel at the recovery site
might become a critical factor. Usually, personnel of the affected company
travel to the recovery site and start the recovery once they arrive. All travel
arrangements required during a disaster must be coordinated centrally, for
example by a travel agency.

When travelling becomes the critical path in the recovery time line, or when
additional skilled personnel are needed, consider changing the management and
operation of the recovery site or engaging platform-trained contractors from
outside sources.

In all cases it is imperative that the personnel are adequately trained to
recover the SAP R/3 on DB2 for OS/390 environment, especially when outside
sources are part of the recovery plan.

A.4 Plans and Manuals


To ensure a fast and smooth restart after a disaster, a well-documented
recovery plan for all components of the SAP R/3 on DB2 for OS/390 environment
is necessary. This plan must be kept off-site. When you use a tool to develop
and maintain your disaster recovery plan, make sure that it is printed and
shipped off-site after every change.



Besides the recovery plan, all procedures used for the normal operation must be
available at the recovery site. After a disaster it might be necessary to operate
from the recovery site for a long period, and normal operations must be done
from the recovery site.

For troubleshooting in a post-disaster scenario, it is also advisable to have the
user manuals and reference guides of all software components installed at the
recovery site. In an SAP R/3 on DB2 for OS/390 environment this includes:
• OS/390 manuals and guides
• DB2 for OS/390 manuals and guides
• SAP R/3 manuals and guides
• AIX and/or Windows NT manuals and guides
• Manuals and guides of complementary software

Appendix B. Sample Backup and Vaulting Plans

The entries in Table 3 support a data currency of 24 hours in a post-disaster
scenario assuming that it is possible to take a full copy of the database with
SHRLEVEL REFERENCE every night. The tapes are shipped to the recovery site
(manual vaulting).

Table 3. Vaulting Procedure 24 Hours Data Currency

Component                                       Backup Procedure              Backup and Vaulting Frequency              Generations of Tapes Kept at Recovery Site
S/390 operating system environment
AIX or Windows NT operating system environment  tape                          after changes (at least once every month)  at least two
SAP R/3 database                                FULL YES, SHRLEVEL REFERENCE  every night                                at least two
Archive log files                               dump on tape                  every night                                at least two
SAP R/3 environment on application servers      tape                          after changes (at least once every month)  at least two

The entries in Table 4 support a data currency of 30 minutes in a post-disaster
scenario assuming that it is not possible to take a full copy with SHRLEVEL
REFERENCE every night. The vaulting procedure in this example is shipping the
tapes to the recovery site (manual vaulting). For the archive logs, electronic
vaulting is used.

Table 4. Vaulting Procedure 30 Minutes Data Currency

Component                                       Backup Procedure              Backup and Vaulting Frequency              Generations of Tapes Kept at Recovery Site
S/390 operating system environment
AIX or Windows NT operating system environment  tape                          after changes (at least once every month)  at least two
SAP R/3 database                                FULL YES, SHRLEVEL REFERENCE  every Saturday                             at least two
SAP R/3 database                                FULL NO, SHRLEVEL CHANGE      every weekday                              at least two
Archive log files                               electronic transfer           every 30 minutes (electronic transfer)     5 days
SAP R/3 environment on application servers      tape                          after changes (at least once every month)  at least two
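
The data currency figures of the two sample plans follow from how often archive logs reach the recovery site. The following Python sketch makes the arithmetic explicit under simplifying assumptions that are not part of the plans: vaulted archive logs form an unbroken chain from the most recent full copy, electronic transfer is treated as instantaneous, and the class name, function name, and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class VaultedItem:
    kind: str               # "full_copy" or "archive_log"
    covers_until: datetime  # latest point in time contained in the item
    arrived: datetime       # when the item reached the recovery site


def recovery_point(items, disaster):
    """Return the latest point in time recoverable from vaulted items, or None."""
    usable = [i for i in items if i.arrived <= disaster]
    copies = [i for i in usable if i.kind == "full_copy"]
    if not copies:
        return None  # no recovery base at the recovery site
    base = max(copies, key=lambda i: i.covers_until)
    log_ends = [i.covers_until for i in usable
                if i.kind == "archive_log" and i.covers_until > base.covers_until]
    # Assumption: the vaulted logs form an unbroken chain from the base copy.
    return max(log_ends, default=base.covers_until)


# Illustrative schedule in the style of Table 4: one full copy, logs every 30 minutes.
start = datetime(1999, 10, 2, 0, 0)        # hypothetical Saturday full copy
disaster = datetime(1999, 10, 6, 10, 20)   # hypothetical time of the disaster
items = [VaultedItem("full_copy", start, start + timedelta(hours=12))]
t = start
while t + timedelta(minutes=30) <= disaster:
    t += timedelta(minutes=30)
    items.append(VaultedItem("archive_log", t, t))  # transfer assumed instantaneous
point = recovery_point(items, disaster)

if __name__ == "__main__":
    print("recovery point:", point)
    print("data lost:     ", disaster - point)
```

With nightly tape shipment instead of the 30-minute transfer, the same calculation yields a loss of up to 24 hours plus the transport time of the tapes, which is why manual vaulting alone cannot deliver the shorter currency.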

Appendix C. Special Notices

This redbook is directed at customers and analysts who need to plan and
implement disaster handling at sites where SAP R/3 is installed with DB2 on the
OS/390 platform as the database server. The redbook helps database
administrators, SAP basis consultants, and system programmers understand the
activities necessary for implementing a disaster recovery plan in such an
environment. The information in this publication is not intended as the
specification of any programming interfaces that are provided by any of the
products mentioned in this document. See the PUBLICATIONS section of the
appropriate IBM Programming Announcement for more information about what
publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not
imply that IBM intends to make these available in all countries in which IBM
operates. Any reference to an IBM product, program, or service is not intended
to state or imply that only IBM's product, program, or service may be used. Any
functionally equivalent program that does not infringe any of IBM's intellectual
property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment
specified, and is limited in application to those specific hardware and software
products and levels.

IBM may have patents or pending patent applications covering subject matter in
this document. The furnishing of this document does not give you any license to
these patents. You can send license inquiries, in writing, to the IBM Director of
Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785.

Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact IBM Corporation, Dept.
600A, Mail Drop 1329, Somers, NY 10589 USA.

Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.

The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The information about non-IBM
("vendor") products in this manual has been supplied by the vendor and IBM
assumes no responsibility for its accuracy or completeness. The use of this
information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate
them into the customer's operational environment. While each item may have
been reviewed by IBM for accuracy in a specific situation, there is no guarantee
that the same or similar results will be obtained elsewhere. Customers
attempting to adapt these techniques to their own environments do so at their
own risk.

Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of these
Web sites.



Reference to PTF numbers that have not been released through the normal
distribution process does not imply general availability. The purpose of
including these reference numbers is to alert IBM customers to specific
information relative to the implementation of the PTF when it becomes available
to each customer according to the normal IBM PTF distribution process.

The following terms are trademarks of the International Business Machines
Corporation in the United States and/or other countries:

AIX AS/400
DATABASE 2 DB2
DFSMS DFSMS/MVS
DFSMSdss ESCON
IBM Netfinity
OS/390 Parallel Sysplex
RAMAC RS/6000
S/390 Scalable POWERparallel Systems
SP SP1
SP2 Sysplex Timer
System/390 VTAM
VM/ESA

The following terms are trademarks of other companies:

C-bus is a trademark of Corollary, Inc.

Java and HotJava are trademarks of Sun Microsystems, Incorporated.

Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks
or registered trademarks of Microsoft Corporation.

PC Direct is a trademark of Ziff Communications Company and is used
by IBM Corporation under license.

Pentium, MMX, ProShare, LANDesk, and ActionMedia are trademarks or
registered trademarks of Intel Corporation in the U.S. and other
countries.

UNIX is a registered trademark in the United States and other
countries licensed exclusively through X/Open Company Limited.

SET and the SET logo are trademarks owned by SET Secure Electronic
Transaction LLC.

Other company, product, and service names may be trademarks or
service marks of others.



Appendix D. Related Publications

The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.

D.1 International Technical Support Organization Publications


For information on ordering these ITSO publications see “How to Get ITSO
Redbooks”.
• Implementing SAP R/3 in an OS/390 Environment Using AIX and Windows NT
Application Servers, SG24-4945
• Database Administration Experiences: SAP R/3 on DB2 for OS/390, SG24-2078
• High Availability Considerations: SAP R/3 on DB2 for OS/390, SG24-2003
• Fire in the Computer Room - What Now?, SG24-4211
• Implementing the Enterprise Storage Server in Your Environment, SG24-5420
• Implementing DFSMSdss SnapShot and Virtual Concurrent Copy, SG24-5268
• Planning for IBM Remote Copy, SG24-2595
• FICON Planning Guide, SG24-5445

D.2 Redbooks on CD-ROMs


Redbooks are also available on the following CD-ROMs. Click the CD-ROMs
button at http://www.redbooks.ibm.com/ for information about all the CD-ROMs
offered, updates and formats.

CD-ROM Title                                                     Collection Kit Number
System/390 Redbooks Collection                                   SK2T-2177
Networking and Systems Management Redbooks Collection            SK2T-6022
Transaction Processing and Data Management Redbooks Collection   SK2T-8038
Lotus Redbooks Collection                                        SK2T-8039
Tivoli Redbooks Collection                                       SK2T-8044
AS/400 Redbooks Collection                                       SK2T-2849
Netfinity Hardware and Software Redbooks Collection              SK2T-8046
RS/6000 Redbooks Collection (BkMgr Format)                       SK2T-8040
RS/6000 Redbooks Collection (PDF Format)                         SK2T-8043
Application Development Redbooks Collection                      SK2T-8037
IBM Enterprise Storage and Systems Management Solutions          SK3T-3694

D.3 Other Publications


These publications are also relevant as further information sources:

IBM Publications
• DB2 for OS/390 V5 Utility Guide and Reference, SC26-8967
• DB2 for OS/390 V5 Administration Guide, SC26-8957
• DB2 for OS/390 V5 Command Reference, SC26-8960
• DB2AUG V 1.3 DB2 Automated Utility Generator Reference Guide,
SC23-0591
• SAP R/3 on DB2 for OS/390: Planning Guide SAP R/3 Release
4.0B, SC33-7962
• SAP R/3 on DB2 for OS/390: Connectivity Guide, SC33-7965
• DB2 UDB for OS/390 V6 Administration Guide, SC26-9003
• Remote Copy Administrator's Guide and Reference, SC35-0169
• DB2 UDB for OS/390 V6 Utility Guide and Reference, SC26-9015
• DB2 UDB for OS/390 V6 Installation Guide, GC26-9008
• DB2 UDB for OS/390 Data Sharing: Planning and Administration,
SC26-9007
• DB2 for OS/390 V5 SQL Reference, SC26-8966
• DB2 UDB SQL Reference Vol 1 V6, SC09-2847
• DB2 UDB SQL Reference Vol 2 V6, SC09-2848
• White Papers - available at Web pages:
www.s390.ibm.com/marketing/gf225114.html
www.s390.ibm.com/marketing/gf225063.html
− Geographically Dispersed Parallel Sysplex: The S/390
Multi-site Application Availability Solution (Executive
Summary), GF22-5114
− Geographically Dispersed Parallel Sysplex: The S/390
Multi-site Application Availability Solution (Detail), GF22-5063

SAP Publications
• BC SAP Database Administration Guide: DB2 for OS/390, 51001015
• R/3 Installation on UNIX DB2 for OS/390, 51002659



How to Get ITSO Redbooks
This section explains how both customers and IBM employees can find out about ITSO redbooks, redpieces, and
CD-ROMs. A form for ordering books and CD-ROMs by fax or e-mail is also provided.
• Redbooks Web Site http://www.redbooks.ibm.com/
Search for, view, download, or order hardcopy or CD-ROM redbooks from the redbooks Web site. Also read
redpieces and download additional materials (code samples or diskette/CD-ROM images) from this redbooks
site.
Redpieces are redbooks in progress; not all redbooks become redpieces and sometimes just a few chapters
will be published this way. The intent is to get the information out much more quickly than the formal
publishing process allows.
• E-mail Orders
Send orders by e-mail including information from the redbook fax order form to:
   In United States           e-mail address: usib6fpl@ibmmail.com
   Outside North America      Contact information is in the "How to Order" section at this site:
                              http://www.elink.ibmlink.ibm.com/pbl/pbl/
• Telephone Orders
   United States (toll free)  1-800-879-2755
   Canada (toll free)         1-800-IBM-4YOU
   Outside North America      Country coordinator phone number is in the "How to Order" section at this site:
                              http://www.elink.ibmlink.ibm.com/pbl/pbl/
• Fax Orders
   United States (toll free)  1-800-445-9269
   Canada                     1-403-267-4455
   Outside North America      Fax phone number is in the "How to Order" section at this site:
                              http://www.elink.ibmlink.ibm.com/pbl/pbl/

This information was current at the time of publication, but is continually subject to change. The latest information
may be found at the redbooks Web site.

IBM Intranet for Employees


IBM employees may register for information on workshops, residencies, and redbooks by accessing the IBM
Intranet Web site at http://w3.itso.ibm.com/ and clicking the ITSO Mailing List button. Look in the Materials
repository for workshops, presentations, papers, and Web pages developed and written by the ITSO technical
professionals; click the Additional Materials button. Employees may access MyNews at http://w3.ibm.com/ for
redbook, residency, and workshop announcements.



IBM Redbook Fax Order Form
Please send me the following:

Title Order Number Quantity

First name Last name

Company

Address

City Postal code Country

Telephone number Telefax number VAT number

• Invoice to customer number

• Credit card number

Credit card expiration date Card issued to Signature

We accept American Express, Diners, Eurocard, Master Card, and Visa. Payment by credit card not
available in all countries. Signature mandatory for credit card payment.



List of Abbreviations
ADSM      automated data storage management
AG        Aktiengesellschaft (German for Joint-Stock Company)
AIX       advanced interactive executive (IBM's flavor of UNIX)
AM        Fixed Assets Management (SAP Application)
ASCII     American National Standard Code for Information Interchange
BIA       business impact analysis
BLKSIZE   block size
BSDS      bootstrap data set (DB2)
CD-ROM    (optically read) compact disk - read only memory
CF        coupling facility
CFCC      coupling facility control code
CFRM      coupling facility resource management
CMOS      complementary metal oxide semiconductor
CO        Controlling (SAP Application)
DASD      direct access storage device
DBA       database administrator
DBIF      database interface (SAP R/3)
DBMS      database management system
DBRM      database request module
DD        data definition
DDF       Distributed Data Facility (DB2)
EA        environment analysis
ESCON     enterprise systems connection (architecture, IBM System/390)
ESS       Enterprise Storage Server
FDDI      fiber distributed data interface (100 Mbps fiber optic LAN)
FI        Financial Accounting (SAP Application)
FICON     Fiber Connection (architecture, IBM System/390)
GbE       Gigabit Ethernet
GBP       group buffer pool (DB2)
HR        Human Resources (SAP Application)
I/O       input/output
IBM       International Business Machines Corporation
ICF       integrated catalog facility
ICMF      integrated coupling migration facility
IDCAMS    the program name for access method services (OP SYS)
IP        internet protocol (ISO)
IT        information technology
ITSO      International Technical Support Organization
JCL       job control language (MVS and VSE)
km        symbol for kilometer
LAN       local area network
LPAR      logically partitioned mode
LPL       logical page list (DB2)
LRSN      log record sequence number (DB2)
LUW       logical units of work
MB        megabyte, 1,000,000 bytes (1,048,576 bytes memory)
MM        Materials Management (SAP Application)
OC        Office and Communication (SAP Application)
PC        personal computer
PM        Plant Maintenance (SAP Application)
PP        Product Planning (SAP Application)
PPRC      Peer-to-Peer Remote Copy
PS        Project System (SAP Application)
PSSP      AIX Parallel System Support Programs (IBM program product for SP1 and SP2)
PTF       program temporary fix
QA        Quality Assurance (SAP Application)
RAID      Redundant Array of Independent Disks
RAMAC     RAID Architecture with Multi-Level Adaptive Cache
RBA       relative byte address
RRDF      Remote Recovery Data Facility (from E-Net Software)
RVA       RAMAC Virtual Array
SCA       Shared Communications Area (MVS/XCF Coupling Facility)
SD        Sales & Distribution (SAP Application)
SDM       System Data Mover (software supporting XRC)
SP        IBM RS/6000 Scalable POWERparallel Systems (RS/6000 SP)
SQL       structured query language
SVC       supervisor call (S/390 instruction)
TCP/IP    Transmission Control Protocol/Internet Protocol
UNIX      an operating system developed at Bell Laboratories (trademark of UNIX System Laboratories, licensed exclusively by X/Open Company, Ltd.)
VSAM      virtual storage access method (IBM)
XRC       extended remote copy


