Version 4.0 SP01
User Guide
July, 2012
Double-Take RecoverNow Version 4.0 SP01 User Guide
Copyright © Vision Solutions, Inc. 2003-2012
All rights reserved.
The information in this document is subject to change without notice and is furnished under a license agreement. This
document is proprietary to Vision Solutions, Inc., and may be used only as authorized in our license agreement. No portion of
this manual may be copied or otherwise reproduced without the express written consent of Vision Solutions, Inc.
Vision Solutions provides no express or implied warranty with this manual.
The following are trademarks or registered trademarks of their respective organizations or companies:
Vision Solutions is a registered trademark and ORION Solutions, Integrator, Director, Data Manager, Vision Suite,
ECS/400, OMS/400, ODS/400, SAM/400, Double-Take GeoCluster, Double-Take RecoverNow, Double-Take SHARE,
RecoverNow and iTERA HA are trademarks of Vision Solutions, Inc.
DB2, IBM, i5/OS, iSeries, System i, System i5, Informix, AIX 5L, System p, System x, System z, and WebSphere
are trademarks of International Business Machines Corporation.
HP-UX is a trademark of Hewlett-Packard Company.
Teradata is a trademark of Teradata Corporation.
Intel is a trademark of Intel Corporation.
Linux is a trademark of Linus Torvalds.
Oracle is a trademark of Oracle Corporation.
Sybase is a trademark of Sybase, Inc.
All other brands and product names are trademarks or registered trademarks of their respective owners.
If you need assistance, please contact the Vision Solutions SCP Certified CustomerCare team at:
CustomerCare
Vision Solutions, Inc.
Telephone: 1.800.337.8214 or 1.949.724.5465
Email: support@visionsolutions.com
Web Site: www.visionsolutions.com/Support/Contact-CustomerCare.aspx
Contents
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
See Also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
scconfig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
See Also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
scsetup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
scrt_ra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
See Also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
scrt_rc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Session restore targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Session Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Process Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
General Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Procedure Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
scrt_vfb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
sccfgd_cron_schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
sccfgd_putcfg. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
sccfgchk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
sztool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Introduction
The Double-Take RecoverNow User Guide describes how to install,
configure, maintain and administer Double-Take RecoverNow (hereafter
referred to as RecoverNow), data replication software. The table below
shows the chapters in the RecoverNow User Guide.
Chapter
Description
Overview
This chapter describes the organization and architecture of RecoverNow, a
continuous data protection system designed for immediate data recovery.
RecoverNow is a software only product for IBM servers running on IBM
AIX 5L, AIX 6.1, and AIX 7.1 operating systems.
RecoverNow provides these features in enterprise application environments
using standard systems hardware, commodity storage components, and
common networking infrastructure.
The RecoverNow data replication solution:
without impact to the production server. It also enables the trade-off of
storage capacity for time, which is at the core of the RecoverNow system
and the source of its advanced functionality.
The applications and associated files and volumes that you want to
protect.
Archiving systems.
Related Topics:
RecoverNow Datatap
The AIX Logical Volume Manager (LVM) maintains the hierarchy of
logical structures that manage disk storage. RecoverNow kernel extensions,
or datataps, reside logically above the LVM layer inside the operating
system kernel. Furthermore, these datataps are logically below the file
system level and handle block level transfers. The datatap receives a buf
structure from the file system layer in the case of a file system write
operation or from the application in the case of a raw Logical Volume write.
Data is then processed and sent onto the LVM (Logical Volume Manager)
layer. For read operations from storage, data passes through untouched.
The datatap is loaded on both the production server and the recovery server.
On the production server, the datatap is responsible for splitting data write
operations. Each write results in a write to the intended protected volume as
well as to a redo log.
RecoverNow Journal
Two specific structures are used to contain the redo and undo logs in the
RecoverNow architecture. The After Image Log File Container (AILFC)
and Before Image Log File Container (BILFC) are used to hold these logs.
The entire set of logs is known as a journal to RecoverNow, and on the
recovery server, associated redo logs together with undo logs form the
journal. The journal is often illustrated as a single pool, and these logs are
block storage devices that do not interact with resident file systems or their
cache buffers.
RecoverNow Agents
RecoverNow uses the following agents:
Archive Agent (AA): a primary agent that runs on the recovery server
Restore Agent (RA): a primary agent that runs on the recovery server
LCA Agent
Shipping logs from the production server to the recovery server is the
responsibility of the LCA. The LCA reads from the journal any redo log
information that has been closed, or sealed, and this information is then
shipped over one or more IP networks to an agent that runs on the recovery
server. Both agents bind and communicate over the same socket. Socket
port addresses can either be default addresses or they can be
programmatically selected.
ABA Agent
On the other side of the socket and running on the recovery server, the ABA
is collecting log information. The ABA receives the redo log information in
the time order it was created on the production server, and then stores this
information in recovery logs. Remember, these are block storage devices
that do not interact with resident file systems. As the ABA receives the data,
it dynamically creates optimized State Map Transactions (SMTX). The
blocks identified are then sorted in ascending device/block order. Block
ordering is a more efficient organization for applying modifications to the
replicated data, or replica, on the recovery server.
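As an illustration of that ordering, the sketch below re-sorts made-up "device block" pairs from arrival (time) order into ascending device/block order; the file path and entries are hypothetical:

```shell
# Hypothetical redo-log entries in the time order they arrived: "device block"
printf '%s\n' 'hdisk1 900' 'hdisk0 12' 'hdisk1 4' 'hdisk0 7' > /tmp/redo_blocks.txt

# Re-sort into ascending device/block order, the more efficient organization
# for applying modifications to the replica
sort -k1,1 -k2,2n /tmp/redo_blocks.txt
# prints:
#   hdisk0 7
#   hdisk0 12
#   hdisk1 4
#   hdisk1 900
```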
Before the modifications are applied to the replica, yet another block
storage device is written to with information that would allow the replica to
step backward in time. This storage device is called the undo log and
appears to be nothing more than a logical volume to the volume manager.
Once the undo log information is saved on disk, the redo log can be applied
to the replica to bring it up to date with the data on the production server.
AA Agent
The AA, or Archive Agent, also runs on the recovery server. It is used to
extend RecoverNow's rollback capabilities by recording redo and undo logs
to media. The AA currently works with Tivoli Storage Manager (TSM) and
uses the TSM API to send archive requests to TSM. When the logs are
archived, they are always spooled in pairs; depending on the TSM
configuration, the data is stored on media. A redo log and an undo log are
always together when the AA stores them on media. This gives RecoverNow the
ability to restore the data to any point in time: by unwinding the data with a
coarse-grained undo log and then applying fine-grained redo log information,
the state of the replicated data can be restored to any desired point.
RA Agent
Restoration is handled by the RA, which runs on the recovery server. It does
not, however, run continuously like the other agents; it can be
executed programmatically from the command line or through the GUI.
The RA deals with the following types of restore operations:
Virtual restores occur on the recovery server. Production restores are restores
in which all volumes defined in a context are rolled back together on the
production server.
Replication
RecoverNow runs automatically on a production server, creating a mirrored
copy of protected data on the recovery server. For increased availability, it is
recommended that the recovery server be a remote machine. The following
illustration shows storage data flow during RecoverNow replication:
Data does not pass through the datatap on the recovery server.
The ABA sweeps through the log files in time order and uses metadata
reads from the replica to calculate the change required to apply
the working log file, storing this information in the undo log. The ABA
then reads from the redo log and applies the modifications in block order to
the replica.
Journal Configuration
RecoverNow uses the following journals:
Production Journal
The production journal holds redo log buffers until the logs are transferred
to the recovery server. Then the logs are available to receive new
application write data. Sizing the journal properly prevents the recovery
server from falling so far behind the production server that dynamic
recovery must occur for the recovery server to catch up. If the journal is too
small, then transfers between the production server and the recovery server
are performed more frequently than is efficient. If the journal is too big,
then the recovery server may fall so far behind the production server that
dynamic recovery must occur.
The appropriate size of the production journal is proportional to the length
of network or recovery server downtime that RecoverNow can sustain
without falling into dynamic recovery, or the amount of data in write
throughput spikes that exceed system bandwidth that RecoverNow can
sustain without falling into dynamic recovery.
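A minimal sketch of that proportionality, with illustrative numbers rather than recommendations: the production journal must be able to absorb all write data for the duration of the outage or spike it is meant to ride out.

```shell
# Illustrative sizing: the journal must buffer every write during an outage
avg_throughput_mb_s=5     # average write throughput in MB/s (your estimate)
outage_minutes=30         # network/recovery-server downtime to survive

journal_mb=$(( avg_throughput_mb_s * outage_minutes * 60 ))
echo "production journal >= ${journal_mb} MB"   # 5 * 1800 = 9000 MB
```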
Recovery Journal
The recovery journal is on the recovery server, and holds the redo and undo
logs that act as RecoverNow's internal rollback window. If you are using
external archive media such as tape, then the size of the journal on the
recovery server is not critical to the ability to restore data. The larger the
recovery journal, the larger the internal rollback window, which implies
faster access to redo and undo logs during production restores in that
window.
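Conversely, a given recovery journal size implies a rollback window. Since half the logs in the recovery journal are redo logs and half are undo logs, a rough sketch (illustrative numbers only) is half the journal divided by the average write throughput:

```shell
# Half the recovery journal holds redo logs, half undo logs, so the
# on-disk rollback window is roughly the redo half over average throughput
recovery_journal_mb=12800
avg_throughput_mb_s=5

redo_mb=$(( recovery_journal_mb / 2 ))
window_s=$(( redo_mb / avg_throughput_mb_s ))
echo "internal rollback window ~ $(( window_s / 60 )) minutes"  # ~21 minutes
```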
Maximum log size is one-half of the available RAM, but not greater than
512 MB.
To calculate log size, you need an estimate of average write throughput, and
the required processing rate. For the required processing rate, if
RecoverNow processes one log every 60 seconds, the replica will be one
minute behind the production system.
Number of Logs
number of logs = (journal size) / (log size)
Even though the calculation for the number of log files appears trivial, keep
in mind that the number of log files can affect performance. If enough log
files are available on the production server, RecoverNow does not have to
rely on state maps during an outage, because it has not run out of log files to
take in data. A state map contains information about data changes for each
storage device protected by RecoverNow. It can be used to reconstruct data
changes if the underlying data is corrupted or lost. During peak usage, when
an application is writing data faster than the network can transmit, extra log
files enable the system to buffer during these peak periods without having to
rely on state maps, eliminating the risk of a restore blackout window. On the
recovery server, a sufficient number of log files allows activity to be
buffered in the event that the tape drive or library is taken offline.
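Putting those rules together, the arithmetic can be sketched as follows; all values here are illustrative assumptions, and the one-log-per-interval estimate simply restates the replica-lag example above:

```shell
# Illustrative log (LFC) sizing: one log of writes per processing interval,
# capped at half of RAM and at 512 MB
ram_mb=2048
avg_throughput_mb_s=4
interval_s=60                  # replica lag target: one log per minute

log_mb=$(( avg_throughput_mb_s * interval_s ))   # writes per interval, MB
cap_mb=$(( ram_mb / 2 ))
if [ "$cap_mb" -gt 512 ]; then cap_mb=512; fi
if [ "$log_mb" -gt "$cap_mb" ]; then log_mb=$cap_mb; fi

journal_mb=9000
echo "log size: ${log_mb} MB, number of logs: $(( journal_mb / log_mb ))"
```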
RecoverNow Snapshots
Snapshots use significantly less space and are more efficient than data
mirrors. A mirror is an up-to-date copy of data for a logical volume. Two or
more complete copies can exist at the same time, although only one copy is
seen or used by an application. Mirrors therefore require at least double
the disk space of the original data.
A snapshot is a view of data at a specific point in time, much like a
photograph captures a physical scene at a particular moment. You can use
snapshots to validate data before you save it to permanent
storage, data mine and generate reports, and retrieve specific data items.
Snapshots are stored in a different location than the replica so that the
replica can continue to march along in time. The snapshot, however, is
frozen with respect to the replica. Again, using the analogy of a photograph,
you can now draw on the photograph and it does not affect the original
subject of the photograph. The ability to modify the snapshot is
accomplished by using a copy-on-write log file.
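The copy-on-write rules can be sketched with plain files; the directory layout and block names below are made up for illustration and are not the real on-disk format:

```shell
# Sketch of copy-on-write: writes aimed at the snapshot land in a COW area,
# and reads prefer the COW copy when one exists
rm -rf /tmp/cow_demo
origin=/tmp/cow_demo/origin; cow=/tmp/cow_demo/cow
mkdir -p "$origin" "$cow"
printf 'v1' > "$origin/block0"      # replica data frozen at snapshot time

# Writes to the snapshot go to the COW log, never to the origin
snap_write() { printf '%s' "$2" > "$cow/$1"; }
# Reads come from the COW log if the block was modified, else fall through
snap_read() { if [ -f "$cow/$1" ]; then cat "$cow/$1"; else cat "$origin/$1"; fi; }

snap_read block0                    # prints v1 (unmodified: from snapshot)
snap_write block0 'v2'
snap_read block0                    # prints v2 (modified: from COW log)
cat "$origin/block0"                # prints v1 (original data unaffected)
```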
Notice from the above figure that data is passing through the datatap on the
recovery server in the case of reads and writes to snapshot data.
RecoverNow uses a different set of device minor numbers when dealing
with snapshots, so that the datatap knows which log files to access in a
specific order. For example, when a write operation is directed at the
snapshot it is actually written to the copy-on-write (COW) log instead. If
the data has not been modified, then a read operation would come from the
snapshot. If the data has been modified, then the read would come from the
copy-on-write log. Keep in mind that the snapshot is the representation of
the application data at a specific point in time.
Related Topics:
Recovery
Generally, there are two types of recovery restorations. A production restore
is a rollback in time which takes place in the protected volumes on the
production server. The other type of restore, a virtual restore, is a rollback
in time which is executed over a read-writable virtual image of the protected
volumes which reside on the recovery server.
For a production restore, RecoverNow must have exclusive I/O access to
the protected volumes. The application must be stopped, and the file
systems must be unmounted. RecoverNow is the only process that should
be allowed to write into the protected volumes during a production restore.
The control over the protected volumes and the information stored by the
RecoverNow process allow data corruption to be undone faster than it
occurred.
Production restores are useful for a database crash where the database
will not come up. By recovering an image of the actual production database
to some point in the past directly on the production disk itself, RecoverNow
can roll back a crashed database in minutes rather than hours or days for the
most disastrous operational situation a database can encounter.
In contrast, a virtual restore is useful for database repair. In this case, an
image of the database is rolled back to some point in the past on the
snapshot which resides on the recovery server. Select pieces of the data can
then be extracted and copied into the production database.
Related Topics:
a single, unified view. VSP also provides services and portlets for
performing activities common to products.
Any Vision Solutions product that provides a portal application at a
version compatible with VSP can be used. A portal application includes the
graphical interface and supporting functionality for the product's use within
VSP.
VSP is quick to configure and easy to customize. Portal connections define
how VSP connects to nodes in your enterprise. When configuring an
instance of a product, you identify the portal connections the instance will
use to connect to, retrieve data from, and perform actions for the product.
Use this chapter to prepare your RecoverNow system for its initial
configuration.
This chapter contains:
The production journal is the storage that contains all of the logs. A single
log is transferred to the recovery server when that log is filled. For example,
if each LFC is 64MB and there are 100 production LFCs, then the
production journal is 6400MB. When the current LFC is filled with
approximately 64MB of write I/O data (there is some additional metadata),
it will be transferred to the recovery server.
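That worked example reduces to a one-line calculation:

```shell
# Production journal size = number of LFCs x LFC size (values from the example)
lfc_size_mb=64
lfc_count=100
echo "production journal: $(( lfc_count * lfc_size_mb )) MB"   # 6400 MB
```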
All logs in the production journal are redo logs. They contain information
that moves the disk image of the application forward through time when the
information is applied. This is called rolling forward.
Half of the logs in the recovery journal are redo logs, and half are undo logs.
Undo logs contain information that moves a disk image of the application
back through time when they are applied. This is called rolling back.
The recovery server also contains the snapshot journal. The snapshot
journal is the space on the recovery server where RecoverNow stores
copy-on-write information and write-cache data for snapshots.
The following table shows the variables that are used for estimating journal
sizes and log sizes. You need these estimates in order to configure
RecoverNow:
Concept
Meaning
Throughput
Average throughput
Peak throughput
Peak duration
Bandwidth
Use a tool such as iostat to estimate throughput. You can also use
the Sizing tool to estimate throughput. For more information, refer
to Using the Sizing Tool to Calculate LFC Size on page 43.
The goal of sizing the production journal properly is to prevent the recovery
server from falling so far behind the production server that dynamic
recovery must occur for the recovery server to catch up. If the production
journal is too small, then transfers between the production server and the
recovery server are performed more frequently than is efficient.
The appropriate size of the production journal is proportional to one of the
following:
RecoverNow must not fall into dynamic recovery when write spikes
exceed bandwidth.
You can also use the Sizing tool to calculate write journal pool
size. For more information, refer to Using the Sizing Tool to
Calculate LFC Size on page 43.
NOTE
Use the following equation to ensure that the space you allocate
for LFCs coincides with the physical partition size of the VG
where the LFC LVs are allocated. This enables you to utilize all
the space in an LV. This is not a requirement; you can elect not to
utilize all the available space in an LV.
y = (number of LFCs / number of LFC LVs) * LFC size
where y should be evenly divisible by the physical partition size of the VG
where the LFC LVs are allocated.
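The divisibility check in the note above can be scripted; the numbers here are illustrative, so substitute your own configuration values:

```shell
# Check that LFC space lines up with the VG physical partition size
num_lfcs=100
num_lfc_lvs=4
lfc_size_mb=64
pp_size_mb=32      # physical partition size of the VG holding the LFC LVs

y=$(( num_lfcs / num_lfc_lvs * lfc_size_mb ))    # MB per LFC LV
if [ $(( y % pp_size_mb )) -eq 0 ]; then
    echo "y=${y} MB uses every partition of the LV"
else
    echo "y=${y} MB leaves part of the last partition unused"
fi
```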
Description
End_of_day
DB_Migration
DB Migration completed
If rtstop fails for a Context ID, that Context ID is marked failed, and
rtstop then continues with any remaining Context IDs. After all the Context
IDs are processed, the exit status is set to 0, or to a value equal to the
number of Context IDs that were marked as failed. The reason for failure
is recorded in the /usr/scrt/log/rn_shutdown.out file. A non-zero exit
aborts the AIX shutdown.
To have /usr/scrt/bin/rn_shutdown execute when the AIX shutdown
command is executed, it must be called from /etc/rc.shutdown.
For example, add the following to /etc/rc.shutdown:
#!/bin/ksh
/usr/scrt/bin/rn_shutdown
if (( $? != 0 ))
then
    printf "ERROR: rn_shutdown failed, aborting.\n"
    exit 1
fi
Customer Information
Is DNS enabled?
Is Network Information Service
(NIS) mounted?
Is NIS exported?
Release:
Policy domain
Policy set
Management class
Backup copy
Archive copy
Storage pools:
Disk
Tape
Policy domain:
Type
Number
Shared?
Running the Sizing Tool from the RecoverNow Sizing Tool GUI on
page 44
The RecoverNow Sizing Tool GUI window displays. The first tab,
Introduction, displays by default.
There are four RecoverNow Sizing Tool GUI tabs:
Introduction Tab
The Introduction page describes how you use the sizing tool. For detailed
information, click Help. This button displays the URL to access the Vision
Solutions Support web site. From this site you can download documentation
that describes how you use the sizing tool. In addition, you are provided
with CustomerCare support email and phone numbers. Click Exit to exit the
RecoverNow Sizing Tool GUI.
To select individual LVs, use the check box next to each LV to choose
the LVs that RecoverNow will protect.
Click the Run Disk Discovery Again button to re-discover the LVs.
The table below describes the parameters that you can modify:
46
Parameter
Description
Collection Interval Count
Collection Interval Minutes
Lfc Size (MB)
Specifies the size for the RecoverNow LFC. The default value
is 16 MB.
Replication Outage Hours
Specifies the hours that the production server cannot send LFCs
to the replicated server. When this occurs, the LFCs will begin
to back up on the production server until there are no more LFCs
available. Once RecoverNow runs out of LFCs, it marks the
regions which require synchronization in the state map as dirty.
These dirty regions will automatically be synchronized when
the LFCs become available. CDP functionality will resume as
soon as the resynchronization completes. More LFCs are
required as outage time increases. The default value is 8 hours.
CDP Window Hours
Snapshot Duration Hours
The Run button becomes active when you select one or more LVs and
specify values for the LV parameters.
NOTE
Before you click the Run button, start your application on the
selected LVs, and ensure that your application has a heavy load so
that the tool collects enough data to reflect the activity for a
worst-case scenario.
Detailed logs from latest run: This section shows a scrollable text area
containing detailed statistics from the sztool script sizing log file. The log
file name is /tmp/sztool/sztool.log. Click the Display Log button to
display the results derived from the original log file. The columns show:
Logical Volume
IO Count
Kb read
Kb written
Kbps (kilobits per second)
Try different parameters to get results from the already collected data.
You can edit the parameters shown below to see different log file
results.
Lfc Size (MB) Low. Refer to Lfc Size (MB) on page 47.
CDP Window Hours. Refer to CDP Window Hours on page 47.
Replication Outage Hours. Refer to Replication Outage Hours on
page 47.
Snapshot Duration Hours. Refer to Snapshot Duration Hours on
page 47.
To see different log file results:
1. Change the LFC size and CDP Window Hours.
2. Click the Show results with above parameters button.
The sztool script executes against the collected data to display log
file information in the Results section and the Detailed logs from
last run section.
NOTE
Before you run the Sizing tool you must have performed the
installation steps described in Installing the Sizing Tool on
page 44.
1. To start the RecoverNow Sizing Tool, type
/usr/scrt/sztool/sztool from the command line. The tool
automatically creates a working directory /tmp/sztool, a config file
/tmp/sztool/sztool.cfg and a diskinfo file /tmp/sztool/diskinfo.
The diskinfo file contains a list of all LVs on the system. The user
selects which LVs should be protected by RecoverNow. The working
directory is for storing the config file, log file and tmp files. The
configuration file is for the user to specify LV names and other run-time
parameters.
2. Review the diskinfo file and determine which LVs RecoverNow should
protect.
3. Modify the parameters in the sztool.cfg configuration file shown
below:
LVs_1=testloglv testlv
LVs_2=
LVs_3=
LVs_4=
LVs_5=
LVs_6=
LVs_7=
LVs_8=
LVs_9=
LVs_10=
Collection_Interval_Count=24
Collection_Interval_Minutes=60
Lfc_Size_MB=16
Replication_Outage_Hours=8
CDP_Window_Hours=8
Snapshot_Duration_Hours=8
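The same parameters can be changed non-interactively, for example with sed. The sketch below works on a scratch copy; the real file is /tmp/sztool/sztool.cfg:

```shell
# Make a scratch copy of the config and raise the outage allowance to 24 hours
cfg=/tmp/sztool_demo.cfg
printf '%s\n' 'Lfc_Size_MB=16' 'Replication_Outage_Hours=8' > "$cfg"

sed 's/^Replication_Outage_Hours=.*/Replication_Outage_Hours=24/' "$cfg" > "$cfg.new"
mv "$cfg.new" "$cfg"
grep '^Replication_Outage_Hours=' "$cfg"   # prints Replication_Outage_Hours=24
```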
The table below describes the configuration file parameters that you can
modify:
Parameter
Description
LVs_1=
LVs_2=
.....
LVs_10=
Collection_Interval_Count
Collection_Interval_Minutes
Lfc_Size_MB
Parameter
Description
Replication_Outage_Hours
CDP_Window_Hours
Snapshot_Duration_Hours
4. Start your business application on the selected LVs. The load of the
business application should be as close to the worst-case scenario as
possible to ensure a meaningful result.
5. Type /usr/scrt/sztool/sztool from the command line to restart
sztool. sztool runs in the background, and you can check the results
in the /tmp/sztool/sztool.log file. Check the file to be sure the
tool is running; one collection interval is required before LV I/O
data is written to the log. Check the log after the last collection
interval count for the final result. It is safe to log out from the
terminal because sztool uses nohup. The process will take 24 hours
with the default collection interval count of 24 and collection
interval of 60 minutes.
6. When the tool completes, check the log file or the AIX window. At the
bottom of the log file or AIX window, the "<<---------<" lines
indicate the production and recovery server number of LFCs, and the
percentage of Write Journal (WJ). The log file also contains detailed
I/O statistics for each LV for each data collection interval.
The standard log file is called sztool.log. An additional copy of the log
file is also created, named sztool.log-MM_DD_YYYY-HH_MM_SS. For
example: sztool.log-02_19_2010-14_22_19, where HH uses the 24-hour
clock.
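A name in that timestamped style can be generated and sanity-checked with date and grep; this is only a sketch of the naming pattern shown in the example above:

```shell
# Build a name in the sztool.log-MM_DD_YYYY-HH_MM_SS style
name="sztool.log-$(date +%m_%d_%Y-%H_%M_%S)"
echo "$name"

# Verify the shape (compare: sztool.log-02_19_2010-14_22_19)
echo "$name" | grep -Eq '^sztool\.log-[0-9]{2}_[0-9]{2}_[0-9]{4}-[0-9]{2}_[0-9]{2}_[0-9]{2}$' \
    && echo "format ok"
```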
Description
sztool
If issued for the very first time, the working directory, diskinfo
file and sztool.cfg file are generated. You should review the
diskinfo file and then modify sztool.cfg accordingly. You can
then re-run sztool.
sztool -c
sztool -d
sztool -g
sztool -h
sztool -l
When the log file is created, this command prints out the
calculation results for different LFC sizes based on the existing
log file. For example, sztool -l32 prints out the results when the
LFC size is 32 MB, and sztool -l16 -l512 prints out all the
calculation results from 16 MB to 512 MB. You cannot have
spaces between -l and the LFC size number. Output is to the screen
only; there is no delay or sleep.
sztool -r
sztool -s
sztool -x
Executes the sztool and prints the file name and line number of
the statement for debugging purposes. For debugging, use
sztool_main -x to view screen output.
sztool -v
Supported Configurations
The figures in this section illustrate the supported configurations: a
production server replicating to a recovery server over a LAN; a
production server replicating to a remote recovery server over a WAN;
and a production server replicating to a local recovery server over a
LAN, with a replicated server reached over a WAN.
Installation Procedures
Overview
This chapter describes RecoverNow, RecoverNow Portal Application and
Vision Solutions Portal (VSP) installation procedures.
Before you begin, review the support information and system
requirements, and decide on your preferred configuration. Once you have
installed the RecoverNow components, you can work with VSP. Refer to
Logging in to Vision Solutions Portal on page 127.
The calculation for the undo and redo logs is based on the required
recovery window and the network outage protection size. Refer to
Determining Storage Requirements on page 34. If you use the
snapshot journal, ensure that you take into account its size. Refer to
Guidelines for Snapshot Journal Size on page 33.
The required disk space to install VSP and the RecoverNow portal
application is 280 MB in /opt.
Base RecoverNow
You must install the Base RecoverNow software on each AIX cluster
node directly.
The scrt user and group are created for RecoverNow.
Sizing Tool
You can use the Sizing Tool to calculate configuration values before
RecoverNow is installed. However, it is also useful to run the tool after
RecoverNow is installed to determine if the number of LFCs or WJ
percentage needs to be adjusted. Refer to Using the Sizing Tool to
Calculate LFC Size on page 43.
Documentation
Documentation is available as .pdf files.
Either ssh and scp or rexec and rcp must be allowed. If ssh fails,
rexec and rcp are used.
To use rcp, the ~root/.rhosts file must contain the local host and user name.
Check /etc/services to find the ports used by exec and shell, and check
that those ports are not blocked.
This also requires that the echo port is not blocked. It is usually defined
as port 7 in /etc/services.
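For example, you can confirm the port numbers by filtering /etc/services. The sketch below parses an inline sample of typical entries so it is self-contained; on a live system you would grep the real file instead:

```shell
# Typical /etc/services entries; on a live system use:
#   grep -wE 'exec|shell|echo' /etc/services
services='echo   7/tcp
exec   512/tcp
shell  514/tcp'

# Print each service name with its assigned port
echo "$services" | awk '{print $1, $2}'
```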
NOTE
If you exit the installation wizard, you can view the detailed errors
in the RecoverNow_4.n.n.n_Install.log file.
10. On the License Key Locations screen, specify or browse to the location
of the license key file for RecoverNow obtained from Vision Solutions
for each node.
11. Click Next.
12. On the License Key Check screen, click Next to display the Summary
screen.
13. To specify another node click Previous. To cancel, click Cancel.
AIX Installation
This section describes:
3. Click Next.
The Terms And Conditions screen displays. Read and accept the terms
of the License Agreement.
4. Click Next.
The Select Product screen displays.
5. Select either:
Double-Take RecoverNow and Vision Solutions Portal
(recommended): Installs Double-Take RecoverNow, the portal
application, and optionally, the Vision Solutions Portal on each node.
Proceed to step 6 on page 66.
Vision Solutions Portal and the Double-Take RecoverNow portal
application: Installs only the portal application and VSP, if
necessary. Proceed to Install only the Vision Solutions Portal and
the Double-Take RecoverNow Portal Application on page 77.
6. Select Double-Take RecoverNow and Vision Solutions Portal
(recommended).
7. Click Next.
The Install or Upgrade screen displays.
10. Enter the node name: Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
11. Click Test Connection: A dialog displays indicating whether or not the
connection to the node was successful.
12. Click Next.
The Node Login screen displays.
18. Enter the node name or IP address, and click Add to have RecoverNow
software installed on this node.
19. Click Next.
The Node Login screen displays.
23. Select the node for which the documentation will be installed.
24. Click Next.
The License Key Location screen displays.
without license keys but cannot use it until valid license keys are
applied. Proceed to step 26 on page 74.
Select Contact Vision Solutions to get new license keys and
click Next; the Contact Vision Solutions screen displays. Use one
of the following methods to procure license keys.
On the Internet: Log in to your account at:
VisionSolutions.com/SupportCentral
Email: Copy and paste the information on the Contact Vision
Solutions screen into your email message. When you contact
Customer Accounting to request a license, you will be asked
to provide the machine ID (uname -m) of your servers along
with the hostname and your OS. Email your information to
Customer Accounting at support@visionsolutions.com and
request a license key. A license file will be generated and
emailed to you.
Telephone: (800) 337-8214
NOTE
Once you procure the license keys from Vision Solutions, click
Next on the License Key Location screen to continue the installation.
Proceed to step 26 on page 74.
26. Click Next.
4. Enter the node name: Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
5. Click Test Connection: A dialog displays indicating whether or not the
connection to the node was successful.
6. Click Next.
The Node Login screen displays.
11. Enter the node name or IP address, and click Add to have RecoverNow
software installed on this node.
12. Click Next.
The Node Login screen displays.
16. After you have installed the RecoverNow Portal Application and the
Vision Solutions Portal (VSP), you can log into VSP. Select one
of the highlighted nodes to launch VSP and log in. See Logging
in to Vision Solutions Portal on page 127.
17. Click Done to exit the installation wizard.
User Roles
The installation process creates the scrt group in /etc/group, identifying the
category of users allowed to access the portal application.
IMPORTANT
Reinstall RecoverNow
Before you reinstall RecoverNow:
You must stop your application and RecoverNow on the node(s) where
RecoverNow is being reinstalled.
To reinstall RecoverNow:
1. In the section, Install Double-Take RecoverNow, the Vision Solutions
Portal and the Double-Take RecoverNow Portal Application on
page 64, perform step 1 through step 6.
2. From the Select Product screen, select Double-Take RecoverNow and
Vision Solutions Portal (recommended) and click Next.
The Install or Upgrade screen displays.
3. Click Next.
The Specify Node screen displays.
4. Enter the node name: Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
5. Click Test Connection: A dialog displays indicating whether or not the
connection to the node was successful.
6. Click Next.
The Node Login screen displays.
17. You must manually ensure that RecoverNow is shut down; the wizard
does not perform this task. Click Next.
Once the installation wizard has verified that RecoverNow has been
shut down, the Shutdown Verification Complete screen displays.
20. Click Next to reinstall the portal application and the Vision Solutions
portal.
The Shut Down Vision Solutions Portal screen displays.
21. Select Yes to shut down the Vision Solutions Portal and install the
Vision Solutions Portal and the portal application.
NOTE
You can also decide to skip the Vision Solutions Portal and the
portal application reinstall.
The Installing Vision Solutions Portal screen displays.
When the Vision Solutions Portal and the portal application reinstall
completes, a screen briefly appears stating that the reinstall was a success.
Then the Installation Complete screen displays.
22. After you have reinstalled RecoverNow, you can log into VSP. Select
one of the highlighted nodes to launch VSP and log in. See
Logging in to Vision Solutions Portal on page 127.
23. Click Done to exit the installation wizard.
Upgrade RecoverNow
Before upgrading RecoverNow to the current version:
You must stop your application and RecoverNow on the node(s) being
upgraded.
3. Click Next.
The Specify Node screen displays.
4. Enter the node name: Points the installer to the target node where you
want to install RecoverNow. Alternatively, you may enter an IP address.
5. Click Test Connection: A dialog displays indicating whether or not the
connection to the node was successful.
6. Click Next.
The Node Login screen displays.
15. Select the node for which the documentation will be installed.
16. Click Next.
The License Key Options screen displays.
18. You must manually ensure that RecoverNow is shut down; the wizard
does not perform this task. Click Next.
Once the installation wizard has verified that RecoverNow has been
shut down, the Shutdown Verification Complete screen displays.
After you have upgraded the RecoverNow Portal Application, you can log
into VSP. See Logging in to Vision Solutions Portal on page 127.
53
61
For example: cd 53
6. Choose the directory path for a 32 or 64 bit kernel:
esFiles/52/32/
esFiles/52/64/
esFiles/53/32/
esFiles/53/64/
esFiles/61/64/
esFiles/71/64/
7. Enter the following command:
smit
11. Select the current directory as the INPUT device/directory and enter a
dot (.).
User Roles
The installation process creates the scrt group in /etc/group, identifying the
category of users allowed to access the portal application.
IMPORTANT
Log Files
There are two types of log files:
If you receive the License validation failed message, ensure that the
information specified in license.inf is correct and that the output of the
hostname command matches the hostname in license.inf. If the problem
persists, email or contact Customer Support. Refer to the readme.txt file for
contact information.
License Expiration
When the license expires for RecoverNow, the application on the
production server is not affected. However, data replication to the recovery
server will be stopped, and you will no longer be able to use the Continuous
Data Protection functionality. You can check the license file for the data
replication component for information about the expiration of the license.
The file is named: /usr/scrt/run/node_license.properties.
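A minimal sketch for checking the expiration, assuming only the file path given above (the property names inside the file are not documented here, so the sketch simply displays the file):

```shell
# Display the data-replication license details, including expiration
LIC=/usr/scrt/run/node_license.properties
if [ -f "$LIC" ]; then
    cat "$LIC"
else
    echo "License file not found: $LIC" >&2
fi
```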
4. Click Next.
5. Enter the node name: Points the installer to the target node where you
want to uninstall RecoverNow.
6. Click Next.
The Specify Nodes screen displays.
7. Enter the node name or IP address, and click Add to have RecoverNow
software uninstalled on this node.
8. Click Next.
11. Select the components you want to uninstall from each node and click
Next.
12. You must manually ensure that RecoverNow is shut down; the wizard
does not perform this task. Click Next.
Once the uninstall wizard has verified that RecoverNow has been
shut down, the Shutdown Verification Complete screen displays.
1. Enter smit
The System Management screen displays.
Install the Vision Solutions Portal and Double-Take RecoverNow Portal Application on Windows
8. Click Next.
The Terms And Conditions screen displays. Read and accept the terms
of the License Agreement.
9. Click Next.
The Checking Vision Solutions Portal screen displays.
10. Once the VSP and portal application status is verified, the Installation
Options screen displays.
11. You can decide to start the portal server when Windows starts and after
the installation completes.
12. Click Next.
15. After you have installed the RecoverNow portal application, you can
log into VSP. Select the highlighted machine-name address to
launch VSP and log in. See Logging in to Vision Solutions
Portal on page 127.
16. Click Done to exit the installation wizard.
Uninstall the Vision Solutions Portal and Double-Take RecoverNow Portal Application Uninstallation Wizard on Windows
3. Click Next.
4. Once the VSP and portal application status is verified, the Options
screen displays.
6. When the shut down of VSP completes, the Ready to Uninstall screen
displays.
7. Click Next.
The Preparing to Uninstall screen displays.
Post-Installation Tasks
These sections contain the post-installation tasks:
After you have logged in, the portal opens to the Home page. A default
portal connection exists for the node on which you logged in.
For detailed information on working with VSP, refer to Getting Started with
Vision Solutions Portal (2.0.00.07_VSP_Getting_Started.pdf)
packaged with RecoverNow for AIX.
After you install RecoverNow, VSP, and the portal application, you can
use the RecoverNow Replication Group wizard or the command line to
configure new replication groups and to change, rename, and delete existing
ones.
This chapter contains:
2. Click Configuration.
The Replication Group Configuration window displays.
3. Click New.
This starts the Replication Group Configuration wizard and the New
Replication Group Servers panel displays.
Field
Description
Servers: Section for specifying the host name or IP address for the servers in
this replication group.
Production
Recovery
Options are:
Select the portal connection
Specify, and enter the host name or IP address for the
server in the recovery role.
Recovery host
name or IP
address
Replicated
(optional)
Options are:
Replicated host
name or IP
address
Failover server
None
Select the portal connection
Specify, and enter the host name or IP address for the
server in the replicated role.
4. Click Next.
The New Replication Group Servers panel displays. Use this panel to
log into the failover server specified in the previous panel. If you have
not already logged in to all of the nodes, this panel displays.
5. Specify the username and password and click Log In. Log in to each
server to retrieve information.
NOTE
The user must be either root or a user in the scrt group.
Passwords cannot be blank.
6. Click Log In.
The New Replication Group Servers panel displays.
The New Replication Group login panel contains the following fields:
Field
Description
Log in Status
Server
User
Password
Log In
Status
7. Click Next.
The New Replication Group Names panel contains the following fields:
Field
Description
Replication
group name
Primary context
ID
Failover context
ID
8. Click Next.
9. Click Add.
10. The Add Logical Volumes dialog displays. For detailed information
refer to the RecoverNow online help.
Description
Logical Volume
Volume Group - Production
Volume Group - Recovery
Column
Description
Size (GB)
Type
Mount Point
FS Log
12. Click Next on the New Replication Group Logical Volumes panel.
The New Replication Group Replication IP Addresses panel displays.
Use this panel to specify IP labels or addresses that will be used
specifically for replication. By default, replication uses the IP addresses
of the servers. There are two options:
If you select Use server IP addresses for replication, the New
Replication Group Replication IP Addresses panel displays in the
following format.
Field
Description
Replication IP
Addresses
Production
Server - IP
address
Field
Description
Recovery Server
- Host name
Recovery Server
- IP address
Field
Description
Server
Number of
containers
Number of
containers on
replicated
Size of each
container
Total size
Default volume
group
Logical volumes
Use alternate
volume groups
or physical
volumes for
replication
containers
Description
Server
Total container
size
Displays the total space (in MB) required for the containers
on the server.
Volume Group
Add
Use this button to add the volume group to the list; the
physical volume defaults to Any.
Volume Group
Field
Description
Physical Volume
Remove
Field
Description
Use compression
Send partially
filled containers
automatically
Field
Description
Frequency to
check
Minimum
percent filled
The New Replication Group Snapshot Buffer panel contains the following
fields:
Field
Description
Snapshot
Buffers - Size
Warning
threshold
Location
Production
Replicated
The New Replication Group Tivoli Storage Manager panel contains the
following fields:
Field
Description
Enable Tivoli
Storage Manager
(TSM)
Specify the user ID for TSM to use to log into the server
where the TSM client is running. Enabled only when Enable
Tivoli Storage Manager (TSM) is checked.
Specify the password for TSM to use to log into the server
where the TSM client is running. Enabled only when Enable
Tivoli Storage Manager (TSM) is checked.
TSM server
The content of this panel is the same as the Configuration Summary section
in the Replication Group Configuration window. Refer to the RecoverNow
online help for additional information for the Replication Group
Configuration window. Refer to Replication Group Configuration
Window on page 169.
commands for each step. The table below describes the steps and commands
that are run when you create a new configuration.
Order
Step
Command
1.
Save configuration.
2.
Copy configuration to S (S is the
production server, recovery
server, or replicated
server). You will see a row
for each server in the
configuration.
3.
4.
Notes:
These steps may be run on one or more servers.
Step 4 only runs if a failover server was specified in the
configuration.
Related topics
You cannot change the replication group name or context ID with the
Change Replication Group wizard.
2. Click Configuration.
The Replication Group Configuration window displays. For detailed
information refer to the RecoverNow online help.
3. Click Change from the Actions dropdown. There are two possible
results:
a. The Change Replication Group dialog displays with a warning:
4. Click Next.
The Change Replication Group Servers panel displays. Use this panel to
log into the failover server, specified in the previous panel. This panel
displays if you are not logged in.
5. Specify the username and password and click Log In. Log in to each
server to retrieve information. When you run commands, context IDs are
used to identify the replication group. The context IDs specified have
been defaulted to unique IDs on the servers in this replication group.
6. Click Next.
The Change Replication Group Servers panel displays.
7. Click Next.
8. Click Next.
The Change Replication Group Logical Volumes panel displays.
Click Add to add a logical volume. For details refer to step 10 on
page 138.
Select the logical volume you wish to change or remove for this
replication group and click Next.
Use the Remove Logical Volumes dialog, shown below, to remove the
selected logical volumes from the replication group. These logical
volumes will no longer be protected. For detailed information refer to
the RecoverNow online help.
Order
Step
Command
1.
Save configuration.
2.
Copy configuration to S (S
is production server,
recovery server or
replicated server). You will
see a row for each server in
the configuration.
sccfgd_putcfg -C <Context
ID>
3.
4.
Delete replication
container logical volumes
on S (S is production
server, recovery server or
replicated server). You will
see a row for each server in
the configuration.
5.
6.
Delete replication
container and snapshot
logical volumes on S (S is
recovery server or
replicated server). You will
see a row for each server in
the configuration.
7.
8.
9.
10.
Delete replication
container, snapshot, and
log record file container
logical volumes.
11.
12.
13.
as requiring replication (X
is a device such as /dev/
rLV_p520-95_0001).
14.
15.
16.
Load drivers on S (S is
production server or
recovery server). You will
see a row for each server in
the configuration.
17.
18.
19.
sccfgd_delcfg <Failover
Context ID>
2. Click Configuration.
3. Click Rename from the Actions dropdown. There are two possible
results:
a. The Rename Replication Group dialog displays with a warning:
Replication group cannot be renamed because it is either active and
must be stopped, failed over, or the partition has been migrated
using live partition mobility. Use the Replication Group portlet on
the Replication page to stop the replication group.
Specify the new replication group name, and press OK. To view a
summary of the changes you made, refer to Replication Group
Configuration Window on page 169.
2. Click Configuration.
The Replication Group Configuration window displays. For detailed
information refer to the RecoverNow online help.
3. Click Delete from the Actions dropdown. There are two possible
results:
a. The Delete Replication Group dialog displays with a warning:
The Replication group cannot be deleted because it is either active
and must be stopped, failed over, or the partition has been migrated
using live partition mobility. Use the Replication Group portlet on
the Replication page to stop the replication group.
Initialize a Context
Use the command line to initialize a context on the production, recovery,
and replication servers. Execute the following command on each server.
scsetup -C <Context ID> -M
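For example, with a hypothetical Context ID of 1, you would run the same command on the production, recovery, and replicated servers:

```shell
# Run on each server in the replication group (context ID 1 is an example)
scsetup -C 1 -M
```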
Support for LVM commands when the RecoverNow drivers are loaded

LVM Command     Status
chlv            OK
chlvcopy        OK
chvg            OK
cplv
defragfs        OK
exportvg        OK
extendlv        OK
extendvg        OK
fileplace
importvg        OK
joinvg
lslv            OK
lsvg            OK
lvmstat
migratepv
mirrorvg
mirscan         OK
mklv            OK
mklvcopy
mkvg            OK
readlvcopy      OK
redefinevg      OK
reducevg        OK
reorgvg
rmlv
rmlvcopy        OK
snapshot        Not supported
splitlvcopy     OK
splitvg         OK
syncvg
synclvodm       OK
unmirrorvg      OK
varyoffvg
varyonvg        OK
The default port assignments shown below are added to the /etc/services
file when RecoverNow is installed. The service names are prefixed with
sc followed by the context ID (shown here as sc<Context ID>):

sc<Context ID>aa_channel     5747/tcp    # Archive Agent
sc<Context ID>ra_channel     5748/tcp    # Restore Agent (scrt_rs)
sc<Context ID>ca_channel     5749/tcp    # Restore Client Agent
sc<Context ID>aba_dchannel   5750/tcp    # Assured Backup Agent
sc<Context ID>lca_dchannel   5751/tcp    # Log Control Agent
sc<Context ID>aa_achannel    5752/tcp    # Archive Agent
sc<Context ID>aba_channel    5753/tcp    # Assured Backup Agent
sc<Context ID>lca_channel    5754/tcp    # Log Control Agent

Ports 6901/tcp through 6916/tcp are also assigned; for example, for
context ID 250:

sc250aba_channel
sc250lca_channel
sc250aa_channel
sc250ra_channel
sc250ca_channel
sc250aba_dchannel
sc250lca_dchannel
sc250aa_achannel
The default port assignment shown below for the Configuration Daemon is
added to the /etc/services file when RecoverNow is installed.

scconfigd    7835/tcp
Change the port number for the scconfigd entry in the /etc/services file
on all servers to an unused port number.
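A sketch of the change using sed on a sample entry; 7900 is an arbitrary unused port chosen for illustration, and on a real server you would edit /etc/services itself, on every server, rather than a scratch file:

```shell
# Sample entry as installed by RecoverNow
printf 'scconfigd\t7835/tcp\n' > /tmp/services.sample

# Change the scconfigd port to an unused one (7900 is an example)
sed 's|7835/tcp|7900/tcp|' /tmp/services.sample > /tmp/services.new

# Verify the edited entry before applying it to /etc/services
grep scconfigd /tmp/services.new
```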
4. Click OK to start the replication group for the selected servers.
The Start Replication Group dialog remains displayed until the action
completes successfully.
4. Click OK to stop the replication group for the selected servers.
The Stop Replication Group dialog remains displayed until the action
completes successfully.
Steps 1 and 2 are done for the first start after RecoverNow is
configured.
1. Stop any applications that are using the RecoverNow PVS (Production
Volume Set) LVs (Logical Volumes).
2. Make sure that any filesystems associated with the PVS LVs are
unmounted. Use the AIX umount command.
rtumnt -C <Context ID>
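Put together, a first start might look like the following sketch, assuming a hypothetical Context ID of 1 and a single protected filesystem /fs1:

```shell
# 1. Stop any applications using the PVS LVs (application-specific)

# 2. Unmount filesystems on the PVS LVs, then run the RecoverNow unmount
umount /fs1      # /fs1 is a placeholder for your protected filesystem
rtumnt -C 1

# 3. Start replication for the context
rtstart -C 1
```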
This displays:
# rtstart -C 34
Loading Double-Take RecoverNow Recovery Server Drivers
Starting scrt_aba
Creating Snapshots
Overview
RecoverNow enables you to restore a complete copy of the data on the
production server to any time in the past. You can quickly restore a
database that has crashed and roll back the data to a point before a logical
corruption occurred.
This chapter contains:
Test whether the snapshot is the correct one to use for rolling back the
data on the production server.
Navigate to the Snapshot Details portlet, and select Create from the
dropdown.
The following shows two examples based on the different date formats.
If the content of /tmp/mdm
is...
%m/%d/%y %H:%M:%S
%y.%m.%d %H%M%S
where:
%m: Month
%d: Day
%y: Year
%H: Hour
%M: Minute
%S: Second
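As a sketch, you can create /tmp/mdm with the first format shown above; the file name comes from this section, and DATEMSK is the standard getdate-style template variable:

```shell
# Write the date template used to parse times (first example format)
printf '%%m/%%d/%%y %%H:%%M:%%S\n' > /tmp/mdm

# Point DATEMSK at the template so time arguments are parsed with it
DATEMSK=/tmp/mdm
export DATEMSK

cat /tmp/mdm
```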
IMPORTANT
2. Make sure that a snapshot does not already exist on the recovery server.
scrt_ra -C <Context ID> -W
Administrative Tasks
On the recovery server, the replica LV (pt LV) MAX LPs: size is
not exceeded.
On the recovery server, the write journal associated with the file system
is not extended.
The filesystem can only be extended within the limits of the state map size.
The default state map supports a filesystem size of 1 TB.
If a protected filesystem needs to be extended, verify the following.
1. The state map limit will not be exceeded.
2. The protected LV MAX LPs: will not be exceeded.
3. The replica LV (pt LV) MAX LPs: will not be exceeded.
4. There is enough free space in the protected and replica VGs.
The following command displays the state map size.
rtattr -C <Context ID> -o smb<LV> -a Size
The default state map size is 33280, which allows a filesystem to expand to
1 TB.
The state map limit is calculated as follows; the default region size is 4
MB.
max filesystem size = (SMBitmap size - 512) * 8 * region size
For example, if SMBitmap size is 3584 and region size is 4 MB, the
maximum filesystem size is (3584 - 512) * 8 * 4 MB = 98,304 MB (96 GB).
10. On the production and recovery servers, use rtdr to create a failover
context if one was configured.
rtdr -C <Primary Context ID> -F <Failover Context ID>
setup
11. On the production server, use scconfig to wipe the state maps clean
(0.000% dirty).
scconfig -WC <Context ID>
13. On the production server, use the smit chfs command to extend the
filesystem.
4. Stop RecoverNow.
/usr/scrt/bin/rtstop -C <Context ID> -Fk
10. Remove the existing snapshot journal volume that you want to increase.
rmlv -f <ObjectAttributeValue from step 9 on page 197, minus the /dev/r prefix>
/fs1
/jfsold
where nonrtjfslog is a jfslog that exists in the volume group but is not
part of the PVS.
2. If a jfslog does not exist in the volume group:
mklv -t jfslog -y <newjfslog> <vgname> 1
where <newjfslog> is the name of the jfslog that you are creating for the
non-protected filesystem to use.
3. Format the jfslog.
logform /dev/<newjfslog>
/jfsold
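With hypothetical names (newjfslog00 for the jfslog and datavg for the volume group), steps 2 and 3 would look like:

```shell
# Create a one-LP jfslog LV in the volume group (names are examples)
mklv -t jfslog -y newjfslog00 datavg 1

# Format the new jfslog before use
logform /dev/newjfslog00
```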
ODM Commands
The errnotify stanzas are added to the errnotify ODM using the
following command.
odmadd /tmp/ern_1
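A minimal sketch of what a stanza file such as /tmp/ern_1 might contain; the fields below are standard AIX errnotify fields, but the en_name and en_method values are hypothetical, and the actual stanzas RecoverNow adds are not shown in this guide:

```shell
# Hypothetical errnotify stanza; real stanzas are supplied by RecoverNow
cat > /tmp/ern_1 <<'EOF'
errnotify:
        en_name = "scrt_sample"
        en_persistenceflg = 1
        en_method = "/usr/local/bin/notify_admin.sh $1"
EOF

# Load the stanza into the errnotify ODM class (AIX only):
#   odmadd /tmp/ern_1
cat /tmp/ern_1
```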
The following table describes the stanzas:
Error
Description
SCRT_AAGENT_ABORTED_ERROR:
SCRT_AAGENT_API_ERROR:
SCRT_ABORT_ERROR:
SCRT_ARCHIVEMGR_ERROR:
SCRT_BSFC_READ_ERROR:
SCRT_BSFC_WRITE_ERROR:
SCRT_CONFIG_ERROR:
SCRT_CONFIG_UPDATE_ERROR:
SCRT_DBMFC_APPLY_ERROR:
SCRT_DBMFC_READ_ERROR:
SCRT_DBMFC_WRITE_ERROR:
SCRT_DEVREAD_ERROR:
Error
Description
SCRT_DEVWRITE_ERROR:
SCRT_LFC_READ_ERROR:
SCRT_LFC_WRITE_ERROR:
SCRT_LIFC_FULL_ERROR:
SCRT_LIFC_READ_ERROR:
SCRT_LIFC_WRITE_ERROR:
SCRT_LRFC_READ_ERROR:
SCRT_LRFC_WRITE_ERROR:
SCRT_NETWORK_ERROR:
SCRT_POOL_EMPTY:
SCRT_POOL_HWM:
SCRT_SCJ_COWREAD_ERROR:
SCRT_SCJ_COWWRITE_ERROR:
SCRT_SCJ_FULL_ERROR:
SCRT_SCLM_WANCHORCB_ERROR
SCRT_SCLM_WANCHOR_ERROR:
SCRT_SCSM_BMWRITE:
SCRT_SCSM_LOGWRITE:
SCRT_SHAREDD_EXPORT_ERROR:
SCRT_SHAREDD_IMPORT_ERROR
SCRT_SMFC_READ_ERROR
SCRT_SMFC_WRITE_ERROR
SCRT_SM_ERROR
SCRT_SNAP_OFF:
SCRT_SRC_ERROR
Mark state map bitmaps dirty for all devices or specified devices while
RecoverNow is active.
Where:
-C <Context ID>: specifies the context ID.
-B: marks state map bitmaps dirty for all devices or specified devices while
RecoverNow is active and performs a full synchronization of the Production
Volume Set or one LV in the Production Volume Set.
You can also use the -L option with the scconfig command for
specific logical volumes to mark the state map zero percent dirty.
1. On the production server:
Stop any application(s)
Stop RecoverNow and unload the drivers:
rtstop -FC <Context ID>
3. Save the protected data on the production server to tape or disk. You
must save at the LV level in block sequence.
Save to tape:
dd if=/dev/db2 of=/dev/rmt0 bs=1024
Save to disk:
dd if=/dev/db2 of=/dev/db2bu bs=16m
6. Restore the data from tape or disk to the Replica on the recovery server.
7. Start RecoverNow on the recovery server. The changes made after the
save to tape or disk synchronize to the recovery server.
rtstart -C <Context ID>
Overview
RecoverNow supports Live Partition Mobility for partitions running AIX
5300-07 or later, AIX 6.1 or later, or AIX 7.1 or later on POWER6 or
POWER7 technology-based systems. Live Partition Mobility allows a
partition that is replicating with RecoverNow to be migrated to another
system without interrupting replication.
Migrating a Partition
Each time you migrate a partition, RecoverNow is licensed to run on the
migrated partition for up to 30 consecutive days. If the license is valid for
less than 30 days, RecoverNow is licensed to run for the number of days
remaining on the original partition. When you migrate back to the original
partition, the license expiration date is returned to its original value.
Overview
You can apply the redo and undo logs from the recovery server to roll back
application data on the production server and restore the data to an earlier
point in time. When you roll back the application data, the information is
synchronized with the replica on the recovery server just like any other
write action.
Keep in mind that the majority of database recovery operations require
database repairs, which are typically performed by DataBase Administrators
(DBAs). The remaining minority of recovery operations are database
resurrections, which are done by IT administrators. In the burning repair
scenario, for example, RecoverNow makes the DBA's job more precise.
[Diagram: Production server and recovery server connected over a LAN, each with a Data tap; application data storage and undo logs on the production server, replica storage and undo logs on the recovery server]
In the resurrection scenario, the database has crashed and will not be coming
back up. If you need to roll back 55 seconds, you do not need to restore the
last day's full backup and replay a full week of database redo (roll forward)
logs. In this case, RecoverNow can roll back a totally crashed and burned
database in a matter of moments: many orders of magnitude faster.
Keep in mind that during a production restore, you do not need the database
instance up. In fact, the database cannot be up while RecoverNow rolls its
image around on disk. By definition, the database is down since it crashed.
As a result, the best and fastest way to restore your database is while it is
down.
[Diagram: Production server and recovery server connected over a LAN, each with a Data tap; application data storage and log file containers on the production server; replica storage, log file containers, and a COW snapshot on the recovery server]
In this repair scenario, only some pieces of data are corrupted but the
production database is still running. After you restore a historical,
non-burning image of the database on the snapshot, you can pull pieces out
of it and put them back into the live production database to repair the bad
pieces of data. You can put out the fire with information. RecoverNow
restores automatically on the back-end snapshot.
Using the RecoverNow snapshot to visualize complete historical images of
the database enables DBAs to forgo spending time on a daily basis saving
pieces of database for possible use at a later time. Also, DBAs have access
to the exact historical image they want (typically by reviewing logs, etc.).
Step 10: Mount the Volumes for the Context on page 219
EXAMPLE:
In this example, the Context ID is 1. The volume that needs to be replaced is
/dev/rrtctx1. Display information about the volume.
sclist -C 1 -t pdfc
The output shows that /dev/rrtctx1 belongs to volume group rtvg1. The
volume size is shown in bytes (1073741824).
Object: <loglv00>, Type: <SCRT/containers/PDFC>, Serial <355>:
or
scrt_ra C <Context ID> t <LFCID>
The allowable formats for times entered with the -D option are specified in
the file identified by the DATEMSK environment variable.
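As a sketch of how DATEMSK works (the template file path and the format lines below are illustrative assumptions, not taken from this guide), you could prepare a file of getdate-style date formats and point DATEMSK at it:

```shell
# Create a hypothetical DATEMSK template file; each line is a date format
# (getdate-style % codes) that the -D option would then accept.
cat > /tmp/datemsk.templates <<'EOF'
%m/%d/%y %H:%M:%S
%a %b %d %H:%M:%S %Y
EOF
DATEMSK=/tmp/datemsk.templates
export DATEMSK
echo "DATEMSK set to $DATEMSK"
```

With a file like this in place, a time such as "05/12/09 08:16:24" would match the first template line.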
EXAMPLE:
In this example, the LFCID is 158. The Context ID is 1.
scrt_ra -C 1 -t 158
EXAMPLE:
The Context ID in this example is 1. Enter the following command:
scrt_ra -C 1 -W
/dev/loglv00
/dev/rtlv2
/dev/rtctx1
N/A
N/A
N/A
N/A
N/A
N/A
or
scrt_ra -C <Context ID> -t <LFCID>
You will be prompted for the time or LFCID that you used to create the
snapshot.
EXAMPLE:
scrt_rc -C 1
At this prompt, enter l for LFCID or t for time. Use the same information
that you used to create the snapshot in Step 3: Create a Snapshot on the
Recovery Server on page 215.
> l
LFC target >158
You have requested an incremental LFC restore
from Tue May 12 08:16:24 2009 (1080058584)
to Mon May 10 11:17:29 2009 (1079983049),
LFCIDs 250+ to 158.
c(ontinue) or a(bort)? c
Rolling LFC restore status.
--------------------------
Production at LFCID 248.
Production at LFCID 246.
Production at LFCID 244.
Production at LFCID 242.
Production at LFCID 240.
...
Production at LFCID 160.
Production at LFCID 158.
Production restored to LFCID 158.
Backingstore remains stable at LFCID 250.
rc> commit
committing...
RestoreServer is down, exit code 0.
Restore Client session complete.
You may need to run fsck on the file systems before they can be mounted. If
this is necessary, unmount the volumes using the rtumnt command, run
fsck, and then mount the volumes using the rtmnt command shown above.
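That check-and-remount sequence can be sketched as a dry run that only prints the commands (the context ID is assumed to be 1, and the device name is taken from the earlier example output), since rtumnt and rtmnt require a live RecoverNow installation:

```shell
#!/bin/sh
# Dry-run sketch: print the unmount / fsck / remount sequence for context 1.
# The device /dev/rtlv2 is an assumption reused from the earlier example.
CTX=1
SEQ="rtumnt -C $CTX; fsck -y /dev/rtlv2; rtmnt -C $CTX"
echo "$SEQ"
```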
Even if you decide later that you would like to pick a better time or
an LFCID, you can roll forward or roll back to that point. This is
possible because RecoverNow keeps all of the change information
on the recovery server.
be rolled back to. Select either Point in Time, Container ID, or Event
Marker:
Point in Time: Specify the date and time, within the rollback
window, where the production server will be rolled back to.
Container ID: Specify the container ID, within the rollback
window, where the production server will be rolled back to.
Event Marker: Specify the event marker, within the rollback
window, where the production server will be rolled back to.
6. Mount file systems: Specify whether to unmount and mount the file
systems automatically or manually.
7. Click Finish to start the rollback.
The Production Server Rollback portlet shows the progress of the
rollback.
6. Enter c to continue.
Rolling LFC restore status
-------------------------
Production at LFCID 34
Production at LFCID 32
Production at LFCID 30
Production restored to LFCID 30.
Backingstore remains stable at LFCID 36
rc>
11
Working with Archived Data
3. On the recovery server, start the production restore shell. Enter the
following command:
scrt_rc -C <Context ID>
rc>
If you are restoring from archived data, then the LFCID that you choose
identifies data that is no longer stored on the recovery server.
7. Enter an LFCID (for example, 1360).
This results in the following output:
You have requested an incremental LFC restore
from Fri Jan 23 10:22:50 2009 (1201274570)
to Fri Jan 23 10:01:26 2009 (1201273286),
LFCIDs 1466+ to 1360.
c(ontinue) or a(bort)?
8. Enter c to continue.
10. On the production server, mount the protected volumes. Enter the
following command:
rtmnt -C <Context ID> -f
12
Introduction
RecoverNow provides highly available data recovery and protection
through failover, resync, and failback operations on both local and remote
recovery servers, providing solutions for local failovers, as well as remote
failovers.
In a local failover operation, the production server fails over to a local
(on-site) recovery server, whereas in a remote failover operation, the
production server fails over to a remote (off-site) recovery server. Failing
over to a remote server is defined as Disaster Recovery, since it applies to
the situation where an entire site is lost due to a disaster such as a flood or
hurricane. Disaster recovery also covers situations where failing over
to a local server is not adequate for keeping data available and applications
running.
resync: brings the production server data up to date with the recovery
server data
Names of all special files associated with devices (access points to LVs)
Filesystem information
Similar to a primary context, the failover context has all the information for
all the servers in a configuration. However, the difference is that the
configuration settings the failover context contains are derived from the
primary context and that it shares several attributes with the primary
context.
NOTE
You must set up one Failover Context for each additional server in
a configuration, whereas you need to create only one Primary
Context for your production server.
By default, in a replicated configuration (Production, Recovery,
Replicated), the recovery server becomes the failover server. In a failover
event, it becomes the production server. All three servers will still be
connected when a resync operation is active.
In the same replicated configuration, to change the failover server to the
replicated server, change the configuration associated with the failover
context by specifying the hostname of the replicated server in the following
command:
rtdr -C <Primary Context ID> -F <Failover Context ID> -s <hostname> setup
Syntax
rtdr -C <ID> [-fhmnqv] failover | resync | failback
rtdr -C <ID> -F <ID> [-s <hostname>] [-fhmnv] setup
-C Primary Context ID of Production, Recovery, Remote
Replica servers
-F Failover Context ID
-f Forced execution *
-h Help, prints usage
-m Man style help
-n No execution, just print commands
-q quiet, do not ask for confirmation
Use the -f attribute with the rtdr command with caution. If you
do not resync, this attribute forces a failback and leaves the
Production Volume Set (PVS) file systems and replica out of sync.
where:
<Primary Context ID> is the primary context on the production server,
and <Failover Context ID> is the failover context on the recovery server.
NOTE
If there is more than one target server and you do not want to use
the default for the Failover Server, then use the -s <Hostname>
option on the rtdr command to specify the Failover Server. Refer
to Syntax on page 235 for the rtdr command.
2. Mount filesystems:
rtmnt -C <Context ID>
2. Mount filesystems:
rtmnt -C <Context ID>
You can perform failover operations after you validate the replica.
13
Run a Procedure
The Procedures portlet and Steps portlet on the Procedures page will
guide you through each step.
1. Use one of the following methods to run a procedure. Use the method
that is most convenient for you based on the page you are currently
on:
From the Replication Group portlet, select Planned Failover,
Unplanned Failover, or Failback from the Actions dropdown for a
replication group.
From the Procedures portlet, select Run from the Actions
dropdown for a specific replication group.
From the Procedures portlet, select the Procedure name (Planned
Failover, Unplanned Failover, or Failback) and select Run from
the Action toolbar on the Steps portlet.
2. Use one of the following methods to run the next step in the procedure
or retry a failed step:
From the Replication Group portlet, select Resume Procedure from
the Actions dropdown for a specific replication group.
From the Procedures portlet, select Resume from the Actions
dropdown for a specific procedure and replication group.
From the Steps portlet, select Resume from the Action toolbar.
The Steps portlet shows the steps for the Planned Failover procedure
selected in the Procedures portlet.
The following table shows what steps are run to perform a planned failover.
The Sequence Number shows the sequence number of the step in the
procedure. The sequence begins at 10 and increments by 10. The purpose
of sequence numbers is to help identify problem steps when communicating
with customer care.
Table 2. Planned Failover Steps

Sequence Number  Step
10
20               Stop replication on the current production server
30               Failover the replication group. Server roles change
40               Start replication on the new replicated server (only displayed if the replication group has a replicated server configured)
50               Start replication on the new recovery server
60               Start replication on the new production server
The Steps portlet shows the steps for the Unplanned Failover procedure
selected in the Procedures portlet.
The following table shows what steps are run to perform an unplanned
failover. The Sequence Number shows the sequence number of the step in
the procedure. The sequence begins at 10 and increments by 10.
Sequence Number  Step
10               Create a snapshot on the failover server.
20               Delete a snapshot on the failover server.
30               Rollback the failover server.
40               Failover the replication group. Server roles change.
50               Start replication on the new replicated server. (Only displayed if the replication group has a replicated server configured.)
60               Start replication on the new recovery server.
70               Start replication on the new production server.
The servers in the replication group name have changed now that
the replication group has failed over.
The Steps portlet shows the steps for the Failback procedure selected in the
Procedures portlet.
The following table shows what steps are run to perform a failback. The
Sequence Number shows the sequence number of the step in the procedure.
The sequence begins at 10 and increments by 10. The purpose of sequence
numbers is to help identify problem steps when communicating with
customer care.
Failback

Sequence Number  Step
10               on page 251
20               page 251
30
40
50
where:
<Primary Context ID> is the primary context on the production server,
and <Failover Context ID> is the failover context on the recovery server.
NOTE
If there is more than one target server and you do not want to use
the default for the Failover Server, then use the -s <Hostname>
option on the rtdr command to specify the Failover Server. Refer
to rtdr on page 264 for the rtdr command.
2. Mount filesystems:
rtmnt -C <Context ID>
2. Mount filesystems:
rtmnt -C <Context ID>
You can perform failover operations after you validate the replica.
Failover Operations
This section describes how to perform the failover operations from the
production server to the recovery or replicated server when the production
server has failed.
Performing Resync
A resync operation is required when the Production Volumes and Recovery
Replica Volumes diverge. This occurs after a failure of the production
server and a failover to the recovery or replicated server. When the
application is started on the recovery or replicated server, the updates result
in a divergence from the data on the production server.
After restoring the original production server, use the resync operation to
ensure that the production server data is current with the recovery server
data.
To resynchronize the revived production server from the recovery or
replicated server, on all servers (Production, Recovery, and Replicated):
rtdr -qC <Primary Context ID> resync
On the recovery server, start the LCA for the failover context.
IMPORTANT
The resync operation assumes the original production data was not
lost, and is available in its entirety after the production server is
revived. If the production data is lost, the statemap on the recovery
or replicated server must be marked as dirty prior to resync. This
forces a complete region recovery and initializes the production
data.
Wait until the region recovery completes before performing the failback
procedure, as described in Performing Failback on page 258.
Performing Failback
IMPORTANT
Before you fail back, ensure that you stop the application on the
recovery server.
After a resync operation completes, fail back to return the servers to their
original production, recovery, and replicated roles. All servers must be active
and running to execute resync and failback operations.
To initiate failback to the original production server, execute the following
on the recovery (or replicated) server and the production server:
rtdr -qC <Primary Context ID> failback
7. If the state maps are not clean on the recovery server, wait until all the
data is synchronized to the production server.
8. On the production server, execute the following command, which
creates a snapshot to allow the integrity of the data to be checked.
scrt_ra -XC <failover context>
12. On the production server, execute the following command, which
removes the snapshot created in step 8.
scrt_ra -WC <failover context>
13. On the recovery and replicated server, execute the following command,
which fails back to the primary context.
rtdr -qC <Failover Context ID> failback
14. On the production server, execute the following command, which fails
back to the primary context.
rtdr -qC <Failover Context ID> failback
CLI Commands
14
extend_replica_lv
Usage
You can use the extend_replica_lv command to force the expansion of a
Replica LV (Logical Volume) that is associated with a specified PVS
(Production Volume Set) LV, so that the Replica LV will be equal in size to
the PVS LV. This command will only run on the production server and the
LCA must be active.
Syntax
extend_replica_lv -C <Context ID> -L <PVS LV>
extend_replica_lv -h help
-C <Context ID>
-L <PVS LV>
-h Help, prints this usage
NOTE
This command is only required for PVS LVs that are extended and
have no associated filesystem, or PVS LVs that have an associated
filesystem with an outline log and the filesystem is extended.
rtattr
Syntax
rtattr -C ID [-a attribute] [-o object] [-t type]
rtattr -C ID -a attribute -v value {-o object | -t type}
rtattr -h
-a Attribute for query/edit (ObjectAttributeName)
-C <Context ID>
-h Help, prints this usage
-o Object for query/edit (ObjectName)
-t Type for query/edit (ObjectType)
-v Value for edit (ObjectAttributeValue)
You can use the -v parameter with the commands to edit. If you do not
specify the -v parameter, only query is available.
Usage
Use this command to query and change attributes in the RecoverNow ODM
(Object Data Manager) files:
SCCuAttr
SCCuObj
SCCuRel
Example 1
View all the machine hostids:
rtattr -C <Context ID> -a HostId
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f7"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
SCCuAttr:
ObjectName = "replica"
ConfigObjectSerial = 8
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f2"
ObjectAttributeType = "ulong"
SerialNumber = 8006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 16
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f3"
ObjectAttributeType = "ulong"
SerialNumber = 16006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
Example 2
View only the production server's hostid:
rtattr -C <Context ID> -o production -a HostId
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 16
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "0xc0a801f3"
ObjectAttributeType = "ulong"
SerialNumber = 16006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
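For the edit form, a hypothetical invocation (the new HostId value below is made up for illustration and is not an example from this guide) would combine -o, -a, and -v. It is shown echoed only, since rtattr needs a configured RecoverNow context:

```shell
# Hypothetical edit: assign a new HostId to the "backup" object.
# The value 0xc0a801f8 is invented for illustration; echoed, not executed.
CMD="rtattr -C 1 -o backup -a HostId -v 0xc0a801f8"
echo "$CMD"
```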
rtdr
Syntax
rtdr -C <ID> [-fhmnqv] failover | resync | failback
rtdr -C <ID> -F <ID> [-s <hostname>] [-fhmnv] setup
-C Context ID (of the "primary" context)
-F Failover Context ID
-f Forced execution (use with caution)
-h Help, prints usage
-m Man style help
-n No execution, just print commands
-q quiet, do not ask for confirmation
-s Select failover site server from multiple recovery
servers (default is first replication hop's server.)
Usage
This command manages RecoverNow's disaster recovery processes as well
as failover and failback operations. Given a primary context <X>
configured on both a Production and a Recovery Server, note:
Prior to failover, you should validate the data integrity of the Replica
and restore the data if necessary.
To validate data integrity of the Replica, create a snapshot of the
replica, then analyze it with the application itself. To create the
snapshot, on the recovery server:
scrt_ra -C <X> -X
NOTE
Resync assumes the original production data was not lost, and is
available in its entirety after the production server is revived. In
the event that production data was lost, statemap on the recovery
server must be dirtied prior to resync to force a complete region
recovery, and re-initialize the production data.
To dirty all statemaps, in the failover context on the recovery server (the
acting production server):
rtstop -C <X> -F
scconfig -C <X> -M
rtstart -C <X>
rtmark
Syntax
rtmark [-C ID] [-s <num>|-d <str>] [-iV] [<file>|-]
rtmark -rC <Context ID>
rtmark -h
-C ID Event is specific to Context ID.
-d <str> Date string, overrides event time.
-h Help, display this message.
-i Interactive query for event attributes.
-r Copies event marks from the production server to the
Recovery Server
-s <num> Seconds since epoch, overrides event time.
-V Print version.
<file> File containing the event mark attributes.
Usage
Event Markers are tags that mark points in time or points in process that are
significant to you for the purposes of recovery. An Event Marker can be
selected as the Recovery Point Objective (RPO) during a data restore. They
are typically needed for applications which cannot take advantage of
RecoverNow's Any Point-In-Time (APIT) data restores along with
applications which do not have live transactional durability on disk.
The following is an example of a script that could be called, with as many
arbitrary attributes to the event that you want, in addition to the time and
date attribute automatically assigned by rtmark. The customer-defined
attributes between the cat line and the second EOF would also be added to
the event. The entire event would be replicated to the recovery server, and
available for viewing and selection during restores.
#!/usr/bin/ksh
cat <<-EOF | /usr/scrt/bin/rtmark -C <Context ID>
name = test1
description = "This is a test."
owner = dave
priority = 2
another_attribute = Just another attribute
EOF
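Per the <file> argument in the rtmark syntax, the same attributes can presumably be written to a file and passed to rtmark instead of a heredoc; the attribute names and path below are illustrative assumptions, and the rtmark call itself is commented out since it needs a live installation:

```shell
# Write hypothetical event attributes to a file for rtmark's <file> argument.
cat > /tmp/event.attrs <<'EOF'
name = nightly_batch_done
description = "Batch window closed"
EOF
# /usr/scrt/bin/rtmark -C 1 /tmp/event.attrs   # requires RecoverNow
echo "wrote event attribute file /tmp/event.attrs"
```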
rtmnt
Syntax
rtmnt [-C ID][-fn]
Parameters
Parameter
Description
-C ID
-f
-h
-n
-p
Usage
This command is used to mount all file systems associated with the context
specified.
See Also
rtumnt, sccfgd_putcfg
rtstart
Syntax
rtstart [-C ID][-BMnNp]
Parameters
Parameter
Description
-C ID
-B
-h
-n
-M
-N
-p
Usage
This command is used to load the RecoverNow data tap and to start the data
replication processes. On the production server, rtstart also mounts the
protected file systems.
See Also
rtstop
rtstop
Syntax
rtstop [-C ID][-FfknS]
Parameters
Parameter
Description
-C ID
-F
-f
-h
-n
-k
-S
Usage
This command is used to stop the data replication process of RecoverNow,
and optionally to unload the data tap.
See Also
rtstart on page 269
rtumnt
Syntax
rtumnt [-C ID][-Dfn]
Parameters
Parameter
Description
-C ID
-D
-f
-h
-n
Usage
This command is used to unmount all the volumes associated with the
specified context.
See Also
rtmnt on page 268 and sccfgd_putcfg on page 283
sclist
Syntax
sclist -t TYPE [-bR] [-A ATTR [ ... ]] [-R] [-C ID] [-d X]
sclist -t TYPE -o ATTR=VALUE [-bR] [-A ATTR [ ... ]] [-C ID]
[-d X]
sclist -a [-A ATTR [ ... ]] [-C ID] [-d X]
sclist -r SERIAL [-r SERIAL ...] [-b | -c] [-C ID] [-d X]
sclist -O SERIAL [-O SERIAL ...] [-C ID] [-d X]
sclist [-BeiIjJlLmMpPstSTvVX] [-I] [-D[D]] [-C ID] [-d X]
sclist -h [-z]
sclist -fZ
-a Query all objects
-A ATTR Query specific attribute (repeatable)
-b Be Brief, useful for scripting output
-B List of StateMap bitmap devices
-c Expansive, if possible, expand on output.
-C ID Operate on Context ID.
Parameter
Description
-a
-A attribute
-b
-B
-C ID
Operate on Context ID
-d
-D
-E
-f
-h
-i
Parameter
Description
-I
-j
-J
-l
-L
-m
-M
-o ATTR=val
-O serial
-p
-P
-r serial
-R
-s
-S
-v
-V
-t type
-X
-z
-Z
Usage
This command provides information about containers used in RecoverNow.
See Also
sccfgd_putcfg on page 283
scconfig
Usage
Use this command to manage DataTap devices and drivers.
Syntax
scconfig
scconfig
scconfig
scconfig
scconfig
scconfig
scconfig
scconfig
scconfig
-l
-u
-r
-M
-S
-s
-t
-C
-V
Parameter
Description
-a seconds
-b percent
-c
-C ID
-d X
-E
-f
-h
-i
-I name
-l
-L name
-M
Mark statemap bitmaps dirty for all or specified devs (see -L).
-n
-P
Parameter
Description
-B
-Q
-q
-r
-R
-s
-S
-t
-u
-U
-v
-V
Display version.
-W
See Also
sccfgd_putcfg on page 283
scsetup
Makes or removes the Logical Volumes (LVs) used by RecoverNow in a
specific protection context, such as LFCs. Note however that scsetup will
not remove production LVs in the PVS or their associated replica LVs. Run
this command after defining and saving a context configuration using the
RecoverNow Replication Group Wizard.
After you have defined a context, scsetup creates a log file and containers
(logical volumes) in the specified volume group.
Syntax
scsetup -M [-ijlnprsv] [-C ID] [-d X] [-o role] [ -t TYPE ]
scsetup -R [-inv] [-C ID] [-d X] [-o role] [ -t TYPE ]
scsetup -E [-cinv] [-C ID] [-d X] [-o role]
scsetup -I [-cinv] [-C ID] [-d X] [-o role]
scsetup -L [-inv] [-C ID] [-d X]
scsetup -X [-inv] [-C ID] [-d X]
scsetup -F [-inv] [-C ID] [-d X]
scsetup -h
-C ID Operate on Context ID.
-c Clear destination device files prior to export/import.
-d X Debug level of X (0-9).
-E Export production volumes.
Parameter
Description
-c
-C ID
Operate on Context ID
-i
-j
-L
-l
-m
-M
-n
-o
-R
-s
Skip setting or clearing bitmaps for statemap (if there are any)
Parameter
Description
-v
-X
Usage
Preparation for RecoverNow data protection.
-F Failover preparation. PDFC LV names are moved to BSFC LV
names, and vice versa.
-i Ignore volume manager errors.
-I Import production volumes.
scrt_ra
Syntax
scrt_ra -t <LFCID> [-C ID]
scrt_ra -D <target date/time> [-C ID]
scrt_ra -S <target seconds> [-C ID]
scrt_ra -V <vfb level> [-d X]
Parameters
Parameter
Description
-a
-C
-d
-D <target date/time>
-e
-f
-F
-h
-l
-L
-p
-P <path to script>
-S <target seconds>
-v
Verbose.
-V<vfb level>
-w
-W
-x
-X
Usage
This command is the Restore Agent and is used to create snapshots on the
recovery server.
See Also
rtmnt on page 268, and rtumnt on page 271
scrt_rc
Syntax
scrt_rc [-C ID] [-d X] [-p X] [-h[v]] [-v] [-V]
-d Debug level of X (0-9)
-h Help, display this message
-C Operation on Context ID (default is 17)
-p Ping agent X (aba|lca|rs), ref is 0 if up
-v Verbose help
-V Print version
Usage
The restore client is an interactive command line interface, or shell, for
production data restore. To enter the shell, type scrt_rc -C<ID> at the unix
command prompt on the recovery (a.k.a. backup) server.
Entering the restore client shell starts a production restore session.
Ultimately, this session should be terminated with either the commit or
abort command. Problems during the restore can be resolved with the
recovery command.
Type help at the rc> prompt for all available commands.
NOTE
The -p option of the scrt_rc command will not start the shell, but
instead will return with agent status.
LFC level
Date/Time
Session Termination
A restore session may be terminated either with an abort or a commit
command. When aborted, all restored devices are brought back to
pre-session levels. When committed, all restored devices remain at the last
target of the session.
A commit does not remove any forward or reverse incremental data from
the RecoverNow time line, which allows for a subsequent restore to a time
after the committed target, if necessary. In fact, the restore itself is included
in the time line, which allows it to be undone.
Process Overview
RecoverNow performs a production data restore by writing reverse block
incremental data directly into the raw Logical Volumes (LVs) of the
Production Volume Set (PVS), rolling those LVs back in time as a single
consistency group. The PVS is treated as a consistency group since it
encapsulates the entire storage footprint of the protected application. The
application's referential data integrity is always maintained.
All block I/O during the restore occurs at the Logical Volume Manager
(LVM) layer, below all file systems and/or databases associated with the
protected application. In RecoverNow, the reverse block incremental data is
recorded in odd numbered LFCs, the Before Image Log File Containers
(BILFCs), which are also raw LVs and reside on the backup/recovery
server, or in external tape archives, if any.
The length of the restore window is a function of how many BILFCs are
available to RecoverNow, the size of the BILFC, and the average
application write rate. Tape archives are used to extend the restore window.
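As a rough illustration of that relationship (every number below is a made-up assumption, not sizing guidance), the window can be estimated as total BILFC capacity divided by the average write rate:

```shell
#!/bin/sh
# Back-of-envelope restore-window estimate with illustrative numbers only.
num_bilfcs=100        # BILFCs assumed available on the recovery server
bilfc_mb=512          # assumed size of each BILFC, in MB
write_mb_per_s=2      # assumed average application write rate, MB/s
window_s=$(( num_bilfcs * bilfc_mb / write_mb_per_s ))
echo "restore window ~ ${window_s}s (~$(( window_s / 3600 )) hours)"
```

With these made-up figures the window works out to 25,600 seconds, roughly seven hours; the sizing tool described with sztool is the authoritative way to size LFCs.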
During a restore, the PVS LVs must be opened exclusively for writing by
RecoverNow. No other application may have the LVs opened for writing.
All associated databases and file systems must be unmounted.
Two agent daemons work together to perform a production restore. On the
production server, the Log Creation Agent (LCA) receives BILFC
transmissions and makes the BILFC writes to the PVS. On the
General Procedure
1. Ensure required agent daemons are running.
2. On the production server - Stop/Unmount the corrupted production
application
3. On the production server - Unmount file systems [rtumnt -Cx]
4. On the production server - Sync RecoverNow [scconfig -Cx -S]
5. On the recovery server - Execute RC [scrt_rc -Cx]
6. On the production server - Mount file systems [rtmnt -Cx]
7. On the production server - Start/Mount production database
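The numbered steps above can be condensed into a dry-run sketch that only prints each command (a context ID of 1 is assumed; nothing is executed, since the tools require a live RecoverNow installation):

```shell
#!/bin/sh
# Dry run of the general restore procedure for an assumed context ID of 1.
CTX=1
STEP_UNMOUNT="rtumnt -C $CTX"       # production: unmount file systems
STEP_SYNC="scconfig -C $CTX -S"     # production: sync RecoverNow
STEP_RESTORE="scrt_rc -C $CTX"      # recovery: run the restore client
STEP_MOUNT="rtmnt -C $CTX"          # production: remount file systems
printf '%s\n' "$STEP_UNMOUNT" "$STEP_SYNC" "$STEP_RESTORE" "$STEP_MOUNT"
```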
Procedure Notes
1. The rtumnt -Cx command will perform a switch [scconfig -Cx -S]
automatically.
2. One After Image Log File Container (AILFC) may be sent during the
restore to fine tune to the nearest second. BILFCs are optimized for I/O
throughput, while AILFCs maintain individual write fidelity.
3. Backup and recovery server are synonymous.
scrt_vfb
The Tivoli Storage Manager must be defined in the RecoverNow
configuration before using this command.
Syntax
scrt_vfb [-bdDflLnUVrR] [-s <path to validation script>] [ -C ID ]
Usage
This command is used to create a virtual full backup.
Parameters
Parameter
Description
-b
-C
-D
-f
-h
Help.
-l
-L
-U
-R
-s
-V
Create VDevs.
sccfgd_cron_schedule
This command manages entries in cron for RecoverNow Virtual Full
Backups (VFB). The Tivoli Storage Manager must be defined in the
RecoverNow configuration before using this command.
Syntax
sccfgd_cron_schedule <Op> <Context_id> [<sched_type>]
[<cron_info>] [<vfb_opts>]
where:
Op [a|q|d] (add|query|delete, respectively)
sched_type [once|daily|weekly|monthly]
Usage
This command is used to schedule a virtual full backup.
Examples
sccfgd_cron_schedule add 3 daily 15:3:*:*:*
sccfgd_cron_schedule delete 3
sccfgd_cron_schedule query 3
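The cron_info argument appears to be the five crontab time fields joined by colons, an inference from the 15:3:*:*:* example (03:15 every day). A small sketch of that translation, with the scrt_vfb path assumed:

```shell
#!/bin/sh
# Sketch: translate the colon-joined cron_info into a crontab-style line.
# The field order (min:hour:day:month:weekday) and the scrt_vfb path are
# assumptions, not taken from the guide.
cron_info='15:3:*:*:*'
fields=$(printf '%s' "$cron_info" | tr ':' ' ')
line="$fields /usr/scrt/bin/scrt_vfb -C 3"
echo "$line"
```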
sccfgd_putcfg
Syntax
sccfgd_putcfg primary_context_ID failover_context_ID
Parameters
Parameter
Description
primary_context_ID
failover_context_ID
-h
Usage
This command is used to load the RecoverNow configuration file into the
RecoverNow ODM by creating and loading a failover context configuration
based on a previously loaded primary context configuration.
sccfgchk
Syntax
sccfgchk -C <Context ID>
Parameters
Parameter
Description
-c class
-C CTX
Parameter
Description
-d X
-h
-i
-v
Verbose
Usage
This command is used to check a configuration before RecoverNow is
started. Issue this command on each node after the configuration is
initialized and before it is started.
sztool
Syntax
sztool
Parameters
sztool script
Command
Options
Description
sztool
sztool -c
sztool -d
sztool -g
sztool -h
sztool
sztool script
Command
Options
Description
sztool -l
When the log file is created, this command prints out the
calculation results for different LFC sizes based on the
existing log file. For example, sztool -l32 prints out the
results when the LFC size is 32MB. sztool -l16 -l512
prints out all the calculation results from 16MB to
512MB. You cannot have spaces between -l and the LFC
size number. Output goes to the screen only; there is no
delay or sleep.
sztool -r
Usage
You can use the Sizing Tool (sztool) to calculate configuration values before
RecoverNow is installed. It is also useful to run the tool after RecoverNow
is installed to determine if the number of LFCs or WJ percentage needs to
be adjusted. For more information, refer to Chapter 3, Using the Sizing
Tool to Calculate LFC Size on page 43.
There are two production nodes with shared disks between them.
Chapter A:
11. On the failover production server edit the production HostId stanza in
the /tmp/C<Primary Context ID>.cfg file. Replace the contents
of the ObjectAttributeValue field with the output from the
"rthostid" command.
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "6CABA7DF"
ObjectAttributeType = "ulong"
SerialNumber = 15006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
12. On the failover production server edit the backup HostId stanza in the
/tmp/C<Failover Context ID>.cfg file replacing the content of
the ObjectAttributeValue field with the output from the
rthostid command.
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "5FBBC3EF"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 11
17. On the failover production server, use es_ha_config to configure a
device major number for the Primary Context ID. Choose a device
major number that is the same on both production servers.
es_ha_config <Primary ContextID> [NewMajorNumber]
EXAMPLE:
es_ha_config 1 82
The Primary Context ID is 1, and the new device major number, 82, is
available on both production servers.
18. On the primary production server, unmount the filesystems associated
with the RecoverNow configuration.
unmount /usr/scrt/run/c<Primary Context ID>
unmount /usr/scrt/run/c<Failover Context ID>
fi
echo "${DATE_CMD}: RecoverNow startup successful" >$RTSTARTLOG 2>&1
-> Insert application start script here <-
exit 0
/usr/scrt/bin/rtstop -C ${CID} -kFv
/usr/scrt/bin/rtstart -C <PrimaryContextID>
Unplanned Failover
In this scenario, both Highly Available Production servers are unavailable
due to a disaster. For example, an entire site is lost due to a disaster such as
a flood or hurricane.
Before Performing Failover Operations
Do not perform a failover until you have validated the integrity of the data
on the recovery server. After performing a failover, RecoverNow cannot be
rolled back to a point in time before the failover. This section provides
guidelines to execute before performing your failover operations.
Validating Data Integrity
Validating the data integrity of the replica is critical. Prior to performing the
failover operations, validate the data integrity of the replica on the recovery
server and restore it if necessary. To validate the data, first create a snapshot
of the replica and then analyze it with the application itself.
1. On the recovery server, create a Snapshot based on the current redo log
and validate the data integrity. Refer to Chapter 8, Creating
Snapshots on page 183.
If analysis indicates data corruption, remove the snapshot and create a
Snapshot based on a Specific LFCID or a Specific Date and Time to
locate and validate an optimal restore point.
2. On the recovery server, create a Snapshot based on a Specific LFCID or
a Specific Date and Time and validate the data integrity. Refer to
Chapter 8, Creating Snapshots on page 183.
Use scrt_ra to print valid restore time spans:
/usr/scrt/bin/scrt_ra -vpC<Primary ContextID>
Current Snap Time:
1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
Available internal rollback windows:
------------------------------------------------------
Start: 1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
End: 1300112259 (LFCID: 304502): Mon Mar 14 10:17:39 2011
Available VFBs:
------------------------------------------------------
No recorded VFBs.
Once you have located an optimal restore point, remove the snapshot.
Proceed to step 3 to back up the replica, or to step 4 to perform a
failover restore or a Failover to the Latest Point in the Data.
3. On the recovery server, if you have TSM or SysBack, back up the
replica. This provides additional data protection by keeping complete
copies of the data on archive media such as tape. Refer to Chapter 11,
Working with Archived Data on page 227.
4. On the recovery server, perform a failover restore to roll back the
replica to the validated point in time from step 2, or perform a
Failover to the Latest Point in the Data.
To perform a failover restore, refer to Performing a Failover
Restore on page 256.
To perform a Failover to the Latest Point in the Data, refer to
Failover to the Latest Point in the Data on page 256.
5. On the recovery server, start your application.
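The scrt_ra listing above is plain text, so the restore-point search in step 2 can be scripted. The sketch below assumes the output format shown in the example; `parse_window` is a hypothetical helper, and the sample text is hard-coded so the sketch is self-contained:

```shell
# Sketch: extract the Start/End LFCIDs of a rollback window from
# scrt_ra -vp style output. Field positions assume the example format
# shown above and may differ on other releases.
parse_window() {
    # $1 = label to match ("Start" or "End"); reads the listing on stdin.
    # The LFCID is the 4th field, e.g. "305000):", with ")" noise stripped.
    awk -v label="$1" '$1 == (label ":") {
        sub(/\).*/, "", $4)
        print $4
    }'
}

SAMPLE='Start: 1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
End: 1300112259 (LFCID: 304502): Mon Mar 14 10:17:39 2011'

START_LFCID=$(echo "$SAMPLE" | parse_window Start)
END_LFCID=$(echo "$SAMPLE" | parse_window End)
echo "rollback window: $START_LFCID -> $END_LFCID"
```

In practice you would pipe the scrt_ra output into the same filter instead of the hard-coded sample.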
Recovery of the Failed Production Server(s)
Cluster services should not be running on these nodes. Recreate the
RecoverNow production environment when one or both production servers
have been recovered; the steps depend on the extent of the damage to
the production servers.
Managing Resynchronization to the Production Server
If the failover from the production server to the recovery server was
necessary because all protected data was lost on the production server, then
the data must be restored to the production server before doing a failback.
This data loss could occur if a disk or disk subsystem failed or was
destroyed on the production server. Refer to Manual Resynchronization
Process if production server Data Is Lost on page 258.
If none of the protected data was lost on the production server, a resync
operation is required because the Production Volumes and Recovery
Replica Volumes have diverged. Refer to Performing Resync on page 256.
Performing Failback
Before you fail back, ensure that you stop your application on the recovery
server. Refer to Performing Failback on page 258.
Planned Failover/Resync/Failback
In this scenario, the administrator has a scheduled maintenance period and
switches operations that run on the production server to the designated
recovery server.
Performing Failover
Perform the following steps for a planned failover:
1. On the production server, stop your application.
2. On the production server, use rtstop to unmount the protected
filesystems, transfer any current LFC data to the recovery server and
unload the RecoverNow production server drivers.
/usr/scrt/bin/rtstop -FSC<PrimaryContextID>
Verify that the failover process completed; the failover output will
display:
--- Failover Context ID <Failover ContextID> is enabled. ---
Example:
esmon 740
Mar 22 17:13:11 Total LFC Size=3025M, Free size=3014M, Used Size=11M, Usage=1/100
The failback process will transfer the used LFCs to the production
server.
7. On the recovery server, perform failback. Wait for failback to
complete successfully before performing failback on the production
server. Refer to Performing Failback on page 258.
8. On the production server, perform failback. Wait for failback to
complete successfully before starting your application. Refer to
Performing Failback on page 258.
There are two recovery nodes with shared disks between them.
A Resource Group can only be moved between the two recovery nodes,
and the RecoverNow roles of the nodes never change.
same enhanced concurrent mode volume group(s) that are part of the
RecoverNow configuration. By default the RecoverNow File
Containers are created on the Default VG, defined using the
RecoverNow Replication Group Wizard. The filesystems need to be at
least 128MB in size and must have a separate jfslog, jfs2log, or jfs2
inline log since they will not be part of the PVS.
11. On the Failover recovery server, edit the backup HostId stanza in the
/tmp/C<Primary Context ID>.cfg file. Replace the contents of
the ObjectAttributeValue field with the output from the
rthostid command.
SCCuAttr:
ObjectName = "backup"
ConfigObjectSerial = 15
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "6CABA7DF"
ObjectAttributeType = "ulong"
SerialNumber = 15006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 1
12. On the Failover recovery server, edit the production HostId stanza in the
/tmp/C<Failover Context ID>.cfg file, replacing the contents of
the ObjectAttributeValue field with the output from the
rthostid command.
SCCuAttr:
ObjectName = "production"
ConfigObjectSerial = 4
ObjectType = "SCRT/info/host"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "5FBBC3EF"
ObjectAttributeType = "ulong"
SerialNumber = 4006
ObjectNlsIndex = 0
SC_reserved = 0
ContextID = 11
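Steps 11 and 12 boil down to a one-line substitution, which can be scripted. This sketch assumes the stanza layout shown above; the file path and the new host id are stand-ins (a real run would take the value from rthostid), and in the real .cfg file, which holds several stanzas, you would restrict the substitution to the stanza being edited:

```shell
# Sketch: replace the ObjectAttributeValue in a HostId stanza. NEW_ID
# would normally come from `rthostid`; it is hard-coded here only so
# the example is self-contained. CFG stands in for /tmp/C<Context ID>.cfg.
CFG=/tmp/hostid_demo.cfg
NEW_ID="5FBBC3EF"

# Create a cut-down stanza matching the layout shown above.
cat > "$CFG" <<'EOF'
SCCuAttr:
ObjectName = "production"
ObjectAttributeName = "HostId"
ObjectAttributeValue = "6CABA7DF"
EOF

# Swap the value, keeping the quoting intact.
sed "s/^ObjectAttributeValue = \".*\"/ObjectAttributeValue = \"$NEW_ID\"/" \
    "$CFG" > "$CFG.new" && mv "$CFG.new" "$CFG"

grep ObjectAttributeValue "$CFG"
```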
fi
echo "${DATE_CMD}: RecoverNow startup successful" >$RTSTARTLOG 2>&1
-> Insert application start script here <-
exit 0
/usr/scrt/bin/rtstop -C ${CID} -kFv
echo
exit
fi
echo
exit
/usr/scrt/bin/rtstart -C<PrimaryContextID>
Unplanned Failover
In this scenario, the production server is unavailable due to a disaster. For
example, an entire site is lost due to a disaster such as a flood or hurricane.
Before Performing Failover Operations
Do not perform a failover until you have validated the integrity of the data
on the active recovery server. After performing a failover, RecoverNow
cannot be rolled back to a point in time before the failover. This section
provides guidelines to follow before performing your failover operations.
Validating Data Integrity
Validating the data integrity of the replica is critical. Prior to performing the
failover operations, validate the data integrity of the replica on the active
recovery server and restore it if necessary. To validate the data, first create a
snapshot of the replica and then analyze it with the application itself.
1. On the active recovery server, create a Snapshot based on the current
redo log and validate the data integrity. Refer to Chapter 8, Creating
Snapshots, on page 183.
If analysis indicates data corruption, remove the snapshot and create a
Snapshot based on a Specific LFCID or a Specific Date and Time to
locate and validate an optimal restore point.
2. On the active recovery server, create a Snapshot based on a Specific
LFCID or a Specific Date and Time and validate the data integrity.
Refer to Chapter 8, Creating Snapshots, on page 183.
Use scrt_ra to print valid restore time spans:
/usr/scrt/bin/scrt_ra -vpC<Primary ContextID>
Current Snap Time:
1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
Available internal rollback windows:
------------------------------------------------------
Start: 1300122094 (LFCID: 305000): Mon Mar 14 13:01:34 2011
End: 1300112259 (LFCID: 304502): Mon Mar 14 10:17:39 2011
Available VFBs:
------------------------------------------------------
No recorded VFBs.
Once you have located an optimal restore point, remove the snapshot.
Proceed to step 3 to back up the replica, or to step 4 to perform a
failover restore or a Failover to the Latest Point in the Data.
3. On the active recovery server, if you have TSM or SysBack, back up the
replica. This provides additional data protection by keeping complete
copies of the data on archive media such as tape. Refer to Chapter 11,
Working with Archived Data on page 227.
4. On the active recovery server, perform a failover restore to roll back
the replica to the validated point in time from step 2, or perform a
Failover to the Latest Point in the Data.
To perform a failover restore, refer to Performing a Failover
Restore on page 256.
To perform a Failover to the Latest Point in the Data, refer to
Failover to the Latest Point in the Data on page 256.
5. On the active recovery server, start your application.
Recovery of the Failed Production Server
Recreate the RecoverNow production environment when the production
server has been recovered; the steps depend on the extent of the damage
to the production server.
Managing Resynchronization to the Production Server
If the failover from the production server to the recovery server was
necessary because all protected data was lost on the production server, then
the data must be restored to the production server before doing a failback.
This data loss could occur if a disk or disk subsystem failed or was
destroyed on the production server. Refer to Manual Resynchronization
Process if production server Data Is Lost on page 258.
If none of the protected data was lost on the production server, refer to
Performing Resync on page 256.
Performing Failback
Before you fail back, ensure that you stop your application on the active
recovery server. Refer to Performing Failback on page 258.
Planned Failover/Resync/Failback
In this scenario, the administrator has a scheduled maintenance period and
switches operations that run on the production server to the active recovery
server.
Performing Failover
1. On the production server, stop your application.
2. On the production server, use rtstop to unmount the protected
filesystems, transfer any current LFC data to the recovery server and
unload the RecoverNow production server drivers.
/usr/scrt/bin/rtstop -FSC<PrimaryContextID>
Verify that the failover process completed; the failover output will
display:
--- Failover Context ID <Failover ContextID> is enabled. ---
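The "Failover Context ID ... is enabled" marker can be checked from a script rather than by eye. A minimal sketch, with the rtstop run simulated by echo so the check is self-contained:

```shell
# Sketch: capture the failover output and confirm the enabled marker
# appeared. In a real script the echo would be replaced by the rtstop
# invocation, e.g.: rtstop_output=$(/usr/scrt/bin/rtstop -FSC<ID> 2>&1)
rtstop_output=$(echo "--- Failover Context ID 11 is enabled. ---")

case "$rtstop_output" in
    *"Failover Context ID"*"is enabled"*)
        failover_ok=yes ;;
    *)
        failover_ok=no ;;
esac
echo "failover completed: $failover_ok"
```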
Performing Failback
1. Before you fail back, ensure that you stop your application on the
active recovery server.
2. To verify that the resync operation has completed, use esmon to check
the LFC usage count:
esmon <FailoverContextID>
Example:
esmon 740
Mar 22 17:13:11 Total LFC Size=3025M, Free size=3014M, Used Size=11M, Usage=1/100
The failback process will transfer the used LFCs to the production
server.
3. On the active recovery server, perform failback. Wait for failback to
complete successfully before performing failback on the production
server. Refer to Performing Failback on page 258.
4. On the production server, perform failback. Wait for failback to
complete successfully before starting your application. Refer to
Performing Failback on page 258.
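The esmon line in step 2 is easy to parse, so the wait for the resync to drain can be automated. A sketch assuming the output format shown above; the status line is hard-coded here for self-containment, where a real wrapper would re-run esmon <FailoverContextID> in a loop:

```shell
# Sketch: read the "Used Size" megabytes from an esmon status line.
# A failback wrapper could poll this value and sleep until it reaches 0
# before proceeding with failback.
used_size() {
    # Prints the number between "Used Size=" and "M".
    sed -n 's/.*Used Size=\([0-9]*\)M.*/\1/p'
}

# Sample line copied from the esmon example above.
LINE="Mar 22 17:13:11 Total LFC Size=3025M, Free size=3014M, Used Size=11M, Usage=1/100"
USED=$(echo "$LINE" | used_size)
echo "LFC data still to transfer: ${USED}M"
```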
Prerequisites
Before you begin, keep in mind the following:
Failback Procedure
1. Move the Production_Server Resource Group to the Production node.
2. Bring the Recovery_Server Resource Group online on the Recovery
node.
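If you prefer the command line to SMIT, the two steps above can be driven with clRGmove, PowerHA's resource-group move utility. This is only a sketch: the group and node names follow this appendix, a dry-run prefix is used so nothing executes, and you should verify the flags against your PowerHA level before removing it:

```shell
# Sketch of the failback procedure via clRGmove. RUN=echo makes this a
# dry run that only prints the commands; clear RUN to execute for real.
CLUTILS=/usr/es/sbin/cluster/utilities
RUN="echo"

# Step 1: move the Production_Server Resource Group to the Production node.
STEP1="$CLUTILS/clRGmove -g Production_Server -n Production -m"
# Step 2: bring the Recovery_Server Resource Group online on the Recovery node.
STEP2="$CLUTILS/clRGmove -g Recovery_Server -n Recovery -u"

$RUN $STEP1
$RUN $STEP2
```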
Names used, such as Cluster Nodes and Resource Groups, are arbitrary;
the integrator can choose any names.
Notification scripts are not provided and are the responsibility of the
integrator.
The following RecoverNow scripts are provided for the PowerHA for AIX
configuration. These scripts require the parameter -C <Primary Context ID>
and, for the first two scripts, optionally -P if called from the
Production_Server Resource Group. These scripts log to "/usr/scrt/log"
if the "HACMP Log File Parameters" have "Debug Level" set to "high".
/usr/scrt/bin/production_failback_acquire
/usr/scrt/bin/production_failover_release
/usr/scrt/bin/ABA_Monitor
/usr/scrt/bin/LCA_Monitor
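Since notification scripts are left to the integrator, here is a minimal skeleton for a Notify Method such as LCA_85_Notify. The log path and the mail command are illustrative placeholders, not part of the product:

```shell
# Minimal notification-script skeleton. PowerHA invokes the Notify
# Method when an application monitor detects a failure; this sketch
# just records the event and leaves a hook for paging or mail.
PROGNAME=${0##*/}
LOGFILE=/tmp/recovernow_notify_demo.log   # placeholder log location

MSG="$(date) ${PROGNAME}: application monitor reported a failure"
echo "$MSG" >> "$LOGFILE"

# Hook a paging or mail command here, for example:
#   mail -s "RecoverNow monitor alert" oncall@example.com </dev/null

echo "notification logged to $LOGFILE"
```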
Resource Group             Production_Server
Participating Nodes        [Production Recovery]
Startup Policy
Fallover Policy
Fallback Policy            Never Fallback

Resource Group             Recovery_Server
Participating Nodes        [Recovery]
Startup Policy
Fallover Policy
Fallback Policy            Never Fallback

[Recovery_Server]
[Production_Server]

Server Name                Production_85
Start Script               [/users_path_name/Production_Server_Start]
Stop Script                [/users_path_name/Production_Server_Stop]
Application Monitor        LCA_Monitor_85, Monitor_85

Server Name                Recovery_85
Start Script
Stop Script
Application Monitor        ABA_Monitor_85
Production_Server_Start
###############################################################################
#
# Arguments:    None
#
# Returns:      0 - success
#
# Environment:  None
#
###############################################################################
PROGNAME=${0##*/}
[[ ${VERBOSE_LOGGING} == high ]] &&
{
rm -f /tmp/${PROGNAME}.out
exec 1> /tmp/${PROGNAME}.out
exec 2>&1
PS4='[${PROGNAME}][${LINENO}]'
set -x
}
printf "$(date) ******** Begin ${PROGNAME} ********\n"
/usr/scrt/bin/production_failback_acquire -C <Primary Context ID> -P
if ((${?}!=0))
then
printf "$(date) Production Server start failed.\n"
exit 1
fi
printf "$(date) Production Server start successful.\n"
-> Insert application start script here <-
if ((${?}!=0))
then
printf "$(date) Double-Take_RecoverNow_85_Application_Start failed.\n"
exit 1
fi
printf "$(date) Double-Take_RecoverNow_85_Application_Start successful.\n"
################################################################################
Production_Server_Stop
###############################################################################
#
# Arguments:    None
#
# Returns:      0 - success
#
# Environment:  None
#
###############################################################################
PROGNAME=${0##*/}
[[ ${VERBOSE_LOGGING} == high ]] &&
{
rm -f /tmp/${PROGNAME}.out
exec 1> /tmp/${PROGNAME}.out
exec 2>&1
PS4='[${PROGNAME}][${LINENO}]'
set -x
}
printf "$(date) ******** Begin ${PROGNAME} ********\n"
Monitor Name               LCA_Monitor_85
Application Server         Production_85
Monitor Mode               [Long-running monitoring]
Monitor Method
Monitor Interval           [180]
Hung Monitor Signal        [9]
Stabilization Interval     [900]
Restart Count              [0]
Restart Interval           [0]
Action on Application Failure  [notify]
Notify Method              [/users_path_name/LCA_85_Notify]
Note: With 20,000

Monitor Name               ABA_Monitor_85
Application Server         Recovery_85
Monitor Mode               [Long-running monitoring]
Monitor Method
Monitor Interval           [180]
Hung Monitor Signal        [9]
Stabilization Interval     [900]
Restart Count              [0]
Restart Interval           [0]
Action on Application Failure  [notify]
Notify Method              [/users_path_name/ABA_85_Notify]

Monitor Name               Monitor_85
Application Server         Application_85
Monitor Mode               [Long-running monitoring]
Monitor Method             [/users_path_name/Double-Take_RecoverNow_85_Application_Monitor]
Monitor Interval           [60]
Hung Monitor Signal        [9]
Stabilization Interval     [900]
Restart Count              [0]
Restart Interval           [0]
Action on Application Failure  [notify]
Notify Method              [/users_path_name/App_85_Notify]
Resource Group             Production_Server
Participating Nodes        Production Recovery
Application Servers        [Production_85]

Resource Group             Recovery_Server
Participating Nodes        Recovery
Application Servers        [Recovery_85]

Node                       Production
Debug Level                high
Formatting                 Standard

Node                       Recovery
Debug Level                high
Formatting                 Standard