Disaster Recovery

A disaster is a situation in which critical components of the R/3 System environment
become unavailable, so that service cannot be resumed within a short period of time.
The critical components are the database and the R/3 application host instance that runs
the message and enqueue services.
Risk Analysis
Before starting to build a disaster recovery site, identify the vulnerable aspects of the
system environment and determine the system uptime the business requires, weighing the
cost of a failure against the return on investment (ROI).
Factors Affecting the Business Decision for a Disaster Recovery Site
1 Environmental factors - the likelihood of a disaster such as an earthquake.
2 Replacement time - the time expected to replace critical hardware components or
reconfigure critical software components is longer than the period that would put
your entire business at risk.
3 Recovery time and recovery point - estimating the tolerable recovery time
and recovery point.
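The comparison between the cost of a failure and the cost of a disaster recovery site
can be sketched as a small calculation. The following Python sketch is illustrative only:
the scenario, the probabilities, the recovery times and the cost figures are all invented
assumptions, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """One failure scenario; every figure here is a hypothetical input."""
    name: str
    yearly_probability: float  # chance of the event in a given year (0..1)
    recovery_hours: float      # expected time until service is resumed
    cost_per_hour: float       # business loss per hour of downtime


def expected_yearly_loss(scenarios):
    """Sum of probability * downtime * hourly cost over all scenarios."""
    return sum(s.yearly_probability * s.recovery_hours * s.cost_per_hour
               for s in scenarios)


def dr_site_pays_off(without_dr, with_dr, yearly_dr_cost):
    """True if the loss avoided per year exceeds the yearly cost of the DR site."""
    avoided = expected_yearly_loss(without_dr) - expected_yearly_loss(with_dr)
    return avoided > yearly_dr_cost


# Assumed example: rebuilding from scratch takes 10 days, a standby site 4 hours.
without_dr = [OutageScenario("earthquake", 0.02, 240.0, 50_000.0)]
with_dr = [OutageScenario("earthquake", 0.02, 4.0, 50_000.0)]

print(expected_yearly_loss(without_dr))
print(expected_yearly_loss(with_dr))
print(dr_site_pays_off(without_dr, with_dr, yearly_dr_cost=150_000.0))
```

The tolerable recovery time enters through recovery_hours. A real analysis would also
have to price the tolerable recovery point, that is, the value of the data lost.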
Process
1 Protect the R/3 application host running the enqueue and message services.
This can be achieved by having a standby system available (at a remote site)
that can be started up in the event of a disaster.
2 Protect the database.
The entire database can be replicated, but you have to use a method provided
by the database vendor. This approach is known as hot site backup or standby
database. The products mentioned below follow the concept of replication
transparency. This means that the functionality to achieve replication is built
into the database service instead of having to be coded by the client
applications.
The normal method of building a system from scratch after a disaster requires
several steps. First, the hardware must be procured and configured. The operating system
must then be installed to the pre-disaster OS configuration. Then the database can be
recovered, depending on the availability of logs and offline or online tape backups,
followed by the steps required to rebuild a local cluster based on the business need.
It takes a long time to fully restore the database and roll it forward. Sometimes a
full recovery is not possible.
Campus Clusters
Clustering products can improve the availability of an SAP system by providing fast and
automatic recovery from failures. A cost-effective way of stretching a cluster across a
campus or larger site, up to 10 km, is to use software RAID 1 for the shared disks. In the
campus cluster configuration, the DB and CI packages or resource groups run on the
cluster nodes, and the shared storage system is mirrored from within each cluster
server's OS. The cluster quorum or lock disk is also mirrored. Dual, redundant Fibre
Channel paths are used between the servers and the storage, and FDDI is used for the
cluster IP networks so that they remain in the same IP subnet at the 10 km distance.
Because the configuration depends on software RAID 1, it requires a reliable file system.
Presently such a system is only available on Unix clusters. Windows 2000 clusters with
MSCS cannot support software mirroring of file systems. Microsoft has announced support
for the Veritas file system with clustering, which would allow campus clustering
configurations.
This solution is cost effective because the shared disk systems can be mid-range systems
and only two server nodes are required. Because software RAID 1 is used over a large
distance, a fast storage system with a large cache, along with an optimum layout to avoid
unnecessary I/O delays, helps keep the database response times low.
This solution has some drawbacks, notably the problem of split-brain syndrome.
Metro Clusters
This solution can span city-wide or metropolitan distances (less than 60 km). A metro
cluster is designed for automatic failover in a disaster recovery environment. This
solution gives the highest level of availability that hardware-based clustering can offer
and is in production at many SAP customers.
The important function of this solution is to automatically switch the remote DR storage
system into read/write mode (write-protect turned off) so that the database can properly
fail over in case of a primary site failure.
At least six server nodes are configured in the metro cluster, although more are
allowed: two in the primary data center, two in the disaster recovery center, and two
additional servers in a third location that act as cluster arbitrators. The
arbitrator servers are required because there is no centralized cluster lock disk or
quorum disk when using a split cluster configuration.
The reason for the limit to metropolitan distances is the need to synchronize the disk
write I/O commands. The I/O cycle is complete only when both storage systems have
written the I/O into their cache and acknowledged it.
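A rough calculation shows why synchronous mirroring becomes expensive as the distance
grows. The Python sketch below rests on assumptions that are not in the text: light
travels about 200 km per millisecond in fibre, the primary storage system forwards the
write only after caching it locally, and the cache write times are invented figures.

```python
FIBRE_KM_PER_MS = 200.0  # assumption: signal speed in fibre, about 200,000 km/s


def round_trip_ms(distance_km):
    """Propagation delay for one forward-and-acknowledge round trip."""
    return 2.0 * distance_km / FIBRE_KM_PER_MS


def sync_write_ms(local_cache_ms, remote_cache_ms, distance_km):
    """Synchronous mirroring: the write completes only after BOTH storage
    systems have the I/O in cache and the remote one has acknowledged it.
    Assumes the local write and the remote copy happen one after the other."""
    return local_cache_ms + round_trip_ms(distance_km) + remote_cache_ms


# Assumed cache write time of 0.2 ms on each side.
for km in (10, 60, 1000):
    print(km, "km:", round(sync_write_ms(0.2, 0.2, km), 3), "ms per write")
```

Under these assumptions the propagation delay is a small part of each write at 10 km,
but it dominates at 1000 km, and every database write pays it.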
Split Brain Syndrome
With geographically split data centers, the communication links between the cluster
nodes may go down while the cluster nodes themselves keep functioning. In this case each
cluster node thinks the other is down because the cluster heartbeat cannot make
contact with the remote server node, and each attempts to take over the shared resources,
resulting in a data integrity problem. This is the reason to have arbitration servers in
the third data center: they ensure membership consistency in the cluster.
The arbitration servers can be workstations running other applications. They simply must
run a small background task that acts as the arbitrator whenever cluster communication
or failure events occur.
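The role of the arbitrators can be illustrated with a simple majority rule. The Python
sketch below is a simplification and not the algorithm of any particular cluster
product. It assumes that a partition may keep or take over the shared resources only if
it can reach a strict majority of all configured nodes. The node names are hypothetical.

```python
def has_quorum(partition, cluster_size):
    """A partition may run the shared resources only with a strict majority."""
    return len(partition) > cluster_size / 2


# Hypothetical node names for a six-node layout with two arbitrators.
PRIMARY = {"prim-1", "prim-2"}
DR = {"dr-1", "dr-2"}
ARBITRATORS = {"arb-1", "arb-2"}
CLUSTER_SIZE = len(PRIMARY | DR | ARBITRATORS)

# Case 1: the inter-site link fails, but the arbitrators still reach the
# primary site. Only that side holds a majority (4 of 6 nodes).
print(has_quorum(PRIMARY | ARBITRATORS, CLUSTER_SIZE))  # True
print(has_quorum(DR, CLUSTER_SIZE))                     # False

# Case 2: the same link failure without arbitrators. Each side sees 2 of 4
# nodes, a tie, so neither side can safely decide that it is the survivor.
print(has_quorum(PRIMARY, len(PRIMARY | DR)))           # False
print(has_quorum(DR, len(PRIMARY | DR)))                # False

# Case 3: the primary site is lost; the DR site plus arbitrators take over.
print(has_quorum(DR | ARBITRATORS, CLUSTER_SIZE))       # True
```

The tie in case 2 is the split-brain risk: without a third party, both halves look
equally entitled to the shared resources.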
DR Clusters with Microsoft Cluster Server
Disaster recovery can be configured using the enterprise storage copy in a Microsoft
Cluster Server environment. The configuration can use two cluster server nodes that are
configured to use the primary storage system. The secondary storage system maintains a
copy of the database volumes, but in an unshared, read-only mode. During normal operation
it is not visible to the server nodes. If the primary storage system fails, the remote or
standby system's disk volumes can be manually set to read-write mode by an IT
administrator.
Continental Clusters
Clusters across greater distances than metropolitan areas can be supported with SAP
clusters, which may be interesting for organizations that need a disaster recovery
solution beyond the immediate geographic region. This protects against environmental
events that affect an entire area, such as hurricanes or earthquakes.
This solution uses an ESCON connection over the WAN to make physical disk copies and can
thus support both continental and intercontinental distances. The longer distances
require the continental cluster solution to employ asynchronous disk I/O. The
I/O is acknowledged as soon as the local disk storage system successfully writes it into
cache, without waiting for the second disk system to acknowledge.
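The consequence of acknowledging early can be shown with a toy model. The Python sketch
below is an illustration under simple assumptions: writes are queued in order, and the
WAN link drains a limited number of blocks at a time. The class and method names are
hypothetical and do not describe a real storage API.

```python
from collections import deque


class AsyncMirror:
    """Toy model of asynchronous disk mirroring over a slow link."""

    def __init__(self):
        self.local = []          # blocks held by the local storage system
        self.remote = []         # blocks that have reached the remote system
        self.pending = deque()   # acknowledged locally, not yet sent

    def write(self, block):
        """Acknowledge as soon as the local system has the block."""
        self.local.append(block)
        self.pending.append(block)
        return "ack"

    def drain(self, max_blocks):
        """Ship up to max_blocks queued blocks to the remote system, in order."""
        sent = 0
        while self.pending and sent < max_blocks:
            self.remote.append(self.pending.popleft())
            sent += 1
        return sent

    def blocks_at_risk(self):
        """Blocks that would be lost if the primary site failed right now."""
        return len(self.pending)


mirror = AsyncMirror()
for block in ("b1", "b2", "b3", "b4", "b5"):
    mirror.write(block)          # the application never waits for the WAN
mirror.drain(max_blocks=3)       # the slow link has only shipped three blocks
print(mirror.remote)             # ['b1', 'b2', 'b3']
print(mirror.blocks_at_risk())   # 2
```

The application sees short write times, but the blocks still queued at the moment of a
disaster define the recovery point of the remote copy.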
The ESCON over WAN links are slower than pure ESCON or Fibre Channel, so full data
replication of entire disk volumes is not feasible in this configuration. Backups and
tape restores are needed to initially synchronize large data volumes and for recovery.
The failover is typically in one direction and so is designed for fewer, but real,
disaster scenarios. It is not an automatic recovery; some manual intervention is required.
Database Failover
Database failover solutions make logical copies of the data. Failover to a standby or
recovery database server can be used effectively as an alternative to HA clustering in
an SAP production environment.
Remote standby database server - A cost-effective way to decrease the recovery time
after a local disaster is to use a remote standby database server. This is based on
sending the log files to a remote server that runs the database in recovery mode. This
solution can be used to recover from disasters, to provide recovery from logical errors,
for fast restores, and for decoupled backups.
In a disaster scenario at the primary data center, the standby database can be recovered
up to the last available log file and set online in read/write mode. This helps reduce
the time needed to restore, and the standby can be used while the primary data center is
being rebuilt.
This solution requires two copies of the database software as well as two database
servers and storage with identical copies of the data, achieved with a full backup and
restore. The remote database can be built with a lower-performance, cost-effective disk
layout.
In a disaster situation the SAP application servers would need to be pointed to the new
database host, or a complete set of SAP application servers with the remote database
server can be built and started. This requires manual intervention to switch the DR
database server out of recovery mode into online read/write mode.
Only the archived or inactive logs are sent to the remote system. If a disaster strikes
the primary database server, the recovery on the remote server can only go up to the last
archived log. The advantage of this solution is its protection against logical errors,
which can destroy the integrity of the primary database. In Microsoft SQL Server this
technique is called log shipping.
One drawback of this solution is that structural changes are not reflected in the logs,
so they are not sent to the standby database. Third-party solutions are available that
track the database changes or DB catalog files for structural changes and provide more
control over the time delay of the log recovery.
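The behaviour of a standby database fed by archived logs, including a deliberate delay
in applying them, can be sketched as follows. This Python sketch is a simplified model
and not the interface of any database product. The class, its methods and the hold_back
parameter are hypothetical, and logs are represented only by their sequence numbers.

```python
class StandbyDatabase:
    """Toy standby that applies archived logs in sequence order."""

    def __init__(self, hold_back=0):
        self.hold_back = hold_back  # newest logs deliberately left unapplied
        self.received = 0           # highest archived log sequence received
        self.applied = 0            # highest log sequence rolled forward

    def receive(self, seq):
        """Accept the next archived log; gaps would break the roll forward."""
        if seq != self.received + 1:
            raise ValueError(f"log gap: expected {self.received + 1}, got {seq}")
        self.received = seq

    def roll_forward(self):
        """Apply received logs, keeping the newest hold_back logs unapplied."""
        self.applied = max(self.applied, self.received - self.hold_back)
        return self.applied

    def activate(self, stop_at=None):
        """Leave recovery mode: apply up to stop_at, or all received logs."""
        target = self.received if stop_at is None else min(stop_at, self.received)
        if target < self.applied:
            raise ValueError("requested stop point has already been applied")
        self.applied = target
        return self.applied


standby = StandbyDatabase(hold_back=2)
for seq in range(1, 6):      # the primary has archived logs 1 to 5
    standby.receive(seq)
    standby.roll_forward()
print(standby.applied)       # 3: logs 4 and 5 are received but held back

# Assumed logical error contained in log 5: stop the recovery before it.
print(standby.activate(stop_at=4))   # 4
```

Changes in the primary's current, not yet archived log never reach the standby, which
matches the limit of recovery to the last archived log. Calling activate() without
stop_at models the disaster case, where everything received is applied.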

Disaster Recovery - Srinivas
