June 2010
Authors
Josef Stelzel, Sr. Developer Evangelist, Microsoft Corporation, jstelzel@microsoft.com
Summary
This paper describes how to implement a high availability solution for SAP applications on Microsoft Windows Server 2008 R2. It is written for developers, technical consultants, and solution architects. This paper introduces the technologies and architecture used, describes various high availability scenarios, and discusses the implementation process. It also contains links to advanced features and technical topics, including disaster recovery methods. Note: Access to some of the linked information, such as SAP Notes available at the SAP Service Marketplace at https://service.sap.com, might be restricted. Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Server, the Windows logo, SQL Server, and Active Directory are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Applies To
SAP NetWeaver 7.0
SAP NetWeaver 2004
SAP Business Suite (mySAP ERP)
SAP Application Server
SAP Replicated Enqueue
SAP System Central Services
Keywords
SAP NetWeaver, disaster recovery, high availability, SAP Application Server, SAP Replicated Enqueue, planned downtime, unplanned downtime, SQL Server 2005/2008 R2, Windows Server 2008 R2
Contact
This document is provided by Microsoft Corporation. Please check the SAP interoperability area at www.microsoft.com/sap and the .NET interoperability area in the SAP Developer Network at http://sdn.sap.com for updates or additional information.
Contents
Applies To
Executive Summary
High Availability Considerations
  Critical application availability requirements
  Classes of availability problems
    Loss of physical resources
    Logical errors and inconsistencies
    Disasters
    Planned downtime
SAP Architecture and Requirements
  SAP NetWeaver and its components
  SAP Application Server architecture
    ABAP system architecture
    Dual-stack system architecture
    Java system architecture
    SAP system single points of failure
Unplanned Downtime Avoidance Strategies
  Hierarchy of high availability solutions
    Data storage protection
    Server protection
    Network high availability
    Application specific configurations
  Simple cluster for a single SAP system
  Using multiple clusters for SAP instances and databases
  SAP Replicated Enqueue
  Multi-SID cluster
  Multi-node cluster
  SAP application servers
  IT infrastructure protection
  Hyper-V host cluster
Planned Downtime Minimization Solutions
  Planning ahead for minimizing planned downtime
    Change management strategy deployment
  Hyper-V Live Migration
Data Inconsistency Protection Solutions
  Logical error reasons
    Database data inconsistencies
    Sabotage and accidental data deletion
    Data loss through viruses and worms
Disaster Recovery Solutions
  SAP system protection in a geographically dispersed cluster
    Storage replication
    Cluster quorum configuration
    Majority Node Set configuration for Windows Server 2003
    File share witness for Windows Server 2003
    Network configuration
  Microsoft SQL Server database log shipping
  Database mirroring with SQL Server 2005/2008 R2
    Asynchronous database mirroring
    Synchronous mirroring with automatic failover in case of error
    SAP database mirroring configurations
Executive Summary
Business applications are central to a corporate IT operation. Corporate business processes are supported by software solutions that help to better plan, process, or communicate in all business-related tasks. Consequently, any service failure has an immediate and direct impact on corporate business results. This often decreases revenue and can damage the corporate image. This is especially true for SAP applications, as corporations increasingly depend on a productive IT environment. Enterprise Service Architecture (ESA) and the global network of interacting companies have increased both uptime requirements and the number of IT components that are ultimately needed to fulfill business requirements. As an increasing number of companies join global networks, a computing service is always in use in some time zone. While centralized application systems such as SAP R/3 were used in the past, ESA orchestrates the use of service providers to accomplish a larger task. Those services can be distributed inside or outside a company, and they need to be available. High availability of mission critical applications has always been a focus for SAP infrastructures. The starting point for increasing availability has traditionally been to address the loss of a critical hardware resource, which could cause downtime until the computer system is available again. Over time, more solutions have been developed to address other problems, such as downtime due to operating system defects, data inconsistencies, or disasters like earthquakes, floods, or terrorism. Even planned downtime, which is needed to upgrade systems or install patches, conflicts with the requirement to keep an application service consistently available. However, the maintenance performed during planned downtime reduces system vulnerability and increases reliability. This guide describes the solutions that address the various areas of availability for SAP on the Windows platform. 
It helps identify the causes of potential downtime and provides technical strategies to reduce or eliminate them. In addition, this guide provides references to solution descriptions that help the reader understand the technology and quickly find assistance. Microsoft has a long history of providing a comprehensive portfolio of solutions for protecting enterprise class applications such as SAP. Microsoft Windows Server 2008 R2 offers more functionality than previous versions in clustering, geographic distribution, and operating system security. Improved network configuration functionality, performance enhancements, and storage subsystem management included with Windows Server 2008 R2 make it easier to work with the latest technology from hardware partners. As a central component of Windows Server 2008 R2, high availability makes managing the complexity of modern infrastructures both effective and affordable.
SAP on Windows Server 2008 R2 - High Availability Reference Guide
Logical errors and inconsistencies
While many computer system hardware failures directly and immediately cause unplanned application downtime, other problems can cause logical errors, glitches and spikes during operation, and data inconsistencies. For example, a random memory problem might corrupt a data block, and the corruption might only be discovered the next time this data is accessed. Accidental data deletion and file corruption caused by a computer virus are other examples of logical errors. In many cases, the effect of these problems is not a hardware failure but system degradation. However, since there is no way to predict when the system will use the affected data again, a problem might surface at any time during normal operation. Real application downtime is most often created during the recovery process, such as when restoring the last backup or manually repairing database problems. Since the data exists only once, maintaining data consistency is a crucial part of the availability concept.
Disasters
Disasters such as fire, flood, hurricanes, earthquakes, and terrorism can cause the loss of all IT systems in a data center. Problems of this scale might not be sufficiently addressed by having enough redundant hardware in one location. Besides distributing computer systems geographically so that a critical application can continue to run, typical questions include how to synchronize the data between the different sites and how to plan for the actual event. In addition to a technical solution, the complete solution requires good planning skills and an in-depth knowledge of the applications and organizational requirements. Solutions that address these problems are considered disaster recovery solutions rather than high availability solutions. 
Although disaster recovery solutions might employ typical high availability techniques such as clustering or database mirroring as part of a recovery plan, they clearly have a different scope than solutions that protect against a simple server hardware failure.
Planned downtime
Planned downtime occurs at an intended and often appropriate time, most likely at a time of low application usage. It is typically scheduled for server and software maintenance, upgrades or migrations, and changes to or testing of critical configurations. Ironically, this maintenance helps to improve computer system stability and security by eliminating known problems and by maintaining system resources. However, this maintenance does require application downtime. This white paper describes some architectural concepts that can help to minimize or eliminate planned downtime for typical SAP application tasks. However, planned downtime is still driven mainly by planning and change management processes. Therefore, it might not be possible to eliminate all planned downtime.
Note: To avoid user interruption, all dependencies must be protected as well as the primary application services. If a productive system is integrated into an IT infrastructure, this infrastructure is also critical, as is any system that provides data to or consumes data from the productive system. Any downtime of these dependent systems will interrupt the primary application services as well.
Availability level | Application
Standard | Application services and infrastructure components that can fail for a short period, typically one to two days, without business impact. Standard often also implies a minimum level of protection, such as reliable servers, hot-pluggable components, and so on.
High availability | Applies to applications that need to be available even when a critical hardware resource is lost. Logical errors and loss or inconsistency of data need to be addressed, and a planned procedure for making the application service available again in case of a problem must exist. The duration of the outage has to be minimized.
Mission critical | The application service is absolutely critical for the business processes. A service loss, even for a short period of time, might have a high financial impact on the company. All measures for protection must be taken.
Table 1. Service level agreements
Availability measures
To measure and quantify computer system or application availability, the following formula is used:
Availability = 100% * achieved availability / planned availability
Availability is defined as the percentage of the planned time during which the application was available for its intended purpose. Availability values such as 99.999% are often used in marketing as an indicator of solution quality. The following table shows the maximum possible downtime for various typical availability values.
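Maximum possible downtime (per 365-day year)
Availability | Days | Hours | Minutes
95% | 18.25 | 438 | 26280
98% | 7.3 | 175.2 | 10512
99% | 3.65 | 87.6 | 5256
99.9% | 0.365 | 8.76 | 525.6
99.99% | 0.0365 | 0.876 | 52.56
99.999% | 0.00365 | 0.0876 | 5.256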
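The downtime values follow directly from the availability formula. The following Python code is a minimal sketch that computes the maximum yearly downtime for a given availability percentage. It assumes a 24x7 service whose planned availability is a 365-day year (8,760 hours). The function names are illustrative only.

```python
# Minimal sketch: derive maximum yearly downtime from an availability target.
# Assumption: planned availability is 24x7 over a 365-day year (8,760 hours).
HOURS_PER_YEAR = 365 * 24


def availability_pct(achieved_hours: float, planned_hours: float) -> float:
    """Availability = 100% * achieved availability / planned availability."""
    return 100.0 * achieved_hours / planned_hours


def max_downtime(availability: float) -> dict[str, float]:
    """Return the maximum downtime per year (days, hours, minutes) for a target."""
    unavailable_fraction = (100.0 - availability) / 100.0
    hours = unavailable_fraction * HOURS_PER_YEAR
    return {"days": hours / 24, "hours": hours, "minutes": hours * 60}


if __name__ == "__main__":
    for target in (95, 98, 99, 99.9, 99.99, 99.999):
        d = max_downtime(target)
        print(f"{target:>7}%  {d['days']:>8.5f} d  {d['hours']:>8.4f} h  {d['minutes']:>9.3f} min")
```

Under these assumptions, a 99.999% target leaves roughly 5.3 minutes of downtime per year. Every planned maintenance window counts against that budget unless it is excluded from the planned availability.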
Guest clustering
Hyper-V also makes it possible to configure a WSFC between two VMs so that the cluster service runs in the guest operating system. One advantage of this configuration is that an entire test lab for cluster services can exist on one physical server. Because only one physical server is required, this configuration reduces costs. However, although the VMs of a guest cluster could be located on a single physical server, this setup creates problems when the high availability of an application inside the cluster is important. If both VMs of a guest cluster reside on one server, the application cannot survive the failure of that server. Therefore, a configuration with the VMs located on two physical servers is required for high availability. Note that when using guest clustering, the storage used for the cluster disks is restricted to iSCSI. The following figure shows the configuration of a Hyper-V guest cluster on a single physical server and on two standalone physical servers.
Figure 1
More information about support for SQL Server in a guest cluster environment can be found at: http://support.microsoft.com/kb/956893 A detailed description for how to configure a Hyper-V guest cluster can be found at: http://blogs.technet.com/b/mghazai/archive/2009/12/12/hyper-v-guest-clustering-step-bystep-guide.aspx
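The placement rule for guest clusters can be expressed as a simple check. The following Python sketch is hypothetical. It assumes a mapping from guest cluster VM names to the physical Hyper-V hosts they run on, and it considers only host placement. Quorum configuration and iSCSI storage availability are deliberately ignored.

```python
# Hypothetical sketch: can a Hyper-V guest cluster survive the loss of one host?
# Assumption: vm_to_host maps each guest cluster node (VM) to its physical host.
# VM and host names are illustrative; quorum and storage are not modeled.


def survives_single_host_failure(vm_to_host: dict[str, str]) -> bool:
    """True if losing any single physical host still leaves a cluster node running."""
    if len(vm_to_host) < 2:
        return False  # a one-node cluster has no failover target
    distinct_hosts = set(vm_to_host.values())
    # If all VMs share one host, that host is a single point of failure.
    return len(distinct_hosts) >= 2


if __name__ == "__main__":
    test_lab = {"SAPCLU-N1": "HV-HOST1", "SAPCLU-N2": "HV-HOST1"}
    production = {"SAPCLU-N1": "HV-HOST1", "SAPCLU-N2": "HV-HOST2"}
    print("test lab:", survives_single_host_failure(test_lab))
    print("production:", survives_single_host_failure(production))
```

The single-server test lab fails this check by design. That is acceptable for testing but not for a production SAP system that requires high availability.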
For user and information integration, SAP NetWeaver uses the SAP Enterprise Portal (EP) and SAP Business Warehouse (BW). Data is also integrated by SAP Master Data Management (MDM). By using the SAP Mobile Infrastructure, user integration can be extended to a wide variety of remote devices. Process integration is performed by SAP Process Integration (PI), formerly known as SAP Exchange Infrastructure (XI). Enterprise service architectures are made possible by the integration of people, information, and processes, and are the foundation of a new breed of applications. Composite applications are assembled from a variety of individual functions already available in the application infrastructure, and they demonstrate how to develop faster and more flexible solutions for future business requirements.
A downside of this flexibility is that a service depends on a greater number of infrastructure components to be available. Note: Because business applications are composed of enterprise services, all service providers in use must have the same level of protection to make the composite service highly available. The following figure shows the SAP NetWeaver platform from a technical perspective to illustrate how high availability could be implemented. Besides the SAP AS for the Enterprise Portal, Master Data Management, Business Warehouse, Process Integration, or Mobile Infrastructure, there are also standalone engines and surrounding support systems. While the Internet Transaction Server has largely been replaced by the Internet Communication Manager, a component of the SAP AS, there are often standalone gateways for RFC communication or the TREX search engine. Typically, SAP NetWeaver is supplemented by an installation of SAP Solution Manager, SAP NetWeaver Administrator, and the SAP NetWeaver development environment.
layer for the business logic coded in ABAP or Java, they are required for the fulfillment of the business process, which in turn creates high availability requirements. Solutions for optimized availability are supported by the SAP AS architecture, but they always depend on additional components. These include redundant servers and the monitoring and control processes that are typically provided by high availability clusters. Before going deeper into the SAP AS architecture, its general features should be discussed. Every SAP system consists of at least one central database and a central SAP instance that provides unique services for the SAP system. If the SAP system requires more transactional performance, additional application servers can be added. A SAP system is identified by a unique System Identifier (SID) and might consist of many SAP instances and the common database. Depending on the type of application, a SAP AS can be installed for ABAP, Java, or both workload types, as shown in the following figure:
ABAP system architecture
The layout and structure of SAP AS 7.00 changed from version 6.40. The following figure shows the structure of a pure ABAP system up to SAP AS version 6.40.
This figure shows two instance types: a central instance and one or more dialog instances. Processes such as Dialog, Batch, Update, Spool, or the Dispatcher process exist many times in a SAP system and are therefore redundant. Each ABAP instance also has one gateway process configured that is used for communication through the Remote Function Call (RFC) protocol. In addition, each instance has its own Internet Communication Manager (ICM) process for HTTP-based communication. The Internet Graphics Server (IGS) only supports the creation of bitmaps for browser-based clients. To register all the instances of a SAP system and to support the communication between the various components of a distributed SAP system, a single message server is configured in the central instance. Also unique in the SAP system is the central Enqueue server, which manages the lock entries of a distributed SAP system in a lock table in the shared memory of the server. Because of these two unique processes, this installation was called the central instance. A central instance is the minimal working configuration of a SAP system, and performance can be extended by adding more application servers. The directory structure of this SAP AS 6.40 installation is shown in the following figure.
All profiles and executables of a distributed SAP system are made available from the central instance to all dialog instances through the SAPMNT share. To support a simple patch process for executables, there is one master copy of the executables on the central instance. Every time a dialog instance starts, the SAP utility SAPCPE checks whether a newer executable version is available. If so, the executable is copied to the local runtime directory of the AS before it is used. Changes to ABAP reports in the SAP system are distributed by using the transport system. SAP systems can be configured as members of a transport domain. For each transport domain, there is one directory that is shared by all members of the domain: <Drive>:\usr\sap\trans. Because this shared directory is central, it can be considered a single point of failure for the operation of more than one SAP system. With the introduction of SAP AS 7.0, there was a major change in the layout of the central instance. Similar to the structures in pure Java systems, the unique Message and Enqueue server processes have been moved to a separate SAP instance: the ABAP System Central Services (ASCS) instance. As a result, a typical AS no longer has any system-wide functions. The following figure shows the SAP landscape simplification:
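The copy-if-newer behavior described for SAPCPE can be illustrated with a short sketch. This is not the actual SAPCPE implementation. The Python function below is hypothetical. It compares file modification times between an assumed central executable directory and an instance's local runtime directory.

```python
# Conceptual sketch only (not SAPCPE's real algorithm): copy an executable from
# the central master directory to the local runtime directory when the central
# copy is newer or the local copy is missing. Directory names are hypothetical.
import shutil
from pathlib import Path


def sync_newer_executables(central_dir: Path, local_dir: Path) -> list[str]:
    """Copy files from central_dir to local_dir if newer; return the copied names."""
    local_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in central_dir.iterdir():
        if not src.is_file():
            continue
        dst = local_dir / src.name
        # A missing local file or an older local timestamp triggers a copy.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # copy2 also preserves the source timestamp
            copied.append(src.name)
    return sorted(copied)
```

Because `shutil.copy2` preserves the modification time, a second run without changes to the central copy copies nothing. A patch to the master copy is picked up on the next instance start.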
The file system of an ASCS instance is shown in the following figure:
SAP installations of SAP AS 7.0 consisting of an ASCS instance and a dialog instance will continue to use the name format D<instnr> for the instance directory. This combined installation structure is shown in the following figure:
Regardless of this combined installation structure, the Enqueue and Message server processes are now in the ASCS instance. This naming convention was not changed, for compatibility with older versions.
Dual-stack system architecture
With the introduction of J2EE as a possible SAP system component in version 6.40, SAP AS can be installed for ABAP, Java, or both types of workloads. There is a considerable difference in architecture between the ABAP and Java platforms, as seen in the following figure:
As shown in this figure, both the ABAP and the Java parts of the SAP AS have their own Message and Enqueue servers as critical components. The Java AS is primarily made up of Java server processes. The Software Deployment Manager (SDM) is used to install and manage software versions. The server operating system must also have a Java Development Kit (JDK) installed that provides the Java virtual machine (JVM). The JDK for Windows is available from Sun Microsystems. In ABAP Application Server version 6.40, the Enqueue and Message servers are still part of the central instance, while the Java AS always uses the System Central Services (SCS) instance concept. This means that every 6.40 dual-stack system must consist of at least two instances. As with the pure ABAP configuration, the dual-stack system still has a central database that separates the respective application data types by using a schema. In the dual-stack structure, the ABAP and Java functions are shut down together, as a single instance. The parameters for both are also configured in a single instance profile. The Java SCS instance in this installation is a complete unit and has its own profile. It can be started or stopped independently. To maintain distributed installations of SAP instances, all profiles and executables of the SAP system are shared through a network share on one central server. This server is typically the server that holds the central instance or the SCS instance. Altogether, the dual-stack directory structures are naturally more complex than those of a pure ABAP or Java instance. The SAP AS 6.40 dual-stack system structure is shown in the following figure:
As seen previously with the pure ABAP AS, the system structure for dual-stack systems became simpler with the introduction of SAP AS 7.0. The otherwise identical application server instances differ only in the Software Deployment Manager (SDM), which is installed in only one instance. The SDM is required to install and patch Java programs and is needed only when new programs are installed or during software maintenance. Therefore, there is no need to configure the SDM in a cluster solution. To secure SDM service availability, a backup copy can be installed on any AS when needed. As of SAP AS version 7.10, the SDM is completely removed from the installation and replaced with the new Java Support Package Manager (JSPM) function. Software maintenance functionality then becomes an integrated part of every AS and is therefore redundant. The following figure shows the SAP AS 7.0 dual-stack system structure. As shown in the figure, the ABAP Message server and Enqueue server have now been moved to a new, separate ASCS instance, which simplifies the dual-stack SAP AS setup.
The following figure shows the SAP AS 7.0 dual-stack directory structure with an ABAP, Java, and a SCS instance. In this file system layout, there is a clear distinction of the different components described.
Java system architecture
In addition to the installation variants for ABAP and dual-stack systems, there is a third option: a pure Java system. In this case, there is no difference between SAP AS versions 6.40 and 7.0. This configuration typically uses a SAP Web Dispatcher or a hardware load balancer to distribute the HTTP connection load. The Java system application servers are all constructed in the same way, except that one instance also has the Software Deployment Manager (SDM). This functionality, however, is removed in SAP AS version 7.10 and no longer appears in subsequent Java system versions. The following are the three main components of a Java system:
- The central Enqueue and Message services used by all Java instances
- Java server and dispatcher processes to handle the workload
- A central database for persistent data storage
Multiple J2EE instances placed on several physical servers create a Java cluster. The basic rule is that a Java instance can only be configured once per physical server. At a minimum, a Java instance consists of one Java server process and a dispatcher, but it can also have multiple Java server processes. The central SCS instance might also be placed together with a regular Java instance on one physical server. Similar to the ABAP installation, the profiles and executables of a distributed Java system reside on one physical server and are shared from there. Because of the central character of these files, this is the server that holds the SCS instance of the SAP system.
SAP system single points of failure
Single points of failure are SAP system elements that are critical for operating the system and must be protected against loss or failure to make the SAP system highly available.
Single points of failure assessment
As mentioned previously, every SAP system has the following central components that are required to be available at all times:
- A central database for data storage (one per system)
- Separate Message servers and Enqueue servers for ABAP and Java systems
- The SAPMNT share for the profiles, executables, and Java Secure Store files of a SAP system (one SAPMNT share per SAP system)
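The assessment above amounts to finding every component that exists only once. The following Python sketch is illustrative only. The `Component` class, the instance counts, and the example inventory of an unprotected dual-stack system are assumptions, not an SAP data model.

```python
# Illustrative sketch: flag single points of failure in a hypothetical inventory.
# Component names and counts are assumptions for an unprotected dual-stack system.
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    instances: int  # number of independent copies that can serve requests


def single_points_of_failure(components: list[Component]) -> list[str]:
    """Return the names of components without a redundant copy."""
    return [c.name for c in components if c.instances < 2]


inventory = [
    Component("database", 1),
    Component("ABAP message server", 1),
    Component("ABAP enqueue server", 1),
    Component("Java message server", 1),
    Component("Java enqueue server", 1),
    Component("SAPMNT share", 1),
    Component("dialog instances", 3),  # redundant: several application servers
]

if __name__ == "__main__":
    for name in single_points_of_failure(inventory):
        print("SPOF:", name)
```

Only the dialog instances drop out of the list, because adding application servers already makes them redundant. The remaining components are the ones that need a cluster or another protection technology.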
The purpose of the database is to provide persistent data storage for SAP system data and the runtime environment. Databases use a series of internal mechanisms to guarantee the ACID properties (Atomicity, Consistency, Isolation, and Durability). These mechanisms ensure data consistency at all times. For example, one mechanism logs all changes executed during a transaction. If a database operation fails in the middle of a transaction, the log is used to restore the previous state. The transaction log can also be used to reapply transactions to a database image. For example, a database image restored from a backup would not reflect the latest transactional state, since transactions have most likely been executed after the backup was created. Without a transaction log to reapply them, these latest transactions would be lost by the restore. Databases are central application components and are often protected by high availability clusters or other technologies such as Microsoft SQL Server 2005/2008 R2 Database Mirroring. High availability clusters provide server redundancy by accessing the same database image from two servers (shared disk). Database mirroring, on the other hand, maintains a physically independent copy of the critical data. The main purpose of all these technologies is to protect the database service against loss, since it is the most critical component of a SAP system. The SAP system Message Server registers all SAP system instances and balances user load by connecting new users to the most available server in the system. If the Message Server goes offline, existing connections remain intact, but no new connections can be established. This makes the Message Server an ideal candidate for a cluster solution. The Enqueue Server is part of the SAP lock concept. The purpose of the SAP lock concept is to synchronize data access in order to protect the consistency of SAP data objects. This is one of the most important functions of a SAP system. It keeps SAP data consistent by not allowing two users to change the same data object at the same time. Instead, the data object is locked for the first user. 
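The lock behavior described above is simple: the first user obtains the lock, and a second user is rejected. It can be sketched as a small in-memory table. This is a minimal Python sketch under simplifying assumptions. The class name `LockTable`, its methods, and the object key format are hypothetical. The lock modes, shared memory access, and network requests of the real Enqueue server are not modeled.

```python
# Minimal sketch of a central lock table in the spirit of the SAP Enqueue server.
# Assumptions: object keys and user names are hypothetical strings; real Enqueue
# lock modes, shared memory, and network access are not modeled.
from typing import Optional


class LockTable:
    def __init__(self) -> None:
        self._locks: dict[str, str] = {}  # data object key -> lock owner

    def enqueue(self, obj: str, owner: str) -> bool:
        """Grant the lock if the object is free or already held by the same owner."""
        holder = self._locks.get(obj)
        if holder is None or holder == owner:
            self._locks[obj] = owner
            return True
        return False  # another user holds the lock, so the request is rejected

    def dequeue(self, obj: str, owner: str) -> None:
        """Release the lock, but only if the caller actually holds it."""
        if self._locks.get(obj) == owner:
            del self._locks[obj]

    def holder(self, obj: str) -> Optional[str]:
        """Return the current lock owner, or None if the object is not locked."""
        return self._locks.get(obj)


if __name__ == "__main__":
    table = LockTable()
    print(table.enqueue("MATERIAL/4711", "USER_A"))  # first user requests the lock
    print(table.enqueue("MATERIAL/4711", "USER_B"))  # second user requests it too
```

The entire table lives in the memory of one process. Losing that process therefore discards every lock entry at once, which is why the Enqueue server is a single point of failure.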
The Enqueue Server in the following figure consists of a work process and a lock table in the shared memory of the server; the lock table stores the lock information for the entire SAP system. In distributed systems, the Enqueue work process is needed to insert or verify lock information on behalf of the dialog instances. Local work processes can access the lock table directly and do not need the Enqueue work process. If, however, the lock table is lost through a server failure, lock information can no longer be verified. In a distributed system, this forces a reset and rollback of all pending transactions, even on dialog instances that could otherwise continue working, and all session contexts are lost. An example of a SAP AS 6.40 ABAP system with a single point of failure (SPOF) is shown in the following figure:
This figure shows only the critical SAP AS components and is therefore not complete. Another critical point resides in the file systems and network shares of the SAP installation on Windows. The SAP system executables and profiles are always installed together with the central instance or, in newer systems, with the SCS instance. Access to these files is provided through the SAPMNT share, which exists only once per SAP system. Before an instance starts, the SAP program SAPCPE copies the executables from this share to the local machine to improve the stability of the SAP instance. The profiles, however, are read only through this share. The following figure shows the infrastructure of two servers: Server Alpha hosts the central instance and therefore the SAPMNT share, and Server Beta is a SAP application server. Both servers have a SAPLOC share that is used to access the local environment of a SAP instance. Both servers also have two environment variables, SAPGLOBALHOST and SAPLOCALHOST, from which the UNC names \\SAPGLOBALHOST\sapmnt and \\SAPLOCALHOST\saploc are derived. The SAP kernel uses these names to locate the SAP system profiles and system directories. On Server Alpha, both variables are set to the name of the local server, so all access points are local. On Server Beta, however, SAPGLOBALHOST points to the central server. SAPLOCALHOST, which is used for all instance-specific operations, points to Server Beta itself, so these operations again use a local path.
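The derivation of the UNC names can be illustrated with a short Python sketch. The function names and the server names ALPHA and BETA are hypothetical. The sketch only shows how the two variables select a local or a remote path; it is not how the SAP kernel resolves them internally.

```python
def sap_unc_paths(env: dict[str, str]) -> dict[str, str]:
    """Derive the global and local UNC paths from the two SAP variables."""
    global_host = env["SAPGLOBALHOST"]
    local_host = env["SAPLOCALHOST"]
    return {
        "global": f"\\\\{global_host}\\sapmnt",  # \\<SAPGLOBALHOST>\sapmnt
        "local": f"\\\\{local_host}\\saploc",    # \\<SAPLOCALHOST>\saploc
    }


def is_local(unc: str, computer_name: str) -> bool:
    """A UNC path is local if its host part names this computer."""
    host = unc.lstrip("\\").split("\\", 1)[0]
    return host.upper() == computer_name.upper()


# Hypothetical variable settings for the two servers described above.
servers = {
    "ALPHA": {"SAPGLOBALHOST": "ALPHA", "SAPLOCALHOST": "ALPHA"},
    "BETA": {"SAPGLOBALHOST": "ALPHA", "SAPLOCALHOST": "BETA"},
}

for name, env in servers.items():
    for kind, unc in sap_unc_paths(env).items():
        print(name, kind, unc, "local" if is_local(unc, name) else "remote")
```

On BETA, only the global path is remote. This is why the SAPMNT share on the central server is a single point that must be protected.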
The directory structure described above and the SAPMNT share must be protected within a high availability solution because of their central importance for the SAP system. Since access to the UNC path is derived from the variable SAPGLOBALHOST, these files are also called global files. Prior to SAP AS version 7.0, the central instance of all ABAP or ABAP + Java systems was protected in a cluster. The reason was simple: it was not possible to separate the Enqueue and Message servers from the rest of the SAP central instance. Because the central instance, including the Enqueue server and the Message server, was clustered, the global files and the SAPMNT share were implicitly protected in the cluster as well. With the development of the SAP Standalone Enqueue, it became possible for the first time, in SAP AS 6.40, to extract the central components Message server and Enqueue server into a separate instance. This significantly simplified the cluster configuration for critical SAP services. In version 6.40, SAP System Central Services (SCS) were introduced only for Java-based systems; with SAP AS 7.0, SCS configurations also became available for ABAP-based systems. These configurations are called ASCS instances. Note: All high availability configurations of SAP systems today are based on the SCS instances, the protection of the SAPMNT share, and the global files in a failover cluster.
One of the main benefits of this configuration, compared to protecting a complete central instance in older versions, is that only two relatively lightweight services need to be moved and restarted. SCS instances lead to shorter failover times and more stability in the cluster implementation. Since no SAP users are connected to a SCS instance, a failover also has a much smaller effect on the SAP system. Using the SAP Replicated Enqueue in addition to SCS high availability configurations enables enterprises to minimize application server interruptions. For more information, see the Measures to Avoid Unplanned Downtime section. The following overview shows which configuration is supported by which version:
ABAP: Up to version 6.40, the central instance is clustered. During an upgrade of an existing 6.40 central instance to 7.0, the established architecture remains intact; SAP has documented the migration steps to the new ASCS structure in SAP note 1011190. When SAP AS 7.0 is installed initially, only the SCS/ASCS instance is clustered.
Java: Since SAP AS 6.40 SR1, only the SCS instance is clustered. No changes are needed when upgrading to 7.0.
ABAP + Java: Since SAP AS 6.40 SR1, the Java SCS instance is clustered together with the ABAP central instance. With a new installation of SAP AS 7.0, only the ASCS instance and the SCS instance are clustered.
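The version rules above can be summarized as a lookup table. The following Python sketch is a hypothetical encoding for illustration only. The dictionary keys and release labels are assumptions. Combinations that the text does not state, such as a dual-stack system upgraded from 6.40 SR1 to 7.0, are deliberately left out and raise an error rather than being guessed.

```python
# Hypothetical encoding of which instances are clustered, keyed by
# (stack, release or installation path). Only combinations stated in the text.
CLUSTERED: dict[tuple[str, str], frozenset[str]] = {
    ("ABAP", "6.40"): frozenset({"central instance"}),
    # The architecture stays intact after the upgrade; SAP note 1011190
    # documents how to migrate to the new ASCS structure.
    ("ABAP", "7.0 upgraded from 6.40"): frozenset({"central instance"}),
    ("ABAP", "7.0 new installation"): frozenset({"SCS/ASCS instance"}),
    # Since 6.40 SR1, only the SCS instance is clustered for Java.
    ("Java", "6.40 SR1"): frozenset({"SCS instance"}),
    ("Java", "7.0 upgraded from 6.40 SR1"): frozenset({"SCS instance"}),
    ("Java", "7.0 new installation"): frozenset({"SCS instance"}),
    ("ABAP + Java", "6.40 SR1"): frozenset({"Java SCS instance", "ABAP central instance"}),
    ("ABAP + Java", "7.0 new installation"): frozenset({"ASCS instance", "SCS instance"}),
}


def clustered_instances(stack: str, release: str) -> frozenset[str]:
    """Return the instances protected in the cluster for a stack and release."""
    try:
        return CLUSTERED[(stack, release)]
    except KeyError:
        raise ValueError(f"combination not covered: {stack}, {release}") from None


print(sorted(clustered_instances("ABAP + Java", "6.40 SR1")))
```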
SAP Enqueue process special requirements
The dependency on the single lock table was not completely resolved by the introduction of the SAP Standalone Enqueue. To address this issue, the SAP Standalone Enqueue was combined with a SAP Replicated Enqueue on another server. The following figure shows the new system design, in which SCS instances host the central and critical SAP system services: one SCS instance for ABAP and an independent SCS instance for Java. Other components of the SAP system, such as the SAP AS instances that handle the user workload, are not considered critical, because they are implicitly redundant when more than one is configured in the system. Because of this redundancy, only the SCS instances require high availability protection measures.
Combining the SAP Standalone Enqueue in the SCS instance with a SAP Replicated Enqueue running on a second server allows the lock table to be replicated continuously. In a larger SAP system with several SAP application servers, this keeps interruptions to a minimum, even when the central services must be moved to another server because of a hardware failure. The SAP Replicated Enqueue can only be used for lock table replication and cannot function as a regular Enqueue server of a SAP system. The lock table in the SAP Replicated Enqueue, which holds the replicated lock entries, cannot be used directly for the Enqueue service. During a failover of the Enqueue server to the replication site, the standard Enqueue process is started first and creates a new, empty lock table. The replicated data in the shadow lock table is then read and transferred into this new lock table before the system is operational again. The following figure shows the configuration of several SAP application servers and a central instance in combination with a SAP Replicated Enqueue:
A SAP Replicated Enqueue should be combined with a high availability cluster solution. One reason is to enable the administrator to switch the Message server from one server to another in case of a severe failure. Another reason is that the SAP Replicated Enqueue is not a fully functional Enqueue server; it is only used to replicate the lock table. During regular operation, the SAP Replicated Enqueue only inserts lock requests into a standby lock table on a second server. If the original server fails, the normal Enqueue Server must fail over to this server and resume work with the replicated lock table. High availability cluster solutions are also used to protect the database against hardware failures. Since the Message and Enqueue servers in the SCS instance require very few resources on a server within a high availability cluster, additional local application servers can be installed on the cluster nodes. In this context, local means that they are not managed by the cluster and are lost if the respective server hardware fails.
SAP on Windows Server 2008 R2 - High Availability Reference Guide
The SAP Web Dispatcher
The SAP Web Dispatcher is a software load balancer for HTTP or HTTPS connections. Typically, it is installed in a demilitarized zone (DMZ) between the SAP back-end systems and public Internet access. The SAP Web Dispatcher passes connection requests from the Internet to the available SAP application servers in round-robin fashion. The routing algorithm takes the capacity and load of the individual instances into account to determine which server receives a connection: for ABAP instances, the number of configured dialog work processes is evaluated; for Java instances, the number of available server processes determines which server receives the next connection request. The architecture of the SAP Web Dispatcher corresponds to that of the SAP Internet Communication Manager (ICM), which is a component of every ABAP instance. An ICM forwards incoming connection requests directly to a dialog work process of an ABAP instance or to the Java dispatcher of a dual-stack installation, whereas the SAP Web Dispatcher first passes those requests to the ICM of a SAP instance, which in turn processes the request further. The SAP Web Dispatcher therefore acts as a software router for incoming HTTP requests. Because of its central function for Internet communication, the SAP Web Dispatcher is also a critical component in a system landscape and needs to be protected against hardware failures. Since the SAP Web Dispatcher looks like a SAP ABAP instance, it can be integrated relatively easily into a high availability cluster solution. The typical structure of a SAP landscape using a Web Dispatcher is shown in the following figure:
Unlike most high availability installations, the installation of a SAP Web Dispatcher in a cluster is not supported by SAPINST. SAP note 834184 describes in detail how to configure a WSFC for the SAP Web Dispatcher manually. SAP notes can be downloaded from the SAP Service Marketplace at https://service.sap.com. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. Additional SAP Web Dispatcher administration information is available at http://help.sap.com. To access this information, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Open Technical Operations Manual for SAP NetWeaver. Open Administration of Standalone Engines. Follow the SAP Web Dispatcher link.
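The capacity-based distribution of connection requests by the SAP Web Dispatcher, described above, can be approximated with a weighted round-robin. The following Python sketch uses the smooth weighted round-robin technique as a stand-in; it is not the Web Dispatcher's actual algorithm. It assumes that each instance's weight is its number of configured dialog work processes (ABAP) or server processes (Java), and the instance names and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    weight: int       # assumed: dialog work processes (ABAP) or server processes (Java)
    current: int = 0  # running score used by the weighted round-robin


def pick(instances: list[Instance]) -> Instance:
    """Smooth weighted round-robin: each instance receives requests in
    proportion to its weight, interleaved rather than in bursts."""
    total = sum(inst.weight for inst in instances)
    for inst in instances:
        inst.current += inst.weight
    chosen = max(instances, key=lambda inst: inst.current)
    chosen.current -= total
    return chosen


# Three hypothetical instances; app1 has twice the capacity of the others.
pool = [Instance("app1", 10), Instance("app2", 5), Instance("app3", 5)]
print([pick(pool).name for _ in range(8)])
# ['app1', 'app2', 'app3', 'app1', 'app1', 'app2', 'app3', 'app1']
```

Interleaving the picks, rather than sending ten requests in a row to app1, avoids short load spikes on the largest instance.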
SAP standalone gateway
The SAP gateway enables SAP systems and external programs to communicate with one another. The communication protocol is the Common Programming Interface Communication (CPI-C), which is also used by the Remote Function Call (RFC) interface. Consequently, all RFC connections in a SAP system rely on the SAP gateway process. By default, each SAP AS has one gateway process configured. In certain cases, it is also possible to configure a standalone gateway process. One example is a standalone gateway for the System Landscape Directory (SLD). Because the SLD is a component of the Java AS, the standalone gateway acts as a bridge that allows ABAP systems to read and write data in the SLD through RFC. Another typical use case is the installation of a standalone gateway on a database instance without an ABAP engine. In this case, the gateway is needed to make the DBA planning calendar (transaction DB13) functional. This configuration is also described in SAP note 657999. The simplest way to configure a standalone gateway in a failover cluster is to add the gateway to the Enqueue and Message server processes of a SCS instance. This configuration is described in SAP note 1010990.
TREX
TREX is an abbreviation for Text Retrieval and Extraction; it is a search engine designed to search structured and unstructured data. TREX provides SAP applications with numerous services for searching, classifying, and text mining in large document collections or unstructured data. In addition, TREX provides SAP applications with services for searching and aggregating business objects or structured data. This search engine runs as a standalone engine in combination with the SAP Enterprise Portal (EP) or the Knowledge Management (KM) application. For access to the TREX search engine, each SAP AS has ABAP or Java components that support communication with the engine.
The simplest form of a TREX installation is shown in the following figure:
TREX is one example of a SAP solution that does not rely on a standard SAP AS but runs on its own server architecture. TREX installations can also be implemented as master/slave configurations spanning several physical servers. Additional information about the distribution and implementation of a TREX engine is available at http://service.sap.com/instguidesnw70. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. To access this information after logging on to the site, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Open Technical Operations Manual for SAP NetWeaver. Open Administration of Standalone Engines. Follow the Search and Classification (TREX) link.
General information about TREX is available at http://help.sap.com.
SAP liveCache
SAP liveCache is a component of the Advanced Planning and Optimization (APO) application, which supports SAP SCM, the supply chain management solution in the mySAP suite. SAP liveCache is a memory-resident database for rapid data access. The technology is derived from SAP MAXDB, formerly known as SAP DB. In addition to this memory-resident database, each APO system has a normal database for the APO data and programs. To make data objects in the liveCache rapidly accessible during operation, those objects are loaded into the liveCache at startup. Every few minutes, a special logging mechanism writes savepoints to disk; these savepoints do not reflect the current transactional state of the system.
APO systems consist of a SAP AS and a liveCache as a standalone engine. From the perspective of high availability, two solutions are possible to protect the liveCache:
A failover cluster for the APO system and the liveCache. The liveCache is supported in WSFC as of SAP NetWeaver 7.0 SR1.
A hot standby liveCache, in which the database log files are exported to a standby server and continuously applied to a database in recovery mode. This log shipping solution works with two independent servers that do not share common storage.
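The hot standby variant can be illustrated with a minimal log-shipping model in Python. The class and field names are hypothetical, and the sketch ignores I/O, savepoints, and the actual MAXDB/liveCache log format. It only shows the core rule: while it stays in recovery mode, the standby applies shipped log segments strictly in sequence, and it stops accepting logs once it takes over.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyDatabase:
    """Hypothetical standby that replays shipped log segments in recovery mode."""

    data: dict[str, int] = field(default_factory=dict)
    last_applied: int = 0       # sequence number of the last applied segment
    recovery_mode: bool = True  # logs are only accepted while recovering

    def apply_segment(self, seq: int, changes: dict[str, int]) -> None:
        if not self.recovery_mode:
            raise RuntimeError("standby is already online; no more logs can be applied")
        if seq != self.last_applied + 1:
            # A gap means a segment was lost in transit; replaying past it
            # would produce an inconsistent database image.
            raise ValueError(f"expected segment {self.last_applied + 1}, got {seq}")
        self.data.update(changes)
        self.last_applied = seq

    def take_over(self) -> None:
        """Leave recovery mode after the primary fails and start serving requests."""
        self.recovery_mode = False


standby = StandbyDatabase()
shipped = [(1, {"stock": 100}), (2, {"stock": 95}), (3, {"orders": 7})]
for seq, changes in shipped:
    standby.apply_segment(seq, changes)
standby.take_over()
print(standby.data, standby.last_applied)
```

Because primary and standby share no storage, a storage failure on the primary does not affect the standby, at the cost of losing any log segment that had not yet been shipped.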
Additional information about the installation of SAP liveCache and cluster configurations in WSFC is available at http://service.sap.com/instguidesnw70. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. In addition, see the following SAP note about the configuration of liveCache in WSFC at https://service.sap.com/notes. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. SAP note 780795: SAP liveCache 7.5: WSFC Installation
General information about the administration of the SAP liveCache is available at http://help.sap.com. To access this information after logging on to the site, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Open Technical Operations Manual for SAP NetWeaver. Open Administration of Standalone Engines. Follow the SAP liveCache Technology link.
SAP Content Server
The SAP Content Server is an independent server instance for temporary data and Web documents that the SAP AS can request through the Internet. The Content Server makes it possible to maintain large document volumes for cached access. A SAP Content Server can be installed together with a SAP AS on one physical server or as a standalone instance. It can be installed with or without its own database; when it has its own database, MAXDB is typically used. Using one database for both a SAP AS and a Content Server is not supported. To protect the SAP Content Server against loss, it can be configured in a failover cluster.
The following SAP notes located at https://service.sap.com/notes provide information about the installation and clustering of the SAP Content Server. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. SAP note 175096: SAP Content Server installation guide SAP note 1039401: SAP Content Server Clustering with Windows 2003
To access this information, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Open Technical Operations Manual for SAP NetWeaver. Open Administration of Standalone Engines. Follow the SAP Content Server link.
Securing a SAP application against interruption due to the loss of hardware resources generally requires combining several techniques at different levels: data storage, servers, networks, and the application itself. Each level adds to the overall protection of the application.
Data storage protection
The lowest level of the high availability hierarchy concerns how data is stored and made available in a secure and reliable way. Data storage protection is typically already widely implemented in a SAP data center, and most storage devices offer some level of protection by default. However, storage subsystems still pose a number of challenges for a SAP data center. First, the amount of data grows rapidly over time. In addition, the data must be constantly protected against hardware failures, and data access performance must be maintained, because it directly affects overall SAP system performance, which is typically measured by user transaction response time.
SAN infrastructures
A Storage Area Network (SAN) provides a centralized approach to maintaining the storage resources needed in a computer system. Traditionally, Direct Attached Storage (DAS) has been used for the local storage requirements of computer systems. DAS has high space requirements and administration costs. Centralizing data storage in a scalable, network-type architecture lowers administrative costs and uses space more efficiently. SANs can be built on Fibre Channel connections over fiber optic cables, using the SCSI protocol for block-oriented data transfer. In addition, iSCSI devices are now available that use normal TCP/IP networks for transport; the SCSI protocol for the data transfer is encapsulated in TCP/IP. From the high availability perspective, the use of SAN infrastructures in data centers is recommended. Redundancy of physical disks and protection against the failure of a single disk are maintained in the storage subsystem itself and follow the hierarchical approach shown at the beginning of this section. Depending on the vendor and the type of storage subsystems in a SAN, even data replication over larger distances can be achieved with SAN-based storage.
Additional information on SAN infrastructure concepts for highly available Windows systems can be found in the Server Cluster: Storage Area Networks white paper by searching for the title at http://technet.microsoft.com/en-us/default.aspx.
Multipathing
Protecting data storage against unplanned downtime always includes protecting the connection between a server and its storage. If a server has only one storage host adapter and therefore only one storage cable connection, any failure of the host adapter, the cable, or a controller in the storage array interrupts the application. A WSFC can protect against the failure of server components such as the host adapter; however, it is preferable to avoid cluster failovers caused by connection failures. Such failovers can be avoided by using redundant host adapters and two cable connections to a storage device that in turn has two storage controllers. This configuration is called multipathing. The Windows operating system supports multipathing through the MPIO driver. Additional information about MPIO configurations is available in the Multipathing and the Microsoft MPIO Driver Architecture white paper at: http://download.microsoft.com/download/3/0/4/304083f1-11e7-44d9-92b92f3cdbf01048/mpio.doc
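The benefit of multipathing can be shown with a simplified Python model of a failover-only path policy. It is not the Windows MPIO driver or its API; the class names, path names, and the healthy flag are hypothetical. I/O is sent over the first healthy path. When that path fails, the next one is used without any cluster failover, and the application is affected only when every path is gone.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str            # host adapter + cable + storage controller
    healthy: bool = True


class FailoverOnlyPolicy:
    """Send all I/O over one active path; switch only when it fails."""

    def __init__(self, paths: list[Path]) -> None:
        self.paths = paths

    def active_path(self) -> Path:
        for path in self.paths:
            if path.healthy:
                return path
        # No path left: only now would the application (or a cluster) notice.
        raise OSError("all paths to the storage subsystem have failed")


paths = [Path("HBA1-ControllerA"), Path("HBA2-ControllerB")]
policy = FailoverOnlyPolicy(paths)
print(policy.active_path().name)  # HBA1-ControllerA
paths[0].healthy = False          # cable or controller A fails
print(policy.active_path().name)  # HBA2-ControllerB
```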
Server protection
Servers host the individual components, such as the SAP instances and services, that make up a SAP system. The role and importance of a server depend on its function. A database server for a productive SAP system typically has the highest requirements for availability, stability, and performance. SAP-specific solutions are discussed later in this paper, but a number of general server recommendations also contribute to high availability. Redundancy is the primary method of protecting servers against downtime. Inside a server, this could mean two independent power supplies with two power cords; each must be able to sustain operation on its own if the other one fails. It might also include redundant host adapters for storage or network access. Finally, a well-designed system with hot-pluggable components is always valuable. However, some server components cannot easily be made redundant. Main memory and CPUs are examples of such critical components, as is the server operating system, which also exists only once. Two solutions are typically used to address this. One solution is to use fault-tolerant systems built to recover from memory or CPU hardware failures; their disadvantages are a limited performance range and higher prices. The second solution is a high availability cluster such as WSFC. With WSFC, two or more servers share access to a storage subsystem; if a server fails, another server can take over the storage volumes and restart the applications automatically. This concept even provides redundancy at the operating system level, because each server has its own operating system. However, clusters depend on additional software components and require proper configuration and a change management policy.
We will discuss the possible cluster implementations with SAP applications later in this section.
Network high availability
Networks are the backbone of all corporate communication, both internal and external. A SAP network implementation has multiple communication layers with different functions, including:
A server network that interconnects SAP application servers and the database server.
A client network for local users working with the SAP GUI or a browser.
A demilitarized zone (DMZ) for connections to the public Internet.
A provider for access to the public Internet.
Again, component redundancy is the key factor for high availability solutions. However, the architecture of a real implementation reflects additional considerations. For example, public access to the Internet immediately raises security concerns and has more requirements than the internal, isolated server network. In the server network, on the other hand, performance can be as important as availability. The following figure shows the different aspects of a SAP network.
Important considerations for highly available SAP networks include:
Router and switch redundancy: The server network should be redundant up to and including the routers and switches. Redundant server connections can be achieved by teaming Network Interface Cards (NICs). Routers need to monitor each other and take over the functionality of a failed device.
Client separation: Clients are typically not connected redundantly. However, clients need to be distributed across different switches so that a hardware failure affects only a single client group.
DMZ redundancy: In the DMZ, there is typically a redundant SAP Web Dispatcher configuration or hardware load balancers.
Internet access redundancy: Redundant access to the public Internet is necessary.
Additional information about the high availability network requirements of SAP systems can be found in the SAP Help pages at http://help.sap.com. To access the information, do the following: In the left menu pane, click SAP NetWeaver. Under SAP NetWeaver 7.0 Library, choose English. Search for Network High Availability.
A description of the SAP landscape and SAP system network requirements can be found at http://sdn.sap.com.
Application specific configurations
High availability clusters are a classic solution for protecting SAP applications against the failure of critical hardware resources. For Windows-based SAP installations, SAP supports installing the database and the SAP SCS instance in a WSFC.
SAP components can be installed into the cluster by using the SAP tool SAPINST. In the simplest configuration, the cluster consists of two servers and a storage subsystem on which the application components are installed. The storage subsystem must be accessible by both servers. A SAP system with a SQL Server database is shown in the following figure:
Each cluster node has its own local operating system with the SQL Server engine installed locally. Each node must be able to access the external storage subsystem on which the application components are installed. Supported storage systems include Serial Attached SCSI (SAS), Fibre Channel, and iSCSI-based systems. Every WSFC cluster maintains a copy of the cluster database, which contains the cluster configuration information. This information is used to determine which cluster node can take ownership of the cluster resource group for the SAP application and database if communication between the nodes is interrupted. A situation in which two servers compete for the cluster resource group is known as split-brain syndrome and can generate a deadlock. In the simple cluster configuration shown in the previous figure, the cluster database is stored on each node. If the cluster uses a Disk Witness, the Disk Witness also stores a copy.
Applications that are protected in a WSFC cluster are configured in cluster groups. A cluster group contains the application resources, such as the shared disk storage volume that holds the SAP installation file system. When a cluster group must be moved from one server to another, for example after a hardware failure, these resources must become available on the second server before the cluster service can start the application there. Cluster resources can be configured with dependencies on other resources. For example, it makes no sense to start a SAP SCS instance before the SAP system database is available.
A private network is required for the exchange of status information between the members of the WSFC cluster. Because this status information is sent out periodically, like a heartbeat, the network is called the cluster heartbeat network. Every SAP application network connection in the cluster is assigned a virtual IP address that the cluster service activates on a server when it starts the SAP cluster group. While the virtual IP addresses are active only on the server that runs the application, all network cards also have permanently assigned local IP addresses. Additional information about Windows Server 2008 R2 failover clusters can be found at: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
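The dependency rule described above can be illustrated with Python's standard graphlib module, which derives an order in which every resource comes after the resources it depends on. The resource names and dependencies below are a hypothetical example for illustration only, not the exact set of resources that SAPINST creates. WSFC computes the actual order itself from the configured dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical resources of a SAP cluster group: each key depends on the
# resources in its set, which must be online first.
dependencies = {
    "Shared Disk": set(),
    "SAP IP Address": set(),
    "SAP Network Name": {"SAP IP Address"},
    "SAPMNT Share": {"Shared Disk", "SAP Network Name"},
    "SAP Database": {"Shared Disk", "SAP Network Name"},
    "SAP SCS Instance": {"SAP Database", "SAPMNT Share"},
}

# Bring resources online after their dependencies; take them offline in reverse.
online_order = list(TopologicalSorter(dependencies).static_order())
offline_order = list(reversed(online_order))
print("online: ", online_order)
print("offline:", offline_order)
```

TopologicalSorter raises graphlib.CycleError if the dependencies contain a cycle. A cyclic configuration could never be brought online, so failing early is the safer design.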
The installation of a SAP system in WSFC cluster solutions is described in the SAP Installation Guide for the respective SAP NetWeaver release at: https://service.sap.com/instguides A user name and password are required to access this Web site. To access this information after logging on to the site, do the following: From the left menu, open SAP NetWeaver. Select SAP NetWeaver 7.0 (2004s). Select Installation. Select Installation Guide - SAP NetWeaver 7.0 SR3 or Installation Guide SAP NetWeaver 7.0 SR2. Select Windows and the installation type (ABAP, ABAP + Java, or Java).
There are a number of SAP notes that provide additional information about WSFC related issues at https://service.sap.com/notes. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
The following table lists the most important notes:
SAP note   Description
106275     Availability of SAP components on Microsoft Cluster Server
139513     Merge transports for high availability systems
779253     Clustering your Java Add-In Systems on Windows
941092     MSCS: Post-Upgrade Steps for systems upgraded to NW 7.0 SR<x>
962955     Use of virtual TCP/IP host names
967123     SAP NetWeaver 7.0 / Business Suite 2005 SR2: Windows
1010990    Configuring a Standalone Gateway in a HA ASCS Instance
1011190    MSCS: Splitting the central instance after upgrading to 700
1043592    MSCS: Cluster Resource Monitor Crashes on W2K3 SP2
1172679    Troubleshooting MSCS Issues
Table 3. WSFC related SAP notes
Installing a local SAP application server on a node of a WSFC cluster follows the same procedure as a standard SAP application server installation. The SAP installation guides for NetWeaver 7.0 describe this setup starting with SR2. These guides are available at https://service.sap.com/instguides. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. To access this information after logging on to the site, do the following: From the left menu, open SAP NetWeaver. Select SAP NetWeaver 7.0 (2004s). Select Installation. Select Installation Guide - SAP NetWeaver 7.0 SR3 or Installation Guide SAP NetWeaver 7.0 SR2. Select Windows and the installation type.
An overview of the supported WSFC configurations is available from SAP in the MSCS Configuration and Support Information for SAP NetWeaver 04 and the SAP NetWeaver 7.0 Systems white paper at http://sdn.sap.com/irj/sdn/windows.
The regular Enqueue service and its lock table run on the server on which the SCS instance was started. The second server in the cluster runs a SAP Replicated Enqueue in addition to the active database. Additional application servers are located on servers that are not part of the cluster. All lock requests from the active Enqueue server are mirrored to the Replicated Enqueue. In case of a severe hardware problem on the SCS server, the SCS instance is moved to the database server and started there. During this process, the SAP Replicated Enqueue is stopped, and the lock information from the mirrored lock table is copied into the new lock table of the regular Enqueue server. As a result, the SAP application servers outside the cluster do not lose any lock information, and their running transactions are not affected. The SAP Installation Guides for SAP NetWeaver 7.0 SR2 and SR3 describe the cluster setup for Enqueue replication and the installation of the Enqueue Replication server.
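The failover sequence can be sketched in Python as follows. It is a simplified model, not SAP code, and the class and function names are hypothetical. During normal operation, every lock change on the active Enqueue server is mirrored into a shadow table on the replication node. On failover, a new, empty lock table is created and filled from the shadow table before lock requests are accepted again.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplicatedEnqueue:
    """Holds only the shadow lock table; it cannot grant locks itself."""

    shadow: dict[str, str] = field(default_factory=dict)

    def replicate(self, obj: str, owner: Optional[str]) -> None:
        if owner is None:  # lock was released on the active server
            self.shadow.pop(obj, None)
        else:              # lock was granted on the active server
            self.shadow[obj] = owner


@dataclass
class EnqueueServer:
    """Active Enqueue server that mirrors every lock change to the replica."""

    replica: ReplicatedEnqueue
    locks: dict[str, str] = field(default_factory=dict)

    def enqueue(self, obj: str, owner: str) -> bool:
        holder = self.locks.get(obj)
        if holder is not None and holder != owner:
            return False
        self.locks[obj] = owner
        self.replica.replicate(obj, owner)
        return True

    def dequeue(self, obj: str, owner: str) -> None:
        if self.locks.get(obj) == owner:
            del self.locks[obj]
            self.replica.replicate(obj, None)


def fail_over(replica: ReplicatedEnqueue) -> EnqueueServer:
    """Start a regular Enqueue server on the replication node: create a new,
    empty lock table, then copy the shadow entries into it."""
    # A new replica would later be started on another node; here it is a placeholder.
    server = EnqueueServer(replica=ReplicatedEnqueue())
    server.locks.update(replica.shadow)
    return server


active = EnqueueServer(replica=ReplicatedEnqueue())
active.enqueue("SALES_ORDER/4711", "user1")
active.enqueue("MATERIAL/42", "user2")
active.dequeue("MATERIAL/42", "user2")

# Hardware failure: the active server and its lock table are lost.
replica = active.replica
new_active = fail_over(replica)
print(new_active.locks)                                 # {'SALES_ORDER/4711': 'user1'}
print(new_active.enqueue("SALES_ORDER/4711", "user3"))  # False: the lock survived
```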
SAP note 524816 gives detailed information about the SAP Standalone Enqueue. SAP note 804078 describes the concept of the SAP Replicated Enqueue and how it can be used to protect a SAP system. Attached to this note is also an installation guide for the Enqueue Replication server in a WSFC cluster. In addition, the SAP lock concept and high availability solutions are described at http://help.sap.com. To access this information, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Open Technical Operations Manual for SAP NetWeaver. Open Administration of Standalone Engines. Follow the Standalone Enqueue Server link.
Multi-SID cluster
A limitation of older Windows-based cluster configurations was that only one SAP system could be configured per cluster. The restriction was caused by the SAPMNT share. In a distributed installation, every access to the global files of the SAP system has to use this share. Because the share is configured on the <Drive>:\usr\sap directory, there is only one such location in the file system. Underneath this path, there is a <SID> directory that hosts all the data for a specific SAP system. As a consequence, if more than one SAP system were installed on the server, the share would contain the global data of all SAP systems. Because this share has to be relocated to another server in the WSFC cluster during a failover, that operation would impact all SAP systems. For this reason, SAP does not support this configuration. The restriction is resolved by a new SAP installation method in which the SAP system disks are linked to the <SID> directories under <Drive>:\usr\sap by using junctions. Junctions are similar to symbolic links in the various UNIX versions: a junction is a file system redirection that automatically forwards any access to the junction directory to another directory. For example, the following figure shows the basic setup of a WSFC cluster with three SAP systems named AAA, BBB, and CCC. Each SAP system has its own disk on the shared storage, which can be accessed from both servers in the cluster. SAP systems AAA and BBB run on Server A, and system CCC runs on Server B.
The SAPMNT share was previously configured as a cluster resource inside the cluster configuration. Now it is configured in the local operating system of each server. The share is stationary and is no longer managed by the cluster. Under the C:\usr\sap directory path are three directories: AAA, BBB, and CCC. These directories have been created on both servers. Depending on the system type, the directories in the following table are created on the shared drives:
SAP system type | Shared drive directory
All system variants | \usr\sap\<SID>\SYS
Java | \usr\sap\<SID>\SCS<InstanceNr>
ABAP | \usr\sap\<SID>\ASCS<InstanceNr>
ABAP + Java add-in | \usr\sap\<SID>\SCS<InstanceNrJava>, \usr\sap\<SID>\ASCS<InstanceNrABAP>
Table 4. Directories per system type
Next, the junctions are created on the local hard drive of every server. Junctions can be created with the Microsoft executable linkd.exe, which is part of the Microsoft Windows Resource Kit. The command syntax is linkd.exe <Argument1> <Argument2>, where Argument1 is the junction on the local drive and Argument2 is the target directory on the shared drive. The arguments for each system type are listed in the following table:
SAP system type | Argument1 (local junction) | Argument2 (shared drive target)
All system variants | <localdrive>\usr\sap\<SID>\SYS | <shareddrive>\usr\sap\<SID>\SYS
Java | <localdrive>\usr\sap\<SID>\SCS<InstanceNr> | <shareddrive>\usr\sap\<SID>\SCS<InstanceNr>
ABAP | <localdrive>\usr\sap\<SID>\ASCS<InstanceNr> | <shareddrive>\usr\sap\<SID>\ASCS<InstanceNr>
ABAP + Java add-in | <localdrive>\usr\sap\<SID>\ASCS<InstanceNrABAP> | <shareddrive>\usr\sap\<SID>\ASCS<InstanceNrABAP>
ABAP + Java add-in | <localdrive>\usr\sap\<SID>\SCS<InstanceNrJava> | <shareddrive>\usr\sap\<SID>\SCS<InstanceNrJava>
Table 5. Junction creation arguments
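The per-type arguments in Table 5 follow a fixed pattern, so the junction commands for a system can be generated rather than typed by hand. The following Python sketch builds the (junction, target) pairs and formats one linkd.exe command line per pair; it only prints the commands and does not execute them. The SID, drive letters, and instance numbers in the example are hypothetical. On Windows Server 2008 and later, the built-in mklink /J <link> <target> command creates the same kind of directory junction.

```python
def junction_pairs(system_type, sid, local_drive, shared_drive,
                   nr=None, nr_abap=None, nr_java=None):
    """Return (junction, target) pairs for one SAP system, following Table 5.

    Instance numbers are passed as two-digit strings such as "00".
    """
    if system_type == "Java":
        names = ["SYS", f"SCS{nr}"]
    elif system_type == "ABAP":
        names = ["SYS", f"ASCS{nr}"]
    elif system_type == "ABAP + Java add-in":
        names = ["SYS", f"ASCS{nr_abap}", f"SCS{nr_java}"]
    else:
        raise ValueError(f"unknown SAP system type: {system_type}")
    base = f"\\usr\\sap\\{sid}\\"
    return [(local_drive + base + name, shared_drive + base + name) for name in names]


def linkd_commands(pairs):
    """Format one linkd.exe call per pair: junction on the local drive, then target."""
    return [f"linkd.exe {junction} {target}" for junction, target in pairs]


if __name__ == "__main__":
    # Hypothetical layout: system AAA (ABAP, instance 00) on shared drive S:.
    for command in linkd_commands(junction_pairs("ABAP", "AAA", "C:", "S:", nr="00")):
        print(command)
```

The same commands have to be run on every cluster node, because each node needs its own local junctions pointing to the shared disks.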
As the following figure shows, after the sample cluster was installed, the cluster groups AAA and BBB were activated on Server A, and CCC on Server B. All file system accesses of the SAP instances are redirected to the respective shared disk. External access takes place as usual through the virtual IP address of the cluster group. With this configuration, if Server A crashes because of a hardware failure, two things happen. First, the shared disks of both systems AAA and BBB are activated on Server B. Next, the virtual IP addresses of cluster groups AAA and BBB are activated on Server B. Because the junctions on the local hard drive of Server B point to these shared disks, clients can resume their work as usual and all data can be resolved. Clients that were already working with SAP system CCC on Server B are not affected. The following figure shows the situation after the failover:
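The failover behavior of this setup can be summarized with a small model in which each cluster group (one per SID) owns its shared disk and virtual IP address and moves as a unit. The drive letters, IP addresses, and node names below are hypothetical, and the functions are not the WSFC API; the sketch only shows why the groups that already run on the surviving node (here CCC on Server B) are untouched, and how a local junction resolves once its disk is online.

```python
from dataclasses import dataclass


@dataclass
class ClusterGroup:
    sid: str
    disk: str        # shared disk holding the SID directory (hypothetical drive letter)
    virtual_ip: str  # virtual IP address of the cluster group (hypothetical)
    owner: str       # node on which the group is currently online


def fail_node(groups, failed, survivor):
    """Bring every group owned by the failed node online on the survivor."""
    moved = []
    for group in groups:
        if group.owner == failed:
            # The disk and the virtual IP belong to the group, so they move together.
            group.owner = survivor
            moved.append(group.sid)
    return moved


def resolve(groups, node, sid, subdir="SYS"):
    """Return the shared-disk path behind the local junction for `sid` on `node`."""
    group = next(g for g in groups if g.sid == sid)
    if group.owner != node:
        raise RuntimeError(f"disk {group.disk} of {sid} is not online on {node}")
    return f"{group.disk}\\usr\\sap\\{sid}\\{subdir}"


if __name__ == "__main__":
    groups = [
        ClusterGroup("AAA", "S:", "192.0.2.11", "ServerA"),
        ClusterGroup("BBB", "T:", "192.0.2.12", "ServerA"),
        ClusterGroup("CCC", "U:", "192.0.2.13", "ServerB"),
    ]
    print(fail_node(groups, "ServerA", "ServerB"))  # SIDs moved to ServerB
    print(resolve(groups, "ServerB", "BBB"))
```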
Using the junction configuration, all the SCS instances of a larger SAP landscape can now be configured in one cluster. In general, multi-SID clusters can also protect the database instances. However, because databases have very different resource requirements from SAP SCS instances, sizing such a cluster can be more difficult. A better design is therefore to place the databases and the SCS instances in two different clusters. The following figure shows this system structure:
Figure 29. Separate database and SCS instance clusters for simplified sizing
The database servers would have an additional SAP Standalone Gateway configured, which is required as a local service for administration. Finally, each database server would also get its own SAPOSCOL service installed for performance monitoring. Multi-SID clusters require a different approach during cluster installation, but no changes to application operation. At a minimum, SAP SCS instances must be used: for pure Java systems this is possible as of version 6.40, while ABAP and dual-stack systems require version 7.0. The installation of multi-SID cluster solutions will be described in a separate installation guide for the NetWeaver 7.0 SR3 release. SAP note 106275 describes SAP support for multi-SID clusters with SAP AS 7.0. The Multi-SID WSFC Installation for SAP NetWeaver 7.0 compact disc master is available at http://service.sap.com/swdc. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
SAP on Windows Server 2008 R2 - High Availability Reference Guide
To access this information after logging on to the site, do the following: From the left menu pane, click Download. From the left menu pane, select Installations and Upgrades. From the left menu pane, click Entry by Application Group. In the list that appears in the right pane, click SAP NetWeaver. From the right menu pane, select SAP NetWeaver. From the right menu pane, select SAP NetWeaver 7.0. From the right menu pane, select Installation and Upgrade. From the right menu pane, select Windows Server. Select MS SQL SERVER as the database, and then scroll down to the list of downloadable objects.
The SAP Installation Guide is available at https://service.sap.com/instguides. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. To access this guide after logging on to the site, do the following: Select Installation Guide - SAP NetWeaver 7.0 SR3. Scroll down to the Multi-SID Installation on Windows WSFC section. Select Windows as the platform and SQL Server as the database type.
Multi-node cluster
Besides the former limitation of only one SAP system per cluster, there was also a restriction on the number of cluster nodes supported for SAP clusters. While Windows Server 2003 supports up to eight nodes and Windows Server 2008 R2 supports up to 16 nodes in a WSFC cluster, SAP supported only two-node clusters before SAP NetWeaver 7.0 SR2. Because this limitation no longer exists as of SAP NetWeaver 7.0 SR2, multi-node clusters are now possible. However, if Replicated Enqueue is used, SCS must still be configured to run on two nodes. The following figure shows a cluster with three servers. Two of the servers actively run an SCS instance and an SAP system database, while the third server serves as a backup in case an error occurs on either of the first two servers. With proper sizing of the main memory and the CPUs in the middle server, both SAP instances can run on the middle server at the same time.
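The sizing condition for the middle server can be expressed as a simple check: the combined CPU and memory demand of every instance that might fail over to it at the same time, plus a safety margin, must fit into its capacity. The dictionary keys, the 20 percent headroom, and all figures in the example are assumptions for illustration, not SAP sizing values.

```python
def fits_on_standby(standby, instances, headroom_percent=20):
    """Check whether the standby node can run all given instances at once.

    standby and each instance are dicts with "cpus" and "memory_gb".
    headroom_percent is an assumed safety margin on top of the combined demand.
    """
    for resource in ("cpus", "memory_gb"):
        demand = sum(instance[resource] for instance in instances)
        # Integer arithmetic: demand * (1 + headroom) must not exceed capacity.
        if demand * (100 + headroom_percent) > standby[resource] * 100:
            return False
    return True


if __name__ == "__main__":
    middle = {"cpus": 16, "memory_gb": 128}       # hypothetical standby server
    workload_1 = {"cpus": 6, "memory_gb": 48}     # hypothetical SCS + database load
    workload_2 = {"cpus": 6, "memory_gb": 48}
    print(fits_on_standby(middle, [workload_1, workload_2]))
```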
Multi-node clusters are supported as of SAP NetWeaver 7.0 SR2 and are installed with the SAP installation tool, SAPINST. The installation of additional nodes in a WSFC cluster is described in the SAP Installation Guide.
IT infrastructure protection
Applications always have a direct relationship to the servers in a data center. These servers provide the necessary resources, such as CPU, memory, and disk storage. At the same time, applications and server operating systems are consumers of central IT services. These services include centralized backup processes, file and print services, Active Directory, deployment services, patch servers, and network services such as DNS or DHCP.
Not only the server that the application runs on needs to be protected against interruptions; all data center services and resources that are significant to application operations must be protected as well. This is especially important because the central data center services serve all applications, so their failure could cause an interruption on a larger scale than the failure of a single AS. For example, after a DNS service interruption, no host names in the data center can be resolved into IP addresses. Critical services that might require protection include DNS, DHCP, WINS, NFS servers, file servers with SMB/CIFS, print servers, authentication, time synchronization, backup functions, and central monitoring services.
Because there are many such critical services, a detailed discussion of them is beyond the scope of this white paper. Additional documentation is available at http://technet.microsoft.com. If third-party solutions are used, the high availability discussion should incorporate the vendor perspective as well.
Planned downtime is crucial for the reliable and safe operation of applications, computer systems, and their supporting environment. By applying the recommended fixes for known software bugs, increasing computer system resources, or testing the data center high availability solutions, the vulnerability of an application service to unplanned downtime is greatly reduced. At the same time, 24x7 application availability is becoming increasingly necessary, so the duration and frequency of application service unavailability must be minimized. In fact, planned downtime generally accounts for more application service unavailability than unplanned downtime does. Reducing planned downtime should therefore be part of any high availability strategy. When evaluating possible solutions for planned downtime, it is important to consider how often the downtime occurs. If, for example, an offline database backup requires downtime once a week, it is a prime candidate for a technical solution that helps to eliminate this downtime. Many activities, such as an operating system or application upgrade, occur only once a month or once a year. The less frequent the downtime, the easier it is to find a convenient time slot in which the work can be performed. In any case, proper planning and change management are essential to minimize and manage planned downtime. On the technical level, there are many options available to minimize planned downtime for specific tasks, including creating backups and patching the operating system or database engine. In any case, the appropriate hardware architecture, in addition to proper planning, is required to achieve this goal.
Operating hours: 8:00 to 20:00 CST, 5 days per week, Monday through Friday
System availability: 22 hours, 7 days a week
99.5 percent annual availability during the defined operating hours
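The service levels in this example translate into concrete numbers. The following sketch computes the unplanned downtime allowed by 99.5 percent availability and the weekly maintenance window left by a 22-hour daily availability window. It assumes 52 working weeks and ignores public holidays. With 12 operating hours on five days a week, the operating time is 3,120 hours per year and the allowance is about 15.6 hours; counting 12 operating hours on all 365 days gives 4,380 hours and about 21.9 hours.

```python
HOURS_PER_DAY = 24


def unplanned_downtime_budget(availability_percent, op_hours_per_day, op_days_per_year):
    """Hours of unplanned downtime allowed per year within the operating hours."""
    operating_hours = op_hours_per_day * op_days_per_year
    return operating_hours * (100 - availability_percent) / 100


def weekly_maintenance_window(available_hours_per_day, days_per_week=7):
    """Hours per week outside the guaranteed system availability window."""
    return (HOURS_PER_DAY - available_hours_per_day) * days_per_week


if __name__ == "__main__":
    # 12 operating hours (8:00 to 20:00), five days a week, 52 weeks
    print(round(unplanned_downtime_budget(99.5, 12, 5 * 52), 1))
    # 12 operating hours counted on all 365 days
    print(round(unplanned_downtime_budget(99.5, 12, 365), 1))
    # 22 hours of availability leave 2 hours per day for maintenance
    print(weekly_maintenance_window(22))
```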
In the above example, the IT department would have a maintenance window of two hours per day. It would need to take additional measures to ensure that unplanned downtime does not exceed 0.5 percent of the defined operating hours: about 22 hours per year if the 12 operating hours are counted on every day of the year, or about 16 hours per year for a five-day week.
Change management strategy deployment
Having limited time for maintenance requires the IT staff to get the most out of the available time. Typically, the workflow of any maintenance action involves creating a backup copy of the existing state, performing the required work, and testing thoroughly before the system is returned to production. Proper preparation is one of the key factors for success. Changes must first be tested on test systems to identify their side effects. This process also shows how much time the task requires. An additional benefit is that the IT staff learns the required steps while working with the test systems, which helps to minimize downtime when the same work has to be done on the production server. Another important task is planning ahead to have enough resources, such as disk storage or main memory, for the future growth of the SAP system. Providing enough resources improves SAP system stability and quality of service and avoids frequent shutdowns for installing additional components. In production SAP systems, a common strategy is to oversize the hardware resources at go-live and keep enough headroom for at least six months of growth. Any further extension should follow the same principle. Besides adding resources, strategies such as archiving can be used to minimize the storage requirements of an SAP database. Planning the operating system and application software maintenance is another operational aspect. It is essential to know the software vulnerabilities and to install fixes in a timely manner.
Typically, fixes need to be coordinated and installed sequentially. For example, test and QA systems are updated first to work out installation issues, and the production systems are updated only after those issues are resolved. The number of security vulnerabilities in a system can be minimized by a process called hardening. Hardening means configuring the SAP system with only the minimum platform functions that are necessary to operate it. Additional information about IT landscape hardening can be found by searching for the SAP Hardening and Patch Management Guide for Windows Server white paper at: https://www.sdn.sap.com/irj/sdn
Snapshot backup
Backing up a large database might take a long time. The primary requirement for a backup is that it must be transactionally consistent so that it can be used for a restore. Transactional consistency means that every transaction is either completely contained in the backup or not contained at all. SQL Server database backups are created with the BACKUP DATABASE command. This command first executes a checkpoint, which means that all pages that have been changed since the last checkpoint and still reside in memory are flushed from the database server main memory to the storage subsystem. After this operation is complete, the database files are backed up by copying the data to another disk or a tape device. To maintain transactional consistency, the relevant part of the transaction log is also copied during this process. During a restore, these log records are used to roll forward committed changes and to roll back transactions that were not finished at the time the backup was made. Although SQL Server backups run online, they produce additional load on the storage subsystem. Therefore, one usually tries to minimize the duration of an online backup. To minimize the time a backup impacts normal system operation, the snapshot backup feature of SQL Server 2005/2008 R2 can be used. Snapshot backups reduce the unavailability of the SQL Server 2005/2008 R2 database during a backup to a couple of seconds. This is especially useful for moderate to very large databases where availability is very important. SQL Server snapshot backup is accomplished in cooperation with third-party hardware or software vendors, or both. These vendors use SQL Server 2005/2008 R2 features that are designed for this purpose. The underlying backup technology creates a point-in-time copy of the database image that is being backed up. The instantaneous copy is typically created either by splitting a mirrored set of disks or by copying a disk block when it is written; both approaches preserve the original.
At restore time, the original is made available immediately, and the underlying disks are synchronized in the background if necessary, so operations are restored almost instantaneously. The following figure shows an example of snapshot technology with a NetApp FAS storage system and the NetApp SnapManager for Microsoft SQL Server and SnapDrive for Windows solution. In this example, the time required for backups and restores can be reduced to seconds by using SnapManager.
Detailed information about the SQL Server 2005/2008 R2 Snapshot Backup feature is available at http://msdn.microsoft.com/en-us/library/ms189548.aspx.
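Why the transaction log has to be part of an online backup can be seen in a simplified model of restore recovery. Data pages copied while transactions are running may miss changes of transactions that committed during the copy, or may contain changes of transactions that never finished. The log records copied with the backup are used to redo the former and undo the latter. This is a textbook redo/undo sketch with hypothetical names; it does not reproduce SQL Server's actual recovery algorithm or log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    txn: str        # transaction ID
    key: str        # data item (stands in for a database page or row)
    before: object  # value before the change (None = item did not exist)
    after: object   # value after the change


def recover(copied_pages, log, committed):
    """Make a data copy taken during online activity transactionally consistent.

    Assumes locking prevents other transactions from changing an item while an
    unfinished transaction holds it, as a lock manager would.
    """
    pages = dict(copied_pages)
    # Redo: reapply the changes of committed transactions in log order.
    for rec in log:
        if rec.txn in committed:
            pages[rec.key] = rec.after
    # Undo: walk the log backwards and restore before-images of unfinished transactions.
    for rec in reversed(log):
        if rec.txn not in committed:
            if rec.before is None:
                pages.pop(rec.key, None)
            else:
                pages[rec.key] = rec.before
    return pages


if __name__ == "__main__":
    # The page with "x" was copied before T1 changed it; the page with "y"
    # was copied after the unfinished T2 had changed it.
    copied = {"x": 0, "y": 99}
    log = [LogRecord("T1", "x", 0, 1), LogRecord("T2", "y", 5, 99)]
    print(recover(copied, log, committed={"T1"}))
```

After recovery, the committed change of T1 is present and the unfinished change of T2 is gone, which is exactly the transactional consistency the backup must provide.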
the specified amount of time has expired. With this configuration, the instance is made idle quickly and can be shut down.
All remaining SAP AS instances in a logon group must be able to handle the workload. In larger systems, it might be appropriate to prepare one universal AS instance that can join several groups. This can be achieved by installing several SAP AS instances on one server and starting the respective instance when needed. Such a standby AS can be used temporarily to maintain the transactional performance of the SAP system. The following figure shows the setup of this landscape:
This leaves the central elements of the SAP system, including the servers for the SCS instance and the database. If both components are in a WSFC cluster, a server can be isolated by a planned failover to the respective standby server. This switch can happen at a convenient time with little effort. The empty server can then be patched and returned to operation.
SQL Server instance maintenance
Extending this maintenance concept with SQL Server Database Mirroring adds the option to patch the database engine installed on a server while the SAP system continues to work. The basic principle is to switch the database to the mirror copy when a patch needs to be installed on the database engine of the original server. After successful installation, the database is switched back, and the same process is executed on the mirror side. See the following figure for an example:
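The switch-patch-switch back sequence described above can be written down as a small state model of a principal/mirror pair. The class and method names are hypothetical and this is not the SQL Server Database Mirroring API; the model only enforces the one rule that matters for the procedure: a server is patched only while it is not serving the database.

```python
class MirroredDatabase:
    """Toy principal/mirror pair; hypothetical, not the SQL Server mirroring API."""

    def __init__(self, principal, mirror):
        self.principal = principal
        self.mirror = mirror
        self.patched = []

    def failover(self):
        # Planned manual failover: the mirror becomes the principal.
        self.principal, self.mirror = self.mirror, self.principal

    def patch(self, server):
        if server == self.principal:
            raise RuntimeError(f"{server} is serving the database; fail over first")
        self.patched.append(server)


def patch_both_servers(db):
    original = db.principal
    db.failover()        # SAP now works against the former mirror
    db.patch(original)   # the original server is idle and can be patched
    db.failover()        # switch back after the successful installation
    db.patch(db.mirror)  # repeat the process on the mirror side
    return db


if __name__ == "__main__":
    db = patch_both_servers(MirroredDatabase("SQL1", "SQL2"))
    print(db.principal, db.patched)
```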
More information on SQL Server database mirroring can be found in the Disaster Recovery Solutions section. An example of a patch cycle for Windows and SQL Server patches that uses the concept above could look like this:
Patch schedule: Windows and SQL Server patches are applied monthly (if applicable). SAP service packs are applied during the quarterly release.
Patch sequence: Patch the sandbox and test systems first. Apply the patches in production only after a few days of successful testing.
Patch process: If no reboot is required, apply the patches; the patch process is then finished. If a reboot is required, perform the following steps:
Application servers: On each AS, isolate the dialog/batch server from the logon group, RFC group, update group, batch group, and spool (or have a redundant spool server). Drain connections, patch, and reboot, then add the server back into the respective groups and proceed to the next server. If required, add the temporary AS to the respective group.
Database with database mirroring: Suspend mirroring, patch and reboot the secondary server, resynchronize, fail over to the secondary, then patch and reboot the primary server. A short SAP downtime during the failover has to be planned. There is no need to fail back.
Database and central instance in a WSFC cluster: Relocate the SAP central instance in the WSFC cluster to the database server. Patch and reboot the inactive node. Fail over the database and the CI, then patch and reboot the other node. A short SAP downtime during the failover has to be planned. Afterwards, distribute the database and central instance across the two nodes as before for better performance.
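The one-server-at-a-time procedure for the application servers can be planned in advance by checking, for each server, whether the remaining instances can carry the peak load while it is out of the groups, and bringing in the temporary AS only when they cannot. The capacities, user counts, and instance names below are hypothetical inputs, not values from SAP sizing tools, and the returned steps are plain text descriptions.

```python
def rolling_patch_plan(capacity, peak_users, standby=None):
    """Plan a one-server-at-a-time patch of SAP application servers.

    capacity: dict mapping AS instance name -> concurrent users it can serve.
    standby:  optional (name, capacity) of a temporary AS that can join the groups.
    """
    plan = []
    for server in capacity:
        remaining = sum(c for name, c in capacity.items() if name != server)
        needs_standby = remaining < peak_users
        if needs_standby:
            if standby is None or remaining + standby[1] < peak_users:
                raise ValueError(f"not enough capacity while {server} is patched")
            plan.append(f"add {standby[0]} to the logon groups")
        plan += [
            f"remove {server} from logon, RFC, update, and batch groups",
            f"drain connections on {server}",
            f"patch and reboot {server}",
            f"add {server} back to its groups",
        ]
        if needs_standby:
            plan.append(f"remove {standby[0]} from the logon groups")
    return plan


if __name__ == "__main__":
    servers = {"AS1": 300, "AS2": 300, "AS3": 300}
    for step in rolling_patch_plan(servers, peak_users=700, standby=("AS9", 200)):
        print(step)
```

Failing early with an error, rather than producing a plan that overloads the remaining servers, keeps the capacity requirement stated above explicit.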
To access this information, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Search for Planned and Unplanned Downtime.
Information about SAP upgrades can be found at https://service.sap.com. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. After logging on to the SAP support portal, click the Quick Links menu and search for /upgrade. More information on SAP upgrades can be found on SDN by searching for upgrade at: https://www.sdn.sap.com/irj/sdn. The SAP Service Marketplace has the following related notes available at: https://service.sap.com/notes. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
SAP note 139513: Merge transports for high availability systems
SAP note 361735: Inactive import of reports
The Hyper-V host cluster can perform a Live Migration of the VMs without application service downtime. The following example describes this process in more detail.
Consider a Hyper-V host cluster configured for Live Migration, with a VM running an SAP application that is actively used by clients. At some point, the administrator must migrate this VM to another server in the Hyper-V host cluster because the server that hosts the VM must go into maintenance. Initially, while the VM is still actively used on the primary server, an empty VM is created on the second server and the memory image of the VM is copied to the second server. If memory pages on the primary server change during this process, Hyper-V detects this and copies those pages again. With each pass, the number of pages that differ between the two servers shrinks. When the difference is small enough, Hyper-V pauses the VM on the primary server and copies the last set of changed pages to the new server. Client access is then rerouted to the new server, and the VM on the primary server is removed. Because the final state transfer happens very quickly and no TCP timeout occurs, clients do not notice the transfer. Note: Live Migration does not help with unplanned downtime. In the case of a server failure, the VMs fail over by using failover clustering. A Live Migration must be planned and requires the primary system to remain active for the duration of the migration. Because Live Migration is built on the same Hyper-V host cluster setup, this solution is an extension of the high availability solution with WSFC. Live Migration provides capabilities for minimizing planned downtime in a virtual environment. These capabilities are not available for applications that must be installed directly on a physical server. More details on how to set up a Live Migration cluster are available in the following documents: Windows Server 2008 R2 Hyper-V Live Migration white paper available at: http://download.microsoft.com/download/C/C/7/CC7A50C9-7C10-4D70-A427-C572DA006E61/LiveMigrationWhitepaper.xps.
Best Practices for SAP on Hyper-V white paper available at: http://www.microsoft.com/virtualization/en/us/solution-business-apps.aspx. Hyper-V: Live Migration Network Configuration Guide available at: http://technet.microsoft.com/en-us/library/ff428137(WS.10).aspx.
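The iterative memory copy of a Live Migration can be modeled with a simple loop that shows why the rounds converge and why the final paused transfer is short. The model assumes that a fixed percentage of the pages sent in each round is dirtied while that round is being copied; dirty_percent, stop_threshold, and max_rounds are illustrative parameters, not Hyper-V settings.

```python
def precopy_rounds(total_pages, dirty_percent, stop_threshold, max_rounds=30):
    """Pages transferred per round of an iterative pre-copy migration.

    Simplified model: while a round is copied, the running VM dirties
    `dirty_percent` percent of the pages sent in that round. The last entry is
    the final transfer made while the VM is paused.
    """
    rounds = []
    remaining = total_pages  # the first round copies the whole memory image
    while remaining > stop_threshold and len(rounds) < max_rounds:
        rounds.append(remaining)
        remaining = remaining * dirty_percent // 100  # pages changed during this round
    rounds.append(remaining)  # pause the VM and copy the last set of changed pages
    return rounds


if __name__ == "__main__":
    print(precopy_rounds(1_000_000, dirty_percent=10, stop_threshold=1_000))
```

The max_rounds bound reflects the case in which a workload changes memory as fast as it can be copied: the difference never shrinks below the threshold, so the loop has to stop and pause the VM anyway.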
Database data inconsistencies
Data consistency in the central data repository of the SAP system is one of the most fundamental requirements of stable SAP operation. After all, each piece of data exists only once in an application. As discussed in the previous section, there are numerous triggers that can cause data corruption at any time during operation. What is especially difficult with this class of problems is recognizing an error condition early, before it causes damage. Database consistency checks are typically not run during normal operation, or at least not under normal workload. A procedure to perform such a consistency check on a SQL Server database is discussed in the Data Inconsistency Protection Solutions section of this white paper. Other database vendors have developed similar procedures, which can typically be found in the respective vendor's documentation.
Sabotage and accidental data deletion
Accidental data deletion in the database or at the file system level can have very serious consequences for application operations. It can seriously disrupt the normal course of operations, or even bring them to a total standstill. To avoid such problems, IT operation security should be addressed through an authorization concept in which only designated persons have permissions to work in their area of expertise. Of course, administrators in particular need overlapping authorizations for their daily work. Accidental deletion of data does not appear to be a rare problem: from past experience with large data center operations of SAP infrastructures, there are many reports of SAP application downtime caused by deleted data or tables in the database. Once the damage has occurred, the only way to recover is to restore the missing data from the last backup. This can be challenging, because in most cases only the missing data needs to be restored.
If, for example, only a single table is affected, it does not make sense to restore the complete database from the last backup. This would set all other, unaffected tables back to the same earlier state as the affected table, so restoring the complete database would potentially cause more data to be lost. Databases with snapshot technology can be especially helpful to minimize the downtime duration. A snapshot is a transactionally consistent image of the database at a point in time; after the snapshot is created, all changes are directed to a different physical location. In other words, a snapshot is an exact image of the database at the time it was created and can be put into operation without recovery. During the recovery process, it is possible to export tables from the snapshot and import them into the active system. Alternatively, in case of a very serious problem, the active database could be reverted completely to the state of a snapshot. However, in this case, changes in other tables might be lost too.
Data loss through viruses and worms
Viruses, worms, and Trojans are an unfortunate reality that every business must address. Malicious code that tries to secretly enter computer systems might start activities ranging from espionage to data erasure. The attack is not necessarily external. For example, employees who work with their laptops at home or in a hotel might bring an unwanted guest back when they plug their computer into the company network. Viruses can also appear when infected personal software is installed on a PC attached to the company network.
Security measures to minimize data loss or espionage of confidential information are important. They include the implementation of firewalls, virus scanners, and surveillance tools, as well as employee policies. Optimal security requires appropriate measures on all levels of the IT operation, including virus scanners at the computer level, a demilitarized zone for outbound communication, firewalls at the network level, and well-developed authentication and audit procedures at the application level.
Security measures also include operational tasks, such as installing security patches promptly to close gaps as soon as vulnerabilities have been published. A detailed discussion of the threats and of possible concepts and measures is outside the scope of this white paper. The Microsoft TechNet library provides detailed information about Microsoft products and technology for IT professionals and can be found at: http://technet.microsoft.com/en-us/default.aspx
Appropriate backups can be used to restore individual files in case a single file has been deleted. However, these backups can also help to recover a complete system if a severe failure happens. Even if a disaster destroys the original computer systems, backups can be applied to a second computer to restore operation. A backup and restore strategy is the last step in recovering from an unforeseen event; it returns a database to some predefined point, most likely the last completed transaction prior to the failure. All aspects of the backup and restore strategy should be well documented and reviewed regularly. Most importantly, they should be tested regularly to ensure that the backup data and media are valid and that the processes work as expected.
Database backup strategies
The backup and restore components provided with Microsoft SQL Server 2000 and later enable the administrator to restore a database to an exact replica of its state at any point in time since an appropriate backup strategy was implemented. There are several backup types available:
Full backup: Makes a complete backup of the database, up to the last transaction completed during the backup process.
Differential backup: Copies the database pages changed since the last full backup. It is a useful mechanism to back up a database without consuming as many resources as a full backup. In a restore operation, it is used in conjunction with a full backup.
Transaction log backup: Backs up all transaction log records written since the log was last backed up. A transaction log backup is used in conjunction with a full backup, and potentially differential backups, to restore a database to a specific point in time or to the last completed transaction that was backed up.
File backup: When a database consists of multiple files, each file can be backed up individually. This accelerates both the backup and the restore process. File backups are used in conjunction with transaction log backups.
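How full, differential, and transaction log backups combine in a point-in-time restore can be sketched as a selection function: take the newest full backup before the target, then the newest differential based on it, then every log backup from there on until one reaches the target time. The Backup record and the hour-based time scale are hypothetical; the sketch assumes an unbroken log chain, and in SQL Server the last log backup would be restored only up to the target, for example with RESTORE LOG ... WITH STOPAT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str  # "full", "diff", or "log"
    time: int  # hour at which the backup finished (hypothetical time scale)


def restore_sequence(backups, target):
    """Choose the backups needed to restore to `target` (simplified model).

    Assumes an unbroken chain of log backups and that each differential
    backup is based on the most recent full backup before it.
    """
    fulls = [b for b in backups if b.kind == "full" and b.time <= target]
    if not fulls:
        raise ValueError("no full backup before the target time")
    full = max(fulls, key=lambda b: b.time)
    sequence = [full]
    diffs = [b for b in backups if b.kind == "diff" and full.time < b.time <= target]
    if diffs:
        sequence.append(max(diffs, key=lambda b: b.time))  # only the newest differential
    base = sequence[-1].time
    if base == target:
        return sequence
    logs = sorted((b for b in backups if b.kind == "log" and b.time > base),
                  key=lambda b: b.time)
    for log in logs:
        sequence.append(log)
        if log.time >= target:
            return sequence  # the last log is restored only up to the target time
    raise ValueError("the log backups do not reach the target time")


if __name__ == "__main__":
    history = [Backup("full", 0), Backup("log", 6), Backup("log", 12), Backup("log", 18),
               Backup("diff", 24), Backup("log", 30), Backup("log", 36)]
    print(restore_sequence(history, target=32))
```

The differential backup is what shortens the restore here: the log backups taken before it do not have to be applied.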
Step-by-Step Guide for Windows Server Backup in Windows Server white paper available at http://technet.microsoft.com/de-de/default.aspx. Note: This document can be found by searching for the title. SAP Backup and Restore Information MS SQL Server help documentation available at http://help.sap.com.
To access this information, do the following: From the left menu pane, click SAP NetWeaver. Choose English under SAP NetWeaver 7.0 Library. Choose Technical Operations Manual. Choose General Administration Tasks. Choose Database Administrations. Follow the MS SQL Server link. Follow the Monitor for Backup and Restore Information link.
Blog entry: How does Microsoft perform backups in their SAP system landscape, available at: http://blogs.msdn.com/saponsqlserver/archive/2008/03/28/how-does-microsoft-perform-backups-in-their-sap-system-landscape.aspx
For more information about SQL Server log shipping, please refer to the Disaster Recovery Solutions section.
Snapshots
A snapshot is an image of information that has been frozen at a certain point in time. The snapshot delivers an accurate picture of the information at an accurately defined point in time. Snapshots are typically taken from fast changing data, like in a file system or in a database. Technically, snapshots typically use the copy-on-write principle. In this principle, all data on a storage media is represented as a chunk of data blocks. Data blocks access is provided by pointers. Each block has an individual pointer that describes where this block resides on the media. A snapshot first takes all the pointers at a certain point in time and saves them. Any time a data block is changed, the data block is first copied into a snapshot file and the system uses a new pointer for the changed block. By copying only the pointers to data initially and copying data blocks only if changes occur, snapshots are very fast and require relatively little disk space. However there will be some impact for copying changed blocks to the snapshot. Database snapshots with SQL Server 2005/2008 R2 With SQL Server 2005/2008 R2 Enterprise Edition, database snapshots can be created. A database snapshot is a read-only, transactional consistent view of a database. Transactional consistent means that only those transactions which have been finished by a commit work statement are taken into the snapshot. Snapshots can be generated automatically, at any point in time, and also be used for reading access during daily operations, such as report generation. The snapshot copy of the database can be queried by client applications and, in the event of the original database becoming damaged or unusable, it can be reverted to the state it was in when the snapshot was created. Since every new snapshot requires storage space, it is recommended that the older snapshots are always deleted after a certain time. 
The optimal retention period depends on individual requirements, but retaining snapshots for one or two days is usually sufficient.
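The retention rule above lends itself to automation. The following Python sketch shows one possible approach; it is not an SAP or Microsoft tool. It assumes a two-day retention period, hypothetical snapshot names for a database called PRD, and that the caller tracks each snapshot's creation time. It only generates the DROP DATABASE statements (the T-SQL statement that removes a database snapshot); running them is left to the administrator.

```python
from datetime import datetime, timedelta

# Assumption: two days of retention, the upper end of the range suggested above.
RETENTION = timedelta(days=2)


def snapshots_to_drop(snapshots, now, retention=RETENTION):
    """Return the names of snapshots older than the retention period, oldest first.

    snapshots maps a snapshot database name to its creation time.
    """
    expired = [(created, name) for name, created in snapshots.items()
               if now - created > retention]
    return [name for _, name in sorted(expired)]


def drop_statements(names):
    # A database snapshot is removed with DROP DATABASE; "]" is escaped as "]]".
    return [f"DROP DATABASE [{name.replace(']', ']]')}];" for name in names]


# Hypothetical snapshot names and creation times for a production database PRD.
snapshots = {
    "PRD_snap_20100601": datetime(2010, 6, 1, 6, 0),
    "PRD_snap_20100603": datetime(2010, 6, 3, 6, 0),
    "PRD_snap_20100604": datetime(2010, 6, 4, 6, 0),
}
now = datetime(2010, 6, 4, 12, 0)
for statement in drop_statements(snapshots_to_drop(snapshots, now)):
    print(statement)  # DROP DATABASE [PRD_snap_20100601];
```

Sorting by creation time means the oldest snapshots are dropped first, which keeps the most recent restore points if the job is interrupted.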
SAP on Windows Server 2008 R2 - High Availability Reference Guide
Additional information about database snapshots in SQL Server 2005/2008 R2 is available by searching for "snapshots" in SQL Server 2005 Books Online at http://msdn.microsoft.com/en-us/library/ms130214.aspx. Further information can be found on the SQLCAT Web site at http://sqlcat.com/whitepapers/archive/2008/02/11/database-snapshot-performanceconsiderations-under-i-o-intensive-workloads.aspx.
Snapshots with storage solutions
As described above, SQL Server 2005 can create transactionally consistent snapshots at the database level. In addition, snapshots can be created at the storage level. This kind of snapshot works on the physical storage layer and is not restricted to particular data types such as database data. From a technical perspective, a snapshot is represented by pointers to the blocks of data on the storage volume. When a specific data block is written, the block is copied into a snapshot file, and the pointer used by the system points to the new block while a copy of the old pointer is maintained. To go back to the state at the time the snapshot was created, the old pointer is re-activated; in other words, the snapshot is reverted. Alternatively, the snapshot can be deleted, which merges all copied blocks with the original data; the changes recorded since the snapshot was taken then become the new baseline. To recover a system after a severe error, volume snapshot copies can be activated as read-only, and data that was damaged on the live volume can be extracted from this copy. Because snapshots can be created online during normal operation without a severe performance impact, several snapshot copies can be maintained per day. This process can be automated to run without human intervention.
Snapshots do not, however, replace regularly scheduled backups. Because snapshot copies can reside on the same physical media as the original data, both can be lost if the physical volume becomes defective. Snapshots are an efficient way to preserve a consistent point-in-time state that can be reverted to if needed.
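To make the pointer mechanism described above concrete, here is a deliberately simplified Python model of a volume with a single snapshot. It is an illustration under stated assumptions (blocks are list entries, pointers are list indices, and freed space is never reclaimed), not how any particular storage product implements snapshots. The class and method names are hypothetical.

```python
class SnapshotVolume:
    """Toy volume following the pointer scheme described above (illustration only)."""

    def __init__(self, blocks):
        self.store = list(blocks)              # physical blocks on the medium
        self.live = list(range(len(blocks)))   # logical block -> physical block in use
        self.snapshot = None                   # saved pointer table, if any

    def take_snapshot(self):
        # Only the pointers are saved; no data blocks are copied.
        self.snapshot = list(self.live)

    def write(self, index, data):
        if self.snapshot is not None and self.live[index] == self.snapshot[index]:
            # First change since the snapshot: keep the old block, use a new one.
            self.store.append(data)
            self.live[index] = len(self.store) - 1
        else:
            # Block already diverged (or no snapshot exists): overwrite in place.
            self.store[self.live[index]] = data

    def read(self, index, from_snapshot=False):
        table = self.snapshot if from_snapshot else self.live
        if table is None:
            raise ValueError("no snapshot exists")
        return self.store[table[index]]

    def revert(self):
        # Re-activate the old pointers: the volume returns to snapshot time.
        if self.snapshot is None:
            raise ValueError("no snapshot exists")
        self.live = list(self.snapshot)

    def delete_snapshot(self):
        # Drop the old pointers: changes since the snapshot become permanent.
        self.snapshot = None


vol = SnapshotVolume(["a", "b", "c"])
vol.take_snapshot()
vol.write(1, "B")
print(vol.read(1), vol.read(1, from_snapshot=True))  # B b
vol.revert()
print(vol.read(1))  # b
```

Reverting only swaps pointer tables, which is why it is fast; the cost of a snapshot is paid on the first write to each block after it was taken.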
Hyper-V snapshots for virtual machines
With Hyper-V snapshots, administrators can capture point-in-time images of a virtual machine (VM) and return to them at any later stage. Because a snapshot captures a consistent VM state, it is good practice to take one before any critical VM change, such as applying patches, changing the configuration, or upgrading applications. If any of these steps fails, the VM can easily be reverted to its previous state. VM snapshots can be created in the Hyper-V administration GUI or with System Center Virtual Machine Manager. Hyper-V lets users create a hierarchy of snapshots. When a snapshot is created, the VM's existing VHD file is frozen in its current state, and every change made inside the VM afterward is written to a new file, called an AVHD file. If another snapshot is created after the first one, the first AVHD file is frozen, and every subsequent change is written to another AVHD file.
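The VHD/AVHD chain can be pictured as layers that are searched from the newest to the oldest. The following Python sketch models this under simplifying assumptions: blocks are dictionary keys, and the snapshots form a single linear chain, whereas Hyper-V actually maintains a tree of snapshots. The class and method names are hypothetical and do not correspond to any Hyper-V API.

```python
class DiskChain:
    """Toy model of a base VHD followed by AVHD difference files (illustration only).

    Real Hyper-V snapshots form a tree; this sketch keeps a single linear chain.
    """

    def __init__(self, base_blocks):
        self.layers = [dict(base_blocks)]   # layers[0] stands for the base VHD

    def snapshot(self):
        # Freeze the current top layer; later writes go to a new, empty AVHD layer.
        self.layers.append({})
        return len(self.layers) - 1         # snapshot number

    def write(self, block, data):
        self.layers[-1][block] = data       # only the newest file is ever written

    def read(self, block):
        # Look in the newest difference file first, then fall back toward the base VHD.
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        raise KeyError(block)

    def apply_snapshot(self, number):
        # Discard everything written after the snapshot and continue in a fresh AVHD.
        if not 1 <= number < len(self.layers):
            raise ValueError(f"unknown snapshot {number}")
        self.layers = self.layers[:number] + [{}]


disk = DiskChain({"os": "v1", "app": "v1"})
before_patch = disk.snapshot()
disk.write("app", "v2-patched")
print(disk.read("app"))        # v2-patched
disk.apply_snapshot(before_patch)
print(disk.read("app"))        # v1
```

Because frozen files are never modified, going back only means discarding newer layers, which is what makes reverting a VM after a failed patch quick.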
It is a good strategy to run the DBCC check without downtime, under the normal system workload, during a low-volume period such as the weekend. Depending on the SQL Server instance configuration (for example, when max degree of parallelism is set to 1), the DBCC check command uses only one CPU core. On a modern multi-core server, enough cores therefore remain available to keep SAP operations running.
Large database consistency
With very large databases over a terabyte in size, consistency checks with the DBCC CHECKDB command might affect production for too long, even when they run in a maintenance window or during low-volume periods. To check the consistency of such systems regularly, another approach is possible: periodically restore the latest backup of the production database on a test system and run the consistency check there. The advantage of this approach is that the command runtime no longer affects the performance of the production SAP system, because the check runs on the test system. A second advantage is that the test system is frequently refreshed with current production data. Finally, restoring the latest backup of the production SAP database to a test system is a very efficient way to verify that the backup itself is usable. Discovering that backups are unreadable or do not contain the right data only when they are really needed is a real disaster. Regular restores also let administrators practice the procedure, and proper backup and restore skills are invaluable in a real emergency; this training advantage should not be underestimated. SAP and Microsoft support deal with dozens of cases every year in which database backups prove not to be restorable because of mechanical issues such as tape errors, human error, or simply lost tapes.
Additional data consistency check information is available in the following locations:
SAP note 142731: DBCC Checks of SQL Server, available at https://service.sap.com/notes (Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.)
SAP Help, database server checks: http://help.sap.com. To access this information, do the following:
1. From the left menu pane, click SAP NetWeaver.
2. Under SAP NetWeaver 7.0 Library, choose English.
3. Choose Technical Operations Manual.
4. Choose General Administration.
5. Choose Database Administration.
6. Follow the MS SQL Server link.
7. Under Periodic Tasks, follow the Database Server Checks link.
SQL Server Books Online: http://msdn.microsoft.com/en-us/library/ms130214.aspx (Note: Search for DBCC CHECKDB.)
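The restore-then-check approach for very large databases can be scripted so that it runs regularly on the test system. The following Python sketch only builds the T-SQL batch (RESTORE DATABASE followed by DBCC CHECKDB); it is a minimal illustration, not an SAP-provided procedure. It assumes that the test server has the same drive layout as production (otherwise the RESTORE statement needs WITH MOVE clauses) and that nobody is using the database during the restore. The database name and backup path are hypothetical.

```python
def quote_identifier(name: str) -> str:
    # T-SQL bracketed identifier; a closing bracket is escaped by doubling it.
    return "[" + name.replace("]", "]]") + "]"


def quote_string(value: str) -> str:
    # T-SQL Unicode string literal; single quotes are escaped by doubling them.
    return "N'" + value.replace("'", "''") + "'"


def restore_and_check_script(database: str, backup_file: str) -> str:
    """Build a T-SQL batch that restores a full backup and checks its consistency.

    Assumptions: same drive layout as production (no WITH MOVE needed) and
    no active connections to the database on the test server.
    """
    return "\n".join([
        f"RESTORE DATABASE {quote_identifier(database)}",
        f"    FROM DISK = {quote_string(backup_file)}",
        "    WITH REPLACE, STATS = 10;",
        f"DBCC CHECKDB ({quote_string(database)}) WITH NO_INFOMSGS, ALL_ERRORMSGS;",
    ])


# Hypothetical database name and backup location.
print(restore_and_check_script("PRD", r"\\backupsrv\sql\PRD_full.bak"))
```

The generated script could, for example, be saved to a file and run on the test server with `sqlcmd -S <server> -E -i <file>`. A failed restore already signals an unusable backup; a successful restore followed by a clean DBCC CHECKDB confirms both the backup and the database consistency.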
Besides these two considerations from the business process perspective, two more prerequisites influence the decision for a disaster recovery solution:
The distance between the primary site and the disaster recovery site.
The available network bandwidth between the primary site and the disaster recovery site.
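Both prerequisites can be turned into rough numbers early in the design. The following Python sketch is a back-of-the-envelope calculation, not a sizing tool: it assumes that light in optical fiber covers roughly 200 km per millisecond, ignores switch, router, and storage latency, and uses a hypothetical 70 percent utilization limit for the link. The function names are illustrative only.

```python
# Assumption: light in optical fiber travels roughly 200,000 km/s,
# i.e. about 200 km per millisecond. Real links add switching, routing,
# and storage overhead on top of this lower bound.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound for one network round trip between two sites over fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS


def link_can_keep_up(changes_mb_per_hour: float, link_mbit_per_s: float,
                     max_utilization: float = 0.7) -> bool:
    """True if the average change rate fits into the usable link bandwidth.

    max_utilization is a hypothetical headroom factor, not a vendor figure.
    """
    needed_mbit_per_s = changes_mb_per_hour * 8 / 3600
    return needed_mbit_per_s <= link_mbit_per_s * max_utilization


for km in (10, 50, 500):
    print(f"{km:>4} km -> at least {min_round_trip_ms(km):.1f} ms per round trip")

# 10 GB of changed data per hour over a 100 Mbit/s link: about 22 Mbit/s needed.
print(link_can_keep_up(10_000, 100))
```

The latency figure is a lower bound, and the bandwidth check uses an average; peak change rates, for example during batch jobs or index rebuilds, must also fit through the link.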
The following chapter describes in detail the technical components that Windows-based systems provide to achieve geographic distribution. It also covers the SQL Server-based solutions for maintaining data copies over large distances. Disaster recovery solutions are typically combined with other solutions that protect against outages. For example, a local server failure should typically not trigger a complete site transfer. Instead, a local cluster protects against hardware failures, while geographic dispersion is used only in case of a real disaster. The decision to perform a site transfer is typically not automated and requires human interaction.
Note: Because of the complexity of geographically dispersed clusters, the hardware vendor must be involved in the design, setup, configuration, and subsequent support of the implementation. SAP support is limited to the standard WSFC cluster implementation, which is not aware of geographic dispersion. From the perspective of an SAP application, the cluster looks as if it were local.
Storage replication
To enable fast failover to a secondary site after a catastrophic event, a synchronous copy of the file systems used by the SAP system must be maintained at each site. This block-level replication can be achieved with hardware-based or software-based replication.
Hardware-based replication
With this method, the complete replication task is done at the storage level. The following figure shows the basic setup:
The advantage of this solution is that it works completely independently of the application. However, because the replication is performed by the storage controllers, the SAN storage devices have to be from the same vendor, and the replication requires high bandwidth. Additionally, software components from the storage vendor are required to enable WSFC to use this configuration appropriately. Examples of storage-based replication providers are:
EMC with SRDF
HP Storage Works Business Copy EVA
NetApp MetroCluster with SyncMirror
IBM GDPS with PPRC
Hitachi Storage Clusters
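Both hardware-based and software-based replication must apply SQL Server's write I/Os on the secondary side in exactly the order in which the database issued them. SQL Server relies on write-ahead logging: a log record reaches disk before the data page it describes. If a replica applied the data page first and the site failed at that moment, the remote copy would contain a change that its log knows nothing about. The following Python sketch illustrates order-preserving apply under a stated assumption: every replicated write carries a hypothetical sequence number assigned on the primary side. Real products use their own mechanisms (for example, consistency groups), and the class name is illustrative only.

```python
class OrderedReplica:
    """Secondary copy that applies replicated writes strictly in issue order.

    Illustration only: each write carries a hypothetical sequence number
    assigned on the primary side when SQL Server issued the I/O.
    """

    def __init__(self):
        self.blocks = {}      # block number -> data on the secondary storage
        self.next_seq = 0     # sequence number the replica expects next
        self.pending = {}     # writes that arrived ahead of an earlier one

    def receive(self, seq, block, data):
        self.pending[seq] = (block, data)
        # Apply only the gap-free run of writes that starts at next_seq.
        while self.next_seq in self.pending:
            blk, value = self.pending.pop(self.next_seq)
            self.blocks[blk] = value
            self.next_seq += 1


replica = OrderedReplica()
# The data page (seq 1) overtakes the log record (seq 0) on the network.
replica.receive(1, 500, "data page of txn 42")
print(500 in replica.blocks)   # False: held back until the log record arrives
replica.receive(0, 7, "log record of txn 42")
print(sorted(replica.blocks))  # [7, 500]
```

Holding back out-of-order writes means the secondary may lag slightly, but whatever it contains is always a state that the primary actually passed through, which is what SQL Server recovery requires.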
Note: For hardware-based or software-based replication solutions to work, they must replicate SQL Server write I/Os in exactly the same order in which the database originally issued them.
Software-based replication
With this method, any change on the active side is copied over the network to the secondary side and replicated there. This requires a software product that is not part of the initial cluster setup. While these software components increase the implementation cost, the advantage is that different storage devices can be used. It is even possible to have SAN storage on one side and Direct Attached Storage (DAS) on the other. Examples of vendors of software-based replication products include:
NSI Double-Take
Legato RepliStor
Symantec Storage Replicator
SteelEye DataKeeper
Neverfail ClusterProtector
The following figure shows the basic setup when using this method.
Cluster quorum configuration
In simple terms, quorum for a cluster is the number of elements in a cluster that must be online for the cluster to function properly. If one or more nodes can no longer communicate with the other nodes in the cluster because of a split (for example, interrupted network connections), a voting mechanism must determine which side has the majority (quorum) and may actively hold the applications in the cluster. Each WSFC cluster has a special resource known as the quorum resource. With Windows Server 2003, almost all server clusters used a disk in cluster storage as the quorum resource: if a node could communicate with that disk, it could function as part of the cluster; otherwise, it could not. This made the quorum resource a potential single point of failure. Windows Server 2008 R2 uses a different approach: a majority of votes determines whether a cluster achieves quorum. Nodes can vote and, where appropriate, either a disk in cluster storage, called a disk witness, or a file share, called a file share witness, can vote. There is also a quorum mode called No Majority: Disk Only that functions like the disk-based quorum in Windows Server 2003. Apart from that mode, the quorum modes have no single point of failure, because what matters is the number of votes, not whether a particular element is available to vote. A comprehensive description of the available quorum options for Windows Server 2008 R2 is available at http://technet.microsoft.com/en-us/library/cc770620.aspx.
Majority Node Set configuration for Windows Server 2003
In a Majority Node Set (MNS) cluster, each node maintains a copy of the quorum data locally on its system disk. The cluster itself constantly synchronizes this MNS quorum data and keeps it consistent. If the configuration of the cluster changes, the change is reflected across the member nodes. The change is considered committed only if it has been successfully distributed to (number of nodes configured in the cluster / 2) + 1 nodes, with the division rounded down. This ensures that a majority of the nodes have an up-to-date copy of the data. The cluster service only starts, and therefore only brings resources online, if a majority of the nodes configured as part of the cluster are up and running the cluster service. If fewer nodes are available, the cluster does not have quorum, and the cluster service waits until more nodes join. In the case of a failure or a split-brain situation, all partitions that do not contain a majority of the nodes are terminated. This ensures that a partition containing a majority of the nodes can safely start any resources that are not yet running on it, because it is the only partition in the cluster that runs resources. MNS quorum implementations are recommended for geographically dispersed clusters. Placing a single MSCS member node in a separate location, where it acts as an arbiter, avoids split-brain situations. The following figure shows an example:
SAP supports a Majority Node Set cluster if it is part of a cluster solution offered by the Original Equipment Manufacturer (OEM) or Independent Hardware Vendor (IHV).
File share witness for Windows Server 2003
The file share witness feature is an improvement to the Majority Node Set (MNS) quorum model of Windows Server 2003. It enables a file share that is external to the cluster to be used as an additional vote in determining the status of the cluster in a two-node MNS quorum cluster deployment. A disadvantage of a two-node MNS cluster is that it cannot survive the failure of either node, because the remaining node no longer has a majority; in other words, the cluster cannot continue operating. Without a witness, the only way to overcome this problem is to configure at least three nodes in an MNS cluster. The three cluster nodes need to be continuously available and should be in different physical locations. With the File Share Witness feature, an external file share, also referred to as the witness, can be used instead of the third cluster node. With a File Share Witness, a two-node MNS cluster remains operational even if one cluster node fails. The file share acts as an additional vote to determine which node takes ownership of the configured cluster resources. Additional information about the File Share Witness feature is available in Microsoft Knowledge Base article 921181 at http://support.microsoft.com/kb/921181.
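The majority rule used by MNS clusters, and the effect of adding a file share witness, can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it simply treats a witness as one additional vote, as described above.

```python
def majority_threshold(total_votes: int) -> int:
    """(Number of votes / 2) + 1, with the division rounded down."""
    return total_votes // 2 + 1


def has_quorum(votes_online: int, total_votes: int) -> bool:
    """A partition may run resources only if it holds a majority of all votes."""
    return votes_online >= majority_threshold(total_votes)


# Two-node MNS cluster without a witness: one node fails.
print(has_quorum(1, 2))   # False: the surviving node cannot keep the cluster running
# Two nodes plus a file share witness: surviving node + witness = 2 of 3 votes.
print(has_quorum(2, 3))   # True
# Four nodes split 2:2 across two sites: neither side may run resources.
print(has_quorum(2, 4))   # False
```

The 2:2 case shows why an even split is dangerous without an arbiter: no partition reaches the threshold, so the whole cluster stops rather than risking two active sides.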
Network configuration
Any WSFC configuration requires at least two network adapters for the following purposes:
A public network that is used for communication between the SAP central instance, the SAP application servers (AS), and SAP system client connections.
A cluster private network that is used internally for status exchange and WSFC cluster heartbeat information between the member nodes.
Each of these network adapters must have its own physical IP address and corresponding host name. The cluster service in a WSFC cluster is unaware of any geographic dispersion and assumes that its public and private network interfaces are in the same network segment and IP subnet. This is because the cluster software cannot determine the network topology and relies on IP address failover, which only works within a single subnet. To accommodate these restrictions in a geographically dispersed configuration, organizations can implement VLAN technology. A virtual LAN (VLAN) can be viewed as a group of devices on different physical LAN segments that can communicate with each other as if they were all on the same physical LAN segment. Although some of the cluster service's network communication limitations have been removed in Windows Server 2008 R2, a single subnet is still required, and this also applies to communication between SAP components. With Windows Server 2003, the heartbeat round-trip time is limited to a fixed 500 milliseconds, so whether a configuration works depends directly on the latency and bandwidth of the network connections between the two sites. With Windows Server 2008 R2, this parameter became configurable between 250 and 2000 ms on the same subnet. Theoretically, Windows Server 2008 R2 even supports different subnets, but because of the requirements of the SAP instances, SAP installations are only possible in a single-subnet configuration.
Additional geographically dispersed cluster information is available in the following resources:
White paper: Geographically Dispersed Clusters in Windows Server 2003: http://www.microsoft.com/windowsserver2003/techinfo/overview/clustergeo.mspx
White paper: Server Cluster Quorum Options in Windows Server 2003: http://technet.microsoft.com (Note: Search for the title.)
White paper: Stretching Microsoft Cluster with Geo-Dispersion: http://www.microsoft.com/technet/prodtechnol/windows2000serv/maintain/optimize/geoclust.mspx
White paper: Server Clusters: Majority Node Set Quorum: http://technet.microsoft.com (Note: Search for the title.)
Microsoft Storage solutions: http://www.microsoft.com/windowsserversystem/storage/default.aspx
Microsoft Knowledge Base article: Microsoft Cluster Services Installation Resources: http://support.microsoft.com/kb/259267
Multi-Site clustering with Windows Server 2008 R2: https://www.microsoft.com/windowsserver2008/en/us/clustering-multisite.aspx
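Because SAP installations require all cluster nodes to share one subnet, it can be worth validating planned addresses before the cluster is built. The following Python sketch uses the standard `ipaddress` module to check that planned node addresses (given with their prefix length) fall into a single IP network, and checks a proposed heartbeat value against the 250 to 2000 ms range quoted above. It is a planning aid under stated assumptions, not a cluster validation tool; the addresses and function names are hypothetical.

```python
import ipaddress

# Range quoted above for Windows Server 2008 R2 on the same subnet (milliseconds).
HEARTBEAT_MIN_MS, HEARTBEAT_MAX_MS = 250, 2000


def same_subnet(*interfaces: str) -> bool:
    """True if all addresses, given as address/prefix, belong to one IP network."""
    networks = {ipaddress.ip_interface(i).network for i in interfaces}
    return len(networks) == 1


def heartbeat_setting_valid(value_ms: int) -> bool:
    return HEARTBEAT_MIN_MS <= value_ms <= HEARTBEAT_MAX_MS


# Hypothetical public addresses of two nodes in different sites on one stretched VLAN.
print(same_subnet("10.10.20.11/24", "10.10.20.12/24"))   # True
# Without the VLAN, each site would typically have its own subnet.
print(same_subnet("10.10.20.11/24", "10.30.20.12/24"))   # False
```

Comparing whole networks rather than address prefixes also catches mismatched subnet masks, such as one node configured with /24 and the other with /16.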
Transaction log backups taken on the primary database server are written to a local disk on this server and transferred over the network to the standby server at the configured interval. On the standby server, the received transaction log backups are applied to the database. Transaction log backups can also be transferred from the primary server to multiple standby servers. Changing the database role from primary to secondary, or bringing the secondary database online when the primary database becomes unavailable, is not an automatic process; the secondary database must be brought online manually. When SQL Server log shipping is set up, a database backup is first restored on the standby server. With log shipping in place, every transactional change is then reproduced on the standby side.
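Log shipping only works if the standby server receives an unbroken sequence of log backups that starts where the initial database restore ended. The following Python sketch models that rule with hypothetical integer log sequence numbers (real SQL Server LSNs are more complex). It is an illustration of the chain logic, not of the SQL Server log shipping jobs themselves; the class names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogBackup:
    first_lsn: int   # first log sequence number contained in the backup
    last_lsn: int    # where the next backup in the chain must start


class Standby:
    def __init__(self, restored_lsn: int):
        # LSN reached by restoring the initial full database backup
        self.lsn = restored_lsn

    def apply(self, backups):
        """Restore log backups in sequence; stop at the first gap. Returns the count applied."""
        applied = 0
        for backup in sorted(backups, key=lambda b: b.first_lsn):
            if backup.last_lsn <= self.lsn:
                continue          # already contained in the standby database
            if backup.first_lsn > self.lsn:
                break             # a log backup is missing; later ones cannot be applied
            self.lsn = backup.last_lsn
            applied += 1
        return applied


standby = Standby(restored_lsn=100)
shipped = [LogBackup(100, 150), LogBackup(150, 210), LogBackup(260, 320)]
print(standby.apply(shipped), standby.lsn)   # 2 210: the backup covering 210-260 is missing
```

A single lost log backup therefore blocks all later ones; in practice, the standby must then be re-initialized from a new database backup.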
By design, SQL Server log shipping maintains only the database of an SAP system at a geographically dispersed site. Because a fully functional SAP system also requires an SAP central instance with its network shares and possibly additional application servers (AS), these have to be maintained separately. SQL Server log shipping is therefore not considered a full disaster recovery solution, but it is a simple method of maintaining a copy of an SAP system's database and can be combined with other technologies such as database mirroring or WSFC clusters. The following figure shows the general setup with a local WSFC cluster:
Additional SQL Server log shipping information is available in the following resources:
SAP Service Marketplace: https://service.sap.com/notes (Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.)
SAP note 493290: Configuring SQL Server Log shipping
SAP note 1101017: Log shipping on SQL Server 2005
SQL Server 2000 high availability series: http://www.microsoft.com/technet/prodtechnol/sql/2000/deploy/harag05.mspx
White paper: SAP with SQL Server 2005: http://www.microsoft.com/sql/techinfo/whitepapers/sap-with-sql-server.mspx
White paper: Using SQL Server 2005 with SAP R/3: http://www.microsoft.com/technet/itsolutions/msit/operations/sql2005sap.mspx
SAP Help documentation: SAP High Availability: http://help.sap.com. To access this information, do the following:
1. From the left menu pane, click SAP NetWeaver.
2. Under SAP NetWeaver 7.0 Library, choose English.
3. Choose Technical Operations Manual.
4. Choose General Administration Tasks.
5. Choose High Availability.
6. Follow the SAP High Availability documentation link.
7. In the left menu pane, choose Database High Availability.
8. Choose High Availability for the MS SQL Server Database.
The mirror server that receives these transaction log records writes them into the mirror database's log buffer before writing them to a local transaction log file. The received log records are then applied to the mirror database, so every transaction executed on the active database is also executed on the mirror side. As a result, both databases are kept at the same transactional level. Database mirroring always requires both an active database and a mirror database; there is also an optional third role, the witness. With a witness configured, an automatic failover to the mirror database can take place in case of an error. When database mirroring is used in a high availability configuration, the mirror server can take over the role of the active server within seconds, provided that the witness confirms the failover. Database mirroring thus ensures that a consistent standby database is available if the production database is interrupted. Because the data packets are encrypted during network transport, data security is also ensured. SQL Server 2005/2008 R2 supports three mirroring configurations:
Asynchronous mirroring
Synchronous mirroring
Synchronous mirroring with automatic failover
Asynchronous database mirroring
With asynchronous data transfer between a production database and a standby database, a pending transaction is committed without waiting for the mirror to acknowledge the transfer. The primary advantage of this mode is that the impact on transaction processing is minimized: with low network bandwidth, the time required for the mirror server to acknowledge each transaction can become a significant performance bottleneck. Although asynchronous database mirroring improves transaction performance, it has a significant disadvantage: there is no guarantee that all transactions have been safely transferred to the mirror server at any given point in time. In case of an error, this can lead to the loss of committed transactions. With asynchronous mirroring, SQL Server 2005/2008 R2 cannot switch automatically to the standby server in case of an error unless an additional Microsoft partner solution is used. The standby database remains available, but possibly without some committed transactions. Production can continue on it, but the failover must be initiated manually. Asynchronous database mirroring is best applied in disaster recovery scenarios, where the greater distance between the mirror servers often limits the available network bandwidth. Currently, the log shipping technology introduced with SQL Server 2000 is often deployed in this scenario.
Synchronous mirroring with automatic failover in case of error
With synchronous database mirroring, a transaction is considered complete only after it has also been written on the mirror side. This guarantees that the mirror copy always has exactly the same transactional level as the original. Because of this increased data security, automatic failover in case of an error is possible.
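The difference between the two modes comes down to when a commit is reported back to the application. The following Python sketch is a toy model under stated assumptions: log records are plain strings, and "delivering" a record means the mirror has written it. It is not how SQL Server is implemented internally, and the class and method names are hypothetical. It shows which committed transactions would be missing on the mirror if the principal failed at a given moment.

```python
from collections import deque


class MirroringSession:
    """Toy model of when a commit is acknowledged in the two mirroring modes."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.in_flight = deque()   # log records sent but not yet written on the mirror
        self.mirror_log = []       # log records written on the mirror side
        self.committed = []        # transactions reported as committed to the application

    def commit(self, txn: str) -> None:
        self.in_flight.append(txn)
        if self.synchronous:
            # Synchronous: the commit waits until the mirror has written the record.
            self.deliver()
        # Asynchronous: the commit returns at once; the record may still be in flight.
        self.committed.append(txn)

    def deliver(self, count=None) -> None:
        """Let the mirror write up to count queued records (all if count is None)."""
        n = len(self.in_flight) if count is None else min(count, len(self.in_flight))
        for _ in range(n):
            self.mirror_log.append(self.in_flight.popleft())

    def lost_if_principal_fails_now(self):
        on_mirror = set(self.mirror_log)
        return [t for t in self.committed if t not in on_mirror]


for synchronous in (False, True):
    session = MirroringSession(synchronous)
    for i in range(1, 6):
        session.commit(f"T{i}")
    session.deliver(count=3)   # a slow link has carried only three queued records so far
    mode = "synchronous" if synchronous else "asynchronous"
    print(mode, session.lost_if_principal_fails_now())
# asynchronous ['T4', 'T5']
# synchronous []
```

In the synchronous case, each commit pays for a network round trip to the mirror, which is exactly the latency cost that makes asynchronous mode attractive over long distances.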
However, the prerequisite for automatic failover is an additional database server, the witness. The witness is an additional SQL Server instance that is needed only to determine which mirroring partner is allowed to take over. It can be almost any SQL Server instance; even the free SQL Server Express Edition works. If the active database server fails, the mirror server and the witness form the majority (quorum) that determines who can actively hold the database. Even if the primary server recovers, the active database role is not accidentally returned to it, because after a failover the quorum defines the mirror server as the active database owner. Synchronous database mirroring requires fast network connections with low latency, which typically also limits the maximum distance between the two sites. With current technology, this distance is about 50 km.
SAP database mirroring configurations
Detailed information about installing and configuring database mirroring for SAP is available in SAP note 965908. This note is also important when deciding how to combine database mirroring with other technologies such as log shipping.
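The role of the witness can be expressed as a two-out-of-three vote among the principal, the mirror, and the witness. The following Python sketch encodes that rule as described above; it is a simplified model for illustration, not the SQL Server implementation, and the function and parameter names are hypothetical. It assumes synchronous mirroring is required for automatic failover, and that the mirror takes over only when both it and the witness have lost contact with the principal.

```python
def principal_keeps_database(*, sees_mirror: bool, sees_witness: bool) -> bool:
    # The principal needs one vote besides its own (2 of 3 roles) to stay active.
    return sees_mirror or sees_witness


def mirror_takes_over(*, mirror_sees_principal: bool, mirror_sees_witness: bool,
                      witness_sees_principal: bool, synchronized: bool) -> bool:
    # Automatic failover: the mirror and the witness together form the majority,
    # both have lost the principal, and the mirror is fully synchronized.
    return (synchronized and mirror_sees_witness
            and not mirror_sees_principal and not witness_sees_principal)


# Network split that isolates the principal from both other servers.
print(principal_keeps_database(sees_mirror=False, sees_witness=False))   # False
print(mirror_takes_over(mirror_sees_principal=False, mirror_sees_witness=True,
                        witness_sees_principal=False, synchronized=True))  # True
```

In the isolation scenario, the principal stops serving and the mirror takes over, so at no point do both servers hold the database; if only the link between principal and mirror fails, the witness still sees the principal and no failover occurs.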
Additional information about SQL Server database mirroring is available in the following resources:
White paper: SAP with SQL Server 2005: http://www.microsoft.com/sql/techinfo/whitepapers/sap-with-sql-server.mspx
White paper: Using SQL Server 2005 with SAP R/3: http://www.microsoft.com/technet/itsolutions/msit/operations/sql2005sap.mspx
Books Online: Microsoft SQL Server 2005: http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx
White paper: SQL Server 2008 Technologies for SAP Solutions: https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/60a236a2-81042b10-5ebe-8fef61cc82fd