SAP High-Availability PDF

Microsoft Collaboration Brief
June 2010
SAP Applications on Windows Server 2008 R2 High Availability Reference Guide
Authors
Josef Stelzel, Sr. Developer Evangelist, Microsoft Corporation, jstelzel@microsoft.com
Summary
This paper describes how to implement a high availability solution for SAP applications on Microsoft Windows Server 2008 R2. It is written for developers, technical consultants, and solution architects. This paper introduces the technologies and architecture used, describes various high availability scenarios, and discusses the implementation process. This paper also contains links to advanced features and technical topics including disaster recovery methods. Note: Access to some of the linked information might be restricted such as SAP notes available at the SAP Service Marketplace at https://service.sap.com. Access to this Web site is available only to registered SAP customers and partners, and requires a user name and password.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2010 Microsoft Corporation. All rights reserved.

Microsoft, Windows, Windows Server, the Windows logo, SQL Server, and Active Directory are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Applies To
SAP NetWeaver 7.0
SAP NetWeaver 2004
SAP Business Suite (mySAP ERP)
SAP Application Server
SAP Replicated Enqueue
SAP System Central Services
Keywords
SAP NetWeaver, disaster recovery, high availability, SAP Application Server, SAP Replicated Enqueue, planned downtime, unplanned downtime, SQL Server 2005/2008 R2, Windows Server 2008 R2
Contact
This document is provided by Microsoft Corporation. Please check the SAP interoperability area at www.microsoft.com/sap and the .NET interoperability area in the SAP Developer Network at http://sdn.sap.com for updates or additional information.
Contents
Applies To
Executive Summary
High Availability Considerations
  Critical application availability requirements
  Classes of availability problems
    Loss of physical resources
    Logical errors and inconsistencies
    Disasters
    Planned downtime
  Service level agreements
    Availability measures
  High availability solution risks and side effects
    Increased complexity
    Higher costs
  Hyper-V virtualization and availability
    Guest clustering
SAP Architecture and Requirements
  SAP NetWeaver and its components
  SAP Application Server architecture
    ABAP system architecture
    Dual-stack system architecture
    Java system architecture
    SAP system single points of failure
  SAP standalone engines
    The SAP Web Dispatcher
    SAP standalone gateway
    TREX
    SAP liveCache
    SAP Content Server
Unplanned Downtime Avoidance Strategies
  Hierarchy of high availability solutions
    Data storage protection
    Server protection
    Network high availability
    Application specific configurations
  Simple cluster for a single SAP system
  Using multiple clusters for SAP instances and databases
  SAP Replicated Enqueue
  Multi-SID cluster
  Multi-node cluster
  SAP application servers
  IT infrastructure protection
  Hyper-V host cluster
Planned Downtime Minimization Solutions
  Planning ahead for minimizing planned downtime
    Change management strategy deployment
  Backup and patching solutions
    Snapshot backup
  Optimized server maintenance system architecture
    Server and operating system maintenance
    SQL Server instance maintenance
  SAP application planned downtime reduction
  Hyper-V Live Migration
Data Inconsistency Protection Solutions
  Logical error reasons
    Database data inconsistencies
    Sabotage and accidental data deletion
    Data loss through viruses and worms
  Backup and recovery
    Database backup strategies
  Database log shipping
  Snapshots
    Database snapshots with SQL Server 2005/2008 R2
    Snapshots with storage solutions
    Hyper-V snapshots for virtual machines
  Database consistency checks
    Large database consistency
Disaster Recovery Solutions
  SAP system protection in a geographically dispersed cluster
    Storage replication
    Cluster quorum configuration
    Majority Node Set configuration for Windows Server 2003
    File share witness for Windows Server 2003
    Network configuration
  Microsoft SQL Server database log shipping
  Database mirroring with SQL Server 2005/2008 R2
    Asynchronous database mirroring
    Synchronous mirroring with automatic failover in case of error
    SAP database mirroring configurations
  Disaster recovery solutions for virtual machines
Executive Summary
Business applications are central to corporate IT operations. Corporate business processes are supported by software solutions that help to plan, process, or communicate in all business-related tasks. Consequently, any service failure has an immediate and direct impact on corporate business results: it often decreases revenue and can damage the corporate image. This is especially true for SAP applications as corporations increase their dependency on a productive IT environment.

Enterprise Service Architecture (ESA) and the global network of interacting companies have increased both the uptime requirements and the number of IT components that are ultimately needed to fulfill business requirements. As an increasing number of companies join global networks, there is always a time zone in which a computing service is in use. While centralized application systems like SAP R/3 were used in the past, ESA orchestrates the use of service providers in order to achieve a larger task. Those services can be distributed inside or outside a company, and all of them need to be available.

High availability of mission critical applications has always been a focus for SAP infrastructures. The starting point for increasing availability traditionally has been to address the loss of a critical hardware resource that could generate downtime until the computer system is available again. More solutions have been developed over time to address other problems like downtime due to operating system defects, downtime caused by data inconsistencies, or downtime caused by disasters like earthquakes, floods, or terrorism. Even planned downtime, which is needed to upgrade systems or install patches, is contrary to the requirement to have an application service consistently available. However, the maintenance performed during planned downtime reduces system vulnerability and increases reliability.

This guide describes the solutions that address the various areas of availability for SAP on the Windows platform. It helps to identify the causes of potential downtime and provides technical strategies to reduce or eliminate it. In addition, this guide provides references to solution descriptions that help the reader understand the technology and quickly find assistance.

Microsoft has a long history of providing a comprehensive portfolio of solutions for protecting enterprise class applications like SAP. Microsoft Windows Server 2008 R2 offers even more functionality than previous versions with clustering, geographic distribution, and operating system security. Improved network configuration functionality, performance enhancements, and storage subsystem management included with Windows Server 2008 R2 make it easier to work with the latest technology from hardware partners. As a central component of Windows Server 2008 R2, high availability makes managing the complexity of modern infrastructures both effective and affordable.
High Availability Considerations

High availability refers to all technical or conceptual solutions that are used to improve application availability. For the purpose of this paper, availability is defined as: usable for the intended purpose of supporting corporate business processes.

Issues with business application availability can have many causes. These causes range from hardware problems, to planned downtime due to patching or installing upgrades, to disasters that happen periodically on a larger scale. The question of how to achieve optimal high availability is not always easy to answer. For example, a solution that protects against hardware problems might not protect the system against relational database inconsistencies.

The goal of this white paper is to identify the various reasons for the lack of SAP application availability and describe the solutions available to safeguard these points of failure. The intent of this paper is not only to emphasize the benefits of high availability solutions, but also to identify the potential risks and side effects. When several options are available to solve the same problem, this white paper will help the reader decide which solution is optimal.
Critical application availability requirements

SAP applications are typically used for business critical processes or processes that are essential for maintaining the company workflow. Additionally, SAP applications often directly exchange data with external companies. For example, this is done to process orders electronically or to validate external data such as with credit card validation. The availability requirement for such applications has increased to nearly 24x7x365. It is primarily corporate globalization and Internet communication that has generated the increased availability requirements. No company can afford to manually track goods, maintain accounting, or control the financial streams. Having the application services available whenever the business process needs them is a necessity. Unavailability directly impacts revenue and reputation, and can even threaten the existence of a company. In addition, some business sectors like the financial sector have laws and regulations that require the corporations to operate their core applications in a failsafe and reliable manner.
Classes of availability problems

While the consequences of losing a mission critical application service are always the same for a company, the reasons for an outage are very diverse. They range from planned downtime for maintenance to a disaster striking an entire geographic area. In order to implement rational protection against a potential problem, the nature of the problem must be identified. The typical problem types are listed in the following sections. The class of solutions associated with each problem type is discussed in more detail in subsequent sections.

Loss of physical resources
One of the more obvious reasons for an application service loss is the failure of a required hardware resource. Physical resources are not only the servers, storage, or network infrastructures of a company, but also the facilities that support the computing environment and provide shelter, air conditioning, and electrical power. While hardware resources are more directly related to the application environment, the supporting facility is also related to disaster recovery considerations.

Logical errors and inconsistencies
While many computer system hardware failures directly and immediately generate unplanned application downtime, there are also problems that can cause logical errors, glitches and spikes during operation, and data inconsistencies. A random memory problem might, for example, cause data block corruption that is only discovered the next time this data is accessed. Accidental data deletion or file corruption caused by a computer virus is another example of a logical error. In many cases, the effect of these problems is not a hardware failure, but system degradation. However, since there is no way to predict when the system will use the data again, a potential problem might arise at any time during normal operation. Real application downtime is most often created during the process of recovering, such as when restoring the last backup or cleaning database problems manually. Since the data exists only once, maintaining data consistency is a crucial part of the availability concept.

Disasters
Disasters like fire, flood, hurricanes, earthquakes, and terrorism can cause the loss of all IT systems in a data center. Problems of this scale might not be sufficiently addressed by having enough redundant hardware in one location. Besides having a proper geographical distribution of computer systems for the continuation of a critical application, typical questions include how to synchronize the data between the different sites and how to plan for the real event. In addition to a technical solution, the complete solution requires good planning skills and an in-depth knowledge of the applications and organizational requirements. Solutions that address these problems are considered disaster recovery solutions rather than high availability solutions. Although disaster recovery solutions might employ typical high availability techniques like clustering or database mirroring as part of a recovery plan, they clearly have a different scope than solutions that protect against a simple server hardware failure.

Planned downtime
Planned downtime occurs at an intended and often appropriate time, most likely at a time of low application usage. Planned downtime is typically implemented for server and software maintenance, upgrades or migrations, and changes to or the testing of critical configurations. Ironically, this maintenance helps to improve the computer system stability and security by eliminating known problems and by maintaining system resources. However, this maintenance does require application downtime. This white paper describes some architectural concepts that can help to minimize or eliminate planned downtime for typical SAP application tasks. However, proper planning and change management are still the main reasons for planned downtime. Therefore, it might not be possible to eliminate all planned downtime.
Service level agreements

As shown in the following table, availability measures throughout the IT infrastructure help administrators to determine which application service requires what level of protection. While the loss of a test or a QA system might impact the work in an IT department, the loss of a productive system almost always impacts company processes and outbound communication. Since improving availability is not without cost, it makes sense to focus on the most critical applications.
Note: To avoid user interruption, all dependencies must be protected as well as the primary application services. If a productive system is integrated into an IT infrastructure, this infrastructure is also critical, as is any system that provides data to or consumes data from the productive system. Any downtime associated with these dependent systems will interrupt the primary application services as well.
Availability level: Application

Standard: Application services and infrastructure components that can fail for a short period; typically one to two days without business impact. Standard often also implies a definition of minimum protection like using reliable servers, hot pluggable components, and so on.

High availability: Applies to applications that need to be available even when a critical hardware resource is lost. Logical errors and loss or inconsistency of data need to be addressed, and a planned procedure for making the application service available again in case of a problem must exist. The duration of the outage has to be minimized.

Mission critical: The application service is absolutely critical for the business processes. A service loss, even for a short period of time, might have a high financial impact on the company. All measures for protection must be taken.

Table 1. Service level agreements
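The note on dependencies can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not part of the original guide. It assumes that the primary service needs every listed component and that the components fail independently of each other, which is a simplification. Under that assumption the combined availability is the product of the individual availabilities. The component values are hypothetical.

```python
from math import prod


def composite_availability(availabilities):
    """Availability of a service that depends on every listed component.

    Assumption (not from the guide): components fail independently, so
    the combined availability is the product of the individual fractions.
    """
    for value in availabilities:
        if not 0.0 <= value <= 1.0:
            raise ValueError("availability must be a fraction between 0 and 1")
    return prod(availabilities)


# Hypothetical chain: SAP system, an external data provider, shared infrastructure
chain = [0.999, 0.99, 0.999]
print(f"Composite availability: {composite_availability(chain):.4%}")
```

Under these assumptions the result is always at or below the availability of the weakest component, which is why a single dependency with standard protection can limit an otherwise highly available application service.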
Availability measures
In order to measure and quantify computer system or application availability, the following formula is used:

Availability = 100% * achieved availability / planned availability

Availability is the percentage of the planned time during which the application could be used for its intended purpose. Availability values such as 99.999% are often used in marketing as a solution quality indicator. The following table shows the assumed unavailability for various typical values.
Availability  Achieved   Planned  Maximum possible downtime
(percent)     (days)     (days)   Days      Hours    Minutes
95            346.75     365      18.25     438      26280
98            357.7      365      7.3       175.2    10512
99            361.35     365      3.65      87.6     5256
99.9          364.635    365      0.365     8.76     525.6
99.99         364.9635   365      0.0365    0.876    52.56
99.999        364.99635  365      0.00365   0.0876   5.256

Table 2. Assumed unavailability
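The downtime columns follow directly from the availability formula. As a minimal sketch, assuming a planned availability of 365 days per year as in the table, the following Python function derives the maximum possible downtime for a given availability percentage. The function name and output layout are illustrative choices, not part of the original guide.

```python
HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60


def max_downtime(availability_percent, planned_days=365):
    """Return the maximum possible downtime as (days, hours, minutes)."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # Share of the planned time in which the service may be unavailable
    unavailable_fraction = (100 - availability_percent) / 100
    days = planned_days * unavailable_fraction
    return days, days * HOURS_PER_DAY, days * MINUTES_PER_DAY


for level in (95, 98, 99, 99.9, 99.99, 99.999):
    days, hours, minutes = max_downtime(level)
    print(f"{level:>7}%  {days:9.5f} d  {hours:9.4f} h  {minutes:10.3f} min")
```

A calculation like this can be useful when translating a service level agreement into a concrete yearly downtime budget, for example when planning maintenance windows.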
High availability solution risks and side effects

Improving the availability of an application service is a complex and intense task. The higher the requirement is, the higher the effort, cost, and complexity of the solution. Any high availability solution that the system design requires also becomes a factor in the availability assessment itself, because it can cause downtime through new processes such as failover testing or disaster emergency training.

Increased complexity
Increasing the protection against critical application loss by using hardware redundancy, mirroring technologies, snapshots, clusters, and monitoring solutions always increases the complexity of an IT system. The disadvantage of complexity is twofold: it costs more to maintain and operate, and it bears additional risks. A well educated staff must be available around the clock in case potential problems arise. Good planning, proper IT processes, and good communication between the teams in the IT department are also critical for reliable and secure operation.

Higher costs
Improved application service availability always requires more effort than standard solutions. Hardware failures might only be addressed by providing redundant servers that can be used in case of a failure. High availability software solutions need to be purchased, installed, and maintained. IT personnel need to be trained and IT processes must reflect the extended capabilities. For example, IT personnel need to periodically test and verify that the high availability functionality works. Even support from external providers often must be structured toward the increased requirements. Shorter response times and dedicated support offerings for improved availability carry higher price tags than standard offerings.

The cost of implementing high availability solutions increases exponentially as the expected level of protection rises. While server clustering and database mirroring are standard high availability technologies today that can address problems of a single computer system, extending those solutions into a disaster recovery concept adds more costs. These additional costs include wide area networking, additional facilities and staff, as well as the associated operational costs.

Generally, it is relatively easy for organizations to have inappropriate expectations regarding availability targets. It is also easy for organizations to demand higher levels of availability than they are actually willing to pay for before the cost implications are understood. The cost implications of most availability solutions include, but are not limited to, the following:
Hardware
Software
Network infrastructure
Training
Serviceability and support
Operational costs
Hyper-V virtualization and availability

As server virtualization technology and supporting hardware have matured to enterprise-level reliability, performance, and functionality, businesses are moving more and more of their critical applications, such as SAP applications, to virtualized environments. With this move, new storage and IT requirements emerge, as do opportunities to significantly improve overall application availability in planned or unplanned scenarios. Although a full discussion on virtualization is outside the scope of this document, various virtualization techniques for high availability will be introduced as appropriate.

Microsoft Virtualization provides a new way to install SAP applications on a physical server using the Windows Server 2008 R2 Hyper-V role. Rather than using a physical server for each application, multiple applications can be consolidated onto a single physical server, each in an individual virtual machine (VM). With this setup, clustering techniques can be used to provide high availability. However, with virtualization the requirement for maintaining the highest level of availability is even more important. This is because a physical server in a Microsoft virtualized environment typically holds many virtual machines. Therefore, if the server were to fail, many applications would fail with it.

To deal with this issue, several new solutions for virtualized infrastructures have been developed based on existing methods to reduce downtime. The virtualized high availability solutions discussed later in the paper include:
Unplanned downtime solution: The VM unplanned downtime scenario is still addressed by Windows Server Failover Cluster (WSFC). The only difference exists in the virtualization layer where the agents can now migrate the VM VHD files and then restart the VM on a new server after a failover has occurred.
Planned downtime solution: The implementation of Live Migration has significantly improved the planned downtime scenario as maintenance downtime can now be avoided altogether.
Logical errors solution: Logical errors that occur inside a VM, such as unintended file deletion and data corruption, are addressed with the Hyper-V snapshot feature.
Disaster recovery solution: Disaster recovery solutions for Hyper-V now incorporate storage replication and WSFC in geographically dispersed installations.
Guest clustering
Also available with Hyper-V is the ability to configure a WSFC between two VMs so that the cluster service runs in the guest operating system. One advantage of this configuration is that an entire test lab for cluster services can exist on one physical server. Because only one physical server is required, this configuration reduces costs.

While the VMs of a guest cluster could feasibly be located on a single physical server, this setup creates problems if the high availability of an application inside this cluster is important. Since the application cannot survive the failure of a single server if both of the VMs in a guest cluster reside there, a configuration with the VMs located on two physical servers is required for high availability.

Please note that when using guest clustering, the type of storage used for the cluster disks is restricted to iSCSI. The following figure shows the configuration of a Hyper-V guest cluster on a single physical server and on two standalone physical servers.
Figure 1
More information about support for SQL Server in a guest cluster environment can be found at: http://support.microsoft.com/kb/956893
A detailed description for how to configure a Hyper-V guest cluster can be found at: http://blogs.technet.com/b/mghazai/archive/2009/12/12/hyper-v-guest-clustering-step-bystep-guide.aspx
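The placement rule for guest clusters can be expressed as a simple check. The following Python sketch is a hypothetical illustration only: it does not call any Hyper-V or WSFC interface, and the VM, cluster, and host names are invented. It reports every physical host that runs more than one node of the same guest cluster, which is acceptable for a test lab but not for a highly available setup.

```python
from collections import defaultdict


def colocated_cluster_nodes(placement):
    """Find guest cluster nodes that share one physical host.

    `placement` maps a VM name to a (cluster name, physical host) pair.
    All names are hypothetical; this is not a Hyper-V or WSFC API.
    """
    nodes_per_host = defaultdict(list)
    for vm, (cluster, host) in placement.items():
        nodes_per_host[(cluster, host)].append(vm)
    # Keep only the (cluster, host) pairs that hold more than one node
    return {key: sorted(vms) for key, vms in nodes_per_host.items() if len(vms) > 1}


# Test lab: both nodes on one server (a single point of failure)
test_lab = {"sapvm1": ("sapcl", "hostA"), "sapvm2": ("sapcl", "hostA")}
# High availability: nodes spread over two physical servers
production = {"sapvm1": ("sapcl", "hostA"), "sapvm2": ("sapcl", "hostB")}

print(colocated_cluster_nodes(test_lab))
print(colocated_cluster_nodes(production))
```

An empty result means that no single physical server failure can take down all nodes of a guest cluster at once.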
SAP Architecture and Requirements

High availability always requires a comprehensive analysis of potential risks and the implementation of appropriate measures to protect against those risks. Technical solutions that protect the application against a loss of critical hardware resources require detailed knowledge about the workflow and requirements of the application itself. The following section describes the general SAP infrastructure architecture options and the basic protection requirements provided by the architecture against the loss of availability.
SAP NetWeaver and its components

As shown in the following figure, SAP NetWeaver is an application and integration platform that consists of several individual components. In most cases, the SAP Application Server (AS) is the technical base for the individual functions of SAP NetWeaver.
Figure 2. SAP NetWeaver framework
For user and information integration, SAP NetWeaver uses the SAP Enterprise Portal (EP) and SAP Business Warehouse (BW). Data is also integrated by the SAP Master Data Management (MDM). By using the SAP Mobile Infrastructure, user integration can be extended to a wide variety of remote devices. Process integration is performed by SAP Process Integration (PI), formerly known as SAP Exchange Infrastructure (XI). Enterprise service architectures are made possible by the integration of people, information, and processes, and are the foundation of a new breed of applications. Composite applications are composed from a variety of individual functions already available in the application infrastructure, and demonstrate how to develop faster and more flexible solutions for future business requirements.
A downside of this flexibility is that making a service available depends on a greater number of components in the infrastructure. Note: Because enterprise services are composed into business applications, all service providers in use must offer the same level of protection in order to make the composite service highly available. The following figure shows the SAP NetWeaver platform from a technical perspective in order to show how high availability could be implemented. Besides the SAP AS for the Enterprise Portal, Master Data Management, Business Warehouse, Process Integration, or Mobile Infrastructure, there are also standalone engines and surrounding support systems. While the Internet Transaction Server has today mostly been replaced by the Internet Communication Manager, a component of the SAP AS, there are often standalone gateways for RFC communication or the TREX search engine. Typically, SAP NetWeaver is supplemented by an installation of the SAP Solution Manager, a SAP NetWeaver administrator, and the SAP NetWeaver development environment.
Figure 3. SAP NetWeaver software development environment
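The note on composite services can be illustrated with simple arithmetic. The sketch below assumes that a composite service needs all of its providers at the same time and that the providers fail independently of each other; the availability figures are invented for illustration only.

```python
import math


def composite_availability(component_availabilities):
    """Availability of a service that needs every listed component at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return math.prod(component_availabilities)


# Hypothetical figures: two well-protected providers and one weaker provider.
providers = {"Portal": 0.999, "PI": 0.999, "BW": 0.95}
total = composite_availability(providers.values())
print(f"Composite availability: {total:.4f}")
```

Under these assumptions the composite service is always less available than its weakest provider, which is why every provider needs the same level of protection.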
SAP Application Server architecture

The SAP AS is the technical base for SAP applications. While the former R/3 AS only supported the ABAP programming language, the SAP AS supports Java as well. The AS delivers the transactional power for business applications and must be extremely stable, scalable, and secure. Since the application servers are used as the execution layer for the business logic coded in ABAP or Java, they are required for the fulfillment of the business process, which in turn creates high availability requirements. Solutions for optimized availability are supported by the SAP AS architecture, but always depend on additional components such as redundant servers and the monitoring and control processes that typically run in high availability clusters. Before going deeper into the SAP AS architecture, the general features should be discussed. Every SAP system consists of at least one central database and a central SAP instance that provides unique services for the SAP system. If more transactional performance is required, additional application servers can be added to the SAP system. A SAP system, which is identified by a unique System Identifier (SID), might consist of many SAP instances and the common database. Depending on the type of application, a SAP AS can be installed for ABAP, Java, or for both workload types as shown in the following figure:
Figure 4. SAP Web application processing options
ABAP system architecture
The layout and structure of SAP AS 7.00 has changed from version 6.40. The following figure shows the structure of a pure ABAP system as it was used up to SAP AS version 6.40.
Figure 5. A pure ABAP system using SAP AS version 6.40
This figure shows two instance types: a central instance and one or more dialog instances. Processes like Dialog, Batch, Update, Spool, or the Dispatcher process exist many times in a SAP system and are therefore redundant. Each installation of an ABAP instance also has one gateway process configured that is used for communication through the Remote Function Call (RFC) protocol. Also, each instance has its own Internet Communication Manager (ICM) process for HTTP-based communication. The Internet Graphics Server (IGS) only supports the creation of bitmaps for browser-based clients. To register all the instances of a SAP system and to support the communication between the various components of a distributed SAP system, a single message server is configured in the central instance. Also unique in the SAP system is the central Enqueue server, which manages the lock entries of a distributed SAP system in a lock table inside the shared memory of the server. Because of these two unique processes, the term central instance was used for this installation. A central instance is the smallest working unit of the SAP system, and the performance can be extended by adding additional application servers. The following figure shows the directory structure of a SAP AS 6.40 installation.
Figure 6. SAP central instance
All profiles and executables of a distributed SAP system are made available from the central instance to all dialog instances through the SAPMNT share. In order to support a simple patch process for executables, there is one master copy of the executables on the central instance. Any time a dialog instance starts, the SAP utility SAPCPE checks for the availability of a newer executable version. When a newer version is available, this executable is copied to the local runtime directory of the AS before it is used. Changes to the ABAP reports of a SAP system are distributed by using the transport system. SAP systems can be configured to be a member of a transport domain. For each transport domain, there is one directory that is shared by all members of the domain. The directory is: <Drive>:\usr\sap\trans. Because of the central character of this shared directory, it can be considered a single point of failure for the operation of more than one SAP system. With the introduction of SAP AS 7.0, there was a major change in the layout of the central instance. Similar to the structures in pure Java systems, the unique Message and Enqueue server processes have been moved to a separate SAP instance: the ABAP System Central Services (ASCS) instance. Therefore, a typical AS no longer has any system-wide functions. The following figure shows the SAP landscape simplification:
Figure 7. Simplified ABAP system setup using SAP AS version 7.0
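The update check that runs when a dialog instance starts, as described earlier for the master copy of the executables, follows a copy-if-newer pattern. The sketch below illustrates that pattern only. It is not the SAPCPE implementation: comparing file modification times is an assumption made for this example, and the function name and directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def sync_newer_executables(master_dir, local_dir):
    """Copy files from master_dir to local_dir when missing or older locally.

    Returns the sorted names of the files that were copied.
    """
    master, local = Path(master_dir), Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in master.iterdir():
        if not src.is_file():
            continue
        dst = local / src.name
        # Assumption for this sketch: "newer" means a later modification time.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # copy2 also carries over the timestamp
            copied.append(src.name)
    return sorted(copied)
```

A second call right after the first returns an empty list because the local copies already carry the timestamps of the master copies, so an instance start without a new patch copies nothing.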
Subsequently, the file system of an ASCS instance would look like the following figure:
Figure 8. ASCS instance directory structure
SAP installations of SAP AS 7.0 consisting of an ASCS instance and a dialog instance will continue to use the name format D<instnr> for the instance directory. This combined installation structure is shown in the following figure:
Figure 9. ASCS instance and dialog instance
Regardless of this combined installation structure, the Enqueue and Message server processes are now in the ASCS instance. The D<instnr> naming convention was not changed for reasons of compatibility with older versions.
Dual-stack system architecture
With the introduction of J2EE as a possible SAP system component in version 6.40, SAP AS can be installed for ABAP, Java, or for both types of workloads. There is a considerable difference in architecture between the ABAP and Java platforms, as seen in the following figure:
Figure 10. A dual-stack system using SAP AS version 6.40
As shown in this figure, both the ABAP and the Java part of the SAP AS have their own Message and Enqueue server as critical components. The Java AS is primarily made up of Java server processes. The Software Deployment Manager (SDM) is used for the installation and management of software versions. The server operating system must also have a Java development kit (JDK) installed to configure the Java virtual machine (JVM). The JDK for Windows is available through Sun Microsystems. While in the ABAP Application Server version 6.40 the Enqueue and Message server are still a part of the central instance, the Java AS always uses the system central services (SCS) instance concept. This means that every 6.40 dual-stack system must, at a minimum, consist of two instances. As with the pure ABAP configuration, the hybrid system still has a central database, which separates the respective application data types by using a schema. In the hybrid structure, the ABAP and Java functions are shut down simultaneously as if they were a single instance. The parameters of both are also configured in a single instance profile. The Java SCS instance in this installation is a complete unit and has its own profile. It can be started or stopped independently. For the purpose of maintaining distributed installations of SAP instances, all profiles and executables of the SAP system are shared from one central server on a network share. This server is typically the server that holds the central instance or the SCS instance. Altogether, the dual-stack directory structures are naturally more complex than those of a pure ABAP or a pure Java instance. The SAP AS 6.40 dual-stack system structure is shown in the following figure:
Figure 11. SAP AS 6.40 dual-stack file structure
As seen previously with the pure ABAP AS, the system structure for dual-stack systems became simpler with the introduction of the SAP AS 7.0. The only difference between the otherwise identical system instances is the Software Deployment Manager (SDM), which is installed in only one instance. The SDM is required to install and patch Java programs and is only needed when new programs are installed or during software maintenance. Therefore, there is no need to configure the SDM in a cluster solution. To secure SDM service availability, a backup copy can be installed on any AS when needed. As of SAP AS version 7.10, the SDM is completely removed from the installation and replaced with a new Java Support Package Manager (JSPM) function. The software maintenance functionality is then an integrated part of every AS and is therefore redundant. The following figure shows the SAP AS 7.0 dual-stack system structure. As shown in the figure, the ABAP Message server and Enqueue server have now been moved to a new, separate ASCS instance that simplifies the dual-stack SAP AS setup.
Figure 12. SAP AS 7.0 dual-stack system structure
The following figure shows the SAP AS 7.0 dual-stack directory structure with an ABAP, Java, and a SCS instance. In this file system layout, there is a clear distinction of the different components described.
Figure 13. SAP AS 7.0 dual-stack directory structure
Java system architecture
In addition to the installation variants for ABAP or dual-stack systems, there is a third variant: the installation of pure Java systems. In this case, there is no difference between the SAP AS versions 6.40 and 7.0. This configuration typically uses a SAP Web Dispatcher or a hardware load balancer to distribute the HTTP connection load. The Java system application servers are all constructed in the same way with the addition of the Software Deployment Manager (SDM) on one instance. This functionality, however, is removed in SAP AS version 7.10 and will no longer appear in subsequent Java system versions. The following are the three main components of a Java system:
The central Enqueue and Message services used by all Java instances
Java server and dispatcher processes to handle the workload
A central database for persistent data storage
These components are shown in the following figure:
Figure 14. Java system main components
Multiple J2EE instances placed on several physical servers create a Java cluster. The basic rule is that a Java instance can only be configured once per physical server. At a minimum, a Java instance must consist of one Java server process and a dispatcher, but it can also have multiple Java server processes. The central SCS instance might also be placed together with a regular Java instance on one physical server. Similar to the ABAP installation, the profiles and executables of a distributed Java system also reside on one physical server and are shared there. Because of the central character of these files, this server is the server that holds the SCS instance of the SAP system.
SAP system single points of failure
Single points of failure are SAP system elements that are critical to the operation of the system and that must be protected against loss or failure in a highly available SAP system.
Single points of failure assessment
As mentioned previously, every SAP system has the following central components that are required to be available at all times:
A central database for data storage, one per system
Separate Message servers and Enqueue servers for ABAP and Java systems
The SAPMNT share for profiles, executables, and Java Secure Store files of a SAP system. There is one SAPMNT share per SAP system.
The purpose of the database is to provide persistent data storage for SAP system data and the runtime environment. Databases work with a series of internal mechanisms that guarantee the ACID properties (Atomicity, Consistency, Isolation, and Durability). These mechanisms ensure data consistency at all times. For example, there is a mechanism that logs all changes executed during a transaction. If a database operation fails in the middle of a transaction, the logs are used to restore the previous condition. The transaction logs can also be used to reapply transactions to a database image. For example, a database image restored from a backup would not reflect the latest transactional state since further transactions have most likely been executed after the backup was created. The latest transactions would be lost due to the restore if there was no transaction log available to reapply them. Databases are central application components and are often protected by high availability clusters or other technologies like Microsoft SQL Server 2005/2008 R2 Database Mirroring. High availability clusters use the same database image that is accessed from two servers (shared disk) for server redundancy. Database mirroring, on the other hand, is able to maintain a physically independent copy of the critical data. The main purpose of all these technologies is to protect the database service against loss since it is the most critical component of a SAP system.
The SAP System Message Server registers all SAP system instances and balances the user load by connecting new users to the server with the most available capacity in the system. Existing connections remain intact if the Message Server goes offline; however, no new connections can be established through it. This makes the Message Server an ideal cluster solution candidate.
The Enqueue Server is part of the SAP lock concept. The purpose of the SAP lock concept is to synchronize data access in order to protect the consistency of SAP data objects. This is one of the most important functions of a SAP system. It keeps SAP data consistent by not allowing two users to make changes to the same data object at the same time. Instead, the data is locked for the first user.
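The basic idea of the lock concept, one owner per data object at a time, can be sketched as follows. This is a deliberately simplified illustration and not the SAP Enqueue implementation: the class and method names are hypothetical, and only exclusive locks are modeled.

```python
class LockTable:
    """Toy lock table that allows one owner per data object."""

    def __init__(self):
        self._owners = {}  # data object -> user currently holding the lock

    def enqueue(self, obj, user):
        """Grant the lock if the object is free or already held by this user."""
        owner = self._owners.setdefault(obj, user)
        return owner == user

    def dequeue(self, obj, user):
        """Release the lock, but only for the user that holds it."""
        if self._owners.get(obj) == user:
            del self._owners[obj]
            return True
        return False


table = LockTable()
print(table.enqueue("ORDER 4711", "user_a"))  # first user gets the lock
print(table.enqueue("ORDER 4711", "user_b"))  # second user is refused
```

Because the whole table lives in the memory of one server in this sketch, losing that server loses every lock, which is the dependency discussed in the following paragraphs.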
The Enqueue Server in the following figure consists of a work process and a lock table in the shared memory of the server that is used to store the lock information for an entire SAP system. The Enqueue work process is needed in distributed systems to insert or verify lock information on behalf of the dialog instances. Local work processes can directly access the lock table and do not need this Enqueue work process. If, however, the lock table is lost by a server failure, lock information can no longer be verified. In a distributed system, this would create a transaction reset and roll-back of all pending transactions, even on dialog instances that would normally resume working, and all session contexts would be lost. An example of a SAP AS 6.40 ABAP with a single point of failure (SPOF) is shown in the following figure:
Figure 15. SAP AS 6.40 ABAP SPOF
This figure shows only the critical SAP AS components and is therefore not complete. Another critical point resides in the file systems and network shares of the SAP installation on Windows. It is important to note that the SAP system executables and profiles are always installed with the central instance or, in newer systems, with the SCS instance. Access to these files is provided through the SAPMNT share, which is present only once per SAP system. Before an instance starts, executables available on this share are copied to the local machine by the SAP program SAPCPE. This is done to improve the stability of the SAP instance. The profiles, however, are only read through this share. The following figure shows the infrastructure of two servers: Server Alpha has the central instance and Server Beta is a SAP application server. Server Alpha hosts the central instance and therefore the SAPMNT share. Both instances have the SAPLOC share, which is used to access the local environment of a SAP instance. Both servers have two environment variables: SAPGLOBALHOST and SAPLOCALHOST. The UNC names \\SAPGLOBALHOST\sapmnt and \\SAPLOCALHOST\saploc are derived from these variables. These names are used in the SAP kernel to locate the SAP system profiles and system directories. Server Alpha has both variables set to the name of the local server, so all access points are local. Server Beta, however, is directed to the central server when accessing SAPGLOBALHOST. SAPLOCALHOST is used for all instance-specific operations and is therefore again accessed through a local path.
Figure 16. Critical SAP AS components
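How the two UNC names follow from the environment variables can be shown with a short sketch. The host names alpha and beta stand for Server Alpha and Server Beta from the figure; the function name is hypothetical, and the variables are passed in as a dictionary instead of being read from the operating system.

```python
def sap_unc_paths(env):
    """Build the two UNC share paths from a mapping of environment variables."""
    return {
        "sapmnt": rf"\\{env['SAPGLOBALHOST']}\sapmnt",
        "saploc": rf"\\{env['SAPLOCALHOST']}\saploc",
    }


# Server Alpha hosts the central instance, so both variables name Alpha.
server_alpha = {"SAPGLOBALHOST": "alpha", "SAPLOCALHOST": "alpha"}
# Server Beta reads global files from Alpha but keeps local files on Beta.
server_beta = {"SAPGLOBALHOST": "alpha", "SAPLOCALHOST": "beta"}

for name, env in (("alpha", server_alpha), ("beta", server_beta)):
    print(name, sap_unc_paths(env))
```

On Server Beta the sapmnt path points to the other machine, which is why the share on Server Alpha becomes a dependency for every additional application server.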
The directory structure mentioned above and the SAPMNT share must be protected within a high availability solution because of their central significance for the SAP system. Since the access to the UNC path is derived from the variable SAPGLOBALHOST, these files are also called global files. Prior to SAP AS version 7.0, in all ABAP or ABAP + Java systems, the central instance was protected in a cluster. The reason was simple: It was not possible to separate the Enqueue and the Message server from the rest of the SAP central instance. Together with the central instance, the Enqueue server, and the Message server, the global files and the SAPMNT share were implicitly protected in a cluster as well. With the development of the SAP Standalone Enqueue, it became possible for the first time in SAP AS 6.40 to extract the central components Message server and Enqueue server into a separate instance. By doing so, the cluster configuration for critical SAP services was significantly simplified. In version 6.40, the SAP System Central Services (SCS) instance was introduced only for Java-based systems. With SAP AS 7.0, SCS configurations also became available for ABAP-based systems. These configurations are called ASCS instances. Note: All high availability configurations of SAP systems today are based on the protection of the SCS instances, the SAPMNT share, and the global files in a failover cluster.
One of the main benefits of this configuration compared to the protection of a complete central instance in older versions is that only two relatively lightweight services need to be moved and restarted. SCS instances lead to shorter failover times and more stability in the cluster implementation. Since there are no SAP users connected to a SCS instance, the effect of a failover on the SAP system is also much smaller. Using the SAP Replicated Enqueue in addition to SCS high availability configurations enables enterprises to minimize application server interruptions. For more information, see the Measures to Avoid Unplanned Downtime section. The information below shows which configuration is supported by which version.
Pure ABAP systems: Up to version 6.40, the central instance is clustered. During an upgrade of an existing 6.40 central instance to 7.0, the established architecture remains intact. SAP has documented the migration steps to support the new ASCS structure in SAP note 1011190. When initially installing SAP AS 7.0, only the SCS/ASCS instance is clustered.
Pure Java systems: Since the SAP AS 6.40 SR1 release, only the SCS is clustered. No changes are needed to upgrade to 7.0.
ABAP + Java systems: Since SAP AS 6.40 SR1, the Java SCS instance together with the ABAP central instance is clustered. With a new installation of SAP AS 7.0, only the ASCS instance and the SCS instance are clustered.
SAP Enqueue process special requirements
The dependency on the single lock table was not completely resolved by the introduction of the SAP Standalone Enqueue. To address this issue, the SAP Standalone Enqueue was combined with a SAP Replicated Enqueue on another server. The following figure shows the new system design with SCS instances to host the central and critical SAP system services. There would be one for ABAP and one independent SCS instance for Java. Any other component of the SAP System, such as the SAP AS that handles the user workload, would not be considered critical because they are implicitly redundant if more than one is configured in the system. Because of this redundancy, only the SCS instances require high availability protection measures.
Figure 17. New system design with SCS instances
If one combines the SAP Standalone Enqueue in the SCS instance with a SAP Replicated Enqueue running on a second server, one can continuously replicate the lock table. In a larger SAP system with several SAP application servers, an operation with minimal interruptions is provided. This is provided even when the central services must be transferred to another server due to a hardware failure. The SAP Replicated Enqueue can only be used for lock table replication and cannot function as a regular Enqueue server of a SAP system. The lock table in the SAP Replicated Enqueue, which holds the replicated lock entries, cannot be used directly for the Enqueue service. During the process of a failover of the Enqueue server to the replication site, the standard Enqueue process is first started and a new, empty lock table is created. The replicated data in the shadow lock table is then read and transferred to the original lock table before the system is operational again. The following figure shows the configuration of several SAP application servers and a central instance in combination with a SAP Replicated Enqueue:
Figure 18. A SAP Replicated Enqueue
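The failover sequence described above (start a standard Enqueue process with an empty lock table, then read the entries from the shadow lock table) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the SAP implementation: all class and function names are hypothetical, and replication is modeled as a direct write to the replica.

```python
class ReplicatedEnqueue:
    """Holds only a shadow copy of the lock table; it grants no locks itself."""

    def __init__(self):
        self.shadow_table = {}


class EnqueueServer:
    """Standard Enqueue process; always starts with a new, empty lock table."""

    def __init__(self, replica=None):
        self.lock_table = {}
        self.replica = replica

    def lock(self, obj, owner):
        if obj in self.lock_table:
            return False
        self.lock_table[obj] = owner
        if self.replica is not None:
            # Every granted lock is also written to the shadow table.
            self.replica.shadow_table[obj] = owner
        return True


def fail_over(replica):
    """Start a fresh Enqueue server and load the replicated entries into it."""
    server = EnqueueServer(replica)
    server.lock_table.update(replica.shadow_table)
    return server


replica = ReplicatedEnqueue()
primary = EnqueueServer(replica)
primary.lock("MATERIAL 100", "user_a")
# The primary server is lost; a new Enqueue server takes over on the replica host.
successor = fail_over(replica)
print(successor.lock_table)
```

After the failover, the lock held by the first user is still in place, so a second user is refused just as before the failure.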
A SAP Replicated Enqueue should be combined with a high availability cluster solution. One reason for this is to enable the administrator to switch the Message server from one server to another in case of a severe failure. Another reason for this setup is that the SAP Replicated Enqueue is not a fully functional Enqueue server. Instead, it is only used to replicate the lock table. During regular operation, the SAP Replicated Enqueue only inserts lock requests into a standby lock table on a second server. If the original server fails, the normal Enqueue Server needs to fail over to this second server and resume work with the replicated lock table. Additionally, high availability cluster solutions are also used to protect the database against hardware failures. Since the Message and Enqueue servers in the SCS instance have very low resource requirements, it is possible to install additional local application servers on the servers of a high availability cluster. In this context, local means that they are not managed by cluster management and are lost if the respective server hardware fails.
SAP standalone engines

The typical AS for all kinds of SAP applications is the SAP AS. One of the benefits of the SAP AS is the common architecture that can be reused to provide the runtime environment for the majority of requirements. There are, however, a few special requirements that call for a more tailored design or special software components. In SAP terminology, those engines are called the SAP standalone engines.
The SAP Web Dispatcher
The SAP Web Dispatcher is a software load balancer for HTTP or HTTPS connections. Typically, it is installed in a demilitarized zone (DMZ) between the SAP backend systems and public Internet access. Connection requests from the Internet are passed by the SAP Web Dispatcher to the available application servers of the SAP system in a circular way. The routing algorithms review the capacity and load of the various instances to determine which server to connect to. With ABAP instances, the number of configured dialog work processes is evaluated. With Java instances, the number of available server processes determines which server gets the next connection request. The architecture of the SAP Web Dispatcher corresponds to that of the SAP Internet Communication Manager (ICM), which is a component of every ABAP instance. While an ICM forwards incoming connection requests directly to a dialog work process of an ABAP instance or to the Java dispatcher of a dual-stack installation, the SAP Web Dispatcher first passes those requests to the respective ICM of a SAP instance, which in turn processes the request further. The SAP Web Dispatcher basically acts as a software router for the incoming HTTP requests. Because of its central function for Internet communications, the SAP Web Dispatcher is also a critical component in a system landscape and needs to be protected against hardware failures. Since the SAP Web Dispatcher looks like a SAP ABAP instance, it can be integrated relatively easily into a high availability cluster solution. The typical structure of a SAP landscape using a Web Dispatcher is shown in the following figure:
Figure 19. SAP landscape using a Web dispatcher
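The idea of distributing requests according to instance capacity can be illustrated with a small sketch. It is not the SAP Web Dispatcher algorithm: the selection rule (lowest number of assigned requests relative to the number of configured processes), the function names, and the instance names are assumptions made for this example.

```python
from fractions import Fraction


def pick_instance(capacity, assigned):
    """Return the instance with the lowest load relative to its capacity.

    capacity: instance name -> number of work or server processes (> 0)
    assigned: instance name -> requests already routed to that instance
    Ties are broken by instance name so that the result is deterministic.
    """
    return min(
        capacity,
        key=lambda name: (Fraction(assigned.get(name, 0), capacity[name]), name),
    )


def distribute(capacity, request_count):
    """Route request_count requests one by one and return the totals."""
    assigned = {name: 0 for name in capacity}
    for _ in range(request_count):
        assigned[pick_instance(capacity, assigned)] += 1
    return assigned


# Hypothetical instances with 10 and 20 configured dialog work processes.
print(distribute({"abap_app1": 10, "abap_app2": 20}, 30))
```

With these figures the larger instance receives twice as many requests as the smaller one, which matches the intent of evaluating the number of configured processes.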
In contrast to most high availability installations, the installation of a SAP Web Dispatcher in a cluster is not supported by SAPINST. SAP note 834184 describes in detail the steps to manually configure a WSFC for the SAP Web Dispatcher. SAP notes can be downloaded from the SAP Service Marketplace at https://service.sap.com. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. Additional SAP Web Dispatcher administration information is available at: http://help.sap.com
To access this information, do the following:
From the left menu pane, click SAP NetWeaver.
Choose English under SAP NetWeaver 7.0 Library.
Open Technical Operations Manual for SAP NetWeaver.
Open Administration of Standalone Engines.
Follow the SAP Web Dispatcher link.
SAP standalone gateway
The SAP gateway enables SAP systems and external programs to communicate with one another. The protocol for the communication is the Common Programming Interface Communication (CPI-C), which is also used by the Remote Function Call (RFC) interface. Consequently, all RFC connections in a SAP system rely on the SAP gateway process. By default, each SAP AS has one gateway process configured. In certain cases, it is also possible to configure a standalone gateway process. One example would be to configure a standalone gateway for the System Landscape Directory (SLD). As the SLD is a component of the Java AS, the standalone gateway acts as a bridge that allows ABAP systems to read and write data in the SLD through RFC. Another typical use case is the installation of a standalone gateway on a single database instance with no ABAP engine. In this case, the gateway is needed in order to make the database calendar (Transaction DB13) functional. This configuration is also described in SAP note 657999. To configure a standalone gateway in a failover cluster, the gateway can simply be added to the Enqueue and Message server processes of a SCS instance. This configuration is described in SAP note 1010990.
TREX
TREX is an abbreviation for Text Retrieval and Extraction and is a search engine designed to search for structured and unstructured data. TREX provides SAP applications with numerous services for searching, classifying, and text mining in large document collections or unstructured data. In addition, TREX provides SAP applications with services for searching and aggregating business objects or structured data. This search engine is used as a standalone engine in combination with the SAP Enterprise Portal (EP) or the Knowledge Management (KM) application. For access to the TREX search engine, each SAP AS has ABAP or Java components that support the communication with the engine.
The most simplified form of a TREX installation is shown in the following figure:
Figure 20. Basic TREX installation
TREX is one example of a SAP solution that does not rely on a standard SAP AS, but runs on a special server architecture. TREX installations can also be implemented as master/slave configurations spanning several physical servers. Additional information about the distribution and implementation of a TREX engine is available at http://service.sap.com/instguidesnw70. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. To access this information after logging on to the site, do the following:
From the left menu pane, click SAP NetWeaver.
Choose English under SAP NetWeaver 7.0 Library.
Open Technical Operations Manual for SAP NetWeaver.
Open Administration of Standalone Engines.
Follow the Search and Classification (TREX) link.
General information about TREX is available at http://help.sap.com.
SAP liveCache
SAP liveCache is a component of the Advanced Planning and Optimization (APO) application that supports the SAP SCM solution: an application for supply chain management in the mySAP suite. SAP liveCache is a memory resident database for rapid access. The foundation of this technology is derived from the SAP MAXDB, formerly known as SAP DB. In addition to this memory resident database, each APO system has a normal database for the APO data and programs. In order to access data objects in the liveCache rapidly during operation, those objects are loaded into the liveCache at startup. A special logging mechanism writes savepoints to the disk every few minutes. These savepoints do not reflect the transactional state of the system.
APO systems consist of a SAP AS and a liveCache as a standalone engine. From the perspective of high availability, there are two possible solutions to protect the liveCache:
A failover cluster for the APO system and the liveCache. LiveCache is supported in the WSFC as of SAP NetWeaver 7.0 SR1.
A hot standby liveCache where the database log files are exported to a standby server and constantly applied to a database in recovery mode. This log shipping solution works with two independent servers that do not share common storage.
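The log shipping principle behind the hot standby option can be sketched in a few lines. This is a conceptual illustration only and not the liveCache mechanism: the log is modeled as a list of key and value changes, and the class and method names are hypothetical.

```python
class StandbyDatabase:
    """Standby copy that stays in recovery mode and replays shipped log entries."""

    def __init__(self, initial_state=None):
        self.state = dict(initial_state or {})
        self.applied = 0  # number of log entries already replayed

    def receive(self, shipped_log):
        """Replay only the entries that have not been applied yet."""
        for key, value in shipped_log[self.applied:]:
            self.state[key] = value
        self.applied = len(shipped_log)


# The primary server records every change in its log and ships it regularly.
primary_log = [("stock_A", 10), ("stock_B", 5)]
standby = StandbyDatabase()
standby.receive(primary_log)

primary_log.append(("stock_A", 7))  # a later change on the primary server
standby.receive(primary_log)
print(standby.state)
```

Because the standby keeps its own copy of the data and only needs the shipped log entries, the two servers do not have to share storage.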
Additional information about the installation of SAP liveCache and cluster configurations in WSFC is available at http://service.sap.com/instguidesnw70. In addition, see the following SAP note about the configuration of liveCache in WSFC at https://service.sap.com/notes. Note: Access to these Web sites is available only to registered SAP customers and partners and requires a user name and password.
SAP note 780795: SAP liveCache 7.5: WSFC Installation
General information about the administration of the SAP liveCache is available at http://help.sap.com. To access this information after logging on to the site, do the following:
From the left menu pane, click SAP NetWeaver.
Choose English under SAP NetWeaver 7.0 Library.
Open Technical Operations Manual for SAP NetWeaver.
Open Administration of Standalone Engines.
Follow the SAP liveCache Technology link.
SAP Content Server
The SAP Content Server is an independent server instance for temporary data and Web documents that can be requested by the SAP AS through the Internet. By using the Content Server, large document volumes can be maintained for cached access. A SAP Content Server can be installed together with a SAP AS on a physical server or as a standalone instance. It is possible to install this server with or without its own database. When this server is installed with its own database, MAXDB is typically used. The simultaneous use of a SAP AS database and a Content Server is not supported. To protect the SAP Content Server against loss, it can be configured in a failover cluster.
The following SAP notes located at https://service.sap.com/notes provide information about the installation and clustering of the SAP Content Server. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
SAP note 175096: SAP Content Server installation guide
SAP note 1039401: SAP Content Server Clustering with Windows 2003
Additional SAP Content Server administration information is available at http://help.sap.com. To access this information, do the following:
From the left menu pane, click SAP NetWeaver.
Choose English under SAP NetWeaver 7.0 Library.
Open Technical Operations Manual for SAP NetWeaver.
Open Administration of Standalone Engines.
Follow the SAP Content Server link.
Unplanned Downtime Avoidance Strategies

Unplanned downtime is one of the biggest concerns in any IT operation. Because companies depend heavily on SAP applications, downtime of these services at undesired times blocks the normal execution of work and can have a severe financial impact due to lost revenue. The most common reason for unplanned downtime is a failure in a hardware component that affects application service availability.
Hierarchy of high availability solutions

Unplanned application interruptions often coincide with the loss of a hardware resource that is important for application operation. In addition to resource redundancy, supervision and control of application operations are critical for high availability. For highly available SAP system operation, several solutions have evolved that, either independently or in combination, provide better protection against the unplanned downtime of a SAP application. The following figure shows the technical solution hierarchy:
Figure 21. Technical solution hierarchy
Securing a SAP application against interruption due to the loss of hardware resources generally requires applying several techniques. Applying these techniques will lead to better application protection.
Data storage protection
The lowest level of the high availability hierarchy manages the way data is stored and made available in a secure and reliable way. Data storage protection is typically widely implemented in a SAP data center. Most storage devices offer some level of protection by default. However, storage subsystems still pose a number of challenges for a SAP data center. First, the amount of data grows rapidly over time. In addition, the data needs to be constantly protected against hardware failure and against loss of data access performance, which directly affects overall SAP system performance. SAP system performance is typically measured by user transaction response time.
SAN infrastructures
A Storage Area Network (SAN) provides a centralized approach to maintaining the storage resources needed in a computer system. Traditionally, Direct Attached Storage (DAS) has been used for the local storage requirements of a computer system. The use of DAS has high space requirements and administration costs. By centralizing data storage into a scalable, network type architecture, administrative costs are lowered, and space is managed more efficiently. SANs can be built on Fibre Channel connections that use fiber optic cables and the SCSI protocol for block-oriented data transfer. In addition, iSCSI devices are now available that use normal TCP/IP networks for the transport. The SCSI protocol for the data transfer is packaged into the TCP/IP transport.
From the high availability perspective, the use of SAN infrastructures in data centers is recommended. Redundancy of physical disks and protection against a single disk failure are maintained in the storage subsystem itself and follow the hierarchical approach shown at the beginning of this section. Depending on the vendor and the type of storage subsystems in a SAN, even data replication over larger distances can be achieved with SAN-based storage.
Additional information on the concepts of the SAN infrastructure for highly available Windows systems can be found in the Server Cluster: Storage Area Networks white paper by searching for the title at http://technet.microsoft.com/en-us/default.aspx.
Multipathing
Data storage protection against unplanned downtime always includes protection of the connection between a server and its storage. If there is only one host adapter and therefore only one cable connection to the storage subsystem, any failure of the host adapter, the cable, or the controller in the storage array would interrupt the application. The use of a WSFC could help protect server components such as the host adapter. However, it is preferable to avoid connection failovers in a cluster. These failover types can be avoided by using a redundant host adapter and two cable connections to a storage device that in turn has two storage controllers. This configuration is called multipathing. The Windows operating system supports multipathing through the MPIO driver. Additional information for MPIO configurations is found in the Multipathing and the Microsoft MPIO Driver Architecture white paper at: http://download.microsoft.com/download/3/0/4/304083f1-11e7-44d9-92b92f3cdbf01048/mpio.doc
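The multipathing idea can be illustrated with a minimal sketch. It assumes a simplified model in which each I/O path consists of one host adapter, one cable, and one storage controller; the component names are hypothetical, and the code does not use or model the MPIO driver. Any component shared by all paths is a single point of failure.

```python
from typing import NamedTuple


class Path(NamedTuple):
    """One I/O path from a server to its storage array (illustrative names)."""
    host_adapter: str
    cable: str
    controller: str


def single_points_of_failure(paths: list[Path]) -> set[str]:
    """Return every component that all paths have in common.

    If such a component fails, no path to the storage survives.
    """
    if not paths:
        return set()
    shared = set(paths[0])
    for path in paths[1:]:
        shared &= set(path)
    return shared


# Hypothetical configurations: one path versus two fully redundant paths.
single_path = [Path("hba0", "cable0", "ctrl0")]
multipath = [
    Path("hba0", "cable0", "ctrl0"),
    Path("hba1", "cable1", "ctrl1"),
]

print(single_points_of_failure(single_path))  # all three components
print(single_points_of_failure(multipath))    # empty set
```

With a redundant host adapter but only one storage controller, the controller would still be reported, which matches the requirement above that the storage device itself has two controllers.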
Server protection
Servers host the individual components, such as the SAP instances and services, that compose a SAP system. The role and importance of a server depend on its function. A database server for a productive SAP system typically has the highest requirements in availability, stability, and performance. While SAP specific solutions are discussed later in this paper, there are a number of general server recommendations that incorporate high availability.
With high availability, redundancy is the method to protect servers against downtime. Inside a server, this could mean that the server has two independent power supplies with two power cords. Each power supply must be able to deliver enough energy to sustain operation if the other one fails. Redundancy might also include redundant host adapters for storage or network access. Finally, a well designed system with hot pluggable components is always valuable.
However, some server components cannot easily be made redundant. Main memory and CPUs are examples of such critical components, as is the server operating system, which also exists only once. Two solutions are typically used to address this. One solution is to use fault tolerant systems built to recover from memory or CPU hardware failures. The disadvantages of this solution are the limited performance range and higher prices. The second solution for protecting servers against failures is a high availability cluster such as the WSFC. With WSFC, two or more servers share storage subsystem access and can take over the storage volumes and restart applications automatically in case a server fails. This concept even maintains redundancy at the operating system level because each server has its own operating system. However, clusters depend on additional software components and need a proper configuration and a change management policy.
We will discuss the possible cluster implementations with SAP applications later in this section.
Network high availability
Networks are the backbone of all corporate communication, both internally and externally. The SAP application network implementation has multiple communication layers based on different functionalities including:
A server network that interconnects SAP application servers and the database server.
A client network for local users using the SAP GUI or a browser.
A demilitarized zone for connection to the public Internet.
A provider for access to the public Internet.
Again, component redundancy is the key factor for high availability solutions. However, the architecture of a real implementation reflects additional considerations. For example, public access to the Internet immediately raises security concerns and has more requirements than the internal and isolated server network. In the server network, by contrast, performance might be an issue in addition to availability. The following figure shows the different SAP network aspects.
Figure 22. Different SAP network aspects
Important considerations for highly available SAP networks include:
Router and switch redundancy: The server network is redundant up to the routers or switches in use. Redundancy can be accomplished by network teaming of Network Interface Cards (NIC). Routers need to monitor each other and take over the functionality of a failed device.
Client separation: Clients are typically not connected in a redundant way. However, clients need to be distributed across different switches so that only a single client group can be affected by a hardware failure.
DMZ redundancy: In the DMZ, there is typically a redundant SAP Web dispatcher configuration or hardware load balancers.
Internet access redundancy: Redundant access to the public Internet is necessary.
Additional information about the SAP systems high availability network requirements can be found in the SAP Help pages at http://help.sap.com. To access the information, do the following:
In the left menu pane, click SAP NetWeaver.
Under SAP NetWeaver 7.0 Library, choose English.
Search for Network High Availability.
A description of the SAP landscape and SAP system network requirements can be found at http://sdn.sap.com.
Application specific configurations
High availability clusters are a classic solution for protecting SAP applications against critical hardware resource failure. In Windows-based SAP installations, SAP supports the database and the SAP SCS instance installation in a WSFC.
SAP components can be installed into the cluster by using the SAP installation tool, SAPINST. In the simplest configuration, the cluster consists of two servers and a storage subsystem on which the application components are installed. The storage subsystem has to be accessible by both servers. A SAP system with a SQL Server database is shown in the following figure:
Figure 23. SAP system with a SQL Server database
Each of the cluster nodes has its own local operating system with the SQL Server engine installed locally. Each node must be capable of accessing the external storage subsystem where the application components are installed. Supported storage systems include Serial Attached SCSI (SAS), Fibre Channel, or iSCSI-based systems. Every WSFC cluster needs to maintain a copy of the cluster database that contains cluster configuration information. This information determines which cluster node can take ownership of the cluster resource group for the SAP application and database in case the communication between the nodes is interrupted. When two servers compete for the cluster resource group, this is known as Split-Brain syndrome and can generate a deadlock. In the simple cluster configuration shown in the previous figure, the cluster database is stored on each node. If the cluster uses a Disk Witness, the Disk Witness will also store a copy.
Applications that are protected in a WSFC cluster are configured in cluster groups. A cluster group contains the application resources, such as the shared disk storage volume that contains the SAP installation file system. If a cluster group needs to be transferred from one server to another, such as during a hardware failure, these resources must become available on the second server before the cluster service can start the application there. Cluster resources can be configured to handle dependencies on other resources. For example, it makes no sense to start a SAP SCS instance before the SAP system database is available.
For the exchange of status information between the members of the WSFC cluster, a private network is required. Since the status information that is periodically sent out is similar to a heartbeat, the network is called the cluster heartbeat network. Every SAP application network connection in the cluster is assigned a virtual IP address that is activated on a server by the cluster service when starting the SAP cluster group. While the virtual IP addresses are activated only on the server that runs the application, all network cards also have configured local IP addresses that are permanently assigned. Additional information about Windows Server 2008 R2 failover clusters can be found at: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
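The effect of resource dependencies on the start order can be sketched with a topological sort. This is only an illustration in Python, not the way the cluster service is configured; the resource names and the dependency graph are assumptions chosen to mirror the example of the SCS instance and the database.

```python
from graphlib import TopologicalSorter

# Hypothetical resources; each one maps to the resources it depends on.
dependencies = {
    "shared_disk": set(),
    "virtual_ip": set(),
    "database": {"shared_disk", "virtual_ip"},
    "scs_instance": {"shared_disk", "virtual_ip", "database"},
}

# static_order() yields every resource after all of its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())
print(start_order)
```

Whatever order the sorter picks for the disk and the virtual IP address, the database always appears before the SCS instance, which is the behavior the dependency configuration is meant to guarantee.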
Simple cluster for a single SAP system

High availability solutions should be as simple and controllable as possible. To simplify and reduce dependencies of SAP applications in a cluster, the System Central Services (SCS) instance was created. This instance only contains critical SAP system components. The simplest variant of a SAP system in a high availability cluster is a two-node cluster with a SAP SCS instance on one side and a database on the other. In case of a failure, the respective surviving server must be able to start up the cluster group it took over in addition to the existing cluster group running on this server. This means when determining the sizing of the WSFC nodes, enough CPU power and main memory must be available to run both cluster groups simultaneously at any time. Under normal circumstances, however, the cluster groups are distributed on the cluster nodes. Since SCS instances are not used for the transactional load of a SAP system, there must be additional SAP application servers installed inside the cluster or on additional servers outside the cluster. If such SAP application servers are installed on WSFC nodes, these application servers are not configured in a cluster group and will not switch to another WSFC node in case of a hardware failure. The proper server sizing for the additional application servers is a bit more complicated as these resources also have to be taken into consideration. The following figure shows a SAP system with the SCS instance, the database in the WSFC cluster, and additional SAP application servers on separate servers.
Figure 24. Simple cluster for single SAP systems
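The sizing rule for a two-node cluster can be expressed as a simple capacity check. The sketch below is an assumption-laden illustration: the core counts and memory sizes are invented, and real SAP sizing uses more than these two dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demand:
    """Resource demand of a cluster group or a local application server."""
    name: str
    cpu_cores: int
    memory_gb: int


def node_can_run_all(node_cpu_cores: int, node_memory_gb: int,
                     demands: list[Demand]) -> bool:
    """True if one node has enough CPU and memory for all demands at once."""
    total_cpu = sum(d.cpu_cores for d in demands)
    total_memory = sum(d.memory_gb for d in demands)
    return total_cpu <= node_cpu_cores and total_memory <= node_memory_gb


# Hypothetical figures for illustration only.
scs_group = Demand("SCS cluster group", cpu_cores=2, memory_gb=8)
db_group = Demand("database cluster group", cpu_cores=12, memory_gb=96)
local_app_server = Demand("local application server", cpu_cores=8, memory_gb=32)

# After a failover, the surviving node runs both cluster groups.
print(node_can_run_all(16, 128, [scs_group, db_group]))                    # True
# A local application server on the same node must be counted as well.
print(node_can_run_all(16, 128, [scs_group, db_group, local_app_server]))  # False
```

The second call shows why sizing becomes more complicated when application servers are installed on the WSFC nodes: they do not fail over, but they still consume resources on the surviving node.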
The installation of a SAP system in WSFC cluster solutions is described in the SAP Installation Guide for the respective SAP NetWeaver release at https://service.sap.com/instguides. A user name and password are required to access this Web site. To access this information after logging on to the site, do the following:
From the left menu, open SAP NetWeaver.
Select SAP NetWeaver 7.0 (2004s).
Select Installation.
Select Installation Guide - SAP NetWeaver 7.0 SR3 or Installation Guide - SAP NetWeaver 7.0 SR2.
Select Windows and the installation type (ABAP, ABAP + Java, or Java).
There are a number of SAP notes that provide additional information about WSFC related issues at https://service.sap.com/notes. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
The following table lists the most important notes:
SAP note   Description
106275     Availability of SAP components on Microsoft Cluster Server
139513     Merge transports for high availability systems
779253     Clustering your Java Add-In Systems on Windows
941092     MSCS: Post-Upgrade Steps for systems upgraded to NW 7.0 SR<x>
962955     Use of virtual TCP/IP host names
967123     SAP NetWeaver 7.0 / Business Suite 2005 SR2: Windows
1010990    Configuring a Standalone Gateway in a HA ASCS Instance
1011190    MSCS: Splitting the central instance after upgrading to 700
1043592    MSCS: Cluster Resource Monitor Crashes on W2K3 SP2
1172679    Troubleshooting MSCS Issues
Table 3. WSFC related SAP notes
Using multiple clusters for SAP instances and databases

A possible scenario for protecting multiple SAP systems in high availability configurations is to construct separate clusters for the SCS instances and the databases. However, the method chosen to protect the database does not have to be a cluster solution. SQL Server database mirroring would be an alternative solution. Since an SCS instance is a very lightweight service, the CPU utilization of the active server is typically minimal. In order to maximize the use of the available computing power, it is possible to install local SAP application servers on each WSFC node. This installation type has been supported since SAP NetWeaver 7.0 SR2. A possible layout in a landscape with several SAP systems is shown in the following figure:
Figure 25. SAP instance and database cluster
The description of a local SAP application server installation inside of a WSFC cluster is the same as the standard SAP application server description. The SAP installation guides for NetWeaver 7.0 describe the setup starting with the version SR2. These guides are available at https://service.sap.com/instguides. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. To access this information after logging on to the site, do the following:
From the left menu, open SAP NetWeaver.
Select SAP NetWeaver 7.0 (2004s).
Select Installation.
Select Installation Guide - SAP NetWeaver 7.0 SR3 or Installation Guide - SAP NetWeaver 7.0 SR2.
Select Windows and the installation type.
An overview of the supported WSFC configurations is available from SAP in the MSCS Configuration and Support Information for SAP NetWeaver 04 and the SAP NetWeaver 7.0 Systems white paper at http://sdn.sap.com/irj/sdn/windows.
SAP Replicated Enqueue

As described in the SAP architecture and the Requirements section, the Enqueue service, or more precisely the lock data in the Enqueue lock table, plays an important role for the overall SAP system operation. Especially in distributed installations with additional SAP AS, losing the lock table always aborts all pending transactions. In pure Java systems, the effect of losing the lock data is even more severe than in ABAP-based installations. Since the introduction of the SAP SCS instances for Java in SAP AS version 6.40 and for ABAP in SAP AS version 7.00, SAP Replicated Enqueue is an option to solve this issue. The SAP Replicated Enqueue requires the Standalone Enqueue to function properly. Since the SAP Standalone Enqueue is only used in the SCS instance, the installation of a SCS instance is required. The following figure shows two SAP systems where the SCS instance and the database of each system are configured in an independent WSFC cluster.
Figure 26. SAP Replicated Enqueue
The regular Enqueue service and its lock table are on the server from which the SCS instance was started. The second server in the cluster has a SAP Replicated Enqueue in addition to the active database. Additional application servers are located on servers that are not in a cluster formation. All lock requests from the active Enqueue server are mirrored onto the Replicated Enqueue. In case of a severe SCS server hardware problem, the SCS instance is transferred to the database server and started from there. During this process, the SAP Replicated Enqueue is stopped and the lock information from the mirrored lock table is copied into the new lock table of the regular Enqueue server. Therefore, the SAP AS instances outside these clusters do not lose any information and their running transactions are not affected. The SAP Installation Guides for SAP NetWeaver 7.0 SR2 and SR3 describe the cluster setup for the Enqueue Replication and the Enqueue Replication server installation.
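The mirroring and takeover described above can be modeled with a toy example. This is a simplified sketch under stated assumptions, not SAP's replication protocol: the class names, the lock keys, and the in-process method calls are all hypothetical stand-ins for services that run on separate servers.

```python
from typing import Optional


class ReplicatedEnqueue:
    """Passive copy of the lock table, kept on the second cluster node."""

    def __init__(self) -> None:
        self.mirrored_locks: dict[str, str] = {}

    def mirror_acquire(self, obj: str, owner: str) -> None:
        self.mirrored_locks[obj] = owner

    def mirror_release(self, obj: str) -> None:
        self.mirrored_locks.pop(obj, None)


class EnqueueServer:
    """Active lock table that forwards every change to the replica, if any."""

    def __init__(self, replica: Optional[ReplicatedEnqueue] = None) -> None:
        self.lock_table: dict[str, str] = {}
        self.replica = replica

    def acquire(self, obj: str, owner: str) -> bool:
        if obj in self.lock_table:
            return False  # the object is already locked
        self.lock_table[obj] = owner
        if self.replica is not None:
            self.replica.mirror_acquire(obj, owner)
        return True

    def release(self, obj: str, owner: str) -> bool:
        if self.lock_table.get(obj) != owner:
            return False  # only the owner may release a lock
        del self.lock_table[obj]
        if self.replica is not None:
            self.replica.mirror_release(obj)
        return True


def take_over(replica: ReplicatedEnqueue) -> EnqueueServer:
    """Start a new enqueue server from the mirrored lock table."""
    new_server = EnqueueServer()
    new_server.lock_table = dict(replica.mirrored_locks)
    return new_server


replica = ReplicatedEnqueue()
active = EnqueueServer(replica)
active.acquire("order-4711", "app-server-1")

# The node with the active enqueue server fails; the lock survives.
survivor = take_over(replica)
print(survivor.acquire("order-4711", "app-server-2"))  # False
```

Because the new lock table is filled from the mirror before any new request is served, a lock held before the failure is still held afterward, which is why running transactions on the external application servers are not affected.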
SAP note 524816 gives detailed information about the SAP Standalone Enqueue. SAP note 804078 describes the concept of the SAP Replicated Enqueue and how it can be used to protect a SAP system. Attached to this note is also an installation guide for the Enqueue Replication server in a WSFC cluster. In addition, the SAP lock concept and high availability solutions are described at http://help.sap.com. To access this information, do the following:
From the left menu pane, click SAP NetWeaver.
Choose English under SAP NetWeaver 7.0 Library.
Open Technical Operations Manual for SAP NetWeaver.
Open Administration of Standalone Engines.
Follow the Standalone Enqueue Server link.
Multi-SID cluster
A limitation of older Windows-based cluster configurations was that only one SAP system per cluster could be configured. The restriction was caused by the SAPMNT share. Any access to the SAP system global files in a distributed installation has to use this share. Since the share is configured on the <Drive>:\usr\sap directory, there is only one unique location in the file system. Underneath this path, there is a <SID> directory that hosts all the data for a specific SAP system. The consequence of this structure is that if more than one SAP system is installed on the server, the share contains the global data for all SAP systems. Since this share has to be relocated to another server in the WSFC cluster in case of a failover, that operation would impact all SAP systems. Because of this, SAP does not support this configuration.
This restriction is resolved by a new SAP installation method. With this method, the SAP system disks are linked with the <SID> directory under <Drive>:\usr\sap by using junctions. Junctions are similar to symbolic links in the various UNIX versions. They are a file system detour that automatically redirects access to a designated directory to another directory. For example, the following figure shows the basic setup of a WSFC cluster with three SAP systems with AAA, BBB, and CCC designations. Each SAP system has its own hard disk that can be accessed on shared drives from both servers in the cluster. SAP system AAA and system BBB run on Server A and system CCC runs on Server B.
Figure 27. Principle junction architecture.
The SAPMNT share was previously configured as a cluster resource inside the cluster configuration. Now it is in the local operating system of the respective server. The share is stationary and is no longer managed through the cluster. Under the C:\usr\sap directory path are three directories: AAA, BBB, and CCC. These directories have been created on both servers. Depending on system type, the directories in the following table are created on the shared drives:
SAP system type        Shared drive directory
All system variants    \usr\sap\<SID>\SYS
Java                   \usr\sap\<SID>\SCS<InstanceNr>
ABAP                   \usr\sap\<SID>\ASCS<InstanceNr>
ABAP + Java add-in     \usr\sap\<SID>\SCS<InstanceNrJava>
                       \usr\sap\<SID>\ASCS<InstanceNrABAP>
Table 4. Directories per system type
Next, all the junctions are created from the local hard drive of every server. To create junctions, Microsoft provides the executable linkd.exe, which is part of the Microsoft Windows resource kit. The syntax for the command is:
linkd.exe <Argument1> <Argument2>
Depending on the system type, the arguments can be taken from the following table:
SAP system type        Junction creation arguments
All system variants    <localdrive>\usr\sap\<SID>\SYS <shareddrive>\usr\sap\<SID>\SYS
Java                   <localdrive>\usr\sap\<SID>\SCS<InstanceNr> <shareddrive>\usr\sap\<SID>\SCS<InstanceNr>
ABAP                   <localdrive>\usr\sap\<SID>\ASCS<InstanceNr> <shareddrive>\usr\sap\<SID>\ASCS<InstanceNr>
ABAP + Java add-in     <localdrive>\usr\sap\<SID>\ASCS<InstanceNrABAP> <shareddrive>\usr\sap\<SID>\ASCS<InstanceNrABAP>
                       <localdrive>\usr\sap\<SID>\SCS<InstanceNrJava> <shareddrive>\usr\sap\<SID>\SCS<InstanceNrJava>
Table 5. Junction creation arguments
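As an illustration of Table 5, the following Python sketch assembles the linkd.exe command lines for one SAP system. It only builds and prints the commands and does not execute them. The function names, the shared drive letter S:, and the instance number 00 are assumptions made for the example; the system ID AAA and the local drive C: follow the sample setup in this section.

```python
VALID_TYPES = ("java", "abap", "abap+java")


def junction_dirs(system_type: str, abap_nr: str = "", java_nr: str = "") -> list[str]:
    """Return the per-SID directory names that need a junction (Table 5)."""
    if system_type not in VALID_TYPES:
        raise ValueError(f"unknown system type: {system_type!r}")
    dirs = ["SYS"]  # required for all system variants
    if system_type in ("abap", "abap+java"):
        dirs.append(f"ASCS{abap_nr}")
    if system_type in ("java", "abap+java"):
        dirs.append(f"SCS{java_nr}")
    return dirs


def linkd_commands(sid: str, local_drive: str, shared_drive: str,
                   system_type: str, abap_nr: str = "", java_nr: str = "") -> list[str]:
    """Build one 'linkd.exe <local path> <shared path>' line per directory."""
    commands = []
    for name in junction_dirs(system_type, abap_nr, java_nr):
        local = f"{local_drive}\\usr\\sap\\{sid}\\{name}"
        shared = f"{shared_drive}\\usr\\sap\\{sid}\\{name}"
        commands.append(f"linkd.exe {local} {shared}")
    return commands


# Hypothetical ABAP system AAA with instance number 00 on shared drive S:.
for line in linkd_commands("AAA", "C:", "S:", "abap", abap_nr="00"):
    print(line)
```

The same commands have to be run on every cluster node, because each server needs its own local junctions that point to the shared disk of the SAP system.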
As seen in the following figure, after the installation of the sample cluster, the cluster groups AAA and BBB were activated on Server A, and CCC on Server B. All file system accesses of the SAP instances were redirected to the respective shared disk. External access takes place as usual in the cluster through the virtual IP address of the cluster group. With this configuration, if Server A crashes due to a hardware failure, two things happen. First, the shared disks of both applications AAA and BBB are activated on Server B. Next, the virtual IP addresses of cluster groups AAA and BBB are activated on Server B. Because the junctions point from the local hard drive of a cluster server to the shared disk, a client is able to resume its work as usual and can resolve all data. Clients who were already working with the SAP application CCC on Server B are not affected. The following figure shows the situation after the failover:
Figure 28. Junction configuration after failover
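The failover sequence for the sample cluster can be traced with a short sketch. It assumes the placement described in this section (AAA and BBB on Server A, CCC on Server B); the function name and the step wording are illustrative and do not correspond to cluster service commands.

```python
def fail_over(placement: dict[str, str], failed: str,
              target: str) -> tuple[list[str], dict[str, str]]:
    """Return the ordered failover steps and the placement after the failover."""
    moved = [sid for sid, node in placement.items() if node == failed]
    # Shared disks are activated first, then the virtual IP addresses.
    steps = [f"activate shared disk of {sid} on {target}" for sid in moved]
    steps += [f"activate virtual IP address of {sid} on {target}" for sid in moved]
    new_placement = {
        sid: (target if node == failed else node)
        for sid, node in placement.items()
    }
    return steps, new_placement


placement = {"AAA": "Server A", "BBB": "Server A", "CCC": "Server B"}
steps, after = fail_over(placement, failed="Server A", target="Server B")

for step in steps:
    print(step)
print(after)
```

System CCC does not appear in any step, which reflects that a SAP system already running on the surviving server is not touched by the failover.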
Using the junction configuration, all the SCS instances of a larger SAP landscape can now be configured as one cluster. In general, Multi-SID clusters can also protect the database instances. Because of the varying resource requirements of databases compared to a SAP SCS instance, the sizing could be more difficult. Therefore, a better design would be to place the databases and the SCS instances on two different clusters. The following figure shows this system structure:
Figure 29. Separate database and SCS instance clusters for simplified sizing
The database servers would have an additional SAP Standalone Gateway configured. This is required as a local service for administration. Finally, each of the database servers would also get its own SAPOSCOL service installed for performance monitoring. Multi-SID clusters demand a different approach during a cluster installation, but require no changes in the application operation. At a minimum, SAP SCS instances are required. For a pure Java system, this is already possible in version 6.40. With ABAP or dual-stack systems, version 7.0 must be employed. The installation of multi-SID cluster solutions will be described in a separate installation guide for the NetWeaver 7.0 SR3 release. SAP note 106275 describes how SAP supports a multi-SID cluster for the SAP AS 7.0. The Multi-SID WSFC Installation for SAP NetWeaver 7.0 compact disc master is available at http://service.sap.com/swdc. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
To access this information after logging on to the site, do the following:
From the left menu pane, click on Download.
From the left menu pane, select Installations and Upgrades.
From the left menu pane, click on Entry by Application group.
In the list that appears in the right pane, click SAP NetWeaver.
From the right menu pane, select SAP NetWeaver.
From the right menu pane, select SAP NetWeaver 7.0.
From the right menu pane, select Installation and Upgrade.
From the right menu pane, select Windows Server.
Select MS SQL SERVER as the database, and then scroll down to the list of downloadable objects.
The SAP Installation Guide is available at https://service.sap.com/instguides. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. To access this guide after logging on to the site, do the following:
Select Installation Guide - SAP NetWeaver 7.0 SR3.
Scroll down to the Multi-SID Installation on Windows WSFC section.
Select Windows as the platform and SQL Server as the database type.
Multi-node cluster
Besides the previous limitation of only one SAP system per cluster, there was also a restriction on the number of cluster nodes supported for SAP clusters. While Windows Server 2003 supports up to eight servers and Windows Server 2008 R2 supports up to 16 servers in a WSFC cluster, SAP supported only two-node clusters before SAP NetWeaver 7.0 SR2. Because this limitation no longer exists as of SAP NetWeaver 7.0 SR2, multi-node clusters are now possible. However, if Replicated Enqueue is used, SCS must still be configured to run on two nodes. The following figure shows a cluster with three servers. Two of the servers actively run a SCS instance and the SAP system database while the third server is a backup in case an error occurs on either of the first two servers. With proper sizing of the main memory and the CPUs in the middle server, it is possible for both SAP instances to run on the middle server at the same time.
Figure 30. A cluster with three servers
Multi-node clusters are supported as of SAP NetWeaver 7.0 SR2 using the SAP installation tool, SAPINST. The installation of additional nodes in a WSFC cluster is described in the SAP Installation Guide.
SAP application servers

High availability through redundant hardware and automatic failover is the standard approach for the single points of failure of a SAP system. The SAP system architecture also has components, such as the SAP application server (AS), that provide redundant services inside a SAP system if more than one is available. In a redundant configuration, more than one SAP AS exists in each SAP logon group. Any user connected to a SAP AS is affected if that server fails. All pending transactions are lost and the user is logged off the SAP system. If there are additional SAP application servers in the affected SAP logon group, the user can immediately log on again. The SAP message server, which routes connections during the logon process, provides a load balancing mechanism that tracks which servers are reachable and have the least workload. The user is therefore reconnected to the next available server in the group and can resume work immediately. This limits the damage to repeating the last failed transaction, which would be necessary in a cluster solution as well. In general, every SAP system that supports a critical business application must have more than one SAP AS so that work can continue in case of a server error.
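The selection logic attributed to the message server can be sketched as follows. This is a minimal illustration, not the SAP message server implementation: the workload metric, the server names, and the logon group name are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppServer:
    name: str
    logon_group: str
    reachable: bool
    workload: float  # hypothetical load metric; lower means less loaded


def pick_server(servers: list[AppServer], logon_group: str) -> Optional[AppServer]:
    """Return the reachable server of the group with the least workload."""
    candidates = [
        s for s in servers
        if s.logon_group == logon_group and s.reachable
    ]
    if not candidates:
        return None  # no redundancy left in this logon group
    return min(candidates, key=lambda s: s.workload)


servers = [
    AppServer("as1", "SALES", reachable=False, workload=0.10),  # failed server
    AppServer("as2", "SALES", reachable=True, workload=0.70),
    AppServer("as3", "SALES", reachable=True, workload=0.35),
]

chosen = pick_server(servers, "SALES")
print(chosen.name if chosen else "no server available")  # as3
```

If the logon group contained only the failed server, the function would return None, which corresponds to the case where users cannot log on again until the server is restored.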
IT infrastructure protection
Applications always have a direct relationship to the servers in a data center. These servers make the necessary resources, such as CPU, memory, or disk storage, available. At the same time, applications and server operating systems are consumers of central IT services. These services include:
Centralized backup processes
File and print services
Active Directory
Deployment services
Patch server
Network services such as DNS or DHCP
Not only does the server that the application is running on need to be protected against interruptions, but all data center services and resources that are significant to application operations must be protected as well. This is especially important because the data center central services serve all applications and could cause an interruption on a larger scale than the failure of a single AS. For example, after a DNS service interruption, no name can be resolved into an IP address anywhere in the data center. The following list contains some critical services that might require protection:
DNS
DHCP
WINS
NFS server
Fileserver with SMB/CIFS
Print server
Authentication
Time synchronization
Backup functions
Central monitoring service
Because so many critical services require protection, a detailed discussion of these services is beyond the scope of this white paper. Additional documentation is available at http://technet.microsoft.com. If third party solutions are being used, the high availability discussion should incorporate the vendor perspective as well.
Hyper-V host cluster

To protect SAP installations against unplanned downtime and to help minimize planned downtime in general, Hyper-V servers can be configured in a Hyper-V host cluster. This implementation consists of a WSFC configuration on the Hyper-V parent. A Hyper-V host cluster requires all VHD files for VMs to be stored on a SAN that is accessible from all the Hyper-V servers in a specific setup. In Windows Server 2008 R2, WSFC was extended to manage VM failover in case of an unplanned physical server outage. The VMs on any server in a Hyper-V host cluster can be configured as highly available and become a cluster resource. A Hyper-V host cluster configuration is the recommended way to implement high availability for VMs. The following figure shows an example of a Hyper-V host cluster.
Figure 31. Hyper-V host cluster
Additional Hyper-V host cluster implementation information is available at: http://technet.microsoft.com/en-us/library/cc732181(WS.10).aspx.
Planned Downtime Minimization Solutions

Planned downtime refers to the period of time that an application is not available due to maintenance work or system upgrades. In contrast to unplanned downtime, the time period for this work is planned in advance and affected users are typically notified. Planned downtime is required for the following work:
- Hardware maintenance and resource extensions
- Offline backups
- Operating system and application patches
- Operating system and application upgrades
- Unicode migration
- Configuration changes that require restarts
- Deployments, transports and upgrades
- Failover or disaster recovery tests
Planned downtime is crucial for the reliable and safe operation of applications, computer systems, and their supporting environment. By applying the recommended fixes for known software bugs, increasing the computer system resources, or testing the data center high availability solutions, the vulnerability of an application service to unplanned downtime is greatly reduced. At the same time, 24x7 application availability is becoming increasingly necessary, so the duration and frequency of application service outages must be minimized. In fact, planned downtime generally contributes more to application service unavailability than unplanned downtime. Reducing planned downtime should therefore be part of any high availability strategy.
When determining possible solutions for planned downtime, it is important to consider how frequently the downtime occurs. If, for example, an offline database backup requires downtime once a week, it is a prime candidate for a technical solution that helps to eliminate this downtime. Other activities, such as an operating system or application upgrade, occur only once a month or once a year. The less frequent the downtime, the higher the probability of finding a convenient time slot in which the work can be performed. In any case, proper planning and change management are essential to minimize and manage planned downtime.
On the technical level, there are many options available to minimize planned downtime for specific tasks, including the creation of backups and the patching of the operating system or database engine. Achieving this goal requires the appropriate hardware architecture in addition to proper planning.
Planning ahead for minimizing planned downtime

One of the key components of each data center operation is the definition of the Service Level Agreement (SLA) that defines the operational hours and application uptime. Since applications are services that are provided by the IT department to the internal and external users in a company, the term SLA describes the expected availability. A productive system SLA example might appear as follows:
- Operating hours: 8:00 to 20:00 CST, 5 days per week, Monday through Friday.
- System availability: 22 hours, 7 days a week
- 99.5 percent annual availability during the defined operating hours
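The downtime budget implied by an availability percentage depends on which hours are counted. The following sketch makes that calculation explicit. The function name downtime_budget_hours and the assumption of 52 weeks per year are illustrative choices for this example and are not taken from the SLA itself.

```python
def downtime_budget_hours(hours_per_day: float,
                          days_per_week: int,
                          availability_pct: float,
                          weeks_per_year: int = 52) -> float:
    """Hours of unplanned downtime per year that the SLA still tolerates."""
    scheduled_hours = hours_per_day * days_per_week * weeks_per_year
    return scheduled_hours * (100.0 - availability_pct) / 100.0


# Basis 1: the stated operating hours (12 hours on 5 days per week).
weekday_budget = downtime_budget_hours(12, 5, 99.5)

# Basis 2: 12 hours on all 7 days per week (an assumption for comparison).
all_week_budget = downtime_budget_hours(12, 7, 99.5)

print(f"12h x 5d: {weekday_budget:.2f} hours per year")
print(f"12h x 7d: {all_week_budget:.2f} hours per year")
```

Under these assumptions, the first basis yields about 15.6 hours per year and the second about 21.8 hours per year. The second value is close to the 22-hour figure this guide quotes for the example, which suggests, without confirming, that the figure was derived on a 7-day basis.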
In the above example, the IT department would have a maintenance window of two hours per day. They would need to take additional measures to ensure that unplanned downtime does not exceed 0.5 percent of the uptime, or 22 hours.
Change management strategy deployment
Having a limited time for maintenance requires the IT staff to get the most out of the available time. Typically, the workflow during any maintenance action involves making a backup copy of the existing state, performing the required work, and testing thoroughly before the system is returned to production. Proper preparation is one of the key factors for success. Changes must first be tested on test systems in order to identify their side effects. This process also provides information about the time required to perform the task. An additional benefit is that the IT staff learns the required steps while working with the test systems, which helps to minimize the downtime when the same work has to be done on the productive server.
Another important task is planning ahead to have enough resources, such as disk storage or main memory, for the future growth of the SAP system. By providing enough resources, the SAP system stability and quality of service improve, and frequent shutdowns for the installation of additional components can be avoided. In productive SAP systems, a common strategy is to oversize the required hardware resources at the start of productive use and to maintain enough headroom for at least six months of growth. Any further extension should also reflect this principle. Besides adding resources, strategies such as archiving can be used to minimize the storage requirements of a SAP database.
Planning the operating system and application software maintenance is another operational aspect. It is essential to know the software vulnerabilities and install fixes in a timely manner.
Typically, fixes need to be coordinated and installed in sequence. For example, test and QA systems are updated first to work out the installation issues. The production systems are updated only after the issues are resolved. The number of security vulnerabilities in a system can be minimized by a process called hardening. Hardening a SAP system means configuring it with only the minimum platform functions that are necessary for operating the system. Additional information about IT landscape hardening can be found by searching for the SAP Hardening and Patch Management Guide for Windows Server white paper at: https://www.sdn.sap.com/irj/sdn
Backup and patching solutions

As mentioned, the tasks of creating backups or the patching cycle of the operating system and application layer are by far the most frequent reasons for planned downtime. Fortunately, there are technical and architectural solutions for the IT landscape available that help IT departments to avoid or minimize this aspect of planned downtime.
Snapshot backup
Backing up a large database might take a long time. The primary issue when creating a backup is that it must be transactionally consistent in order to be usable for a potential restore. Transactional consistency means that every transaction is either contained completely in the backup or not contained at all. SQL Server database backups are created by using the BACKUP DATABASE command. This command first executes a checkpoint, which means that all pages that have been changed since the last checkpoint and still reside in memory are flushed from the database server main memory to the storage subsystem. After this operation is complete, the database files are backed up by copying the data to another disk or a tape device. To maintain transactional consistency, the transaction log file is also copied during this process. The transaction log is used to roll back or undo transactions that were not finished at the time the backup was made.
Although SQL Server backups are online, they produce an additional load for the storage subsystem. Therefore, one usually tries to minimize the duration of an online backup. To minimize the time a backup impacts normal system operation, it is possible to use the snapshot feature of SQL Server 2005/2008 R2 to create the backup. Snapshot backups reduce the unavailability of the SQL Server 2005/2008 R2 database during a backup to a couple of seconds. This is especially useful for moderate to very large databases where availability is very important.
SQL Server snapshot backup is accomplished in cooperation with third party hardware or software vendors, or both. These vendors use SQL Server 2005/2008 R2 features that are designed for this purpose. The underlying backup technology creates a point-in-time copy of the database image that is being backed up. The instantaneous copying is typically accomplished by splitting a mirrored set of disks or by creating a copy of a disk block when it is written. This preserves the original. At restore time, the original is made available immediately and the synchronization of the underlying disks occurs in the background if necessary. This restores operations almost instantaneously.
The following figure shows an example of snapshot technology with a NetApp FAS storage system and the NetApp SnapManager for Microsoft SQL Server and SnapDrive for Windows solution. In this example, the time required for backups and restores can be reduced to seconds by using SnapManager.
Figure 32. Snapshot backup configuration
Detailed information about the SQL Server 2005/2008 R2 Snapshot Backup feature is available at http://msdn.microsoft.com/en-us/library/ms189548.aspx.
Optimized server maintenance system architecture

Through a suitable system design, the downtime due to patch installation or hardware component installation can be significantly reduced. The concept follows the principle of a rolling upgrade, where single components in a server landscape can be isolated and stopped while other components continue to run the application service.
Server and operating system maintenance
SAP system users are typically affected during the process of SAP component patching. However, the SAP system can be specially configured to enable Windows operating system patching during operation without affecting users. The basic principle is to have sufficient system redundancy in order to take out a portion of the running SAP system and still be able to continue the operation. This is achieved by having groups of AS instances configured in logon groups. SAP users are assigned to the AS instances depending on the user profile. For example, all human resource users would be in one group while all financial users would reside in another. If one of the instances in the group needs to have a patch installed, this server can be removed from its logon group so that no more users are assigned to it when logging on. This is done in transaction SMLG, which manages the SAP logon groups. Similarly, the server has to be removed from the group of batch servers (Transaction SM61), the update server group (Transaction SM14), and the RFC server group (Transaction RZ12). If the server is used as a spool server, an alternate spool and logical print server need to be activated (Transaction SPAD). Another prerequisite is to define the parameter rdisp/gui_auto_logout in the instance profile of this server (Transaction RZ10). With this parameter set, all inactive users are automatically logged off of this instance after the specified amount of time has expired. With this configuration, the instance is made idle quickly and can be shut down.
All the remaining SAP AS instances in a logon group must be able to handle the workload sufficiently. In larger systems, it might be appropriate to prepare one universal AS instance that can join several groups. This can be achieved by installing several SAP AS instances on a server and starting the respective instance when needed. Such a standby AS could be used temporarily to maintain the transactional performance of a SAP system. The following figure shows the setup of this landscape:
Figure 33. Hot-standby server for AS maintenance
The central elements of the SAP system, including the servers for the SCS instance and the database, still remain. If both components are in a WSFC cluster, a server can be isolated by using a planned failover to the respective standby server. This switch can happen at a convenient time with little effort. The empty server can subsequently be patched and restored to operation.
SQL Server instance maintenance
Expanding on the previous maintenance concept by using SQL Server Database Mirroring adds the option to patch the database engine installed on a server while the SAP system continues to work. The basic principle is to switch the database to the mirror copy when a patch needs to be installed on the database engine of the original server. After successful installation, the database is switched back and the same process is executed on the mirror side as well. See the following figure for an example of this:
Figure 34. Database mirroring for SQL Server instance maintenance
More information on SQL Server database mirroring can be found in the Disaster Recovery Solutions section. An example of a patch cycle for Windows and SQL Server patches by using the above concept could look like:
Patch schedule:
- Windows and SQL Server patches applied monthly (if applicable)
- SAP service packs applied during quarterly release
Patch sequence:
- Patch first in sandbox and test. Only place in production after a few days of successful testing.
Patch process:
- If no reboot is required, apply the patches and the patch process is finished.
- If reboot is required, perform the following steps on each AS:
  - Isolate Dialog/Batch server from:
    - Logon group
    - RFC group
    - Update group
    - Batch group
    - Spool (or have redundant spool server)
  - Drain connections, patch, reboot, then add back into the respective group and proceed to next server. If required, take the temporary AS into the respective group.
- Perform the following steps on the mirrored database servers:
  - Suspend mirroring, patch, and reboot secondary server, re-synch, fail over to secondary, patch and reboot primary server (short SAP downtime during failover has to be planned). There is no need to fail back.
- Perform the following steps on the SAP central instance server:
  - Relocate the SAP central instance in the WSFC cluster to the database server. Patch and reboot inactive node. Fail over database and CI, patch and reboot other node (short SAP downtime during failover has to be planned). Distribute database and central instance on the two nodes as before for better performance.
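The rolling patch of the application servers only works if the servers that remain in a logon group can carry the workload while one server is isolated. The following sketch checks this for each server in turn. The function plan_rolling_patch, the capacity values (for example, concurrent users), and the optional standby capacity are hypothetical and serve only to illustrate the planning step.

```python
from typing import Dict, List, Tuple


def plan_rolling_patch(capacity: Dict[str, int],
                       required: int,
                       standby_capacity: int = 0) -> List[Tuple[str, str]]:
    """Decide for each server whether it can be isolated and patched.

    capacity         -- capacity per application server in one logon group
    required         -- capacity the group must provide at all times
    standby_capacity -- capacity of a temporary standby AS, 0 if none exists
    """
    total = sum(capacity.values())
    plan = []
    for server, cap in capacity.items():
        remaining = total - cap  # capacity left while this server is isolated
        if remaining >= required:
            plan.append((server, "patch"))
        elif remaining + standby_capacity >= required:
            plan.append((server, "patch with standby AS"))
        else:
            raise ValueError(f"not enough capacity to isolate {server}")
    return plan


# Hypothetical logon group with three servers and one temporary standby AS.
plan = plan_rolling_patch({"as1": 400, "as2": 400, "as3": 200},
                          required=700, standby_capacity=200)
for server, action in plan:
    print(server, "->", action)
```

In this example, the two larger servers can only be patched while the temporary AS has joined the group, whereas the smaller server can be isolated without it.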
SAP application planned downtime reduction

SAP also works on measures to reduce planned downtime by providing enhancements on the application layer. For example, previously with R/3, any SAP profile parameter change required a SAP instance stop and restart in order to activate the change, but today, most parameters can be tuned online. There are still issues with the installation of SAP support packs. For example, if the executables of a SAP kernel have to be patched, the SAP instance also needs to be restarted. SAP is currently working on a rolling kernel upgrade procedure to avoid downtime during the exchange of kernel executables. There are many less frequent tasks, such as application upgrades or migrations, where SAP can optimize the required downtime. See the following section for more information.
Additional SAP application planned downtime information can be found in the following resources:
- Strategies to avoid or minimize planned downtime on the SAP application level are described in the High Availability for mySAP.com Solutions white paper available at: https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/8f87a790-0201-0010558e-bcf2096ff33b
- A collection of planned downtime information regarding SAP upgrades, database reorganization, and database backups is available at http://help.sap.com. To access this information, do the following:
  - From the left menu pane, click SAP NetWeaver.
  - Choose English under SAP NetWeaver 7.0 Library.
  - Search for Planned and Unplanned Downtime.
- Information about SAP upgrades can be found at https://service.sap.com. Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password. After logging on to the SAP support portal, click the Quick Links menu and search for /upgrade.
- More information on SAP upgrades can be found on SDN by searching for upgrade at: https://www.sdn.sap.com/irj/sdn
- The SAP Service Marketplace has the following related notes available at: https://service.sap.com/notes Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
  - SAP note 139513: Merge transports for high availability systems
  - SAP note 361735: Inactive import of reports
Hyper-V Live Migration

While a Hyper-V host cluster can manage a VM failover from one physical server to another, this process always involves an interruption of the application inside of the VM. However, if the migration of a VM is planned, the downtime caused by the relocation of the VM can be avoided. The following figure illustrates the Live Migration process.
Figure 35. Live Migration
The Hyper-V host cluster can perform a Live Migration of the VMs without application service downtime. The following example describes this process in more detail.
Consider a Hyper-V host cluster that is configured for Live Migration, with a VM running a SAP application that is actively used by clients. At some point, the administrator must migrate this VM to another server in the Hyper-V host cluster because the server that the VM resides on must go into maintenance. Initially, while the VM is still actively used on the primary server, an empty VM is created on the second server and the memory image of the VM is copied to the second server. If memory pages on the primary server change during this process, Hyper-V detects this and copies those pages again. Eventually, the number of pages that differ between the two servers is significantly reduced. When the difference is small enough, Hyper-V pauses the VM on the primary server and copies the last set of changed pages to the new server. Subsequently, client access is re-routed to the new server and the VM on the primary server can be deleted. Because the final state transfer happens very quickly and no TCP timeout occurs, the client does not notice the transfer.
Note: Live Migration does not work for unplanned downtime. In the case of a server failure, the VMs fail over by using failover clusters. The Live Migration process must be planned and requires an active primary system for the duration of the migration.
Since the Hyper-V host cluster and Live Migration use the same setup, this solution is an extension of the high availability solution with WSFC. Live Migration provides the capabilities for minimizing planned downtime in a virtual environment. These capabilities are not available for applications that must be installed directly on the physical server.
More details on how to set up a Live Migration cluster are available in the following documents:
- Windows Server 2008 R2 Hyper-V Live Migration white paper available at: http://download.microsoft.com/download/C/C/7/CC7A50C9-7C10-4D70-A427C572DA006E61/LiveMigrationWhitepaper.xps.
- Best Practices for SAP on Hyper-V white paper available at: http://www.microsoft.com/virtualization/en/us/solution-business-apps.aspx.
- Hyper-V: Live Migration Network Configuration Guide available at: http://technet.microsoft.com/en-us/library/ff428137(WS.10).aspx.
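The iterative memory copy that Live Migration performs can be modeled with a few lines of code. The sketch below is a simplified simulation, not the Hyper-V algorithm. The function simulate_precopy, the constant dirty and copy rates, the threshold, and the limit of ten copy rounds are all assumptions made for illustration.

```python
from typing import List, Tuple


def simulate_precopy(total_pages: int,
                     dirty_rate: int,
                     copy_rate: int,
                     threshold: int,
                     max_rounds: int = 10) -> Tuple[List[int], int, float]:
    """Simulate copying VM memory while the VM keeps changing pages.

    total_pages -- number of memory pages of the VM
    dirty_rate  -- pages changed per second by the running VM
    copy_rate   -- pages copied per second to the second server
    threshold   -- remaining pages at which the VM is paused
    Returns the pages copied per round, the pages copied during the
    pause, and the resulting pause time in seconds.
    """
    remaining = total_pages
    copied_per_round: List[int] = []
    while remaining > threshold and len(copied_per_round) < max_rounds:
        copied_per_round.append(remaining)
        # Pages changed while this round was being copied must be sent again.
        remaining = min(total_pages, remaining * dirty_rate // copy_rate)
    pause_seconds = remaining / copy_rate
    return copied_per_round, remaining, pause_seconds


rounds, final_pages, pause = simulate_precopy(
    total_pages=1_000_000, dirty_rate=10_000, copy_rate=100_000, threshold=500)
print("pages copied per round:", rounds)
print("pages copied while paused:", final_pages)
print(f"pause time: {pause:.3f} s")
```

In this model, each round has to transfer fewer pages than the previous one as long as the copy rate exceeds the dirty rate, which is why the final pause can be kept short. If the VM changes pages as fast as they can be copied, the rounds do not converge and the simulation stops after the assumed round limit.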
Data Inconsistency Protection Solutions

Logical errors always affect business data. The loss of critical hardware resources can almost always be detected immediately and can be addressed through immediate and automatic actions, such as an application failover inside a cluster. The issue with logical errors is somewhat more complicated. Typically, data inconsistencies are only discovered when the data is accessed. If problems are discovered at this point, typically the only resolution is to restore the last backup copy of the undamaged data. The extent and the duration of this rescue operation, as well as the effect on the running productive operation, depend entirely on the extent of the damage and the relevance of the data for the operation. In worst case scenarios, data inconsistencies can cause production system downtime while a complete restoration of consistent data is performed. Additionally, if a restore to a consistent point in time is required, work that has been performed after the last backup was created might be lost.
Logical error reasons

There are many reasons for inconsistencies. They might be caused by technical problems or hardware errors where the data is unintentionally overwritten. For example, storage adapter problems are a typical reason for data corruption. Another reason could be a programming error in an ABAP report or Java program that accidentally changes or deletes the data in the database. Errors also happen quite often through faulty human operation such as accidental data deletion. Finally, the possibility of data damage through sabotage or viruses should not be overlooked. The following figure shows the reasons for computer system downtime and their relative importance (source: ZDNet, October 2002):
Figure 36. Computer downtime reasons
Database data inconsistencies
Data consistency in the SAP system central data repository is one of the most fundamental requirements of a stable SAP operation. After all, data only exists once in an application. As discussed in the previous section, there are numerous triggers that can cause data corruption at any given time during operation. What is especially difficult in this class of problems is the early recognition of an error condition before it causes damage. Database consistency checks do not typically take place during normal operation, or at least not during normal workload. A procedure to perform such a consistency check with a SQL Server database is discussed in the Data Inconsistency Protection Solutions section in this white paper. Other database vendors have developed similar procedures, which can typically be found in the respective database vendor's documentation.
Sabotage and accidental data deletion
Accidental data deletion in the database or on the file system level can have very serious consequences for application operations. It can cause a serious disruption to the normal course of operations, or even bring them to a total standstill. To avoid such problems, IT operation security should be addressed through an authorization concept in which only designated persons have permissions to work in their area of expertise. Of course, especially in the group of administrators, overlapping authorizations are needed for their daily work. Accidental deletion of data is not a rare problem. From past experience with large data center operations of SAP infrastructures, there are many reports about SAP application downtime due to deleted data or tables in the database. Once the damage has occurred, the only way to recover is to restore the missing data from the last backup. This can be challenging since in most cases only the missing data needs to be restored. If, for example, only a single table is affected, it does not make sense to restore the complete database from the last backup. This procedure would set the data in all other tables, which are not affected by the issue, back to the same earlier state as the affected table. Restoring these tables as well would therefore potentially cause more data to be lost.
Databases with snapshot technology can be especially helpful to minimize the downtime duration. A snapshot is a transactionally consistent point in time after which all changes are directed to a different physical location. In other words, a snapshot is an exact image of the database at the point of its creation and can be put into operation without recovery. During the system recovery process, it is possible to export tables from the snapshot and import them into the active system. Alternatively, in case of a very serious problem, the complete state of a snapshot can be restored in the active database. However, under these circumstances, changes in other tables might be lost too.
Data loss through viruses and worms
Viruses, worms and Trojans are an unfortunate reality that every business must address. Malicious code that tries to secretly enter computer systems might start activities that range from espionage to data erasure. This attack is not necessarily external. For example, employees who work with their laptop at home or in a hotel might bring an unwanted guest back when they plug their computer into the company network. Viruses might also appear because infected personal software is installed on a PC attached to the company network.
Security measures that are taken to minimize data loss or espionage of confidential information are significant. The measures taken include the implementation of firewalls, virus scanners, and surveillance tools, as well as employee policies. Optimal security requires that appropriate measures are taken on all levels of the IT operation including:
- Virus scanners on the computer level.
- Demilitarized zone for outbound communication.
- Firewalls at the network level.
- Well-developed authentication and audit procedures at the application level.
Security measures also include operational tasks, such as the timely installation of security patches to close any possible gaps immediately after such vulnerabilities have been published. A detailed discussion of the threats, as well as possible concepts and measures, is outside the scope of this white paper. The Microsoft TechNet library provides detailed information about Microsoft products and technology for IT professionals. The Microsoft TechNet library can be found at: http://technet.microsoft.com/en-us/default.aspx
Backup and recovery

Backup of all application data, configuration information, operating system installation, and file systems is one of the most critical tasks in the daily operation of data centers and the first measure for protecting against the loss of critical data. The main purpose of a backup is to maintain a copy of the critical data and configurations that enables swift application service restoration in the case of a severe error that damages this data or the runtime environment. The basic backup hierarchy is shown in the following figure:
Figure 37. Basic backup hierarchy
Appropriate backups can be used to restore individual files in case a single file has been deleted. However, these backups can also help to recover a complete system if a severe failure happens. Even if a disaster destroys the original computer systems, backups can be applied to a second computer to restore operation. A backup and restore strategy is the last step in a recovery from an unforeseen event and returns a database to some predefined point, most likely the last completed transaction prior to the failure. All aspects of the backup and restore strategy should be well documented and reviewed regularly. Most importantly, they should be tested regularly to ensure that the data and the media for backups are valid and that the processes work as expected.
Database backup strategies
The backup and restore components provided with Microsoft SQL Server 2000 and later enable the administrator to reproduce a database as an exact replica of the original database at any point in the database history from the time an appropriate backup strategy was implemented. There are several backup types available:
- Full backup: Makes a complete backup of the database up to the last transaction completed during the backup process.
- Differential backup: Makes a copy of the database pages changed since the last full backup. It is a useful mechanism to back up a database without consuming as many resources as a full backup. In a restore operation, it is used in conjunction with a full backup.
- Transaction log backup: Makes a backup of all the completed transactions that have taken place since the log was last backed up. A transaction log backup is used in conjunction with a full backup, and potentially differential backups, to enable an administrator to restore a database to a specific point in time or to the last completed transaction that was backed up.
- File backup: When a database consists of multiple files, each file can be backed up individually. This provides an accelerated backup process as well as a faster restore process. File backups are used in conjunction with transaction log backups.
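How full, differential, and transaction log backups work together in a point-in-time restore can be sketched as a selection problem. The code below is a simplified model. The Backup class, the restore_chain function, and the use of plain completion times are assumptions for this example; SQL Server itself tracks backup chains by log sequence numbers rather than by timestamps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "diff" or "log"
    finished: int  # completion time in minutes (hypothetical time scale)


def restore_chain(backups: List[Backup], target: int) -> Tuple[List[Backup], bool]:
    """Select the backups needed to restore the database to time `target`.

    Returns the chain in restore order and a flag that tells whether the
    log backups actually reach the target time.
    """
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    full = max(fulls, key=lambda b: b.finished)
    chain = [full]

    # Only the most recent differential after the chosen full backup is needed.
    diffs = [b for b in backups
             if b.kind == "diff" and full.finished < b.finished <= target]
    base = max(diffs, key=lambda b: b.finished) if diffs else full
    if base is not full:
        chain.append(base)

    # Apply log backups in order until one of them covers the target time.
    logs = sorted((b for b in backups if b.kind == "log" and b.finished > base.finished),
                  key=lambda b: b.finished)
    for log in logs:
        chain.append(log)
        if log.finished >= target:
            return chain, True
    return chain, False


history = [Backup("full", 0), Backup("log", 60), Backup("diff", 120),
           Backup("log", 125), Backup("log", 185), Backup("log", 245),
           Backup("full", 1440)]
chain, reached = restore_chain(history, target=200)
print([(b.kind, b.finished) for b in chain], reached)
```

In this example, the log backup taken at minute 60 is not needed because the differential backup at minute 120 already contains those changes, and the last log backup in the chain would be restored only up to the target time.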
Additional backup solution information is available in the following resources:
- Step-by-Step Guide for Windows Server Backup in Windows Server white paper available at http://technet.microsoft.com/de-de/default.aspx. Note: This document can be found by searching for the title.
- SAP Backup and Restore Information MS SQL Server help documentation available at http://help.sap.com. To access this information, do the following:
  - From the left menu pane, click SAP NetWeaver.
  - Choose English under SAP NetWeaver 7.0 Library.
  - Choose Technical Operations Manual.
  - Choose General Administration Tasks.
  - Choose Database Administrations.
  - Follow the MS SQL Server link.
  - Follow the Monitor for Backup and Restore Information link.
- Blog entry: How does Microsoft perform backups in their SAP system landscape available at: http://blogs.msdn.com/saponsqlserver/archive/2008/03/28/how-does-microsoftperform-backups-in-their-sap-system-landscape.aspx
Database log shipping

Microsoft SQL Server log shipping is a technology that has been available since the release of SQL Server 2000. The basic concept is also described in the Disaster Recovery Solutions section for the purpose of maintaining a database for disaster recovery. While the idea of disaster recovery is to have a second remote database copy to continue operations in case of a catastrophic event, log shipping is also used to maintain a database copy for the event of a severe logical error or inconsistency. Log shipping is basically nothing more than a fully automated, continuous backup of the transaction logs of an active database to a remote computer and the application of these backups to a standby database.
The time interval at which the transaction log backups are applied to the standby system is a configurable parameter. If there is a safe delay between the time a transaction is performed at the productive database and the time the transaction log backup is applied at the standby database, the application process to the standby database can be stopped when a problem occurs at the productive system. The standby database is then kept in a consistent and usable state up to the point where the error occurred. It can be used for continuing the productive work: the standby database only needs to be activated and users diverted to the new server. Alternatively, the data can be extracted from the standby database and applied to the primary database in order to compensate for a user error. In practical implementations, the delay between a standby and a productive system should not be less than one hour.
SQL Server log shipping provides a very efficient data replication method and protects against downtime caused by logical errors or inconsistencies. The principal functional setup is shown in the following figure. SAP transactions are executed at the primary database server. Every transaction is recorded in the local transaction log, which is backed up to local disk at the primary system (1). After the log backup has been created on the primary side, it is copied over the network to the secondary database server (2). The secondary database server then reads the transaction log backup, incorporates it into its own log buffer, and applies it to the standby database (3). During application of the transaction log, all recorded transactions in the log are executed on the standby server.
Figure 38. SQL Server log shipping
For more information about SQL Server log shipping, please refer to the Disaster Recovery Solutions section.
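The effect of the apply delay can be illustrated with a minimal Python sketch. This is not SQL Server code: the function names, the 15-minute backup interval, and the timestamps are assumptions chosen purely for illustration. The sketch shows which log backups a standby with a one-hour delay would already have applied at a given moment, and which backups were completed before a logical error occurred.

```python
from datetime import datetime, timedelta


def backups_safe_to_apply(backup_times, now, apply_delay):
    """Return the log backups that are old enough to be applied on the standby."""
    cutoff = now - apply_delay
    return [t for t in backup_times if t <= cutoff]


def backups_before_error(backup_times, error_time):
    """Return the log backups that were completed before the logical error."""
    return [t for t in backup_times if t < error_time]


# Hypothetical schedule: one log backup every 15 minutes from 08:00 to 10:00.
start = datetime(2010, 6, 1, 8, 0)
backups = [start + timedelta(minutes=15 * i) for i in range(9)]

# A logical error happens at 09:20 and is noticed at 10:00.
error_time = datetime(2010, 6, 1, 9, 20)
now = datetime(2010, 6, 1, 10, 0)

applied = backups_safe_to_apply(backups, now, timedelta(hours=1))
clean = backups_before_error(backups, error_time)

# With a one-hour delay the standby has not yet applied anything from
# the time of the error, so stopping the apply process keeps it clean.
standby_unaffected = applied[-1] < error_time
```

In this example the standby has applied the backups up to 09:00 only, so an administrator who notices the error at 10:00 still has time to stop the apply process.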
Snapshots
A snapshot is an image of information that has been frozen at a certain point in time. It delivers an accurate picture of the information at a precisely defined point in time. Snapshots are typically taken of fast-changing data, such as a file system or a database. Technically, snapshots typically use the copy-on-write principle. Under this principle, all data on a storage medium is represented as a set of data blocks that are accessed through pointers. Each block has an individual pointer that describes where the block resides on the medium. A snapshot first saves all the pointers as they exist at a certain point in time. Any time a data block is changed afterward, the original block is first copied into a snapshot file and the system uses a new pointer for the changed block. Because only the pointers are copied initially, and data blocks are copied only when changes occur, snapshots are created very quickly and require relatively little disk space. However, copying changed blocks to the snapshot has some performance impact.

Database snapshots with SQL Server 2005/2008 R2
With SQL Server 2005/2008 R2 Enterprise Edition, database snapshots can be created. A database snapshot is a read-only, transactionally consistent view of a database. Transactionally consistent means that only those transactions that have been completed by a commit work statement are included in the snapshot. Snapshots can be generated automatically at any point in time and can also be used for read access during daily operations, such as report generation. The snapshot copy of the database can be queried by client applications and, if the original database becomes damaged or unusable, the database can be reverted to the state it was in when the snapshot was created. Since every new snapshot requires storage space, it is recommended that older snapshots be deleted after a certain time. The optimal retention time depends on individual requirements, but one or two days is usually sufficient.
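The copy-on-write principle described above can be sketched in a few lines of Python. This is a toy model only, not the implementation used by SQL Server or by any storage product: the class name Volume and its methods are hypothetical, and list indices stand in for the block pointers.

```python
class Volume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # One dict per snapshot: block index -> preserved original content.
        self.snapshots = []

    def create_snapshot(self):
        # Creating a snapshot copies no data; it only starts tracking changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Before a block is overwritten for the first time, its old content
        # is copied into every snapshot that has not preserved it yet.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id):
        # Unchanged blocks are read from the live volume, changed blocks
        # from the preserved copies.
        saved = self.snapshots[snap_id]
        return [saved.get(i, current) for i, current in enumerate(self.blocks)]

    def revert(self, snap_id):
        # Restore the volume to the snapshot state and drop later snapshots.
        self.blocks = self.read_snapshot(snap_id)
        del self.snapshots[snap_id + 1:]
        self.snapshots[snap_id] = {}


volume = Volume(["A", "B", "C"])
snap = volume.create_snapshot()
volume.write(1, "B2")
snapshot_view = volume.read_snapshot(snap)
```

Only the one changed block is stored in the snapshot, which is why snapshots are fast to create and grow only with the amount of changed data.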
Additional information about SQL Server 2005/2008 R2 is available by searching for snapshots in the SQL Server 2005 online books at: http://msdn.microsoft.com/en-us/library/ms130214.aspx. Further information can be found on the SQLCAT Web site at: http://sqlcat.com/whitepapers/archive/2008/02/11/database-snapshot-performanceconsiderations-under-i-o-intensive-workloads.aspx.

Snapshots with storage solutions
As we have seen, the snapshot feature of SQL Server 2005 provides an opportunity to create a transactionally consistent snapshot on the database level. In addition, there is another snapshot feature on the storage level. This kind of snapshot works on the physical storage layer and is not restricted to certain data types like database data. From the technical perspective, a snapshot is represented by pointers to the blocks of data on the storage volume. When a specific data block is written, the original block is copied into a snapshot file and the pointer in use by the system points to the new block, while a copy of the old pointer is maintained. To go back to the time when the snapshot was created, the old pointers are re-activated; in other words, the snapshot is reverted. Another possibility is to delete the snapshot, which merges all copied blocks with the original data. The changes recorded since the snapshot then become the permanent state.

In order to recover a system after a severe error, volume snapshot copies can be activated as read-only and any damaged data can be extracted from this copy. Since a snapshot can be created online during normal operation without severe performance impact on the system, it is possible to maintain several snapshot copies per day. This process can be automated and can run without human intervention.

Snapshots do not replace backups, so scheduled backups remain very important. Since snapshot copies of the data can reside on the same physical media as the original data, this data can be lost if the physical volume becomes defective. Snapshots are an efficient way to maintain a consistent point-in-time state that can be reverted to if needed.
Hyper-V snapshots for virtual machines
With Hyper-V snapshots, administrators can capture point-in-time images of a VM that can be accessed at any later stage. Since a consistent VM state can be captured by using snapshots, this feature can be used before any critical VM change, such as applying patches, changing configurations, or upgrading applications. If any of these steps fails, the VM can easily be reverted to the previous state. VM snapshots can be created in the Hyper-V administration GUI or by using System Center Virtual Machine Manager. Hyper-V enables users to create a hierarchy of snapshots. When a snapshot is created, the existing VM VHD file is frozen in its current state, and any change that occurs inside the VM after the snapshot is written to a new file called an AVHD file. If another snapshot is created after the first one, the first AVHD file is frozen and any subsequent change is written to another AVHD file.
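The chain of a frozen VHD file and subsequent AVHD files can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the Hyper-V file format or API: the class VirtualDisk, its methods, and the representation of each file as a dictionary of blocks are all hypothetical.

```python
class VirtualDisk:
    """Toy model of a base VHD followed by a chain of AVHD files."""

    def __init__(self, base_blocks):
        # layers[0] stands for the base VHD; later entries stand for AVHD files.
        self.layers = [dict(enumerate(base_blocks))]

    def take_snapshot(self):
        # Freeze the current top layer and direct new writes to a fresh AVHD.
        # The return value is the number of frozen layers in the snapshot.
        self.layers.append({})
        return len(self.layers) - 1

    def write(self, index, data):
        # Writes always go to the newest layer; frozen layers never change.
        self.layers[-1][index] = data

    def read(self, index, depth=None):
        # Search from the newest layer down to the base VHD.
        layers = self.layers if depth is None else self.layers[:depth]
        for layer in reversed(layers):
            if index in layer:
                return layer[index]
        raise KeyError(index)

    def revert(self, depth):
        # Discard every layer written after the snapshot and start a new AVHD.
        del self.layers[depth:]
        self.layers.append({})


disk = VirtualDisk(["os", "app v1"])
before_patch = disk.take_snapshot()
disk.write(1, "app v2 (patched)")
current = disk.read(1)
at_snapshot = disk.read(1, depth=before_patch)
```

If the patch fails, calling disk.revert(before_patch) in this model discards the changes, which mirrors the revert scenario described in the text.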
Database consistency checks

Microsoft SQL Server provides the DBCC CHECKDB command to check data consistency and to repair detected inconsistencies. Since the execution of DBCC CHECKDB can impact the transactional performance of a SAP system, such checks should take place during low volume times or in a maintenance window. As a rule of thumb, examining a 70 GB SAP database on a four-processor computer takes approximately one hour as a single-threaded task, that is, with the SQL Server instance parameter max degree of parallelism set to 1. Database consistency checks examine the entire database contents for inconsistencies. All table and index data is checked and verified as readable. To automate this task, consistency checks can be scheduled in the DBA Planning Calendar to run when the overall transactional load is low, such as during the weekend. The following figure shows the DBA Planning Calendar screen.
Figure 39. DBA Planning Calendar screen
It is a good strategy to run the DBCC check online, without planned downtime, at a time when the normal system workload is low, such as over the weekend. Based on the SQL Server instance configuration described above, the DBCC check command requires only one CPU core. Therefore, on a modern multi-core server, enough cores remain available to maintain SAP operation.
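For planning purposes, the rule of thumb quoted above (roughly 70 GB per hour as a single-threaded task) can be turned into a rough estimate. The following Python sketch assumes that the runtime scales linearly with the database size, which is an assumption made here for illustration and not a statement from SAP or Microsoft; actual runtimes depend on hardware, I/O throughput, and data layout. The function names are hypothetical.

```python
# Rule of thumb from the text: about 70 GB per hour, single-threaded.
RULE_OF_THUMB_GB_PER_HOUR = 70.0


def estimated_checkdb_hours(database_size_gb, gb_per_hour=RULE_OF_THUMB_GB_PER_HOUR):
    """Estimate the DBCC CHECKDB runtime, assuming linear scaling with size."""
    if database_size_gb < 0 or gb_per_hour <= 0:
        raise ValueError("size must be non-negative and throughput positive")
    return database_size_gb / gb_per_hour


def fits_in_window(database_size_gb, window_hours):
    """Check whether the estimated runtime fits into a maintenance window."""
    return estimated_checkdb_hours(database_size_gb) <= window_hours


small_db_hours = estimated_checkdb_hours(350)    # 350 GB database
large_db_hours = estimated_checkdb_hours(1024)   # 1 TB database
large_db_fits = fits_in_window(1024, window_hours=8)
```

Under this assumption a 1 TB database would need well over half a day, which illustrates why a different approach is preferable for very large databases.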
Large database consistency
With very large databases of more than a terabyte in size, consistency checks using the DBCC CHECKDB command in a maintenance window or during low volume times might impact production for too long. To examine the database consistency of such systems regularly, one can choose another approach. It is possible, for example, to periodically restore the last backup of a productive database on a test system and then carry out the consistency check on that server. The advantage of this approach is that the command runtime no longer affects the performance of the productive SAP system, since the check is executed on the test system. A second advantage is that the test system is frequently updated with the latest productive data. Finally, restoring the last backup of the productive SAP database to a test system is a very effective check of whether the backup itself is usable. Discovering that a backup is unreadable or does not contain the right data at the moment it is needed is a real disaster.

The process of restoring a backup also helps administrators to train in the procedure. Proper skills in the backup and restore process are very helpful in a real emergency, and this training advantage should not be underestimated. SAP and Microsoft support deal with dozens of cases every year in which database backups prove not to be restorable due to mechanical issues such as tape errors, human error, or simply lost tapes.

Additional data consistency check information is available in the following locations:
SAP note 142731: DBCC Checks of SQL Server, available at https://service.sap.com/notes
Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
SAP Help also has information about database server checks at http://help.sap.com. To access this information, do the following:
1. From the left menu pane, click SAP NetWeaver.
2. Under SAP NetWeaver 7.0 Library, choose English.
3. Choose Technical Operations Manual.
4. Choose General Administration.
5. Choose Database Administration.
6. Follow the MS SQL Server link.
7. Under Periodic Tasks, follow the Database Server Checks link.
SQL Server 2005 books online: http://msdn.microsoft.com/en-us/library/ms130214.aspx
Note: Search for DBCC CHECKDB.
Disaster Recovery Solutions

Disasters are events that significantly impact not only the local computer system that hosts an application but the entire IT environment. Examples include earthquakes, fires, floods, and acts of terrorism. Consequently, damage affects not only the data center hardware, but also infrastructure such as air conditioning, the power grid, or communication lines. Any IT system that physically exists at only one site carries the risk of long outages if such an event happens. Even worse, if storage subsystems are damaged and backup copies reside in the same geographic neighborhood, there is a possibility of losing not only the application service, but the data as well. The consequences of such an event would be catastrophic to the enterprise. The only protection against events of this scale is to distribute the IT systems that host the applications geographically over a long distance and to maintain a redundant copy of the essential data at each site. This concept needs to be supplemented with automated failover solutions for the quick recovery of failed applications, proper planning, and a well-prepared and educated staff.

Disaster recovery consists of the measures taken to recover from a catastrophic event. During this phase, the complete infrastructure, including the hardware, is unusable. From the perspective of business continuity, there are two important definitions that help to define the optimal solution:
Recovery Time Objective (RTO): Describes the maximum time interval until the application service has to be available again. This time is measured from the moment the event occurred to the moment the application service is usable again. In practical implementations, it might range from minutes to several hours, and it has a direct impact on the data replication technology and network bandwidth requirements between the two sites.
Recovery Point Objective (RPO): Describes how much data might be lost if a catastrophic failure occurs. The amount of data is expressed as time, going back to the last transactionally and functionally consistent state. In the optimal case, an RPO of zero, every transaction must be replicated synchronously between the two sites. This definition again has consequences for the technology used and the network bandwidth.

Besides these two considerations from the business process perspective, there are two more prerequisites that influence the decision for a disaster recovery solution:
The distance between the primary and the disaster recovery site.
The available network bandwidth between the primary site and the disaster recovery site.
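The relationship between the RPO, the replication interval, and the available bandwidth can be made concrete with a small calculation. The following Python sketch is a simplified model for periodic (asynchronous) replication; it assumes that the worst-case data loss is one replication interval plus the transfer time of one batch, and it ignores protocol overhead and network latency. The function names and example figures are hypothetical.

```python
def transfer_seconds(size_mb, bandwidth_mbit_per_s):
    """Time to copy size_mb megabytes over a link with the given bandwidth."""
    return size_mb * 8 / bandwidth_mbit_per_s


def worst_case_data_loss_seconds(interval_s, batch_size_mb, bandwidth_mbit_per_s):
    """Upper bound on lost work for periodic replication (simplified model).

    A change made just after a batch is cut waits one full interval for the
    next batch and is only safe once that batch has arrived at the remote site.
    """
    return interval_s + transfer_seconds(batch_size_mb, bandwidth_mbit_per_s)


def meets_rpo(rpo_s, interval_s, batch_size_mb, bandwidth_mbit_per_s):
    """Check the simplified worst case against the required RPO."""
    loss = worst_case_data_loss_seconds(interval_s, batch_size_mb, bandwidth_mbit_per_s)
    return loss <= rpo_s


# Example: a batch every 15 minutes, 450 MB per batch, 100 Mbit/s link.
loss_fast_link = worst_case_data_loss_seconds(900, 450, 100)
# The same workload over a 10 Mbit/s link.
loss_slow_link = worst_case_data_loss_seconds(900, 450, 10)
rpo_30_min_ok = meets_rpo(1800, 900, 450, 10)
```

The sketch shows why a tighter RPO forces either shorter replication intervals or more bandwidth, and why an RPO of zero cannot be reached with periodic replication at all.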
The following chapter describes in detail the technical components available with Windows-based systems to achieve geographical distribution. The solutions for maintaining data copies over large distances based on SQL Server are also covered. Disaster recovery solutions are typically combined with other solutions to protect against outages. For example, it is typically not desirable for a local server failure to result in a complete site transfer. Here, a local cluster protects against hardware failures, while the geographic dispersion is used only in case of a real disaster. The decision to perform a site transfer is typically not automated and requires human interaction.
SAP system protection in a geographically dispersed cluster

In order to protect the Single Points of Failure (SPOF) of a SAP instance and database, WSFC implementations are used in a geographically dispersed installation. Applications in a multi-site cluster are typically set up to fail over just like in a single-site cluster. WSFC itself provides health monitoring and failure detection of the applications, the nodes, and the communication links. SAP supports the use of the WSFC service in geographically dispersed configurations, but requires that the behavior is identical to a local WSFC installation. In other words, Windows clustering does not detect the extended nature of these types of clusters. This can be achieved with the following settings:
Storage arrays that are visible on both sides for the SAP file systems and database
A quorum implementation changed from shared disk to majority node set cluster
VLAN configurations for a single subnet because of the SAP system requirements

Note: Because of the complexity of geographically dispersed clusters, the hardware vendor must be involved with the design, setup, configuration, and subsequent support of the implementation. SAP support is limited to the standard WSFC cluster implementation that does not recognize geographic dispersion. From the perspective of a SAP application, it looks as if the cluster is local.

Storage replication
In order to enable a fast failover to a secondary site if a catastrophic event occurs, a synchronous copy of the file system in use by the SAP system has to be maintained on each site. This block-level replication can be achieved with hardware-based or software-based replication.

Hardware-based replication
With this method, the complete replication task is done at the storage level. The following figure shows the basic setup:
Figure 40. Hardware-based replication
The advantage of this solution is that it works completely independently of the application. However, because the replication is performed by the storage controller, the SAN storage devices have to be from the same vendor, and the replication has a high bandwidth requirement. Additionally, software components from the storage vendors are required to enable WSFC to use this configuration appropriately. Examples of storage-based replication providers are:
EMC with SRDF
HP Storage Works Business Copy EVA
NetApp MetroCluster with SyncMirror
IBM GDPS with PPRC
Hitachi Storage Clusters

Note: For hardware or software-based replication solutions to work, they are required to replicate SQL Server write I/Os in exactly the same order as originally issued by the database.

Software-based replication
With this method, any change on the active side is copied over the network to the secondary side and replicated there. This requires the use of a software product that is not part of the initial cluster setup. While these software components increase the implementation cost, the advantage is that different storage devices can be used. It is even possible to have SAN storage on one side and Direct Attached Storage (DAS) on the other side. Examples of vendors for software-based replication products include:
NSI Double-Take
Legato RepliStor
Symantec Storage Replicator
SteelEye DataKeeper
Neverfail ClusterProtector

The following figure shows the principal setup when using this method.
Figure 41. Software-based replication
Cluster quorum configuration
In simple terms, the quorum for a cluster is the number of elements in a cluster that must be online in order to enable proper cluster function. If one or more nodes in a cluster can no longer communicate with the other nodes because of a split situation, for example interrupted network connections, there must be a voting mechanism that determines which side has the majority (quorum) to actively hold the applications in the cluster. Each WSFC cluster has a special resource known as the quorum resource. With Windows Server 2003, almost all server clusters used a disk in cluster storage as the quorum resource. If a node could communicate with the specified disk, the node could function as a part of a cluster; otherwise it could not. This made the quorum resource a potential single point of failure. Windows Server 2008 R2 uses a different approach: a majority of votes determines whether a cluster achieves quorum. Nodes can vote and, where appropriate, either a disk in cluster storage, called a disk witness, or a file share, called a file share witness, can vote. There is also a quorum mode called No Majority: Disk Only that functions like the disk-based quorum in Windows Server 2003. Aside from that mode, there is no single point of failure with the quorum modes, since what matters is the number of votes, not whether a particular element is available to vote. A comprehensive description of the quorum options for Windows Server 2008 R2 is available at http://technet.microsoft.com/en-us/library/cc770620.aspx.

Majority Node Set configuration for Windows Server 2003
In a Majority Node Set (MNS) cluster, each node in the cluster maintains a copy of the quorum data locally on its system disk. This MNS quorum is constantly synchronized and kept consistent by the cluster itself. If the configuration of the cluster changes, that change is reflected across the different member nodes. The change is only considered to have been committed if it has been successfully distributed to the following number of nodes:
(Number of nodes configured in the cluster/2) + 1
This ensures that a majority of the nodes have an up-to-date copy of the data. The cluster service itself only starts up, and therefore only brings resources online, if a majority of the nodes configured as part of the cluster are up and running the cluster service. If there are fewer nodes, the cluster does not have quorum, and the cluster service waits until more nodes join before it starts. In the case of a failure or split-brain, all partitions that do not contain a majority of nodes are terminated. This ensures that a running partition that contains a majority of the nodes can safely start up any resources that are not running on that partition, because it is the only partition in the cluster that is running resources. MNS quorum implementations are recommended for geographically dispersed clusters. Split-brain situations can be avoided by placing a single MSCS member node in a separate location, where it acts as an arbiter. See the following figure for an example:
Figure 42. MSCS member node in a separate location
SAP supports a Majority Node Set cluster if it is part of a cluster solution offered by the Original Equipment Manufacturer (OEM) or Independent Hardware Vendor (IHV).

File share witness for Windows Server 2003
The file share witness feature is an improvement to the Majority Node Set (MNS) quorum model of Windows Server 2003. This feature enables the use of a file share that is external to the cluster as an additional vote to determine the status of the cluster in a two-node MNS quorum cluster deployment. One of the disadvantages of a two-node MNS cluster is that it cannot sustain the failure of either cluster node without losing the majority of nodes; in other words, it cannot continue operation. Without the file share witness, the only way to overcome this problem is to configure at least three nodes in a MNS cluster. The three cluster nodes need to be continuously available and should be in different physical locations. With the File Share Witness feature, an external file share, also referred to as the witness, can be used instead of the third cluster node. By using the File Share Witness, a two-node MNS cluster can be configured that remains operational even if one cluster node fails. The file share acts as an additional vote to determine which node takes ownership of the configured cluster resources. Additional information about the File Share Witness feature is available in the Microsoft Knowledge Base Article 921181 at http://support.microsoft.com/kb/921181.
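The majority rule used by the MNS quorum and the effect of an additional witness vote can be checked with a short Python sketch. It assumes that the division in the formula (number of nodes/2) + 1 is an integer division, and it models only the vote counting, not the behavior of the cluster service. The function names are hypothetical.

```python
def votes_required(total_votes):
    """Majority according to (total / 2) + 1, using integer division."""
    return total_votes // 2 + 1


def has_quorum(votes_present, total_votes):
    """A partition may run resources only if it holds a majority of all votes."""
    return votes_present >= votes_required(total_votes)


# Three-node MNS cluster: two nodes on one side of a split keep quorum.
three_node_majority_side = has_quorum(2, 3)
three_node_minority_side = has_quorum(1, 3)

# Two-node MNS cluster without a witness: a single surviving node has
# only one of two votes and therefore cannot continue operation.
two_node_survivor = has_quorum(1, 2)

# Two nodes plus a file share witness: three votes in total, so the
# surviving node together with the witness holds the majority.
two_node_survivor_with_witness = has_quorum(2, 3)

# Four nodes split evenly: neither side reaches the three votes required.
even_split = has_quorum(2, 4)
```

The even split at the end shows why an odd number of votes, for example through an arbiter node or a witness in a separate location, is useful in geographically dispersed clusters.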
Network configuration
Any WSFC configuration requires at least two network adapters for the following purposes:
A public network that is used for the communication between the SAP central instance, SAP AS, and SAP system client connections.
A cluster private network that is used internally for status exchange and WSFC cluster heartbeat information between the member nodes.
Each of these network adapters is required to have its own physical IP address and corresponding host name. The cluster service in a WSFC cluster is unaware of a possible geographical dispersion and assumes that its public and private network interfaces still exist in the same network segment with the same IP subnet. This is because the cluster software is unable to determine the network topology and because it relies on IP failover, which only functions within the same subnet. To accommodate these restrictions for geographic dispersion, organizations can implement VLAN technology. Virtual LANs (VLANs) can be viewed as a group of devices on different physical LAN segments that can communicate with each other as if they were all on the same physical LAN segment. Even though some of the cluster service network communication limitations have been removed in Windows Server 2008 R2, a single subnet is still required. This is still true for SAP component communications as well.

With Windows Server 2003, the limit for the heartbeat roundtrip time is 500 milliseconds. Whether this fixed limit can be met depends directly on the latency and bandwidth of the network connections used between the two sites. With Windows Server 2008 R2, this parameter became configurable between 250 and 2000 ms on the same subnet. Theoretically, even different subnets are possible with Windows Server 2008 R2, but due to the requirements of the SAP instances, SAP installations are only possible in a single subnet configuration.

Additional geographically dispersed cluster information is available in the following resources:
White paper: Geographically Dispersed Clusters in Windows Server 2003 http://www.microsoft.com/windowsserver2003/techinfo/overview/clustergeo.mspx
White paper: Server Cluster Quorum Options in Windows Server 2003 http://technet.microsoft.com (Note: Search for the title.)
White paper: Stretching Microsoft Cluster with Geo-Dispersion http://www.microsoft.com/technet/prodtechnol/windows2000serv/maintain/optimize/geoclust.mspx
White paper: Server Clusters: Majority Node Set Quorum http://technet.microsoft.com (Note: Search for the title.)
Microsoft Storage solutions http://www.microsoft.com/windowsserversystem/storage/default.aspx
Microsoft Knowledge base article: Microsoft Cluster Services Installation Resources http://support.microsoft.com/kb/259267
Multi-Site clustering with Windows Server 2008 R2 https://www.microsoft.com/windowsserver2008/en/us/clustering-multisite.aspx
Microsoft SQL Server database log shipping

SQL Server log shipping was first implemented in SQL Server 2000 and provides a convenient way to maintain a standby database, even over a geographic distance. Its basic function is the automatic transfer of transaction logs from the primary database to a second database on another server. A log shipping cycle consists of three operations:
1. Back up the transaction log at the primary database server.
2. Copy the transaction log backup file to the secondary database server.
3. Restore the log backup on the secondary database server.
The following figure shows the general setup:
Figure 43. Database log shipping with Microsoft SQL Server
Transaction log backups on the primary database server are written to a local disk on this server and transferred over the network to the standby server at the configured time interval. Transaction log backups received on the standby server are applied to the standby database. It is also possible to transfer the transaction log backups from the primary to multiple standby servers. Changing the database role from primary to secondary, or bringing the secondary database online when the primary database becomes unavailable, is not an automatic process; the secondary database must be brought online manually. When SQL Server log shipping is set up, a database backup copy is first restored on the standby server. With log shipping in place, every transactional change is then reproduced on the standby side.
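The backup, copy, and restore cycle can be modeled in a few lines of Python. This is an in-memory illustration only and does not call SQL Server; the classes Primary, Standby, and LogBackup and the sequence numbering are assumptions introduced for this sketch. It shows that the standby always lags behind the primary by the work that has not yet been backed up and shipped.

```python
from dataclasses import dataclass, field


@dataclass
class LogBackup:
    sequence: int
    records: list


@dataclass
class Primary:
    pending: list = field(default_factory=list)       # records not yet backed up
    backup_share: list = field(default_factory=list)  # backups on local disk
    next_sequence: int = 1

    def execute(self, record):
        self.pending.append(record)

    def backup_log(self):
        # Step 1: back up the transaction log on the primary server.
        backup = LogBackup(self.next_sequence, self.pending)
        self.pending = []
        self.next_sequence += 1
        self.backup_share.append(backup)
        return backup


@dataclass
class Standby:
    received: list = field(default_factory=list)
    data: list = field(default_factory=list)
    last_restored: int = 0

    def copy_from(self, primary):
        # Step 2: copy backups that are not yet present on the standby.
        have = {b.sequence for b in self.received}
        self.received += [b for b in primary.backup_share if b.sequence not in have]

    def restore(self):
        # Step 3: apply the received backups strictly in sequence order.
        for backup in sorted(self.received, key=lambda b: b.sequence):
            if backup.sequence == self.last_restored + 1:
                self.data.extend(backup.records)
                self.last_restored = backup.sequence


primary, standby = Primary(), Standby()
primary.execute("T1")
primary.execute("T2")
primary.backup_log()
primary.execute("T3")          # not yet backed up, so not yet shipped
standby.copy_from(primary)
standby.restore()
standby_state = list(standby.data)
```

In this model transaction T3 reaches the standby only after the next backup, copy, and restore cycle, which corresponds to the time lag inherent in log shipping.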
By design, SQL Server log shipping maintains only the SAP system database in a geographically dispersed way. Because the complete functionality of a SAP system requires a SAP central instance with the network shares and possibly an AS, these have to be maintained separately. SQL Server log shipping is therefore not considered a full disaster recovery solution, but it is a simple method of maintaining a copy of the database of a SAP system and can be combined with other technologies like database mirroring or WSFC clusters. The following figure shows the general setup with a local WSFC cluster:
Figure 44. Local WSFC cluster
Additional SQL Server log shipping information is available in the following resources:
SAP Service Marketplace URL: https://service.sap.com/notes
Note: Access to this Web site is available only to registered SAP customers and partners and requires a user name and password.
SAP note 493290: Configuring SQL Server Log shipping
SAP note 1101017: Log shipping on SQL Server 2005
SQL Server 2000 high availability series http://www.microsoft.com/technet/prodtechnol/sql/2000/deploy/harag05.mspx
White paper: SAP with SQL Server 2005 http://www.microsoft.com/sql/techinfo/whitepapers/sap-with-sql-server.mspx
White paper: Using SQL Server 2005 with SAP R/3 http://www.microsoft.com/technet/itsolutions/msit/operations/sql2005sap.mspx
SAP Help documentation: SAP High Availability http://help.sap.com
To access this information, do the following:
1. From the left menu pane, click SAP NetWeaver.
2. Choose English under SAP NetWeaver 7.0 Library.
3. Choose Technical Operations Manual.
4. Choose General Administration Tasks.
5. Choose High Availability.
6. Follow the SAP High Availability documentation link.
7. In the left menu pane, choose Database High Availability.
8. Choose High Availability for the MS SQL Server Database.
Database mirroring with SQL Server 2005/2008 R2

Database mirroring is a database feature developed as part of Microsoft SQL Server 2005. Conceptually, database mirroring consists of a database, called the principal database, that resides on a Microsoft SQL Server 2005/2008 R2 database instance, and a mirror that resides on a different Microsoft SQL Server 2005/2008 R2 database instance. With SQL Server 2005/2008 R2 database mirroring, all the database transactions of a productive database are replicated on a standby database. Similar to the log shipping procedure, the transaction logs of the database play a major role. As seen in the following figure, transaction logs are used in all SQL Server databases to record data changes during transactions. The records first go into a memory area allocated as the database log buffer. From there, the data is written as quickly as possible into a log file. In systems with active database mirroring, the log buffer content is transferred to the mirror server at the same time.
Figure 45. Database Mirroring with SQL Server 2005
The mirror server that receives these transaction log records writes them into the mirror database log buffer before it writes them into a local transaction log file. The received transaction log records are then applied to the mirror database. During the transaction log application, all transactions executed on the active database are also executed on the mirror side. Therefore, both databases can be maintained on the same transactional level. While the active and the mirror database must always be available for database mirroring, there is an optional third role: the witness. With this optional configuration, an automatic failover to the mirror database can take place in case of an error. When database mirroring is used in a high availability configuration, the mirror server can take on the role of the active server within seconds. The mirror database becomes available when the witness confirms the failover. Database mirroring assures the availability of a consistent standby database in case of a productive database interruption. Encrypting the data packets during network transport provides good data security. SQL Server 2005/2008 R2 supports three different mirroring configurations:
Asynchronous mirroring
Synchronous mirroring
Synchronous mirroring with automatic failover

Asynchronous database mirroring
Asynchronous data transfer between a productive database and a standby database means that there is no waiting to acknowledge the transfer before the pending transaction is concluded with the commit work statement. The primary advantage of this operation is that the transaction processing is minimized. The time required to acknowledge a transaction onto a mirror server can mean that with low network bandwidth, there is a significant performance bottleneck. Though the transaction performance is improved with asynchronous database mirroring, there is a significant disadvantage. For example, one cannot guarantee that all transactions were safely transported to the mirror server at any point in time. In case of an error, this situation can lead to a loss of committed transactions. With the asynchronous mirroring, SQL Server 2005/2008 R2 cannot switch automatically to a standby server in case of an error without an additional Microsoft partner solution. However, the standby database is still available, but only with the potential loss of committed transactions. The database can be designed to continue with productive operation, but the failover needs to be initiated manually. Asynchronous database mirroring is best applied in disaster recovery scenarios. Because of the greater distances between mirror servers, network bandwidths are often limited in this case. Presently, the log shipping technology introduced with SQL Server 2000 is often deployed in this scenario. Synchronous mirroring with automatic failover in case of error With synchronous database mirroring, the advantage is that the database transactions are seen as complete only if the writing process on the mirror side is complete. In this type of operation, it is guaranteed that the mirror copy always has the exact same transactional level as the original. Because of this increased data security, the automatic switching of the database operation in case of an error is possible. 
The prerequisite for the automatic failover configuration, however, is the installation of an additional database server, the witness. The witness is an additional SQL Server instance that is needed only to determine which mirror site is able to take over. Any SQL Server instance can take this role; even the free SQL Server Express Edition would work. If the active database server fails, the mirror server and the witness form the majority (quorum) that defines who can actively hold the database. Even if the primary server recovers, the active database role is not accidentally returned to it, because the quorum defines the mirror server as the active database owner after a failover occurs.

Due to the bandwidth requirements of synchronous database mirroring, fast network connections with low latency are required. This typically also determines the maximum distance between the two sites. With current technologies, this distance is about 50 km.

SAP database mirroring configurations

Detailed information about database mirroring installation and configuration for SAP is available in SAP note 965908. This note is also important when determining how to combine database mirroring with other technologies such as log shipping.
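The majority rule that the witness makes possible in the synchronous configuration can be sketched in a few lines of Python. This is a simplified illustration, not the actual SQL Server algorithm. The function `database_owner` and the participant names are hypothetical, and the sketch assumes that the mirror is fully synchronized and that every participant listed as alive can reach every other one.

```python
from typing import Optional, Set

PARTICIPANTS = {"primary", "mirror", "witness"}


def database_owner(current_owner: str, alive: Set[str]) -> Optional[str]:
    """Decide which partner may actively hold the database.

    current_owner: "primary" or "mirror", the partner holding the database
                   before the event.
    alive:         participants that are up and can reach each other.
    Returns the owning partner, or None if no majority (quorum) exists.
    """
    if current_owner not in ("primary", "mirror"):
        raise ValueError("current_owner must be 'primary' or 'mirror'")
    if not alive <= PARTICIPANTS:
        raise ValueError("unknown participant in 'alive'")

    # A majority of the three participants is needed to hold the database.
    if len(alive) < 2:
        return None

    # The current owner keeps the role while it is part of the majority,
    # so a recovered former owner never gets the role back by accident.
    if current_owner in alive:
        return current_owner

    # Otherwise the surviving partner and the witness form the quorum
    # and the surviving partner takes over.
    return "mirror" if current_owner == "primary" else "primary"


if __name__ == "__main__":
    # Primary fails; mirror and witness form the quorum.
    owner = database_owner("primary", {"mirror", "witness"})
    print("after primary failure:", owner)
    # Primary recovers; the role stays with the mirror.
    print("after primary recovery:",
          database_owner(owner, {"primary", "mirror", "witness"}))
    # Without a witness, a lone mirror has no majority.
    print("mirror alone:", database_owner("primary", {"mirror"}))
```

The second call shows why the recovered primary does not reclaim the database: the role changes only when the current owner drops out of the majority.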
Additional information about SQL Server database mirroring is available in the following resources:

White paper: SAP with SQL Server 2005
http://www.microsoft.com/sql/techinfo/whitepapers/sap-with-sql-server.mspx

White paper: Using SQL Server 2005 with SAP R/3
http://www.microsoft.com/technet/itsolutions/msit/operations/sql2005sap.mspx

Books online: Microsoft SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx

White paper: SQL Server 2008 Technologies for SAP Solutions
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/60a236a2-81042b10-5ebe-8fef61cc82fd

Disaster recovery solutions for virtual machines

As described in the Hyper-V host cluster section, the foundation for all Hyper-V high availability solutions is the WSFC. As with disaster recovery solutions for physical installations, Hyper-V disaster recovery is based mainly on geographically dispersed WSFC clusters and SAN storage with storage replication. The significant aspect here is that the SAN storage vendor has to provide integration components for the cluster so that the cluster can handle the VHD files as cluster resources on this type of storage. Currently, many storage vendors support disaster recovery configurations for Hyper-V, including Live Migration across two geographically dispersed sites. For more disaster recovery information, see http://www.microsoft.com/virtualization/en/us/solution-continuity.aspx.




