
Technical Report

Nondisruptive Operations and SMB File Shares for Clustered Data ONTAP
Meghan Liese, NetApp
April 2013 | TR-4100

TABLE OF CONTENTS
1 Introduction
2 Scope
3 Minimum Requirements
4 Overview of NDO for CIFS
5 LIF Migrate
6 High-Availability Pair Failover and Giveback
7 Volume Move
8 Aggregate Relocate
9 Nondisruptive Upgrade
9.1 NDU with a Maintenance Window
9.2 NDU Without a Maintenance Window
References
Version History

LIST OF TABLES
Table 1) NDO overview for NAS and SAN environments.
Table 2) File access continuity during failover and giveback.
Table 3) Data ONTAP support for moving volumes between versions.

LIST OF FIGURES
Figure 1) Four-node clustered Data ONTAP.
Figure 2) Four-node cluster after takeover.
Figure 3) Four-node cluster after giveback.
Figure 4) High-level flow of aggregate relocate showing the transition of the lock state information as controllers are being upgraded.
Figure 5) Overview of NDU.
Figure 6) Four-node cluster prior to rolling NDU.
Figure 7) Move volumes off nodes 3 and 4.
Figure 8) NDU nodes 3 and 4 create a mixed-version cluster during the rolling upgrade.
Figure 9) Move all volumes from nodes 1 and 2 to nodes 3 and 4.
Figure 10) NDU nodes 1 and 2.
Figure 11) Redistribute volumes across the cluster.

Nondisruptive Upgrade and CIFS for Systems Operating in Cluster-Mode

NetApp Confidential Limited Use

1 Introduction

Nondisruptive operation (NDO) is a fundamental capability of NetApp clustered Data ONTAP. However,
there are intricacies of the Server Message Block (SMB) protocol that prevent continuous data
availability in some nondisruptive operations. Such interruptions to data availability are a direct result of
the stateful nature of the protocol, and they have the same impact regardless of the underlying storage infrastructure.
This paper focuses on NDO for applications connected to SMB file shares when storage failover (SFO),
aggregate relocate (ARL), logical interface (LIF) migrate, and NetApp DataMotion for Volumes (volume
move) are executed.

2 Scope
The information presented in this paper is specific to NetApp clustered Data ONTAP storage systems.
Some information applies to Data ONTAP 7-Mode storage systems; however, that is not the focus of this
paper. For 7-Mode NDO information, refer to the 7-Mode product documentation or the Nondisruptive
Upgrade Technical FAQ.
This paper refers to the Data ONTAP software upgrade as a nondisruptive upgrade (NDU). Although the
entire process is not completely nondisruptive for applications accessing SMB shares, the overall
upgrade process follows the nondisruptive upgrade procedure. Despite some of the disruptive nuances
associated with the statefulness of the SMB protocol, the procedure is nondisruptive for other protocols.

3 Minimum Requirements
There are limitations with SMB 1.0 that prevent an open session, between an application and file share,
from staying open during a LIF migration. SMB 2.0 introduces durable file handles, which allow an
application to use the existing open file handles after a LIF migration. The minimum requirement for NDO
is SMB 2.0 or later.
SMB 1.0 and 2.0 experience an interruption when a storage failover occurs, meaning that one node of an
HA pair takes over the partner node. For details, see section 6, High-Availability Pair Failover and
Giveback.

Clustered Data ONTAP 8.2 introduces NDO for SMB 3.0 in Microsoft Hyper-V environments that use
Continuously Available (CA) File Shares. For information on nondisruptive storage
solutions with SMB 3.0, see sections 5, 6, and 7.
Note: DataMotion for Volumes is nondisruptive for SMB 1.0 and later.

4 Overview of NDO for CIFS


Nondisruptive operations for NetApp clustered Data ONTAP facilitate continuous data availability to client
and host applications for planned and unplanned events that historically have required downtime.
A planned downtime event happens when the storage system has to undergo a lifecycle operation, such
as performance management or technology refresh, or a maintenance operation, such as a software
upgrade or hardware replacement.
An unplanned downtime event happens when a software or hardware failure causes a disruption that
takes data offline, such as a controller panic.
Unplanned downtime events are generally a small percentage of the overall downtime experienced by IT
departments. However, applications connected to SMB file shares are susceptible to outage during
unplanned events due to the nature of the protocol and the lack of persistent connections when storage
controllers fail over to the redundant partner. Planned downtime events account for a large majority of the
downtime that IT departments experience. NDO solutions for planned downtime events for all
environments, including applications connected to SMB file shares, are based on nondisruptive migration
of network and storage resources within a cluster to meet the demands of the use case. The following
utilities deliver solutions that facilitate NDO for continuous data availability:

LIF migrate

DataMotion for Volumes (also referred to as volume move or vol move)

High-availability (HA) pair storage failover (SFO)

Aggregate relocate (ARL)

Table 1 is an overview of NDO utilities and the associated support for NAS and SAN protocols. The
following sections detail the information outlined in the table.
Table 1) NDO overview for NAS and SAN environments.

Nondisruptive Operation | NAS: SMB 1.0 | NAS: SMB 2.x and SMB 3.0 | NAS: SMB 3.0 (Hyper-V only) | NAS: NFS | SAN
LIF migrate | Possibly disruptive (refer to section 5) | Nondisruptive* | Nondisruptive | Nondisruptive | N/A
Volume move | Nondisruptive | Nondisruptive | Nondisruptive | Nondisruptive | Nondisruptive
Aggregate relocate (for controller upgrade) | Disruptive | Disruptive | Nondisruptive | Nondisruptive | Nondisruptive
Storage failover | Disruptive (refer to section 6) | Disruptive (refer to section 6) | Nondisruptive | Nondisruptive | Nondisruptive

* Applications connected to SMB 2.x file shares may incur a disruption if the application allows
multiuser access to the same file or does not pick up its open file handle within 15
minutes of the LIF migrate.
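
As an illustration only, Table 1 can be transcribed into a small lookup so that a planning script can ask whether a given operation is expected to be nondisruptive for a given protocol. This is a minimal Python sketch: the dictionary keys and the helper name `is_nondisruptive` are hypothetical labels chosen here, not Data ONTAP identifiers, and the values simply restate the table.

```python
# Table 1 restated as data. Keys are illustrative labels, not product names.
NDO_SUPPORT = {
    "lif_migrate": {
        "smb1": "possibly disruptive",
        "smb2_smb3": "nondisruptive*",   # see the footnote on shared access / 15 minutes
        "smb3_ca_hyperv": "nondisruptive",
        "nfs": "nondisruptive",
        "san": "n/a",
    },
    "volume_move": {
        "smb1": "nondisruptive",
        "smb2_smb3": "nondisruptive",
        "smb3_ca_hyperv": "nondisruptive",
        "nfs": "nondisruptive",
        "san": "nondisruptive",
    },
    "aggregate_relocate": {
        "smb1": "disruptive",
        "smb2_smb3": "disruptive",
        "smb3_ca_hyperv": "nondisruptive",
        "nfs": "nondisruptive",
        "san": "nondisruptive",
    },
    "storage_failover": {
        "smb1": "disruptive",
        "smb2_smb3": "disruptive",
        "smb3_ca_hyperv": "nondisruptive",
        "nfs": "nondisruptive",
        "san": "nondisruptive",
    },
}


def is_nondisruptive(operation: str, protocol: str) -> bool:
    """Return True only when Table 1 lists the combination as nondisruptive."""
    return NDO_SUPPORT[operation][protocol].startswith("nondisruptive")
```

For example, `is_nondisruptive("volume_move", "smb1")` is true, while `is_nondisruptive("storage_failover", "smb2_smb3")` is false. The footnoted SMB 2.x case is reported as nondisruptive here, so a caller that cares about the footnote conditions needs to check them separately.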

5 LIF Migrate
A LIF migrate allows an administrator to migrate or fail over a LIF to another network port within the
cluster. LIF migrate balances network traffic across the cluster, moving network resources for optimized
access to storage resources that may have moved for performance or capacity optimization. Additionally,
nondisruptive LIF migrate is used when a node goes down and is failed over to the partner node. LIF
migrate is nondisruptive for applications connected to SMB 2.x and 3.0 file shares. There is a caveat for
multiuser access to the same file that demotes exclusive locks and limits the durability of the open file
handle. If an application does not reconnect to the open file handle within 15 minutes of the LIF migrate,
the file handle is closed out and the application is not able to reconnect nondisruptively. LIF migrate might
cause a disruption to applications connected to SMB 1.0 file shares due to the inherent limitations of the
protocol without durable or persistent file handles.
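
The reconnect rules described in this section can be sketched as a simple predicate. The following Python is an assumption-laden illustration, not Data ONTAP behavior as implemented: the function name, the dialect strings, and the `shared_access` flag are hypothetical, and the multiuser caveat is treated conservatively as "cannot be relied on" even though the text only says a disruption may occur.

```python
from datetime import datetime, timedelta

# The 15-minute window comes from the text above.
DURABLE_HANDLE_WINDOW = timedelta(minutes=15)


def can_rely_on_resume(smb_dialect: str,
                       migrate_time: datetime,
                       reconnect_time: datetime,
                       shared_access: bool = False) -> bool:
    """Estimate whether an open file handle survives a LIF migrate.

    smb_dialect   -- "1.0", "2.x", or "3.0" (illustrative labels)
    shared_access -- True if several users have the same file open
    """
    if smb_dialect == "1.0":
        return False  # SMB 1.0 has no durable or persistent handles
    if shared_access:
        return False  # conservative reading of the multiuser caveat
    # The handle is closed out if it is not picked up within the window.
    return reconnect_time - migrate_time <= DURABLE_HANDLE_WINDOW
```

A client that reconnects five minutes after the migrate would be expected to resume its handle; one that reconnects after twenty minutes would have to reopen the file.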

6 High-Availability Pair Failover and Giveback


In Data ONTAP 8.2, SMB 3.0 Continuously Available File Shares that service requests from a
server running Hyper-V support nondisruptive SFO for planned and unplanned events. SMB 3.0
with CA File Shares for Hyper-V mirrors the lock state information for each open file on both HA pair
partners. Therefore, if either node is taken over, its partner reopens the session for the
open file when the SFO event has completed. The reconnect completes quickly enough for the
failover to remain transparent to the virtual machine and its applications. Any in-flight I/O requests are
retried based on the retry algorithm of the protocol.
Storage failover is disruptive for any application accessing SMB 1.0, 2.x, or 3.0 (non-CA) file shares.
The lock state information for a file share is stored only in memory on the controller that owns the
volume mapped to the share. When that node goes down, the controller fails over to its partner node; however, the
partner does not have access to the lock state information of any open files, so the application has to
reopen the file. A disruption also occurs when the storage resources are given back to the original
node. Information about file access between a client and the storage does not persist across a
failover or giveback. Access must be reestablished whenever ownership of the aggregates transfers between
partner nodes.

6.1 Impact on Data Availability for Applications Accessing SMB 1.0, 2.x, and 3.0 (Non-CA) File Shares

The failover of a NetApp storage controller relies on the ability of the HA pair to fail
over aggregates to the partner node while the node experiencing the planned or unplanned event is down.
The session information does not persist across a failover or the subsequent giveback. The
information required for the session is lost and must be reestablished after the failover or giveback
completes.
Note: An alternative nondisruptive solution is to use volume move to move datasets to another storage
resource on a different node in the cluster.

For an application that is accessing an SMB file share, a disruption occurs if any of the following is true:

The file being accessed resides on a node that has to fail over to its partner node due to a planned or
unplanned event.

The file being accessed resides on a node that has been failed over, and a subsequent giveback
operation is required.

The application is connected to an SMB 1.0 file share, and a LIF is migrated or failed over to another
node within the cluster.

To demonstrate these rules, consider the following example. There are four scenarios to be considered
regarding clients accessing data in the example four-node cluster:

A file is accessed on volume A through LIF 1.


A file is accessed on volume B, C, or D through LIF 1.
A file is accessed on volume A through LIF 2, 3, or 4.
A file is accessed on volume B, C, or D through LIF 2, 3, or 4.

Figure 1 shows four nodes, identified as node 1, node 2, node 3, and node 4. This example illustrates
what happens to data access when node 1 is down due to a planned or unplanned event.

Figure 1) Four-node clustered Data ONTAP.

When node 1 goes down, data availability is affected in different ways throughout the cluster. Figure 2
illustrates the scenario when node 1 is no longer serving data in the cluster. Aggregates owned by node 1
are failed over to the HA pair partner, node 2. The black arrows indicate direct access.
Figure 2) Four-node cluster after takeover.

Figure 3) Four-node cluster after giveback.

Table 2) File access continuity during failover and giveback.

Access to a File on | During a Failover Is | During a Giveback Is
Volume A through LIF 1 | Disruptive | Disruptive
Volume B, C, or D through LIF 1 | Nondisruptive | Nondisruptive
Volume A through LIF 2, 3, or 4 | Disruptive | Disruptive
Volume B, C, or D through LIF 2, 3, or 4 | Nondisruptive | Nondisruptive
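
The pattern in Table 2, together with the three disruption conditions listed in section 6.1, can be expressed as a short rule. The Python below is a sketch under stated assumptions: it assumes volume A lives on node 1 and that LIF n is hosted on node n (the figures are not reproduced here, so this layout is inferred from the table), and the names `Access` and `disrupted_by_failover` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    volume_owner: int  # node that owns the aggregate holding the volume
    lif_home: int      # node currently hosting the LIF used by the client


def disrupted_by_failover(access: Access,
                          failed_node: int,
                          smb_dialect: str = "2.x",
                          continuously_available: bool = False) -> bool:
    """Apply the section 6.1 rules to one client access path."""
    if continuously_available:
        # SMB 3.0 CA shares for Hyper-V mirror lock state on the partner.
        return False
    if access.volume_owner == failed_node:
        # Lock state held in the failed node's memory is lost.
        return True
    if access.lif_home == failed_node and smb_dialect == "1.0":
        # The LIF has to move, and SMB 1.0 has no durable handles.
        return True
    return False
```

With node 1 failing, `Access(volume_owner=1, lif_home=3)` is disrupted while `Access(volume_owner=2, lif_home=1)` is not for SMB 2.x, which matches the first three rows of Table 2. The same rule applies to the giveback, when ownership transfers back.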

7 Volume Move
Volume move is completely nondisruptive to an application connected to any file share via SMB 1.0, SMB
2.x, or SMB 3.0. Volume move nondisruptively moves a volume to another physical location in the cluster.
The volume may move to an aggregate owned by its partner node or any other node in the cluster without
affecting the availability of the data contained in the volume. For more information on DataMotion for
Volumes, refer to TR-3975: DataMotion for Volumes Overview for Clustered Data ONTAP.

8 Aggregate Relocate
Aggregate relocate is primarily used for controller hardware upgrades. The aggregates are relocated
between the HA pair controllers while each controller is replaced with a new controller. During the
upgrade, all the aggregates owned by the HA pair continue to be served by the
node that is not being upgraded. For example, consider that node A and node B are an HA pair and will
be upgraded to node C and node D (newer versions of controllers). Node A will be replaced first by node
C; during this time all aggregates owned by node A will be relocated to node B.
For applications accessing SMB 1.0, 2.x and 3.0 (non-CA) file shares, the relocation of the aggregates to
the partner node causes a disruption to the application. Considering the example, node A contains the
lock state information in memory for the open file. When node A is taken down, the lock state is lost and a
new connection must be established. Each time the aggregates move and the node that previously
owned the aggregates is taken down, the lock state is lost for the open file.
For clusters of four nodes or more, only clients with open files on the nodes whose aggregates are being
relocated are affected. Data access by CIFS clients to other nodes in the cluster is unaffected. For example,
consider a four-node cluster: node 1, node 2, node 3, and node 4. Node 1 and node 2 are an HA pair and
will be upgraded to node 5 and node 6. Any files opened by an application in an SMB share on node 3
or node 4 are not disrupted by the controller upgrade and aggregate relocate process
executed on node 1 and node 2. Data availability throughout the cluster is similar to when a node incurs
an SFO event. See Figures 1, 2, and 3 in section 6 for details about data availability throughout the
cluster when a node goes down.
For Hyper-V environments, aggregate relocate is nondisruptive for an application connected to an SMB 3.0
Continuously Available File Share because the locks are
mirrored on the partner node. In the example, node A contains the active lock information and
node B has a mirrored copy of the lock state for the open file. When node A is taken down, node B has
enough open state information about the file or directory for the client to reclaim the open file connection.
Figure 4 shows the high-level transitional states of aggregate relocate on a two-node cluster. The primary
and mirrored lock state information held in memory on each controller is shown at each of the four
transitional states.
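
To make the difference between mirrored and unmirrored lock state concrete, the following Python toy model tracks which open files a client could reclaim when a node is taken down. It is a conceptual sketch only: the `Node` class, `open_file`, and `take_down` are hypothetical names, and the real lock manager is far more involved than two dictionaries.

```python
class Node:
    """One controller of an HA pair, holding lock state in memory."""

    def __init__(self, name: str):
        self.name = name
        self.primary_locks = {}   # path -> client, for files this node serves
        self.mirrored_locks = {}  # path -> client, copied from the partner


def open_file(owner: Node, partner: Node, path: str, client: str,
              continuously_available: bool) -> None:
    """Record an open file; only CA shares mirror state to the partner."""
    owner.primary_locks[path] = client
    if continuously_available:
        partner.mirrored_locks[path] = client


def take_down(owner: Node, partner: Node):
    """Simulate the owner going down after its aggregates move to the partner.

    Returns (reclaimable, lost): paths whose open state the partner can
    hand back to the client, and paths that must be reopened.
    """
    reclaimable = {path: client
                   for path, client in owner.primary_locks.items()
                   if partner.mirrored_locks.get(path) == client}
    lost = set(owner.primary_locks) - set(reclaimable)
    owner.primary_locks.clear()  # in-memory state disappears with the node
    for path, client in reclaimable.items():
        partner.primary_locks[path] = client
        del partner.mirrored_locks[path]
    return set(reclaimable), lost
```

In this model, a virtual disk opened on a CA share ends up in the reclaimable set, while a document opened on a non-CA share ends up in the lost set and has to be reopened by the application.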

Figure 4) High-level flow of aggregate relocate showing the transition of the lock state information as
controllers are being upgraded.

9 Nondisruptive Upgrade
Nondisruptive upgrade (NDU) is the process of upgrading Data ONTAP software on each of the two
nodes in an HA pair controller configuration without interrupting I/O to connected client machines. The
objective is to enable upgrading and maintenance of any aspect of the storage system without affecting
the system's ability to respond to foreground I/O requests. This means that I/O interruptions are brief
enough so that applications continue to operate without the need for downtime, maintenance, or user
notification.
System NDU is a process that takes advantage of HA pair controller technology to minimize client
disruption during an upgrade of Data ONTAP or controller firmware. System NDU entails a series of
takeover and giveback operations that allow the partner nodes to transfer the data delivery service while
the controllers are upgraded, thereby maintaining continuous data I/O for clients/hosts.
Because the controller for each node in the HA pair configuration is connected to both its own storage
shelves and the storage shelves of its partner node, a single node can provide access to all volumes and
LUNs, even when the partner node is shut down. This procedure allows each node of the HA pair
controllers to be upgraded individually to a newer version of Data ONTAP or firmware, and it also
provides the ability to transparently perform hardware upgrades and maintenance on the HA pair
controller nodes.
Before an NDU, the customer should create an NDU plan. An upgrade plan can be developed by using
Upgrade Advisor, available on the NetApp Support site, as part of My AutoSupport. Upgrade Advisor is
available for all systems reporting on AutoSupport.
Customers who have storage solutions with applications accessing SMB 1.0, 2.x, or 3.0 (non-CA) file
shares are limited by the statefulness of the protocol when doing software upgrades. A limitation exists
that prevents relevant file access information between the client and server (controller) from being
persistent during an HA failover event. The HA failover event is required to reboot the controller with the
new software image. In an HA solution, a failover allows data to be continually served by the HA partner
node while a reboot is done on the node that is undergoing a software upgrade. When the failover
completes, all information about file access from a previous session is lost. This limitation is
inherent in the SMB protocol, and it causes a disruption on
any storage appliance.

Figure 5) Overview of NDU.

9.1 NDU with a Maintenance Window

The first solution for upgrading Data ONTAP software uses the standard NDU procedure. This section
discusses the best practices to simplify the procedure for storage environments that service applications
connected to SMB file shares, as well as the expected impact on data availability during the upgrade,
followed by an overview of NDU using HA pair controller failover and giveback functionality.

Planning a Maintenance Window


A customer who is planning to do a traditional NDU procedure using SFO, with applications accessing
SMB file shares on the storage appliance, needs to plan for short outages during the NDU. The most
straightforward and least time-consuming process for upgrading software is to incur a short disruption to
the application when the storage controller fails over to its partner node and then again when a giveback
happens after the upgrade. For minimal impact on the applications, the planned downtime, referred to as
a maintenance window, should take place during a period with the least amount of activity.

Terminating CIFS Sessions


Because the SMB protocol is session oriented, sessions must be terminated before the upgrade
procedures to prevent data loss.

Executing the NDU


The upgrade uses HA failover and giveback to upgrade storage controller software. The entire procedure
can be found in the Upgrade and Revert Guide of the Data ONTAP product documentation or by using
Upgrade Advisor on the NetApp Support site as part of My AutoSupport.

9.2 NDU Without a Maintenance Window

For customers who do not have the flexibility to provide a maintenance window while the NDU is
completed, volumes can be moved from the system that will undergo the NDU while the controllers are
upgraded. A volume move is nondisruptive for applications connected to SMB file shares (as well as all
other commonly supported protocols).

Planning the Execution Time


A customer who is planning to do a nondisruptive upgrade for a clustered Data ONTAP system with
applications connected to SMB file shares might consider using volume move along with the standard
NDU procedure to prevent downtime. However, NetApp recommends following the best practices outlined
in TR-3975 and TR-4075 for optimizing volume move. The most important thing to consider is that a
volume move should be executed during a period when the least amount of load is expected on the
system.

Rolling Upgrades and Volume Move


A rolling upgrade introduces a mixed-version cluster state. A volume move has some restrictions
regarding moving a volume between two nodes with different versions of Data ONTAP. Table 3 shows
the supported paths for a volume move when a cluster is in a mixed-version state, for the purpose of a
rolling upgrade.
Table 3) Data ONTAP support for moving volumes between versions.

Volume Move Between Data ONTAP Versions
Source Volume | Destination Volume | Supported for Volume Move?
Data ONTAP 8.0 | Data ONTAP 8.1 | No
Data ONTAP 8.1 | Data ONTAP 8.0 | No
Data ONTAP 8.1 | Data ONTAP 8.1.x | Yes
Data ONTAP 8.1.x | Data ONTAP 8.1 | Yes
Data ONTAP 8.1 | Data ONTAP 8.x | Yes
Data ONTAP 8.x | Data ONTAP 8.1 | No

There are some best practices to be considered when planning to use a volume move to evacuate
volumes mapped to SMB shares from a node that is executing an NDU.
Table 3 lists scenarios where volume move can be used. For customers who are running Data ONTAP
8.0, all volume moves need to complete prior to a nondisruptive rolling upgrade to Data ONTAP 8.1.
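
One way to read Table 3 is as a rule over version numbers, sketched below in Python. This is an interpretation, not a NetApp-provided check: it assumes "8.x" in the table means a release family later than 8.1, it treats combinations that the table does not list (for example 8.0 to 8.2) conservatively as unsupported, and it assumes moves within one release family are supported even though only the 8.1 family is listed. The function names are hypothetical. Always confirm the supported paths in the product documentation.

```python
def parse_version(version: str) -> tuple:
    """Turn '8.1.2' into (8, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def volume_move_supported(source: str, destination: str) -> bool:
    """Interpret Table 3 as a rule over (major, minor) release families."""
    src_family = parse_version(source)[:2]
    dst_family = parse_version(destination)[:2]
    if src_family == dst_family:
        return True   # e.g. 8.1 <-> 8.1.x is supported in both directions
    if (8, 0) in (src_family, dst_family):
        return False  # 8.0 <-> 8.1 is not supported in either direction
    # From 8.1 onward, only moves from an earlier to a later family are listed.
    return src_family < dst_family
```

For instance, `volume_move_supported("8.1", "8.2")` is true and `volume_move_supported("8.2", "8.1")` is false, matching the last two rows of the table.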

During a nondisruptive rolling upgrade from Data ONTAP 8.0 to 8.1 (or between any two subsequent
Data ONTAP versions), no active volume move jobs should be in progress. Volume move is not
supported between aggregates owned by nodes running Data ONTAP 8.0 and 8.1. As a best practice,
volume move should not be used until the rolling upgrade is complete. For customers who are running
Data ONTAP 8.1, a volume move can be executed from an aggregate owned by a node running Data
ONTAP 8.1 to an aggregate owned by a node running a later version of Data ONTAP. The reverse,
moving a volume from an aggregate on a node running a later version of Data ONTAP to a node running
an earlier version, is not supported. When planning the rolling upgrade, volumes must be moved
according to these restrictions. For example, given a four-node cluster similar to the one in Figure 6,
some nodes might need to be updated before volumes are moved. Consider the scenarios illustrated by
Figures 6 through 10.
Figure 6) Four-node cluster prior to rolling NDU.

Figure 7) Move volumes off nodes 3 and 4.

Figure 8) NDU nodes 3 and 4 create a mixed-version cluster during the rolling upgrade.

Figure 9) Move all volumes from nodes 1 and 2 to nodes 3 and 4.

Figure 10) NDU nodes 1 and 2.

Figure 11) Redistribute volumes across the cluster.
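
The sequence shown in Figures 7 through 11 can be walked through with a small simulation. The Python below is a sketch, not an upgrade tool: the node and volume names, the round-robin placement, and the function `rolling_upgrade` are assumptions made for illustration, and the only rule it enforces is the one stated above, that a volume is never moved from a later version to an earlier one.

```python
def rolling_upgrade(versions: dict, volumes: dict, new_version: tuple) -> list:
    """Simulate the rolling upgrade of a four-node cluster.

    versions -- node name -> version tuple, updated in place
    volumes  -- volume name -> hosting node, updated in place
    Returns a log of the steps taken.
    """
    log = []

    def move(vol, target):
        source = volumes[vol]
        if source == target:
            return
        if versions[source] > versions[target]:
            raise ValueError(f"{vol}: later-to-earlier move is not supported")
        volumes[vol] = target
        log.append(f"move {vol} {source}->{target}")

    def evacuate(sources, targets):
        hosted = sorted(v for v, n in volumes.items() if n in sources)
        for i, vol in enumerate(hosted):
            move(vol, targets[i % len(targets)])

    def upgrade(nodes):
        for node in nodes:
            if node in volumes.values():
                raise RuntimeError(f"{node} still hosts volumes")
            versions[node] = new_version
            log.append(f"upgrade {node}")

    evacuate(["node3", "node4"], ["node1", "node2"])  # Figure 7
    upgrade(["node3", "node4"])                       # Figure 8: mixed-version cluster
    evacuate(["node1", "node2"], ["node3", "node4"])  # Figure 9: earlier -> later
    upgrade(["node1", "node2"])                       # Figure 10
    nodes = sorted(versions)
    for i, vol in enumerate(sorted(volumes)):         # Figure 11: redistribute
        move(vol, nodes[i % len(nodes)])
    return log
```

Running it on a cluster with one volume per node shows why nodes 3 and 4 are upgraded first: once they run the later version, the volumes on nodes 1 and 2 can be moved to them, whereas a move in the opposite direction would raise an error in this model.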

A rolling upgrade is intended to complete in the shortest possible time. Although there is no defined
period of time in which a rolling upgrade must complete, using volume move during a rolling upgrade
extends the time it takes for the rolling upgrade to complete. The time it takes for a volume to be moved
depends on several variables, and it can be a lengthy process for larger volumes on busy systems.
This time needs to be taken into account when planning the rolling upgrade.
Moving volumes between nodes with different versions of Data ONTAP has restrictions, and the paths
shown in Table 3 should be considered when executing a rolling upgrade.
For nodes that are moving volumes in the cluster, a LIF migration might be necessary to maintain direct
access to a file after a volume has been moved. There might be some variance in performance
throughput for a file that was accessed directly and subsequently received indirect access after a volume
move.

References
The following references were used in this technical report:

TR-3450: High-Availability Pair Controller Configuration Overview and Best Practices
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=38499&contentID=58472

TR-3975: DataMotion for Volumes Overview
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=63909&contentID=71149

TR-4075: DataMotion for Volumes Best Practices and Optimization
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=72418&contentID=75392

Version History

Version | Date | Document Version History
Version 1.0 | August 2012 | Initial release
Version 2.0 | April 2013 | SMB 3.0 content entry and SMB 2.0 content update

Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product
and feature versions described in this document are supported for your specific environment. The NetApp
IMT defines the product components and versions that can be used to construct configurations that are
supported by NetApp. Specific results depend on each customer's installation in accordance with published
specifications.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any
information or recommendations provided in this publication, or with respect to any results that may be
obtained by the use of the information or observance of any recommendations provided herein. The
information in this document is distributed AS IS, and the use of this information or the implementation of
any recommendations or techniques herein is a customer's responsibility and depends on the customer's
ability to evaluate and integrate them into the customer's operational environment. This document and
the information contained herein may be used solely in connection with the NetApp products discussed
in this document.

© 2013 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp,
Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, AutoSupport, DataMotion, and
Data ONTAP are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Microsoft is a
registered trademark and Hyper-V is a trademark of Microsoft Corporation. All other brands or products are trademarks or registered
trademarks of their respective holders and should be treated as such. TR-4100-0812
