
vSAN Stretched Cluster & 2 Node Guide
First Published On: 10-19-2016
Last Updated On: 07-12-2018

Copyright © 2018 VMware, Inc. All rights reserved.



Table of Contents

1. Overview
1.1.Introduction
2. Support Statements
2.1.vSphere Versions
2.2.vSphere & vSAN
2.3.Hybrid and All-Flash Support
2.4.On-disk Formats
2.5.vSAN Witness Host
2.6.Supported on vSAN but not vSAN Stretched Clusters
3. New Concepts in vSAN - Stretched Clusters
3.1.vSAN Stretched Clusters vs. Fault Domains
3.2.The Witness Host
3.3.Read Locality in vSAN Stretched Clusters
3.4.Witness Traffic Separation (WTS)
3.5.Per Site Policies
4. Requirements
4.1.VMware vCenter Server
4.2.A Witness Host
4.3.Networking and Latency Requirements
5. Configuration Minimums and Maximums
5.1.Virtual Machines Per Host
5.2.Hosts Per Cluster
5.3.Witness Host
5.4.vSAN Storage Policies
5.5.Fault Domains
6. Design Considerations
6.1.Witness Host Sizing
6.2.Cluster Compute Resource Utilization
6.3.Network Design Considerations
6.4.Config of Network from Data Sites to Witness
6.5.Bandwidth Calculation
6.6.The Role of vSAN Heartbeats
7. Cluster Settings – vSphere HA
7.1.Cluster Settings – vSphere HA
7.2.Turn on vSphere HA
7.3.Host Monitoring
7.4.Admission Control
7.5.Host Hardware Monitoring – VM Component Protection
7.6.Datastore for Heartbeating
7.7.Virtual Machine Response for Host Isolation
7.8.Advanced Options
8. Cluster Settings - DRS
8.1.Cluster Settings - DRS
8.2.Partially Automated or Fully Automated DRS
9. VM/Host Groups & Rules
9.1.VM/Host Groups & Rules
9.2.Host Groups
9.3.VM Groups
9.4.VM/Host Rules
9.5.Per-Site Policy Rule Considerations
10. Installation
10.1.Installation
10.2.Before You Start
10.3.vSAN Health Check Plugin for Stretched Clusters
11. Using a vSAN Witness Appliance
11.1.Using a vSAN Witness Appliance
11.2.Setup Step 1: Deploy the vSAN Witness Appliance
11.3.Setup Step 2: vSAN Witness Appliance Management
11.4.Setup Step 3: Add Witness to vCenter Server


11.5.Setup Step 4: Config vSAN Witness Host Networking


11.6.Setup Step 5: Validate Networking
12. Configuring vSAN Stretched Cluster
12.1.Configuring vSAN Stretched Cluster
12.2.Creating a New vSAN Stretched Cluster
12.3.Converting a Cluster to a Stretched Cluster
12.4.Configure Stretched Cluster Site Affinity
12.5.Verifying vSAN Stretched Cluster Component Layouts
12.6.Upgrading an older vSAN Stretched Cluster
13. Management and Maintenance
13.1.Management and Maintenance
13.2.Maintenance Mode Consideration
14. Failure Scenarios
14.1.Failure Scenarios
14.2.Individual Host Failure or Network Isolation
14.3.Individual Drive Failure
14.4.Multiple Simultaneous Failures
14.5.Recovering from a Complete Site Failure
14.6.How Read Locality is Established After Failover
14.7.Replacing a Failed Witness Host
14.8.VM Provisioning When a Site is Down
14.9.Site Failure or Network Partitions
14.10.Efficient inter-site resync for stretched clusters
14.11.Failure Scenario Matrices
15. Appendix A
15.1.Appendix A: Additional Resources
15.2.Location of the vSAN Witness Appliance OVA
16. Appendix B
16.1.Appendix B: Commands for vSAN Stretched Clusters


1. Overview
vSAN Stretched Cluster is a specific configuration implemented in environments where disaster/downtime avoidance is a key requirement.


1.1 Introduction

The vSAN Stretched Cluster feature was introduced in vSAN 6.1. A vSAN Stretched Cluster is a specific configuration implemented in environments where disaster/downtime avoidance is a key requirement. This guide was developed to provide additional insight and information for installation, configuration and operation of a vSAN Stretched Cluster infrastructure in conjunction with VMware vSphere. This guide will explain how vSphere handles specific failure scenarios and discuss various design considerations and operational procedures for Stretched Clusters using vSAN releases including 6.5, 6.2, and 6.1.

VMware vSAN Stretched Clusters with a Witness Host refers to a deployment where a user sets up a
vSAN cluster with 2 active/active sites with an identical number of ESXi hosts distributed evenly
between the two sites. The sites are connected via a high bandwidth/low latency link.

The third site hosting the vSAN Witness Host is connected to both of the active/active data-sites. This
connectivity can be via low bandwidth/high latency links.

Each site is configured as a vSAN Fault Domain. The nomenclature used to describe a vSAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. Data sites are where virtual machines are deployed. The minimum supported configuration is 1+1+1 (3 nodes). The maximum configuration is 15+15+1 (31 nodes). In vSAN Stretched Clusters, there is only one witness host in any configuration.

A virtual machine deployed on a vSAN Stretched Cluster will have one copy of its data on site A, a second copy of its data on site B, and any witness components placed on the witness host in site C. This configuration is achieved through fault domains alongside Host and VM groups and affinity rules. In the event of a complete site failure, there will be a full copy of the virtual machine data as well as greater than 50% of the components available. This will allow the virtual machine to remain available on the vSAN datastore. If the virtual machine needs to be restarted on the other site, vSphere HA will handle this task.


2. Support Statements
vSAN Stretched Cluster configurations require vSphere 6.0 Update 1 (U1) or greater.


2.1 vSphere Versions

VMware vSAN Stretched Cluster configurations require vSphere 6.0 Update 1 (U1) or greater. This implies both vCenter Server 6.0 U1 and ESXi 6.0 U1. This version of vSphere includes vSAN version 6.1. This is the minimum version required for vSAN Stretched Cluster support.

Advanced feature support in Stretched Clusters requires a combination of vSAN version, On-Disk
format version, architecture, and host count.

| Feature | vSAN 6.1 Requirements | vSAN 6.2 Requirements | vSAN 6.5 Requirements | vSAN 6.6 Requirements |
|---|---|---|---|---|
| Stretched Clusters | v2 On-Disk format | v2 On-Disk format | v2 On-Disk format | v2 On-Disk format |
| Deduplication & Compression | - | v3 On-Disk format, All-Flash architecture | v3 On-Disk format, All-Flash architecture | v3 On-Disk format, All-Flash architecture |
| Checksum | - | v3 On-Disk format | v3 On-Disk format | v3 On-Disk format |
| IOPS Limits | - | v3 On-Disk format | v3 On-Disk format | v3 On-Disk format |
| iSCSI Service | - | - | v3 On-Disk format | v3 On-Disk format |
| Local Protection - Mirroring | - | - | - | v5 On-Disk format |
| Local Protection - Erasure Coding | - | - | - | v5 On-Disk format, All-Flash hardware |
| Site Affinity - Mirroring | - | - | - | v5 On-Disk format |
| Site Affinity - Erasure Coding | - | - | - | v5 On-Disk format, All-Flash hardware |
| Encryption | - | - | - | v5 On-Disk format |

The latest On-Disk format is generally recommended per version of vSAN.

The exception to this rule is vSAN 6.6 Clusters that are not using Per-Site Policies or Encryption. On-
Disk format v3 may be used with vSAN 6.6 when these are not used.

2.2 vSphere & vSAN


VMware vSAN 6.1 introduced several features including All-Flash and Stretched Cluster functionality.
There are no limitations on the edition of vSphere used for vSAN. However, for vSAN Stretched Cluster functionality, vSphere Distributed Resource Scheduler (DRS) is very desirable. DRS will provide initial placement assistance, load balance the environment when there is an imbalance, and will also automatically migrate virtual machines to their correct site in accordance with VM/Host affinity rules. It can also help with migrating virtual machines to their correct site when a site recovers after a failure. Otherwise, the administrator will have to carry out these tasks manually.

Note: DRS is only available in Enterprise edition and higher of vSphere.

2.3 Hybrid and All-Flash Support

VMware vSAN Stretched Clusters are supported on both Hybrid configurations (hosts with local storage comprised of both magnetic disks for capacity and flash devices for cache) and All-Flash configurations (hosts with local storage made up of flash devices for capacity and flash devices for cache).

2.4 On-disk Formats

VMware vSAN Stretched Clusters require a minimum of the v2 On-Disk format.

The v1 On-Disk format is based on VMFS and is the original On-Disk format used for vSAN.

The v2 On-Disk format is the version which comes by default with vSAN version 6.x. Customers that upgraded from the original vSAN 5.5 to vSAN 6.0 may not have upgraded the On-Disk format from v1 to v2, and are thus still using v1.

In vSAN 6.2 clusters, the v3 On-Disk format allows for additional features, such as Erasure Coding,
Checksum, and Deduplication & Compression.

In vSAN 6.6 clusters, the v3 On-Disk format may be used, with the exception of when Per-Site Policies or Encryption are used. To use Per-Site Policies or Encryption, the v5 On-Disk format is required.

VMware recommends upgrading to the latest On-Disk format for improved performance, scalability,
and feature capabilities.
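To see which On-Disk format a host's disk groups are currently using, the format version reported by esxcli can be checked directly on each data node. A minimal check, assuming ESXi shell or SSH access; the exact field label in the output may vary slightly between releases:

# List the devices claimed by vSAN on this host; the on-disk format version
# is reported per device alongside the disk group information.
esxcli vsan storage list | grep -i format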

2.5 vSAN Witness Host

Both physical ESXi hosts and vSAN Witness Appliances (nested ESXi) are supported as a Stretched
Cluster Witness Host.

VMware provides a vSAN Witness Appliance for those customers who do not wish to use a physical
host for this role. The vSAN Witness Appliance must run on an ESXi 5.5* or higher host. This can
include an ESXi Free licensed host, a vSphere licensed (ESXi) host, or a host residing in OVH (formerly
vCloud Air), a vCloud Air Network (VCAN) partner, or any hosted ESXi installation.

Witness host(s) or Appliances cannot be shared between multiple vSAN Stretched Clusters or 2 Node
Clusters.

*When using vSAN 6.7, the physical host that the vSAN Witness Appliance runs on must also meet the CPU requirements of vSphere 6.7. Supported CPU information can be found in the vSphere 6.7 Release
Notes: https://docs.vmware.com/en/VMware-vSphere/6.7/rn/vsphere-esxi-vcenter-server-67-release-
notes.html

2.6 Supported on vSAN but not vSAN Stretched Clusters


The following are limitations on a vSAN Stretched Cluster implementation:

• In a vSAN Stretched Cluster, there are only 3 Fault Domains. These are typically referred to as the Preferred, Secondary, and Witness Fault Domains. Standard vSAN configurations can be comprised of up to 32 Fault Domains.
• Pre-vSAN 6.6, the maximum value for Number Of Failures To Tolerate in a vSAN Stretched Cluster configuration is 1. This is the limit due to the maximum number of Fault Domains being 3.
• In vSAN 6.6, Number Of Failures To Tolerate has been renamed Primary Failures To Tolerate. Local Protection has been added with Secondary Failures To Tolerate, providing additional data availability scenarios. More information can be found specific to these rules in the Per-Site Policies section.

Support statements specific to using vSAN Stretched Cluster implementations:

• SMP-FT, the new Fault Tolerant VM mechanism introduced in vSphere 6.0:


 Is not supported on Stretched Cluster vSAN deployments where the FT primary VM and
secondary VM are not running in the same location.
 Is supported on Stretched Cluster vSAN deployments where the FT primary and
secondary VM are running within the same location. (This can be achieved by creating a
VM/Host rule for that particular VM, and setting PFTT=0 and SFTT=1 with affinity to the same location as the VM/Host rule definition.)
 Is supported when using 2 Node configurations in the same physical location. SMP-FT
requires appropriate vSphere licensing. *The vSAN Witness Appliance managing a 2
Node cluster may not reside on the cluster it is providing quorum for. SMP-FT is not a
feature that removes this restriction.
• The Erasure Coding feature introduced in vSAN 6.2:
 Is not supported on Stretched Cluster configurations prior to vSAN 6.6, due to only having 3 Fault Domains. Erasure Coding requires 4 Fault Domains for RAID5 type protection and 6 Fault Domains for RAID6 type protection.
 Is supported for Local Protection within a site when using vSAN 6.6 and Per-Site Policies.
• The vSAN iSCSI Target Service is not supported on vSAN Stretched Clusters.


3. New Concepts in vSAN - Stretched Clusters
A common question is how Stretched Clusters differ from Fault Domains, which is a vSAN feature that was introduced with vSAN version 6.0.


3.1 vSAN Stretched Clusters vs. Fault Domains

A common question is how a stretched cluster differs from fault domains, which is a vSAN feature that was introduced with vSAN version 6.0. Fault domains enable what might be termed “rack awareness”, where the components of virtual machines could be distributed across multiple hosts in multiple racks, and should a rack failure event occur, the virtual machine would continue to be available. However, these racks would typically be hosted in the same data center, and if there was a data center wide event, fault domains would not be able to assist with virtual machine availability.

Stretched clusters essentially build on what fault domains did, and now provide what might be termed “data center awareness”. VMware vSAN Stretched Clusters can now provide availability for virtual machines even if a data center suffers a catastrophic outage.

3.2 The Witness Host

Witness Purpose
The witness host is a dedicated ESXi host, or vSAN Witness Appliance, whose purpose is to host the witness component of virtual machine objects.

The witness must have connection to both the master vSAN node and the backup vSAN node to join
the cluster. In steady state operations, the master node resides in the “Preferred site”; the backup
node resides in the “Secondary site”. Unless the witness host connects to both the master and the
backup nodes, it will not join the vSAN cluster.
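A quick way to confirm the current role of a node, and whether the witness has joined the cluster, is to query the cluster state from the ESXi shell. A minimal check, assuming shell or SSH access to the host:

# "Local Node State" reports MASTER, BACKUP, or AGENT. In steady state the master
# resides in the Preferred site and the backup in the Secondary site; the witness
# host only joins the cluster once it can reach both of them.
esxcli vsan cluster get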

Witness Connectivity
The witness host must be managed by the same vCenter Server managing the vSAN Cluster. There
must be connectivity between vCenter Server and the witness host in the same fashion as vCenter
controlling other vSphere hosts.

The witness host must also have connectivity to the vSAN nodes. This is typically provided through connectivity between the witness host vSAN VMkernel interface and the vSAN data network.

In vSAN 6.5 a separately tagged VMkernel interface may be used instead of providing connectivity between the vSAN Witness Host and the vSAN data network. This capability can only be enabled from the command line, and is only supported when used with 2 Node Direct Connect configurations.

These will be covered more thoroughly in a later section.

Updating the Witness Appliance


The vSAN Witness Appliance can easily be maintained/patched using vSphere Update Manager in the
same fashion as traditional ESXi hosts. It is not required to deploy a new vSAN Witness Appliance
when updating or patching vSAN hosts. Normal upgrade mechanisms are supported on the vSAN
Witness Appliance.

3.3 Read Locality in vSAN Stretched Clusters

In traditional vSAN clusters, a virtual machine’s read operations are distributed across all replica copies of the data in the cluster. In the case of a policy setting of NumberOfFailuresToTolerate=1, which results in two copies of the data, 50% of the reads will come from replica1 and 50% will come from replica2. A policy setting of NumberOfFailuresToTolerate=2 in non-stretched vSAN clusters results in three copies of the data; 33% of the reads will come from replica1, 33% from replica2 and 33% from replica3.

In a vSAN Stretched Cluster, we wish to avoid increased latency caused by reading across the inter-site link. To ensure that 100% of reads occur in the site the VM resides on, the read locality mechanism was introduced. Read locality overrides the NumberOfFailuresToTolerate=1 policy’s behavior to distribute reads across the components.

DOM, the Distributed Object Manager in vSAN, takes care of this. DOM is responsible for the creation of virtual machine storage objects in the vSAN cluster. It is also responsible for providing distributed data access paths to these objects. There is a single DOM owner per object. There are 3 roles within DOM: Client, Owner and Component Manager. The DOM Owner coordinates access to the object, including reads, locking, and object configuration and reconfiguration. All object changes and writes also go through the owner. In a vSAN Stretched Cluster configuration, the DOM owner of an object will now take into account which fault domain the owner runs in, and will read from the replica that is in the same domain.

There is now another consideration with this read locality. One must avoid unnecessary vMotion of the virtual machine between sites. Since the read cache blocks are stored on one site, if the VM moves around freely and ends up on the remote site, the cache will be cold on that site after the move. (Note that this only applies to hybrid configurations, as all-flash configurations do not have an explicit read cache.) Now there will be sub-optimal performance until the cache is warm again. To avoid this situation, soft affinity rules are used to keep the VM local to the same site/fault domain where possible. The steps to configure such rules will be shown in detail in the vSphere DRS section of this guide.
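Read locality behavior is exposed through a host advanced setting. The sketch below assumes the setting commonly documented for this purpose, VSAN.DOMOwnerForceWarmCache; verify the setting name on your specific build before relying on it:

# 0 (default) = read locality enabled: reads are served from the replica in the same
# fault domain as the DOM owner. 1 forces reads across all replicas, which is generally
# only used in 2 Node configurations to keep the cache warm on both nodes.
esxcfg-advcfg -g /VSAN/DOMOwnerForceWarmCache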

VMware vSAN 6.2 introduced Client Cache, a mechanism that allocates 0.4% of host memory, up to
1GB, as an additional read cache tier. Virtual machines leverage the Client Cache of the host they are
running on. Client Cache is not associated with Stretched Cluster read locality, and runs
independently.

3.4 Witness Traffic Separation (WTS)

By default, when using vSAN Stretched Clusters or 2 Node configurations, the Witness VMkernel interface tagged for vSAN traffic must have connectivity with each vSAN data node's VMkernel interface tagged with vSAN traffic.

In vSAN 6.5, an alternate VMkernel interface can be designated to carry traffic destined for the Witness rather than the vSAN tagged VMkernel interface. This feature allows for more flexible network configurations by allowing for separate networks for node-to-node and node-to-witness traffic.

2 Node Direct Connect


This Witness Traffic Separation provides the ability to directly connect vSAN data nodes in a 2 Node configuration. Traffic destined for the Witness host can be tagged on an alternative interface from the directly connected vSAN tagged interface.

In the illustration above, the configuration is as follows:

• Host 1
 vmk0 - Tagged for Management Traffic
 vmk1 - Tagged for Witness Traffic - This must* be done using esxcli vsan network ip add -i vmk1 -T=witness
 vmk2 - Tagged for vSAN Traffic
 vmk3 - Tagged for vMotion Traffic
• Host 2
 vmk0 - Tagged for Management Traffic
 vmk1 - Tagged for Witness Traffic - This must* be done using esxcli vsan network ip add -i vmk1 -T=witness
 vmk2 - Tagged for vSAN Traffic
 vmk3 - Tagged for vMotion Traffic
• vSAN Witness Appliance
 vmk0 - Tagged for Management Traffic***
 vmk1 - Tagged for vSAN Traffic****

*Enabling Witness Traffic is not available from the vSphere Web Client.
**Any VMkernel port not used for vSAN Traffic can be used for Witness Traffic. In a more simplistic configuration, the Management VMkernel interface (vmk0) could be tagged for Witness Traffic. The VMkernel port used will be required to have connectivity to the vSAN Traffic tagged interface on the vSAN Witness Appliance.
***The vmk0 VMkernel Interface, which is used for Management traffic, may also be used for vSAN Traffic. In this situation, vSAN Traffic must be unchecked from vmk1.
****The vmk1 VMkernel interface must not have an address that is on the same subnet as vmk0. Because vSAN uses the default TCP/IP stack, in cases where vmk0 and vmk1 are on the same subnet, traffic will use vmk0 rather than vmk1. This is detailed in KB 2010877. vmk1 should be configured with an address on a different subnet than vmk0.
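A minimal command sequence for the tagging described above, run from the ESXi shell on each data node. The interface names vmk1 and vmk2 follow the example layout above; substitute your own interfaces:

# Tag vmk1 to carry traffic destined for the vSAN Witness Host
esxcli vsan network ip add -i vmk1 -T=witness
# Tag vmk2 for vSAN data traffic (vsan is the default traffic type)
esxcli vsan network ip add -i vmk2
# Verify the result; each interface is listed with its Traffic Type (vsan or witness)
esxcli vsan network list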

The ability to connect 2 Nodes directly removes the requirement for a high speed switch. This design can be significantly more cost effective when deploying tens or hundreds of 2 Node clusters.

vSAN 2 Node Direct Connect was announced with vSAN 6.5, and is available with vSAN 6.6, 6.5, and
6.2*. *6.2 using vSphere 6.0 Patch 3 or higher without an RPQ.

Stretched Cluster Configurations


Witness Traffic Separation is supported on Stretched Cluster configurations as of vSAN 6.7.

3.5 Per Site Policies

Per Site Policies for vSAN 6.6 Stretched Clusters


Prior to vSAN 6.6
Up until vSAN 6.6, protection of objects in a Stretched Cluster configuration was comprised of one copy of data at each site and a witness component residing on the Witness host.

This configuration provided protection from a single failure in any 1 of the 3 sites, due to each site being configured as a Fault Domain. Using existing policies, 3 Fault Domains allow for a maximum of a single failure.

During normal operation, this was not a significant issue. In the event of a device or node failure, additional traffic could potentially traverse the inter-site link for operations such as servicing VM reads and writes, as well as repairing the absent or degraded replica.


Stretched Cluster bandwidth is sized based on the number of writes a cluster requires. Capacity for resync operations is taken into account by allocating 25% of the available bandwidth. Reads, however, are not normally taken into account in sizing.

Impact when an object is absent or degraded


Availability scenarios differ depending on the type of failure or lack of availability.

If a host goes offline, or a capacity device is unmounted, the components will not be replaced until the 60 minute threshold is reached. This is configurable, but VMware recommends not adjusting this setting. During the timeframe that the object is absent, if the object is present on the alternate site from the virtual machine, reads from the object will cause additional overhead while traversing the inter-site link. Resyncs will not occur until after 60 minutes. After the 60 minute threshold occurs, reads and resyncs will traverse the inter-site link until the object is replaced on the site it is absent from.
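The 60 minute threshold corresponds to the object repair delay advanced setting on each host. A minimal check from the ESXi shell, assuming the VSAN.ClomRepairDelay setting present on vSAN 6.x hosts; as noted above, VMware recommends leaving it at the default:

# Display the current repair delay in minutes (the default is 60)
esxcfg-advcfg -g /VSAN/ClomRepairDelay
# If it is ever changed (not recommended), the value must be set the same way on every host:
# esxcfg-advcfg -s 60 /VSAN/ClomRepairDelay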

When a device fails due to a hardware error, data is immediately resynched to repair the object data.
Like an absent object after the 60 minute threshold, a degraded event will cause immediate reads and
resyncs across the inter-site link.

The impact can be insignificant if there are few items that need to be replaced or if the inter-site link is oversized. The impact can be significant if there are many items to be replaced or the inter-site link is already at full utilization.

Also consider that an additional failure before the object is repaired will cause the object(s) to become inaccessible. This is because, up until vSAN 6.6, Stretched Clusters only protect from a single failure.

New Policy Rules in vSAN 6.6


In vSAN 6.6 a few rule changes were introduced. Use of these rules provides additional protection or flexibility for Stretched Cluster scenarios.

• Failures to Tolerate is now called Primary Failures To Tolerate; this is the only rule that received a name change. It still behaves the same, and in a Stretched Cluster, the only possible values are 0 or 1.
• Failure Tolerance Method has not changed, but when used in conjunction with another rule, it can change object placement behavior.


• Secondary Failures To Tolerate is a new rule that specifically changes the local protection behavior of objects in each site of a vSAN 6.6 Stretched Cluster.
• The final new rule is Affinity. This rule is only applicable when Primary Failures To Tolerate is 0. When Primary Failures To Tolerate is 0, this rule provides the administrator the ability to choose which site the vSAN object should reside on, either the Preferred or Secondary Fault Domain.

These new policy rules provide:

• Local Protection for objects on vSAN 6.6 Stretched Clusters
• Site Affinity for objects on vSAN 6.6 Stretched Clusters when protection across sites is not desired.

| New Rule | Old Rule | Behavior Specific to Stretched Clusters | vSAN 6.6 Requirements | Stretched Cluster Possible Values |
|---|---|---|---|---|
| Primary Failures To Tolerate (PFTT) | Failures to Tolerate (FTT) | Available for traditional vSAN or Stretched Cluster configurations. Specific to Stretched Clusters, this rule determines whether an object is protected on each site or only on a single site | On-Disk Format v5 | 1 - Enables Protection Across Sites; 0 - Protection in a Single Site |
| Secondary Failures To Tolerate (SFTT) | - | Only available for vSAN 6.6 Stretched Clusters and defines the number of disk or host failures a storage object can tolerate | On-Disk Format v5; Stretched Cluster; proper local and remote host count to satisfy the Failure Protection Method and Number of Failures independently | 0, 1, 2, 3 - Local protection FTT |
| Failure Tolerance Method (FTM) | Failure Protection Method | RAID1 (Mirroring) - Performance; RAID5/6 (Erasure Coding) - Capacity (*All-Flash only) | 2n+1 hosts for Mirroring; 2n+2 hosts for Erasure Coding | Mirroring (0,1,2,3); Erasure Coding (1,2) |
| Affinity | - | Provides a choice of the Preferred or Secondary Fault Domain for vSAN object placement | On-Disk Format v5; Stretched Cluster; Primary Failures to Tolerate = 0 | Preferred Fault Domain or Secondary Fault Domain |


The only upgrade requirement for vSAN 6.5 customers to use the new rules in vSAN 6.6 is to upgrade the On-Disk format from Version 3 to Version 5. Bandwidth requirements do not change.

Upon upgrade from a vSAN 6.5 Stretched Cluster to a vSAN 6.6 Stretched Cluster, an existing
Stretched Cluster policy of FTT=1 with FTM=Mirroring will become a PFTT=1, FTM=Mirroring.

| Secondary Failures To Tolerate | Failure Tolerance Method | Hosts Required Per Site | Hosts Recommended Per Site |
|---|---|---|---|
| 0 | - | 1 | - |
| 1 | Mirroring (Hybrid or All-Flash architecture) | 3 | 4 |
| 2 | Mirroring (Hybrid or All-Flash architecture) | 5 | 6 |
| 3 | Mirroring (Hybrid or All-Flash architecture) | 7 | 8 |
| 1 | Erasure Coding (requires All-Flash architecture) | 4 | 5 |
| 2 | Erasure Coding (requires All-Flash architecture) | 6 | 7 |

Data access behavior using the new Policy Rules


vSAN Stretched Clusters have traditionally written a copy of data to each site using a Mirroring Failure Tolerance Method. These were full writes to each site, with reads being handled locally using the Site Affinity feature. Write operations are dependent on VM Storage Policy rules in a vSAN 6.6 Stretched Cluster.

Primary Failures To Tolerate behavior

• When a Primary Failures to Tolerate rule is equal to 1, writes will continue to be written in a
mirrored fashion across sites.
• When a Primary Failures to Tolerate rule is equal to 0, writes will only occur in the site that is specified in the Affinity rule.
• Reads continue to occur from the site a VM resides on.

Secondary Failures To Tolerate behavior

• When a Secondary Failures to Tolerate rule is in place, the behavior within a site adheres to the
Failure Tolerance Method rule.
• As illustrated in the above table, the number of failures to tolerate, combined with the Failure
Tolerance Method, determine how many hosts are required per site to satisfy the rule
requirements.
• Writes and reads occur within each site in the same fashion as they would in a traditional vSAN
cluster, but per site.
• Only when data cannot be repaired locally, such as cases where the only present copies of data
reside on the alternate site, will data be fetched from the alternate site.

Affinity

• The Affinity rule is only used to specify which site a vSAN object, either Preferred or Secondary, will reside on.
• It is only honored when a Primary Failures To Tolerate rule is set to 0.
• VMware recommends that virtual machines are run on the same site that their vSAN objects reside on.
 Because the Affinity rule is a Storage Policy rule, it only pertains to vSAN objects and not virtual machine placement.


 This is because read and write operations will be required to traverse the inter-site link
when the virtual machine and vSAN objects do not reside in the same site.

vSAN Stretched Cluster Capacity Sizing when using Per-Site Policy Rules
Prior to Per-Site policy rules, vSAN Stretched Cluster capacity sizing was primarily based on the
Mirroring Failure Tolerance Method, assuming a FTT=1.

This is because only a single copy of data resided in each site.

With Per-Site Policy Rules, capacity requirements can change entirely based on Policy Rule
requirements.

The following table illustrates some capacity sizing scenarios based on a default site policy with a
vmdk requiring 100GB. For single site scenarios, assuming Preferred Site.

| Protection | vSAN Version | FTT/PFTT | FTM | SFTT | Capacity Required in Preferred Site | Capacity Required in Secondary Site | Capacity Requirement |
|---|---|---|---|---|---|---|---|
| Across Sites Only | Pre-vSAN 6.6 | 1 | Mirroring | NA | 100GB | 100GB | 2x |
| Across Sites Only | vSAN 6.6 | 1 | Mirroring | 0 | 100GB | 100GB | 2x |
| Across Sites with Local Mirroring (RAID1 Single Failure) | vSAN 6.6 | 1 | Mirroring | 1 | 200GB | 200GB | 4x |
| Across Sites with Local Mirroring (RAID1 Double Failure) | vSAN 6.6 | 1 | Mirroring | 2 | 300GB | 300GB | 6x |
| Across Sites with Local Mirroring (RAID1 Triple Failure) | vSAN 6.6 | 1 | Mirroring | 3 | 400GB | 400GB | 8x |
| Across Sites with Local Erasure Coding (RAID5/Single Failure) | vSAN 6.6 | 1 | Erasure Coding | 1 | 133GB | 133GB | 2.66x |
| Across Sites with Local Erasure Coding (RAID6/Double Failure) | vSAN 6.6 | 1 | Erasure Coding | 2 | 150GB | 150GB | 3x |
| Single Site with Mirroring (RAID1 Single Failure) | vSAN 6.6 | 0 | Mirroring | 1 | 200GB | 0 | 2x |
| Single Site with Mirroring (RAID1 Double Failure) | vSAN 6.6 | 0 | Mirroring | 2 | 300GB | 0 | 3x |
| Single Site with Mirroring (RAID1 Triple Failure) | vSAN 6.6 | 0 | Mirroring | 3 | 400GB | 0 | 4x |
| Single Site with Erasure Coding (RAID5/Single Failure) | vSAN 6.6 | 0 | Erasure Coding | 1 | 133GB | 0 | 1.3x |
| Single Site with Erasure Coding (RAID6/Double Failure) | vSAN 6.6 | 0 | Erasure Coding | 2 | 150GB | 0 | 1.5x |

vSAN Stretched Cluster Witness Bandwidth considerations when using Per-Site Policy Rules
As Per-Site Policy Rules add local protection, objects are distributed into even more components. Because the bandwidth requirements to the Witness Host are based on the number of components, using these policy rules will increase the overall component count.

The following is an example of the impact of changing a Storage Policy to include local protection in a Stretched Cluster scenario, with a virtual machine with a single vmdk that is smaller than 255GB:

• Using Pre-vSAN 6.6 Policy Rules


 Would consume 9 components
 3 Components for the vmdk (1 in the Preferred Site, 1 in the Secondary Site, 1 on
the Witness Host)
*Up to a vmdk size of 255GB

 3 Components for the VM Home space (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)


 3 Components for the Virtual Swap file (1 in the
Secondary Site, 1 on the Witness Host)

• Using vSAN 6.6 Policy Rules with Protection across sites (PFTT=1) and Mirroring (RAID1 Single
Failure) within Sites
 Would consume 17 components
 7 Components for the vmdk (3 in the Preferred Site, 3 in the Secondary Site, 1 on
the Witness Host)
*Up to a vmdk size of 255GB

 7 Components for the VM Home space (3 in the Preferred Site, 3 in the Secondary Site, 1 on the Witness Host)


 3 Components for the Virtual Swap file (1 in the
Secondary Site, 1 on the Witness Host)

• Using vSAN 6.6 Policy Rules with Protection across sites (PFTT=1) and Erasure Coding (RAID5
Single Failure) within Sites
 Would consume 21 components
 9 Components for the vmdk (4 in the Preferred Site, 4 in the Secondary Site, 1 on
the Witness Host)
*Up to a vmdk size of 255GB


 9 Components for the VM Home space (4 in the Preferred Site, 4 in the Secondary Site, 1 on the Witness Host)

 3 Components for the Virtual Swap file (1 in the
Secondary Site, 1 on the Witness Host)

• Using vSAN 6.6 Policy Rules with Protection in a single site (PFTT=0) and Mirroring (RAID1
Single Failure) - Preferred Fault Domain shown below
 Would consume 9 components
 3 Components for the vmdk (3 in the Preferred Site, 0 in the Secondary Site, 0 on
the Witness Host)
*Up to a vmdk size of 255GB

 3 Components for the VM Home space (3 in the Preferred Site, 0 in the Secondary Site, 0 on the Witness Host)


 3 Components for the Virtual Swap file (1 in the
Secondary Site, 1 on the Witness Host)

• Using vSAN 6.6 Policy Rules with Protection in a single site (PFTT=0) and Erasure Coding
(RAID5 Single Failure) - Preferred Fault Domain shown below
 Would consume 11 components
 4 Components for the vmdk (4 in the Preferred Site, 0 in the Secondary Site, 0 on
the Witness Host)
*Up to a vmdk size of 255GB

 4 Components for the VM Home space (4 in the Preferred Site, 0 in the Secondary Site, 0 on the Witness Host)


 3 Components for the Virtual Swap file (1 in the
Secondary Site, 1 on the Witness Host)

**Notice in each case that the VM SWAP object retains the PFTT=1 and FTM=Mirroring storage policy.
This is because the VM SWAP object has a hardcoded policy.

Stretched Cluster configurations can accommodate a maximum of 45,000 components. Implementing the Local Protection Per-Site Policy Rules can increase the overall component count significantly. The example VM configuration used above could allow for up to 5,000 VMs on a Stretched Cluster configuration. That's 9 components X 5,000 VMs = 45,000 components (max). Assigning local Mirrored protection to the same VM configuration would allow for almost 2,600 VMs of the same configuration. Choosing Erasure Coding rather than Mirroring for local protection would reduce the number of identical VMs to about 2,100.

Not every environment is going to be uniform like these calculations might indicate. A vmdk that is
larger than 255GB is going to require at least one component for every 255GB chunk. Specifying a
Policy Rule of Stripe Width, or possibly breaking a component into smaller chunks after a rebalance is
going to increase the component count as well.

The witness bandwidth requirement is 2Mbps for every 1000 components. Using this formula, some additional examples follow:

• 200 virtual machines with 500GB vmdks (12 components each) using Pre-vSAN 6.6 policies
would require 4.8Mbps of bandwidth to the Witness host
 3 for swap, 3 for VM home space, 6 for vmdks = 12
 12 components X 200 VMs = 2,400 components
 2Mbps for every 1000 is 2.4 X 2Mbps = 4.8Mbps
• The same 200 virtual machines with 500GB vmdks using vSAN 6.6 Policy Rules for Cross Site
protection with local Mirroring would require
 3 for swap, 7 for VM home space, 14 for vmdks = 24
 24 components X 200 VMs = 4,800 components


 2Mbps for every 1000 is 4.8 X 2Mbps = 9.6Mbps


• The same 200 virtual machines with 500GB vmdks using vSAN 6.6 Policy Rules for Cross Site
protection with local Erasure Coding would require
 3 for swap, 9 for VM home space, 18 for vmdks = 30
 30 components X 200 VMs = 6,000 components
 2Mbps for every 1000 is 6 X 2Mbps = 12Mbps

These examples show that by adding local protection, component counts increase, as well as witness
bandwidth requirements.
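The arithmetic above reduces to a simple calculation: total components = VMs x components per VM, and witness bandwidth = (total components / 1000) x 2 Mbps. A small sketch using awk from the shell, with the 200 VM local-mirroring example; the input values are illustrative:

# 200 VMs, 24 components per VM (PFTT=1 with local RAID1 mirroring)
awk 'BEGIN {
  vms = 200; comps_per_vm = 24
  total = vms * comps_per_vm            # 4,800 components
  mbps = (total / 1000) * 2             # 2 Mbps per 1,000 components
  printf "components=%d  witness bandwidth=%.1f Mbps\n", total, mbps
}'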

vSAN 6.6 Per-Site Policy Rules Summary


With the introduction of Per-Site Policy Rules, vSAN 6.6 adds two important capabilities to vSAN
Stretched Clusters.

1. Local Protection
2. Site Affinity

As these features provide additional protection and data availability, it is important to consider
capacity and bandwidth sizing scenarios.


4. Requirements
List of requirements for implementing vSAN Stretched Cluster.


4.1 VMware vCenter Server

A vSAN Stretched Cluster configuration can be created and managed by a single instance of VMware vCenter Server. Both the Windows version and the vCenter Server Appliance are supported for configuration and management of a vSAN Stretched Cluster.

4.2 A Witness Host

In a vSAN Stretched Cluster, the witness components are only ever placed on the Witness host. Either
a physical ESXi host, or a special vSAN Witness Appliance provided by VMware, can be used as the
witness host.

If a vSAN Witness Appliance is used for the Witness host, it will not consume any of the customer’s
vSphere licenses. A physical ESXi host that is used as a witness host will need to be licensed
accordingly, as this can still be used to provision virtual machines should a customer choose to do so.

It is important that the witness host is not added to the vSAN cluster. The witness host is selected
during the creation of a vSAN Stretched Cluster.

The witness appliance will have a unique identifier in the vSphere web client UI to assist with identifying that a host is in fact a witness appliance (ESXi in a VM). It is shown as a “blue” host, as highlighted below:

Note: This is only visible when the appliance ESXi witness is deployed. If a physical host is used as the
witness, then it does not change its appearance in the web client. A dedicated witness host is required
for each Stretched Cluster.

4.3 Networking and Latency Requirements

When vSAN is deployed in a Stretched Cluster across multiple sites using Fault Domains, there are
certain networking requirements that must be adhered to.

Layer 2 and Layer 3 Support


Both Layer 2 (same subnet) and Layer 3 (routed) configurations are used in a recommended vSAN Stretched Cluster deployment.

• VMware recommends that vSAN communication between the data sites be over stretched L2.
• VMware recommends that vSAN communication between the data sites and the witness site is
routed over L3.

Note: A common question is whether L2 for vSAN traffic across all sites is supported. There are some considerations with the use of a stretched L2 domain between the data sites and the witness site, and these are discussed in further detail in the design considerations section of this guide. Another common question is whether L3 for vSAN traffic across all sites is supported. While this can work, it is not the VMware recommended network topology for vSAN Stretched Clusters at this time.


Note: A common question is whether NAT is supported for the Witness Host (physical or appliance). NAT is explicitly not supported at this time.

vSAN traffic between data sites is multicast. Witness traffic between a data site and the witness site is unicast. All traffic changes to unicast once all hosts in a vSAN cluster have been upgraded to vSAN 6.6.

Supported Geographical Distances


For VMware vSAN Stretched Clusters, geographical distances are not a support concern. The key
requirement is the actual latency numbers between sites.

Data Site to Data Site Network Latency


Data site to data site network refers to the communication between non-witness sites, in other words,
sites that run virtual machines and hold virtual machine data. Latency or RTT (Round Trip Time)
between sites hosting virtual machine objects should not be greater than 5msec (< 2.5msec one-way).

Data Site to Data Site Bandwidth


Bandwidth between sites hosting virtual machine objects will be workload dependent. For most workloads, VMware recommends a minimum of 10Gbps or greater bandwidth between sites. In use cases such as 2 Node configurations for Remote Office/Branch Office deployments, dedicated 1Gbps bandwidth can be sufficient with less than 10 Virtual Machines.

Please refer to the Design Considerations section of this guide for further details on how to determine
bandwidth requirements.

Data Site to Witness Network Latency


This refers to the communication between non-witness sites and the witness site.

In most vSAN Stretched Cluster configurations, latency or RTT (Round Trip Time) between sites hosting VM objects and the witness nodes should not be greater than 200msec (100msec one-way).

In typical 2 Node configurations, such as Remote Office/Branch Office deployments, this latency or RTT is supported up to 500msec (250msec one-way).

The latency to the witness is dependent on the number of objects in the cluster. VMware recommends
that on vSAN Stretched Cluster congurations up to 10+10+1, a latency of less than or equal to 200
milliseconds is acceptable, although if possible, a latency of less than or equal to 100 milliseconds is
preferred. For congurations that are greater than 10+10+1, VMware requires a latency of less than or
equal to 100 milliseconds.

Data Site to Witness Network Bandwidth


Bandwidth between sites hosting VM objects and the witness nodes are dependent on the number of
objects residing on vSAN. It is important to size data site to witness bandwidth appropriately for both
availability and growth. A standard rule of thumb is 2Mbps for every 1000 components on vSAN.

Please refer to the Design Considerations section of this guide for further details on how to determine
bandwidth requirements.

Inter-Site MTU Consistency


Knowledge Base Article 2141733 details a situation where data nodes have an MTU of 9000 (Jumbo Frames) and the vSAN Witness Host has an MTU of 1500. The vSAN Health Check looks for a uniform MTU size across all VMkernel interfaces that are tagged for traffic related to vSAN, and reports any inconsistencies. It is important to maintain a consistent MTU size across all vSAN VMkernel interfaces on data nodes and the vSAN Witness Host in a vSAN Stretched Cluster to prevent traffic fragmentation.

As KB 2141733 indicates, the corrective actions are either to reduce the MTU from 9000 on the data
node VMkernel interfaces or increase the MTU value on the vSAN Witness Host's VMkernel interface
that is tagged for vSAN Trac. Either of these are acceptable corrective actions.

The placement of the vSAN Witness Host will likely be the deciding factor in which configuration will be used. Network capability, control, and cost to/from the vSAN Witness Host, as well as overall performance characteristics on data nodes, are items to consider when making this design decision.

In situations where the vSAN Witness Host VMkernel interface tagged for vSAN traffic is not configured to use Jumbo Frames (or cannot due to underlying infrastructure), VMware recommends that all vSAN VMkernel interfaces use the default MTU of 1500.

As a reminder, there is no requirement to use Jumbo Frames with VMkernel interfaces used for vSAN.
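Connectivity and MTU consistency between a data node and the vSAN Witness Host can be spot-checked with vmkping from the ESXi shell. In this sketch, vmk2 is the vSAN-tagged (or witness-tagged) interface on the data node and <witness_ip> is a placeholder for the witness host's vSAN VMkernel address:

# Test a jumbo-frame path end to end: -d disables fragmentation, and an
# 8972-byte payload plus headers exercises a 9000-byte MTU
vmkping -I vmk2 -d -s 8972 <witness_ip>
# For a standard 1500 MTU path, test with a 1472-byte payload instead
vmkping -I vmk2 -d -s 1472 <witness_ip>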


5. Configuration Minimums and Maximums
The maximum number of virtual machines per ESXi host is unaffected by the vSAN Stretched Cluster configuration.


5.1 Virtual Machines Per Host

The maximum number of virtual machines per ESXi host is unaffected by the vSAN Stretched Cluster configuration. The maximum is the same as normal vSAN deployments.

VMware recommends that customers run their hosts at 50% of the maximum number of virtual machines supported in a standard vSAN cluster, to accommodate a full site failure.

In the event of full site failures, the virtual machines on the failed site can be restarted on the hosts in
the surviving site.

5.2 Hosts Per Cluster

The minimum number of hosts in a vSAN Stretched Cluster is 3. In such a configuration, Site 1 will contain a single ESXi host, Site 2 will contain a single ESXi host, and then there is a Witness host at the third site, the witness site. The nomenclature for such a configuration is 1+1+1. This is commonly referred to as a 2 Node configuration.

The maximum number of hosts in a vSAN Stretched Cluster is 31. Site 1 contains 15 ESXi hosts, Site 2 contains 15 ESXi hosts, and the Witness host on the third site makes 31. This is referred to as a 15+15+1 configuration.

5.3 Witness Host

There is a maximum of 1 Witness host per vSAN Stretched Cluster. The Witness host requirements are
discussed in the Design Considerations section of this guide. VMware provides a fully supported vSAN
Witness Appliance, in Open Virtual Appliance (OVA) format. This is for customers who do not wish to
dedicate a physical ESXi host as the witness. This OVA is essentially a pre-licensed ESXi host running in
a virtual machine, and can be deployed on a physical ESXi host at the third site.

5.4 vSAN Storage Policies

Number of Failures To Tolerate (FTT) - Pre-vSAN 6.6


Primary Number of Failures To Tolerate (PFTT) - vSAN 6.6
The FTT/PFTT policy setting has a maximum of 1 for objects. In Pre-vSAN 6.6 Stretched Clusters, FTT may not be greater than 1. In vSAN 6.6 Stretched Clusters, PFTT may not be greater than 1. This is because Stretched Clusters are comprised of 3 Fault Domains.

Secondary Number of Failures To Tolerate (SFTT) - vSAN 6.6


When used, the SFTT rule determines the number of disk or host failures a storage object can tolerate for local protection within each site of a Stretched Cluster.

Failure Tolerance Method (FTM)


Failure Tolerance Method rules provide object protection with RAID-1 (Mirroring) for Performance and
RAID-5/6 (Erasure Coding) for Capacity.

In Pre-vSAN 6.6 Stretched Clusters, Mirroring is the only FTM rule that can be satisfied, due to the 3 fault domains present. Also, when Primary Failures to Tolerate is 1 in a vSAN 6.6 Stretched Cluster, data is Mirrored across both sites and is placed based on the Secondary Failures to Tolerate policy along with FTM.


In vSAN 6.6 Stretched Clusters, Erasure Coding can be implemented using local protection, provided
the host count and capacity are available. All-Flash vSAN is a requirement for supporting Erasure
Coding.

Affinity
Affinity rules are used when the PFTT rule value is 0. This rule has 2 values, Preferred or Secondary. This determines which site an Affinity based vmdk would reside on.

Other Policy Rules


Other policy settings are not impacted by deploying vSAN in a Stretched Cluster configuration and can be used as per a non-stretched vSAN cluster.

Additional vSAN 6.6 Policy Rule Changes


Can be found in the Per-Site Policies section: https://storagehub.vmware.com/#!/vmware-vsan/vsan-stretched-cluster-2-node-guide/per-site-policies/1

5.5 Fault Domains

Fault domains play an important role in vSAN Stretched Clusters. Similar to the Number Of Failures To Tolerate (FTT) policy setting discussed previously, the maximum number of fault domains in a vSAN Stretched Cluster is 3. The first fault domain is the “Preferred” data site, the second fault domain is the “Secondary” data site and the third fault domain is the Witness host site.


6. Design Considerations
The witness host must be capable of running the same version of ESXi as vSAN data nodes.


6.1 Witness Host Sizing

The vSAN Witness host can be either a traditional physical ESXi host or the provided and packaged
vSAN Witness Appliance (OVA). The purpose of the Witness host is to store witness components for
virtual machine objects.

vSAN Witness Appliance (Virtual Machine)


Deploying the vSAN Witness Appliance that is provided by VMware is the recommended deployment
choice for a vSAN Witness Host. When choosing this deployment option, there are some requirements
to consider.

Licensing
A license is hard coded in the vSAN Witness Appliance and is provided for free from VMware.

vSAN Witness Appliance Version


A vSAN Witness Appliance is provided with each release of vSAN. The underlying vSphere version is
the same as the version running vSAN. Upon initial deployment of the vSAN Witness Appliance, it is
required to be the same as the version of vSAN.

Example: A new vSAN 6.5 deployment requires a 6.5 version of the vSAN Witness Appliance.

When upgrading the vSAN Cluster, upgrade the vSAN Witness Appliance in the same fashion as upgrading vSphere. This keeps the versions aligned.

Example: Upgrade vSAN 6.5 hosts to 6.6 using VMware Update Manager. Upgrade vSAN Witness
Appliance (6.5 to 6.6) using VMware Update Manager.

vSAN Witness Appliance Size


When using a vSAN Witness Appliance, the size is dependent on the configurations and this is decided during the deployment process. vSAN Witness Appliance deployment options are hard coded upon deployment and there is typically no need to modify these.

Compute Requirements
The vSAN Witness Appliance, regardless of conguration, uses at least two vCPUs.

Memory Requirements
Memory requirements are dependent on the number of components.

Storage Requirements
Cache Device Size: Each vSAN Witness Appliance deployment option has a cache device size of 10GB.
This is sucient for each for the maximum of 45,000 components. In a typical vSAN deployment, the
cache device must be a Flash/SSD device. Because the vSAN Witness Appliance has virtual disks, the
10GB cache device is congured as a virtual SSD. There is no requirement for this device to reside on a
physical ash/SSD device. Traditional spinning drives are sucient.

Capacity Device Sizing: First consider that a capacity device can support up to 21,000 components. Also consider that a vSAN Stretched Cluster can support a maximum of 45,000 components. Each Witness Component is 16MB; as a result, the largest capacity device that is needed for storing Witness Components approaches 350GB.


vSAN Witness Appliance Deployment Sizes & Requirements


Summary
• Tiny - Supports up to 10 VMs/750 Witness Components
 Compute - 2 vCPUs
 Memory - 8GB vRAM
 ESXi Boot Disk - 12GB Virtual HDD
 Cache Device - 10GB Virtual SSD
 Capacity Device - 15GB Virtual HDD

• Normal - Supports up to 500 VMs/21,000 Witness Components


 Compute - 2 vCPUs
 Memory - 16GB vRAM
 ESXi Boot Disk - 12GB Virtual HDD
 Cache Device - 10GB Virtual SSD
 Capacity Device - 350GB Virtual HDD

• Large - Supports over 500 VMs/45,000 Witness Components


 Compute: 2 vCPUs
 Memory - 32 GB vRAM
 ESXi Boot Disk - 12GB Virtual HDD
 Cache Device - 10GB Virtual SSD
 Capacity Devices - 3x350GB Virtual HDD

Where can the vSAN Witness Appliance run?


The vSAN Witness Appliance must run on an ESXi 5.5 or greater host. Several scenarios are supported officially:

It can be run in any of the following infrastructure configurations (provided appropriate networking is in place):

• On a vSphere environment backed with any supported storage (vmfs datastore, NFS datastore,
vSAN Cluster)
• On vCloud Air/OVH backed by supported storage
• Any vCloud Air Network partner hosted solution
• On a vSphere Hypervisor (free) installation using any supported storage (vmfs datastore or NFS
datastore)

Support Statements specific to placement of the vSAN Witness Appliance on a vSAN cluster:

• The vSAN Witness Appliance is supported running on top of another non-Stretched vSAN
cluster.
• The vSAN Witness Appliance is supported on a Stretched Cluster vSAN for another vSAN
Stretched Cluster, and vice-versa.
• vSAN 2-node cluster hosting witness for another vSAN 2-node cluster witness, and vice versa,
is not recommended and requires an RPQ.

Physical Host as a vSAN Witness Host


If using a physical host as the vSAN Witness Host there are some requirements to consider.

Licensing


If using a physical host as a vSAN Witness Host, it must be licensed with a valid vSphere license. This
does not require the same licensed edition as the vSAN Cluster it is supporting.

vSphere Build
If using a physical host as a vSAN Witness Host, it must be running the same build of vSphere as the
Stretched Cluster or 2 Node Cluster that it is participating with.

Compute and Memory Requirements


The minimum specifications required for ESXi meet the minimum requirements for use as a vSAN Witness Host. Minimum requirements for vSphere are dependent on the build of vSphere, and can be found in the documentation section for each edition in VMware Documentation: https://www.vmware.com/support/pubs/

Storage Requirements
Storage requirements do not change for a physical host being used as a vSAN Witness Host in
comparison to the vSAN Witness Appliance. An ESXi boot device, a cache device, and one or more
capacity devices are still required.

Required

• 1st device - vSphere Boot Device - Normal vSphere Requirements


• 2nd device - vSAN Cache Device - No requirement for Flash/SSD, but it must be tagged as
Flash/SSD in ESXi to be used as though it were a Flash/SSD device. This must be at least
10GB in size.
• 3rd device - Can be up to 350GB and will support metadata for up to 21,000 components
on a vSAN Cluster

Optional

• 4th device - Can be up to 350GB and will support metadata for up to 21,000 components
on a vSAN Cluster
• 5th device - Can be up to 350GB and will support metadata for up to 21,000 components
on a vSAN Cluster.

Other workloads
If using a physical host as a vSAN Witness Host, it may run other workloads. Because the physical
vSAN Witness Host is external to the vSAN Cluster it is contributing to, those workloads will not be
part of the vSAN Cluster. The vSAN Disk Group, and the disks it includes, may not be used for those
workloads.

*Important consideration: Multiple vSAN Witness Appliances can run on a single physical host. Using vSAN Witness Appliances is typically more cost effective than dedicating physical hosts for the purpose of meeting the vSAN Witness Host need.

6.2 Cluster Compute Resource Utilization

For full availability, VMware recommends that customers run at 50% of resource consumption across the vSAN Stretched Cluster. In the event of a complete site failure, all of the virtual machines could be run on the surviving site.

VMware understands that some customers will want to run levels of resource utilization higher than
50%. While it is possible to run at higher utilization in each site, customers should understand that in
the event of failure, not all virtual machines will be restarted on the surviving site.


With the introduction of Per-Site Policies in vSAN 6.6, capacity requirements are dependent on the
policies used.

| Protection | vSAN Version | FTT/PFTT | FTM | SFTT | Capacity Required in Preferred Site | Capacity Required in Secondary Site | Capacity Requirement |
|---|---|---|---|---|---|---|---|
| Across Sites Only | Pre-vSAN 6.6 | 1 | Mirroring | NA | 100% | 100% | 200% |
| Across Sites Only | vSAN 6.6 | 1 | Mirroring | 0 | 100% | 100% | 200% |
| Across Sites with Local Mirroring (RAID1 Single Failure) | vSAN 6.6 | 1 | Mirroring | 1 | 200% | 200% | 400% |
| Across Sites with Local Mirroring (RAID1 Double Failure) | vSAN 6.6 | 1 | Mirroring | 2 | 300% | 300% | 600% |
| Across Sites with Local Mirroring (RAID1 Triple Failure) | vSAN 6.6 | 1 | Mirroring | 3 | 400% | 400% | 800% |
| Across Sites with Local Erasure Coding (RAID5/Single Failure) | vSAN 6.6 | 1 | Erasure Coding | 1 | 133% | 133% | 266% |
| Across Sites with Local Erasure Coding (RAID6/Double Failure) | vSAN 6.6 | 1 | Erasure Coding | 2 | 150% | 150% | 300% |
| Single Site with Mirroring (RAID1 Single Failure) | vSAN 6.6 | 0 | Mirroring | 1 | 200% | 0 | 200% |
| Single Site with Mirroring (RAID1 Double Failure) | vSAN 6.6 | 0 | Mirroring | 2 | 300% | 0 | 300% |
| Single Site with Mirroring (RAID1 Triple Failure) | vSAN 6.6 | 0 | Mirroring | 3 | 400% | 0 | 400% |
| Single Site with Erasure Coding (RAID5/Single Failure) | vSAN 6.6 | 0 | Erasure Coding | 1 | 133% | 0 | 133% |
| Single Site with Erasure Coding (RAID6/Double Failure) | vSAN 6.6 | 0 | Erasure Coding | 2 | 150% | 0 | 150% |

6.3 Network Design Considerations

Stretched Cluster Network Design Considerations


Sites
A vSAN Stretched Cluster requires three sites.

Data Sites - Contain the vSAN data nodes

• Preferred site - Specified to be the primary owner of vSAN objects. This is an important
designation, specifically in cases of connectivity disruptions.
• Secondary site

Witness Site - Contains the vSAN Witness Host

• Maintains Witness Component data from the Preferred/Secondary sites when applicable

*When using "Site Affinity", Witness Components will not reside in the Witness site.

When using vSAN Stretched Clusters in a single datacenter, different rooms or different racks could
be considered separate sites. When using vSAN 2 Node, both vSAN data nodes are typically in the
same physical location.

Connectivity and Network Types

Management Network
• Preferred Site: Connectivity to vCenter & other vSAN hosts. Can be Layer 2 or Layer 3.
• Secondary Site: Connectivity to vCenter & other vSAN hosts. Can be Layer 2 or Layer 3.
• Witness Site: Connectivity to vCenter. Can be Layer 2 or Layer 3.

VM Network
• Preferred Site: Recommend Layer 2. In the event of a site failure, VMs restarting on the Secondary site do not require IP address changes.
• Secondary Site: Recommend Layer 2. In the event of a site failure, VMs restarting on the Secondary site do not require IP address changes.
• Witness Site: No requirement for a VM Network. Running VMs on the vSAN Witness Appliance is not supported. Running VMs on a physical witness host is supported.

vMotion Network
• Preferred Site: If vMotion is desired between the data sites, Layer 2 or Layer 3 is supported. vMotion is not required between this data site & the Witness site.
• Secondary Site: If vMotion is desired between the data sites, Layer 2 or Layer 3 is supported. vMotion is not required between this data site & the Witness site.
• Witness Site: There is no requirement for vMotion networking to the Witness site.

vSAN Network
• Preferred Site: To the Secondary site: Layer 2 or Layer 3. Recommend L2 for Multicast (6.5 & below); Layer 2 or Layer 3 for Unicast (6.6 & above). To the Witness site: Layer 3. Connectivity to the other sites must be independent.
• Secondary Site: To the Preferred site: Layer 2 or Layer 3. Recommend L2 for Multicast (6.5 & below); Layer 2 or Layer 3 for Unicast (6.6 & above). To the Witness site: Layer 3. Connectivity to the other sites must be independent.
• Witness Site: To the Preferred site: Layer 3. To the Secondary site: Layer 3. Connectivity to each site must be independent.

Port Requirements
VMware vSAN requires these ports to be open, both inbound and outbound:

Service | Port | Protocol | Connectivity To/From
vSAN Clustering Service | 12345, 23451 | UDP | vSAN Hosts
vSAN Transport | 2233 | TCP | vSAN Hosts
vSAN VASA Vendor Provider | 8080 | TCP | vSAN Hosts and vCenter
vSAN Unicast Agent (to Witness Host) | 12321 | UDP | vSAN Hosts and vSAN Witness Appliance

TCPIP Stacks, Gateways, and Routing


• TCPIP Stacks
At this time, vSAN traffic does not have its own dedicated TCPIP stack. Custom TCPIP
stacks are also not applicable for vSAN traffic.

• Default Gateway on ESXi Hosts

ESXi hosts come with a default TCPIP stack. As a result, hosts have a single default
gateway. This default gateway is associated with the Management VMkernel interface
(typically vmk0). It is a best practice to implement storage networking, in this case vSAN
networking, on an alternate VMkernel interface with alternate addressing.

Because vSAN networking uses the same TCPIP stack as the Management VMkernel
interface, traffic defaults to using the same default gateway as the Management
VMkernel interface. With the vSAN network isolated from the Management VMkernel
interface, it is not possible to use the default gateway. Because of this, vSAN data nodes
cannot communicate with the Witness Host by default.

One solution to this issue is to use static routes. This allows an administrator to
define a new routing entry indicating which path should be followed to reach a
particular network; in this case, the vSAN network on a vSAN Stretched Cluster.

Static routes could be added as follows:

1. Hosts on the Preferred Site have a static route added so that requests to
reach the witness network on the Witness Site are routed out the vSAN
VMkernel interface.
2. Hosts on the Secondary Site have a static route added so that requests to
reach the witness network on the Witness Site are routed out the vSAN
VMkernel interface.
3. The Witness Host on the Witness Site has static routes added so that
requests to reach the Preferred Site and Secondary Site are routed out the
WitnessPg VMkernel interface.
4. If using Layer 3 between the Preferred and Secondary Sites, static routes
may be required to properly communicate across the inter-site link.
*Note, this may result in an alert (which may be disregarded provided
connectivity is verified) that the vSAN network does not have a matching
subnet.

Static routes are added via the esxcli network ip route or esxcfg-route commands.
Refer to the appropriate vSphere Command Line Guide for more information.
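For illustration only, the following sketch shows how a static route might be added on a data-node
host toward a witness vSAN network; the subnet 172.16.30.0/24 and gateway 172.16.10.1 are
placeholder values and must be replaced with the addresses used in your environment:

   # add a static route to the witness vSAN network using esxcli
   esxcli network ip route ipv4 add -n 172.16.30.0/24 -g 172.16.10.1

   # equivalent esxcfg-route syntax
   esxcfg-route -a 172.16.30.0/24 172.16.10.1

   # list the routing table to confirm the entry was added
   esxcli network ip route ipv4 list

A corresponding route back to each data site's vSAN subnet must be added on the vSAN Witness Host.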

Caution when implementing Static Routes: Using static routes requires administrator intervention.
Any new ESXi hosts that are added to the cluster at either site 1 or site 2 need to have static
routes manually added before they can successfully communicate with the witness and the other
data site. Any replacement of the witness host will also require the static routes to be updated to
facilitate communication to the data sites.

• Considerations for 2 Node vSAN

While 2 Node vSAN configurations are essentially a 1+1+1 vSAN Stretched Cluster, networking
considerations are a bit different because 2 Node vSAN configurations are typically installed in a
single location. Because each data node is in the same physical location, independent routing to
the Witness Host from each node is not required.

Traditional vSAN 2 Node Configurations

From the initial offering of 2 Node vSAN, the vSAN network must be able to reach the
Witness Host's VMkernel interface tagged for vSAN Traffic. The same static routing
requirements mentioned above apply to the vSAN VMkernel interface, with one
exception: with both hosts residing in the same location, the same gateway would be
used.


2 Node Direct Connect Configurations

Witness Traffic Separation (WTS) support was introduced in vSAN 6.5. Metadata traffic
destined for the Witness Host may be sent using an alternate VMkernel interface. This is
accomplished by tagging an alternate VMkernel interface with "Witness Traffic". By
redirecting metadata communication with the Witness Host, vSAN data traffic can flow
directly between data nodes. The ability to directly connect the data nodes for vSAN data
removes the requirement for a switch. Hosts with onboard 10GbE connectivity can be
very cost effective in a 2 Node Direct Connect configuration.

While 2 Node Direct Connect became officially supported with the release of vSAN 6.5,
the code shipped with some builds of vSAN 6.2.
As a result, 2 Node Direct Connect is also supported on vSAN 6.2, specifically with builds
starting with ESXi 6.0 Update 3. Note that vCenter must also be at version 6.0 Update 3
to use Direct Connect with 2 Node vSAN.
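
As a sketch of how Witness Traffic Separation is typically enabled from the command line (the
VMkernel interface name vmk1 is a placeholder and may differ in your environment), an alternate
interface can be tagged for witness traffic on each data node and the result verified as follows:

   # tag vmk1 for witness traffic on each data node
   esxcli vsan network ip add -i vmk1 -T=witness

   # confirm which VMkernel interfaces carry vsan and witness traffic
   esxcli vsan network list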

vSAN Witness Appliance Networking

When deploying 2 Node vSAN with Direct Connect, it may be desirable to use only the
Management interface (vmk0) of a vSAN Witness Appliance instead of the dedicated
WitnessPg VMkernel interface (vmk1). This is a supported configuration. It is not
recommended for 2 Node vSAN configurations that are not using the Direct Connect
feature (Witness Traffic Separation).

Additionally, the IP address of the Management VMkernel interface (vmk0) and the
WitnessPg VMkernel interface (vmk1) cannot reside on the same subnet. This is
because, as mentioned above, vSAN uses the default TCPIP stack, as does ESXi
management. If vmk0 (Management) and vmk1 (vSAN Traffic) are configured on the
same segment, vSAN traffic will flow through the Management interface (vmk0) rather
than the interface used for vSAN traffic (vmk1). This is not a supported configuration
and will result in vSAN Health Check errors. This is a multi-homing issue and is
addressed in KB article 2010877: https://kb.vmware.com/kb/2010877
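
To check for this multi-homing condition on the vSAN Witness Appliance, the interface addressing
and vSAN traffic tagging can be reviewed from the command line; this is only a quick verification
sketch:

   # show the IPv4 address and netmask of each VMkernel interface
   esxcli network ip interface ipv4 get

   # show which VMkernel interfaces are tagged for vSAN traffic
   esxcli vsan network list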

Topology - L2 Design Versus L3 Design

Consider a design where the vSAN Stretched Cluster is configured in one large L2 design as follows,
where the Preferred Site (Site 1) and Secondary Site (Site 2) are where the virtual machines are
deployed. The Witness site contains the Witness Host.


If the link between Switch 1 and Switch 2 is broken (the inter-site link between Site 1 and Site 2),
network traffic will now route from Site 1 to Site 2 via Site 3. Considering there is a much lower
bandwidth requirement for connectivity to the Witness Host, customers would see a decrease in
performance if network traffic is routed through a lower specification Site 3.

If there are situations where routing traffic between data sites through the witness site does not
impact latency of applications, and bandwidth is acceptable, a stretched L2 configuration between
sites is supported. However, in most cases, VMware believes that such a configuration is not feasible
for the majority of customers.

To avoid the situation outlined, and to ensure that data traffic is not routed through the Witness Site,
VMware recommends the following network topology:

• Between Site 1 and Site 2, implement either a stretched L2 (same subnet) or a L3 (routed)
configuration.
• Implement an L3 (routed) configuration between the data sites and the Witness site.
  ◦ Ensure that Sites 1 and 2 can only connect to Site 3 directly, and not through the
    alternate site.
  ◦ Static routing will be required from the data hosts (Site 1 & Site 2) to the Witness in Site 3.
    ◦ Hosts in Site 1 should never traverse the inter-site link to reach Site 3.
    ◦ Hosts in Site 2 should never traverse the inter-site link to reach Site 3.
  ◦ Static routing will be required from the Witness host (Site 3) to the data hosts (Site 1 &
    Site 2).
    ◦ The Witness should never route through Site 1, then across the inter-site link, to
      reach Site 2.
    ◦ The Witness should never route through Site 2, then across the inter-site link, to
      reach Site 1.
• In the event of a failure on either data site's network, this configuration will also prevent
any traffic from Site 1 being routed to Site 2 via Witness Site 3, and thus avoid any performance
degradation.


*If connectivity for the vSAN network is configured to use L3:

• Each host in Site 1 will require a static route for the vSAN VMkernel interface to route across
the inter-site link to each vSAN VMkernel interface for hosts in Site 2.
• Each host in Site 2 will require a static route for the vSAN VMkernel interface to route across
the inter-site link to each vSAN VMkernel interface for hosts in Site 1.

6.4 Config of Network from Data Sites to Witness

The next question is how to implement such a configuration, especially if the witness host is on a
public cloud. How can the interfaces on the hosts in the data sites, which communicate to each other
over the vSAN network, communicate with the witness host?

Option 1: Physical Witness on-premises connected over L3 & static routes

In this first configuration, the data sites are connected over a stretched L2 network. This is also true
for the data sites' management network, vSAN network, vMotion network and virtual machine network.
The physical network router in this network infrastructure does not automatically route traffic from the
hosts in the data sites (Site 1 and Site 2) to the host in Site 3. In order for the vSAN Stretched
Cluster to be successfully configured, all hosts in the cluster must communicate. How can a stretched
cluster be deployed in this environment?

The solution is to use static routes configured on the ESXi hosts so that the vSAN traffic from Site 1
and Site 2 is able to reach the witness host in Site 3, and vice versa. While this is not a preferred
configuration option, this setup can be very useful for a proof-of-concept design where there may be
some issues with getting the required network changes implemented at a customer site.

In the case of the ESXi hosts on the data sites, a static route must be added to the vSAN VMkernel
interface which will redirect traffic for the witness host on the witness site via a default gateway for
that network. In the case of the witness host, the vSAN interface must have a static route added which
redirects vSAN traffic destined for the data sites' hosts. Adding static routes is achieved using the
esxcfg-route -a command on the ESXi hosts. This will have to be repeated on all ESXi hosts in the
stretched cluster.
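
Once the routes are in place on each host, connectivity over the intended VMkernel interface can be
spot-checked before configuring the cluster; the interface name and witness address below are
placeholders for this sketch:

   # verify the vSAN VMkernel interface can reach the witness host's vSAN IP
   vmkping -I vmk1 172.16.30.10

   # confirm the static routes are present
   esxcli network ip route ipv4 list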

For this to work, the network switches need to be IP routing enabled between the vSAN network
VLANs, in this example VLANs 11 and 21. Once requests arrive for a remote host (either witness ->
data or data -> witness), the switch will route the packet appropriately. This communication is essential
for vSAN Stretched Cluster to work properly.

Note that we have not mentioned the ESXi management network here. The vCenter server will still be
required to manage both the ESXi hosts at the data sites and the ESXi witness. In many cases, this is
not an issue for customers. However, in the case of stretched clusters, it might be necessary to add a
static route from the vCenter server to reach the management network of the witness ESXi host if it is
not routable, and similarly a static route may need to be added to the ESXi witness management
network to reach the vCenter server. This is because the vCenter server will route all traffic via the
default gateway.

As long as there is direct connectivity from the witness host to vCenter (without NAT’ing), there should
be no additional concerns regarding the management network.

Also note that there is no need to configure a vMotion network or a VM network, or to add any static
routes for these networks, in the context of a vSAN Stretched Cluster. This is because there will never be
a migration or deployment of virtual machines to the vSAN Witness host. Its purpose is to maintain
witness objects only, and it does not require either of these networks for this task.

Option 2: Virtual Witness on-premises connected over L3 & static routes

Requirements: Since the virtual ESXi witness is a virtual machine that will be deployed on a physical
ESXi host when deployed on-premises, the underlying physical ESXi host will need to have a minimum
of one VM network preconfigured. This VM network will need to reach both the management network
and the vSAN network shared by the ESXi hosts on the data sites. An alternative option that might be
simpler to implement is to have two preconfigured VM networks on the underlying physical ESXi host,
one for the management network and one for the vSAN network. When the virtual ESXi witness is
deployed on this physical ESXi host, the network will need to be attached/configured accordingly.

Once the vSAN Witness Appliance has been successfully deployed, the static routes must be
configured.

As before, the data sites are connected over a stretched L2 network. This is also true for the data sites'
management network, vSAN network, vMotion network and virtual machine network. Once again, the
physical network router in this environment does not automatically route traffic from the hosts in the
Preferred and Secondary data sites to the host in the witness site. In order for the vSAN Stretched
Cluster to be successfully configured, all hosts in the cluster require static routes added so that the
vSAN traffic from the Preferred and Secondary sites is able to reach the Witness host in the witness
site, and vice versa. As mentioned before, this is not a preferred configuration option, but this setup
can be very useful for a proof-of-concept design where there may be some issues with getting the
required network changes implemented at a customer site.


Once again, the static routes are added using the esxcfg-route -a command on the ESXi hosts. This will
have to be repeated on all ESXi hosts in the cluster, both on the data sites and on the witness host.

The switches should be configured to have IP routing enabled between the vSAN network VLANs on
the data sites and the witness site, in this example VLANs 11 and 21. Once requests arrive for the
remote host (either witness -> data or data -> witness), the switch will route the packet appropriately.
With this setup, the vSAN Stretched Cluster will form.

Note that once again we have not mentioned the management network here. As mentioned before,
vCenter needs to manage the remote ESXi witness and the hosts on the data sites. If necessary, a
static route should be added to the vCenter server to reach the management network of the witness
ESXi host, and similarly a static route should be added to the ESXi witness to reach the vCenter server.

Also note that, as before, there is no need to configure a vMotion network or a VM network, or to add
any static routes for these networks, in the context of a vSAN Stretched Cluster. This is because there
will never be a migration or deployment of virtual machines to the vSAN witness. Its purpose is to
maintain witness objects only, and it does not require either of these networks for this task.

Option 3: 2 Node Configuration for Remote Office/Branch Office Deployment

In the use case of Remote Office/Branch Office (ROBO) deployments, it is common to have 2 Node
configurations at one or more remote offices. This deployment model can be very cost competitive
when running a limited number of virtual machines that no longer requires 3 nodes for vSAN.


VMware vSAN 2 Node configurations are vSAN Stretched Clusters comprised of two data nodes and
one witness node. This is a 1+1+1 Stretched Cluster configuration. Each data node behaves as a data
site, and the two nodes are typically in the same location. The witness VM could reside at the primary
data center or another location.

Management traffic for the data nodes is typically automatically routed to the vCenter server at the
central data center. Routing for the vSAN network, as shown in previous scenarios, will require static
routes between the vSAN interfaces on each data node and the witness VM running in the central data
center.

Because they reside in the same physical location, networking between data nodes is consistent with
that of a traditional vSAN cluster. Data nodes still require a static route to the vSAN Witness Host
residing in the central data center. The vSAN Witness Appliance's secondary interface, designated for
vSAN traffic, will also require a static route to each data node's vSAN traffic-enabled VMkernel
interface.

Adding static routes is achieved using the esxcfg-route -a command on the ESXi hosts and witness
VM.

In the illustration above, the central data center management network is on VLAN 10. For vCenter to
manage each of the 2 Node (ROBO) deployments, there must be a route to each host's management
network. This could be on an isolated management VLAN, but it is not required. Depending on the
network configuration, vCenter itself may require static routes to each of the remote ESXi hosts'
management VMkernel interfaces. All the normal requirements for vCenter to connect to ESXi hosts
should be satisfied.

The management VMkernel for the witness VM, in the central data center, can easily reside on the
same management VLAN in the central data center, not requiring any static routing.

The vSAN network in each site must also have routing to the respective witness VM vSAN interface.


Because the VMkernel interface with vSAN traffic enabled uses the same gateway, static routes will be
required to and from the data nodes to the vSAN Witness Appliance. Remember that a vSAN Witness
Appliance will never run any VM workloads, and therefore the only traffic requirements are for
management and vSAN witness traffic, because its purpose is to maintain witness objects only.

For remote site VMs to communicate with central data center VMs, appropriate routing for the VM
Network will also be required.

*Local Witness with 2 Node Configurations: A typical Option 3 configuration would include the
Witness being remote, but running the vSAN Witness Appliance on alternate infrastructure in the
same site is supported if so desired. In this configuration a Layer 2 network would likely be used,
which may not require any static routing. This vSAN Witness Appliance may NOT run on the 2 Node
cluster itself, but may run on alternate ESXi 5.5 or higher infrastructure.

Technical Note: Witness Appliance Placement

Witness Appliances may not be placed on an alternate Stretched Cluster.

The illustration shows a dual 2 Node configuration, Clusters A & B, each housing the Witness
Appliance for the alternate cluster.

This is not a recommended configuration but can be supported if an RPQ has been granted. Please
engage your VMware representative to inquire about this configuration.

6.5 Bandwidth Calculation

As stated in the requirements section, the bandwidth requirement between the two main sites is
dependent on workload, and in particular the number of write operations per ESXi host. Other factors,
such as read locality not being in operation (where the virtual machine resides on one site but reads
data from the other site) and rebuild traffic, may also need to be factored in.

Requirements Between Data Sites

Reads are not included in the calculation as we are assuming read locality, which means that there
should be no inter-site read traffic. The required bandwidth between the two data sites (B) is equal to
the write bandwidth (Wb) * data multiplier (md) * resynchronization multiplier (mr):

B = Wb * md * mr

The data multiplier is comprised of overhead for vSAN metadata traffic and miscellaneous related
operations. VMware recommends a data multiplier of 1.4. The resynchronization multiplier is included
to account for resynchronization events. It is recommended to allocate bandwidth capacity on top of
the required bandwidth capacity for resynchronization events. To make room for resynchronization
traffic, an additional 25% is recommended.


• Data Site to Data Site Example 1

Take a hypothetical example of a 6 node vSAN Stretched Cluster (3+3+1) with the following:

• A workload of 35,000 IOPS
• 10,000 of those being write IOPS
• A "typical" 4KB size write (this would require 40 MB/s, or 320 Mbps bandwidth)

Including the vSAN network requirements, the required bandwidth would be 560 Mbps.

B = 320 Mbps * 1.4 * 1.25 = 560 Mbps
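
For convenience, the arithmetic above can be reproduced with a small shell sketch. It assumes 4 KB
writes counted in decimal units (4,000 bytes, 1 Mbps = 10^6 bits), matching the figures used in the
examples in this section; the write_iops value is a placeholder:

   write_iops=10000
   awk -v iops="$write_iops" 'BEGIN {
     wb = iops * 4000 * 8 / 1000000;            # write bandwidth in Mbps
     printf "Inter-site bandwidth: %.0f Mbps\n", wb * 1.4 * 1.25
   }'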

• Data Site to Data Site Example 2

Take a 20 node vSAN Stretched Cluster (10+10+1) with a VDI (Virtual Desktop Infrastructure) workload
with the following:

• A workload of 100,000 IOPS
• With a typical 70%/30% distribution of writes to reads respectively, 70,000 of those are
writes. A "typical" 4KB size write (this would require 280 MB/s, or 2.24 Gbps bandwidth)

Including the vSAN network requirements, the required bandwidth would be approximately 4 Gbps.

B = 2,240 Mbps * 1.4 * 1.25 = 3,920 Mbps or 3.92 Gbps

Using the above formula, a vSAN Stretched Cluster with a dedicated 10 Gbps inter-site link can
accommodate approximately 170,000 4KB write IOPS. Customers will need to evaluate their I/O
requirements, but VMware believes that 10 Gbps will meet most design requirements.

Above this configuration, customers would need to consider multiple 10Gb NICs teamed, or a 40Gb
network.

While it might be possible to use 1Gbps connectivity for very small vSAN Stretched Cluster
implementations, the majority of implementations will require 10Gbps connectivity between sites.
Therefore, VMware recommends a minimum of 10Gbps network connectivity between sites for
optimal performance and for possible future expansion of the cluster.

Requirements when Read Locality is not Available

Note that the previous calculations are only for regular Stretched Cluster traffic with read locality. If
there is a device failure, read operations also have to traverse the inter-site network. This is because
the mirrored copy of data is on the alternate site when using NumberOfFailurestoTolerate=1.

The same equation for every 4K read IO of the objects in a degraded state would be added on top of
the above calculations. The expected read IO would be used to calculate the additional bandwidth
requirement.

In an example of a single failed disk, with objects from 5 VMs residing on the failed disk, and 10,000
(4KB) read IOPS, an additional 40 MB/s, or 320 Mbps, would be required, in addition to the above
Stretched Cluster requirements, to provide sufficient read IO bandwidth during peak write IO and
resync operations.

Requirements Between Data Sites and the Witness Site

Witness bandwidth isn't calculated in the same way as bandwidth between data sites.
Because hosts designated as a witness do not maintain any VM data, but rather only
component metadata, the requirements are much smaller.

Virtual machines on vSAN are comprised of multiple objects, which can potentially be
split into multiple components, depending on factors like policy and size. The number
of components on vSAN has a direct impact on the bandwidth requirement between
the data sites and the witness.
The required bandwidth between the Witness and each site is equal to ~1138 B x Number of
Components / 5s

1138 B x NumComp / 5 seconds

The 1138 B value comes from operations that occur when the Preferred Site goes offline and the
Secondary Site takes ownership of all of the components.

When the Preferred Site goes offline, the Secondary Site becomes the master. The Witness sends
updates to the new master, followed by the new master replying to the Witness as ownership is
updated.

The 1138 B requirement for each component comes from a combination of a payload from the
Witness to the backup agent, followed by metadata indicating that the Preferred Site has failed.

In the event of a Preferred Site failure, the link must be large enough to allow for the cluster ownership
to change, as well as ownership of all of the components, within 5 seconds.

Witness to Site Examples

Workload 1

With a VM being comprised of:

• 3 objects (VM namespace, vmdk (under 255GB), and vmSwap)
• Failures to Tolerate of 1 (FTT=1)
• Stripe Width of 1

Approximately 166 VMs with the above configuration would require the Witness to contain 996
components.

To successfully satisfy the Witness bandwidth requirements for a total of 1,000 components on vSAN,
the following calculation can be used:

Converting Bytes (B) to Bits (b), multiply by 8

B = 1138 B * 8 * 1,000 / 5s = 1,820,800 Bits per second = 1.82 Mbps

VMware recommends adding a 10% safety margin and rounding up.

B + 10% = 1.82 Mbps + 182 Kbps = 2.00 Mbps

With the 10% buffer included, a rule of thumb can be stated that for every 1,000 components, 2 Mbps
is appropriate.

Workload 2

With a VM being comprised of:

• 3 objects (VM namespace, vmdk (under 255GB), and vmSwap)
• Failures to Tolerate of 1 (FTT=1)
• Stripe Width of 2

Approximately 1,500 VMs with the above configuration would require 18,000 components to be
stored on the Witness.

To successfully satisfy the Witness bandwidth requirements for 18,000 components on vSAN, the
resulting calculation is:


B = 1138 B * 8 * 18,000 / 5s = 32,774,400 Bits per second = 32.78 Mbps

B + 10% = 32.78 Mbps + 3.28 Mbps = 36.05 Mbps

Using the general equation of 2 Mbps for every 1,000 components, (NumComp/1000) x 2 Mbps, it can
be seen that 18,000 components does in fact require 36 Mbps.
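
The witness link sizing above can likewise be scripted; this shell sketch applies the ~1138 B per
component / 5 second figure and the 10% margin used in the examples, with the component count as
a placeholder:

   components=18000
   awk -v n="$components" 'BEGIN {
     b = 1138 * 8 * n / 5 / 1000000;            # required Mbps before margin
     printf "Witness bandwidth: %.2f Mbps (with 10%% margin: %.2f Mbps)\n", b, b * 1.1
   }'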

6.6 The Role of vSAN Heartbeats

As mentioned previously, when vSAN is deployed in a stretched cluster configuration, the vSAN
master node is placed on the preferred site and the vSAN backup node is placed on the secondary site.
So long as there are nodes (ESXi hosts) available in the "Preferred" site, a master is always
selected from one of the nodes on this site. Similarly, the backup is always selected from the
"Secondary" site, so long as there are nodes available on the secondary site.

The vSAN master node and the vSAN backup node send heartbeats every second. If communication is
lost for 5 consecutive heartbeats (5 seconds) between the master and the backup due to an issue with
the backup node, the master chooses a different ESXi host as a backup on the remote site. This is
repeated until all hosts on the remote site are checked. If there is a complete site failure, the master
selects a backup node from the "Preferred" site.

A similar scenario arises when the master has a failure.

When a node rejoins an empty site after a complete site failure, either the master (in the case of the
node joining the primary site) or the backup (in the case where the node is joining the secondary site)
will migrate to that site.

If communication is lost for 5 consecutive heartbeats (5 seconds) between the master and the Witness,
the Witness is deemed to have failed. If the Witness has suffered a permanent failure, a new Witness
host can be configured and added to the cluster.


7. Cluster Settings – vSphere HA


Certain vSphere HA behaviors have been modified especially for vSAN.


7.1 Cluster Settings – vSphere HA

Certain vSphere HA behaviors have been modified especially for vSAN. vSphere HA checks the state of
the virtual machines on a per virtual machine basis. vSphere HA can make a decision on whether a
virtual machine should be failed over based on the number of components belonging to a virtual
machine that can be accessed from a particular partition.

When vSphere HA is configured on a vSAN Stretched Cluster, VMware recommends the following:

vSphere HA: Turn on
Host Monitoring: Enabled
Host Hardware Monitoring – VM Component Protection ("Protect against Storage Connectivity Loss"): Disabled (default)
Virtual Machine Monitoring: Customer Preference – Disabled by default
Admission Control: Set to 50%
Host Isolation Response: Power off and restart VMs
Datastore Heartbeats: "Use datastores only from the specified list", but do not select any datastores from the list. This disables Datastore Heartbeats.

Advanced Settings:
das.usedefaultisolationaddress: False
das.isolationaddress0: IP address on vSAN network on site 1
das.isolationaddress1: IP address on vSAN network on site 2

7.2 Turn on vSphere HA

To turn on vSphere HA, select the cluster object in the vCenter inventory, then Manage, then vSphere
HA. From here, vSphere HA can be turned on and off via a check box.


7.3 Host Monitoring

Host monitoring should be enabled on vSAN Stretched Cluster configurations. This feature uses network
heartbeats to determine the status of hosts participating in the cluster, and whether corrective action is
required, such as restarting virtual machines on other nodes in the cluster.

7.4 Admission Control

Admission control ensures that HA has sufficient resources available to restart virtual machines after a
failure. As a full site failure is one scenario that needs to be taken into account in a resilient
architecture, VMware recommends enabling vSphere HA Admission Control. Availability of workloads
is the primary driver for most stretched cluster environments. Sufficient capacity must therefore be
available for a full site failure. Since ESXi hosts will be equally divided across both sites in a vSAN
Stretched Cluster, and to ensure that all workloads can be restarted by vSphere HA, VMware
recommends configuring the admission control policy to 50 percent for both memory and CPU.

VMware recommends using the percentage-based policy as it offers the most flexibility and reduces
operational overhead. For more details about admission control policies and the associated algorithms,
refer to the vSphere 6.5 Availability Guide.

The following screenshot shows a vSphere HA cluster configured with admission control enabled using
the percentage-based admission control policy set to 50%.


It should be noted that vSAN is not admission-control aware. There is no way to inform vSAN to set
aside additional storage resources to accommodate fully compliant virtual machines running on a
single site. This is an additional operational step for administrators if they wish to achieve such a
configuration in the event of a failure.

7.5 Host Hardware Monitoring – VM Component Protection

vSphere 6.0 introduced a new enhancement to vSphere HA called VM Component Protection (VMCP)
to allow for an automated fail-over of virtual machines residing on a datastore that has either an “All
Paths Down” (APD) or a “Permanent Device Loss” (PDL) condition.

A PDL, or permanent device loss, is a condition that is communicated by the storage controller
to the ESXi host via a SCSI sense code. This condition indicates that a disk device has become
unavailable and is likely permanently unavailable. When it is not possible for the storage controller to
communicate back the status to the ESXi host, the condition is treated as an "All Paths Down"
(APD) condition.

In traditional datastores, APD/PDL on a datastore affects all the virtual machines using that datastore.
However, for vSAN this may not be the case. An APD/PDL may only affect one or a few VMs, but not all
VMs on the vSAN datastore. Also, in the event of an APD/PDL occurring on a subset of hosts, there is
no guarantee that the remaining hosts will have access to all the virtual machine objects and be able
to restart the virtual machine. Therefore, a partition may result in such a way that the virtual machine is
not accessible on any partition.

Note that the VM Component Protection (VMCP) way of handling a failover is to terminate the running
virtual machine and restart it elsewhere in the cluster. VMCP/HA cannot determine the cluster-wide
accessibility of a virtual machine on vSAN, and thus cannot guarantee that the virtual machine will be
able to restart elsewhere after termination. For example, there may be resources available to restart
the virtual machine, but accessibility to the virtual machine by the remaining hosts in the cluster is not
known to HA. For traditional datastores, this is not a problem, since we know host-datastore
accessibility for the entire cluster, and by using that, we can determine if a virtual machine can be
restarted on a host or not.

At the moment, it is not possible for vSphere HA to understand the complete inaccessibility vs. partial
inaccessibility on a per virtual machine basis on vSAN; hence the lack of VMCP support by HA for
vSAN.

VMware recommends leaving VM Component Protection (VMCP) disabled.


7.6 Datastore for Heartbeating

vSphere HA provides an additional heartbeating mechanism for determining the state of hosts in the
cluster. This is in addition to network heartbeating, and is called datastore heartbeating. In many vSAN
environments no additional datastores, outside of vSAN, are available, and as such VMware generally
recommends disabling Heartbeat Datastores, as the vSAN Datastore cannot be used for heartbeating.
However, if additional datastores are available, then using heartbeat datastores is fully supported.

What do Heartbeat Datastores do, and when do they come into play? The heartbeat datastore is used
by a host which is isolated to inform the rest of the cluster what its state is and what the state of the
VMs is. When a host is isolated, and the isolation response is configured to "power off" or "shutdown",
the heartbeat datastore will be used to inform the rest of the cluster when VMs are powered off
(or shut down) as a result of the isolation. This allows the vSphere HA master to immediately restart the
impacted VMs.

To disable datastore heartbeating, under vSphere HA settings, open the Datastore for Heartbeating
section. Select the option "Use datastores only from the specified list", and ensure that there are no
datastores selected in the list, if any exist. Datastore heartbeats are now disabled on the cluster. Note
that this may give rise to a notification in the summary tab of the host, stating that the number of
vSphere HA heartbeat datastores for this host is 0, which is less than required: 2. This message may be
removed by following KB Article 2004739, which details how to add the advanced setting
das.ignoreInsufficientHbDatastore = true.

7.7 Virtual Machine Response for Host Isolation

This setting determines what happens to the virtual machines on an isolated host, i.e. a host that can
no longer communicate with other nodes in the cluster, nor reach the isolation response IP
address. VMware recommends that the Response for Host Isolation is to Power off and restart VMs.
The reason for this is that a clean shutdown will not be possible, as on an isolated host the access to
the vSAN Datastore, and with it the ability to write to disk, is lost.


Note that a 2 Node direct connect configuration is a special case. In this situation it is impossible to
configure a valid external isolation address within the vSAN network. VMware recommends disabling
the isolation response for a 2 Node direct connect configuration. If, however, a host becomes isolated,
vSAN has the ability to kill VMs which are "ghosted" (have no access to any of their components). This
will allow vSphere HA to safely restart the impacted VMs without the need to use the Isolation
Response.

7.8 Advanced Options

When vSphere HA is enabled on a vSAN Cluster, it uses a heartbeat mechanism to validate the state of
an ESXi host. Network heartbeating is the primary mechanism for HA to validate availability of the
hosts.

If a host is not receiving any heartbeats, it uses a fail-safe mechanism to detect if it is merely isolated
from its HA master node or completely isolated from the network. It does this by pinging the default
gateway.

In vSAN environments, vSphere HA uses the vSAN traffic network for communication. This is different
to traditional vSphere environments where the management network is used for vSphere HA
communication. However, even in vSAN environments, vSphere HA continues to use the default
gateway on the management network for isolation detection responses. This should be changed so
that the isolation response IP address is on the vSAN network, as this allows HA to react to a vSAN
network failure.

In addition to selecting an isolation response address on the vSAN network, additional isolation
addresses can be specified manually to enhance the reliability of isolation validation.

Network Isolation Response and Multiple Isolation Response Addresses

In a vSAN Stretched Cluster, one of the isolation addresses should reside in the site 1 data center and
the other should reside in the site 2 data center. This would enable vSphere HA to validate a host
isolation even in the case of a partition scenario (network failure between sites).

VMware recommends enabling host isolation response and specifying isolation response addresses
that are on the vSAN network rather than the management network. The vSphere HA advanced setting
das.usedefaultisolationaddress should be set to false.

VMware recommends specifying two additional isolation response addresses, and each of these
addresses should be site specific. In other words, select an isolation response IP address from the
preferred vSAN Stretched Cluster site and another isolation response IP address from the
secondary vSAN Stretched Cluster site.

The vSphere HA advanced setting used for setting the first isolation response IP address is
das.isolationaddress0 and it should be set to an IP address on the vSAN network which resides on
the first site.

The vSphere HA advanced setting used for adding a second isolation response IP address is
das.isolationaddress1 and this should be an IP address on the vSAN network that resides on the
second site.

For further details on how to configure this setting, information can be found in KB Article 1002117.

*When using vSAN 2 Node clusters in the same location, there is no need to have a separate
das.isolationaddress for each of the hosts.


8. Cluster Settings - DRS


vSphere DRS is used in many environments to distribute load within a cluster.


8.1 Cluster Settings - DRS

vSphere DRS is used in many environments to distribute load within a cluster. vSphere DRS offers
many other features which can be very helpful in stretched environments.

If administrators wish to enable DRS on a vSAN Stretched Cluster, there is a requirement to have a
vSphere Enterprise license edition or higher.

There is also a requirement to create VM to Host affinity rules mapping VM to Host groups. These
specify which virtual machines and hosts reside in the preferred site and which reside in the secondary
site. Using Host/VM groups and rules, it becomes easy for administrators to manage which virtual
machines should run on which site, and to balance workloads between sites. In the next section, Host/VM
groups and rules are discussed. Note that if DRS is not enabled on the cluster, then VM to Host affinity
"should" rules are not honored. These soft (should) rules are DRS centric and are honored/rectified/
warned only when DRS is enabled on the cluster.

Another consideration is that without DRS, there will be considerable management overhead for
administrators, as they will have to initially place virtual machines on the correct hosts in order for
them to power up without violating the host affinity rules. If the virtual machine is initially placed on
the incorrect host, administrators will need to manually migrate it to the correct site before it
can be powered on.

Another consideration is related to full site failures. On a site failure, vSphere HA will restart all virtual
machines on the remaining site. When the failed site recovers, administrators will have to identify the
virtual machines that should reside on the recovered site, and manually move each of those virtual
machines back to the recovered site. DRS, with affinity rules, can make this operation easier.

With vSphere DRS enabled on the cluster, the virtual machines can simply be deployed to the cluster,
and then, once the virtual machine is powered on, DRS will move the virtual machines to the correct hosts
to conform to the Host/VM groups and rules settings. Determining which virtual machines should be
migrated back to a recovered site is also easier with DRS.

Another area where DRS can help administrators is by automatically migrating virtual machines back to
the correct site when a failed site recovers. DRS, and VM/Host affinity rules, can make this happen
automatically without administrator intervention. VMware recommends enabling vSphere DRS on vSAN
Stretched Clusters where the vSphere edition allows it.

When using Site Affinity for data placement, provided by vSAN 6.6 Per-Site Policy Rules, it is
important to align VM/Host Group Rules with Site Affinity VM Storage Policy Rules. More information
can be found in the Per-Site Policy Considerations section.

8.2 Partially Automated or Fully Automated DRS

Customers can decide whether to place DRS in partially automated mode or fully automated mode.
With partially automated mode, DRS will handle the initial placement of virtual machines. However,
any further migration recommendations will be surfaced up to the administrator to decide whether or
not to move the virtual machine. The administrator can check the recommendation, and may decide
not to migrate the virtual machine. Recommendations should be for hosts on the same site.

With fully automated mode, DRS will take care of the initial placement and on-going load balancing of
virtual machines. DRS should still adhere to the Host/VM groups and rules, and should never balance
virtual machines across different sites. This is important as virtual machines on a vSAN Stretched Cluster
will use read locality, which implies that they will cache locally. If the virtual machine is migrated by
DRS to the other site, the cache will need to be warmed on the remote site before the virtual machine
reaches its previous levels of performance.

One significant consideration with fully automated mode is a site failure. Consider a situation where a
site has failed, and all virtual machines are now running on a single site. All virtual machines on the
running site have read locality with the running site, and are caching their data on the running site.
Perhaps the outage has been a couple of hours, or even a day. Now the issue at the failed site has been
addressed (e.g. power, network, etc.). When the hosts on the recovered site rejoin the vSAN cluster, there
has to be a resync of all components from the running site to the recovered site. This may take some
time. However, at the same time, DRS is informed that the hosts are now back in the cluster. If in fully
automated mode, the affinity rules are checked, and obviously a lot of them are not compliant.
Therefore DRS begins to move virtual machines back to the recovered site, but the components may
not yet be active (i.e. still synchronizing). Therefore virtual machines could end up on the recovered
site, but since there is no local copy of the data, I/O from these virtual machines will have to traverse
the link between sites to the active data copy. This is undesirable due to latency/performance issues.
Therefore, for this reason, VMware recommends that DRS is placed in partially automated mode if
there is an outage. Customers will continue to be informed about DRS recommendations when the
hosts on the recovered site are online, but can now wait until vSAN has fully resynced the virtual
machine components. DRS can then be changed back to fully automated mode, which will allow virtual
machine migrations to take place to conform to the VM/Host affinity rules.


9. VM/Host Groups & Rules


VMware recommends enabling vSphere DRS to allow for the creation of Host-VM affinity rules.


9.1 VM/Host Groups & Rules

VMware recommends enabling vSphere DRS to allow for the creation of Host-VM affinity rules to do
initial placement of VMs and to avoid unnecessary vMotion of VMs between sites, which would impact
read locality. Because the stretched cluster is still a single cluster, DRS is unaware of the fact that it is made
up of different sites and it may decide to move virtual machines between them. The use of VM/Host
Groups will allow administrators to "pin" virtual machines to sites, preventing unnecessary vMotions/
migrations. If virtual machines are allowed to move freely across sites, a VM may end up on the remote
site. Since vSAN Stretched Cluster implements read locality, the cache on the remote site will be cold.
This will impact performance until the cache on the remote site has been warmed.

Note that vSAN Stretched Cluster has its own notion of a Preferred site. This is set up at the
configuration point, and refers to which site takes over in the event of a split-brain. It has no bearing
on virtual machine placement. It is used for the case where there is a partition between the two data
sites *and* the witness agent can talk to both sites. In that case, the witness agent needs to decide
which side's cluster it will stick with. It does so with what has been specified as the "Preferred" site.

9.2 Host Groups

When configuring DRS with a vSAN Stretched Cluster, VMware recommends creating two host groups
for use with VM-Host affinity rules. An administrator could give these host groups the names of
"Preferred" and "Secondary" to match the nomenclature used by vSAN. The hosts from site 1 should be
placed in the preferred host group, and the hosts from site 2 should be placed in the secondary host group.

9.3 VM Groups

Two VM groups should also be created; one to hold the virtual machines placed on site 1 and the other
to hold the virtual machines placed on site 2. Whenever a virtual machine is created, and before it is
powered on, assuming a NumberOfFailuresToTolerate policy setting of 1, the virtual machine should be
added to the correct VM group. This will then ensure that a virtual machine always remains on the
same site, reading from the same replica, unless a site-critical event occurs necessitating the VM being
failed over to the secondary site.

Note that to correctly use VM groups, first of all create the VM, but do not power it on. Next, edit the VM
groups and add the new VM to the desired group. Once added and saved, the virtual machine can
now be powered on. With DRS enabled on the cluster, the virtual machine will be checked to see if it is
on the correct site according to the VM/Host Rules (discussed next) and, if not, it is automatically
migrated to the appropriate site, either "preferred" or "secondary".


9.4 VM/Host Rules

When deploying virtual machines on a vSAN Stretched Cluster, for the majority of cases, we wish the
virtual machine to reside on the set of hosts in the selected host group. However, in the event of a full
site failure, we wish the virtual machines to be restarted on the surviving site.

To achieve this, VMware recommends implementing "should respect rules" in the VM/Host Rules
configuration section. These rules may be violated by vSphere HA in the case of a full site outage. If
"must rules" were implemented, vSphere HA would not violate the rule-set, and this could potentially
lead to service outages. vSphere HA would not restart the virtual machines in this case, as they would not
have the required affinity to start on the hosts in the other site. Thus, the recommendation to
implement "should rules" will allow vSphere HA to restart the virtual machines in the other site.

The vSphere HA Rule Settings are found in the VM/Host Rules section. This allows administrators to
decide which virtual machines (that are part of a VM Group) are allowed to run on which hosts (that are
part of a Host Group). It also allows an administrator to decide on how strictly "VM to Host affinity
rules" are enforced.

As stated above, the VM to Host affinity rules should be set to "should respect" to allow the virtual
machines on one site to be started on the hosts on the other site in the event of a complete site failure.
The "should rules" are implemented by clicking the Edit button in the vSphere HA Rule Settings at the
bottom of the VM/Host Rules view, and setting VM to Host affinity rules to "vSphere HA should
respect rules during failover".

vSphere DRS communicates these rules to vSphere HA, and these are stored in a "compatibility list"
governing allowed startup behavior. Note once again that with a full site failure, vSphere HA will be
able to restart the virtual machines on hosts that violate the rules. Availability takes preference in this
scenario.

9.5 Per-Site Policy Rule Considerations

With the introduction of Per-Site Policy Rules, VM/Host Group Rules are more important than ever to
consider.

Misconfiguration


It is entirely possible to have a VM Storage Policy using the Affinity rule placing data in one site,
with a VM/Host Group Rule placing the VM in the alternate site.

Here is an illustration of such a configuration:


On a local network this may not be a significant issue. In a Stretched Cluster configuration, with sites
spread across large geographical distances, this is considered a misconfiguration. This is because
when the VM does not run in the same site as its data, reads and writes must traverse the inter-site link.

• Additional unnecessary bandwidth is consumed to operate the VM running on a site that is
opposite to the site the data is stored on.
• This bandwidth should stay within the same site to ensure lower bandwidth utilization.
• In the situation where the alternate site is disconnected, the VM will no longer have access to its
vmdk and will essentially become a zombie VM.

Proper Configuration
A proper configuration includes VM/Host Group Rules that properly align with the Affinity Rules
assigned to VMs by their corresponding VM Storage Policies.


Setting proper VM/Host Group Rules and VM Storage Policy Affinity Rules is beneficial for several
reasons:

• Bandwidth is not unnecessarily sent across the inter-site link
• Lower inter-site bandwidth utilization
• In the situation where the alternate site is disconnected, the VM will continue to have access to
its vmdk.

Summary
It is important to ensure proper rules are in place to maintain a properly utilized configuration. VMware
recommends VM/Host Group rules along with Affinity rules for VM Storage Policies for VMs that are
only being stored on one of the two data sites in a vSAN 6.6 Stretched Cluster.


10. Installation
The installation of vSAN Stretched Cluster is similar to the configuration of Fault Domains.


10.1 Installation

The installation of vSAN Stretched Cluster is almost identical to how Fault Domains were implemented
in earlier vSAN versions, with a couple of additional steps. This part of the guide will walk the reader
through a stretched cluster configuration.

10.2 Before You Start

Before delving into the installation of a vSAN Stretched Cluster, there are a number of important
features to highlight that are specic to stretch cluster environments.

What is a Preferred Domain/Preferred Site?

Preferred domain/preferred site is simply a directive for vSAN. The “Preferred” site is the site that
vSAN wishes to remain running when there is a failure and the sites can no longer communicate. One
might say that the “Preferred" site is the site expected to have the most reliability.

Since virtual machines can run on either of the two sites, if network connectivity is lost between site 1 and site 2, but both still have connectivity to the Witness, the preferred site is the one that survives and its components remain active, while the storage on the non-preferred site is marked as down and components on that site are marked as absent.

What is Read Locality?

Since virtual machines deployed on vSAN Stretched Cluster will have compute on one
site, but a copy of the data on both sites, vSAN will use a read locality algorithm to read
100% from the data copy on the local site, i.e. same site where the compute resides.
This is not the regular vSAN algorithm, which reads in a round-robin fashion across all
replica copies of the data.

This new algorithm for vSAN Stretched Clusters will reduce the latency incurred on read
operations.

If latency is less than 5ms and there is enough bandwidth between the sites, read
locality could be disabled. However please note that disabling read locality means that
the read algorithm reverts to the round robin mechanism, and for vSAN Stretched
Clusters, 50% of the read requests will be sent to the remote site. This is a significant
consideration for sizing of the network bandwidth. Please refer to the sizing of the
network bandwidth between the two main sites for more details.

The advanced parameter VSAN.DOMOwnerForceWarmCache can be enabled or disabled to change the behavior of read locality. This advanced parameter is hidden and is not visible under Advanced System Settings in the vSphere web client. It is only available via the CLI.

Read locality is enabled by default when vSAN Stretched Cluster is configured. It should only be disabled under the guidance of VMware's Global Support Services organization, and only when extremely low latency is available across all sites.
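For reference, the current value of this setting can be queried from the ESXi shell of each data host. The following is a minimal sketch; the setting should only be changed when advised by VMware Global Support Services:

# Query the current read locality behavior setting (per host)
esxcfg-advcfg -g /VSAN/DOMOwnerForceWarmCache

# Change the value only under GSS guidance (example sets it to 1)
esxcfg-advcfg -s 1 /VSAN/DOMOwnerForceWarmCache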

Witness Host Must not be Part of the vSAN Cluster

When configuring your vSAN stretched cluster, only data hosts must be in the cluster object in vCenter. The witness host must remain outside of the cluster, and must not be added to the cluster at any point. Thus for a 1+1+1 configuration, where there is one host at each site and one physical ESXi witness host, the configuration will look similar to the following:


Note that the witness host is not shaded in blue in this case. The witness host only appears shaded in
blue when a vSAN Witness Appliance is deployed. Physical hosts that are used as witness hosts are not
shaded in blue.

10.3 vSAN Health Check Plugin for Stretched Clusters

VMware vSAN 6.1, shipped with vSphere 6.0U1, has a health check feature built in. This functionality was first available for vSAN 6.0. The updated 6.1 version of the health check for vSAN has enhancements specifically for vSAN stretched cluster.

Once the ESXi hosts have been upgraded or installed with ESXi version 6.0U1, there are no additional
requirements for enabling the vSAN health check. Note that ESXi version 6.0U1 is a requirement for
vSAN Stretched Cluster.

Similarly, once the vCenter Server has been upgraded to version 6.0U1, the vSAN Health Check plugin
components are also upgraded automatically, provided vSphere DRS is licensed, and DRS Automation
is set to Fully Automated. If vSphere DRS is not licensed, or not set to Fully Automated, then hosts will
have to be evacuated and the Health Check vSphere Installable Bundle (vib) will have to be installed
manually.

Please refer to the 6.1 Health Check Guide for additional information. The location is available in the appendix of this guide.

New vSAN Health Checks for Stretched Cluster Configurations

As mentioned, there are new health checks for vSAN Stretched Cluster. Select the Cluster object in the
vCenter inventory, click on Monitor > vSAN > Health. Ensure the stretched cluster health checks pass
when the cluster is congured.

Note: The stretched cluster checks will not be visible until the stretched cluster configuration is completed.


11. Using a vSAN Witness Appliance


vSAN Stretched Cluster supports the use of a vSAN Witness Appliance.


11.1 Using a vSAN Witness Appliance

VMware vSAN Stretched Cluster supports the use of a vSAN Witness Appliance as the Witness host.
This is available as an OVA (Open Virtual Appliance) from VMware. However, this vSAN Witness Appliance needs to reside on a physical ESXi host, which requires some special networking configuration.

Networking
The vSAN Witness Appliance contains two network adapters that are connected to separate vSphere
Standard Switches (VSS).

The vSAN Witness Appliance Management VMkernel is attached to one VSS, and the WitnessPG is
attached to the other VSS. The Management VMkernel (vmk0) is used to communicate with the
vCenter Server for appliance management. The WitnessPG VMkernel interface (vmk1) is used to
communicate with the vSAN Network. This is the recommended conguration. These network
adapters can be connected to different, or the same, networks, provided they have connectivity to
their appropriate services.

The Management VMkernel interface could be tagged to include vSAN Network traffic as well as Management traffic. In this case, vmk0 would require connectivity to both vCenter Server and the
vSAN Network.

A Note About Promiscuous Mode

In many nested ESXi environments, there is a recommendation to enable promiscuous mode to allow
all Ethernet frames to pass to all VMs that are attached to the port group, even if it is not intended for
that particular VM. The reason promiscuous mode is enabled in these environments is to prevent a
virtual switch from dropping packets for (nested) vmnics that it does not know about on nested ESXi
hosts. Nested ESXi deployments are not supported by VMware other than the vSAN Witness
Appliance.

The vSAN Witness Appliance is essentially a nested ESXi installation tailored for use with vSAN
Stretched Clusters and 2 Node vSAN clusters.

The MAC addresses of the VMkernel interfaces vmk0 & vmk1 are configured to match the MAC
addresses of the vSAN Witness Appliance host's NICs, vmnic0 and vmnic1. Because of this, packets
destined for either the Management VMkernel interface (vmk0) or the WitnessPG VMkernel interface,
are not dropped.


Because of this, promiscuous mode is not required when using a vSAN Witness Appliance.

11.2 Setup Step 1: Deploy the vSAN Witness Appliance

The vSAN Witness Appliance must be deployed on different infrastructure than the Stretched Cluster itself. This step will cover deploying the vSAN Witness Appliance to a different cluster.

The first step is to download and deploy the vSAN Witness Appliance, or deploy it directly via a URL, as
shown below. In this example it has been downloaded:


Select a Datacenter for the vSAN Witness Appliance to be deployed to and provide a name
(Witness-01 or something similar).

Select a cluster for the vSAN Witness Appliance to reside on.


Review the details of the deployment and press next to proceed.

The license must be accepted to proceed.


At this point a decision needs to be made regarding the expected size of the Stretched Cluster configuration. There are three options offered. If you expect the number of VMs deployed on the vSAN Stretched Cluster to be 10 or fewer, select the Tiny configuration. If you expect to deploy more than 10 VMs, but fewer than 500 VMs, then the Normal (default) option should be chosen. For more than 500 VMs, choose the Large option. On selecting a particular configuration, the resources consumed by the appliance (CPU, Memory and Disk) are displayed in the wizard:

Select a datastore for the vSAN Witness Appliance. This will be one of the datastores available to the underlying physical host. Consider whether the vSAN Witness Appliance is deployed as thick or thin; thin VMs may grow over time, so ensure there is enough capacity on the selected datastore.
Remember that the vSAN Witness Appliance is not supported on vSAN Stretched Cluster datastores.

Select a network for the Management Network and for the Witness Network.

Give a root password for the vSAN Witness Appliance:


At this point, the vSAN Witness Appliance is ready to be deployed. It will need to be powered on
manually via the vSphere web client UI later:

Once the vSAN Witness Appliance is deployed and powered on, select it in the vSphere web client UI
and begin the next steps in the conguration process.

11.3 Setup Step 2: vSAN Witness Appliance Management


Once the vSAN Witness Appliance has been deployed, select it in the vSphere web client UI and open the console.

The console of the vSAN Witness Appliance should be accessed to add the correct networking information, such as the IP address and DNS, for the management network.

On launching the console, unless you have a DHCP server on the management network, it is very likely
that the landing page of the DCUI will look something similar to the following:

Use the <F2> key to customize the system. The root login and password will need to be provided at
this point. This is the root password that was added during the OVA deployment earlier.

Select the Network Adapters view. There will be two network adapters, each corresponding to the
network adapters on the virtual machine. You should note that the MAC addresses of the network adapters from the DCUI view match the MAC addresses of the network adapters from the virtual machine view. Because these match, there is no need to use promiscuous mode on the network, as
discussed earlier.

Select vmnic0, and if you wish to view further information, select the key <D> to see more details.


Navigate to the IPv4 Configuration section. This will be using DHCP by default. Select the static option
as shown below and add the appropriate IP address, subnet mask and default gateway for this vSAN
Witness Appliance Management Network.

The next step is to congure DNS. A primary DNS server should be added and an optional alternate
DNS server can also be added. The FQDN, fully qualied domain name, of the host should also be
added at this point.


One final recommendation is to do a test of the management network. One can also try adding the IP address of the vCenter server at this point just to make sure that it is also reachable.

When all the tests have passed, and the FQDN is resolvable, administrators can move on to the next step of the configuration, which is adding the vSAN Witness Appliance ESXi instance to the vCenter server.

11.4 Setup Step 3: Add Witness to vCenter Server

There is no difference in adding the vSAN Witness Appliance ESXi instance to vCenter server when compared to adding physical ESXi hosts. However, there are some interesting items to highlight during the process. The first step is to provide the name of the Witness. In this example, vCenter server is managing multiple data centers, so we are adding the host to the witness data center.


Provide the appropriate credentials. In this example, the root user and password.

Acknowledge the certicate warning:


There should be no virtual machines on the vSAN Witness Appliance. Note: It can never run VMs in a vSAN Stretched Cluster configuration. Note also the model: VMware Virtual Platform, and that the build number may differ from the one shown here.

The vSAN Witness Appliance also comes with its own license. You do not need to consume vSphere
licenses for the witness appliance:


Lockdown mode is disabled by default. Depending on the policies in use at a customer's site, the administrator may choose a different mode from the default:

Click Finish when ready to complete the addition of the Witness to the vCenter server:


One final item of note is the appearance of the vSAN Witness Appliance ESXi instance in the vCenter inventory. It has a light blue shading, to differentiate it from standard ESXi hosts. It might be a little difficult to see in the screenshot below, but should be clearly visible in your infrastructure. (Note: In vSAN 6.1 and 6.2 deployments, the "No datastores have been configured" message is because the nested ESXi host has no VMFS datastore. This can be ignored.)


One final recommendation is to verify that the settings of the vSAN Witness Appliance match the Tiny, Normal or Large configuration selected during deployment. For example, the Normal deployment should have a 12GB HDD for boot in vSAN 6.5 (8GB for vSAN 6.1/6.2), a 10GB Flash device that will be configured later on as a cache device, and another 350GB HDD that will also be configured later on as a capacity device.

Once confirmed, you can proceed to the next step of configuring the vSAN network for the vSAN Witness Appliance.

11.5 Setup Step 4: Config vSAN Witness Host Networking

The next step is to configure the vSAN network correctly on the vSAN Witness Appliance. When the Witness is selected, navigate to Configure > Networking > Virtual switches as shown below.


The Witness has a portgroup pre-defined called witnessPg. Here the VMkernel port to be used for vSAN traffic is visible. If there is no DHCP server on the vSAN network (which is likely), then the VMkernel adapter will not have a valid IP address. Select VMkernel adapters > vmk1 to view the properties of the witnessPg. Validate that "vSAN" is an enabled service as depicted below.


* Engineering note: A few things to consider when configuring vSAN traffic on the vSAN Witness Appliance.

• The default configuration has vmk0 configured for Management traffic and vmk1 configured for vSAN traffic.
• The vmk1 interface cannot be configured with an IP address on the same range as that of vmk0. This is because Management traffic and vSAN traffic use the default TCP/IP stack. If both vmk0 and vmk1 are configured on the same range, a multihoming condition will occur and vSAN traffic will flow from vmk0, rather than vmk1. Health Check reporting will fail because vmk0 does not have vSAN enabled. The multihoming issue is detailed in KB 2010877 (https://kb.vmware.com/kb/2010877).
• In the case of 2 Node vSAN, if it is desired to have vSAN traffic on the same subnet as vmk0 (or simply to use a single interface for simplicity), it is recommended to disable vSAN services on vmk1 (WitnessPg) and enable vSAN services on vmk0 (Management). This is a perfectly valid and supported configuration; a command-line sketch is shown below.
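Although this change is normally made in the UI, the vSAN traffic tag can also be moved between VMkernel interfaces from the ESXi shell of the vSAN Witness Appliance. This is a minimal sketch using the esxcli vsan network namespace; the interface names follow the defaults described above:

# List which VMkernel interfaces are currently tagged for vSAN traffic
esxcli vsan network list

# 2 Node example: untag vmk1 (witnessPg) and tag vmk0 (Management) for vSAN traffic
esxcli vsan network remove -i vmk1
esxcli vsan network ip add -i vmk0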

Configure the network address

Select the witnessPg and edit the properties by selecting the pencil icon.


If vSAN is not an enabled service, select the witnessPg portgroup, and then select the option to edit it.
Tag the VMkernel port for vSAN traffic, as shown below:

Next, ensure the MTU is set to the same value as the Stretched Cluster hosts’ vSAN VMkernel
interface.


In the IPv4 settings, a default IP address has been allocated. Modify it for the vSAN traffic network.

Static routes are still required by the witnessPg VMkernel interface (vmk1) as in vSAN 6.1 or 6.2. The "Override default gateway for this adapter" setting is not supported for the witness VMkernel interface (vmk1).

Once the witnessPg VMkernel interface address has been configured, click OK.
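If preferred, the same address assignment can be made from the ESXi shell of the vSAN Witness Appliance instead of the UI. A minimal sketch, using the illustrative address and a /24 netmask from this example:

# Set a static IPv4 address on the witnessPg VMkernel interface (vmk1)
esxcli network ip interface ipv4 set -i vmk1 -I 147.80.0.15 -N 255.255.255.0 -t static

# Confirm the new address
esxcli network ip interface ipv4 get -i vmk1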

11.6 Setup Step 5: Validate Networking


The final step before a vSAN Stretched Cluster can be configured is to ensure there is connectivity among the hosts in each site and the Witness host. It is important to verify connectivity before attempting to configure vSAN Stretched Clusters.

Default Gateways and Static Routes


By default, traffic destined for the vSAN Witness host has no route to the vSAN networks from the data hosts. As a result, pings to the remote vSAN networks fail.

When using vSAN 6.1, 6.2, or 6.5, administrators must implement static routes for the vSAN VMkernel
interfaces to the Witness Appliance vSAN VMkernel interface and vice versa.

Static routes, as highlighted previously, tell the TCP/IP stack to use a different path to reach a particular network. Now we can tell the TCP/IP stack on the data hosts to use a different network path (instead of the default gateway) to reach the vSAN network on the witness host. Similarly, we can tell the witness host to use an alternate path to reach the vSAN network on the data hosts rather than via the default gateway.

Note once again that the vSAN network is a stretched L2 broadcast domain between the data sites as
per VMware recommendations, but L3 is required to reach the vSAN network of the witness appliance.
Therefore, static routes are needed between the data hosts and the witness host for the vSAN
network, but they are not required for the data hosts on different sites to communicate to each other
over the vSAN network.

Hosts in Site A
Looking at host esxi01-sitea.rainpole.com, initially there is no route from the vSAN VMkernel interface
(vmk1) to the vSAN Witness Appliance vSAN VMkernel interface with the address of 147.80.0.15.

Notice that when attempting to ping the vSAN Witness Appliance from host esxi01-
sitea.rainpole.com's vSAN VMkernel interface (vmk1), there is no communication.
The command vmkping -I vmk1 <target IP> uses vmk1, because the -I switch specifies using the vmk1 interface.


Add a static route on each host. The esxcli command used to add a static route is:

esxcli network ip route ipv4 add -n <remote network> -g <gateway to use>

For the hosts in Site A, the command used above is esxcli network ip route ipv4 add -n 147.80.0.0/24 -g 172.3.0.1.

This is because the hosts in Site A have a gateway to the vSAN Witness Appliance vSAN VMkernel interface through 172.3.0.1.

Other useful commands are esxcfg-route -n, which will display the network neighbors on various interfaces, and esxcli network ip route ipv4 list, which displays the gateways for various networks. Make sure this step is repeated for all hosts.
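Putting this together for one of the Site A hosts, a minimal sketch of adding and then verifying the static route looks like the following (addresses are taken from this example):

# On esxi01-sitea.rainpole.com: route the witness vSAN network via the Site A gateway
esxcli network ip route ipv4 add -n 147.80.0.0/24 -g 172.3.0.1

# Confirm the route is present
esxcli network ip route ipv4 list

# Verify the witness vSAN VMkernel interface is now reachable from vmk1
vmkping -I vmk1 147.80.0.15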

Hosts in Site B
Looking at esxi02-siteb.rainpole.com, it can also be seen that there is no route from the vSAN VMkernel interface (vmk1) to the vSAN Witness Appliance VMkernel interface with the address 147.80.0.15. The issue is the same on esxi02-siteb.rainpole.com as on esxi01-sitea.rainpole.com.

The route from Site B to the vSAN Witness Appliance vSAN VMkernel interface is different, however. The route from Site B (in this example) is through 172.3.0.253.

For the hosts in Site B, the command used is esxcli network ip route ipv4 add -n 147.80.0.0/24 -g 172.3.0.253.

The vSAN Witness Appliance in the 3rd Site


The vSAN Witness Appliance in the 3rd Site is configured a bit differently. Its vSAN VMkernel interface (vmk1) must communicate across different gateways to connect to Site A and Site B.

Communication to Site A in this example must use 147.80.0.1 and communication to Site B must use 147.80.0.253.


Because of this, routes must be added on the vSAN Witness Appliance for the vSAN VMkernel interface of each of the hosts in Site A and Site B.

To do this individually for each host in Site A, the commands would be:

esxcli network ip route ipv4 add -n 172.3.0.11/32 -g 147.80.0.1
esxcli network ip route ipv4 add -n 172.3.0.12/32 -g 147.80.0.1

To do this individually for each host in Site B, the commands would be:

esxcli network ip route ipv4 add -n 172.3.0.13/32 -g 147.80.0.253
esxcli network ip route ipv4 add -n 172.3.0.14/32 -g 147.80.0.253

With proper routing for each site, connectivity can be verified. Before verifying, let's review the configuration.

Configuration Summary
The following illustration shows the data flow between each of the data sites and the vSAN Witness Appliance in Site 3.


Host                       VMkernel  IP           Site  Static Route  Static Route  Static Route  Fault
                                                        to Witness    to Site A     to Site B     Domain

esxi01-sitea.rainpole.com  vmk1      172.3.0.11   A     172.3.0.1     NA            NA            Preferred
esxi02-sitea.rainpole.com  vmk1      172.3.0.12   A     172.3.0.1     NA            NA            Preferred
esxi01-siteb.rainpole.com  vmk1      172.3.0.13   B     172.3.0.253   NA            NA            Secondary
esxi02-siteb.rainpole.com  vmk1      172.3.0.14   B     172.3.0.253   NA            NA            Secondary
witness-01.rainpole.com    vmk1      147.80.0.15  3     NA            147.80.0.1    147.80.0.253

2 Node vSAN Configurations

With 2 Node vSAN configurations, each node behaves as a site. With both "sites" being in the same location, they are likely behind the same router and it is not necessary to have a different default gateway for Host 1 and Host 2.


A conventional vSAN 2 Node configuration will look something like this:

Conventional vSAN 2 Node Configuration

Host                      VMkernel  IP           Traffic  Site  Static Route  Static Route  Static Route
                                                 Type           to Witness    to Host 1     to Host 2

host1-2node.rainpole.com  vmk1      172.3.0.11   vSAN     A     172.3.0.1     NA            NA
host2-2node.rainpole.com  vmk1      172.3.0.12   vSAN     A     172.3.0.1     NA            NA
witness-01.rainpole.com   vmk1      147.80.0.15  vSAN     3     NA            147.80.0.1    147.80.0.1

A Direct Connect vSAN 2 Node Configuration is a little different. An alternate VMkernel port will be used to communicate with the vSAN Witness Appliance. This is because the vSAN data network will be directly connected between hosts.

It will look something like this:


The 2 Node Direct Connect configuration will have the following connectivity (with vmk0 set for Witness traffic):

Host                      VMkernel  IP           Traffic  Site  Static Route  Static Route  Static Route
                                                 Type           to Witness    to Host 1     to Host 2

host1-2node.rainpole.com  vmk0      172.40.0.11  Witness  A     172.40.0.1    NA            NA
host2-2node.rainpole.com  vmk0      172.40.0.12  Witness  A     172.40.0.1    NA            NA
witness-01.rainpole.com   vmk1      147.80.0.15  vSAN     3     NA            147.80.0.1    147.80.0.1

With 2 Node vSAN configurations, routing and connectivity will be dependent on choosing a conventional or Direct Connect design.

Next Steps
Once routing is in place, ping from each host's appropriate interface using vmkping -I vmkX <destination IP> to check connectivity. When connectivity has been verified between the vSAN data node VMkernel interfaces and the vSAN Witness Appliance vSAN VMkernel interface, the Stretched (or 2 Node) Cluster can be configured.
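As a reference, a minimal sketch of such a connectivity check run from the vSAN Witness Appliance is shown below. The addresses are taken from the stretched cluster example above, and the jumbo frame test is only relevant if an MTU of 9000 is in use:

# Ping each data host's vSAN VMkernel interface from the witness vSAN interface (vmk1)
vmkping -I vmk1 172.3.0.11
vmkping -I vmk1 172.3.0.13

# Optionally verify a 9000 byte MTU end to end (8972 bytes of payload plus headers)
vmkping -I vmk1 -d -s 8972 172.3.0.11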


12. Configuring vSAN Stretched Cluster
There are two methods to configure a vSAN Stretched Cluster.


12.1 Configuring vSAN Stretched Cluster

There are two methods to configure a vSAN Stretched Cluster. A new cluster can be stretched or an existing cluster can be converted to a Stretched Cluster.

12.2 Creating a New vSAN Stretched Cluster

Creating a vSAN stretched cluster from a group of hosts that does not already have vSAN configured
is relatively simple. A new vSAN cluster wizard makes the process very easy.

Create Step 1: Create a Cluster

The following steps should be followed to install a new vSAN stretched cluster. This example is a 3+3+1 deployment, meaning three ESXi hosts at the preferred site, three ESXi hosts at the secondary site and one witness host.

In this example, there are 6 nodes available: esx01-sitea, esx02-sitea, esx03-sitea, esx01-siteb, esx02-siteb, and esx03-siteb. All six hosts reside in a vSphere cluster called stretched-vsan. The seventh host, witness-01, which is the witness host, is in its own data center and is not added to the cluster.

To set up vSAN and configure a stretched cluster, navigate to Manage > vSAN > General. Click Configure to begin the vSAN wizard.

Create Step 2 Configure vSAN as a Stretched Cluster

The initial wizard allows for choosing various options like the disk claiming method, enabling Deduplication and Compression (All-Flash architectures only, with Advanced or greater licensing), as well as configuring fault domains or a stretched cluster. Click Configure stretched cluster.

Create Step 3 Validate Network

Network validation will confirm that each host has a VMkernel interface with vSAN traffic enabled. Click Next.

Create Step 4 Claim Disks

If the disk claiming method was set to Manual, disks should be selected for their appropriate role in the vSAN cluster.


Click Next.

Create Step 5 Create Fault Domains

The Create fault domain wizard will allow hosts to be selected for either of the two sides of the stretched cluster. The default names of these two fault domains are Preferred and Secondary. These two fault domains will behave as two distinct sites.


Click Next.

Create Step 6 Select Witness Host

The Witness host detailed earlier must be selected to act as the witness to the two fault domains.

Click Next.


Create Step 7 Claim Disks for Witness Host

Just like physical vSAN hosts, the witness needs a cache tier and a capacity tier. *Note: The witness
does not actually require SSD backing and may reside on a traditional mechanical drive.

Select Next.

Create Step 8 Complete

Review the vSAN Stretched Cluster conguration for accuracy and click Finish.

12.3 Converting a Cluster to a Stretched Cluster

The following steps should be followed to convert an existing vSAN cluster to a stretched cluster. This example is a 3+3+1 deployment, meaning three ESXi hosts at the preferred site, three ESXi hosts at the secondary site and one witness host.

Consider that all hosts are properly configured and vSAN is already up and running.


Convert Step 1 Fault Domains & Stretched Cluster

Configuring fault domains and the stretched cluster setting is handled through the Fault Domains & Stretched Cluster menu item. Select Manage > Settings > Fault Domains & Stretched Cluster.

Convert Step 2 Selecting Hosts to Participate

Select each of the hosts in the cluster and click Configure.


Convert Step 3 Configure Fault Domains

The Create fault domain wizard will allow hosts to be selected for either of the two sides of the stretched cluster. The default names of these two fault domains are Preferred and Secondary. These two fault domains will behave as two distinct sites.

Convert Step 4 Select a Witness Host

The Witness host detailed earlier must be selected to act as the witness to the two fault domains.

Convert Step 5 Claim Disks for Witness Host

Just like physical vSAN hosts, the witness needs a cache tier and a capacity tier. *Note: The witness
does not actually require SSD backing and may reside on a traditional mechanical drive.


Convert Step 6 Complete

Review the vSAN Stretched Cluster conguration for accuracy and click Finish.

12.4 Configure Stretched Cluster Site Affinity

Configure Step 1 Create Host Groups

At this point, there needs to be a way of specifying which site a VM should be deployed to. This is achieved with VM Groups, Host Groups and VM/Host Rules. With these groups and rules, an administrator can specify which set of hosts (i.e. which site) a virtual machine is deployed to. The first step is to create two host groups; the first host group will contain the ESXi hosts from the preferred site, whilst the second host group will contain the ESXi hosts from the secondary site. In this setup example, a 3+3+1 environment is being deployed, so there are three hosts in each host group. Select the cluster object from the vSphere Inventory, select Manage, then Settings. This is where the VM/Host Groups are created.

Navigate to cluster > Manage > VM/Host Groups. Select the option to add a group. Give the group a name, and ensure the group type is “Host Group” as opposed to “VM Group”. Next, click Add to select the hosts that should be in the host group. Select the hosts from site A.

Once the hosts have been added to the Host Group, click OK. Review the settings of the host group,
and click OK once more to create it:

This step will need to be repeated for the secondary site. Create a host group for the secondary site
and add the ESXi hosts from the secondary site to the host group.


When host groups for both data sites have been created, the next step is to create VM groups. However, before you can do this, virtual machines should be created on the cluster.

Configure Step 2: Create VM Groups

Once the host groups are created, the initial set of virtual machines should now be created. Do not
power on the virtual machines just yet. Once the virtual machines are in the inventory, you can now
proceed with the creation of the VM Groups. First create the VM Group for the preferred site. Select
the virtual machines that you want for the preferred site.

In the same way that a second host group had to be created previously for the secondary site, a
secondary VM Group must be created for the virtual machines that should reside on the secondary
site.

Configure Step 3: Create VM/Host Rules

Now that the host groups and VM groups are created, it is time to associate VM groups with host
groups and ensure that particular VMs run on a particular site. Navigate to the VM/Host rules to
associate a VM group with a host group.

In the example shown below, the VMs in the sec-vms VM group are associated with the host group called site-b-hostgroup, which will run the virtual machines in that group on the hosts in the secondary site.


One item highlighted above is that this is a “should” rule. We use a “should” rule as it allows vSphere
HA to start the virtual machines on the other side of the stretched cluster in the event of a site failure.

Another VM/Host rule must be created for the primary site. Again this should be a "should” rule.
Please note that DRS will be required to enforce the VM/Host Rules. Without DRS enabled, the soft
“should” rules have no effect on placement behavior in the cluster.

Configure Step 4: Set vSphere HA Rules

There is one final setting that needs to be placed on the VM/Host Rules. This setting once again defines how vSphere HA will behave when there is a complete site failure. In the screenshot below, there is a section in the VM/Host rules called vSphere HA Rule Settings. One of the settings is for VM to Host Affinity rules. A final step is to edit this from the default of “ignore” and change it to “vSphere HA should respect VM/Host affinity rules” as shown below:

This setting can be interpreted as follows:

• If there are multiple hosts on either site, and one host fails, vSphere HA will try to restart the VM on the remaining hosts on that site, maintaining read affinity.
• If there is a complete site failure, then vSphere HA will try to restart the virtual machines on the hosts on the other site. If the “must respect” option shown above is selected, then vSphere HA would be unable to restart the virtual machines on the other site, as that would break the rule. Using a “should” rule allows it to do just that.

Configure Step 5 (Optional): Set VM Storage Policy Affinity Rules

When using vSAN 6.6, it may be desired to store a VM's data only on the site the VM runs on. To accomplish this, a VM storage policy with the Affinity Rule must be created.

Rules should be selected as follows:

• Primary level of failures to tolerate (PFTT) = 0 - This ensures the Affinity Rule may be used
• Secondary level of failures to tolerate (SFTT) = 0,1,2,3 - Based on the level of local protection desired
• Failure tolerance method = Mirroring or Erasure Coding - This is based on the desired protection method. All-Flash is required for Erasure Coding.
• Affinity - Preferred or Secondary Fault Domain - This will align with the VM/Host Group Rules

VMware recommends that when using Site Affinity for vSAN object placement, VM/Host Group Rules should be aligned accordingly.

More information specific to Per-Site Policies and Affinity may be found here: https://storagehub.vmware.com/#!/vmware-vsan/vsan-stretched-cluster-2-node-guide/per-site-policies/1

12.5 Verifying vSAN Stretched Cluster Component Layouts

That completes the setup of the vSAN Stretched Cluster. The final steps are to power up the virtual
machines created earlier, and examine the component layout. When NumberOfFailuresToTolerate = 1
is chosen, a copy of the data should go to both sites, and the witness should be placed on the witness
host.

In the example below, esx01-sitea and esx02-sitea reside on site 1, whilst esx01-siteb and esx02-siteb reside on site 2. The host witness-01 is the witness. The layout shows that the VM has been deployed correctly.

As we can clearly see, one copy of the data resides on storage in site 1, a second copy of the data resides on storage in site 2, and the witness component resides on the witness host and storage in the witness site. Everything is working as expected.

Warning: Disabling and re-enabling of vSAN in a stretched cluster environment has the following
behaviors:

The witness configuration is not persisted. When recreating a vSAN stretched cluster, the witness will need to be re-configured. If you are using the same witness disk as before, the disk group will need to be deleted. This can only be done by opening an SSH session to the ESXi host, logging in as a privileged user and removing the disk group with the esxcli vsan storage remove command.
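A minimal sketch of that cleanup over SSH is shown below; the device name placeholder is illustrative, and the actual cache device should be identified first:

# Identify the cache device backing the witness disk group
esxcli vsan storage list

# Remove the entire disk group by specifying its cache device
esxcli vsan storage remove -s <cache device name>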

The fault domains are persisted, but vSAN does not know which FD is the preferred one. Therefore, under Fault Domains, the secondary FD will need to be moved to the secondary column as part of the reconfiguration.

12.6 Upgrading an older vSAN Stretched Cluster

Upgrading a vSAN Stretched Cluster is very easy. It is important though to follow a sequence of steps
to ensure the upgrade goes smoothly.

Upgrading Step 1: Upgrade vCenter Server

As with any vSphere upgrades, it is typically recommended to upgrade vCenter Server first. While
vCenter Server for Windows installations are supported, the steps below describe the process when
using the vCenter Server Appliance (VCSA).


Log in to the VAMI interface and select Update from the Navigator pane to begin the upgrade process.

*Refer to the documentation for vCenter Server for Windows to properly upgrade to a newer release of
vCenter Server.

After the upgrade has completed, the VCSA will have to be rebooted for the updates to be applied. It is
important to remember that vSAN Health Check will not be available until after hosts have been
upgraded.

Upgrading Step 2: Upgrade Hosts in Each Site

Upgrading hosts at each site is the next task to be completed. There are a few considerations to
remember when performing these steps.

As with any upgrade, hosts will be required to be put in maintenance mode, remediated, upgraded,
and rebooted. It is important to consider the amount of available capacity at each site. In sites that
have sufficient available capacity, it would be desirable to choose the “full data migration” vSAN data
migration method. This method is preferred when site locality is important for read operations. When
the “ensure accessibility” method is selected, read operations will traverse the inter-site link. Some
applications may be more sensitive to the additional time required to read data from the alternate site.

With vSphere DRS in place, as hosts are put in maintenance mode, it is important to ensure the
previously described VM/host groups and VM/host rules are in place. These rules will ensure that
virtual machines are moved to another host in the same site. If DRS is set to “fully automated” virtual
machines will vMotion to other hosts automatically, while “partially automated” or “manual” will
require the virtualization admin to vMotion the virtual machines to other hosts manually.

It is recommended to sequentially upgrade hosts at each site first, followed by sequentially upgrading the hosts at the alternate site. This method will introduce the least amount of additional storage traffic. While it is feasible to upgrade multiple hosts simultaneously across sites when there is additional capacity, this method will require additional resources as data is migrated within each site.

Upgrading Step 3: Upgrade the Witness Appliance

After both data sites have been upgraded, the witness will also need to be upgraded. The Health
Check will show that the witness has not been upgraded.

Upgrading the witness is done in the same way that physical ESXi hosts are updated. It is important to remember that while the witness is being upgraded, the witness components will not be available and objects will be noncompliant. Objects will report that the witness component is not found.

After the upgrade is complete, the witness component will return, and will reside on the witness.


Upgrading Step 4: Upgrade the on-disk Format if necessary

The final step in the upgrade process will be to upgrade the on-disk format. To use the new features introduced in vSAN 6.2, like deduplication and compression or checksum, the on-disk format must be upgraded to version 3.0. The Health Check assumes that vSphere 6.0 Update 2 or higher hosts would prefer a version 3.0 on-disk format and, as a result, it will throw an error until the format is upgraded.

To upgrade the on-disk format from version 2.0 to version 3.0, select Manage > General under vSAN. Then click Upgrade under the On-disk Format Version section.

The on-disk format upgrade will perform a rolling upgrade across the hosts in the cluster within each
site. Each host’s disk groups will be removed and recreated with the new on-disk format. The amount
of time required to complete the on-disk upgrade will vary based on the cluster hardware
configuration, how much data is on the cluster, and whatever other disk operations are occurring on the
cluster. The on-disk rolling upgrade process does not interrupt virtual machine disk access and is
performed automatically across the cluster.

The witness components residing on the witness appliance will be deleted and recreated. This process
is relatively quick given the size of witness objects.

Once the on-disk format is complete, the cluster has been upgraded.


13. Management and Maintenance


The following section of the guide covers considerations related to management and maintenance of
a vSAN Stretched Cluster configuration.


13.1 Management and Maintenance

The following section of the guide covers considerations related to management and maintenance of a
vSAN Stretched Cluster configuration.

13.2 Maintenance Mode Consideration

When it comes to maintenance mode in the vSAN Stretched Cluster configuration, there are two scenarios to consider: maintenance mode on a site host and maintenance mode on the witness host.

Maintenance Mode on a Site Host

Maintenance mode in vSAN Stretched Clusters is site specific. All maintenance mode options (Ensure Accessibility, Full data migration and No data migration) are supported. However, in order to do a Full data migration, you will need to ensure that there are enough resources in the same site to facilitate the rebuilding of components on the remaining nodes in that site.

Maintenance Mode on the Witness Host

Maintenance mode on the witness host should be an infrequent event, as it does not run any virtual
machines. When maintenance mode is performed on the witness host, the witness components cannot
be moved to either site. When the witness host is put in maintenance mode, it behaves as the No data
migration option would on site hosts. It is recommended to check that all virtual machines are in
compliance and there is no ongoing failure, before doing maintenance on the witness host.
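For reference, maintenance mode can also be entered from the ESXi shell with an explicit vSAN data migration action. The following is a minimal sketch; the mode names shown are those exposed by esxcli on vSAN hosts:

# Enter maintenance mode with the "Ensure accessibility" vSAN action
esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility

# Other vSAN actions are evacuateAllData (Full data migration) and noAction (No data migration)

# Exit maintenance mode when the work is complete
esxcli system maintenanceMode set --enable false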


14. Failure Scenarios


In this section, we will discuss the behavior of the vSAN Stretched Cluster when various failures occur.


14.1 Failure Scenarios

Failure Scenarios and Component Placement


Understanding component placement is paramount in understanding failure scenarios. The illustration shows the placement of a vSAN Object's components in a Stretched Cluster scenario.

The virtual machine's virtual disk (vmdk) has one component placed in the Preferred Site, one
component placed in the Secondary Site, and a Witness component in the Tertiary Site that houses the
vSAN Witness Host.

The illustration shows a storage policy that will protect across sites, but not within a site. This is the
default protection policy used with vSAN Stretched Clusters for versions 6.1 through 6.5.

Local protection within a site was added in vSAN 6.6.

Below is a vSAN 6.6 cluster with data protection (mirroring) across sites, as well as local data
protection (mirroring in this case) within each of the data sites.

vSAN Stretched Clusters can support up to a single complete site failure, plus a policy-based maximum number of host failures within a site.


If a site has more failures than the local protection policy will allow, then the site is considered failed. It
is important to remember that the vSAN Witness Host, residing in the Tertiary site, is only a single host.
Because of this, a failure of the vSAN Witness Host is also considered a site failure.

The scenarios in this section will cover dierent failure behaviors.

14.2 Individual Host Failure or Network Isolation

What happens when a host fails or is network isolated?

vSAN will mark the component absent. The VM will continue to run, or it will be rebooted by vSphere HA if the VM was running on the host that went offline.

If the policy includes Local Protection (introduced in vSAN 6.6), reads will be serviced by the remaining
components within the same site.

• The component will be rebuilt within the same site after 60 minutes if there is an additional
host available and the failed host does not come back online.
• If there are no additional hosts available within the site, the component will only be rebuilt if the
failed/isolated host comes back online.
• In cases where multiple hosts that contain the components fail or are isolated, reads will be serviced across the inter-site link.

This can be a significant amount of traffic on the inter-site link, depending on the amount of data on the failed host.

If the policy does not include Local Protection, reads will be serviced across the inter-site link. This can be a significant amount of traffic on the inter-site link, depending on the amount of data on the failed host.

• The component will be rebuilt within the same site as the failed/isolated hosts after 60 minutes if there is an alternate host available.


• If there are no additional hosts available, or hosts are at capacity, the component will only be
rebuilt if the failed/isolated host comes back online.

14.3 Individual Drive Failure

What happens when a drive fails?


vSAN will mark the component absent. The VM will continue to run.

If the policy includes Local Protection (introduced in vSAN 6.6), reads will be serviced by the remaining
components within the same site.

• The component will be rebuilt within the same site after 60 minutes if there is an additional
host available and the failed host does not come back online.
• If there are no additional hosts available within the site, the component will only be rebuilt if the
failed/isolated host comes back online.
• In cases where multiple hosts that contain the components fail or are isolated, reads will be serviced across the inter-site link.

This can be a significant amount of traffic on the inter-site link, depending on the amount of data on the failed host.


If the policy does not include Local Protection, reads will be serviced across the inter-site link. This can be a significant amount of traffic on the inter-site link, depending on the amount of data on the failed host.

• The component will be rebuilt within the same site as the failed/isolated hosts after 60 minutes if there is an alternate host available.
• If there are no additional hosts available, or hosts are at capacity, the component will only be
rebuilt if the failed/isolated host comes back online.

14.4 Multiple Simultaneous Failures

What happens if there are failures at multiple levels?

The scenarios thus far have only covered situations where a single failure has occurred.

Local Protection was introduced in vSAN 6.6. Reviewing the vSAN Storage Policy Rule changes, Number of Failures to Tolerate became Primary Number of Failures to Tolerate, and is directly associated with site availability. Secondary Number of Failures to Tolerate was introduced with Local Protection, and works along with Failure Tolerance Method to determine data layout/placement within a Stretched Cluster site.

Votes and their contribution to object accessibility


The vSAN Design and Sizing Guide goes into further detail about how component availability
determines access to objects. In short, each component has a vote, and a quorum of votes must be
present for an object to be accessible. Each site will have an equal number of votes and there will be
an even distribution of votes within a site. If the total number of votes is an even number, a random
vote will be added.

In the illustration below, an 8 Node vSAN Stretched Cluster (4+4+1) has an object with PFTT=1
(Mirrored across sites) and SFTT=1/FTM Mirroring (Local Protection). Each site has 3 votes in this case, with a total of 9 votes.


If the vSAN Witness Host fails, 3 votes are no longer present, leaving only 6 accessible. Because 66% of the votes are still available, the vmdk is still accessible.

With the additional loss of a vSAN disk, 4 votes are no longer present, leaving only 5 accessible. Because 55.5% of the votes are still available, the vmdk is still accessible.


If a component in another location, on either site, regardless of having Local Protection enabled, were to fail, the vmdk would be inaccessible.

Below, the illustration shows the vSAN Witness Host offline, 1 failure in the Preferred Site, and 1 failure in the Secondary Site.

In this illustration, the vSAN Witness Host is offline and there are 2 failures in the Preferred Site.


In both cases, only 44.4% of the votes are present (4 of 9), resulting in the virtual machine's vmdk
being inaccessible.

It is important to understand that components, the weight of their votes, and their presence/absence
determine accessibility of the vSAN object.

In each of the above failure cases, restoring access to the existing vSAN Witness would make the
object accessible.
Deploying a new vSAN Witness would not, because the components would not be present.

14.5 Recovering from a Complete Site Failure

The failure descriptions above, although related to a single host failure, also apply to complete site failures. VMware has modified some of the vSAN behavior for when a site failure occurs and subsequently recovers. In the event of a site failure, vSAN will now wait for some additional time for “all” hosts to become ready on the failed site before it starts to sync components. The main reason is that if only some subset of the hosts come up on the recovering site, then vSAN will start the rebuild process. This may result in the transfer of a lot of data that already exists on the nodes that might become available at some point in time later on.

VMware recommends that when recovering from a failure, especially a site failure, all nodes in the site should be brought back online together to avoid costly resync and reconfiguration overheads. The reason behind this is that if vSAN brings nodes back up at approximately the same time, then it will only need to synchronize the data that was written between the time when the failure occurred and when the site came back. If instead nodes are brought back up in a staggered fashion, objects might need to be reconfigured and thus a significantly higher amount of data will need to be transferred between sites.

14.6 How Read Locality is Established After Failover

How is Read Locality Established After Failover to the Other Site?

A common question is how read locality is maintained when there is a failover. This guide has already described read locality, and how in a typical vSAN deployment, a virtual machine reads equally from all of its replicas in a round-robin format. In other words, if a virtual machine has two replicas as a result of being configured to tolerate one failure, 50% of the reads come from each replica.

This algorithm has been enhanced for Stretched Clusters so that 100% of the reads come from vSAN
hosts on the local site, and the virtual machine does not read from the replica on the remote site. This
avoids any latency that might be incurred by reading over the link to the remote site. The result of this
behavior is that the data blocks for the virtual machine are also cached on the local site.

In the event of a failure or maintenance event, the virtual machine is restarted on the remote site. The
100% rule continues in the event of a failure. This means that the virtual machine will now read from
the replica on the site to which it has failed over. One consideration is that there is no cached data on
this site, so cache will need to warm for the virtual machine to achieve its previous levels of
performance. New for vSAN 6.6, when a Site Affinity rule is used in conjunction with a PFTT=0 policy,
data is not present on the alternate site.

When the virtual machine starts on the other site, either as part of a vMotion operation or a power on
from vSphere HA restarting it, vSAN instantiates the in-memory state for all the objects of said virtual
machine on the host where it moved. That includes the “owner” (coordinator) logic for each object. The
owner checks if the cluster is set up in “stretched cluster” mode, and if so, which fault domain it is running in. It then uses the different read protocol: instead of the default round-robin protocol across replicas (at the granularity of 1MB), it sends 100% of the reads to the replica that is on the same site (but not necessarily the same host) as the virtual machine.

14.7 Replacing a Failed Witness Host

Should a vSAN Witness Host fail in the vSAN Stretched Cluster, a new vSAN Witness Host can easily be
introduced to the configuration.

Swapping out the vSAN Witness Host Pre-6.6


Navigate to Cluster > Manage > vSAN > Fault Domains.

The failing witness host can be removed from the vSAN Stretched Cluster via the UI (red X in fault
domains view).


The next step is to rebuild the vSAN stretched cluster, selecting the new witness host. In the same view, click on the “configure stretched cluster” icon. Align hosts to the preferred and secondary sites as before. This is quite simple to do since the hosts are still in the original fault domain, so simply select the secondary fault domain and move all the hosts over in a single click:

Select the new witness host:

Create the disk group and complete the vSAN Stretched Cluster creation.

On completion, verify that the health check failures have resolved. Note that the vSAN Object health test will continue to fail as the witness component of the VM still remains “Absent”. When the CLOMD (Cluster Level Object Manager Daemon) timer expires after a default of 60 minutes, witness components will be rebuilt on the new witness host. Rerun the health check tests and they should all pass at this point, and all witness components should show as active.

14.8 VM Provisioning When a Site is Down

If there is a failure in the cluster, i.e. one of the sites is down, new virtual machines can still be provisioned. The provisioning wizard will, however, warn the administrator that the virtual machine does not match its policy, as follows:

In this case, when one site is down and there is a need to provision virtual machines, the
ForceProvision capability is used to provision the VM. This means that the virtual machine is
provisioned with a NumberOfFailuresToTolerate = 0, meaning that there is no redundancy.
Administrators will need to rectify the issues on the failing site and bring it back online. When this is
done, vSAN will automatically update the virtual machine configuration to
NumberOfFailuresToTolerate = 1, creating a second copy of the data and any required witness
components.

14.9 Site Failure or Network Partitions

What happens when sites go offline, or lose connectivity?

A typical vSAN Stretched Cluster configuration can be seen here:

Preferred Site Failure or Completely Partitioned


In the event the Preferred Site fails or is partitioned, vSAN powers off the virtual machines running in that
site. The reason for this is that the virtual machine's components are not accessible due to the
loss of quorum. The vSAN Stretched Cluster has now experienced a single site failure. The loss of
either data site in addition to the witness counts as two failures and will take the entire cluster offline.

An HA master node will be elected in the Secondary Site, which will validate which virtual machines are
to be powered on. Because quorum has been formed between the vSAN Witness Host and the
Secondary Site, virtual machines in the Secondary Site will have access to their data, and can be
powered on.
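When diagnosing which side of a partition a given host is on, the cluster membership that the host currently sees can be checked from the ESXi shell. This is a generic vSAN command rather than one specific to stretched clusters, the host name is a placeholder, and the exact output fields are release dependent:

[root@esxi-host:~] esxcli vsan cluster get

Fields such as Local Node State and Sub-Cluster Member Count indicate whether the host is part of the quorum-holding partition or has been isolated.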


Secondary Site Failure or Partitioned


In the event the Secondary Site fails or is partitioned, vSAN powers off the virtual machines running in
that site. The reason for this is that the virtual machine's components are not accessible due to
the loss of quorum. The vSAN Stretched Cluster has now experienced a single site failure. The loss of
either data site in addition to the witness counts as two failures and will take the entire cluster offline.


The HA Master on the Preferred Site will validate which virtual machines are to be powered on. Virtual
machines which have been moved to the Preferred Site will now have access to their data, and can be
powered on.


vSAN Witness Host Failure or Partitioned


Virtual machines running in both of the main sites of a Stretched Cluster are not impacted by the vSAN
Witness Host being partitioned. Virtual machines continue to run at both locations. The vSAN
Stretched Cluster has now experienced a single site failure. The loss of either data site in addition to the
witness counts as two failures and will take the entire cluster offline.


In the event the vSAN Witness Host has failed, the behavior is the same as if the vSAN Witness Host
were partitioned from the cluster. Virtual machines continue to run at both locations. Because the
vSAN Stretched Cluster has now experienced a single site failure, it is important to either get the vSAN
Witness Host back online, or deploy a new one for the cluster.


When the existing vSAN Witness Host comes back online or a new vSAN Witness Host is deployed,
metadata changes are resynchronized between the main Stretched Cluster sites and the vSAN Witness
Host. The amount of data that needs to be transmitted depends on a few items such as the number of
objects and the number of changes that occurred while the vSAN Witness Host was offline. However,
this amount of data is relatively small considering it is metadata, not large objects such as virtual disks.


Intelligent site continuity for stretched clusters


In vSAN 6.7 a number of improvements have been made to the failure response logic when tracking
the "fitness" of a site for failover/failback in the event that links to the witness and adjacent data site
become available at different times, illustrated below.


This enhancement addresses a scenario in which the preferred site is completely isolated (a break in
the ISL and in the link to the witness site), and vSAN properly fails over to the secondary site. In the event
that the witness site regains connectivity to the preferred site, vSAN 6.7 will properly track the
"fitness" of the site and maintain the Secondary Site as the active site until the ISL is recovered. This
helps prevent a false positive of the preferred site appearing to be back up, failing back over to it, and
attempting to use stale components.

14.10 Efficient inter-site resync for stretched clusters

With the introduction of vSAN 6.7 we have made improvements to the resync mechanism in stretched
clusters that further reduce the time from component failure to object compliance. One particular
example is the use of a "proxy owner" host for objects that need to be resynced across sites following
a failure.

Take the example of an object with the storage policy: PFTT=1, SFTT=1.

Let us assume that Site B (Secondary) has had a failure and that all components on that site need to be
rebuilt. In previous versions of vSAN the resync would read the Primary site's data and synchronise it
multiple times across the inter-site link (ISL): if two components needed to be rebuilt on the
Secondary site, the data would be transmitted twice across the ISL, raising the bandwidth utilisation on
the ISL for the duration of the resync.

In 6.7, resyncs to the remote site will now be copied once to a proxy host and then copied within
that site to the other hosts from the proxy host. This lowers the traffic cost for resyncs across sites
from one copy per component (which could be multiple copies if using RAID-5/RAID-6) down to a single copy.


The above figure illustrates the flow of how components are rebuilt in vSAN 6.7 with the proxy owner
improvement for inter-site resyncs. First, a copy is made to Site B, then the object is re-created within
the site and finally, the data is copied from the proxy host to any other hosts within the site.
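To put the saving in concrete terms, here is a simple worked illustration (the figures are hypothetical, not taken from this guide): with PFTT=1, SFTT=1 and RAID-1 within each site, a 100 GB object has two mirrored data components per site. If Site B must be rebuilt, a pre-6.7 resync would send roughly 2 x 100 GB = 200 GB across the ISL, whereas with the proxy owner roughly 100 GB crosses the ISL once and the second component is then copied from the proxy host over the local site network.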

14.11 Failure Scenario Matrices

Failure Scenario Matrices


Hardware/Physical Failures

Scenario: Cache disk failure
vSAN behaviour: Disk Group is marked as failed and all components present on it will rebuild on another Disk Group.
Impact/observed VMware HA behaviour: VM will continue running.

Scenario: Capacity disk failure (Dedupe and Compression ON)
vSAN behaviour: Disk Group is marked as failed and all components present on it will rebuild on another Disk Group.
Impact/observed VMware HA behaviour: VM will continue running.

Scenario: Capacity disk failure (Dedupe and Compression OFF)
vSAN behaviour: Disk is marked as failed and all components present on it will rebuild on another disk.
Impact/observed VMware HA behaviour: VM will continue running.

Scenario: Disk Group failure/offline
vSAN behaviour: All components present on the Disk Group will rebuild on another Disk Group.
Impact/observed VMware HA behaviour: VM will continue running.

Scenario: RAID/HBA card failure
vSAN behaviour: All Disk Groups backed by the HBA/RAID card will be marked absent and all components present will rebuild on other Disk Groups.
Impact/observed VMware HA behaviour: VM will continue running.

Scenario: Host failure
vSAN behaviour: Components on the host will be marked as absent by vSAN; a component rebuild will be kicked off after 60 minutes if the host does not come back up.
Impact/observed VMware HA behaviour: VM will continue running on another host. If the VM was running on the failed host, an HA restart of the VM will take place.

Scenario: Host isolation
vSAN behaviour: Components present on the host will be marked as absent by vSAN; component rebuilds will be kicked off after 60 minutes if the host does not come back online.
Impact/observed VMware HA behaviour: VM will continue running on another host. If the VM was running on the isolated host, an HA restart of the VM will take place.

Scenario: Witness loss / failed / isolated from one or both sites
vSAN behaviour: Witness loss counts as a site (PFTT) failure; the cluster will be placed in a degraded state until the witness comes back online or is redeployed.
Impact/observed VMware HA behaviour: VM will continue running.

Scenario: Data site failure or partition, ISL failure / connectivity loss
vSAN behaviour: Site is declared lost; quorum is established between the witness and the remaining data site.
Impact/observed VMware HA behaviour: VMs running on the partitioned/failed site are powered off. If there is an ISL loss, HA will restart the VMs from the secondary site on the preferred site. If the preferred site has failed, they will restart on the secondary site.

Scenario: Dual site loss
vSAN behaviour: Cluster offline.
Impact/observed VMware HA behaviour: VMs stop running. HA cannot restart VMs until quorum is re-established.


Policy Implications
The below table is built based on the following example policy configuration:

PFTT = 1, SFTT = 2, FTM = R5/6


Scenario: Single site failure (PFTT)
vSAN behaviour: Site marked as failed; rebuild of components will begin when the failed site comes online again. This is also triggered in the event that the Witness site is lost, as it is viewed as a discrete site.
Impact/observed VMware HA behaviour: VMs running on the partitioned/failed site are powered off. HA will restart the VMs from the secondary site on the preferred site. If the preferred site has failed, they will restart on the secondary site.

Scenario: Single disk, disk group, or host failure on one site (SFTT)
vSAN behaviour: All components present will rebuild on their respective fault domain.
Impact/observed VMware HA behaviour: Disk and disk group failures will not affect VM running state. VMs will continue running if on a host other than the one that failed. If the VM was running on the failed host, an HA restart of the VM will take place.

Scenario: Dual disk, disk group, or host failure on one site (SFTT)
vSAN behaviour: Site marked as failed by vSAN; component rebuilds will begin when the site comes online again.
Impact/observed VMware HA behaviour: The site is marked as failed and VMs running on the failed site are powered off. HA will restart the VMs from the failed site on the remaining site.

Scenario: Single site failure (PFTT) and single disk, disk group, or host failure across remaining sites (SFTT)
vSAN behaviour: Site marked as failed; the disk/disk group/host is also marked as failed. Components present on the failed site will wait for the site to come online again in order to rebuild. Components present on the failed disk/disk group/host will rebuild on their respective fault domain within the same site.
Impact/observed VMware HA behaviour: Disk and disk group failures will not affect VM running state. VMs will continue running if they are running on a host/site other than the ones that failed. If the VM was on the failed host/site, an HA restart of the VM will take place.

Scenario: Single site failure (PFTT) and dual disk, disk group, or host failure across remaining sites (SFTT)
vSAN behaviour: Site marked as failed; both disks/disk groups/hosts are also marked as failed. Components present on the failed site will wait for the site to come online again in order to rebuild. Components present on the failed disks/disk groups/hosts will rebuild on their respective fault domain within the same site.
Impact/observed VMware HA behaviour: Disk and disk group failures will not affect VM running state. VMs will continue running if they are running on a host/site other than the ones that failed. If the VM was on the failed hosts/site, an HA restart of the VM will take place.

Scenario: Single site failure (PFTT) and triple disk, disk group, or host failure across remaining sites (SFTT)
vSAN behaviour: Cluster offline; both sites are marked as failed by vSAN and a site will need to come back online to bring vSAN online. This is a result of the single PFTT failure (for example the witness) combined with the SFTT failures, as the policy specifies SFTT=2 which, during a PFTT violation, is counted globally across sites due to quorum implications. The cluster can be brought up again by bringing the failed site back online or by replacing the failed SFTT devices on the remaining sites.
Impact/observed VMware HA behaviour: VMs will stop running and the cluster will be offline until a site is brought back online. Due to the PFTT and SFTT violations (single site failure and triple SFTT failure) the cluster's objects will have lost quorum and as such cannot run.

Scenario: Dual site failure (PFTT)
vSAN behaviour: Cluster offline. A site will need to come back online to bring vSAN back online.
Impact/observed VMware HA behaviour: VMs will stop running and the cluster will be offline until a site is brought back online.

Scenario: Single disk, disk group, or host failure on one site (SFTT) and dual disk, disk group, or host failure on another site (SFTT)
vSAN behaviour: The site with the dual failure will be marked as failed by vSAN; components residing on that site will need to wait for it to come online in order to rebuild. The site with the single failure will have its components rebuilt on their respective fault domain within the site.
Impact/observed VMware HA behaviour: Disk and disk group failures will not affect VM running state. VMs will continue running if on a host other than the one that failed. If the VM was on the failed host, an HA restart of the VM will take place. VMs running on the failed site are powered off; HA will restart the VMs from the secondary site on the preferred site. If the preferred site has failed, they will restart on the secondary site.

Scenario: Dual disk, disk group, or host failure on one site (SFTT) and dual disk, disk group, or host failure on another site (SFTT)
vSAN behaviour: Cluster offline. A site will need to come back online to bring vSAN online. This is a result of the dual SFTT failure and the policy specifying SFTT=2, after which the site is marked as failed. Recovery can be achieved by replacing the failed SFTT devices on either site.
Impact/observed VMware HA behaviour: VMs will stop running and the cluster will be offline until a site is brought back online.

Scenario: Component failure of any sort when insufficient failure domains are available for a rebuild
vSAN behaviour: The component out of compliance will not rebuild until adequate failure domains are available.
Impact/observed VMware HA behaviour: VM will continue to run as long as the policy has not been violated.

A special case should be called out for policies configured with PFTT=0 and Affinity=Secondary:

Scenario: Sites are partitioned due to ISL loss
vSAN behaviour: No component reconfiguration across sites is required due to PFTT=0. In addition, as site affinity is set to Secondary, the VM will not be HA restarted on the preferred site; rather, it will be allowed to run in place on the secondary site. This is because the object is in compliance with its policy and does not need site quorum to run.
Impact/observed behaviour: VM continues running.


15. Appendix A
A list of links to additional vSAN resources is included below.


15.1 Appendix A: Additional Resources

A list of links to additional vSAN resources is included below.

• vSAN 6.0 Proof Of Concept Guide
• vSAN 6.1 Health Check Plugin Guide
• vSAN Stretched Cluster Bandwidth Sizing Guidance
• Tech note: New VSAN 6.0 snapshot format vsanSparse
• vSAN 6.2 Design and Sizing Guide
• vSAN Troubleshooting Reference Manual
• RVC Command Reference Guide for vSAN
• vSAN Administrators Guide
• vSAN 6.0 Performance and Scalability Guide

15.2 Location of the vSAN Witness Appliance OVA

The vSAN Witness Appliance OVA is located on the Drivers & Tools tab of the vSAN download page. There
you will find a section called VMware vSAN tools & plug-ins. This is where the "Stretch Cluster Witness
VM OVA" is located. The URL is:

https://my.vmware.com/web/vmware/info/slug/datacenter_cloud_infrastructure/vmware_virtual_san/6_0#drivers_tools


16. Appendix B
New ESXCLI commands for vSAN Stretched Cluster.


16.1 Appendix B: Commands for vSAN Stretched Clusters

ESXCLI

New ESXCLI commands for vSAN Stretched Cluster.

esxcli vsan cluster preferredfaultdomain

Display the preferred fault domain for a host:

[root@cs-ie-dell04:~] esxcli vsan cluster preferredfaultdomain


Usage: esxcli vsan cluster preferredfaultdomain {cmd} [cmd options]

Available Commands:
get Get the preferred fault domain for a stretched cluster.
set Set the preferred fault domain for a stretched cluster.

[root@cs-ie-dell04:~] esxcli vsan cluster preferredfaultdomain get


Preferred Fault Domain Id: a054ccb4-ff68-4c73-cbc2-d272d45e32df
Preferred Fault Domain Name: Preferred
[root@cs-ie-dell04:~]

esxcli vsan cluster unicastagent

An ESXi host in a vSAN Stretched Cluster communicates with the witness host via a unicast agent over
the vSAN network. This command can add, remove or display information about the unicast agent,
such as the network port.

[root@cs-ie-dell02:~] esxcli vsan cluster unicastagent


Usage: esxcli vsan cluster unicastagent {cmd} [cmd options]

Available Commands:
add Add a unicast agent to the vSAN cluster configuration.
list List all unicast agents in the vSAN cluster configuration.
remove Remove a unicast agent from the vSAN cluster configuration.

[root@cs-ie-dell02:~] esxcli vsan cluster unicastagent list


IP Address Port
---------- -----
172.3.0.16 12321
[root@cs-ie-dell02:~]

RVC–Ruby vSphere Console

The following are the new stretched cluster RVC commands:


vsan.stretchedcluster.config_witness
Configure a witness host. The name of the cluster, the witness host and the preferred fault domain must be provided as arguments to the command.

/localhost/Site-A/computers> vsan.stretchedcluster.config_witness -h


usage: config_witness cluster witness_host preferred_fault_domain
Configure witness host to form a vSAN Stretched Cluster
cluster: A cluster with vSAN enabled
witness_host: Witness host for the stretched cluster
preferred_fault_domain: preferred fault domain for witness host
--help, -h: Show this message
/localhost/Site-A/computers>

vsan.stretchedcluster.remove_witness
Remove a witness host. The name of the cluster must be provided as an argument to the command.


/localhost/Site-A/computers> vsan.stretchedcluster.remove_witness -h


usage: remove_witness cluster
Remove witness host from a vSAN Stretched Cluster
cluster: A cluster with vSAN stretched cluster enabled
--help,-h: Show this message

vsan.stretchedcluster.witness_info
Display information about a witness host. Takes a cluster as an argument.

/localhost/Site-A/computers> ls


0 Site-A (cluster): cpu 100 GHz, memory 241 GB
1 cs-ie-dell04.ie.local (standalone): cpu 33 GHz, memory 81 GB
/localhost/Site-A/computers> vsan.stretchedcluster.witness_info 0
Found witness host for vSAN stretched cluster.
+------------------------+--------------------------------------+
| Stretched Cluster | Site-A |
+------------------------+--------------------------------------+
| Witness Host Name | cs-ie-dell04.ie.local |
| Witness Host UUID | 55684ccd-4ea7-002d-c3a9-ecf4bbd59370 |
| Preferred Fault Domain | Preferred |
| Unicast Agent Address | 172.3.0.16 |
+------------------------+--------------------------------------+
