
ATS Masters - Storage

Stretched SVC Cluster

2012 IBM Corporation



Agenda
A little history
Options and issues
Requirements and restrictions


Terminology

SVC Split I/O Group = SVC Stretched Cluster = SVC Split Cluster
Two independent SVC nodes in two independent sites, plus one independent site for the quorum
Acts just like a single I/O Group, but with distributed high availability
[Diagram: one I/O Group stretched across Site 1 and Site 2]

Distributed I/O Groups are NOT an HA configuration and are not recommended; if one site fails:
Manual volume move is required
Some data is still in the cache of the offline I/O Group
[Diagram: I/O Group 1 at Site 1, I/O Group 2 at Site 2]

Storwize V7000 Split I/O Group is not an option:
A single enclosure contains both nodes
Physical distribution across two sites is not possible


Early Days - separate racks in machine room

[Diagram: servers, Fabric A and Fabric B switches, and two SVC + UPS racks in one machine room]
Lots of cross-cabling
Protected from physical problems:
Plumbing leak, fire
A chance to tackle the guy with a chainsaw
Where's the quorum disk?


Fabric presence on both sides of machine room

[Diagram: servers, Fabric A and Fabric B switches on each side of the machine room, one SVC + UPS on each side]
Cross-cabling only needed for fabrics and SVC nodes
Requirement for zero hops between nodes in an I/O Group:
Needed to zone away ISL connections within an I/O Group
Each fabric has two sets of zones for the two sets of ports
Quorum concern remains

Server cluster also stretched across machine room

Can do cluster failover, but where's the storage?
[Diagram: stretched server cluster, Fabric A and Fabric B switches on each side, one SVC + UPS on each side]
Cross-cabling only needed for fabrics and SVC nodes
Requirement for zero hops between nodes in an I/O Group:
Needed to zone away ISL connections within an I/O Group
Each fabric has two sets of zones for the two sets of ports
Quorum concern remains

SVC V4.3 Vdisk (Volume) Mirroring

Can do cluster failover, and mirroring allows a single volume to be visible on both sides
[Diagram: stretched server cluster, Fabric A and Fabric B switches on each side, one SVC + UPS on each side]
Cross-cabling only needed for fabrics and SVC nodes
SVC volume has a copy on either side of the machine room
Requirement for zero hops between nodes in an I/O Group:
Needed to zone away ISL connections within an I/O Group
Each fabric has two sets of zones for the two sets of ports
Quorum concern remains

SVC V5.1 put LW SFPs in nodes for 10km distance

Can do cluster failover, and mirroring allows a single volume to be visible on both sides
[Diagram: stretched server cluster, Fabric A and Fabric B switches at each site, one SVC + UPS at each site]
Longwave SFPs with single-mode fibre allow up to 10km
SVC volume has a copy at each site
Where's the quorum?
Requirement for zero hops between nodes in an I/O Group:
Needed to zone away ISL connections within an I/O Group
Each fabric has two sets of zones for the two sets of ports
Quorum concern remains

SVC V5.1 stretched cluster with 3rd site - 1

Can do cluster failover, and mirroring allows a single volume to be visible on both sides
[Diagram: stretched server cluster, Fabric A and Fabric B switches at each site, one SVC + UPS at each site, quorum at a third site]
Longwave SFPs with single-mode fibre allow up to 10km
Ok, but?
Active / passive storage devices (like DS3/4/5K):
Each quorum disk storage controller must be connected to both sites

SVC V5.1 stretched cluster with 3rd site - 2

Can do cluster failover, and mirroring allows a single volume to be visible on both sides
[Diagram: as before, with the quorum storage at the third site cabled to both sites]
Longwave SFPs with single-mode fibre allow up to 10km
Active / passive storage devices (like DS3/4/5K):
Each quorum disk storage controller must be connected to both sites

SVC V5.1 stretched cluster with 3rd site - 3

Can do cluster failover, and mirroring allows a single volume to be visible on both sides
[Diagram: as before, with the quorum storage at the third site cabled to both sites]
Longwave SFPs with single-mode fibre allow up to 10km
Active / passive storage devices (like DS3/4/5K):
Each quorum disk storage controller must be connected to both sites
LOTS OF CROSS CABLING!

SVC V6.3 - option 1: Same as V5 but farther using DWDM

Can do cluster failover, and mirroring allows a single volume to be visible on both sides
[Diagram: stretched server cluster, Fabric A and Fabric B switches at each site, one SVC + UPS at each site]
DWDM allows up to 40km
Speed drops after 10km
Active / passive storage devices (like DS3/4/5K):
Each quorum disk storage controller must be connected to both sites

SVC V6.3 - option 1 (cont.)

User chooses the number of ISLs on the SAN
Still no hops between nodes in an I/O Group
These connections can be on DWDM too
[Diagram: active or passive DWDM over shared single-mode fibre(s) linking Fabric A and Fabric B between the two sites]
0-10 km Fibre Channel distance supported up to 8Gbps
11-20 km Fibre Channel distance supported up to 4Gbps
21-40 km Fibre Channel distance supported up to 2Gbps
Two ports per SVC node attached to local switches
Two ports per SVC node attached to remote switches via DWDM
Hosts and storage attached to local switches; need enough ISLs
3rd site quorum (not shown) attached to both fabrics

SVC V6.3 - option 2: Dedicated ISLs for nodes (can use DWDM)

User chooses the number of ISLs on the public SAN
Only half of all SVC ports are used for host I/O
[Diagram: public fabrics A/B and private fabrics C/D at each site; at least 1 ISL per private fabric, trunked if more than 1]
Distance now up to 300km
Apps may require less
Two ports per SVC node attached to public fabrics
Two ports per SVC node attached to dedicated (private) fabrics
Hosts and storage attached to public fabrics
3rd site quorum (not shown) attached to public fabrics

Configuration Using Brocade Virtual Fabrics

Physical switches are partitioned into two logical switches, forming two virtual fabrics
[Diagram: public and private virtual fabrics A and B at each site, with the SVC + UPS nodes attached to both]
Note: ISLs/trunks for the private fabrics are dedicated rather than shared, to guarantee dedicated bandwidth for node-to-node traffic

Configuration Using Cisco VSANs

User chooses the number of ISLs on the public SAN
Switches/fabrics are partitioned using VSANs
[Diagram: public and private VSANs A and B at each site; 1 private ISL per I/O Group, configured as a trunk]
Note: ISLs/trunks for the private VSANs are dedicated rather than shared, to guarantee dedicated bandwidth for node-to-node traffic
Two ports per SVC node attached to public VSANs
Two ports per SVC node attached to private VSANs
Hosts and storage attached to public VSANs
3rd site quorum (not shown) attached to public VSANs

Split I/O Group Distance

The new Split I/O Group configurations support distances of up to 300km (the same recommendation as for Metro Mirror)
However, for a typical Split I/O Group deployment only 1/2 to 1/3 of this distance is recommended, because there will be 2 to 3 times as much latency depending on which distance-extension technology is used
The following charts explain why


Metro/Global Mirror

Technically, SVC supports distances up to 8000km
SVC will tolerate a round-trip delay of up to 80ms between nodes
The same code is used for all inter-node communication:
Global Mirror, Metro Mirror, Cache Mirroring, Clustering
SVC's proprietary SCSI protocol has only 1 round trip
In practice, applications are not designed to support a write I/O latency of 80ms
Hence Metro Mirror is deployed for shorter distances (up to 300km) and Global Mirror is used for longer distances

Metro Mirror: Application Latency = 1 long-distance round trip

[Diagram: Server Cluster 1 and SVC Cluster 1 in Data center 1; Server Cluster 2 and SVC Cluster 2 in Data center 2]
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Metro Mirror data transfer to remote site   (1 round trip)
5) Acknowledgment
6) Write completed to host
7a/7b) Write request from SVC to storage at each site
8a/8b) Xfer ready to SVC
9a/9b) Data transfer from SVC
10a/10b) Write completed to SVC
Steps 1-6 affect application latency
Steps 7-10 should not affect the application

Split I/O Group, Preferred Node local: Write uses 1 round trip

[Diagram: Server Cluster 1 and SVC node 1 in Data center 1; Server Cluster 2 and SVC node 2 in Data center 2; one Split I/O Group]
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Cache mirror data transfer to remote site   (1 round trip)
5) Acknowledgment
6) Write completed to host
7b) Write request from SVC to storage
8b) Xfer ready to SVC
9b) Data transfer from SVC
10b) Write completed to SVC
The destage (7b-10b) takes 2 round trips, but the SVC write cache hides this latency from the host
Steps 1-6 affect application latency
Steps 7-10 should not affect the application

Split I/O Group, Preferred node remote: Write = 3 round trips

[Diagram: host in Data center 1 writing to the preferred node in Data center 2]
1) Write request from host
2) Xfer ready to host
3) Data transfer from host   (the host write to the remote node costs 2 round trips)
4) Cache mirror data transfer to the remote site   (1 round trip)
5) Acknowledgment
6) Write completed to host
7b) Write request from SVC to storage
8b) Xfer ready to SVC
9b) Data transfer from SVC
10b) Write completed to SVC
The destage (7b-10b) takes 2 round trips, but the SVC write cache hides this latency from the host
Steps 1-6 affect application latency
Steps 7-10 should not affect the application

Help with some round trips

Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one round trip's worth of latency for SCSI write commands
These devices are already supported for use with SVC
No benefit or impact for inter-node communication
Does benefit host to remote SVC I/Os
Does benefit SVC to remote storage controller I/Os

Split I/O Group, Preferred Node remote with help: 2 round trips

[Diagram: host in Data center 1, distance extenders between the sites, preferred node and storage in Data center 2]
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Write + data transfer to remote site   (1 round trip)
5) Write request to SVC
6) Xfer ready from SVC
7) Data transfer to SVC
8) Cache mirror data transfer to remote site   (1 round trip)
9) Acknowledgment
10) Write completed from SVC
11) Write completion to remote site
12) Write completed to host
13) Write request from SVC
14) Xfer ready to SVC
15) Data transfer from SVC
16) Write + data transfer to remote site   (1 round trip, hidden from the host)
17) Write request to storage
18) Xfer ready from storage
19) Data transfer to storage
20) Write completed from storage
21) Write completion to remote site
22) Write completed to SVC
Steps 1 to 12 affect application latency
Steps 13 to 22 should not affect the application
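To make the round-trip counting on these charts concrete, here is a minimal Python sketch (ours, not part of the presentation) that estimates the application-visible extra write latency per scenario. The round-trip counts come from the charts above; the ~0.01 ms per km per round trip figure is derived on the next chart; the scenario names and the function itself are illustrative only.

```python
# Illustrative sketch: application-visible write latency vs. inter-site distance.
# Round-trip counts per scenario are taken from the preceding charts; the
# 0.01 ms/km per round trip figure follows from the speed of light in glass.

MS_PER_KM_PER_ROUND_TRIP = 0.01  # ~200,000 km/s in glass, out and back

# Long-distance round trips the application has to wait for
APPLICATION_ROUND_TRIPS = {
    "metro_mirror": 1,                              # steps 4-5 cross the link
    "split_io_preferred_node_local": 1,             # cache mirror only
    "split_io_preferred_node_remote": 3,            # host write (2) + cache mirror (1)
    "split_io_preferred_remote_with_extender": 2,   # extender removes one trip
}

def added_write_latency_ms(scenario: str, distance_km: float) -> float:
    """Extra application write latency caused by the inter-site distance."""
    return APPLICATION_ROUND_TRIPS[scenario] * distance_km * MS_PER_KM_PER_ROUND_TRIP

if __name__ == "__main__":
    for scenario in APPLICATION_ROUND_TRIPS:
        print(f"{scenario:42s} @ 100 km: {added_write_latency_ms(scenario, 100):.1f} ms")
```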

Long Distance Impact

Additional latency because of long distance:
Speed of light in glass: ~200,000 km/sec
1 km of distance = 2 km round trip
Additional round-trip time because of distance:
1 km = 0.01 ms
10 km = 0.10 ms
25 km = 0.25 ms
100 km = 1.00 ms
300 km = 3.00 ms
SCSI protocol:
Read: 1 round trip = 0.01 ms/km (initiator requests data, target provides data)
Write: 2 round trips = 0.02 ms/km (initiator announces the amount of data, target acknowledges; initiator sends data, target acknowledges)
SVC's proprietary SCSI protocol for node-to-node traffic has only 1 round trip
Fibre Channel frame:
User data per FC frame (Fibre Channel payload): up to 2048 bytes = 2KB
Even for very small user data (< 2KB) a complete frame is required
Large user data is split across multiple frames
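As a worked example of the figures above, this minimal sketch (ours) converts distance into added round-trip time and into SCSI read/write latency:

```python
# Minimal worked example of the latency figures above.

LIGHT_SPEED_IN_GLASS_KM_PER_MS = 200.0   # ~200,000 km/s

def round_trip_ms(distance_km: float) -> float:
    """One round trip: the light has to cover the distance twice."""
    return 2 * distance_km / LIGHT_SPEED_IN_GLASS_KM_PER_MS

def scsi_read_ms(distance_km: float) -> float:
    return 1 * round_trip_ms(distance_km)   # request + data: 1 round trip

def scsi_write_ms(distance_km: float) -> float:
    return 2 * round_trip_ms(distance_km)   # xfer-ready handshake + data: 2 round trips

for km in (1, 10, 25, 100, 300):
    print(f"{km:>4} km: round trip {round_trip_ms(km):.2f} ms, "
          f"read {scsi_read_ms(km):.2f} ms, write {scsi_write_ms(km):.2f} ms")
```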


SVC Split I/O Group Quorum Disk

SVC creates three quorum disk candidates on the first three managed MDisks
One quorum disk is active
SVC 5.1 and later:
SVC is able to handle quorum disk management in a very flexible way, but in a Split I/O Group configuration a well-defined setup is required
-> Disable the dynamic quorum feature using the override flag (V6.2 and later):
svctask chquorum -MDisk <mdisk_id or name> -override yes
This flag is currently not configurable in the GUI

Split brain situation:
SVC uses the quorum disk to decide which SVC node(s) should survive
No access to the active quorum disk:
In a standard situation (no split brain), SVC will select one of the other quorum candidates as the active quorum

SVC Split I/O Group Quorum Disk (cont.)

Quorum disk requirements:
Only certain storage is supported
Must be placed in a third, independent site
Storage box must be Fibre Channel connected
ISLs with one hop to the quorum storage system are supported
Supported infrastructure:
WDM equipment similar to Metro Mirror
Link requirements similar to Metro Mirror
Max round-trip delay time is 80 ms (40 ms in each direction)
FCIP to the quorum disk can be used with the following requirements:
Max round-trip delay time is 80 ms (40 ms in each direction)
If fabrics are not merged, routers are required
Independent long-distance equipment from each site to site 3 is required
iSCSI storage is not supported
Requirement for active / passive storage devices (like DS3/4/5K):
Each quorum disk storage controller must be connected to both sites

3rd-site quorum supports Extended Quorum


Split I/O Group without ISLs between SVC nodes

Split I/O Group without ISLs between SVC nodes (classic Split I/O Group)
SVC 6.2 and earlier:
Two ports on each SVC node need to be connected to the remote switch
No ISLs between SVC nodes
Third site required for the quorum disk
ISLs with max. 1 hop can be used for storage traffic and quorum disk attachment

SVC 6.2 (late) update:
Distance extension to max. 40 km with passive WDM devices
Up to 20 km at 4Gb/s or up to 40 km at 2Gb/s
Longwave SFPs required for long distances
Longwave SFPs must be supported by the switch and WDM vendor

SVC 6.3:
Similar to the support statement in SVC 6.2
Additionally: support for active WDM devices
Quorum disk requirements similar to Remote Copy (MM/GM) requirements:
Max. 80 ms round-trip delay time, 40 ms in each direction
FCIP connectivity supported for the quorum disk
No support for iSCSI storage systems

Minimum distance   Maximum distance   Maximum link speed
>= 0 km            10 km              8 Gbps
> 10 km            20 km              4 Gbps
> 20 km            40 km              2 Gbps


Split I/O Group without ISLs between SVC nodes

[Diagram: Site 1 with Server 1, Switches 1 and 2, SVC node 1 and storage; Site 2 with Server 2, Switches 3 and 4, SVC node 2 and storage; active quorum at Site 3]

Split I/O Group without ISLs between SVC nodes

[Diagram: as before, with server ISLs between Switch 1/Switch 3 and Switch 2/Switch 4; Storage 3 at Site 1, Storage 2 at Site 2; active quorum at Site 3 behind Switches 5 and 6]

Split I/O Group without ISLs between SVC nodes

[Diagram: as before, with the active quorum on a DS4700 at Site 3; controllers A and B are attached behind Switches 5 and 6]

SAN and Buffer-to-Buffer Credits

Buffer-to-Buffer (B2B) credits:
Are used as a flow-control method by Fibre Channel and represent the number of frames a port can store
Enough credits must be available for best performance
Light must cover the distance twice:
Data is sent from Node 1 to Node 2
The acknowledgement is sent from Node 2 back to Node 1
The B2B credit calculation depends on link speed and distance:
The number of frames in flight increases in proportion to the link speed

Split I/O Group without ISLs: Long distance configuration

SVC Buffer-to-Buffer credits:
2145-CF8 / CG8 nodes have 41 B2B credits
Enough for 10 km at 8Gb/sec with a 2 KB payload
All earlier models:
Use 1/2/4Gb/sec Fibre Channel adapters
Have 8 B2B credits, which is enough for 4 km at 4Gb/sec
Recommendation 1:
Use CF8 / CG8 nodes for more than 4 km distance for best performance
Recommendation 2:
SAN switches do not auto-negotiate B2B credits, and 8 B2B credits is the default setting, so change the B2B credits in the switch to 41 as well

Link speed   FC frame length   Required B2B credits for 10 km distance   Max distance with 8 B2B credits
1 Gb/sec     1 km              5                                         16 km
2 Gb/sec     0.5 km            10                                        8 km
4 Gb/sec     0.25 km           20                                        4 km
8 Gb/sec     0.125 km          40                                        2 km
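The table can be reproduced with a small sketch (ours, not from the presentation). The per-speed frame-length values are taken from the table; the relationship of one credit per two frame-lengths of link distance is inferred so that it matches the table's numbers, including the 41-credit / 10 km / 8Gb/sec figure quoted above.

```python
import math

# Frame length on the fibre per link speed (Gb/sec -> km), from the table above.
FRAME_LENGTH_KM = {1: 1.0, 2: 0.5, 4: 0.25, 8: 0.125}

def required_credits(link_speed_gb: int, distance_km: float) -> int:
    """B2B credits needed to keep a link of this length busy (per the table)."""
    return math.ceil(distance_km / (2 * FRAME_LENGTH_KM[link_speed_gb]))

def max_distance_km(link_speed_gb: int, credits: int) -> float:
    """Longest link a given number of credits can keep busy (per the table)."""
    return credits * 2 * FRAME_LENGTH_KM[link_speed_gb]

for speed in (1, 2, 4, 8):
    print(f"{speed} Gb/sec: {required_credits(speed, 10)} credits needed for 10 km, "
          f"{max_distance_km(speed, 8):.0f} km max with 8 credits, "
          f"{max_distance_km(speed, 41):.2f} km max with 41 credits")
```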


Split I/O Group with ISLs between SVC nodes

[Diagram: Site 1 with Servers 1 and 2, public and private SANs 1 and 2, SVC-01 and storage; Site 2 with Servers 3 and 4, public and private SANs 1 and 2, SVC-02 and storage; WDM-extended ISLs linking the public and private SANs between the sites; quorum candidates at both sites and the active quorum (controllers A and B) behind switches at Site 3]

Long distance with ISLs between SVC nodes

Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one round trip's worth of latency for SCSI write commands
These devices are already supported for use with SVC:
No benefit or impact for inter-node communication
Does benefit host to remote SVC I/Os
Does benefit SVC to remote storage controller I/Os

Consequences:
Metro Mirror is deployed for shorter distances (up to 300km)
Global Mirror is used for longer distances
The supported Split I/O Group distance will depend on application latency restrictions:
100km for live data mobility (150km with distance extenders)
300km for fail-over / recovery scenarios
SVC supports up to 80ms latency, far greater than most application workloads would tolerate

Split I/O Group Configuration: Examples

[Diagram: same Split I/O Group with ISLs layout as on the previous chart]

Example 1)
Configuration with live data mobility:
VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 12km
-> SVC 6.3: Configurations with or without ISLs are supported
-> SVC 6.2: Only the configuration without ISLs is supported

Example 2)
Configuration with live data mobility:
VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 70km
-> Only SVC 6.3 Split I/O Group with ISLs is supported

Example 3)
Configuration without live data mobility:
VMware ESX with SRM, AIX HACMP, or MS Cluster
Distance between sites: 180km
-> Only SVC 6.3 Split I/O Group with ISLs is supported, or
-> Metro Mirror configuration
Because of the long distance: only in an active / passive configuration
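The decision logic behind these examples can be sketched as follows. This is an illustrative summary (ours) of the distance rules stated in this presentation, not an official support matrix; the function name and structure are assumptions made for the example.

```python
def supported_split_io_group_options(distance_km: float,
                                     live_data_mobility: bool,
                                     distance_extenders: bool = False) -> list[str]:
    """Summarize which configurations the presentation's distance rules would allow."""
    options = []
    if distance_km <= 40:
        # Classic configuration: no ISLs between nodes, passive WDM (SVC 6.2)
        # or active WDM (SVC 6.3); link speed drops as distance grows.
        options.append("Split I/O Group without ISLs (up to 40 km)")
    mobility_limit_km = 150 if distance_extenders else 100
    if live_data_mobility:
        if distance_km <= mobility_limit_km:
            options.append("SVC 6.3 Split I/O Group with ISLs (live data mobility)")
    elif distance_km <= 300:
        options.append("SVC 6.3 Split I/O Group with ISLs (fail-over / recovery)")
        options.append("Metro Mirror, active / passive configuration")
    if not options:
        options.append("Too far for a stretched cluster - consider Global Mirror")
    return options

# The three examples on this chart:
print(supported_split_io_group_options(12, live_data_mobility=True))
print(supported_split_io_group_options(70, live_data_mobility=True))
print(supported_split_io_group_options(180, live_data_mobility=False))
```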

Split I/O Group - Disaster Recovery

Split I/O Groups provide distributed HA functionality
Use of Metro Mirror / Global Mirror is recommended for disaster protection
Both major Split I/O Group sites must be connected to the MM / GM infrastructure
Without ISLs between SVC nodes:
All SVC ports can be used for MM / GM connectivity
With ISLs between SVC nodes:
Only MM / GM connectivity to the public SAN network is supported
Only 2 FC ports per SVC node will be available for MM or GM, and these are also used for host-to-SVC and SVC-to-disk-system I/O
Thus capability is currently limited:
Congestion on GM ports would affect host I/O, but not node-to-node traffic (heartbeats, etc.)
Might need more than one cluster to handle all traffic
More expensive, more ports and paths to deal with

Summary

SVC Split I/O Group:
Is a very powerful solution for automatic and fast handling of storage failures
Transparent to servers
A perfect fit in a virtualized environment (e.g. VMware VMotion, AIX Live Partition Mobility)
Transparent to all OS-based clusters
Distances up to 300 km (SVC 6.3) are supported

Two possible scenarios:
Without ISLs between SVC nodes (classic SVC Split I/O Group):
Up to 40 km distance, with support for active (SVC 6.3) and passive (SVC 6.2) WDM
With ISLs between SVC nodes:
Up to 100 km distance for live data mobility (150 km with distance extenders)
Up to 300 km for fail-over / recovery scenarios

Long-distance performance impact can be optimized by:
Load distribution across both sites
Appropriate SAN Buffer-to-Buffer credits