think that node03 is down and try to start SG03 on them. This may lead to data corruption,
as the same service group may then be online on two systems.
5. Failover due to resource fault or operator request would still work.
Recovery
To recover from jeopardy, simply restore the failed link; GAB automatically detects the new link
and removes node03 from the jeopardy membership.
2. Network partition
Now consider the case where the last link also fails (note that this happens while node03 is
already in jeopardy membership). In that case, two mini-clusters are formed.
I/O fencing
VCS implements an I/O fencing mechanism to avoid a possible split-brain condition. It ensures data
integrity and data protection. The I/O fencing driver uses SCSI-3 PGR (Persistent Group Reservations)
to fence off the data disks in case of a possible split-brain scenario. Persistent group reservations
persist across SCSI bus resets and support multipathing from host to disk.
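The PGR operations the fencing driver relies on can be modeled in a few lines. This is an illustrative sketch only (the class and method names are assumptions, not the real vxfen driver): each disk holds a set of registered keys, and a registered node can preempt another node's key, after which the ejected node can no longer issue I/O to that disk.

```python
class PGRDisk:
    """Toy model of a disk supporting SCSI-3 Persistent Group Reservations."""

    def __init__(self):
        # Registered keys persist across SCSI bus resets on a real PGR disk.
        self.keys = set()

    def register(self, key):
        self.keys.add(key)

    def preempt_and_abort(self, winner_key, victim_key):
        # A registered winner ejects the victim's registration; the victim
        # is then fenced off from this disk.
        if winner_key in self.keys:
            self.keys.discard(victim_key)
            return True
        return False

disk = PGRDisk()
disk.register("A")                 # node01 registers key A
disk.register("B")                 # node02 registers key B
disk.preempt_and_abort("A", "B")   # node01 ejects node02's key
print(disk.keys)                   # {'A'}
```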
Coordinator disks
Coordinator disks store the registration key of each host; these keys determine which node stays
in the cluster in a possible split-brain scenario. When a split brain occurs, the race on the
coordinator disks lets the fencing driver ensure that only one mini-cluster survives.
Data disks
The disks used in shared storage for VCS are automatically fenced off as they are discovered and
configured under VxVM.
Now consider various scenarios and how fencing works to avoid any data corruption.
In case of a possible split brain
As shown in the figure above, assume that node01 has key A and node02 has key B.
1. Both nodes think that the other node has failed and start racing to write their keys to the
coordinator disks.
2. node01 manages to write its key to the majority of the disks, i.e. 2 of the 3 coordinator disks.
3. node02 loses the race and panics.
4. node01 now has a perfect membership, so the service groups from node02 can be started
on node01.
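The majority race in the steps above can be sketched as follows. This is a simplified simulation under assumed names (not the vxfen implementation): each entry in the input says which node reached a given coordinator disk first, the node winning the majority (2 of 3) survives, and the loser panics itself.

```python
def race(winner_order):
    """Simulate the coordinator-disk race between node01 and node02.

    winner_order: one entry per coordinator disk, naming the node that
    reached that disk first. Returns (survivor, loser).
    """
    wins = {"node01": 0, "node02": 0}
    for first in winner_order:
        wins[first] += 1
    survivor = max(wins, key=wins.get)   # majority of the 3 disks wins
    loser = "node02" if survivor == "node01" else "node01"
    return survivor, loser               # the loser panics itself

survivor, loser = race(["node01", "node02", "node01"])
print(survivor, "survives;", loser, "panics")
```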
In case of a node failure
Assume that node02 fails as shown in the diagram above.
1. node01 detects no heartbeats from node02 and starts racing to register its keys on the
coordinator disks and eject the keys of node02.
2. node01 wins the (uncontested) race, forming a perfect cluster membership.
3. VCS can thus fail over any service group that was on node02 to node01.
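The uncontested race after a node failure can be sketched like this (illustrative names, assumed helper): the surviving node registers its key on every coordinator disk and ejects the dead node's keys, after which it owns all three disks and holds a perfect membership.

```python
def claim_all(disks, my_key, dead_key):
    """Register my_key on every coordinator disk and eject dead_key."""
    for keys in disks:           # `keys` is the key set on one coordinator disk
        keys.add(my_key)         # register the surviving node's key
        keys.discard(dead_key)   # eject the failed node's key
    # True once the dead node's key is gone from every disk
    return all(dead_key not in keys for keys in disks)

disks = [{"B"}, {"B"}, {"B"}]    # node02's keys, left behind when it failed
assert claim_all(disks, "A", "B")
print(disks)                     # node01's key A now on all three disks
```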
In case of manual seeding after reboot in a network partition
Consider a case where there is already a network partition and a node [node02] reboots. At this point
the rebooted node cannot join the cluster: the gabtab file specifies that a minimum of 2 nodes must
be communicating before VCS starts, and node02 cannot communicate with the other node due to the
network partition.
1. node02 reboots and a user manually forces GAB on node02 to seed the node.
2. node02 detects the pre-existing keys of node01 on the coordinator disks and thus learns of
the existing network partition. The I/O fencing driver therefore prevents HAD from starting
and outputs an error on the console about the pre-existing network partition.
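The pre-existing-key check can be sketched as below (the function name and message text are assumptions for illustration): if the fencing driver finds another node's keys already registered on the coordinator disks, it refuses to let HAD start.

```python
def may_start_had(coordinator_keys, my_key):
    """Return False if foreign keys are already on the coordinator disks."""
    foreign = {k for keys in coordinator_keys for k in keys if k != my_key}
    if foreign:
        # In real VCS this surfaces as a console error about a
        # pre-existing split brain; HAD is not started.
        print("ERROR: pre-existing split brain, keys", sorted(foreign),
              "found on coordinator disks; HAD will not start")
        return False
    return True

# node02 (key B) reboots into a partition where node01 (key A) holds the disks:
print(may_start_had([{"A"}, {"A"}, {"A"}], my_key="B"))
```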
Summary
VCS ensures data integrity by using all of the following mechanisms.
1. I/O fencing: the recommended method; requires SCSI-3 PGR-compatible disks.
2. GAB seeding: prevents service groups from starting if nodes are not communicating, and
ensures a cluster membership is formed.
3. Jeopardy cluster membership.
4. Low-priority links: provide redundancy in case the high-priority links fail. In a network
partition where all high-priority links fail, a low-priority link can be promoted to a
high-priority link and used to form a jeopardy membership.