
Veritas Cluster Server (VCS) HOWTO:

===================================
$Id: VCS-HOWTO,v 1.25 2002/09/30 20:05:38 pzi Exp $
Copyright (c) Peter Ziobrzynski, pzi@pzi.net


Contents:
---------
- Copyright
- Thanks
- Overview
- VCS installation
- Summary of cluster queries
- Summary of basic cluster operations
- Changing cluster configuration
- Cluster logging
- Configuration of a test group and test resource type
- Installation of a test agent for a test resource
- Home directories service group configuration
- NIS service groups configuration
- Time synchronization services
- ClearCase configuration

Copyright:
----------

This HOWTO document may be reproduced and distributed in whole or in
part, in any medium physical or electronic, as long as this copyright
notice is retained on all copies. Commercial redistribution is allowed
and encouraged; however, the author would like to be notified of any
such distributions.

All translations, derivative works, or aggregate works incorporating
any part of this HOWTO document must be covered under this copyright notice.
That is, you may not produce a derivative work from a HOWTO and impose
additional restrictions on its distribution. Exceptions to these rules
may be granted under certain conditions.

In short, I wish to promote dissemination of this information through
as many channels as possible. However, I do wish to retain copyright
on this HOWTO document, and would like to be notified of any plans to
redistribute the HOWTO.

If you have questions, please contact me: Peter Ziobrzynski <pzi@pzi.net>


Thanks:
-------

- Veritas Software provided numerous consultations that led to the
cluster configuration described in this document.

- Parts of this document are based on the work I have done for
Kestrel Solutions, Inc.

- Basis Inc. for assisting in selecting hardware components and for help
in resolving installation problems.

- comp.sys.sun.admin Usenet community.


Overview:
---------

This document describes the configuration of a two-or-more-node Solaris
cluster using Veritas Cluster Server VCS 1.1.2 on Solaris 2.6. A number
of standard UNIX services are configured as cluster service groups:
user home directories, NIS naming services, and time synchronization (NTP).
In addition, a popular Software Configuration Management system from
Rational - ClearCase - is configured as a set of cluster service groups.

Configuring a software component as a cluster service group allows for
high availability of the application as well as load balancing (fail-over
or switch-over). Besides that, the cluster configuration makes it possible
to free a node in the network for upgrades, testing or reconfiguration
and then bring it back into service very quickly with little or no
additional work.

- Cluster topology.


The cluster topology used here is called clustered pairs. Two nodes
share a disk on a single shared SCSI bus. Both computers and the disk
are connected in a chain on the SCSI bus. Either differential or fast-wide
SCSI buses can be used. The SCSI host adapter in each node is assigned a
different SCSI id (called the initiator id) so that both computers can
coexist on the same bus.

+ Two Node Cluster with single disk:

Node  Node
 |    /
 |   /
 |  /
 | /
 |/
Disk

A single shared disk can be replaced by two disks each on its private
SCSI bus connecting both cluster nodes. This allows for disk mirroring
across disks and SCSI buses.
Note: the disk here can be understood as disk array or a disk pack.

+ Two Node Cluster with disk pair:

Node Node
|\ /|
| \ / |
| \ |
| / \ |
|/ \|
Disk Disk

A single pair can be extended by chaining an additional node and
connecting it to the pair with additional disks and SCSI buses. One or
more nodes can be added, creating an N-node configuration. The perimeter
nodes have two SCSI host adapters while the middle nodes have four.

+ Three Node Cluster:

Node Node Node
|\ /| |\ /|
| \ / | | \ / |
| \ | | \ |
| / \ | | / \ |
|/ \| |/ \|
Disk Disk Disk Disk

+ N Node Cluster:

Node Node Node Node
|\ /| |\ /|\ /|
| \ / | | \ / | \ / |
| \ | | \ | ...\ |
| / \ | | / \ | / \ |
|/ \| |/ \|/ \|
Disk Disk Disk Disk Disk

- Disk configuration.

Management of the shared storage of the cluster is performed with the
Veritas Volume Manager (VM). The VM controls which disks on the shared
SCSI bus are assigned to (owned by) which system. In Volume Manager,
disks are grouped into disk groups, and a group as a whole can be
assigned for access from one of the systems. The assignment can be
changed quickly, allowing for cluster fail/switch-over. Disks that
compose a disk group can be scattered across multiple disk enclosures
(packs, arrays) and SCSI buses. We used this feature to create disk
groups that contain VM volumes mirrored across devices. Below is a
schematic of 3 cluster nodes connected by SCSI buses to 4 disk packs
(we use Sun MultiPacks).

Node 0 is connected to Disk Pack 0 and Node 1 on one SCSI bus and
to Disk Pack 1 and Node 1 on a second SCSI bus. Disks 0 in Packs 0 and 1
are put into Disk group 0, disks 1 in Packs 0 and 1 are put into Disk
group 1, and so on for all the disks in the Packs. We have 4 9 GB disks
in each Pack, so we have 4 Disk groups between Nodes 0 and 1 that can be
switched from one node to the other.

Node 1 interfaces with Node 2 in the same way as with Node 0.
Two disk packs, Pack 2 and Pack 3, are configured with disk groups 4, 5,
6 and 7 as shared storage between the nodes. We have a total of 8 disk
groups in the cluster. Groups 0-3 are visible from Node 0 or 1 and
groups 4-7 from Node 1 or 2. Node 1 is in a privileged position and
can access all disk groups.


Node 0 Node 1 Node 2 ... Node N
------- ------------------- ------
|\ /| |\ /|
| \ / | | \ / |
| \ / | | \ / |
| \ / | | \ / |
| \ / | | \ / |
| \ / | | \ / |
| \ / | | \ / |
| \ | | \ |
| / \ | | / \ |
| / \ | | / \ |
| / \ | | / \ |
| / \ | | / \ |
| / \ | | / \ |
| / \ | | / \ |
|/ \| |/ \|
Disk Pack 0: Disk Pack 1: Disk Pack 2: Disk Pack 3:

Disk group 0: Disk group 4:
+----------------------+ +------------------------+
| Disk0 Disk0 | | Disk0 Disk0 |
+----------------------+ +------------------------+
Disk group 1: Disk group 5:
+----------------------+ +------------------------+
| Disk1 Disk1 | | Disk1 Disk1 |
+----------------------+ +------------------------+
Disk group 2: Disk group 6:
+----------------------+ +------------------------+
| Disk2 Disk2 | | Disk2 Disk2 |
+----------------------+ +------------------------+
Disk group 3: Disk group 7:
+----------------------+ +------------------------+
| Disk3 Disk3 | | Disk3 Disk3 |
+----------------------+ +------------------------+

- Hardware details:

Below is a detailed listing of the hardware configuration of the two
nodes. Sun part numbers are included so you can order the parts directly
from SunStore and put them on your Visa:

- E250:
+ Base: A26-AA
+ 2xCPU: X1194A
+ 2x256MB RAM: X7004A,
+ 4xUltraSCSI 9.1GB hard drive: X5234A
+ 100BaseT Fast/Wide UltraSCSI PCI adapter: X1032A
+ Quad Fastethernet controller PCI adapter: X1034A

- MultiPack:
+ 4x9.1GB 10000RPM disk
+ StorEdge MultiPack: SG-XDSK040C-36G

- Connections:

+ SCSI:
E250: E250:
X1032A-------SCSI----->Multipack<----SCSI---X1032A
X1032A-------SCSI----->Multipack<----SCSI---X1032A

+ VCS private LAN 0:
hme0----------Ethernet--->HUB<---Ethernet---hme0

+ VCS private LAN 1:
X1034A(qfe0)--Ethernet--->HUB<---Ethernet---X1034A(qfe0)

+ Cluster private LAN:
X1034A(qfe1)--Ethernet--->HUB<---Ethernet---X1034A(qfe1)

+ Public LAN:
X1034A(qfe2)--Ethernet--->HUB<---Ethernet---X1034A(qfe2)

Installation of VCS-1.1.2
----------------------------

Two systems are put into the cluster: foo_c and bar_c


- Set the scsi-initiator-id boot PROM environment variable to 5 on one
of the systems (say bar_c):

ok setenv scsi-initiator-id 5
ok boot -r

- Install Veritas Foundation Suite 3.0.1.

Follow Veritas manuals.


- Add entries to your c-shell environment:

set veritas = /opt/VRTSvmsa
setenv VMSAHOME $veritas
setenv MANPATH ${MANPATH}:$veritas/man
set path = ( $path $veritas/bin )


- Configure the Ethernet connections to use hme0 and qfe0 as the cluster
private interconnects. Do not create /etc/hostname.{hme0,qfe0}.
Configure qfe2 as the public LAN network and qfe1 as the main cluster
private network. The configuration files on foo_c:

/etc/hosts:
127.0.0.1 localhost
# public network (192.168.0.0/16):
192.168.1.40 bar
192.168.1.51 foo
# Cluster private network (network address 10.2.0.0/16):
10.2.0.1 bar_c
10.2.0.3 foo_c loghost

/etc/hostname.qfe1:
foo_c

/etc/hostname.qfe2:
foo

The configuration files on bar_c:

/etc/hosts:
127.0.0.1 localhost
# Public network (192.168.0.0/16):
192.168.1.40 bar
192.168.1.51 foo
# Cluster private network (network address 10.2.0.0/16):
10.2.0.1 bar_c loghost
10.2.0.3 foo_c

/etc/hostname.qfe1:
bar_c

/etc/hostname.qfe2:
bar

- Configure at least two VM disk groups on shared storage (Multipacks)
working on one of the systems (e.g. foo_c):

+ Create cluster volume groups spanning both multipacks
using vxdiskadm '1. Add or initialize one or more disks':

cluster1: c1t1d0 c2t1d0
cluster2: c1t2d0 c2t2d0
...

Name the VM disks like this:

cluster1: cluster101 cluster102
cluster2: cluster201 cluster202
...

You can do it for 4 disk groups with this script:

#!/bin/sh
for group in 1 2 3 4;do
    vxdisksetup -i c1t${group}d0
    vxdisksetup -i c2t${group}d0
    vxdg init cluster${group} cluster${group}01=c1t${group}d0
    vxdg -g cluster${group} adddisk \
        cluster${group}02=c2t${group}d0
done

+ Create volumes in each group, mirrored across both multipacks.
You can do it for 4 disk groups with this script:

#!/bin/sh
for group in 1 2 3 4;do
    vxassist -b -g cluster${group} make vol01 8g layout=mirror \
        cluster${group}01 cluster${group}02
done

+ or do all disk groups and volumes in one script:

#!/bin/sh
for group in 1 2 3 4;do
    vxdisksetup -i c1t${group}d0
    vxdisksetup -i c2t${group}d0
    vxdg init cluster${group} cluster${group}01=c1t${group}d0
    vxdg -g cluster${group} adddisk \
        cluster${group}02=c2t${group}d0
    vxassist -b -g cluster${group} make vol01 8g layout=mirror \
        cluster${group}01 cluster${group}02
done

+ Create veritas file systems on the volumes:

#!/bin/sh
for group in 1 2 3 4;do
mkfs -F vxfs /dev/vx/rdsk/cluster$group/vol01
done

+ Deport a group from one system: stop volume, deport a group:

# vxvol -g cluster2 stop vol01
# vxdg deport cluster2

+ Import a group and start its volume on the other system to
see if this works:

# vxdg import cluster2
# vxrecover -g cluster2 -sb

- With the shared storage configured it is important to know how to
manually move the volumes from one node of the cluster to the other.
I use a cmount command to do that. It is like an rc script with an
additional argument for the disk group.

To stop (deport) the group 1 on a node do:

# cmount 1 stop

To start (import) the group 1 on the other node do:

# cmount 1 start

The cmount script is as follows:

#!/bin/sh
set -x
group=$1
case $2 in
start)
vxdg import cluster$group
vxrecover -g cluster$group -sb
mount -F vxfs /dev/vx/dsk/cluster$group/vol01 /cluster$group
;;
stop)
umount /cluster$group
vxvol -g cluster$group stop vol01
vxdg deport cluster$group
;;
esac


- To remove all shared storage volumes and groups do:

#!/bin/sh
for group in 1 2 3 4; do
vxvol -g cluster$group stop vol01
vxdg destroy cluster$group
done

- Install VCS software:
(from install server on athena)

# cd /net/athena/export/arch/VCS-1.1.2/vcs_1_1_2a_solaris
# pkgadd -d . VRTScsga VRTSgab VRTSllt VRTSperl VRTSvcs VRTSvcswz clsp


+ correct the /etc/rc?.d scripts to be links:
If they are not symbolic links then it is hard to disable VCS
startup at boot. Once they are links, just renaming /etc/init.d/vcs
stops VCS from starting and stopping at boot.

cd /etc
rm rc0.d/K10vcs rc3.d/S99vcs
cd rc0.d
ln -s ../init.d/vcs K10vcs
cd ../rc3.d
ln -s ../init.d/vcs S99vcs

+ add -evacuate option to /etc/init.d/vcs:

This is optional, but I find it important to switch over
all service groups from the node that is being shut down.
When I take a cluster node down I expect the rest of the
cluster to pick up the responsibility of running all services.
The default VCS does not do that. Otherwise, the only way to move
a group from one node to another is to crash the node or do a
manual switch-over using the hagrp command.

'stop')
$HASTOP -local -evacuate > /dev/null 2>&1
;;

- Add entries to your c-shell environment:

set vcs = /opt/VRTSvcs
setenv MANPATH ${MANPATH}:$vcs/man
set path = ( $vcs/bin $path )

- To remove the VCS software:
NOTE: required if demo installation fails.

# sh /opt/VRTSvcs/wizards/config/quick_start -b
# rsh bar_c 'sh /opt/VRTSvcs/wizards/config/quick_start -b'
# pkgrm VRTScsga VRTSgab VRTSllt VRTSperl VRTSvcs VRTSvcswz clsp
# rm -rf /etc/VRTSvcs /var/VRTSvcs
# init 6

- Configure /.rhosts on both nodes to allow each node transparent rsh
root access to the other:

/.rhosts:

foo_c
bar_c

- Run quick start script from one of the nodes:
NOTE: must run from /usr/openwin/bin/xterm - other xterms cause terminal
emulation problems

# /usr/openwin/bin/xterm &
# sh /opt/VRTSvcs/wizards/config/quick_start

Select the hme0 and qfe0 network links for the GAB and LLT connections.
The script will ask twice for the link interface names. Link 1 is hme0
and link 2 is qfe0 for both the foo_c and bar_c nodes.

You should see the heartbeat pings on the interconnection hubs.

The wizard creates LLT and GAB configuration files in /etc/llttab,
/etc/gabtab and /etc/llthosts on each system:

On foo_c:

/etc/llttab:

set-node foo_c
link hme0 /dev/hme:0
link qfe1 /dev/qfe:1
start

On bar_c:

/etc/llttab:

set-node bar_c
link hme0 /dev/hme:0
link qfe1 /dev/qfe:1
start

/etc/gabtab:

/sbin/gabconfig -c -n2


/etc/llthosts:

0 foo_c
1 bar_c

The LLT and GAB communication is started by rc scripts S70llt and S92gab
installed in /etc/rc2.d.
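The llthosts file maps LLT node ids to node names, and it is sometimes
handy to look an id up in a script. A minimal sketch, assuming the
"<id> <name>" format shown above (the helper and the temporary file are
mine, not part of VCS):

```shell
#!/bin/sh
# Hypothetical helper: look up a node's LLT node id from a file in the
# llthosts format shown above.  A temporary copy of the file is created
# here so the sketch is self-contained; in real use point awk at
# /etc/llthosts.
f=/tmp/llthosts.$$
cat > $f << 'EOF'
0 foo_c
1 bar_c
EOF
id=`awk -v n=bar_c '$2 == n { print $1 }' $f`
rm -f $f
echo "bar_c has LLT node id $id"
```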

- Alternatively, the private interconnect can be configured by hand by
creating the above files.

- Check basic installation:

+ status of the gab:

# gabconfig -a

GAB Port Memberships
===============================================================
Port a gen 1e4c0001 membership 01
Port h gen dd080001 membership 01

+ status of the link:

# lltstat -n

LLT node information:
Node State Links
* 0 foo_c OPEN 2
1 bar_c OPEN 2


+ node parameters:

# hasys -display
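These checks can be scripted. A sketch of a quick GAB health check that
parses output in the format shown above (the sample text and the awk
one-liner are my own; in real use, pipe live `gabconfig -a` output into
the same awk):

```shell
#!/bin/sh
# Sketch: verify that GAB port a (GAB membership) and port h (VCS
# engine) agree on cluster membership, parsing a captured sample of
# `gabconfig -a` output.
sample='GAB Port Memberships
===============================================================
Port a gen 1e4c0001 membership 01
Port h gen dd080001 membership 01'

# Collect the membership column of every "Port" line; a healthy
# two-node cluster shows the same membership (01) on both ports.
memberships=`echo "$sample" | awk '$1 == "Port" { print $6 }' | sort -u`
if [ "$memberships" = "01" ]; then
    echo "GAB OK: both ports see membership 01"
else
    echo "GAB membership mismatch: $memberships"
fi
```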

- Set/update VCS super user password:

+ add root user:

# haconf -makerw
# hauser -add root
password:...
# haconf -dump -makero

+ change root password:

# haconf -makerw
# hauser -update root
password:...
# haconf -dump -makero

- Configure demo NFS service groups:

NOTE: You have to fix the VCS wizards first: the wizard perl scripts
have a bug that makes them core dump in the middle of filling out the
configuration forms. The solution is to provide a shell wrapper for one
binary and avoid running it with a specific set of parameters. Do the
following in VCS-1.1.2:

# cd /opt/VRTSvcs/bin
# mkdir tmp
# mv iou tmp
# cat << 'EOF' > iou
#!/bin/sh
echo "[$@]" >> /tmp/,.iou.log
case "$@" in
'-c 20 9 -g 2 2 3 -l 0 3') echo "skip bug" >> /tmp/,.iou.log
;;
*) /opt/VRTSvcs/bin/tmp/iou "$@" ;;
esac
EOF
# chmod 755 iou

+ Create NFS mount point directories on both systems:

# mkdir /export1 /export2

+ Run the wizard on foo_c node:

NOTE: must run from /usr/openwin/bin/xterm - other xterms cause
terminal emulation problems

# /usr/openwin/bin/xterm &
# sh /opt/VRTSvcs/wizards/services/quick_nfs

Select for groupx:
- public network device: qfe2
- group name: groupx
- IP: 192.168.1.53
- VM disk group: cluster1
- volume: vol01
- mount point: /export1
- options: rw
- file system: vxfs

Select for groupy:
- public network device: qfe2
- group name: groupy
- IP: 192.168.1.54
- VM disk group: cluster2
- volume: vol01
- mount point: /export2
- options: rw
- file system: vxfs

You should see: Congratulations!...

The /etc/VRTSvcs/conf/config directory should have main.cf and
types.cf files configured.

+ Reboot both systems:

# init 6


Summary of cluster queries:
----------------------------

- Cluster queries:

+ list cluster status summary:

# hastatus -summary

-- SYSTEM STATE
-- System State Frozen

A foo_c RUNNING 0
A bar_c RUNNING 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B groupx foo_c Y N ONLINE
B groupx bar_c Y N OFFLINE
B groupy foo_c Y N OFFLINE
B groupy bar_c Y N ONLINE
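The "B" (group state) lines above are easy to post-process when you need
to know where a group currently runs. A small sketch, using a captured
sample of the output rather than a live `hastatus -summary` call:

```shell
#!/bin/sh
# Sketch: find which system a group is ONLINE on by parsing the
# group-state ("B ...") lines of `hastatus -summary` output.
sample='B groupx foo_c Y N ONLINE
B groupx bar_c Y N OFFLINE
B groupy foo_c Y N OFFLINE
B groupy bar_c Y N ONLINE'

group=groupy
# Fields: B <group> <system> <probed> <autodisabled> <state>
node=`echo "$sample" | awk -v g="$group" '$1 == "B" && $2 == g && $6 == "ONLINE" { print $3 }'`
echo "$group is ONLINE on $node"
```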

+ list cluster attributes:

# haclus -display
#Attribute Value
ClusterName my_vcs
CompareRSM 0
CounterInterval 5
DumpingMembership 0
Factor runque 5 memory 1 disk 10 cpu 25 network 5
GlobalCounter 16862
GroupLimit 200
LinkMonitoring 0
LoadSampling 0
LogSize 33554432
MajorVersion 1
MaxFactor runque 100 memory 10 disk 100 cpu 100 network 100
MinorVersion 10
PrintMsg 0
ReadOnly 1
ResourceLimit 5000
SourceFile ./main.cf
TypeLimit 100
UserNames root cDgqS68RlRP4k



- Resource queries:

+ list resources:

# hares -list
cluster1 foo_c
cluster1 bar_c
IP_192_168_1_53 foo_c
IP_192_168_1_53 bar_c
...

+ list resource dependencies:

# hares -dep
#Group Parent Child
groupx IP_192_168_1_53 groupx_qfe1
groupx IP_192_168_1_53 nfs_export1
groupx export1 cluster1_vol01
groupx nfs_export1 NFS_groupx_16
groupx nfs_export1 export1
groupx cluster1_vol01 cluster1
groupy IP_192_168_1_54 groupy_qfe1
groupy IP_192_168_1_54 nfs_export2
groupy export2 cluster2_vol01
groupy nfs_export2 NFS_groupy_16
groupy nfs_export2 export2
groupy cluster2_vol01 cluster2

+ list attributes of a resource:
# hares -display export1
#Resource Attribute System Value
export1 ConfidenceLevel foo_c 100
export1 ConfidenceLevel bar_c 0
export1 Probed foo_c 1
export1 Probed bar_c 1
export1 State foo_c ONLINE
export1 State bar_c OFFLINE
export1 ArgListValues foo_c /export1
/dev/vx/dsk/cluster1/vol01 vxfs rw ""
...


- Groups queries:

+ list groups:

# hagrp -list
groupx foo_c
groupx bar_c
groupy foo_c
groupy bar_c


+ list group resources:


# hagrp -resources groupx
cluster1
IP_192_168_1_53
export1
NFS_groupx_16
groupx_qfe1
nfs_export1
cluster1_vol01

+ list group dependencies:

# hagrp -dep groupx


+ list of group attributes:

# hagrp -display groupx
#Group Attribute System Value
groupx AutoFailOver global 1
groupx AutoStart global 1
groupx AutoStartList global foo_c
groupx FailOverPolicy global Priority
groupx Frozen global 0
groupx IntentOnline global 1
groupx ManualOps global 1
groupx OnlineRetryInterval global 0
groupx OnlineRetryLimit global 0
groupx Parallel global 0
groupx PreOnline global 0
groupx PrintTree global 1
groupx SourceFile global ./main.cf
groupx SystemList global foo_c 0 bar_c 1
groupx SystemZones global
groupx TFrozen global 0
groupx TriggerEvent global 1
groupx UserIntGlobal global 0
groupx UserStrGlobal global
groupx AutoDisabled foo_c 0
groupx AutoDisabled bar_c 0
groupx Enabled foo_c 1
groupx Enabled bar_c 1
groupx ProbesPending foo_c 0
groupx ProbesPending bar_c 0
groupx State foo_c |ONLINE|
groupx State bar_c |OFFLINE|
groupx UserIntLocal foo_c 0
groupx UserIntLocal bar_c 0
groupx UserStrLocal foo_c
groupx UserStrLocal bar_c


- Node queries:

+ list nodes in the cluster:

# hasys -list
foo_c
bar_c

+ list node attributes:

# hasys -display bar_c
#System Attribute Value
bar_c AgentsStopped 1
bar_c ConfigBlockCount 54
bar_c ConfigCheckSum 48400
bar_c ConfigDiskState CURRENT
bar_c ConfigFile /etc/VRTSvcs/conf/config
bar_c ConfigInfoCnt 0
bar_c ConfigModDate Wed Mar 29 13:46:19 2000
bar_c DiskHbDown
bar_c Frozen 0
bar_c GUIIPAddr
bar_c LinkHbDown
bar_c Load 0
bar_c LoadRaw runque 0 memory 0 disk 0 cpu 0 network 0
bar_c MajorVersion 1
bar_c MinorVersion 10
bar_c NodeId 1
bar_c OnGrpCnt 1
bar_c SourceFile ./main.cf
bar_c SysName bar_c
bar_c SysState RUNNING
bar_c TFrozen 0
bar_c UserInt 0
bar_c UserStr

- Resource types queries:

+ list resource types:
# hatype -list
CLARiiON
Disk
DiskGroup
ElifNone
FileNone
FileOnOff
FileOnOnly
IP
IPMultiNIC
Mount
MultiNICA
NFS
NIC
Phantom
Process
Proxy
ServiceGroupHB
Share
Volume

+ list all resources of a given type:
# hatype -resources DiskGroup
cluster1
cluster2

+ list attributes of the given type:
# hatype -display IP
#Type Attribute Value
IP AgentFailedOn
IP AgentReplyTimeout 130
IP AgentStartTimeout 60
IP ArgList Device Address NetMask Options ArpDelay IfconfigTwice
IP AttrChangedTimeout 60
IP CleanTimeout 60
IP CloseTimeout 60
IP ConfInterval 600
IP LogLevel error
IP MonitorIfOffline 1
IP MonitorInterval 60
IP MonitorTimeout 60
IP NameRule IP_ + resource.Address
IP NumThreads 10
IP OfflineTimeout 300
IP OnlineRetryLimit 0
IP OnlineTimeout 300
IP OnlineWaitLimit 2
IP OpenTimeout 60
IP Operations OnOff
IP RestartLimit 0
IP SourceFile ./types.cf
IP ToleranceLimit 0
- Agents queries:

+ list agents:
# haagent -list
CLARiiON
Disk
DiskGroup
ElifNone
FileNone
FileOnOff
FileOnOnly
IP
IPMultiNIC
Mount
MultiNICA
NFS
NIC
Phantom
Process
Proxy
ServiceGroupHB
Share
Volume

+ list status of an agent:
# haagent -display IP
#Agent Attribute Value
IP AgentFile
IP Faults 0
IP Running Yes
IP Started Yes


Summary of basic cluster operations:
------------------------------------

- Cluster Start/Stop:

+ stop VCS on all systems:
# hastop -all

+ stop VCS on bar_c and move all groups out:
# hastop -sys bar_c -evacuate

+ start VCS on local system:
# hastart

- Users:
+ add gui root user:
# haconf -makerw
# hauser -add root
# haconf -dump -makero
- Group:

+ group start, stop:
# hagrp -offline groupx -sys foo_c
# hagrp -online groupx -sys foo_c

+ switch a group to other system:
# hagrp -switch groupx -to bar_c

+ freeze a group:
# hagrp -freeze groupx

+ unfreeze a group:
# hagrp -unfreeze groupx

+ enable a group:
# hagrp -enable groupx

+ disable a group:
# hagrp -disable groupx

+ enable the resources of a group:
# hagrp -enableresources groupx

+ disable the resources of a group:
# hagrp -disableresources groupx

+ flush a group:
# hagrp -flush groupx -sys bar_c

- Node:

+ freeze a node:
# hasys -freeze bar_c

+ thaw a node:
# hasys -unfreeze bar_c

- Resources:

+ online a resource:
# hares -online IP_192_168_1_54 -sys bar_c

+ offline a resource:
# hares -offline IP_192_168_1_54 -sys bar_c

+ offline a resource and propagate to its children:
# hares -offprop IP_192_168_1_54 -sys bar_c

+ probe a resource:
# hares -probe IP_192_168_1_54 -sys bar_c

+ clear a faulted resource:
# hares -clear IP_192_168_1_54 -sys bar_c

- Agents:

+ start agent:
# haagent -start IP -sys bar_c

+ stop agent:
# haagent -stop IP -sys bar_c


- Reboot a node with evacuation of all service groups:
(groupy is running on bar_c)

# hastop -sys bar_c -evacuate
# init 6
# hagrp -switch groupy -to bar_c



Changing cluster configuration:
--------------------------------

You cannot edit the configuration files directly while the
cluster is running; that can be done only when the cluster is down.
The configuration files are in: /etc/VRTSvcs/conf/config

To change the configuration you can:

+ use hagui
+ stop the cluster (hastop), edit main.cf and types.cf directly,
regenerate main.cmd (hacf -generate .) and start the cluster (hastart)
+ use the following command line based procedure on running cluster

To change the cluster while it is running do this:

- Dump current cluster configuration to files and generate main.cmd file:

# haconf -dump
# hacf -generate .
# hacf -verify .

- Create new configuration directory:

# mkdir -p ../new

- Copy existing *.cf files in there:

# cp main.cf types.cf ../new

- Add new stuff to it:

# vi main.cf types.cf

- Regenerate the main.cmd file with low level commands:

# cd ../new
# hacf -generate .
# hacf -verify .

- Catch the diffs:

# diff ../config/main.cmd main.cmd > ,.cmd

- Prepend this to the top of the ,.cmd file to make the configuration
read-write:

# haconf -makerw

- Append this command to make the configuration read-only again:

# haconf -dump -makero

- Apply the diffs you need:

# sh -x ,.cmd
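The prepend/append bracketing above can be scripted. A sketch of a
small wrapper (the ,.apply file name and the demo ,.cmd contents are my
own; with a real diff, skip the demo step and use your generated ,.cmd):

```shell
#!/bin/sh
# Sketch: wrap a generated command diff (,.cmd) in "haconf -makerw" /
# "haconf -dump -makero", producing ,.apply to feed to `sh -x`.
# A one-line demo ,.cmd is created here so the sketch is self-contained.
cat > ,.cmd << 'EOF'
hagrp -modify groupx OnlineRetryLimit 1
EOF

{
    echo 'haconf -makerw'
    cat ,.cmd
    echo 'haconf -dump -makero'
} > ,.apply

# Apply on the running cluster with: sh -x ,.apply
```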


Cluster logging:
-----------------------------------------------------

VCS logs all activities into /var/VRTSvcs/log directory.
The most important log is the engine log engine.log_A.
Each agent also has its own log file.

The logging parameters can be displayed with halog command:

# halog -info
Log on hades_c:
path = /var/VRTSvcs/log/engine.log_A
maxsize = 33554432 bytes
tags = ABCDE


Configuration of a test group and test resource type:
=======================================================

To get comfortable with the cluster configuration it is useful to
create your own group that uses your own resource. The example below
demonstrates the configuration of a "do nothing" group with one resource
of our own type.


- Add group test with one resource test. Add this to
/etc/VRTSvcs/conf/config/new/types.cf:

type Test (
str Tester
NameRule = resource.Name
int IntAttr
str StringAttr
str VectorAttr[]
str AssocAttr{}
static str ArgList[] = { IntAttr, StringAttr, VectorAttr,
AssocAttr }
)

- Add this to /etc/VRTSvcs/conf/config/new/main.cf:

group test (
SystemList = { foo_c, bar_c }
AutoStartList = { foo_c }
)

Test test (
IntAttr = 100
StringAttr = "Testing 1 2 3"
VectorAttr = { one, two, three }
AssocAttr = { one = 1, two = 2 }
)

- Run hacf -generate and diff as above. Edit the result to get the
,.cmd file:

haconf -makerw

hatype -add Test
hatype -modify Test SourceFile "./types.cf"
haattr -add Test Tester -string
hatype -modify Test NameRule "resource.Name"
haattr -add Test IntAttr -integer
haattr -add Test StringAttr -string
haattr -add Test VectorAttr -string -vector
haattr -add Test AssocAttr -string -assoc
hatype -modify Test ArgList IntAttr StringAttr VectorAttr AssocAttr
hatype -modify Test LogLevel error
hatype -modify Test MonitorIfOffline 1
hatype -modify Test AttrChangedTimeout 60
hatype -modify Test CloseTimeout 60
hatype -modify Test CleanTimeout 60
hatype -modify Test ConfInterval 600
hatype -modify Test MonitorInterval 60
hatype -modify Test MonitorTimeout 60
hatype -modify Test NumThreads 10
hatype -modify Test OfflineTimeout 300
hatype -modify Test OnlineRetryLimit 0
hatype -modify Test OnlineTimeout 300
hatype -modify Test OnlineWaitLimit 2
hatype -modify Test OpenTimeout 60
hatype -modify Test RestartLimit 0
hatype -modify Test ToleranceLimit 0
hatype -modify Test AgentStartTimeout 60
hatype -modify Test AgentReplyTimeout 130
hatype -modify Test Operations OnOff
haattr -default Test AutoStart 1
haattr -default Test Critical 1
haattr -default Test Enabled 1
haattr -default Test TriggerEvent 0
hagrp -add test
hagrp -modify test SystemList foo_c 0 bar_c 1
hagrp -modify test AutoStartList foo_c
hagrp -modify test SourceFile "./main.cf"
hares -add test Test test
hares -modify test Enabled 1
hares -modify test IntAttr 100
hares -modify test StringAttr "Testing 1 2 3"
hares -modify test VectorAttr one two three
hares -modify test AssocAttr one 1 two 2

haconf -dump -makero


- Feed it to sh:

# sh -x ,.cmd

- Both group test and resource Test should be added to the cluster


Installation of a test agent for a test resource:
-------------------------------------------------
This agent does not start or monitor any specific resource. It just
maintains its persistent state in the ,.on file. It can be used as a
template for other agents that perform some real work.

- in /opt/VRTSvcs/bin create Test directory

# cd /opt/VRTSvcs/bin
# mkdir Test

- link in the precompiled agent binary for script implemented methods:

# cd Test
# ln -s ../ScriptAgent TestAgent

- create dummy agent scripts in /opt/VRTSvcs/bin/Test:
(make them executable - chmod 755 ...)

online:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log
echo yes > /opt/VRTSvcs/bin/Test/,.on

offline:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log
echo no > /opt/VRTSvcs/bin/Test/,.on

open:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

close:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

shutdown:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

clean:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

monitor:
#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log
case "`cat /opt/VRTSvcs/bin/Test/,.on`" in
no) exit 100 ;;
*) exit 101 ;;
esac
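The monitor convention used above (exit 100 for offline, 101 for
online) can be exercised without VCS at all. A self-contained sketch
(the check_state helper and the /tmp state file are my own stand-ins
for the agent's ,.on file):

```shell
#!/bin/sh
# Self-contained check of the state convention used by the Test agent:
# monitor exits 100 when the ,.on file says "no" (offline) and 101
# (online) otherwise.  A temporary state file stands in for
# /opt/VRTSvcs/bin/Test/,.on.
state=/tmp/,.on.$$

check_state() {
    case "`cat $state`" in
    no) return 100 ;;   # resource offline
    *)  return 101 ;;   # resource online
    esac
}

echo yes > $state; check_state; on_rc=$?
echo no  > $state; check_state; off_rc=$?
rm -f $state
echo "online rc=$on_rc offline rc=$off_rc"
```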

- start the agent:

# haagent -start Test -sys foo_c

- distribute the agent code to other nodes:

# cd /opt/VRTSvcs/bin/
# rsync -av --rsync-path=/opt/pub/bin/rsync Test bar_c:/opt/VRTSvcs/bin

- start test group:

# hagrp -online test -sys foo_c

Note:

Distribution or synchronization of the agent code is very important for
cluster integrity. If the agents differ on various cluster nodes,
unpredictable things can happen. I maintain a shell script in the
Veritas agent directory (/opt/VRTSvcs/bin) to distribute the code of all
agents I work on:

#!/bin/sh
set -x
mkdir -p /tmp/vcs
for dest in hades_c:/opt/VRTSvcs/bin /tmp/vcs;do
    rsync -av --rsync-path=/opt/pub/bin/rsync --exclude=log \
        --exclude=,.on ,.sync CCViews CCVOBReg CCVOBMount ClearCase \
        Test CCRegistry NISMaster NISClient $dest
done
cd /tmp
tar cvf vcs.tar vcs


Home directories service group configuration:
=============================================

We configure home directories to be a service group consisting of an IP
address and the directory containing all home directories.
Users can consistently connect (telnet, rsh, etc.) to the logical IP and
expect to find their home directories local on the system.
The directory that we use is the source directory for the automounter,
which mounts the individual directories as needed under /home. We put
the home directories in /cluster3/homes and mount them with this
/etc/auto_home map:

* localhost:/cluster3/homes/&

We assume that all required user accounts are configured on all cluster
nodes. This can be done by hand, by rdisting the /etc/passwd and group
files, or by using NIS. We used both methods; the NIS one is described
below. All the resources of the group are standard VCS-supplied ones,
so we do not have to implement any agent code for additional resources.

Group 'homes' has the following resources (types in brackets):

homes:

IP_homes (IP)
| |
v v
share_homes (Share) qfe2_homes (NIC)
|
v
mount_homes (Mount)
|
v
volume_homes (Volume)
|
v
dgroup_homes (DiskGroup)

The service group definition for this group is as follows (main.cf):

group homes (
SystemList = { bar_c, foo_c }
AutoStartList = { bar_c }
)

DiskGroup dgroup_homes (
DiskGroup = cluster3
)

IP IP_homes (
Device = qfe2
Address = "192.168.1.55"
)

Mount mount_homes (
MountPoint = "/cluster3"
BlockDevice = "/dev/vx/dsk/cluster3/vol01"
FSType = vxfs
MountOpt = rw
)

Share share_homes (
PathName = "/cluster3"
Options = "-o rw=localhost"
OnlineNFSRestart = 0
OfflineNFSRestart = 0
)

NIC qfe2_homes (
Device = qfe2
NetworkType = ether
)

Volume volume_homes (
Volume = vol01
DiskGroup = cluster3
)

IP_homes requires qfe2_homes
IP_homes requires share_homes
mount_homes requires volume_homes
share_homes requires mount_homes
volume_homes requires dgroup_homes
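The "requires" statements above form a dependency graph that VCS walks
when bringing the group online or offline. As a sanity check, the same
pairs can be fed to the standard tsort utility, which prints a
parent-first (offline) ordering; reverse it for the online ordering.
This is just an illustration of the dependency structure, not something
VCS itself needs:

```shell
#!/bin/sh
# Sketch: feed the "parent child" pairs from the homes group to tsort.
# The output is a valid parent-first order; dgroup_homes, which nothing
# depends on it going first, comes out near the end.
order=`tsort << 'EOF'
IP_homes qfe2_homes
IP_homes share_homes
mount_homes volume_homes
share_homes mount_homes
volume_homes dgroup_homes
EOF`
echo "$order"
```

IP_homes is the only resource with no parent, so it is always printed
first.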

NIS service group configuration:
=================================

NIS is configured as two service groups: one for the NIS Master server
and the other for the NIS clients. The server is configured to store all
NIS source data files on the shared storage in /cluster1/yp directory.
We copied the following files to /cluster1/yp:

auto_home    ethers   mail.aliases  netmasks   protocols  services
auto_master  group    netgroup      networks   publickey  timezone
bootparams   hosts    netid         passwd     rpc

The Makefile in /var/yp required some changes to reflect the
non-default (other than /etc) location of the source files. Also, using
sendmail to generate new aliases while the NIS service was in the
process of starting up would hang, so we had to remove it from the
standard map generation. The limitation here is that new mail aliases
can only be added when NIS is completely running. The following diffs
have been applied to /var/yp/Makefile:

*** Makefile- Sun May 14 23:33:33 2000
--- Makefile.var.yp Fri May 5 07:38:02 2000
***************
*** 13,19 ****
# resolver for hosts not in the current domain.
#B=-b
B=
! DIR =/etc
#
# If the passwd, shadow and/or adjunct files used by rpc.yppasswdd
# live in directory other than /etc then you'll need to change the
--- 13,19 ----
# resolver for hosts not in the current domain.
#B=-b
B=
! DIR =/cluster1/yp
#
# If the passwd, shadow and/or adjunct files used by rpc.yppasswdd
# live in directory other than /etc then you'll need to change the
***************
*** 21,30 ****
# DO NOT indent the line, however, since /etc/init.d/yp attempts
# to find it with grep "^PWDIR" ...
#
! PWDIR =/etc
DOM = `domainname`
NOPUSH = ""
! ALIASES = /etc/mail/aliases
YPDIR=/usr/lib/netsvc/yp
SBINDIR=/usr/sbin
YPDBDIR=/var/yp
--- 21,30 ----
# DO NOT indent the line, however, since /etc/init.d/yp attempts
# to find it with grep "^PWDIR" ...
#
! PWDIR =/cluster1/yp
DOM = `domainname`
NOPUSH = ""
! ALIASES = /cluster1/yp/mail.aliases
YPDIR=/usr/lib/netsvc/yp
SBINDIR=/usr/sbin
YPDBDIR=/var/yp
***************
*** 45,51 ****
else $(MAKE) $(MFLAGS) -k all NOPUSH=$(NOPUSH);fi

all: passwd group hosts ethers networks rpc services protocols \
! netgroup bootparams aliases publickey netid netmasks c2secure \
timezone auto.master auto.home

c2secure:
--- 45,51 ----
else $(MAKE) $(MFLAGS) -k all NOPUSH=$(NOPUSH);fi

all: passwd group hosts ethers networks rpc services protocols \
! netgroup bootparams publickey netid netmasks \
timezone auto.master auto.home

c2secure:
***************
*** 187,193 ****
@cp $(ALIASES) $(YPDBDIR)/$(DOM)/mail.aliases;
@/usr/lib/sendmail -bi -oA$(YPDBDIR)/$(DOM)/mail.aliases;
$(MKALIAS) $(YPDBDIR)/$(DOM)/mail.aliases
$(YPDBDIR)/$(DOM)/mail.byaddr;
- @rm $(YPDBDIR)/$(DOM)/mail.aliases;
@touch aliases.time;
@echo "updated aliases";
@if [ ! $(NOPUSH) ]; then $(YPPUSH) -d $(DOM) mail.aliases; fi
--- 187,192 ----


We need only one master server, so only one instance of this service group
is allowed on the cluster (the group is not parallel).

Group 'nis_master' has the following resources (types in brackets):

nis_master:

master_NIS (NISMaster)
|
v
mount_NIS (Mount)
|
v
volume_NIS (Volume)
|
v
dgroup_NIS (DiskGroup)

The client service group is designed to configure the domain name on the
node and then start ypbind in broadcast mode. We need the NIS client to
run on every node so it is designed as a parallel group. Clients cannot
function without the Master server running somewhere on the cluster
network, so we include a dependency between the client and master service
groups as 'online global'.
The client group unconfigures NIS completely from the node when it is
shut down. This may seem radical but it is required for consistency
with the startup.

To allow the master group to come online we also include automatic
configuration of the domain name in this group.

The nis_master group is defined as follows (main.cf):

group nis_master (
SystemList = { bar_c, foo_c }
AutoStartList = { bar_c }
)

DiskGroup dgroup_NIS (
DiskGroup = cluster1
)

Mount mount_NIS (
MountPoint = "/cluster1"
BlockDevice = "/dev/vx/dsk/cluster1/vol01"
FSType = vxfs
MountOpt = rw
)

NISMaster master_NIS (
Source = "/cluster1/yp"
Domain = mydomain
)

Volume volume_NIS (
Volume = vol01
DiskGroup = cluster1
)

master_NIS requires mount_NIS
mount_NIS requires volume_NIS
volume_NIS requires dgroup_NIS


Group 'nis_client' has the following resource (types in brackets):

nis_client:

client_NIS (NISClient)

The nis_client group is defined as follows (main.cf):

group nis_client (
SystemList = { bar_c, foo_c }
Parallel = 1
AutoStartList = { bar_c, foo_c }
)

NISClient client_NIS (
Domain = mydomain
)

requires group nis_master online global


Both master and client service groups use custom-built resources and
corresponding agent code. The resources are defined as follows (in types.cf):

type NISClient (
static str ArgList[] = { Domain }
NameRule = resource.Name
str Domain
)

type NISMaster (
static str ArgList[] = { Source, Domain }
NameRule = resource.Name
str Source
str Domain
)

The agent code for NISMaster:

- online:
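A minimal dry-run sketch of the steps such an online script would have to
perform (hypothetical: the RUN=echo prefix just prints each command instead
of executing it, and the ypstart path and make overrides are illustrative;
the Source and Domain values match the resource definition above):

```shell
#!/bin/sh
# Hypothetical sketch of a NISMaster online agent (dry run).
# RUN=echo prints each step; remove it to execute for real.
RUN=echo
src=/cluster1/yp        # NISMaster Source attribute
domain=mydomain         # NISMaster Domain attribute

# Configure the domain name so ypserv can operate:
$RUN domainname $domain

# Rebuild the maps from the shared source directory (the overrides
# mirror the /var/yp/Makefile diffs shown earlier):
$RUN sh -c "cd /var/yp && make DIR=$src PWDIR=$src ALIASES=$src/mail.aliases"

# Start the NIS server processes:
$RUN /usr/lib/netsvc/yp/ypstart
```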


Time synchronization services (xntp):
======================================
,,,


ClearCase configuration:
=========================

ClearCase is a client-server system providing so-called multi-version
file system functionality. The mvfs file systems are used to track the
contents of files, directories and symbolic links in versions of
so-called elements. Elements are stored in VOBs (mvfs objects) and are
looked at using View objects. Information about objects, like their
location, permissions, etc., is stored in a distributed database called
the registry. For ClearCase to be configured on a system, the Registry,
VOB and View server processes have to be started. VOBs and Views store
their data in regular directory trees. The VOB and View storage
directories can be located on the shared storage of the cluster, and
cluster service groups configured to mount it and start the needed
server processes.

We configured ClearCase as a set of four service groups: ccregistry, views,
vobs_group_mnt, vobs_group_reg. Each node in the cluster must have a
standard ClearCase installed and configured into the same region. All
views and VOBs need to be configured to use their storage directories
on the cluster shared storage. In our case we used /cluster2/viewstore
for the view storage directory and /cluster4/vobstore for the VOB
storage directory. All VOBs must be public.

The licensing of clearcase in the cluster is resolved by configuring
each node in the cluster as the license server for itself. This is done
by transferring all your licenses from one node to the other while still
keeping the other license server. Since this may be a stretch of the
licensing agreement you may want to use a separate license server outside
of the cluster.


Groups and resources:
---------------------

All four service groups (ccregistry, views, vobs_group_mnt, vobs_group_reg)
perform a specialized clearcase function that can be isolated to a single
node of the cluster. All nodes of the cluster run the basic clearcase
installation and this is performed by the resource type named ClearCase.
Each of the service groups includes this resource.
The ccregistry service group transfers clearcase master registry server
to a particular cluster node. This is performed by the specialized resource
of type CCRegistry.
The views are handled by service groups that include a specialized
resource of type CCViews. This resource registers and starts all views
sharing the same storage directory.
The VOB functionality is handled by two separate service groups: one that
registers a VOB on a cluster node and another that mounts it on the same
or another cluster node. The VOB registration is performed by the
specialized resource of type CCVOBReg and the VOB mounting by the
resource of type CCVOBMount.
A detailed description of each service group and their resources follows:



ccregistry service group:
--------------------------

The ccregistry group is responsible for configuring a cluster node as
a primary registry server and if necessary unconfiguring it from any
other nodes on the cluster. All nodes in the cluster are configured as
registry backup servers that store a copy of the primary registry data.
The /var/adm/atria/rgy/rgy_hosts.conf file has to be configured with all
cluster nodes as backups:

/var/adm/atria/rgy/rgy_hosts.conf:

foo_c
foo_c bar_c
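The agents below all parse this file the same way: the first line is the
primary, the remaining lines are the backups. A standalone illustration
of the parsing, using a copy of the file under /tmp (note the agents use
the older Solaris `tail +2` form where modern systems need `tail -n +2`):

```shell
# Recreate the registry hosts file under /tmp:
printf 'foo_c\nfoo_c bar_c\n' > /tmp/rgy_hosts.conf

primary=`head -1 /tmp/rgy_hosts.conf`    # first line: primary server
backups=`tail -n +2 /tmp/rgy_hosts.conf` # remaining lines: backup servers

echo "primary=$primary backups=$backups"
```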

This group uses two custom resources: ccregistry_primary and
ccase_ccregistry. The ccase_ccregistry is of type ClearCase and is
responsible for starting basic ClearCase services. No views or VOBs are
configured at this point. Other service groups will do that later. The
ccregistry_primary resource changes the configuration files to configure
a host as the primary registry server.

ccregistry:

ccregistry_primary (CCRegistry)
|
|
v
ccase_ccregistry (ClearCase)


The ccregistry group is defined as follows (main.cf):

group ccregistry (
SystemList = { bar_c, foo_c }
AutoStartList = { bar_c }
)

CCRegistry ccregistry_primary (
)

ClearCase ccase_ccregistry (
)

ccregistry_primary requires ccase_ccregistry

The custom resources for this group, CCRegistry and ClearCase, are
defined as follows (in types.cf):


type CCRegistry (
static str ArgList[] = { }
NameRule = resource.Name
)

type ClearCase (
static str ArgList[] = { }
NameRule = resource.Name
static str Operations = OnOnly
)

ClearCase resource implementation:

The ClearCase 'online' agent is responsible for configuring the registry
configuration files and starting the ClearCase servers. Configuration is
done in such a way that clearcase runs only one registry master server
in the cluster. The /var/adm/atria/rgy/rgy_hosts.conf file is configured
to use the current node as the master only if no clearcase is running on
other cluster nodes. If a clearcase service group is detected in the
on-line state anywhere in the cluster the current node is started as a
registry backup server. It is assumed that the other node has already
claimed the master registry status. The master status file
/var/adm/atria/rgy/rgy_svr.conf is updated to indicate the current node
status. After the registry configuration files are prepared the standard
clearcase startup script /usr/atria/etc/atria_start is run.


ClearCase/online agent:

> #!/bin/sh
> # ClearCase online:
> if [ -r /view/.specdev ];then
> # Running:
> exit 0
> fi
>
> this=`hostname`
> primary=`head -1 /var/adm/atria/rgy/rgy_hosts.conf`
> backups=`tail +2 /var/adm/atria/rgy/rgy_hosts.conf`
> master=`cat /var/adm/atria/rgy/rgy_svr.conf`
>
> online=
> for host in $backups;do
> if [ "$host" != "$this" ];then
> stat=`hagrp -state ccregistry -sys $host | grep ONLINE | wc -l`
> if [ $stat -gt 0 ];then
> online=$host
> break
> fi
> fi
> done
>
> if [ "$this" = "$primary" -a "$online" != "" ];then
> # Erase master status:
> cp /dev/null /var/adm/atria/rgy/rgy_svr.conf
>
> # Create configuration file with the on-line host as the master:
> cat <<-EOF > /var/adm/atria/rgy/rgy_hosts.conf
> $online
> $backups
> EOF
> fi
>
> # Normal ClearCase startup:
> /bin/sh -x /usr/atria/etc/atria_start start


The ClearCase resource is configured not to use an 'offline' agent but
only a 'shutdown' agent. The 'offline' could be dangerous for clearcase
if VCS missed the monitor detection and decided to restart it.
The 'shutdown' ClearCase agent stops all clearcase servers using the
standard clearcase shutdown script (/usr/atria/etc/atria_start).

ClearCase/shutdown:

> #!/bin/sh
> # ClearCase shutdown:
> # Normal ClearCase shutdown:
> /bin/sh -x /usr/atria/etc/atria_start stop

ClearCase/monitor:

> #!/bin/sh
> # ClearCase monitor:
> if [ -r /view/.specdev ];then
> # Running:
> exit 110
> else
> # Not running:
> exit 100
> fi


CCRegistry resource implementation:

This resource verifies if the current node is configured as the registry
master server and if not performs switch-over from other node to this one.
The complication here was the sequence of events: when switching
over a group from one node to the other the VCS engine first offlines it
on the node that is on-line and then brings it on-line on the one that
was offline.
With the registry switch-over the sequence of events has to be reversed:
the destination node must first perform the rgy_switchover and transfer
the master status to itself while the master is up, and only then can
the old master be shut down and configured as a backup.

For this sequence to be implemented the offline agent (which is
called first on the current primary) does not perform the switchover
but only marks the intent of the master to be transferred by creating
a marker file ,.offline in the agent directory. The monitor script
that is called next on the current master reports the primary as
down if it finds the ,.offline marker.

CCRegistry/offline:

> #!/bin/sh
> # CCRegistry offline:
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
> # No albd_server - no vobs:
> exit 1
> fi
>
> this=`hostname`
> primary=`head -1 /var/adm/atria/rgy/rgy_hosts.conf`
> backups=`tail +2 /var/adm/atria/rgy/rgy_hosts.conf`
>
> if [ "$this" != "$primary" ];then
> # This host is not configured as primary - do nothing:
> exit 1
> fi
>
> touch /opt/VRTSvcs/bin/CCRegistry/,.offline
>
> exit 0


Next the online agent on the target node performs the actual switch-over
using rgy_switchover.
Then the monitor script, in the following iteration on the old primary,
sees that the primary was transferred by looking into the rgy_hosts.conf
file and removes the ,.offline marker.

> #!/bin/sh
> # CCRegistry online:
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
> # No albd_server - no vobs:
> exit 1
> fi
>
> this=`hostname`
> primary=`head -1 /var/adm/atria/rgy/rgy_hosts.conf`
> backups=`tail +2 /var/adm/atria/rgy/rgy_hosts.conf`
>
> if [ "$this" = "$primary" ];then
> # This host is already configured as primary - do nothing:
> exit 1
> fi
>
> # Check if this host is on the backup list - if not do nothing.
> # Only backups can become primary.
>
> continue=0
> for backup in $backups; do
> if [ "$backup" = "$this" ];then
> continue=1
> fi
> done
> if [ $continue -eq 0 ];then
> exit 1
> fi
>
>
> # Check if backup data exists. If not do nothing:
> if [ ! -d /var/adm/atria/rgy/backup ];then
> exit 1
> fi
>
> # Check how old the backup data is. If it is too old do nothing:
> # ,,,
>
>
> # Put the backup on line and switch hosts. Change from $primary to $this host.
> # Assign last $backup host in backup list as backup:
>
> /usr/atria/etc/rgy_switchover -backup "$backups" $primary $this
>
> touch /opt/VRTSvcs/bin/CCRegistry/,.online
>
> exit 0


Sometimes the rgy_switchover running on the target node does not complete
the registry transfer and the operation has to be retried. To do this
the online agent leaves a ,.online marker in the agent directory right
after the rgy_switchover is run. Next the monitor agent looks for the
,.online marker and, if it finds it, retries the rgy_switchover.
As soon as the monitor agent detects that the configuration files have
been properly updated and the switch-over has completed it removes the
,.online marker.

To maintain the integrity of the agent operation the open and close
agents remove both marker files (,.online and ,.offline) that may have
been left behind by a previous malfunction or system crash.
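The monitor agent itself is not reproduced here, but its marker-file
decision logic can be exercised standalone. A sketch (hypothetical: /tmp
paths and a hard-coded hostname stand in for the real /var/adm/atria
files and `hostname`; the statuses follow the 110=online/100=offline
convention of the other monitor scripts):

```shell
#!/bin/sh
# Standalone sketch of the CCRegistry monitor decision (simulation).
dir=/tmp/ccreg; mkdir -p $dir
printf 'foo_c\nfoo_c bar_c\n' > $dir/rgy_hosts.conf
this=foo_c                              # stands in for `hostname`
primary=`head -1 $dir/rgy_hosts.conf`

touch $dir/,.offline                    # the offline agent marked its intent

if [ -f $dir/,.offline ];then
    # Report the primary as down so VCS proceeds with the switch-over.
    # Once rgy_hosts.conf names another primary, clean the marker up:
    if [ "$this" != "$primary" ];then
        rm -f $dir/,.offline
    fi
    status=100
elif [ "$this" = "$primary" ];then
    status=110                          # this node is the running primary
else
    status=100                          # backup node: report offline
fi
echo "monitor status: $status"
```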

CCRegistry/open:

> cd /opt/VRTSvcs/bin/CCRegistry
> rm -f ,.offline ,.online


CCRegistry/close:

> cd /opt/VRTSvcs/bin/CCRegistry
> rm -f ,.offline ,.online


vobs_<group>_reg service group:
---------------------------------
The first step in configuring clearcase is to create, register and
mount VOBs. This service group is designed to register a set of VOBs
that use a specific storage directory. All VOBs that are located in a
given directory are registered on the current cluster node. The <group>
is a parameter that should be replaced with a unique name indicating
a group of VOBs. We used this name to consistently name the Veritas
Volume Manager disk group, the mount point directory and the collection
of cluster resources designed to provide the VOBs infrastructure. The
vobs_<group>_reg service group is built of the following resources:

- ccase_<group>_reg resource of type ClearCase. This resource powers up
clearcase on the cluster node and makes it ready for use. See above for
the detailed description of this resource's implementation.

- ccvobs_<group>_reg resource of type CCVOBReg. This is a custom resource
that registers a given set of VOBs identified by their VOB tags and
storage directory.

- mount_<group>_reg resource of type Mount. This resource mounts a given
Veritas volume on a directory.

- volume_<group>_reg resource of type Volume. This resource starts the
indicated Veritas volume in a given disk group.

- dgroup_<group>_reg resource of type DiskGroup. This resource onlines
a given Veritas disk group.

Here is the dependency diagram of the resources of this group:

ccvobs_<group>_reg(CCVOBReg)
| |
v v
ccase_<group>_reg (ClearCase) mount_<group>_reg (Mount)
|
v
volume_<group>_reg(Volume)
|
v
dgroup_<group>_reg(DiskGroup)


There can be many instances of this service group - one for each
collection of VOBs. Each set can be managed separately, onlining it on
various cluster nodes and providing load-balancing functionality.
One of our implementations used "cluster4" for the name of the <group>.
We named the Veritas disk group "cluster4" and the VOBs storage directory
/cluster4/vobstore. Here is the example definition of the vobs_cluster4_reg
group (in main.cf):

group vobs_cluster4_reg (
SystemList = { foo_c, bar_c }
AutoStartList = { foo_c }
)

CCVOBReg ccvobs_cluster4_reg (
Storage = "/cluster4/vobstore"
CCPassword = foobar
)

ClearCase ccase_cluster4_reg (
)

DiskGroup dgroup_cluster4_reg (
DiskGroup = cluster4
)

Mount mount_cluster4_reg (
MountPoint = "/cluster4"
BlockDevice = "/dev/vx/dsk/cluster4/vol01"
FSType = vxfs
MountOpt = rw
)

Volume volume_cluster4_reg (
Volume = vol01
DiskGroup = cluster4
)

requires group ccregistry online global

ccvobs_cluster4_reg requires ccase_cluster4_reg
ccvobs_cluster4_reg requires mount_cluster4_reg
mount_cluster4_reg requires volume_cluster4_reg
volume_cluster4_reg requires dgroup_cluster4_reg


CCVOBReg resource implementation:

The resource type is defined as follows:

type CCVOBReg (
static str ArgList[] = { CCPassword, Storage }
NameRule = resource.Name
str Storage
str CCPassword
)

The CCPassword attribute is the ClearCase registry password. The Storage
attribute is the directory where the VOB storage directories are located.

The online agent checks the storage directory and uses the basenames of
all directory entries with the suffix .vbs as the VOB tags to register.
First we try to unmount, remove tags, unregister and kill the VOB's
servers. Removing of tags is done with a send-expect engine (expect)
running the 'ct rmtag' command so we can interactively provide the
registry password. When the VOB's previous instance is cleaned up it is
registered and tagged.
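The tag derivation can be tried standalone (hypothetical storage
directory under /tmp; the anchored `s/\.vbs$//` is a slightly stricter
form of the substitution the agent uses):

```shell
# A VOB storage directory with two VOB storage dirs in it:
mkdir -p /tmp/vobstore/admin.vbs /tmp/vobstore/cctest.vbs

# Basenames minus the .vbs suffix become the VOB tags:
tags=`cd /tmp/vobstore; ls | sed 's/\.vbs$//'`
echo $tags
```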


> #!/bin/sh
> # CCVOBReg online:
>
> shift
> pass=$1
> shift
> vobstorage=$1
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
> # No albd_server - no views:
> exit 1
> fi
>
> # Handle all VOBs created in the VOB storage directory:
> if [ ! -d $vobstorage ];then
> exit
> fi
>
> for tag in `cd $vobstorage; ls | sed 's/.vbs//'`;do
> storage=$vobstorage/$tag.vbs
>
> # Try to cleanup first:
> cleartool lsvob /vobs/$tag
> status=$?
> if [ $status -eq 0 ];then
> cleartool umount /vobs/$tag
>
> expect -f - <<-EOF
> spawn cleartool rmtag -vob -all /vobs/$tag
> expect "Registry password:"
> send "$pass\n"
> expect eof
> EOF
>
> cleartool unregister -vob $storage
>
> pids=`ps -ef | grep vob | grep "$storage" | grep -v grep | awk '{ print $2 }'`
>
> for pid in $pids;do
> kill -9 $pid
> done
> fi
>
> # Now register:
> cleartool register -vob $storage
> cleartool mktag -vob -pas $pass -public -tag /vobs/$tag $storage
> done
>

The monitor agent is implemented by checking 'ct lsvob' output and
comparing the VOBs listed as registered on the current host against the
VOBs found in the VOB storage directory:


> #!/bin/sh
> # CCVOBReg monitor:
>
> shift
> pass=$1
> shift
> vobstorage=$1
> host=`hostname`
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
> # No albd_server:
> exit 100
> fi
>
> # Handle all VOBs created in the VOB storage directory:
> if [ ! -d $vobstorage ];then
> exit 100
> fi
>
> # Number of VOBS found in the storage:
> nvobs_made=`cd $vobstorage; ls | sed 's/.vbs//' | wc -l`
>
> # Number of VOBS registered on this host:
> nvobs_reg=`cleartool lsvob | grep /net/$host$vobstorage | wc -l`
>
> #if [ $nvobs_reg -lt $nvobs_made ];then
> if [ $nvobs_reg -lt 1 ];then
> # Not running:
> exit 100
> else
> # Running:
> exit 110
> fi


The offline agent works in the same way as the online agent, except that
it does not register and tag the VOBs.


vobs_<group>_mnt service group:
--------------------------------
After VOBs are registered and tagged on a cluster node they need to
be mounted. The mount can be done anywhere in the cluster, not
necessarily on the same node where they are registered.
The vobs_<group>_mnt service group performs the mounting operation. It
is designed to complement the vobs_<group>_reg service group and operate
on a set of VOBs.

The following resources compose this service group:

- ccase_<group>_mnt resource of type ClearCase. This resource powers up
clearcase on the cluster node and makes it ready for use.

- ccvobs_<group>_mnt resource of type CCVOBMount.
The work of mounting a set of VOBs is implemented in this resource.
The VOBs are defined as a list of tags.

Here is the dependency diagram of the resources of this group:

ccvobs_<group>_mnt (CCVOBMount)
|
v
ccase_<group>_mnt (ClearCase)

There may be many instances of the vobs_<group>_mnt group - the <group>
is used as the name of the VOB group. We used "cluster4" to match the
name of the vobs_cluster4_reg group. Here is how we defined it (in main.cf):

group vobs_cluster4_mnt (
SystemList = { foo_c, bar_c }
AutoStartList = { foo_c }
Parallel = 1
PreOnline = 1
)

CCVOBMount ccvobs_cluster4_mnt (
CCPassword = foobar
Tags = { cctest, admin }
)

ClearCase ccase_cluster4_mnt (
)

requires group vobs_cluster4_reg online global

ccvobs_cluster4_mnt requires ccase_cluster4_mnt


CCVOBMount resource implementation:

The resource type is defined as follows:

type CCVOBMount (
static str ArgList[] = { CCPassword, Tags }
NameRule = resource.Name
str CCPassword
str Tags[]
str Storage
)

The CCPassword attribute is the ClearCase registry password. The Tags
attribute is the list of VOB tags to mount.

The online agent mounts and unlocks the list of VOBs. The NFS shares are
also refreshed to allow for remote VOB use.

> #!/bin/sh
> # CCVOBMount online:
> shift
> pass=$1
> shift
> shift
> tags=$*
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
> # No albd_server - no vobs:
> exit 1
> fi
> for tag in $tags;do
> cleartool mount /vobs/$tag
> cleartool unlock vob:/vobs/$tag
> done
>
> # Refresh share table - otherwise remote nodes can't mount the storage directory:
> shareall

The offline agent terminates all processes that use clearcase file
systems, unexports all views and then unmounts all VOBs, locking them first.

> #!/bin/sh
> # CCVOBMount offline:
> shift
> pass=$1
> shift
> shift
> tags=$*
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
> # No albd_server - no vobs:
> exit 1
> fi
>
> # Kill users of mvfs:
> tokill=`/usr/atria/sun5/kvm/5.6/fuser_mvfs -n /dev/ksyms`
> while [ -n "$tokill" ];do
> kill -HUP $tokill
> tokill=`/usr/atria/sun5/kvm/5.6/fuser_mvfs -n /dev/ksyms`
> done
>
> # Unexport views:
> /usr/atria/etc/export_mvfs -au
>
> for tag in $tags;do
> on=`cleartool lsvob /vobs/$tag | grep '^*' | wc -l`
> if [ $on -ne 0 ];then
> cleartool lock vob:/vobs/$tag
> cleartool umount /vobs/$tag
> fi
> done



views service group:
----------------------

The views service group manages a group of views configured to use
a specific directory as the parent of the view storage directories.
All views that are found in the provided directory are started, stopped
and monitored. The group uses the following resources:

views:

views_views (CCViews)
| |
v v
ccase_views (ClearCase) mount_views (Mount)
|
v
volume_views (Volume)
|
v
dgroup_views (DiskGroup)

The views custom resource CCViews is defined as follows (in types.cf):

type CCViews (
static str ArgList[] = { CCPassword, Storage }
NameRule = resource.Name
str CCPassword
str Storage
)
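The main.cf definition of this group is not reproduced in the document;
a sketch following the same conventions as the other groups (the cluster2
disk group, the vol01 volume name and the dependency on ccregistry are
assumptions based on the naming pattern and the diagram above):

```
group views (
    SystemList = { foo_c, bar_c }
    AutoStartList = { foo_c }
    )

    CCViews views_views (
        CCPassword = foobar
        Storage = "/cluster2/viewstore"
        )

    ClearCase ccase_views (
        )

    DiskGroup dgroup_views (
        DiskGroup = cluster2
        )

    Mount mount_views (
        MountPoint = "/cluster2"
        BlockDevice = "/dev/vx/dsk/cluster2/vol01"
        FSType = vxfs
        MountOpt = rw
        )

    Volume volume_views (
        Volume = vol01
        DiskGroup = cluster2
        )

    requires group ccregistry online global

    views_views requires ccase_views
    views_views requires mount_views
    mount_views requires volume_views
    volume_views requires dgroup_views
```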



ClearCase service groups and NIS:
---------------------------------------

,,,


Disk backup of the shared storage:
-------------------------------------

The backup node of the cluster should switch all of the shared storage
to itself before doing the backup. This can be done with a simple
switchover of all the storage-related service groups to the backup node,
doing the backup and switching the groups back to their intended
locations. We do it with the following shell script that does a full
backup of the whole cluster to a DAT tape every night.


> #!/bin/sh
> # $Id: VCS-HOWTO,v 1.25 2002/09/30 20:05:38 pzi Exp $
> # Full backup script. All filesystem from vfstab in cpio format to DLT.
> # Logs in /backup/log_<date>.
>
> set -x
>
> SYSD=/opt/backup
> LOG=log_`date +%y%m%d_%H%M%S`
> ATRIAHOME=/usr/atria; export ATRIAHOME
> PATH=${PATH}:$ATRIAHOME/bin
> DEV=/dev/rmt/4ubn
>
> exec > $SYSD/$LOG 2>&1
>
> # Move all cluster shared storage to the backup node:
> groups="nis_master homes views vobs_cluster4_mnt vobs_cluster4_reg ccregistry"
> for group in $groups; do
> /opt/VRTSvcs/bin/hagrp -switch $group -to zeus_c
> done
>
> # Take all file systems in /etc/vfstab that are of type ufs or vxfs and
> # are not /backup:
> FSYS=`awk '$1 !~ /^#/ { \
> if ( ( $4 == "ufs" || $4 == "vxfs" ) \
> && $3 != "/backup" && $3 != "/backup1" && $3 != "/spare" ) \
> { print $3 } \
> }' /etc/vfstab`
>
> # Start and stop jobs for each file system:
> vobs=`cleartool lsvob | grep \* | awk '{ printf "vob:%s ", $2 }'`
> cluster4_start="cleartool lock -c Disk-Backup-Running-Now $vobs"
> cluster4_stop="cleartool unlock $vobs"
>
> mt -f $DEV rewind
>
> cd /
>
> for f in $FSYS;do
> f=`echo $f | sed 's/^\///'`
> eval $"${f}_start"
> echo $f
> find ./$f -mount | cpio -ocvB > $DEV
> eval $"${f}_stop"
> done
>
> mt -f $DEV rewind
>
> # Move cluster to the split state - hades_c runs all users homes, etc.
> groups="homes views"
> for group in $groups; do
> /opt/VRTSvcs/bin/hagrp -switch $group -to hades_c
> done
>
> ( head -40 $SYSD/$LOG; echo '...'; tail $SYSD/$LOG ) | \
> mailx -s "backup on `hostname`" root
>
>
>
>


How to configure VERITAS Cluster Server (VCS) to set up event notification to users:
====================================================================================

Details:

The following example makes use of the "resfault" event. This event type can be configured to
mail predefined users about a resource that has failed. It can be set up as follows:

The following actions need to be performed on all nodes in a cluster:
1. Copy the trigger from /opt/VRTSvcs/bin/sample_triggers/resfault
to /opt/VRTSvcs/bin/triggers/resfault

2. To set up mail notification, uncomment the following section at the end of the trigger
file /opt/VRTSvcs/bin/triggers/resfault:
# put your code here...
#
# Here is a sample code to notify a bunch of users.
# @recipients=("username@servername.com");
#
# $msgfile="/tmp/resfault";
# `echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
# foreach $recipient (@recipients) {
# # Must have elm setup to run this.
# `elm -s resfault $recipient < $msgfile`;
# }
# `rm $msgfile`;
#
By default, the script, once uncommented, is designed to use elm to notify users. Some systems
will not have elm. If not, the standard mailx utility can be used instead, as detailed below. Note
the use of the "\", which is needed so that the "@" gets interpreted correctly by Perl:

# Here is a sample code to notify a bunch of users.
@recipients=("root\@anotherserver.com,root");

$msgfile="/tmp/resfault";
`echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
foreach $recipient (@recipients) {
# Must have mailx to run this.
`mailx -s resfault $recipient < $msgfile`;
}
`rm $msgfile`;

This is all that has to be done. When a resource next fails, a message similar to the following
will be seen in the relevant person's mailbox:
From: Super-User <root>
Message-Id: <200012241016.KAA03694@sptsun****.uk.veritas.com>
To: root
Subject: resfault
Content-Length: 42

system = sptsun****, resource = nfsdg_nfs

How to place a volume and disk group under the control of VCS (Symantec Cluster Server):
========================================================================================

Details:

Following is the algorithm to create a volume and file system and put
them under VCS (Symantec Cluster Server) control.

1. Create a disk group. This can be done with vxdg.
2. Create a mount point and file system.

3. Deport the disk group. This can be done with vxdg.
4. Create a service group. This can be done with hagrp.
5. Add the cluster resources (given below) to the service group. This can be done with hares.

Resource        Attributes
1. DiskGroup    disk group name
2. Mount        block device, FSType, MountPoint


Note: An example of a service group that contains a DiskGroup resource can be found in the
"Symantec Cluster Server 6.1 Bundled Agents Guide":
https://sort.symantec.com/public/documents/vcs/6.1/linux/productguides/html/vcs_bundled_age
nts/ch02s02s02.htm

The complete "Symantec Cluster Server 6.1 Bundled Agents Guide" can be found here:
http://www.symantec.com/business/support/resources/sites/BUSINESS/content/live/DOCUMEN
TATION/6000/DOC6943/en_US/vcs_bundled_agents_61_lin.pdf


6. Create dependencies between the resources (given below). This can be
done with hares -link.

1. Mount requires the disk group.

7. Enable all resources. This can be done using hares.


The following examples show how to create a RAID-5 volume with a VxFS file system and put it
under VCS control.

Method 1 - Using the command line
1. Create a disk group using Volume Manager with a minimum of 4 disks:
# vxdg init datadg disk01=c1t1d0s2 disk02=c1t2d0s2 disk03=c1t3d0s2 disk04=c1t4d0s2
# vxassist -g datadg make vol01 2g layout=raid5

2. Create a mount point for this volume:
# mkdir /vol01

3. Create a file system on this volume:
# mkfs -F vxfs /dev/vx/rdsk/datadg/vol01

4. Deport this disk group:
# vxdg deport datadg

5. Create a service group:
# haconf -makerw
# hagrp -add newgroup
# hagrp -modify newgroup SystemList <sysA> 0 <sysB> 1
# hagrp -modify newgroup AutoStartList <sysA>

6. Create a disk group resource and modify its attributes:
# hares -add data_dg DiskGroup newgroup
# hares -modify data_dg DiskGroup datadg

7. Create a mount resource and modify its attributes:
# hares -add mnt Mount newgroup
# hares -modify mnt BlockDevice /dev/vx/dsk/datadg/vol01
# hares -modify mnt FSType vxfs
# hares -modify mnt MountPoint /vol01

8. Link the mount resource to the disk group resource:
# hares -link mnt data_dg


9. Enable the resources and close the configuration:
# hagrp -enableresources newgroup
# haconf -dump -makero


Method 2 - Editing /etc/VRTSvcs/conf/config/main.cf
# hastop -all
# cd /etc/VRTSvcs/conf/config
# haconf -makerw
# vi main.cf
Add the following lines to end of this file, customizing the attributes as appropriate for your
configuration:
**********************************************START******************************************************
group newgroup (
SystemList = { sysA = 0, sysB = 1 }
AutoStartList = { sysA }
)

DiskGroup data_dg (
DiskGroup = datadg
)

Mount mnt (
MountPoint = "/vol01"
BlockDevice = "/dev/vx/dsk/datadg/vol01"
FSType = vxfs
)

mnt requires data_dg
************************************************END******************************************************
# haconf -dump -makero
# hastart -local
Check status of the new service group.

How to dynamically remove a node from a live cluster without interruptions:
===========================================================================

Details:

Before making changes to the VERITAS Cluster Server (VCS) configuration
(the main.cf file), make a good copy of the current main.cf. In this
example, csvcs6 is removed from a two-node cluster. Execute these
commands on csvcs5, the system that is not being removed.

1. cp -p /etc/VRTSvcs/conf/config/main.cf
/etc/VRTSvcs/conf/config/main.cf.last_known.good

2. Check the current systems, group(s), and resource(s) status

# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A csvcs5 RUNNING 0
A csvcs6 RUNNING 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N ONLINE
B wvcs csvcs5 Y N OFFLINE
B wvcs csvcs6 Y N ONLINE


Based on the output, csvcs5 and csvcs6 are the two nodes of the cluster.
Service groups test_A and wvcs are configured to run on both nodes.
Service group test_B is configured to run on csvcs6 only.

Both service groups test_B and wvcs are online on csvcs6. Service group
wvcs can now be failed over to csvcs5 if it is to remain online.

hagrp -switch <service_group> -to <node>

# hagrp -switch wvcs -to csvcs5

3. Check for service group dependency

# hagrp -dep
#Parent Child Relationship
test_B test_A online global

4. Make VCS configuration writable

# haconf -makerw

5. Unlink the group dependency if there is any. In this case, the service group test_B requires
test_A.

hagrp -unlink <parent_group> <Child_group>

# hagrp -unlink test_B test_A

6. Stop VCS on csvcs6, the node to be removed.

hastop -sys <node>

# hastop -sys csvcs6

7. Check the status again, making sure csvcs6 is EXITED and the failover service group is
online on running node.

# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A csvcs5 RUNNING 0
A csvcs6 EXITED 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N OFFLINE
B wvcs csvcs5 Y N ONLINE
B wvcs csvcs6 Y N OFFLINE


8. Delete csvcs6 from wvcs and test_A SystemList.

hagrp -modify <service_group> SystemList -delete <node>

# hagrp -modify wvcs SystemList -delete csvcs6
# hagrp -modify test_A SystemList -delete csvcs6

9. Check all the resources belonging to the service group and delete all the resources from
group test_B before removing the group.

hagrp -resources <service_group>

# hagrp -resources test_B
jprocess
kprocess

hares -delete <resource_name>

# hares -delete jprocess
# hares -delete kprocess

hagrp -delete <service_group>

# hagrp -delete test_B
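The resource-then-group deletion in step 9 can be scripted. A minimal sketch, where hagrp and
hares are stubbed so the loop logic can run outside a live cluster (the real commands are the
ones shown above):

```shell
# Stubs for the VCS commands, so the loop can be exercised without a cluster.
deleted=""
hagrp() { printf 'jprocess\nkprocess\n'; }   # stub: emits the sample resource list
hares() { deleted="$deleted$2 "; }           # stub: records what would be deleted
# Delete every resource of the group, then the group itself could be deleted.
for res in $(hagrp -resources test_B); do
    hares -delete "$res"
done
echo "would delete: $deleted"
```

On a live node, remove the stub functions so the real hagrp and hares binaries are used.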

10. Check the status again, making sure all the service groups are online on the other node, in
this case csvcs5.

# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A csvcs5 RUNNING 0
A csvcs6 EXITED 0

-- GROUP STATE
-- Group System Probed AutoDisabled State


B test_A csvcs5 Y N ONLINE
B wvcs csvcs5 Y N ONLINE


11. Delete the system (node) from the cluster, save the configuration, and make it read-only.

# hasys -delete csvcs6

# haconf -dump -makero

12. Depending on how the cluster is defined or the number of nodes in the cluster, it might be
necessary to reduce the number in " /sbin/gabconfig -c -n # " in the /etc/gabtab file on all the
running nodes within the cluster. If the # is larger than the number of nodes in the cluster,
GAB will not auto-seed.
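The /etc/gabtab adjustment in step 12 can be sketched as follows; here the edit is applied to a
temporary copy, while on a live node you would edit /etc/gabtab itself (the new node count of 1
is an assumption for illustration):

```shell
# Edit a temporary copy of /etc/gabtab; on a live node, edit /etc/gabtab itself.
gabtab=$(mktemp)
echo '/sbin/gabconfig -c -n 2' > "$gabtab"
nodes=1                                      # assumed new node count, for illustration
# Rewrite the "-n #" seed count to match the reduced cluster size.
sed "s/-n [0-9][0-9]*/-n $nodes/" "$gabtab" > "$gabtab.new" && mv "$gabtab.new" "$gabtab"
result=$(cat "$gabtab")
rm -f "$gabtab"
echo "$result"
```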


To prevent VCS from starting after rebooting, do the following on the removed node (csvcs6):

1. Unconfigure and unload GAB

/sbin/gabconfig -u

modunload -i `modinfo | grep gab | awk '{print $1}'`

2. Unconfigure and unload LLT

/sbin/lltconfig -U

modunload -i `modinfo | grep llt | awk '{print $1}'`

3. Prevent LLT, GAB and VCS from starting up in the future

mv /etc/rc2.d/S70llt /etc/rc2.d/s70llt
mv /etc/rc2.d/S92gab /etc/rc2.d/s92gab
mv /etc/rc3.d/S99vcs /etc/rc3.d/s99vcs

4. If it is ** not ** desired to run VCS on this particular node again, all the VCS-related
packages and files can now be removed.

pkgrm VRTSperl
pkgrm VRTSvcs
pkgrm VRTSgab
pkgrm VRTSllt

rm /etc/llttab
rm /etc/gabtab


NOTE: Due to the complexity and variation of VCS configuration, it is not possible to cover all
the possible situations and conditions of a cluster configuration in one technote. The above
steps cover the common configuration in most VCS setups and give some idea of how to deal
with more complex setups.

How to offline a critical resource without affecting other resources
and bringing a service group offline

Details:

If there is a need to bring a resource in a service group offline without affecting the other
resources that are running, a procedure similar to the following can be used. The aim is to be
able to take a critical resource offline to perform maintenance. Note that, ordinarily, taking a
critical resource offline would cause the service group to fail over.

1. Freeze the service group in question, e.g.:

#haconf -makerw
#hagrp -freeze jbgroup -persistent
#haconf -dump -makero

#hagrp -display jbgroup reveals that the group is now frozen:
Group Attribute System Value
jbgroup AutoFailOver global 1
jbgroup AutoStart global 1
jbgroup AutoStartList global sptsunvcs2
jbgroup FailOverPolicy global Priority
jbgroup Frozen global 1 <-------------------
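Checking the Frozen attribute from hagrp -display output can be scripted. A minimal sketch,
with the sample display lines embedded:

```shell
# Sample "hagrp -display" lines from the example above.
display='jbgroup AutoFailOver global 1
jbgroup Frozen global 1'
# Pick the value column of the Frozen attribute.
frozen=$(printf '%s\n' "$display" | awk '$2=="Frozen" {print $4}')
[ "$frozen" = 1 ] && echo "jbgroup is frozen"
```

On a live cluster the same awk filter would be fed from `hagrp -display jbgroup`.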

2. The resource can then be taken offline, e.g.:

#umount /jbvol (in this case, to serve as a simple example, the mountpoint /jbvol was taken
offline)

3. The output from hastatus reveals:
# hastatus
attempting to connect....connected

group resource system message
--------------- -------------------- --------------- --------------------
sptsunvcs2 RUNNING
jbgroup sptsunvcs2 ONLINE
bdg sptsunvcs2 ONLINE
jbip sptsunvcs2 ONLINE
-------------------------------------------------------------------------
jbmount sptsunvcs2 ONLINE
jb_hme1 sptsunvcs2 ONLINE
jbdg_jbvol sptsunvcs2 ONLINE
jbgroup sptsunvcs2 PARTIAL *FAULTED*
jbmount sptsunvcs2 *FAULTED*

So it can be seen that with the group frozen, only resources manually taken offline go offline.
4. Once the maintenance work has been carried out, the changes can then be reversed. Start
by unfreezing the group:

#haconf -makerw
#hagrp -unfreeze jbgroup -persistent
#haconf -dump -makero

5. All that is required now is to clear the failed resources and online them. In this case:

#hastatus -summary reports just the one failed resource:
...
RESOURCES FAILED
-- Group Type Resource System
C jbgroup Mount jbmount sptsunvcs2
...
That can be cleared by
#hares -clear jbmount
Then
#hares -online jbmount -sys sptsunvcs2
To verify the resource is online correctly:
#hastatus -summary

-- GROUP STATE
-- Group System Probed AutoDisabled State

B jbgroup sptsunvcs1 Y N OFFLINE

B jbgroup sptsunvcs2 Y N ONLINE

Related Articles

TECH21770 Preventing service group failover options in VERITAS Cluster Server (VCS)
TECH21859 Shutting down an Oracle instance for maintenance under VERITAS Cluster
Server (VCS) control


How to set up a disk to be used by the VERITAS Cluster Server
(VCS) heartbeat disk feature

Details:

For the VCS private network, there are two types of channels available for heartbeating:
network connections and heartbeat regions on shared disks.

To be able to use heartbeat regions on a shared disk:

1. Choose a disk which is seen by all nodes in the cluster.
2. Make sure that there is no data on the disk you want to use. The disk will be reinitialized
during this process.
3. Unmount all file systems on the disk.
4. If the disk is under Volume Manager control, remove any volumes, plexes, or subdisks from
the disk and remove the disk from any active disk group or deport its disk group.
5. Allocate a VCS partition on the disk. Type:

# /opt/VRTSvcs/bin/hahbsetup disk_tag
...where disk_tag resembles "c#t#d#", which defines the controller number (c#), the SCSI
target ID (t#), and the SCSI logical unit number (d#) of the selected disk.
Enter y when prompted.

For example:
# /opt/VRTSvcs/bin/hahbsetup c1t1d0

Output resembles the following text:

The hadiskhb command is used to set up a disk for combined use
by VERITAS Volume Manager and VERITAS Cluster Server for disk
communication.

WARNING: This utility will destroy all data on c1t1d0

Have all disk groups and file systems on disk c1t1d0 been either
unmounted or deported? y

There are currently slices in use on disk /dev/dsk/c1t1d0s2
Destroy existing data and reinitialize disk? y

1520 blocks are available for VxCS disk communication and
service group heartbeat regions on device /dev/dsk/c1t1d0s7

This disk can now be configured into a Volume Manager disk
group. Using vxdiskadm, allow it to be configured into the disk
group as a replacement disk. Do not select reinitialization of
the disk.

After running vxdiskadm, consult the output of prtvtoc to
confirm the existence of slice 7. Reinitializing the disk
under VxVM will delete slice 7. If this happens, deport the disk
group and rerun hahbsetup.

The disk should now be initialized.

6. Display the partition table. Type:

# prtvtoc /dev/dsk/<disk_tag>s2

For example:
# prtvtoc /dev/dsk/c1t1d0s2

Output resembles:


* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 01 0 8887440 8887439
3 15 01 0 1520 1519
4 14 01 3040 8884400 8887439
7 13 01 1520 1520 3039

7. Confirm that slice 7 exists and that its tag is 13.

In this example, the partition /dev/dsk/c1t1d0s7 can now be used as a VCS heartbeat disk.
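Confirming that slice 7 exists with tag 13 can be automated against the prtvtoc output. A
minimal sketch, with the sample partition table embedded:

```shell
# Sample prtvtoc partition lines from the example above.
vtoc='2 5 01 0 8887440 8887439
3 15 01 0 1520 1519
4 14 01 3040 8884400 8887439
7 13 01 1520 1520 3039'
# Extract the tag of slice 7; tag 13 marks the heartbeat region.
tag=$(printf '%s\n' "$vtoc" | awk '$1==7 {print $2}')
[ "$tag" = 13 ] && echo "slice 7 present with heartbeat tag 13"
```

On a live node the filter would be fed from `prtvtoc /dev/dsk/c1t1d0s2` directly.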

Related Articles

TECH20362 How to guarantee that disk heartbeat regions are valid using Signatures


How to set up VERITAS Cluster Server node names that do not
depend on the system host name

Details:

A VERITAS Cluster Server (VCS) cluster can be set up using its own node name that
corresponds to a node ID.
For example, if the two systems are named king and queen (the output from hostname or
uname -n), then the best thing to do is change king and queen to something else and set up the VCS
node names as sysA for king and sysB for queen. This way, if host names king and queen
need to be changed to any other host names in the future, the VCS cluster will not be affected
by it, and the cluster node names will remain as sysA and sysB. Here are the steps to
accomplish this:

1. On all systems within the cluster, the /etc/llthosts file must have both the node IDs and node
names. For example, if the node IDs are 0 and 1, then /etc/llthosts should be:

0 sysA
1 sysB

2. All systems within the cluster must have an /etc/VRTSvcs/conf/sysname file. This file must
have the cluster node names defined for the system. In this case, sysA for king and sysB for
queen.

On the system king, /etc/VRTSvcs/conf/sysname should contain just this:

sysA

On the system queen, /etc/VRTSvcs/conf/sysname should contain just this:

sysB

NOTE: The sysname file must be in the conf directory where the VRTSvcs is installed.

3. On all systems within the cluster, the /etc/llttab file must point to the
/etc/VRTSvcs/conf/sysname file for its "set-node" token. Here is a sample /etc/llttab:
set-cluster 1
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0
link qfe1 /dev/qfe:1
link-lowpri hme0 /dev/hme:0
start
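The llthosts and sysname files can be cross-checked with a short script. A minimal sketch,
using a temporary directory as a stand-in for /etc and /etc/VRTSvcs/conf:

```shell
conf=$(mktemp -d)                          # stand-in for /etc and /etc/VRTSvcs/conf
printf '0 sysA\n1 sysB\n' > "$conf/llthosts"
echo sysA > "$conf/sysname"                # on king; queen's sysname would contain sysB
# Look up this node's LLT node ID from its VCS node name.
nodeid=$(awk -v n="$(cat "$conf/sysname")" '$2==n {print $1}' "$conf/llthosts")
echo "node id: $nodeid"
rm -rf "$conf"
```

On a live node, run the lookup against the real /etc/llthosts and /etc/VRTSvcs/conf/sysname.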

4. The /etc/VRTSvcs/conf/config/main.cf must reference the systems as sysA and sysB.
Here is a sample of /etc/VRTSvcs/conf/config/main.cf from a cluster called royal:
include "types.cf"

cluster royal (
UserNames = { root = pwxzyyZyykKo }
CounterInterval = 5
Factor = { runque = 5, memory = 1, disk = 10, cpu = 25,
network = 5 }
MaxFactor = { runque = 100, memory = 10, disk = 100, cpu = 100,
network = 100 }
)

system sysA

system sysB

group File_test (
SystemList = { sysB, sysA }
PrintTree = 0
AutoStartList = { sysB, sysA }
)

FileOnOff ftest (
PathName = "/tmp/file_test"
)
5. Continue to follow the installation guide that came with the software to configure global
atomic broadcast (GAB) and start VCS.

How to tell which resource is causing the "Concurrency Violation"
in a VERITAS Cluster Server (VCS) cluster

Details:

Concurrency Violation indicates that one of the resources from a non-parallel service group is
online on more than one node within the cluster. When this happens, an error message will
appear on the console, the messages/syslog.log file and VCS engine log. Here is an example
from a cluster consisting of two nodes (rum and coke).

Similar error message in messages/syslog.log file and on console:

Nov 27 14:45:38 rum WARNING:: VCS Concurrency Violation!!! Group="group1" Hosts=(rum
coke)

Similar error message in VCS engine log:

TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum
TAG_B 2000/11/27 14:45:38 CurrentCount increased above 1 for failover group
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The console message or the messages/syslog.log entry provides the following:

1. The service group is group1.
2. The trouble hosts are rum and coke.

The VCS engine log provides the group's "CurrentCount" number. The line immediately
above the "CurrentCount" line shows the resource that is causing the concurrency violation.
In this example, the resource is ip1, which happens to be an IP resource type.
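Extracting the offending resource from the engine log can be automated. A minimal sketch,
with the sample log lines embedded:

```shell
# Sample VCS engine log lines from the example above.
log='TAG_E 2000/11/27 14:45:38 Resource ip1 is online on rum
TAG_B 2000/11/27 14:45:38 CurrentCount increased above 1 for failover group'
# Print the line immediately above the CurrentCount message, then pick the resource name.
culprit=$(printf '%s\n' "$log" | awk '/CurrentCount increased/ {print prev} {prev=$0}' | awk '{print $5}')
echo "culprit resource: $culprit"
```

On a live system the same pipeline would be fed from the engine log file.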


hastatus -sum from coke before the concurrency violation took place:

# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A coke RUNNING 0
A rum RUNNING 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B group1 coke Y N ONLINE
B group1 rum Y N OFFLINE

# netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 1634948 0 1634948 0 0 0
hme0 1500 coke coke 230705 0 79164 0 327 0
hme0:1 1500 10.10.8.0 10.10.10.2 0 0 0 0 0 0


hastatus -sum after the concurrency violation took place:

# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A coke RUNNING 0
A rum RUNNING 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B group1 coke Y N ONLINE
B group1 rum Y N PARTIAL


netstat -i from rum after the violation took place, showing the virtual ip 10.10.10.2 (resource ip1)
is also active:

# netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis
Queue
lo0 8232 loopback localhost 2026601 0 2026601 0 0 0
hme0 1500 rum rum 1108676 0 1781672 0 91 0
hme0:1 1500 10.10.8.0 10.10.10.2 0 0 0 0 0 0

To fix this problem, ifconfig the virtual interface down and clear the resource on the partial
system (rum). In this case, it is on rum and the virtual interface is hme0:1 with IP address of
10.10.10.2.

# ifconfig hme0:1 inet 0.0.0.0 down

Because the interface was taken offline outside of VCS, the resource state will change from
online to faulted. To remove the faulted flag from the resource, execute the following hares
command.

# hares -clear ip1

When the cables that attach the disk array to the node are detached,
VERITAS Cluster Server does not fail over

Details:

A disk group is disabled if klog, configuration copies, or headers in the private region on a
sufficient number of disks are invalid.

When a disk group is disabled, volumes may still be able to do I/O as long as the I/O requires
no changes to the private region (location of klog, configuration copies, and headers) of the
disks.


When can a disabled disk group be deported?


A disabled disk group can be deported when all applications doing I/O to the volume(s) have
had their I/Os completed (including retries) and have called the close routine of the volume(s).
If all volumes are closed, the disk group can be deported.

If the disabled disk group has been deported, it is safe to import the disk group to the system
again without doing a reboot.


What happens when a disk cable is pulled?


When the cable is pulled, I/Os to volumes return errors. Since the private region of the disks
can not be updated, the disk group gets disabled.

Before the disk group can be deported, all of the pending I/Os must complete. In order for the
I/O to complete, it may be retried by the underlying disk driver a number of times. Also, some
disk drivers may not ever return an error. For example, on Solaris the JNI driver has a
parameter in the /kernel/drv/fcaw.conf file called "failover". By default this is set to "0" which
means that the I/O will never timeout. Therefore, the file system or other application will never
see an error. It is recommended that failover be set to the number of seconds after which an
I/O should be timed out. In the case of the vxfs file system, the file system will not be able to be
unmounted until there are no I/Os outstanding. Therefore, the volume will not be closed.

In the case of the disk group, the disk group cannot be deported until all of the volumes are
closed.


When a storage cable is pulled, the following is observed:

1. Cable is pulled

2. Volume agent returns that volume is faulted (if parameters are set properly in the disk
drivers to return errors, e.g. if failover for JNI driver is set to 0 it will never return error)

3. Volume resources fault. These trigger the offline of the whole group.

4. Clean entry point is called for volume resources -- volumes are offline

5. Offline for whole group begins.

6. Offline for mount resource begins.

If all processes accessing the file system can be killed, the unmount will succeed.

If the processes accessing the file system cannot be killed, then:

a.) Try to force the unmount if possible (Solaris 2.8 and ufs)

b.) If that cannot be done, then remain online

7. In Step 6 if force unmount succeeds, the disk group can be deported and online on another
server

8. In Step 6, if the unmount cannot be forced and the file system remains online, the disk group
cannot be deported and the group will remain online. The group cannot be failed over, even
though the disk group is marked disabled, because the volumes are still open since the file
system could not be unmounted.


Why does the monitor entry point of the disk group agent return ONLINE if the disk
group is disabled?


The problem is that if the I/Os have not timed out and the volumes have not been closed, VCS
will not be able to deport the disk group in the offline entry point. In this case, the only thing for
VCS to do is to return ONLINE in the monitor entry point.

It has been suggested that the monitor should return OFFLINE if the disk group is disabled.
This is incorrect since I/Os may still be happening to the volumes as outlined above.

It has also been suggested that the system should panic when a disk group is disabled. This
means that any applications which could cleanly shutdown (still able to do I/O to their volume)
will not. However, data corruption due to the disk group being imported on two hosts will be
prevented.

Another suggestion is that after all I/Os have been retried and failed, the application (file system,
database, etc.) should fail, close the volumes and allow the disk group to be deported.

NOTE: It is unknown how long, or whether, the I/Os will ever time out. It has also been
suggested to attempt a deport of the disk group in the monitor entry point if all volumes in the
disk group are disabled; however, doing the work of the offline entry point in the monitor entry
point may raise other problems later.

Therefore, the correct solution when a disk group is disabled is to provide an error message to
the log and wait for administrative help. This is the best solution, as it does not compromise
data integrity.

VRTSvxfs 3.4 has addressed this issue and has made some changes to the way file
systems are unmounted. Currently, if the file system is in a disabled state and it believes
resources are in use that are not, it will force the unmount.


Moving the service groups to the second subcluster
1. Switch failover groups from the first half of the cluster to one of the nodes in the
second half of the cluster. In this procedure, galaxy is a node in the first half of the
cluster and jupiter is a node in the second half of the cluster. Enter the following:
# hagrp -switch failover_group -to jupiter
2. On the first half of the cluster, stop all applications that are not configured under
VCS. Use native application commands to stop the applications.
3. On the first half of the cluster, unmount the VxFS or CFS file systems that are not
managed by VCS.
# mount | grep vxfs
Verify that no processes use the VxFS or CFS mount point. Enter the following:
# fuser -c mount_point
Stop any processes using a VxFS or CFS mount point with the mechanism provided
by the application.
Unmount the VxFS or CFS file system. Enter the following:
# umount /mount_point
4. On the first half of the cluster, bring all the VCS service groups offline including CVM
group. Enter the following:
# hagrp -offline group_name -sys galaxy
When the CVM group becomes OFFLINE, all the parallel service groups such as the
CFS file system will also become OFFLINE on the first half of the cluster nodes.
5. Verify that the VCS service groups are offline on all the nodes in first half of the
cluster. Enter the following:
# hagrp -state group_name
6. Freeze the nodes in the first half of the cluster. Enter the following:
# haconf -makerw
# hasys -freeze -persistent galaxy
# haconf -dump -makero
7. If I/O fencing is enabled, then on each node of the first half of the cluster, change the
contents of the /etc/vxfenmode file to configure I/O fencing in disabled mode:

# cp /etc/vxfen.d/vxfenmode_disabled /etc/vxfenmode
# cat /etc/vxfenmode
#
# vxfen_mode determines in what mode VCS I/O Fencing should work.
#
# available options:
# scsi3 - use scsi3 persistent reservation disks
# customized - use script based customized fencing
# disabled - run the driver but don't do any actual fencing
#
vxfen_mode=disabled

8. If the cluster-wide attribute UseFence is set to SCSI3, reset the value to NONE in the
/etc/VRTSvcs/conf/config/main.cf file on the first half of the cluster.

9. Verify that only GAB ports a, b, d and h are open:

# gabconfig -a
GAB Port Memberships
=======================================================
Port a gen 6b5901 membership 01
Port b gen 6b5904 membership 01
Port d gen 6b5907 membership 01
Port h gen ada40f membership 01

Do not stop VCS. Port h should be up and running.

10. On the first half of the cluster, stop all VxVM and CVM volumes. Enter the following
command for each disk group:

# vxvol -g diskgroup stopall

Verify that no volumes remain open:

# vxprint -Aht -e v_open

11. On the first half of the cluster, upgrade the operating system on all the nodes, if
applicable. For instructions, see the upgrade paths for the operating system.
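The vxfenmode change in step 7 can be verified with a short check. A minimal sketch, using a
temporary file as a stand-in for /etc/vxfenmode:

```shell
f=$(mktemp)                                # stand-in for /etc/vxfenmode
cat > "$f" <<'EOF'
# disabled - run the driver but don't do any actual fencing
vxfen_mode=disabled
EOF
# Read the configured fencing mode from the file.
mode=$(awk -F= '/^vxfen_mode/ {print $2}' "$f")
rm -f "$f"
echo "fencing mode: $mode"
```

On a live node, point the awk filter at the real /etc/vxfenmode.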

The VERITAS Cluster Server application agent monitor program
script must be present on the local disk when creating a custom
monitor program or the resource will never get probed.

Issue:

The VERITAS Cluster Server application agent monitor program script must be present on the
local disk when creating a custom monitor program or the resource will never get probed.

Error:

TAG_B 2001/03/19 11:42:32 (boing) VCS:148506:monitor:xclock:Could not exec
MonitorProgram.
TAG_B 2001/03/19 11:43:32 (boing) VCS:13027:Resource(xclock) - monitor procedure did not
complete within the expected time.

Details:

When creating a custom monitor program for use with the VERITAS Cluster Server (VCS)
application agent, the monitor program must be accessible to all the systems which will run that
agent at all times. The monitor program cannot reside on the shared storage. If the monitor
program is on the shared storage, VCS will not be able to determine its status.
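A quick way to catch this misconfiguration is to check the MonitorProgram path before
enabling the resource. A minimal sketch, where /sharedstorage is the shared mount point
assumed in the example output below:

```shell
monitor_program=/sharedstorage/mon_xclock   # path taken from the example output below
# Flag a monitor program that lives under the shared mount point.
case "$monitor_program" in
    /sharedstorage/*) status="on shared storage: copy it to local disk" ;;
    *)                status="on local disk" ;;
esac
echo "MonitorProgram is $status"
```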


# hares -display xclock

#RESOURCE ATTRIBUTE SYSTEM VALUE
xclock Group global application
xclock Type global Application
xclock AutoStart global 1
xclock Critical global 1
xclock Enabled global 1
xclock LastOnline global boing
xclock MonitorOnly global 0
xclock ResourceOwner global unknown
xclock TriggerEvent global 0
xclock ArgListValues boing ""/sharedstorage/start_xclock
/sharedstorage/stop_xclock""
/sharedstorage/mon_xclock 0 1""
xclock ConfidenceLevel boing 0
xclock Flags boing MONITOR TIMEDOUT
xclock IState boing not waiting
xclock Probed boing 1
xclock Start boing 0
xclock State boing OFFLINE / MONITOR TIMEDOUT
xclock CleanProgram global
xclock MonitorProcesses global
xclock MonitorProgram global /sharedstorage/mon_xclock
xclock PidFiles global
xclock StartProgram global /sharedstorage/start_xclock
xclock StopProgram global /sharedstorage/stop_xclock
xclock User global






VERITAS Cluster Server (VCS) IP, IPMultiNIC, Network Interface
Card (NIC), IPMultiNICB, MultiNICA and MultiNICB resources will
not plumb the initial NIC interface

Issue:

VERITAS Cluster Server (VCS) IP, IPMultiNIC, Network Interface Card (NIC), IPMultiNICB,
MultiNICA and MultiNICB resources will not plumb the initial NIC interface.

Solution:

This means the IP, IPMultiNIC, Network Interface Card (NIC), IPMultiNICB, MultiNICA and
MultiNICB resources need to "see" the NIC interface up with an administrative IP address
before these resources can function correctly. In the case of the MultiNICA resource, only the
first listed interface should be configured by the operating system at boot. This is required so
that VCS can probe and determine the status of the NIC, MultiNICB or MultiNICA resources
when the cluster starts.

The following is an example of how to plumb up an NIC (qfe2) interface under Solaris 8:

# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.133.17.34 netmask fffff800 broadcast 10.133.23.255
ether 8:0:20:b4:f5:43
# ifconfig qfe2 plumb
# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.133.17.34 netmask fffff800 broadcast 10.133.23.255
ether 8:0:20:b4:f5:43
qfe2: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 513
inet 0.0.0.0 netmask 0
ether 8:0:20:b3:83:82
# ifconfig qfe2 10.133.17.78 netmask 255.255.248.0 up
# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.133.17.34 netmask fffff800 broadcast 10.133.23.255
ether 8:0:20:b4:f5:43
qfe2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 513
inet 10.133.17.78 netmask fffff800 broadcast 10.255.255.255
ether 8:0:20:b3:83:82

To have Solaris plumb and set up an IP on an interface, create a "hostname.<interface>" in the
/etc directory with a valid host name. Next add that host name to the /etc/hosts file as follows:

#cat /etc/hostname.qfe2
Myhost
#cat /etc/hosts
#
# Internet host table
#
127.0.0.1 localhost
10.133.17.78 Myhost
Please contact Sun for more information about configuring a Solaris interface at boot time.
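The two files can be created and cross-checked with a short script. A minimal sketch, using a
temporary directory as a stand-in for /etc:

```shell
etc=$(mktemp -d)                           # stand-in for /etc
echo Myhost > "$etc/hostname.qfe2"
printf '127.0.0.1 localhost\n10.133.17.78 Myhost\n' > "$etc/hosts"
# Resolve the interface's boot-time address the way Solaris does: hostname file -> hosts file.
host=$(cat "$etc/hostname.qfe2")
addr=$(awk -v h="$host" '$2==h {print $1}' "$etc/hosts")
echo "qfe2 will boot with $addr ($host)"
rm -rf "$etc"
```

On a live system, write the real /etc/hostname.qfe2 and /etc/hosts instead of the temp copies.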



Ways to prevent and reduce the effects of split-brain in VERITAS
Cluster Server for UNIX

Issue:

Ways to prevent and reduce the effects of split-brain in VERITAS Cluster Server for
UNIX

Solution:

This document discusses split-brain with intent to indicate current and future options provided
by VERITAS Cluster Server (VCS) to prevent split-brain. Additional considerations for limiting
the effects of split-brain once it happens are also mentioned.

What is split brain? The following is taken from the VCS 3.5 User's Guide, and provides a
discussion on split-brain.

Network Partitions and Split-Brain
Under normal conditions, when a VCS system ceases heartbeat communication with its peers
due to an event such as power loss or a system crash, the peers assume the system has failed
and issue a new, "regular" membership excluding the departed system. A designated system in
the cluster then takes over the service groups running on the departed system, ensuring the
application remains highly available. However, heartbeats can also fail due to network failures.
If all network connections between any two groups of systems fail simultaneously, a network
partition occurs. When this happens, systems on both sides of the partition can restart
applications from the other side resulting in duplicate services, or "split-brain". A split brain
occurs when two independent systems configured in a cluster assume they have exclusive
access to a given resource (usually a file system or volume). The most serious problem caused
by a network partition is that it affects the data on shared disks. All failover management
software uses a predefined method to determine if its peer is "alive". If the peer is alive, the
system recognizes it cannot safely take over resources. Split brain occurs when the method of
determining peer failure is compromised. In virtually all failover management software (FMS)
systems, split-brain situations are rare. A true split brain means multiple systems are online and
have accessed an exclusive resource simultaneously.

Note: Splitting communications between cluster nodes does not constitute a split brain. A
split-brain means cluster membership was affected in such a way that multiple systems use the
same exclusive resources, usually resulting in data corruption. The goal is to minimize the
chance of a system taking over an exclusive resource while another has it active, yet
accommodate a system powering off. In other words, a way to discriminate between a system
that has failed and one that is simply not communicating.

How VCS Avoids Split Brain
VCS uses heartbeats to determine the "health" of its peers. These can be private network
heartbeats, public (low-priority) heartbeats, and disk heartbeats. Regardless of the heartbeat
configuration, VCS determines that a system has faulted (due to power loss, kernel panic, etc.)
when all heartbeats fail simultaneously. For this method to work, the system must have two or
more functioning heartbeats and all must fail simultaneously. For VCS to encounter split brain,
the following events must occur:


A service group must be online on a system in a cluster.
The service group must have a system (or systems) designated in its SystemList
attribute as a potential failover target.
All heartbeat communication between the system with the online service group
and the system designated as the potential takeover target must fail simultaneously
while the original system stays online.
The potential takeover target must actually bring resources online that are
typically an exclusive, ownership-type item, such as disk groups, volume, or file
systems.


Jeopardy Defined
The design of VCS requires that a minimum of two heartbeat-capable channels be available
between nodes to protect against network failure. When a node is missing a single heartbeat
connection, VCS can no longer discriminate between a system loss and a loss of the last
network connection. It must then handle loss of communications on a single network differently
from loss on multiple networks. This procedure is called "jeopardy." As mentioned previously,
low latency transport (LLT) provides notification of reliable versus unreliable network
communications to global atomic broadcast (GAB). GAB uses this information, with or without a
functional disk heartbeat, to delegate cluster membership. If the system heartbeats are lost
simultaneously across all channels, VCS determines the system has failed. The services
running on that system are then restarted on another. However, if the node was running with
one heartbeat only (in jeopardy) prior to the loss of a heartbeat, VCS does not restart the
applications on a new node. This action of disabling failover is a safety mechanism that
prevents data corruption.

Split-Brain Prevention
What can be done to avoid split-brain? VCS provides a number of functions aimed at the
prevention of split-brain situations. The following list contains a brief explanation of each
prevention method.

Private Heartbeat - VERITAS recommends a minimum of two dedicated 100 megabit private
links between cluster nodes. These must be completely isolated from each other so the failure
of one heartbeat link cannot possibly affect the other.

Configuring private heartbeats to share any infrastructure is not recommended. Configurations
such as running two shared heartbeats to the same hub or switch, or using a single virtual local
area network (VLAN) to trunk between two switches induce a single point of failure in the
heartbeat architecture. The simplest guideline is "No single failure, such as power, network
equipment or cabling can disable both heartbeat connections."


Low-Priority Heartbeat - A heartbeat over the public network that generates minimal traffic
until only one normal heartbeat remains, at which point it becomes a fully functional
heartbeat.

Use of a low priority link is also recommended to provide further redundancy.
The low priority link prevents a jeopardy condition on loss of any single private link and provides
additional redundancy (consider low-pri heartbeat along with two private network heartbeats).


Disk Heartbeat - With disk heartbeating configured, each system in the cluster periodically
writes to and reads from specific regions on a dedicated shared disk. This exchange consists of
heartbeating only, and does not include communication about cluster status.

With disk heartbeating configured in addition to the private network connections, VCS has
multiple heartbeat paths available. For example, if one of two private network connections fails,
VCS has the remaining network connection and the disk heartbeat region that allow heartbeats
to continue normally.

Service Group Heartbeats - Disk heartbeats that are checked before a service group is
brought online.
This is designed to further assist in preventing a data corruption problem. If for some reason, a
system comes up and prepares to take over a service group, a service group heartbeat
configured at the bottom of the dependency tree first checks if any other system is writing to the
disk. The local system, via the ServiceGroupHB agent, tries to obtain "ownership" of the
available disks as specified by the disks attribute. The system gains ownership of a disk when it
determines that the disk is available and not owned by another system.
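A minimal sketch of such a resource in main.cf, assuming the bundled ServiceGroupHB agent's Disks attribute and illustrative resource and device names:

```
// main.cf fragment (sketch) - heartbeat disk region at the bottom
// of the service group's dependency tree
ServiceGroupHB sg_hb (
        Disks = { "/dev/dsk/c1t1d0s2" }
        )
// every other resource depends, directly or indirectly, on sg_hb,
// so ownership of the disk is obtained before anything comes online
app_mount requires sg_hb
```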

SCSI II Disk Reservations - Reserves and monitors SCSI disks for a system, enabling a
resource to go online on that system, when using the DiskReservation agent. The agent
supports all SCSI II disks. Use this agent to specify a list of raw disk devices, and reserve all or
a percentage of accessible disks for an application. The reservation prevents disk data
corruption by restricting other systems from accessing and writing to the disks. An automatic
probing feature allows systems to maintain reservations even when the disks or bus are reset.
The optional FailFast feature minimizes data corruption in the event of a reservation conflict by
causing the system to panic.
Note: The DiskReservation agent is supported on Solaris 2.7 and above. The agent is not
supported with dynamic multipathing software, such as VERITAS DMP.
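A DiskReservation resource might be configured along these lines; the raw device paths are illustrative, and Disks/FailFast follow the attribute names used above:

```
// main.cf fragment (sketch)
DiskReservation app_resv (
        Disks = { "/dev/rdsk/c1t1d0s2", "/dev/rdsk/c1t2d0s2" }
        FailFast = 1    // panic the system on a reservation conflict
        )
```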

IP Checking - This method is implemented either with the preonline-ipc event trigger or by
making an IP resource the first resource to come online in the service group. Both methods
check that the IP addresses for the service group are not in use by another system before
bringing the service group online.
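The second method can be sketched in main.cf by making every other resource require the IP resource; the group's resource names, address and devices below are assumptions:

```
// main.cf fragment (sketch) - IP onlines first; if the address is
// already in use elsewhere, the rest of the group never comes up
IP app_ip (
        Device = hme0
        Address = "192.168.10.51"
        )
Mount app_mount (
        BlockDevice = "/dev/dsk/c1t1d0s4"
        MountPoint = "/apps"
        FSType = ufs
        )
app_mount requires app_ip
```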

Auto Disabling Service Groups - (non-configurable) When VCS does not know the status of a
service group on a particular system, it autodisables the service group on that system.
Autodisabling occurs under the following conditions:


When the VCS engine, HAD, is not running on the system.
When all resources within the service group are not probed on the system.
When a particular system is visible through disk heartbeat only.


Under these conditions, all service groups that include the system in their SystemList attribute
are autodisabled. This does not apply to systems that are powered off.

When the VCS process (HAD) is killed, other systems in the cluster mark all service groups
capable of going online on the rebooted system as autodisabled. The AutoDisabled flag is
cleared when the system goes offline. As long as the system goes offline within the interval
specified in the ShutdownTimeout value, VCS treats this as a system reboot.

I/O Fencing SCSI III Reservations - I/O Fencing (VxFEN) is scheduled to be included in the
VCS 4.0 version. VCS can have parallel or failover service groups with disk group resources in
them. If the cluster has a split-brain, VxFEN should force one of the subclusters to commit
suicide in order to prevent data corruption. The subcluster which commits suicide should never
gain access to the disk groups without joining the cluster again. In parallel service groups, it is
necessary to prevent any active processes from writing to the disks. In failover groups, however,
access to the disk only needs to be prevented when VCS fails over the service group to another
node. Some multipathing products will be supported with I/O Fencing.


Minimizing the Effects of Split-Brain
In addition to avoiding split-brain, there are utilities in place to help minimize its effects
should it still occur. The concurrency violation trigger script and the -j option to the
gabconfig command are described briefly below.

Concurrency Violation Trigger Script - The violation trigger script offlines a failover service
group that has resources online on more than one node at a time. The violation trigger is
invoked when a resource of a failover service group is online on more than one node. This can
happen when a resource comes online by itself while already online (through VCS) on another node.
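A minimal sketch of such a trigger, assuming the usual trigger location and that VCS passes the system name and service group name as the first two arguments:

```shell
#!/bin/sh
# /opt/VRTSvcs/bin/triggers/violation (sketch)
# $1 - system on which the violation occurred, $2 - service group
SYS="$1"
GROUP="$2"
# offline the failover group on the offending system to end the
# concurrency violation
/opt/VRTSvcs/bin/hagrp -offline "$GROUP" -sys "$SYS"
```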

Gabconfig -j - If a network partition occurs, a cluster can "split" into two or more separate
mini-clusters. When two clusters join as one, VCS designates that one system be ejected. GAB
prints diagnostic messages and sends iofence messages to the system being ejected. The
system receiving the iofence messages tries to kill the client process. If the -j option is used
in gabconfig, the system is halted when the iofence message is received.
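In practice this means adding -j to the gabconfig line in /etc/gabtab (the 2-node seed count is illustrative):

```shell
# /etc/gabtab - halt this system if an iofence message is received
/sbin/gabconfig -c -n 2 -j
```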


What happens after a forced stop of VERITAS Cluster Server
(VCS) when hastart is re-run?

Issue:

What happens after a forced stop of VERITAS Cluster Server (VCS) when hastart is re-run?

Solution:

There will be times when a forced stop of VCS may need to be initiated, e.g., because of a hung
"had" daemon. If so, what happens when VCS is restarted via "hastart"?

When "hastop -all -force" is initiated, the "had" daemons are stopped on all nodes in the
respective cluster, but the actual resources (e.g. volumes, disk groups, etc) remain online,
which makes this action transparent to users. There is no monitoring of resources at this time,
because the engine is completely down on all cluster nodes.

When "hastart" is run on a node after the hastop -all -force, the engine runs the monitor
scripts for all resource types referenced in main.cf (the main VCS configuration file) in order
to evaluate the status of the resources. Each resource that passes its monitor is reported
online, and when all resources of a service group are online the group is reported online. A
resource that fails its monitor script is reported offline. If all resources but one are online,
and the offline resource is critical, the group is taken offline and a failover is attempted,
depending on the behavior configured in /etc/VRTSvcs/conf/config/main.cf.

When an "hastart" is performed after doing a hastop -all -force, if a given service group was
offline, "had" will not perform an online of the group; it will just check what is running. All
results can be gleaned from the engine log in /var/VRTSvcs/log/engine/engine.log_A.
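The whole sequence described above, as run from a shell (log path as given above):

```shell
# stop had on all nodes, leaving the resources themselves running
hastop -all -force
# ... perform maintenance, then restart the engine on each node;
# it re-runs the monitor entry points to rediscover resource state
hastart
# verify what the engine concluded
hastatus -summary
tail /var/VRTSvcs/log/engine/engine.log_A
```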

What is AutoDisable and how did the service groups become that
way?


Issue:


What is AutoDisable and how did the service groups become that way?

Solution:


When VCS does not know the status of a service group on a particular system, it autodisables
the service group on that system. This functionality exists to increase the protection of the data
when VCS does not completely know the states of the resources. It can be argued that the
changes are on the conservative side, but when large amounts of critical data are being
controlled by VCS, erring on the side of caution is wise.

Autodisabling occurs under the following conditions:
When the VCS engine, HAD, is not running on the system.
When all resources within the service group are not probed on the system.
When a particular system is visible through disk heartbeat only.

Under these conditions, all service groups that include the system in their SystemList attribute
are autodisabled. This does not apply to systems that are powered off.

Using the command "hagrp -autoenable <group name> -sys <system name>" will clear the
auto-disable flag for the listed system. Important note: the system name to use is that of the
system which caused the auto-disable state. For instance, if a service group called "A" is
running on node 1, has nodes 2 and 3 in its system list, and node 3 is not running VCS
because of an upgrade, the command to clear the autodisable would be
"hagrp -autoenable A -sys 3".

The other option is to remove the affected system from the service group's system list. To
achieve this, use "hagrp -modify <group name> SystemList -delete <list of names>". For the
above example, to remove node 3 from the system list of group A, the command would be
"hagrp -modify A SystemList -delete 3", where node 1 has a priority of 0 and node 2 has a
priority of 1.
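For the example above, clearing and verifying the flag might look like this (the hagrp -display output format varies between VCS versions):

```shell
# clear the autodisable of group A caused by node 3 being down
hagrp -autoenable A -sys 3
# confirm AutoDisabled is now 0 and check the group state
hagrp -display A -attribute AutoDisabled
hagrp -state A
```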
Caution: To bring a group online manually after VCS has autodisabled the group, make sure
that the group is not fully or partially active on any system that has the AutoDisabled attribute
set to 1 by VCS. Specifically, verify that all resources that may be corrupted by being active on
multiple systems are brought down on the designated systems. Then, clear the AutoDisabled
attribute for each system.