VERITAS Cluster Server for UNIX, Fundamentals (Appendixes)
HA-VCS-410-101A-2-10-SRT (100-002149-B)
COURSE DEVELOPERS
Bilge Gerrits
Siobhan Seeger
Dawn Walker

LEAD SUBJECT MATTER EXPERTS
Geoff Bergren
Paul Johnston
Dave Rogers
Jim Senicka
Pete Toemmes

TECHNICAL CONTRIBUTORS AND REVIEWERS
Billie Bachra
Barbara Ceran
Bob Lucas
Gene Henriksen
Margy Cassidy

Disclaimer
The information contained in this publication is subject to change without notice. VERITAS Software Corporation makes no warranty of any kind with regard to this guide, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. VERITAS Software Corporation shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance, or use of this manual.

Copyright
Copyright 2005 VERITAS Software Corporation. All rights reserved. No part of the contents of this training material may be reproduced in any form or by any means or be used for the purposes of training or education without the written permission of VERITAS Software Corporation.

Trademark Notice
VERITAS, the VERITAS logo, and VERITAS FirstWatch, VERITAS Cluster Server, VERITAS File System, VERITAS Volume Manager, VERITAS NetBackup, and VERITAS HSM are registered trademarks of VERITAS Software Corporation. Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

VERITAS Cluster Server for UNIX, Fundamentals
Participant Guide
April 2005 Release

VERITAS Software Corporation
350 Ellis Street
Mountain View, CA 94043
Phone 650-527-8000
www.veritas.com
Table of Contents
Appendix A: Lab Synopses
Lab 2 Synopsis: Validating Site Preparation ........................................................... A-2
Lab 3 Synopsis: Installing VCS ............................................................................... A-6
Lab 4 Synopsis: Using the VCS Simulator............................................................ A-18
Lab 5 Synopsis: Preparing Application Services................................................... A-24
Lab 6 Synopsis: Starting and Stopping VCS......................................................... A-29
Lab 7 Synopsis: Online Configuration of a Service Group.................................... A-31
Lab 8 Synopsis: Offline Configuration of a Service Group.................................... A-38
Lab 9 Synopsis: Creating a Parallel Service Group .............................................. A-47
Lab 10 Synopsis: Configuring Notification ............................................................ A-52
Lab 11 Synopsis: Configuring Resource Fault Behavior....................................... A-55
Lab 13 Synopsis: Testing Communication Failures .............................................. A-60
Lab 14 Synopsis: Configuring I/O Fencing............................................................ A-66
Lab 11 Solutions: Configuring Resource Fault Behavior .................................... C-133
Lab 13 Solutions: Testing Communication Failures............................................ C-149
Lab 14 Solutions: Configuring I/O Fencing ......................................................... C-163
Visually inspect the classroom lab site.
Complete and validate the design worksheet.
Use the lab appendix best suited to your experience level:
- Appendix A: Lab Synopses
- Appendix B: Lab Details
- Appendix C: Lab Solutions
System Definition Sample Value Your Value
System train1
System train2
See the next slide for lab assignments.
Your system host name (your_sys) train1
1 Verify that the Ethernet network interfaces for the two cluster interconnect
links are cabled together using crossover cables.
Note: In actual implementations, each link should use a completely separate
infrastructure (separate NIC and separate hub or switch). For simplicity of
configuration in the classroom environment, the two interfaces used for the
cluster interconnect are on the same NIC.
[Diagram: Four-Node UNIX classroom configuration. Systems train1 through train12 (192.168.XX.101 through 192.168.XX.112) connect through hubs/switches to two public LANs, a software share at 192.168.XX.100, and a SAN with a disk array and tape library.]
2 Verify that the public interface is cabled and accessible on the classroom public
network.
Virtual Academy
Skip this step.
1 Check the PATH environment variable. If necessary, add the /sbin, /usr/sbin, /opt/VRTS/bin, and /opt/VRTSvcs/bin directories to your PATH environment variable.
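A minimal sketch of that check in Bourne shell (the directory list comes from the step above; appending at the end of PATH is an assumption):

```shell
# Append each required directory to PATH only if it is not already there.
for d in /sbin /usr/sbin /opt/VRTS/bin /opt/VRTSvcs/bin; do
    case ":$PATH:" in
        *:"$d":*) ;;               # already on PATH; skip
        *) PATH="$PATH:$d" ;;      # append the missing directory
    esac
done
export PATH
echo "$PATH"
```

Add the same loop to your shell profile if you want the change to persist across logins.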
Verify that ssh configuration files are set up in order to install VCS on Linux or to
run remote commands without prompts for passwords.
If you do not configure ssh, you are required to type in the root passwords for all
systems for every remote command issued during the following services
preparation lab and the installation procedure.
If you do not want to use ssh with automatic login using saved passphrases on a
regular basis, run the following commands at the command line. This is in effect
only for this session.
exec /usr/bin/ssh-agent $SHELL
ssh-add
Save your passphrase during your GNOME session.
1 Open a console window so you can observe messages during later labs.
vcs1
train1: Link 1:______ Link 2:______ Public:______
train2: Link 1:______ Link 2:______ Public:______
4.x: # ./installer
Pre-4.0: # ./installvcs
Software location:_______________________________
Subnet:_______
Node names, cluster name, and cluster ID:
train1 train2 -> vcs1, ID 1
train3 train4 -> vcs2, ID 2
train5 train6 -> vcs3, ID 3
train7 train8 -> vcs4, ID 4
train9 train10 -> vcs5, ID 5
train11 train12 -> vcs6, ID 6
(Sample "Your Value" entries: train1, train2, vcs1, 1)
Cluster interconnect Ethernet interface for Solaris: qfe0
interconnect link #1 Sol Mob: dmfe0
AIX: en2
HP-UX lan1
Linux: eth1
VA: bge2
Ethernet interface for Solaris: qfe1
interconnect link #2 Sol Mob: dmfe1
AIX: en3
HP-UX lan2
Linux: eth2
VA: bge3
Public network Solaris: eri0
interface Sol Mob: dmfe0
AIX: en1
HP-UX lan0
Linux: eth0
VA: bge0
Installation software location: install_dir
License
Installation software location:
_____________________________________________________________
2 This step is to be performed from only one system in the cluster. The install script installs and configures all systems in the cluster.
c If a license key is needed, obtain one from your instructor and record it
here.
License Key: ____________________________________________
3 If you did not install the Java GUI package as part of the installer (CPI)
process (or installvcs for earlier versions of VCS), install the VRTScscm
Java GUI package on each system in the cluster. The location of this package is
in the pkgs directory under the install location directory given to you by your
instructor.
_____________________________________________________________
2 Install any VCS patches or updates, as directed by your instructor. Use the
operating system-specific command.
3 Install any other software indicated by your instructor. For example, if your
classroom uses VCS 3.5, you may be directed to install VERITAS Volume
Manager and VERITAS File System.
You can use the worksheet at the end of this lab synopsis to verify and record your
cluster configuration.
1 Verify that VCS is now running using hastatus.
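One quick way to perform this check from any cluster system (a sketch; hastatus -sum prints a one-screen summary of system and service group states, and hasys -state shows per-system states):

```
# hastatus -sum
# hasys -state
```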
Verify GUI connectivity with the Java GUI and the Web GUI. Both GUIs can
connect to the cluster with the default user of admin and password as the default
password.
2 Start the Java GUI and connect to the cluster using these values:
First system:
/etc/llttab Sample Value Your Value
set-node train1
(host name)
set-cluster 1
(number in host name of odd
system)
link Solaris: qfe0
Sol Mob: dmfe0
AIX: en2
HP-UX lan1
Linux: eth1
VA: bge2
link Solaris: qfe1
Sol Mob: dmfe1
AIX: en3
HP-UX lan2
Linux: eth2
VA: bge3
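For reference, a Solaris /etc/llttab built from the sample values above might look like the following (a sketch; the link device-path syntax varies by platform and VCS version):

```
set-node train1
set-cluster 1
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -
```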
UserNames admin=password
ClusterAddress 192.168.xx.91
Administrators admin
Optional Attributes
CounterInterval 5
VCSWeb requires webip
webip requires csgnic
Simulator configuration directory: sim_config_dir
7 Add a cluster.
11 From the Simulator GUI, start the vcs_operations cluster, launch the VCS Java
Console for the vcs_operations simulated cluster, and log in as oper with
password oper.
Note: While you may use admin/password to log in, the point of using oper is to
demonstrate the differences in privileges between VCS user accounts.
2 Determine the status of all service groups.
3 Which service groups have service group operator privileges set for the oper
account?
4 Which resources in the AppSG service group have the Critical resource
attribute enabled?
6 Which immediate child resources does the Oracle resource in the OracleSG
service group depend on?
What happens?
What happens?
What happens?
4 Take all service groups that you have privileges for offline everywhere.
9 Bring all service groups that you have privileges for online on S3.
What happens to the OracleSG service group?
What happens?
What happens?
What happens?
9 Clear the fault on the Oracle resource in the OracleSG service group.
11 Save and close the configuration, log off from the GUI, and stop the simulator.
[Diagram: disk group bobDG1 on disk1 contains volume bobVol1 mounted at /bob1, running /bob1/loopy; disk group sueDG1 on disk2 contains volume sueVol1 mounted at /sue1, running /sue1/loopy.]
See the next slide for classroom values.
Lab Assignments
Use the design worksheet to gather and record the values needed to complete the
preparation steps.
Resource Name nameNIC1
Resource Type NIC
Required Attributes
Device Solaris: eri0
Sol Mob: dmfe0
AIX: en1
HP-UX: lan0
Linux: eth0
VA: bge0
NetworkHosts* 192.168.xx.1 (HP-UX
only)
Critical? No (0)
Enabled? Yes (1)
3 Create a mount point, mount the file system on your cluster system, and verify that it is mounted.
1 Verify that an IP address exists on the base interface for the public network.
A script named loopy is used as the example application for this lab exercise.
1 Obtain the location of the loopy script from your instructor.
loopy script location:
__________________________________________________________
2 Copy this file to a file named loopy on the file system you created.
Complete the following steps to migrate the application to the other system.
1 Stop all resources used in this service to prepare to manually migrate the
service.
2 On the other cluster system, import your disk group and bring up the remaining
storage resources and the virtual IP address.
4 After you have verified that all resources are working properly on the second
system, stop all resources.
vcs1: train1 train2
# hastop -all -force
2 Open the cluster configuration and verify that the .stale file has been
created.
7 Return all systems to a running state (from one system in the cluster). View the
build process to see the LOCAL_BUILD and REMOTE_BUILD system
states.
Create a service group.
Add resources to the service group from the bottom of the dependency tree.
Substitute the name you used to create the disk group and volume.
Fill in the design worksheet with values appropriate for your cluster and use the
information to create a service group.
2 Save the cluster configuration and view the configuration file to verify your
changes.
Add NIC, IP, DiskGroup, Volume, and Process resources to the service group
using the information from the design worksheets.
After each resource is added:
Bring each resource online.
Save the cluster configuration.
System IP Address
train1 192.168.xx.51
train2 192.168.xx.52
train3 192.168.xx.53
train4 192.168.xx.54
train5 192.168.xx.55
train6 192.168.xx.56
train7 192.168.xx.57
train8 192.168.xx.58
train9 192.168.xx.59
train10 192.168.xx.60
train11 192.168.xx.61
train12 192.168.xx.62
Resource Type DiskGroup
Required Attributes
DiskGroup nameDG1
Optional Attributes
StartVolumes 1
StopVolumes 1
Critical? No (0)
Enabled? Yes (1)
After you have verified that all resources are online, link the resources as shown in the worksheet.
Resource Dependency Definition
Service Group nameSG1
Parent Resource Requires Child Resource
nameVol1 nameDG1
nameMount1 nameVol1
nameIP1 nameNIC1
nameProcess1 nameMount1
nameProcess1 nameIP1
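The dependency table above can be applied from the command line with hares -link, which takes the parent resource first (a sketch; resource names follow the worksheet's name prefix convention):

```
# haconf -makerw
# hares -link nameVol1 nameDG1
# hares -link nameMount1 nameVol1
# hares -link nameIP1 nameNIC1
# hares -link nameProcess1 nameMount1
# hares -link nameProcess1 nameIP1
# haconf -dump -makero
```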
3 Save the cluster configuration and view the configuration file to verify your
changes.
4 Close the cluster configuration after all students working in your cluster are
finished.
[Diagram: service groups nameSG1 and nameSG2, each containing Process and DiskGroup resources.]
Working together, follow the offline configuration procedure.
Alternately, work alone and use the GUI to create a new service group.
Lab Assignments
Complete the following worksheet for the resources managed by the service
groups you create in this lab. Then follow the procedure to configure the resources.
Partner system host name (their_sys): use the same system as previous labs
Name prefix for your objects: name
3 Create a mount point, mount the file system on your cluster system, and verify that it is mounted.
5 Start the loopy script and verify that the application is working correctly.
6 Stop the resources to prepare to place them under VCS control in the next
section of the lab.
In the design worksheet, record information needed to create a new service group
using the offline process described in the next section.
Service Group Definition Sample Value Your Value
Group nameSG2
Required Attributes
FailOverPolicy Priority
SystemList train1=0 train2=1
Optional Attributes
AutoStartList train1
System IP Address
train1 192.168.xx.71
train2 192.168.xx.72
train3 192.168.xx.73
train4 192.168.xx.74
train5 192.168.xx.75
train6 192.168.xx.76
train7 192.168.xx.77
train8 192.168.xx.78
train9 192.168.xx.79
train10 192.168.xx.80
train11 192.168.xx.81
train12 192.168.xx.82
Resource Type DiskGroup
Required Attributes
DiskGroup nameDG2
Optional Attributes
StartVolumes 1
StopVolumes 1
Critical? No (0)
Enabled? Yes (1)
nameVol2 nameDG2
nameMount2 nameVol2
nameIP2 nameNIC2
nameProcess2 nameMount2
nameProcess2 nameIP2
1 Working with your lab partner, verify that the cluster configuration is saved
and closed.
3 Create copies of the main.cf and types.cf files in the test subdirectory.
Linux
Also copy the vcsApacheTypes.cf file.
4 One student at a time, modify the main.cf file in the test directory on one
system in the cluster.
5 Edit the attributes of each copied resource to match the design worksheet
values shown earlier in this section.
7 Stop VCS on all systems, but leave the applications still running.
8 Copy the main.cf file from the test subdirectory into the configuration
directory.
9 Start the cluster from the system where you edited the configuration file and
start the other system in the stale state.
10 Bring the new service group online on your system. Students can bring their
own service groups online.
[Diagram: service groups nameSG1 and nameSG2 (Process and DiskGroup resources) plus a parallel NetworkSG service group containing NIC and Phantom resources.]
Work with your lab partner to create a parallel service group containing network
resources using the information in the design worksheet.
Use the values in the following tables to create NIC and Phantom resources and
then bring them online. Remember to save the cluster configuration.
Resource Definition Sample Value Your Value
Service Group NetworkSG
Resource Name NetworkNIC
Resource Type NIC
Required Attributes
Device Solaris: eri0
Sol Mob: dmfe0
AIX: en1
HP-UX: lan0
Linux: eth0
VA: bge0
Critical? No (0)
Enabled? Yes (1)
Critical? No (0)
Enabled? Yes (1)
Working on your own, use the values in the tables to replace the NIC resources
with Proxy resources and create new links.
1 Use the values in the tables to replace the NIC resources with Proxy resources
and create new links.
2 Switch each service group (nameSG1, nameSG2, ClusterService) to ensure
that they can run on each system.
[Slide: service groups nameSG1, nameSG2, and ClusterService (NotifierMngr resource); triggers resfault, nofailover, and resadminwait (Optional Lab).]
SMTP Server: ___________________________________
1 Work with your lab partner to add a NotifierMngr type resource to the
ClusterService service group using the information in the design worksheet.
2 Bring the resource online and test the service group by switching it between
systems.
4 Save and close the cluster configuration and view the configuration file to
verify your changes.
Note: In the next lab, you will see the effects of configuring notification and
triggers when you test various resource fault scenarios.
Use the following procedure to configure triggers for notification. In this lab, each
student creates a local copy of the trigger script on their own system. If you are
working alone in the cluster, copy your completed triggers to the other system.
#!/bin/sh
echo `date` > /tmp/resfault.msg
echo "message from the resfault trigger" >> /tmp/resfault.msg
echo "Resource $2 has faulted on System $1" >> /tmp/resfault.msg
echo "Please check the problem." >> /tmp/resfault.msg
/usr/lib/sendmail root < /tmp/resfault.msg
rm /tmp/resfault.msg
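Before wiring the trigger into VCS, you can sanity-check the message it builds by running the same echo sequence with sample arguments and inspecting the file rather than mailing it (a sketch; the system and resource names are hypothetical samples):

```shell
#!/bin/sh
# Simulate the resfault trigger's message construction without sendmail.
# VCS would pass the system name as $1 and the resource name as $2;
# here they are hard-coded sample values.
SYSTEM=train1
RESOURCE=nameIP1
MSG=/tmp/resfault_test.msg
echo "`date`" > $MSG
echo "message from the resfault trigger" >> $MSG
echo "Resource $RESOURCE has faulted on System $SYSTEM" >> $MSG
echo "Please check the problem." >> $MSG
cat $MSG    # review the message; the file is left in place for inspection
```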
5 If you are working alone, copy all triggers to the other system.
[Slide: attribute values explored in this lab for service groups nameSG1 and nameSG2: Critical=1, FaultPropagation=0/1, ManageFaults=NONE/ALL, RestartLimit=1.]
Note: Network interfaces for virtual IP addresses are unconfigured to force the IP resource to fault.
In your classroom, the interface you specify is:______
Replace the variable interface in the lab steps with this value.
This part of the lab exercise explores the default behavior of VCS. Each student
works independently in this lab.
1 Verify that all resources in the nameSG1 service group are currently set to
critical; if not, set them to critical.
2 Set the IP and Process resources to not critical in the nameSG1 service group.
4 Verify that your nameSG1 service group is currently online on your system.
What happens?
8 Set the IP and Process resources to critical in the nameSG1 service group.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Verify that your nameSG1 service group is currently online on your system.
What happens?
4 Without clearing faults from the last failover, unconfigure the virtual IP address on the other system.
What happens?
5 Clear the nameIP1 resource on all systems and bring the nameSG1 service
group online on your system.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Verify that your nameSG1 service group is currently online on your system.
What happens?
What happens?
7 Did unfreezing the service group cause a failover or any resources to come
offline? Explain why or why not.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set the FaultPropagation attribute for the nameSG1 service group to off (0).
What happens?
4 Clear the faulted resource and bring the resource back online.
5 Set the ManageFaults attribute for the nameSG1 service group to NONE and
set the FaultPropagation attribute back to one (1).
What happens?
What happens?
9 Recover the resource from the ADMIN_WAIT state by faulting the service
group.
10 Clear the faulted nameIP1 resource and switch the nameSG1 service group
back to your system.
11 Set ManageFaults back to ALL for the nameSG1 service group and save the
cluster configuration.
This section illustrates failover behavior of a resource type using restart limits.
1 Verify that all resources in the nameSG1 service group are set to critical.
2 Set the RestartLimit attribute for the Process resource type to 1.
3 Stop the loopy process running in the nameSG1 service group by sending a
kill signal.
What happens?
4 Stop the loopy process running in the nameSG1 service group by sending a
kill signal.
What happens?
5 Clear the faulted resource and switch the nameSG1 service group back to your
system.
6 When all students have completed the lab, save and close the configuration.
Optional Lab
injeopardy Trigger
Use the following procedure to configure triggers for jeopardy notification. In this
lab, students create a local copy of the trigger script on their own systems.
1 Create a text file in the /opt/VRTSvcs/bin/triggers directory named
injeopardy. Add the following lines to the file:
#!/bin/sh
echo `date` > /tmp/injeopardy.msg
echo "message from the injeopardy trigger" >> /tmp/injeopardy.msg
echo "System $1 is in Jeopardy" >> /tmp/injeopardy.msg
echo "Please check the problem." >> /tmp/injeopardy.msg
/usr/lib/sendmail root < /tmp/injeopardy.msg
rm /tmp/injeopardy.msg
3 If you are working alone, copy the trigger to the other system.
4 Continue with the next lab sections. The Multiple LLT Link Failures
Jeopardy section of this lab shows the effects of configuring the InJeopardy
trigger.
Working with your lab partner, use the procedures to create a low-priority link and
then fault communication links and observe what occurs in a cluster environment
when fencing is not configured.
2 Shut down VCS, leaving the applications running on all systems in the cluster.
Solaris Mobile
Skip this step for mobile classrooms. There is only one public interface and it
is already configured as a low-priority link.
_____________________________________________________________
Notes:
Use lltlink_enable to restore the LLT link.
The utilities prompt you to select an interface.
These classroom utilities are provided to enable you to simulate
disconnecting and reconnecting Ethernet cables without risk of damaging
connectors.
Run the utility from one system only, unless otherwise specified.
2 Remove all but one LLT link and watch for the link to expire in the console.
Solaris Mobile
Remove only the one high-priority LLT link (dmfe1).
3 From each system, verify that the links are down by checking the status of
GAB.
4 Remove the last LLT link and watch for the link to expire in the console.
7 Change the NIC resource type MonitorInterval attribute back to 60 seconds.
Disk 1:___________________
Disk 3:___________________
nameDG1, nameDG2
Visually inspect the classroom lab site.
Complete and validate the design worksheet.
Use the lab appendix best suited to your experience level:
- Appendix A: Lab Synopses
- Appendix B: Lab Details
- Appendix C: Lab Solutions
System Definition Sample Value Your Value
System train1
System train2
See the next slide for lab assignments.
In this lab, you work with your partner to prepare the systems for installing VCS.
Brief instructions for this lab are located on the following page:
Lab 2 Synopsis: Validating Site Preparation, page A-2
Solutions for this exercise are located on the following page:
Lab 2 Solutions: Validating Site Preparation, page C-3
Lab Assignments
Fill in the table with the applicable values for your lab cluster.
AIX: en1
HP-UX lan0
Linux: eth1
VA bge0
Admin IP address for your_sys: 192.168.xx.xxx
1 Verify that the Ethernet network interfaces for the two cluster interconnect
links are cabled together using crossover cables.
Note: In actual implementations, each link should use a completely separate
infrastructure (separate NIC and separate hub or switch). For simplicity of
configuration in the classroom environment, the two interfaces used for the
cluster interconnect are on the same NIC.
[Diagram: Four-Node UNIX classroom configuration. Systems train1 through train12 (192.168.XX.101 through 192.168.XX.112) connect through hubs/switches to two public LANs, a software share at 192.168.XX.100, and a SAN with a disk array and tape library.]
4 Determine the base IP address configured on the public network interface for both your system and your partner's system.
5 Verify that the public IP address of each system in your cluster is listed in the
/etc/hosts file.
Other Checks
1 Check the PATH environment variable. If necessary, add the /sbin, /usr/sbin, /opt/VRTS/bin, and /opt/VRTSvcs/bin directories to your PATH environment variable.
2 Check the VERITAS licenses to determine whether a VERITAS Cluster Server
license is installed.
Verify that ssh configuration files are set up in order to install VCS on Linux or to
run remote commands without prompts for passwords.
If you do not configure ssh, you are required to type in the root passwords for all
systems for every remote command issued during the following services
preparation lab and the installation procedure.
To configure ssh:
1 Log on to your system.
2 Generate a DSA key pair on this system by running the following command:
ssh-keygen -t dsa
b Ensure that you copy the line to the other systems in your cluster.
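One common way to complete step b is to append the public key to root's authorized_keys file on each remote system (a sketch; the file names assume the DSA key generated above and OpenSSH defaults):

```
# cat ~/.ssh/id_dsa.pub | ssh root@train2 "cat >> ~/.ssh/authorized_keys"
```

Repeat for each other system in your cluster, substituting its host name.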
rpm -q openssh-askpass-gnome
2 If you do not have a $HOME/.Xclients file (you should not have one after
installation), run switchdesk to create it. In your $HOME/.Xclients file,
edit the following:
exec $HOME/.Xclients-default
a Click the Startup Programs Tab and Add and enter /usr/bin/ssh-add
in the Startup Command text area.
b Set the priority to a number higher than any existing commands to ensure
that it is executed last. A good priority number for ssh-add is 70 or
higher. The higher the priority number, the lower the priority. If you have
other programs listed, this one should have the lowest priority.
c Click OK to save your settings, and exit the GNOME Control Center.
4 Log out and then log back into GNOME; in other words, restart X.
1 Open a console window so you can observe messages during later labs.
vcs1
train1: Link 1:______ Link 2:______ Public:______
train2: Link 1:______ Link 2:______ Public:______
4.x: # ./installer
Pre-4.0: # ./installvcs
Software location:_______________________________
Subnet:_______
In this lab, you work with your lab partner to install VCS on both systems.
Brief instructions for this lab are located on the following page:
Lab 3 Synopsis: Installing VCS, page A-6
Solutions for this exercise are located on the following page:
Lab 3 Solutions: Installing VCS, page C-13
train11 train12 vcs6 6
Cluster interconnect Ethernet interface for Solaris: qfe0
interconnect link #1 Sol Mob: dmfe0
AIX: en2
HP-UX lan1
Linux: eth1
VA: bge2
Ethernet interface for Solaris: qfe1
interconnect link #2 Sol Mob: dmfe1
AIX: en3
HP-UX lan2
Linux: eth2
VA: bge3
Public network Solaris: eri0
interface Sol Mob: dmfe0
AIX: en1
HP-UX lan0
Linux: eth0
VA: bge0
Installation software location: install_dir
License
____________________________________________________________
2 This step is to be performed from only one system in the cluster. The install
script installs and configures all systems in the cluster.
Notes:
For VCS 4.x, install Storage Foundation HA (which includes VCS,
Volume Manager, and File System).
Use the information in the previous table or design worksheet to
respond to the installation prompts.
Sample prompts and input are provided at the end of the lab solution in
Appendix C.
For versions of VCS before 4.0, use installvcs.
c If a license key is needed, obtain one from your instructor and record it
here.
License Key: _________________________________
3 If you did not install the Java GUI package as part of the installer (CPI)
process (or installvcs for earlier versions of VCS), install the VRTScscm
Java GUI package on each system in the cluster. The location of this package is
in the pkgs directory under the install location directory given to you by your
instructor.
_______________________________________
install_dir
2 Install any VCS patches or updates, as directed by your instructor. Use the
operating system-specific command, as shown in the following examples.
Solaris
pkgadd -d /install_dir/pkgs VRTSxxxx
HP
swinstall -s /install_dir/pkgs VRTSxxxx
AIX
installp -a -d /install_dir/pkgs/VRTSxxxx.rte.bff VRTSxxxx.rte
Linux
rpm -ihv VRTSxxxx-x.x.xx.xx-GA_RHEL.i686.rpm
3 Install any other software indicated by your instructor. For example, if your
classroom uses VCS 3.5, you may be directed to install VERITAS Volume
Manager and VERITAS File System.
a Verify that the cluster ID, system names, and network interfaces specified
during install are present in the /etc/llttab file.
b Verify the system names in the /etc/llthosts file.
Verify that the number of systems in the cluster matches the value for the
-n flag set in the /etc/gabtab file.
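For a two-node cluster, /etc/gabtab typically contains a single line like the following (a sketch; the -n value must equal the number of systems in the cluster):

```
/sbin/gabconfig -c -n2
```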
Verify the cluster name, system names, and IP address for the Cluster Manager
in the /etc/VRTSvcs/conf/config/main.cf file.
Verify GUI connectivity with the Java GUI and the Web GUI. Both GUIs can
connect to the cluster with the default user of admin and password as the default
password.
2 Start the Java GUI and connect to the cluster using these values:
Cluster alias: nameCluster
Host name: ip_address (used during installation)
Failover retries: 12 (retain default)
This lab uses the VERITAS Cluster Server Simulator and the Cluster Manager
Java Console. You are provided with a preconfigured main.cf file to learn about
managing the cluster.
Brief instructions for this lab are located on the following page:
Lab 4 Synopsis: Using the VCS Simulator, page A-18
Solutions for this exercise are located on the following page:
Lab 4 Solutions: Using the VCS Simulator, page C-35
Local Simulator config directory: sim_config_dir
4 Add a cluster.
___________________________________________
cf_files_dir
9 Launch the VCS Java Console for the vcs_operations simulated cluster.
Note: While you may use admin/password to log in, the point of using oper is to
demonstrate the differences in privileges between VCS user accounts.
3 Which service groups have service group operator privileges set for the oper
account?
4 Which resources in the AppSG service group have the Critical resource
attribute enabled?
6 Which immediate child resources does the Oracle resource in the OracleSG
service group depend on?
What happens?
What happens?
3 Attempt to take the Oracle service group offline on S1.
What happens?
4 Take all service groups that you have privileges for offline everywhere.
9 Bring all service groups that you have privileges for online on S3.
What happens?
What happens?
What happens?
9 Clear the fault on the Oracle resource in the OracleSG service group.
[Diagram: disk group bobDG1 on disk1 contains volume bobVol1 mounted at /bob1, running /bob1/loopy; disk group sueDG1 on disk2 contains volume sueVol1 mounted at /sue1, running /sue1/loopy.]
See the next slide for classroom values.
The purpose of this lab is to prepare the loopy process service for high availability.
Brief instructions for this lab are located on the following page:
Lab 5 Synopsis: Preparing Application Services, page A-24
Solutions for this exercise are located on the following page:
Lab 5 Solutions: Preparing Application Services, page C-51
Lab Assignments
Fill in the table with the applicable values for your lab cluster.
Linux: eth0
VA bge0
IP Address train1 192.168.xxx.51
train2 192.168.xxx.52
train3 192.168.xxx.53
train4 192.168.xxx.54
train5 192.168.xxx.55
train6 192.168.xxx.56
train7 192.168.xxx.57
train8 192.168.xxx.58
train9 192.168.xxx.59
train10 192.168.xxx.60
train11 192.168.xxx.61
train12 192.168.xxx.62
Application script location
class_sw_dir
3 Initialize a disk for Volume Manager using the disk device from the worksheet.
4 Create a disk group with the name from the worksheet using the initialized
disk.
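For reference, steps 3 and 4 map onto Volume Manager commands of roughly this form. This is a sketch, not the required procedure; the device name and disk media name below are placeholders for your worksheet values, and the commands must be run as root on a system with VxVM installed, so they are shown commented out:

```shell
# Initialize the disk for Volume Manager use (disk_device is a placeholder)
# /etc/vx/bin/vxdisksetup -i disk_device
#
# Create the disk group from the worksheet using the initialized disk
# (the disk media name nameDG101 is illustrative)
# vxdg init nameDG1 nameDG101=disk_device
```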
Complete the following steps to set up a virtual IP address for the application.
1 Verify that an IP address exists on the base interface for the public network.
3 Verify that the virtual IP address is configured.
A script named loopy is used as the example application for this lab exercise.
__________________________________________________________
2 Copy or type this code to a file named loopy on the file system you created
previously in this lab.
3 Verify that you have a console window open to see the display from the script.
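The loopy script itself is provided with the classroom software. As a rough sketch of the general shape such a script might take (entirely illustrative; the real loopy script may differ), a stand-in can be created like this:

```shell
# Create a hypothetical stand-in for the loopy script: an endless loop
# that prints a heartbeat message until it is killed.
cat > /tmp/loopy <<'EOF'
#!/bin/sh
# $1 = name, $2 = instance number (illustrative arguments)
while true
do
    echo "loopy $1 $2 is alive on `hostname`"
    sleep 5
done
EOF
chmod +x /tmp/loopy
```

Started in the background with arguments such as `name 1`, a script like this prints a message every few seconds until it receives a kill signal, which is what makes it convenient for failover testing.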
Complete the following steps to migrate the application to the other system.
1 Stop your loopy process by sending a kill signal. Verify that the process is
stopped.
2 Remove the virtual IP address configured earlier in this lab. Verify that the IP
address is no longer configured.
10 Verify that your mount point directory exists. Create it if it does not exist.
Complete the following steps to bring the application offline on the other system
so that it is ready to be placed under VCS control.
1 While still logged into the other system, stop your loopy process by sending a
kill signal. Verify that the process is stopped.
2 Remove the virtual IP address configured earlier in this lab. Verify that the IP
address is no longer configured.
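The stop-and-verify pattern in these steps can be sketched with any background process; here a plain sleep stands in for the loopy process:

```shell
# Start a stand-in background process, stop it with a kill signal,
# then verify that it is really gone.
sleep 300 &
pid=$!
kill $pid                        # send SIGTERM
wait $pid 2>/dev/null || true    # reap the terminated process
if kill -0 $pid 2>/dev/null
then
    echo "process $pid still running"
else
    echo "process $pid stopped"
fi
```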
[Diagram: cluster vcs1 — train1 and train2]
# hastop -all -force
The following procedure demonstrates how the cluster configuration changes states
during startup and shutdown, and shows how the .stale file works.
Brief instructions for this lab are located on the following page:
Lab 6 Synopsis: Starting and Stopping VCS, page A-29
Solutions for this exercise are located on the following page:
Lab 6 Solutions: Starting and Stopping VCS, page C-63
Note: Complete this section with your lab partner.
4 Verify that the .stale file has been created in the directory,
/etc/VRTSvcs/conf/config.
10 Verify that the .stale file is present in the /etc/VRTSvcs/conf/config
directory. This file should exist.
11 Return all systems to a running state (from one system in the cluster).
12 Watch the console during the build process to see the LOCAL_BUILD and
REMOTE_BUILD system states.
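The .stale checks in this lab reduce to a file-existence test in the configuration directory. A small sketch, with the standard path made overridable so the logic can be tried on any machine:

```shell
# Check for the .stale marker; /etc/VRTSvcs/conf/config is the standard
# location. VCS_CONF is a local convenience variable for this sketch only.
conf=${VCS_CONF:-/etc/VRTSvcs/conf/config}
if [ -f "$conf/.stale" ]
then
    state=stale
else
    state=clean
fi
echo "configuration marker state: $state"
```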
The purpose of this lab is to create a service group while VCS is running using
either the Cluster Manager graphical user interface or the command-line interface.
Brief instructions for this lab are located on the following page:
Lab 7 Synopsis: Online Configuration of a Service Group, page A-31
Solutions for this exercise are located on the following page:
Lab 7 Solutions: Online Configuration of a Service Group, page C-67
Classroom-Specific Values
Fill in this table with the applicable values for your lab cluster.
Fill in the design worksheet with values appropriate for your cluster and use the
information to create a service group.
FailOverPolicy Priority
SystemList train1=0 train2=1
Optional Attributes
AutoStartList train1
1 If you are using the GUI, start Cluster Manager and log in to the cluster.
4 Modify the SystemList to allow the service group to run on the two systems
specified in the design worksheet.
5 Modify the AutoStartList attribute to allow the service group to start on your
system.
6 Verify that the service group can autostart and that it is a failover service group.
7 Save the cluster configuration and view the configuration file to verify your
changes.
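If you are working from the command line, the steps above correspond to hagrp commands of roughly this form. This is a sketch using the sample worksheet values; your group and system names may differ, and the commands require a running cluster, so they are shown commented out:

```shell
# haconf -makerw                                     # open the configuration
# hagrp -add nameSG1                                 # create the service group
# hagrp -modify nameSG1 SystemList train1 0 train2 1
# hagrp -modify nameSG1 AutoStartList train1
# hagrp -display nameSG1                             # verify the attributes
# haconf -dump -makero                               # save and close
```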
Complete the following steps to add NIC, IP, DiskGroup, Volume, and Process
resources to the service group using the information from the design worksheet.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
6 Save the cluster configuration and view the configuration file to verify your
changes.
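On the command line, each resource addition follows the same hares pattern. The NIC resource is sketched here as an illustration; substitute the attribute values from your own worksheet:

```shell
# hares -add nameNIC1 NIC nameSG1        # add the resource to the group
# hares -modify nameNIC1 Device eri0     # set the required attribute
# hares -modify nameNIC1 Critical 0      # optional attributes, if needed
# hares -modify nameNIC1 Enabled 1       # enable the resource
```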
System IP Address
train1 192.168.xx.51
train2 192.168.xx.52
train3 192.168.xx.53
train4 192.168.xx.54
train5 192.168.xx.55
train6 192.168.xx.56
train7 192.168.xx.57
train8 192.168.xx.58
train9 192.168.xx.59
train10 192.168.xx.60
train11 192.168.xx.61
train12 192.168.xx.62
3 Set the required attributes for this resource, and any optional attributes, if
needed.
5 Bring the resource online on your system.
7 Save the cluster configuration and view the configuration file to verify your
changes.
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
6 Verify that the resource is online in VCS and at the operating system level.
7 Save the cluster configuration and view the configuration file to verify your
changes.
Volume nameVol1
DiskGroup nameDG1
Critical? No (0)
Enabled? Yes (1)
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
6 Verify that the resource is online in VCS and at the operating system level.
7 Save the cluster configuration and view the configuration file to verify your
changes.
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
6 Verify that the resource is online in VCS and at the operating system level.
7 Save the cluster configuration and view the configuration file to verify your
changes.
PathName /bin/sh
Optional Attributes
Arguments /name1/loopy name 1
Critical? No (0)
Enabled? Yes (1)
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
5 Ensure that you have the console or a terminal window open for loopy output.
7 Verify that the resource is online in VCS and at the operating system level.
8 Save the cluster configuration and view the configuration file to verify your
changes.
nameMount1 nameVol1
nameIP1 nameNIC1
nameProcess1 nameMount1
nameProcess1 nameIP1
3 Save the cluster configuration and view the configuration file to verify your
changes.
Complete the following steps to test the service group on each system in the
service group SystemList.
1 Test the service group by switching away from your system in the cluster.
2 Verify that the service group came online properly on the other system.
3 Test the service group by switching it back to your system in the cluster.
4 Verify that the service group came online properly on your system.
2 Save the cluster configuration and view the configuration file to verify your
changes.
3 Close the cluster configuration after all students working in your cluster are
finished.
group nameSG1 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG1 (
DiskGroup = nameDG1
)
IP nameIP1 (
Device = eri0
Address = "192.168.27.51"
)
Mount nameMount1 (
MountPoint = "/name1"
BlockDevice = "/dev/vx/dsk/nameDG1/nameVol1"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess1 (
PathName = "/bin/sh"
Arguments = "/name1/loopy name 1"
)
NIC nameNIC1 (
Device = eri0
)
[Diagram: nameSG1 and nameSG2 service groups, each with its own Process, App, and DG resources]
Working together, follow the offline configuration procedure. Alternately, work alone and use the GUI to create a new service group.
The purpose of this lab is to add a service group by copying and editing the
definition in main.cf for nameSG1.
Brief instructions for this lab are located on the following page:
Lab 8 Synopsis: Offline Configuration of a Service Group, page A-38
Solutions for this exercise are located on the following page:
Lab 8 Solutions: Offline Configuration of a Service Group, page C-89
Lab Assignments
Complete the following worksheet for the resources managed by the service
groups you create in this lab. Then follow the procedure to configure the resources.
class_sw_dir
2 Initialize a disk for Volume Manager using the disk device from the worksheet.
3 Create a disk group with the name from the worksheet using the initialized
disk.
9 Copy the loopy script to your file system created in this lab.
a Stop the loopy process by sending a kill signal. Verify that the process is
stopped.
d Deport your disk group and verify that it is deported.
In the design worksheet, record information needed to create a new service group
using the offline process described in the next section.
AIX: en1
HP-UX: lan0
Linux: eth0
VA: bge0
Address 192.168.xx.** see table
Optional Attributes
Netmask 255.255.255.0
Critical? No (0)
Enabled? Yes (1)
System IP Address
train1 192.168.xx.71
train2 192.168.xx.72
train3 192.168.xx.73
train4 192.168.xx.74
train5 192.168.xx.75
train6 192.168.xx.76
train7 192.168.xx.77
train8 192.168.xx.78
train9 192.168.xx.79
train10 192.168.xx.80
train11 192.168.xx.81
train12 192.168.xx.82
nameVol2 (no spaces)
FSType vxfs
FsckOpt -y
Critical? No (0)
Enabled? Yes (1)
nameMount2 nameVol2
nameIP2 nameNIC2
nameProcess2 nameMount2
nameProcess2 nameIP2
Note: You may choose to use the GUI to create the nameSG2 service group. If so,
skip this section and complete the Alternate Lab section instead.
1 Working with your lab partner, verify that the cluster configuration is saved
and closed.
2 Change to the VCS configuration directory.
4 Copy the main.cf and types.cf files into the test subdirectory.
Linux
Also copy the vcsApacheTypes.cf file.
6 Edit the main.cf file in the test directory on one system in the cluster.
a For each student's service group, copy the nameSG1 service group
structure to a new nameSG2 service group.
b Rename all of the resources within the nameSG1 service group to end with
2 instead of 1, as shown in the following table.
9 Stop VCS on all systems, but leave the applications still running.
11 Copy the main.cf file from the test subdirectory into the configuration
directory.
12 Start the cluster from the system where you edited the configuration file.
13 Start the cluster in the stale state on the other system in the cluster (where the
configuration was not edited).
16 Bring the new service group online on your system. Students can bring their
own service groups online.
Use the information in the design worksheet in the previous section to create a new
service group, using the GUI to copy resources from the nameSG1 service group.
3 Create the service group.
4 Modify the SystemList to allow the service group to run on the two systems
specified in the design worksheet.
5 Modify the AutoStartList attribute to allow the service group to start on your
system.
6 Verify that the service group can autostart and that it is a failover service group.
7 Save the cluster configuration and view the configuration file to verify your
changes.
Note: When you paste a copied resource or resource tree, the Name Clashes
window is displayed, which enables you to rename each resource you are
pasting.
10 Modify each resource to set the attribute values as specified in the worksheet.
11 Save the cluster configuration and view the configuration file to verify your
changes.
13 Bring the nameSG2 resources online, starting from the bottom of the
dependency tree.
Note: In the GUI, the Close configuration action saves the configuration
automatically.
group nameSG2 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG2 (
DiskGroup = nameDG2
)
IP nameIP2 (
Device = eri0
Address = "192.168.27.71"
)
Mount nameMount2 (
MountPoint = "/name2"
BlockDevice = "/dev/vx/dsk/nameDG2/nameVol2"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess2 (
PathName = "/bin/sh"
Arguments = "/name2/loopy name 2"
)
NIC nameNIC2 (
Device = eri0
)
[Diagram: nameSG1 and nameSG2 failover service groups, each with Process, DB, and DG resources; a parallel NetworkSG service group containing the NIC and Phantom resources]
The purpose of this lab is to add a parallel service group to monitor the NIC
resource and replace the NIC resources in the failover service groups with Proxy
resources.
Brief instructions for this lab are located on the following page:
Lab 9 Synopsis: Creating a Parallel Service Group, page A-47
Solutions for this exercise are located on the following page:
Lab 9 Solutions: Creating a Parallel Service Group, page C-109
Work with your lab partner to create a parallel service group containing network
resources using the information in the design worksheet.
3 Modify the SystemList to allow the service group to run on the systems
specified in the design worksheet.
4 Modify the AutoStartList attribute to allow the service group to start on both
systems.
5 Modify the Parallel attribute to allow the service group to run on both systems.
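Using the CLI, these group-level attributes can be set as follows (a sketch with the sample system names; the configuration must be open for writing, so the commands are shown commented out):

```shell
# hagrp -modify NetworkSG SystemList train1 0 train2 1
# hagrp -modify NetworkSG AutoStartList train1 train2
# hagrp -modify NetworkSG Parallel 1
```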
Use the values in the following tables to create NIC and Phantom resources.
Critical? No (0)
Enabled? Yes (1)
3 Set the required attributes for this resource, and any optional attributes, if
needed.
5 Verify that the resource is online. Because it is a persistent resource, you do not
need to bring it online.
8 Enable the resource.
9 Verify that the status of the NetworkSG service group now shows as online.
Use the values in the tables to replace the NIC resources with Proxy resources and
create new links.
Note: Only one student can delete the ClusterService NIC resource.
2 Add a proxy resource to each failover service group using the service group
naming convention:
nameProxy1
nameProxy2
csgProxy
3 Set the value for each Proxy TargetResName attribute to NetworkNIC.
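On the command line, this is one hares modification per proxy resource (a sketch; requires a running cluster with the configuration open for writing):

```shell
# hares -modify nameProxy1 TargetResName NetworkNIC
# hares -modify nameProxy2 TargetResName NetworkNIC
# hares -modify csgProxy TargetResName NetworkNIC
```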
include "types.cf"
cluster vcs (
UserNames = { admin = ElmElgLimHmmKumGlj }
ClusterAddress = "192.168.27.51"
Administrators = { admin }
CounterInterval = 5
)
system train1 (
)
system train2 (
)
group ClusterService (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1, train2 }
OnlineRetryLimit = 3
Tag = CSG
)
IP webip (
Device = eri0
Address = "192.168.27.42"
NetMask = "255.255.255.0"
)
Proxy csgProxy (
TargetResName = NetworkNIC
)
VRTSWebApp VCSweb (
group NetworkSG (
SystemList = { train1 = 0, train2 = 1 }
Parallel = 1
AutoStartList = { train1, train2 }
)
NIC NetworkNIC (
Device = eri0
)
Phantom NetworkPhantom (
)
group nameSG1 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG1 (
DiskGroup = nameDG1
)
IP nameIP1 (
Device = eri0
Address = "192.168.27.51"
)
Mount nameMount1 (
MountPoint = "/name1"
BlockDevice = "/dev/vx/dsk/nameDG1/nameVol1"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess1 (
PathName = "/bin/sh"
Arguments = "/name1/loopy name 1"
)
Proxy nameProxy1 (
TargetResName = NetworkNIC
)
Volume nameVol1 (
Volume = nameVol1
DiskGroup = nameDG1
)
group nameSG2 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG2 (
DiskGroup = nameDG2
)
IP nameIP2 (
Device = eri0
Address = "192.168.27.71"
)
Mount nameMount2 (
MountPoint = "/name2"
BlockDevice = "/dev/vx/dsk/nameDG2/nameVol2"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess2 (
PathName = "/bin/sh"
Arguments = "/name2/loopy name 2"
)
Proxy nameProxy2 (
TargetResName = NetworkNIC
)
Volume nameVol2 (
Volume = nameVol2
DiskGroup = nameDG2
)
[Diagram: nameSG1, nameSG2, and ClusterService service groups; a NotifierMngr resource in ClusterService]
Optional Lab
Triggers: resfault, nofailover, resadminwait
SMTP Server: ___________________________________
Work with your lab partner to add a NotifierMngr type resource to the
ClusterService service group using the information in the design worksheet.
Resource Type NotifierMngr
Required Attributes
SmtpServer localhost
SmtpRecipients root Warning
PathName /xxx/xxx (AIX only)
Critical? No (0)
Enabled? Yes (1)
4 Set the required attributes for this resource and any optional attributes, if
needed.
7 Bring the resource online on the system running the ClusterService service
group.
1 Test the service group by switching it to the other system in the cluster.
2 Verify that the service group came online properly on the other system.
3 Test the service group by switching it back to the original system in the cluster.
4 Verify that the service group came online properly on the original system.
6 Save and close the cluster configuration and view the configuration file to
verify your changes.
Note: In the next lab, you will see the effects of configuring notification and
triggers when you test various resource fault scenarios.
Use the following procedure to configure triggers for notification. In this lab, each
student creates a local copy of the trigger script on their own system. If you are
working alone in the cluster, copy your completed triggers to the other system.
1 Create a text file in the /opt/VRTSvcs/bin/triggers directory named
resfault. Add the following lines to the file:
#!/bin/sh
echo `date` > /tmp/resfault.msg
echo message from the resfault trigger >> /tmp/resfault.msg
echo Resource $2 has faulted on System $1 >> /tmp/resfault.msg
echo Please check the problem. >> /tmp/resfault.msg
/usr/lib/sendmail root </tmp/resfault.msg
rm /tmp/resfault.msg
#!/bin/sh
echo `date` > /tmp/nofailover.msg
echo message from the nofailover trigger >> /tmp/nofailover.msg
echo no failover for service group $2 >> /tmp/nofailover.msg
echo Please check the problem. >> /tmp/nofailover.msg
/usr/lib/sendmail root </tmp/nofailover.msg
rm /tmp/nofailover.msg
#!/bin/sh
echo `date` > /tmp/resadminwait.msg
echo message from the resadminwait trigger >> /tmp/resadminwait.msg
echo Resource $2 on System $1 is in adminwait for Reason $3 >> /tmp/resadminwait.msg
echo Please check the problem. >> /tmp/resadminwait.msg
/usr/lib/sendmail root </tmp/resadminwait.msg
rm /tmp/resadminwait.msg
5 If you are working alone, copy all triggers to the other system.
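Before relying on a trigger in the cluster, you can sanity-check its logic by invoking it by hand with sample arguments. The sketch below uses a trimmed copy that logs to a file instead of calling sendmail, so it is safe to run anywhere; the file names and arguments are illustrative:

```shell
# Trimmed resfault-style trigger: writes the message to a log file
# instead of mailing it, for a quick local check.
cat > /tmp/resfault.test <<'EOF'
#!/bin/sh
echo "Resource $2 has faulted on System $1" > /tmp/resfault.out
EOF
chmod +x /tmp/resfault.test
/tmp/resfault.test train1 nameIP1   # VCS passes system and resource as $1 $2
cat /tmp/resfault.out
```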
Note: Network interfaces for virtual IP addresses are unconfigured to force the IP resource to fault.
In your classroom, the interface you specify is: ______
Replace the variable interface in the lab steps with this value.
The purpose of this lab is to observe how VCS responds to faults in a variety of
scenarios.
Brief instructions for this lab are located on the following page:
Lab 11 Synopsis: Configuring Resource Fault Behavior, page A-55
Solutions for this exercise are located on the following page:
Lab 11 Solutions: Configuring Resource Fault Behavior, page C-133
This part of the lab exercise explores the default behavior of VCS. Each student
works independently in this lab.
2 Verify that all resources in the nameSG1 service group are currently set to
critical; if not, set them to critical.
3 Set the IP and Process resources to not critical in the nameSG1 service group.
4 Change the monitor interval for the IP resource type to 10 seconds and the
offline monitor interval for the IP resource type to 30 seconds.
6 Verify that your nameSG1 service group is currently online on your system. If
it is not, bring it online or switch it to your system.
10 Set the IP and process resource to critical in the nameSG1 service group.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
3 Verify that your nameSG1 service group is currently online on your system. If
it is not online locally, bring it online or switch it to your system.
5 Without clearing faults from the last failover, unconfigure the virtual IP
address on that system.
6 Clear the nameIP1 resource on all systems and bring the nameSG1 service
group online on your system.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
3 Verify that your nameSG1 service group is currently online on your system. If
it is not online locally, bring it online or switch it to your system.
8 Did unfreezing the service group cause a failover or any resources to come
offline? Explain why or why not.
This section illustrates service group failover behavior using the ManageFaults
and FaultPropagation attributes.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
3 Set the FaultPropagation attribute for the nameSG1 service group to off (0).
5 Clear the faulted resource and bring the resource back online.
6 Set the ManageFaults attribute for the nameSG1 service group to NONE and
set the FaultPropagation attribute back to one (1).
10 Recover the resource from the ADMIN_WAIT state by faulting the service
group.
11 Clear the faulted nameIP1 resource and switch the nameSG1 service group
back to your system.
12 Set ManageFaults back to ALL for the nameSG1 service group and save the
cluster configuration.
This section illustrates failover behavior of a resource type using restart limits.
1 Verify that all resources in the nameSG1 service group are set to critical.
4 Stop the loopy process running in the nameSG1 service group by sending a
kill signal.
5 Stop the loopy process running in the nameSG1 service group by sending a
kill signal.
6 Clear the faulted resource and switch the nameSG1 service group back to your
system.
7 When all students have completed the lab, save and close the configuration.
[Diagram: trainxx systems with interconnect links]
Optional Lab
Trigger: injeopardy
The purpose of this lab is to configure a low-priority link and then pull network
cables and observe how VCS responds.
Brief instructions for this lab are located on the following page:
Lab 13 Synopsis: Testing Communication Failures, page A-60
Solutions for this exercise are located on the following page:
Lab 13 Solutions: Testing Communication Failures, page C-149
Use the following procedure to configure triggers for jeopardy notification. In this
lab, students create a local copy of the trigger script on their own systems. If you
are working alone in the cluster, copy your completed triggers to the other system.
#!/bin/sh
echo `date` > /tmp/injeopardy.msg
echo message from the injeopardy trigger >> /tmp/injeopardy.msg
echo System $1 is in Jeopardy >> /tmp/injeopardy.msg
echo Please check the problem. >> /tmp/injeopardy.msg
/usr/lib/sendmail root </tmp/injeopardy.msg
rm /tmp/injeopardy.msg
3 If you are working alone, copy the trigger to the other system.
4 Continue with the next lab sections. The Multiple LLT Link Failures
Jeopardy section of this lab shows the effects of configuring the InJeopardy
trigger.
Working with your lab partner, use the procedures to create a low-priority link and
then fault communication links and observe what occurs in a cluster environment
when fencing is not configured.
2 Shut down VCS, leaving the applications running on all systems in the cluster.
Solaris Mobile
Skip this step for mobile classrooms. There is only one public interface and it
is already configured as a low-priority link.
8 Start GAB on each system.
_____________________________________________________________
cd /tmp
Notes:
Use lltlink_enable to restore the LLT link.
The utilities prompt you to select an interface.
These classroom utilities are provided to enable you to simulate
disconnecting and reconnecting Ethernet cables without risk of damaging
connectors.
Run the utility from one system only, unless otherwise specified.
5 Using the lltlink_disable utility, remove one LLT link and watch for
the link to expire in the console or system log file.
2 Use lltlink_disable to remove all but one LLT link and watch for the
link to expire in the console.
Solaris Mobile
Remove only the one high-priority LLT link (dmfe1).
2 Remove all but one LLT link and watch for the link to expire in the console.
Solaris Mobile
Remove only the one high-priority LLT link (dmfe1).
5 Remove the last LLT link and watch for the link to expire in the console.
Note: If you have more than two systems in the cluster, you must stop
HAD on all systems on either side of the network partition.
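While disabling and re-enabling links, the standard status commands are useful for watching the effects from each system (both require the cluster communication stack to be running, so they are shown commented out):

```shell
# lltstat -nvv | more     # LLT link states as seen from this node
# gabconfig -a            # GAB port memberships
```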
trainxx trainxx
Disk 1:___________________
Disk 3:___________________
nameDG1, nameDG2
The purpose of this lab is to set up I/O fencing in a two-node cluster and simulate
node and communication failures.
Brief instructions for this lab are located on the following page:
Lab 14 Synopsis: Configuring I/O Fencing, page A-66
Solutions for this exercise are located on the following page:
Lab 14 Solutions: Configuring I/O Fencing, page C-163
/etc/vxfendg: oddfendg or evenfendg (coordinator disk group name)
UseFence cluster attribute: SCSI3
b Display your cluster ID. Your cluster ID determines your coordinator disk
group name.
2 Optional for the classroom: Use the vxfentsthdw utility to verify that the
shared storage disks support SCSI-3 persistent reservations.
Notes:
For the purposes of this lab, you do not need to test the disks. The disks
used in this lab support SCSI-3 persistent reservations. The complete steps
are given here as a guide for real-world use.
To see how the command is used, you can run vxfentsthdw on a disk
that is not in use; this enables you to continue with the lab while
vxfentsthdw is running.
Create a test disk group with one disk and run vxfentsthdw on that test
disk group.
Use the -r option to perform read-only testing of data disks.
4 Start the fencing driver on each system using the vxfen init script.
How many keys are present for each disk and why?
1 Verify that you have a Storage Foundation Enterprise license installed on each
system for fencing support using vxlicrep.
2 Working together, verify that the cluster configuration is saved and closed.
5 Copy the main.cf and types.cf files into the test subdirectory.
7 Edit the main.cf file on that one system to set UseFence to SCSI3.
9 Stop VCS and shut down the applications. The disk groups must be reimported
for fencing to take effect.
10 Copy the main.cf file from the test subdirectory into the configuration
directory.
11 Start the cluster from the system where you edited the configuration file.
12 Start the cluster in the stale state on the other system in the cluster (where the
configuration was not edited).
1 If the service groups with disk groups did not come online at cluster startup,
bring them online now. This imports the disk groups, which initiates fencing
on the data disks. Each student can perform these steps on their service groups.
In most cases, the following sections require that you work together with your lab
partner to observe how fencing protects data in a variety of failure situations.
Steps you can perform on your own are indicated within the procedure.
1 Verify that the nameSG1 and nameSG2 service groups are online on your
system if two students are working on the cluster. If you are working alone,
ensure that you have a service group online on each system. This scenario
requires that disk groups be imported on each system. Switch them, if
necessary.
3 Verify the registrations and reservations on the data disks for the disk groups
imported on each system.
4 Fail one of the systems by removing power or hard booting the system.
Observe the failure.
5 Verify the registrations on the coordinator disks for the remaining system.
6 Verify that the service groups that were running on the failed system have
failed over to the remaining system.
8 Boot the failed system and observe it rejoin cluster membership. Verify cluster
membership and verify that the coordinator disks have registrations for both
systems again.
1 If you did not already perform this step in the Testing Communication
Failures lab, copy the lltlink_enable and lltlink_disable
utilities from the location provided by your instructor into the /tmp directory.
_____________________________________________________________
4 Verify that the nameSG1 and nameSG2 service groups are online on your
system if two students are working on the cluster. If you are working alone,
ensure that you have a service group online on each system. This scenario
requires that one disk group be imported on each system. Switch the service
groups, if necessary.
6 Verify the registrations and reservations on the data disks for the disk groups
imported on each system.
12 Verify that the registrations and reservations on the data disks are now for the
remaining system.
13 When the system that rebooted is running, check the status of GAB and HAD.
14 Verify that the coordinator disks have registrations for the remaining system
only.
16 Verify that cluster membership has been established for both systems and both
systems are now registered with the coordinator disks.
3 Unconfigure the fencing driver.
4 From one system, import and remove the coordinator disk group.
5 Use the offline configuration procedure to set the UseFence cluster attribute to
the value NONE in the main.cf file and restart the cluster with the new
configuration.
Note: You cannot set UseFence dynamically while VCS is running.
c Edit the main.cf file in the test directory on one system in the cluster to
set the value of UseFence to NONE.
8 Start the cluster from the system where you edited the configuration file.
9 Start the cluster in the stale state on the other system in the cluster (where the
configuration was not edited).
Visually inspect the classroom lab site.
Complete and validate the design worksheet.
Use the lab appendix best suited to your experience level:
- Appendix A: Lab Synopses
- Appendix B: Lab Details
- Appendix C: Lab Solutions
[Diagram: two-node cluster — train1 and train2]
System Definition Sample Value Your Value
System train1
System train2
See the next slide for lab assignments.
In this lab, you work with your partner to prepare the systems for installing VCS.
Brief instructions for this lab are located on the following page:
Lab 2 Synopsis: Validating Site Preparation, page A-2
Step-by-step instructions for this lab are located on the following page:
Lab 2: Validating Site Preparation, page B-3
Lab Assignments
Fill in the following table with the applicable values for your lab cluster.
your_sys
1 Verify that the Ethernet network interfaces for the two cluster interconnect
links are cabled together using crossover cables.
Note: In actual implementations, each link should use a completely separate
infrastructure (separate NIC and separate hub or switch). For simplicity of
configuration in the classroom environment, the two interfaces used for the
cluster interconnect are on the same NIC.
[Diagram: four-node UNIX classroom network — train1 through train12 at 192.168.XX.101 through 192.168.XX.112, a software share at 192.168.XX.100, hubs/switches, a SAN disk array and SAN tape library, connected to the LAN]
hostname
4 Determine the base IP address configured on the public network interface for
both your system and your partner's system.
ifconfig public_interface
cat /etc/hosts
ping public_IP_address
1 Check the PATH environment variable. If necessary, add the /sbin,
/usr/sbin, /opt/VRTS/bin, and /opt/VRTSvcs/bin directories to your
PATH environment variable.
If you are using the Bourne Shell (sh, ksh, or bash), use the following
command:
$ PATH=/sbin:/usr/sbin:/opt/VRTS/bin:/opt/VRTSvcs/bin:$PATH; export PATH
If you are using the C shell (csh or tcsh), use the following command:
% setenv PATH /sbin:/usr/sbin:/opt/VRTS/bin:/opt/VRTSvcs/bin:$PATH
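If you rerun the setup, the commands above prepend the same directories again. A minimal Bourne-shell sketch of an idempotent prepend (the `prepend_path` helper is illustrative and not part of the official lab steps):

```shell
# Prepend a directory to PATH only if it is not already there
# (illustrative helper; not part of the official lab procedure).
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present: do nothing
        *) PATH=$1:$PATH; export PATH ;;   # otherwise prepend and export
    esac
}
prepend_path /opt/VRTSvcs/bin
prepend_path /opt/VRTSvcs/bin    # second call is a no-op
echo "$PATH"
```

The second call leaves PATH unchanged, so the function is safe to keep in a login profile.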
vxlicrep -s
Verify that ssh configuration files are set up in order to install VCS on Linux or to
run remote commands without prompts for passwords.
If you do not configure ssh, you are required to type in the root passwords for all
systems for every remote command issued during the following services
preparation lab and the installation procedure.
2 Generate a DSA key pair on this system by running the following command:
ssh-keygen -t dsa
b Ensure that you copy the public key line to the other systems in your cluster.
rpm -q openssh-askpass-gnome
2 If you do not have a $HOME/.Xclients file (you should not have one after
installation), run switchdesk to create it. In your $HOME/.Xclients file,
edit the following line:
exec $HOME/.Xclients-default
a Click the Startup Programs Tab and Add and enter /usr/bin/ssh-add
in the Startup Command text area.
b Set the priority to a number higher than that of any existing command to
ensure that ssh-add is executed last. A priority number of 70 or
higher works well: the higher the priority number, the lower the
priority, so if other programs are listed, this one should have the
lowest priority (highest number).
c Click OK to save your settings, and exit the GNOME Control Center.
4 Log out and then log back into GNOME; in other words, restart X.
1 Open a console window so you can observe messages during later labs.
Cluster name: vcs1

train1                     train2
Link 1: ______             Link 1: ______
Link 2: ______             Link 2: ______
Public: ______             Public: ______
Subnet: _______

VCS 4.x:     # ./installer
Pre-4.0:     # ./installvcs
Software location: _______________________________
In this lab, work with your lab partner to install VCS on both systems.
Brief instructions for this lab are located on the following page:
Lab 3 Synopsis: Installing VCS, page A-6
Step-by-step instructions for this lab are located on the following page:
Lab 3: Installing VCS, page B-11
Ethernet interface for     Solaris: qfe1     Sol Mob: dmfe1
interconnect link #2       AIX: en3          HP-UX: lan2
                           Linux: eth1       VA: bge2
Public network interface   Solaris: eri0     Sol Mob: dmfe0
                           AIX: en1          HP-UX: lan0
                           Linux: eth0       VA: bge0
_____________________________________________________________
install_dir
2 This step is to be performed from only one system in the cluster. The install
script installs and configures all systems in the cluster.
cd install_dir
Notes:
For VCS 4.x, install Storage Foundation HA (which includes VCS,
Volume Manager, and File System).
Use the information in the previous table or design worksheet to
respond to the installation prompts.
Sample prompts and input are provided at the end of the lab.
For versions of VCS before 4.0, use installvcs.
c If a license key is needed, obtain one from your instructor and record it
here.
License Key: _________________________________
3 If you did not install the Java GUI package as part of the installer (VPI)
process (or installvcs for earlier versions of VCS), install the VRTScscm
Java GUI package on each system in the cluster. This package is located in
the pkgs directory under the installation directory given to you by your
instructor.
Solaris
pkgadd -d /install_dir/cluster_server/pkgs VRTScscm
HP
swinstall -s /install_dir/cluster_server/pkgs VRTScscm
AIX
installp -a -d /install_dir/cluster_server/pkgs/VRTScscm.rte.bff VRTScscm.rte
Linux
rpm -ihv VRTScscm-4.1.00.0-GA_GENERIC.noarch.rpm
_______________________________________
install_dir
2 Install any VCS patches or updates, as directed by your instructor. Use the
operating system-specific command, as shown in the following examples.
Solaris
pkgadd -d /install_dir/pkgs VRTSxxxx
HP
swinstall -s /install_dir/pkgs VRTSxxxx
AIX
installp -a -d /install_dir/pkgs/VRTSxxxx.rte.bff VRTSxxxx.rte
Linux
rpm -ihv VRTSxxxx-x.x.xx.xx-GA_RHEL.i686.rpm
3 Install any other software indicated by your instructor. For example, if your
classroom uses VCS 3.5, you may be directed to install VERITAS Volume
Manager and VERITAS File System.
hastatus -sum
lltconfig
gabconfig -a
a Verify that the cluster ID, system names, and network interfaces specified
during install are present in the /etc/llttab file.
cat /etc/llttab
cat /etc/llthosts
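To illustrate what the llttab check is looking for, here is a sample two-link llttab-style fragment (the node name, cluster ID, and qfe interfaces are illustrative; your file will show the values you entered during installation) and a quick way to count the interconnect links:

```shell
# Write a sample llttab-style file (illustrative values only).
cat <<'EOF' > /tmp/llttab.sample
set-node train1
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -
EOF
# Each cluster interconnect link appears as one "link" line,
# so a two-link cluster shows a count of 2.
grep -c '^link' /tmp/llttab.sample
```

Run the same grep against the real /etc/llttab to confirm that both interconnect links were configured.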
2 Explore the GAB configuration.
Verify that the number of systems in the cluster matches the value for the
-n flag set in the /etc/gabtab file.
cat /etc/gabtab
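For reference, /etc/gabtab normally holds a single seed line; the sketch below pulls the -n value out of a sample line (the line content is an assumed example for a two-node cluster):

```shell
# Sample gabtab-style seed line for a two-node cluster (illustrative).
gabtab_line='/sbin/gabconfig -c -n2'
# Extract the number after -n; it should equal the number of
# systems in the cluster.
n=${gabtab_line##*-n}
echo "$n"
```

If the extracted value does not match the cluster node count, GAB will not seed automatically at boot.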
cat /etc/VRTSvcs/conf/config/main.cf
Verify GUI connectivity with the Java GUI and the Web GUI. Both GUIs can
connect to the cluster with the default user name admin and the default
password password.
2 Start the Java GUI and connect to the cluster using these values:
Cluster alias: nameCluster
Host name: ip_address (used during installation)
Failover retries: 12 (retain default)
hagui &
Select File>New Cluster.
Selection Menu:
    L) License a Product       P) Perform a Preinstallation Check
    U) Uninstall a Product     D) View a Product Description
    Q) Quit                    ?) Help
SF Licensing Verification:
Do you want to enter another license key for train1? [y,n,q,?] (n) n
Private Heartbeat NICs for train1: link1=qfe0 link2=qfe1
Private Heartbeat NICs for train2: link1=qfe0 link2=qfe1
A user name
A password for the user
User privileges (Administrator, Operator, or Guest)
192.168.XXX.XXX
Enter the netmask for IP 192.168.XXX.XXX: [b,?] (255.255.255.0) 255.255.255.0
NIC: eri0
IP: 192.168.27.91
Netmask: 255.255.255.0
Starting GAB on train1 ........................................ Started
Starting GAB on train2 ........................................ Started
Starting Cluster Server on train1 ............................. Started
Starting Cluster Server on train2 ............................. Started
Confirming Cluster Server startup ................... 2 systems RUNNING
/opt/VRTS/install/logs/installer131114527.summary
/opt/VRTS/install/logs/installer131114527.log
/opt/VRTS/install/logs/installer131114527.response
include "types.cf"
cluster vcs (
UserNames = { admin = ElmElgLimHmmKumGlj }
CredRenewFrequency = 0
ClusterAddress = "192.168.27.91"
Administrators = { admin }
CounterInterval = 5
)
system train1 (
)
system train2 (
)
group ClusterService (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1, train2 }
OnlineRetryLimit = 3
)
IP webip (
Device = eri0
Address = "192.168.27.42"
NetMask = "255.255.255.0"
)
NIC csgnic (
Device = eri0
)
VRTSWebApp VCSweb (
This lab uses the VERITAS Cluster Server Simulator and the Cluster Manager
Java Console. You are provided with a preconfigured main.cf file to learn about
managing the cluster.
Brief instructions for this lab are located on the following page:
Lab 4 Synopsis: Using the VCS Simulator, page A-18
Step-by-step instructions for this lab are located on the following page:
Lab 4: Using the VCS Simulator, page B-21
Local Simulator
config directory:
sim_config_dir
PATH=$PATH:/opt/VRTScssim/bin
export PATH
VCS_SIMULATOR_HOME=/opt/VRTScssim
export VCS_SIMULATOR_HOME
hasimgui &
6 In a terminal window, change to the simulator configuration directory for the
new simulated cluster named vcs_operations.
cd /opt/VRTScssim/vcs_operations/conf/config
___________________________________________
cf_files_dir
cp cf_files_dir/main.cf /opt/VRTScssim/vcs_operations/conf/config
cp cf_files_dir/types.cf /opt/VRTScssim/vcs_operations/conf/config
cp cf_files_dir/OracleTypes.cf /opt/VRTScssim/vcs_operations/conf/config
10 Log in as oper with password oper.
Note: While you may use admin/password to log in, the point of using oper is to
demonstrate the differences in privileges between VCS user accounts.
3 With the Cluster object name selected in the left-hand frame of the
Cluster Manager, click the Status tab in the right-hand frame.
Notice the Systems-> indicator and count the number of named columns,
that is, one for each cluster member.
With the Cluster object name selected in the left-hand frame of the
Cluster Manager, click the Status tab in the right-hand frame. The
service groups are shown with their names as labels.
3 Which service groups have service group operator privileges set for the oper
account?
OraListener.
a Click on the OracleSG service group name in the left-hand frame of
the Cluster Manager.
b Click on the Resources tab in the right-hand frame.
c Observe the top-most parent resource in the resource dependency tree.
6 Which immediate child resources does the Oracle resource in the OracleSG
service group depend on?
OraMount.
a Click on the OracleSG service group name in the left-hand frame of
the Cluster Manager.
b Click on the Resources tab in the right-hand frame.
c Observe the child resources in the resource dependency tree for the
dependent parent resource named Oracle.
What happens?
Right-click on the AppSG service group, select Offline, and click S1.
What happens?
The Offline selection is displayed for this service group and you can take
the group offline because you have privileges for this service group.
Right-click on the OracleSG service group, select Offline, and click S1.
What happens?
4 Take all service groups that you have privileges for offline everywhere.
For each service group for which you have privileges (AppSG and
OracleSG) and that is not already offline everywhere:
a Right-click the service group.
b Select the Offline menu option and click All Systems.
9 Bring all service groups that you have privileges for online on S3.
What happens?
What happens?
What happens?
9 Clear the fault on the Oracle resource in the OracleSG service group.
a Right-click on Oracle.
b Choose Clear Fault from the menu.
c Choose S3.
The fault is now cleared.
13 Stop the simulator from the Simulator Java Console.
[Diagram: Two example services. Disk group bobDG1 on disk1 contains
bobVol1, mounted at /bob1 and running /bob1/loopy; disk group sueDG1 on
disk2 contains sueVol1, mounted at /sue1 and running /sue1/loopy. Each
disk group resides on its own Disk/LUN.]
See the next slide for classroom values.
The purpose of this lab is to prepare the loopy process service for high availability.
Brief instructions for this lab are located on the following page:
Lab 5 Synopsis: Preparing Application Services, page A-24
Step-by-step instructions for this lab are located on the following page:
Lab 5: Preparing Application Services, page B-29
Lab Assignments
Fill in the table with the applicable values for your lab cluster.
train4 192.168.xxx.54
train5 192.168.xxx.55
train6 192.168.xxx.56
train7 192.168.xxx.57
train8 192.168.xxx.58
train9 192.168.xxx.59
train10 192.168.xxx.60
train11 192.168.xxx.61
train12 192.168.xxx.62
Application script location
class_sw_dir
vxdisk list
3 Initialize a disk for Volume Manager using the disk device from the worksheet.
vxdisksetup -i disk_device
4 Create a disk group with the name from the worksheet using the initialized
disk.
Complete the following steps to set up a virtual IP address for the application.
1 Verify that an IP address exists on the base interface for the public network.
ifconfig -a
A script named loopy is used as the example application for this lab exercise.
__________________________________________________________
class_sw_dir
2 Copy or type this code into a file named loopy on the file system you created
previously in this lab.
cp /class_sw_dir/loopy /name1/loopy
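The real loopy script is supplied in the classroom software directory; the sketch below only illustrates the general shape of such a test application (a loop that echoes a service name to the console) and is not the actual script:

```shell
# Illustrative loopy-style application: echo a message in a loop so
# you can see which service group is running it. Not the real script.
loopy_demo() {
    name=$1 num=$2 i=0
    while [ "$i" -lt 3 ]; do     # the real application loops forever
        echo "loopy $name $num is running"
        i=$((i + 1))
        # sleep 5                # the real application pauses here
    done
}
loopy_demo name 1
```

Because the application writes to the console, you can tell at a glance which system the service is running on during failover testing.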
3 Verify that you have a console window open to see the display from the script.
Complete the following steps to migrate the application to the other system.
1 Stop your loopy process by sending a kill signal. Verify that the process is
stopped.
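The stop-and-verify pattern in step 1 can be sketched in a self-contained way, using sleep as a stand-in for the loopy process (in the lab you kill the actual loopy PID instead):

```shell
# Start a stand-in process, stop it with a kill signal, and verify
# that it is gone (kill -0 probes for existence without signaling).
sleep 60 &
pid=$!
kill "$pid"
wait "$pid" 2>/dev/null          # reap it so the PID is fully gone
if kill -0 "$pid" 2>/dev/null; then
    echo "still running"
else
    echo "stopped"
fi
```

The kill -0 probe is a convenient existence check when you do not want to send a real signal.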
2 Remove the virtual IP address configured earlier in this lab. Verify that the IP
address is no longer configured.
Solaris
ifconfig -a
ifconfig virtual_interface unplumb
ifconfig -a
AIX
ifconfig -a
ifconfig interface ipaddress delete
ifconfig -a
HP-UX
netstat -in
ifconfig interface inet 0.0.0.0
netstat -i
Linux
ifconfig -a
ifconfig interface:instance down
ifconfig -a
umount /name1
mount | grep name1
vxdctl enable
10 Verify that your mount point directory exists. Create it if it does not exist.
ls -d /name1
mkdir /name1
Complete the following steps to bring the application offline on the other system
so that it is ready to be placed under VCS control.
1 While still logged into the other system, stop your loopy process by sending a
kill signal. Verify that the process is stopped.
2 Remove the virtual IP address configured earlier in this lab. Verify that the IP
address is no longer configured.
Solaris
ifconfig -a
ifconfig virtual_interface unplumb
ifconfig -a
AIX
ifconfig -a
ifconfig interface ipaddress delete
ifconfig -a
HP-UX
netstat -in
ifconfig interface inet 0.0.0.0
netstat -in
Linux
ifconfig -a
ifconfig interface:instance down
ifconfig -a
umount /name1
mount | grep name1
vcs1
train1    train2
# hastop -all -force
The following procedure demonstrates how the cluster configuration changes
state during startup and shutdown, and shows how the .stale file works.
Brief instructions for this lab are located on the following page:
Lab 6 Synopsis: Starting and Stopping VCS, page A-29
Step-by-step instructions for this lab are located on the following page:
Lab 6: Starting and Stopping VCS, page B-37
Note: Complete this section with your lab partner.
cd /etc/VRTSvcs/conf/config
ls -al .
haconf -makerw
ls -al .
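What the two ls listings should reveal is the .stale marker that haconf -makerw creates and that haconf -dump -makero removes. A self-contained sketch of the check, using a scratch directory instead of the real configuration directory:

```shell
# Simulate the .stale marker lifecycle in a scratch directory
# (illustrative; the real marker lives in /etc/VRTSvcs/conf/config).
conf=/tmp/demo_vcs_conf
mkdir -p "$conf"
touch "$conf/.stale"                 # haconf -makerw leaves this marker
[ -f "$conf/.stale" ] && echo "configuration open: .stale present"
rm "$conf/.stale"                    # haconf -dump -makero removes it
[ -f "$conf/.stale" ] || echo "configuration closed: .stale absent"
```

On the real cluster, run the two ls -al commands before and after haconf -makerw and compare for the .stale entry.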
hastop -all
6 Stop the cluster using the hastop -all -force command from one system
only to stop VCS forcibly and leave the applications running.
7 Start VCS on each system in the cluster.
hastart
hastatus -summary
The cluster configuration was left open when VCS was stopped.
ls -al /etc/VRTSvcs/conf/config
11 Return all systems to a running state (from one system in the cluster).
hastatus -summary
Any service groups that were online at the time that the hastop -all
-force command was run should still be online now that VCS has been
restarted.
ls -al /etc/VRTSvcs/conf/config
The purpose of this lab is to create a service group while VCS is running using
either the Cluster Manager graphical user interface or the command-line interface.
Brief instructions for this lab are located on the following page:
Lab 7 Synopsis: Online Configuration of a Service Group, page A-31
Step-by-step instructions for this lab are located on the following page:
Lab 7: Online Configuration of a Service Group, page B-41
Classroom-Specific Values
Fill in this table with the applicable values for your lab cluster.
Fill in the design worksheet with values appropriate for your cluster and use the
information to create a service group.
1 If you are using the GUI, start Cluster Manager and log in to the cluster.
hagui &
GUI: Right-click your cluster name in the left panel and select Add
Service Group.
4 Modify the SystemList to allow the service group to run on the two systems
specified in the design worksheet.
GUI: Select each system and click the right arrow button.
GUI: Click the Startup box for your system; then click OK to create the
service group.
6 Verify that the service group can autostart and that it is a failover service group.
GUI: Right-click the service group, select Properties, and click Show all
attributes.
7 Save the cluster configuration and view the configuration file to verify your
changes.
view /etc/VRTSvcs/conf/config/main.cf
Complete the following steps to add NIC, IP, DiskGroup, Volume, and Process
resources to the service group using the information from the design worksheet.
Device Solaris: eri0
Sol Mob: dmfe0
AIX: en1
HP-UX: lan0
Linux: eth0
VA: bge0
NetworkHosts* 192.168.xx.1 (HP-UX only)
Critical? No (0)
Enabled? Yes (1)
GUI:
a Right-click the service group and select Add Resource.
b Type the name from the table.
c Select the resource type from the list.
CLI:
Solaris
hares -modify nameNIC1 Device interface
AIX
hares -modify nameNIC1 Device interface
HP-UX
hares -modify nameNIC1 Device interface
hares -modify nameNIC1 NetworkHosts other_system1
other_system2
Linux
hares -modify nameNIC1 Device interface
5 Verify that the resource is online. Because this is a persistent resource, you do
not need to bring it online.
6 Save the cluster configuration and view the configuration file to verify your
changes.
view /etc/VRTSvcs/conf/config/main.cf
VA: bge0
Address 192.168.xx.** see table
Optional Attributes
Netmask 255.255.255.0
Critical? No (0)
Enabled? Yes (1)
System IP Address
train1 192.168.xx.51
train2 192.168.xx.52
train3 192.168.xx.53
train4 192.168.xx.54
train5 192.168.xx.55
train6 192.168.xx.56
train7 192.168.xx.57
train8 192.168.xx.58
train9 192.168.xx.59
train10 192.168.xx.60
train11 192.168.xx.61
train12 192.168.xx.62
GUI:
a Right-click the service group and select Add Resource.
b Type the name from the table.
c Select the resource type from the list.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
CLI:
hares -modify nameIP1 Device interface
hares -modify nameIP1 Address xxx.xxx.xxx.xxx
7 Save the cluster configuration and view the configuration file to verify your
changes.
CLI: haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
7 Save the cluster configuration and view the configuration file to verify your
changes.
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
6 Verify that the resource is online in VCS and at the operating system level.
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
7 Save the cluster configuration and view the configuration file to verify your
changes.
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
1 Add the resource to the service group using either the GUI or CLI.
3 Set the required attributes for this resource, and any optional attributes, if
needed.
Note: If you are using the GUI to configure the resource, you do not need
to include the quotation marks.
5 Ensure that you have the console or a terminal window open for loopy output.
7 Verify that the resource is online in VCS and at the operating system level.
8 Save the cluster configuration and view the configuration file to verify your
changes.
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
Parent Resource    Requires    Child Resource
nameMount1                     nameVol1
nameIP1                        nameNIC1
nameProcess1                   nameMount1
nameProcess1                   nameIP1
1 Link resource pairs together based on the design worksheet.
hares -dep
3 Save the cluster configuration and view the configuration file to verify your
changes.
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
Complete the following steps to test the service group on each system in the
service group SystemList.
1 Test the service group by switching away from your system in the cluster.
2 Verify that the service group came online properly on your partner's system.
hastatus -summary
3 Test the service group by switching it back to your system in the cluster.
hagrp -switch nameSG1 -to your_sys
4 Verify that the service group came online properly on your system.
hastatus -summary
2 Save the cluster configuration and view the configuration file to verify your
changes.
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
3 Close the cluster configuration after all students working in your cluster are
finished.
group nameSG1 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG1 (
DiskGroup = nameDG1
)
IP nameIP1 (
Device = eri0
Address = "192.168.27.51"
)
Mount nameMount1 (
MountPoint = "/name1"
BlockDevice = "/dev/vx/dsk/nameDG1/nameVol1"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess1 (
PathName = "/bin/sh"
Arguments = "/name1/loopy name 1"
)
NIC nameNIC1 (
Device = eri0
)
Volume nameVol1 (
Volume = nameVol1
DiskGroup = nameDG1
)
[Diagram: Two service groups, nameSG1 and nameSG2, each with its own
Process resource (nameProcess1, nameProcess2) and disk group (nameDG1,
nameDG2).]
Working together, follow the offline configuration procedure.
Alternately, work alone and use the GUI to create a new service group.
The purpose of this lab is to add a service group by copying and editing the
definition in main.cf for nameSG1.
Brief instructions for this lab are located on the following page:
Lab 8 Synopsis: Offline Configuration of a Service Group, page A-38
Step-by-step instructions for this lab are located on the following page:
Lab 8: Offline Configuration of a Service Group, page B-57
Volume name nameVol2
vxdisk list
2 Initialize a disk for Volume Manager using the disk device from the worksheet.
vxdisksetup -i disk_device
3 Create a disk group with the name from the worksheet using the initialized
disk.
All
mkdir /name2
Solaris, AIX
rsh their_sys mkdir /name2
HP-UX
remsh their_sys mkdir /name2
Linux
ssh their_sys mkdir /name2
mount
9 Copy the loopy script to your file system created in this lab.
cp /class_sw_dir/loopy /name2/loopy
View the console and verify that the new loopy process is echoing
nameSG2 in the message.
12 Stop the resources to prepare to place them under VCS control in the next
section of the lab.
a Stop the loopy process by sending a kill signal. Verify that the process is
stopped.
umount /name2
mount
Record information needed to create a new service group in the design worksheet.
Resource Definition Sample Value Your Value
Service Group nameSG2
Resource Name nameNIC2
Resource Type NIC
Required Attributes
Device Solaris: eri0
Sol Mob: dmfe0
AIX: en1
HP-UX: lan0
Linux: eth0
VA: bge0
NetworkHosts* 192.168.xx.1 (HP-UX
only)
Critical? No (0)
Enabled? Yes (1)
System IP Address
train1 192.168.xx.71
train2 192.168.xx.72
train3 192.168.xx.73
train4 192.168.xx.74
train5 192.168.xx.75
train6 192.168.xx.76
train7 192.168.xx.77
train8 192.168.xx.78
train9 192.168.xx.79
train10 192.168.xx.80
train11 192.168.xx.81
train12 192.168.xx.82
Resource Definition Sample Value Your Value
Service Group nameSG2
Resource Name nameVol2
Resource Type Volume
Required Attributes
Volume nameVol2
DiskGroup nameDG2
Critical? No (0)
Enabled? Yes (1)
nameMount2 nameVol2
nameIP2 nameNIC2
nameProcess2 nameMount2
nameProcess2 nameIP2
Note: You may choose to use the GUI to create the nameSG2 service group. If so,
skip this section and complete the Alternate Lab section instead.
1 Working with your lab partner, verify that the cluster configuration is saved
and closed.
cd /etc/VRTSvcs/conf/config
mkdir test
4 Copy the main.cf and types.cf files into the test subdirectory.
All
cp main.cf types.cf test
Linux
Also copy the vcsApacheTypes.cf file.
cd test
6 Edit the main.cf file in the test directory on one system in the cluster.
a For each student's service group, copy the nameSG1 service group
structure to a new nameSG2 service group.
b Rename all of the resources within the copied service group to end with
2 instead of 1, as shown in the following table.
Partial Example:
# vi main.cf
group nameSG2 (
SystemList = { train3 = 0, train4 = 1 }
AutoStartList = { train3 }
)
DiskGroup nameDG2 (
DiskGroup = nameDG2
)
.
.
.
7 Edit the attributes of each copied resource to match the design worksheet
values shown earlier in this section.
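The copy-and-rename edit in steps 6 and 7 can also be sketched with sed rather than manual editing in vi. The fragment and substitutions below are illustrative only (a real main.cf also needs the SystemList, AutoStartList, mount point, and IP address edited by hand, and you should always review the result before using it):

```shell
# Copy a nameSG1-style group definition and rewrite the trailing 1 in
# group and disk group names to 2 (sample fragment; not a full main.cf).
cat <<'EOF' > /tmp/sg1.fragment
group nameSG1 (
    SystemList = { train1 = 0, train2 = 1 }
    AutoStartList = { train1 }
    )
DiskGroup nameDG1 (
    DiskGroup = nameDG1
    )
EOF
# SG1->SG2 and DG1->DG2 leave hostnames like train1 untouched.
sed -e 's/SG1/SG2/g' -e 's/DG1/DG2/g' /tmp/sg1.fragment > /tmp/sg2.fragment
grep 'name' /tmp/sg2.fragment
```

Targeted substitutions such as SG1/DG1 are safer than a blanket s/1/2/g, which would also corrupt system names and priority numbers.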
9 Stop VCS on all systems, but leave the applications running.
11 Copy the main.cf file from the test subdirectory into the configuration
directory.
cp main.cf ../main.cf
12 Start the cluster from the system where you edited the configuration file.
hastart
13 Start the cluster in the stale state on the other system in the cluster (where the
configuration was not edited).
hastart -stale
hastatus -summary
hastatus -summary
Use the information in the design worksheet in the previous section to create a new
service group using the GUI to copy resources from the nameSG1 service group.
1 Start Cluster Manager and log in to the cluster.
hagui &
GUI: Right-click your cluster name in the left panel and select Add
Service Group.
4 Modify the SystemList to allow the service group to run on the two systems
specified in the design worksheet.
GUI: Select each system and click the right arrow button.
5 Modify the AutoStartList attribute to allow the service group to start on your
system.
GUI: Click the Startup box for your system; then click OK to create the
service group.
6 Verify that the service group can autostart and that it is a failover service group.
GUI: Right-click the service group, select Properties, and click Show all
attributes.
7 Save the cluster configuration and view the configuration file to verify your
changes.
view /etc/VRTSvcs/conf/config/main.cf
f Select the Resources tab to display the resource view. There are no
resources yet in nameSG2.
g Right-click anywhere in the right pane display area of the Resources
tab.
h Select Paste.
k Click OK.
10 Modify each resource to set the attribute values as specified in the worksheet.
11 Save the cluster configuration and view the configuration file to verify your
changes.
view /etc/VRTSvcs/conf/config/main.cf
13 Bring the nameSG2 resources online, starting from the bottom of the
dependency tree.
Note: In the GUI, the Close configuration action saves the configuration
automatically.
group nameSG2 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG2 (
DiskGroup = nameDG2
)
IP nameIP2 (
Device = eri0
Address = "192.168.27.71"
)
Mount nameMount2 (
MountPoint = "/name2"
BlockDevice = "/dev/vx/dsk/nameDG2/nameVol2"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess2 (
PathName = "/bin/sh"
Arguments = "/name2/loopy name 2"
)
NIC nameNIC2 (
Device = eri0
)
[Diagram: Failover service groups nameSG1 and nameSG2 (with nameProcess1,
nameProcess2, nameDG1, and nameDG2) alongside a parallel NetworkSG service
group containing the shared NIC and Phantom resources.]
The purpose of this lab is to add a parallel service group to monitor the NIC
resource and replace the NIC resources in the failover service groups with Proxy
resources.
Brief instructions for this lab are located on the following page:
Lab 9 Synopsis: Creating a Parallel Service Group, page A-47
Step-by-step instructions for this lab are located on the following page:
Lab 9: Creating a Parallel Service Group, page B-73
Work with your lab partner to create a parallel service group containing network
resources using the information in the design worksheet.
1 Open the cluster configuration.
haconf -makerw
3 Modify the SystemList to allow the service group to run on the systems
specified in the design worksheet.
4 Modify the AutoStartList attribute to allow the service group to start on both
systems.
5 Modify the Parallel attribute to allow the service group to run on both systems.
Use the values in the following tables to create NIC and Phantom resources.
Critical? No (0)
Enabled? Yes (1)
All
hares -modify NetworkNIC Device interface
HP-UX
hares -modify NetworkNIC NetworkHosts other_system1
other_system2
5 Verify that the resource is online. Because it is a persistent resource, you do not
need to bring it online.
hares -display NetworkNIC
9 Verify that the status of the NetworkSG service group now shows as online.
hastatus -sum
haconf -dump
view /etc/VRTSvcs/conf/config/main.cf
Use the values in the tables to replace the NIC resources with Proxy resources and
create new links.
Note: Only one student can delete the ClusterService NIC resource.
2 Add a proxy resource to each failover service group using the service group
naming convention:
nameProxy1
nameProxy2
csgProxy
hares -add nameProxy1 Proxy nameSG1
hares -add nameProxy2 Proxy nameSG2
hares -add csgProxy Proxy ClusterService
haconf -dump
Use the values in the following tables to replace the NIC resources with Proxy
resources and create new links.
Service Group nameSG2
Parent Resource Requires Child Resource
nameIP2 nameProxy2
include "types.cf"
cluster vcs (
UserNames = { admin = ElmElgLimHmmKumGlj }
ClusterAddress = "192.168.27.51"
Administrators = { admin }
CounterInterval = 5
)
system train1 (
)
system train2 (
)
group ClusterService (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1, train2 }
OnlineRetryLimit = 3
Tag = CSG
)
IP webip (
Device = eri0
Address = "192.168.27.42"
NetMask = "255.255.255.0"
)
Proxy csgProxy (
TargetResName = NetworkNIC
)
group NetworkSG (
SystemList = { train1 = 0, train2 = 1 }
Parallel = 1
AutoStartList = { train1, train2 }
)
NIC NetworkNIC (
Device = eri0
)
Phantom NetworkPhantom (
)
group nameSG1 (
SystemList = { train1 = 0, train2 = 1 }
AutoStartList = { train1 }
)
DiskGroup nameDG1 (
DiskGroup = nameDG1
)
Mount nameMount1 (
MountPoint = "/name1"
BlockDevice = "/dev/vx/dsk/nameDG1/nameVol1"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess1 (
PathName = "/bin/ksh"
Arguments = "/name1/loopy name 1"
)
Proxy nameProxy1 (
TargetResName = NetworkNIC
)
Volume nameVol1 (
Volume = nameVol1
DiskGroup = nameDG1
)
DiskGroup nameDG2 (
DiskGroup = nameDG2
)
IP nameIP2 (
Device = eri0
Address = "192.168.27.71"
)
Mount nameMount2 (
MountPoint = "/name2"
BlockDevice = "/dev/vx/dsk/nameDG2/nameVol2"
FSType = vxfs
FsckOpt = "-y"
)
Process nameProcess2 (
PathName = "/bin/ksh"
Arguments = "/name2/loopy name 2"
)
Proxy nameProxy2 (
TargetResName = NetworkNIC
)
Volume nameVol2 (
Volume = nameVol2
DiskGroup = nameDG2
)
[Diagram: ClusterService service group containing a NotifierMngr resource
that provides notification for nameSG1 and nameSG2.]
Optional Lab: Triggers (resfault, nofailover, resadminwait)
SMTP Server: ___________________________________
Work with your lab partner to add a NotifierMngr type resource to the
ClusterService service group using the information in the design worksheet.
PathName /xxx/xxx (AIX only)
Critical? No (0)
Enabled? Yes (1)
haconf -makerw
4 Set the required attributes for this resource and any optional attributes, if
needed.
AIX
hares -modify notifier SmtpServer localhost
hares -modify notifier SmtpRecipients -add root Warning
hares -modify notifier PathName /xxx/xxx
7 Bring the resource online on the system running the ClusterService service
group.
haconf -dump
1 Test the service group by switching it to the other system in the cluster.
2 Verify that the service group came online properly on the other system.
hastatus -sum
3 Test the service group by switching it back to the original system in the cluster.
4 Verify that the service group came online properly on the original system.
hastatus -sum
6 Save and close the cluster configuration and view the configuration file to
verify your changes.
Note: In the next lab, you will see the effects of configuring notification and
triggers when you test various resource fault scenarios.
Use the following procedure to configure triggers for notification. In this lab, each
student creates a local copy of the trigger script on their own system. If you are
working alone in the cluster, copy your completed triggers to the other system.
#!/bin/sh
echo `date` > /tmp/resfault.msg
echo message from the resfault trigger >> /tmp/resfault.msg
echo Resource $2 has faulted on System $1 >> /tmp/resfault.msg
echo Please check the problem. >> /tmp/resfault.msg
/usr/lib/sendmail root < /tmp/resfault.msg
rm /tmp/resfault.msg
#!/bin/sh
echo `date` > /tmp/nofailover.msg
echo message from the nofailover trigger >> /tmp/nofailover.msg
echo no failover for service group $2 >> /tmp/nofailover.msg
echo Please check the problem. >> /tmp/nofailover.msg
/usr/lib/sendmail root < /tmp/nofailover.msg
rm /tmp/nofailover.msg
#!/bin/sh
echo `date` > /tmp/resadminwait.msg
echo message from the resadminwait trigger >> /tmp/resadminwait.msg
echo Resource $2 on System $1 is in adminwait for Reason $3 >> /tmp/resadminwait.msg
echo Please check the problem. >> /tmp/resadminwait.msg
/usr/lib/sendmail root < /tmp/resadminwait.msg
rm /tmp/resadminwait.msg
chmod 744 nofailover
chmod 744 resadminwait
5 If you are working alone, copy all triggers to the other system.
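Before relying on the triggers, you can exercise one by hand: VCS passes the system name and resource name as positional arguments. The sketch below uses a stub that records its arguments in a file instead of mailing root (the stub path and message are illustrative):

```shell
# Create a stub resfault-style trigger that records its arguments
# (illustrative stand-in for the real trigger script above).
cat > /tmp/resfault_demo <<'EOF'
#!/bin/sh
echo "Resource $2 has faulted on System $1" > /tmp/resfault_demo.msg
EOF
chmod 744 /tmp/resfault_demo
# Invoke it the way VCS would: system name first, then resource name.
/tmp/resfault_demo train1 nameIP1
cat /tmp/resfault_demo.msg
```

Testing a trigger this way confirms the script is executable and handles its arguments before a real fault exercises it.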
Note: Network interfaces for virtual IP addresses are unconfigured to
force the IP resource to fault.
In your classroom, the interface you specify is: ______
Replace the variable interface in the lab steps with this value.
The purpose of this lab is to observe how VCS responds to faults in a variety of
scenarios.
Brief instructions for this lab are located on the following page:
Lab 11 Synopsis: Configuring Resource Fault Behavior, page A-55
Step-by-step instructions for this lab are located on the following page:
Lab 11: Configuring Resource Fault Behavior, page B-93
This part of the lab exercise explores the default behavior of VCS.
1 Open the cluster configuration.
haconf -makerw
2 Verify that all resources in the nameSG1 service group are currently set to
critical; if not, set them to critical.
3 Set the IP and Process resources to not critical in the nameSG1 service group.
hares -modify nameIP1 Critical 0
hares -modify nameProcess1 Critical 0
4 Change the monitor interval for the IP resource type to 10 seconds and the
offline monitor interval for the IP resource type to 30 seconds.
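Step 4 does not list its commands. The following sketch shows the likely hatype syntax, matching the "hatype -modify NIC MonitorInterval 60" form used later in these labs; the command strings are only printed here so the sketch is safe to run on a machine without VCS:

```shell
# Sketch of step 4 (an assumption based on the hatype -modify pattern
# used elsewhere in this guide); printed, not executed.
ip_monitor="hatype -modify IP MonitorInterval 10"
ip_offmonitor="hatype -modify IP OfflineMonitorInterval 30"
printf '%s\n' "$ip_monitor" "$ip_offmonitor"
```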
haconf -dump
6 Verify that your nameSG1 service group is currently online on your system. If
it is not, bring it online or switch it to your system.
hastatus -sum
hagrp -switch nameSG1 -to your_sys
10 Set the IP and Process resources to critical in the nameSG1 service group.
haconf -dump
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
3 Verify that your nameSG1 service group is currently online on your system. If
it is not online locally, bring it online or switch it to your system.
hastatus -sum
hagrp -switch nameSG1 -to your_sys
The notifier sends two e-mail messages: one for the faulted resource
and one for the faulted service group. The resfault trigger should send
e-mail if configured.
5 Without clearing faults from the last failover, unconfigure the virtual IP
address on the other system (their_sys).
Solaris
rsh their_sys ifconfig interface removeif 192.168.xx.xx
HP
rsh their_sys ifconfig interface inet 0.0.0.0
AIX
rsh their_sys ifconfig interface ipaddress delete
Linux
ssh -l root their_sys ifconfig interface down
The group cannot fail over because there are no failover targets left.
The group stays offline.
The notifier sends two e-mail messages: one for the faulted resource
and one for the faulted service group. The resfault and nofailover
triggers should send e-mail, if configured.
6 Clear the nameIP1 resource on all systems and bring the nameSG1 service
group online on your system.
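Step 6 does not show its commands; a hedged sketch follows, using hares -clear and the hagrp -online form that appears later in this guide (nameIP1, nameSG1, and your_sys are classroom placeholders). The commands are printed rather than executed:

```shell
# Sketch of step 6: clear the fault everywhere, then bring the group
# online locally. Placeholders, printed only.
clear_cmd="hares -clear nameIP1"
online_cmd="hagrp -online nameSG1 -sys your_sys"
printf '%s\n' "$clear_cmd" "$online_cmd"
```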
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
3 Verify that your nameSG1 service group is currently online on your system. If
it is not online locally, bring it online or switch it to your system.
hastatus -sum
hagrp -switch nameSG1 -to your_sys
There is no failover.
What happens?
The resource fault should clear on its own when the agent probes the
resource (after the offline monitor interval) and finds it online. You can
probe the resource manually to check the state more quickly.
This section illustrates service group failover behavior using the ManageFaults
and FaultPropagation attributes.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
3 Set the FaultPropagation attribute for the nameSG1 service group to off (0).
There is no failover.
5 Clear the faulted resource and bring the resource back online.
6 Set the ManageFaults attribute for the nameSG1 service group to NONE and
set the FaultPropagation attribute back to one (1).
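Steps 3 and 6 above in command form (a sketch; this guide treats ManageFaults and FaultPropagation as nameSG1 service group attributes, so the hagrp -modify pattern used elsewhere in the labs is assumed). The strings are printed, not run:

```shell
# Sketch of steps 3 and 6 (attribute names from the lab text; hagrp
# syntax assumed from other steps in this guide).
fp_off="hagrp -modify nameSG1 FaultPropagation 0"    # step 3
mf_none="hagrp -modify nameSG1 ManageFaults NONE"    # step 6
fp_on="hagrp -modify nameSG1 FaultPropagation 1"     # step 6
printf '%s\n' "$fp_off" "$mf_none" "$fp_on"
```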
7 Unconfigure the virtual IP address on your system, outside the control of
VCS.
Solaris
ifconfig interface removeif 192.168.xx.xx
HP
ifconfig interface inet 0.0.0.0
AIX
ifconfig interface ipaddress delete
Linux
ifconfig interface down
There is no failover.
There is no failover.
c Did you receive e-mail notification?
11 Clear the faulted nameIP1 resource and switch the nameSG1 service group
back to your system.
12 Set ManageFaults back to ALL for the nameSG1 service group and save the
cluster configuration.
This section illustrates failover behavior of a resource type using restart limits.
1 Verify that all resources in the nameSG1 service group are currently set to
critical.
2 Set all resources to critical, if they are not already set, and save the cluster
configuration.
4 Stop the loopy process running in the nameSG1 service group by sending a
kill signal.
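Step 4 in miniature: send a kill signal to a process and confirm it is gone. A background sleep stands in for the loopy process here so the sketch is self-contained; on the cluster you would target the real loopy script (for example, the PID shown by ps -ef | grep loopy):

```shell
# Self-contained kill-signal demonstration; "sleep 300" is a stand-in
# for the loopy process run by the Process resource.
sleep 300 &
pid=$!
kill "$pid"                      # default signal is SIGTERM
wait "$pid" 2>/dev/null || true  # reap the child; ignore its exit status
if kill -0 "$pid" 2>/dev/null; then state=running; else state=killed; fi
echo "$state"
```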
There is no failover.
The group fails over.
The notifier sends two e-mail messages: one for the faulted resource
and one for the faulted service group. The resfault trigger should send
e-mail if configured.
6 Clear the faulted resource and switch the nameSG1 service group back to your
system.
7 When all students have completed the lab, save and close the configuration.
Optional Lab
Trigger: injeopardy
The purpose of this lab is to configure a low-priority link and then pull network
cables and observe how VCS responds.
Brief instructions for this lab are located on the following page:
Lab 13 Synopsis: Testing Communication Failures, page A-60
Step-by-step instructions for this lab are located on the following page:
Lab 13 Details: Testing Communication Failures, page B-101
Use the following procedure to configure triggers for jeopardy notification. In this
lab, students create a local copy of the trigger script on their own systems. If you
are working alone in the cluster, copy your completed triggers to the other system.
1 Create a text file in the /opt/VRTSvcs/bin/triggers directory named
injeopardy. Add the following lines to the file:
#!/bin/sh
echo `date` > /tmp/injeopardy.msg
echo message from the injeopardy trigger >> /tmp/injeopardy.msg
echo System $1 is in Jeopardy >> /tmp/injeopardy.msg
echo Please check the problem. >> /tmp/injeopardy.msg
/usr/lib/sendmail root </tmp/injeopardy.msg
rm /tmp/injeopardy.msg
3 If you are working alone, copy the trigger to the other system.
Solaris, AIX
rcp injeopardy their_sys:/opt/VRTSvcs/bin/triggers/injeopardy
HP-UX
rcp injeopardy their_sys:/opt/VRTSvcs/bin/triggers/injeopardy
Linux
scp injeopardy their_sys:/opt/VRTSvcs/bin/triggers/injeopardy
4 Continue with the next lab sections. The Multiple LLT Link Failures
(Jeopardy) section of this lab shows the effects of configuring the InJeopardy
trigger.
Working with your lab partner, use the procedures to create a low-priority link
and then fault communication links and observe what occurs in a cluster
environment when fencing is not configured.
2 Shut down VCS, leaving the applications running on all systems in the cluster.
gabconfig -U
lltconfig -U
Solaris Example
set-cluster 1
set-node train1
link tag1 /dev/qfe:0 - ether - -
link tag2 /dev/qfe:1 - ether - -
link-lowpri tag3 /dev/eri:0 - ether - -
AIX Example
set-cluster 1
set-node train1
link tag1 /dev/en:2 - ether - -
link tag2 /dev/en:3 - ether - -
link-lowpri tag3 /dev/en:1 - ether - -
HP-UX Example
set-cluster 10
set-node train1
link tag1 /dev/lan:1 - ether - -
link tag2 /dev/lan:2 - ether - -
link-lowpri tag3 /dev/lan:0 - ether - -
Linux Example
set-cluster 1
set-node train1
link tag1 eth1 - ether - -
link tag2 eth2 - ether - -
link-lowpri tag3 eth0 - ether - -
lltconfig -c
lltconfig
sh /etc/gabtab
Alternatively, you can start GAB using gabconfig. However, sourcing
/etc/gabtab is preferred to ensure that any changes you may have made to
the file are tested.
gabconfig -c -n 2
gabconfig -a
hastart
hastatus -sum
_____________________________________________________________
cd /tmp
haconf -makerw
Notes:
Use lltlink_enable to restore the LLT link.
The utilities prompt you to select an interface.
These classroom utilities are provided to enable you to simulate
disconnecting and reconnecting Ethernet cables without risk of damaging
connectors.
Run the utility from one system only, unless otherwise specified.
./lltlink_disable
lltstat -nvv
Replace the removed cable. To use the lltlink_enable utility, type:
./lltlink_enable
lltstat -nvv
gabconfig -a
2 Use lltlink_disable to remove all but one LLT link and watch for the
link to expire in the console.
Use lltlink_disable to remove all but one LLT link from operation
(private or low priority).
./lltlink_disable
Select the first LLT link from the list.
./lltlink_disable
Select the next LLT link from the list.
Solaris Mobile
Remove only the one high-priority LLT link (dmfe1).
lltstat -nvv
gabconfig -a
./lltlink_enable
Select the first LLT link to restore.
./lltlink_enable
Select the second LLT link to restore.
lltstat -nvv
gabconfig -a
gabconfig -a
2 Remove all but one LLT link and watch for the link to expire in the console or
system log.
Disable all but one LLT link (private or low priority). For each link, type:
./lltlink_disable
Solaris Mobile
Disable only the one high-priority LLT link (dmfe1).
lltstat -nvv
gabconfig -a
5 Remove the last LLT link and watch for the link to expire in the console.
lltstat -nvv
gabconfig -a
Each side of the cluster should have membership for only its own node.
hastatus -sum
Note: If you have more than two systems in the cluster, you must stop
HAD on all systems on either side of the network partition.
b If you physically unplugged cables, restore communications by reconnecting
the LLT link cables.
lltstat -nvv
gabconfig -a
hastart
hastatus -sum
haconf -makerw
Classroom systems: trainxx, trainxx
Disk 1:___________________
Disk 3:___________________
Disk groups: nameDG1, nameDG2
The purpose of this lab is to set up I/O fencing in a two-node cluster and simulate
node and communication failures.
Brief instructions for this lab are located on the following page:
Lab 14 Synopsis: Configuring I/O Fencing, page A-66
Step-by-step instructions for this lab are located on the following page:
Lab 14: Configuring I/O Fencing, page B-111
UseFence cluster attribute: SCSI3
vxdisksetup -i coor_disk1
vxdisksetup -i coor_disk2
vxdisksetup -i coor_disk3
b Display your cluster ID. Your cluster ID determines your coordinator disk
group name.
cat /etc/llttab
If your cluster ID is odd, use oddfendg for the disk group name.
If your cluster ID is even, use evenfendg for the disk group name.
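The odd/even naming rule above can be spelled out as a small shell check. The cluster ID comes from the set-cluster line in /etc/llttab; it is hard-coded here so the logic stands on its own:

```shell
# Sketch of the coordinator disk group naming rule; cluster_id is an
# example value -- on a real node, read it from set-cluster in /etc/llttab.
cluster_id=7
if [ $((cluster_id % 2)) -eq 1 ]; then
    fendg=oddfendg      # odd cluster ID
else
    fendg=evenfendg     # even cluster ID
fi
echo "$fendg"
```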
vxfentsthdw -g testdg
3 Enter the coordinator disk group name in the /etc/vxfendg fencing
configuration file on each system in the cluster.
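A sketch of step 3 follows. On a real cluster node the file is /etc/vxfendg and must be created on every system; a temporary path is used here so the sketch can run anywhere:

```shell
# Write the coordinator disk group name into the fencing configuration
# file. /tmp/vxfendg.example is a stand-in for /etc/vxfendg; oddfendg is
# one of the two classroom names (yours may be evenfendg).
dgname=oddfendg
vxfendg_file=${VXFENDG_FILE:-/tmp/vxfendg.example}
echo "$dgname" > "$vxfendg_file"
cat "$vxfendg_file"
```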
4 Start the fencing driver on each system using the vxfen init script.
/etc/init.d/vxfen start
5 Verify that the /etc/vxfentab file has been created on each system and it
contains a list of the coordinator disks.
cat /etc/vxfentab
gabconfig -a
c How many keys are present for each disk and why?
There should be A------- keys for LLT node 0 and B------- keys for LLT
node 1 on each coordinator disk for each path to that coordinator disk.
Example:
Device Name: /dev/rdsk/c1t9d0s2
Total Number Of Keys: 2
key[0]:
Key Value [Numeric Format]: 65,45,45,45,45,45,45,45
Key Value [Character Format]: A-------
key[1]:
Key Value [Numeric Format]: 66,45,45,45,45,45,45,45
Key Value [Character Format]: B-------
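The Numeric and Character formats in the example above are two views of the same bytes: decimal 65 is ASCII "A", 66 is "B", and 45 is "-". A quick demonstration for the first two bytes of key[0]:

```shell
# Convert the first two numeric key bytes (65, 45) to characters;
# awk's %c prints the character for a numeric value.
first_two=$(awk 'BEGIN { printf "%c%c", 65, 45 }')
echo "$first_two"
```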
1 On each system, verify that you have a Storage Foundation Enterprise license
installed for fencing support using vxlicrep.
vxlicrep
2 Working together, verify that the cluster configuration is saved and closed.
3 Change to the VCS configuration directory.
cd /etc/VRTSvcs/conf/config
mkdir test
5 Copy the main.cf and types.cf files into the test subdirectory.
cd test
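Step 5 in command form (a sketch; on a real node you would run it from /etc/VRTSvcs/conf/config). A scratch directory and empty stand-in files are used here so the sketch is self-contained:

```shell
# Copy main.cf and types.cf into the test subdirectory. The scratch
# directory stands in for /etc/VRTSvcs/conf/config.
confdir=${CONFDIR:-/tmp/vcsconf.example}
mkdir -p "$confdir/test"
touch "$confdir/main.cf" "$confdir/types.cf"   # stand-ins for the real files
cp "$confdir/main.cf" "$confdir/types.cf" "$confdir/test/"
ls "$confdir/test"
```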
Partial Example:
# vi main.cf
cluster vcs (
UserNames = { admin = ElmElgLimHmmKumGlj }
ClusterAddress = "192.168.27.51"
Administrators = { admin }
CounterInterval = 5
UseFence = SCSI3
. . .
)
9 Stop VCS and shut down the applications. The disk groups must be reimported
for fencing to take effect.
hastop -all
10 Copy the main.cf file from the test subdirectory into the configuration
directory.
cp main.cf ../main.cf
11 Start the cluster from the system where you edited the configuration file.
hastart
12 Start the cluster in the stale state on the other system in the cluster (where the
configuration was not edited).
hastart -stale
hastatus -summary
1 If the service groups with disk groups did not come online at cluster startup,
bring them online now. This imports the disk groups, which initiates fencing
on the data disks. Each student can perform these steps on their own service
groups.
hastatus -sum
hagrp -online nameSG1 -sys your_sys
hagrp -online nameSG2 -sys your_sys
There should be AVCS keys for disk groups imported on LLT node 0 and
BVCS keys for disk groups imported on LLT node 1.
# vxfenadm -g /dev/rdsk/data_disk1
# vxfenadm -r /dev/rdsk/data_disk2
In most cases, the following sections require that you work together with your lab
partner to observe how fencing protects data in a variety of failure situations.
Steps you can perform on your own are indicated within the procedure.
This command should fail because the node where the disk group is not
imported does not have rights to write to the disk, and therefore cannot
import the disk group and update the private region header information.
The error message should say: VxVM vxdg ERROR V-5-1-587 Disk group
nameDG1: import failed: No valid disk found containing disk group.
This indicates that data corruption from a possible concurrency violation
has been prevented.
1 Verify that the nameSG1 and nameSG2 service groups are online on your
system if two students are working on the cluster. If you are working alone,
ensure that you have a service group online on each system. This scenario
requires that disk groups be imported on each system. Switch them, if
necessary.
hastatus -sum
# vxdisk list
# vxfenadm -g /dev/rdsk/data_disk
# vxfenadm -r /dev/rdsk/data_disk
Reading SCSI Reservation Information...
4 Fail one of the systems by removing power or hard booting the system.
Observe the failure.
LLT and GAB should time out heartbeats from the failed system. The
remaining system should fence off the drive.
6 Verify that the service groups that were running on the failed system have
failed over to the remaining system.
hastatus -sum
7 Verify that the registrations and reservations on the data disks are now for the
remaining system.
# vxdisk list
# vxfenadm -g /dev/rdsk/data_disk
8 Boot the failed system and observe it rejoin cluster membership. Verify cluster
membership and verify that the coordinator disks have registrations for both
systems again.
gabconfig -a
vxfenadm -g all -f /etc/vxfentab
1 If you did not already perform this step in the Testing Communication
Failures lab, copy the lltlink_enable and lltlink_disable
utilities from the location provided by your instructor into the /tmp directory.
_____________________________________________________________
cd /tmp
haconf -makerw
4 Verify that the nameSG1 and nameSG2 service groups are online on your
system if two students are working on the cluster. If you are working alone,
ensure that you have a service group online on each system. This scenario
requires that one disk group be imported on each system. Switch the service
groups, if necessary.
hastatus -sum
6 Verify the registrations and reservations on the data disks for the disk groups
imported on each system.
vxdisk list
vxfenadm -g /dev/rdsk/data_disk
vxfenadm -g /dev/rdsk/data_disk
. . .
./lltlink_disable
lltstat -nvv
gabconfig -a
One side of the cluster should panic and reboot. When the rebooted
system is back up, VCS cannot start there because it cannot seed.
Only one system's keys are displayed on the coordinator disks. The other
system's keys have been ejected.
hastatus -sum
The service groups that were running on the system that rebooted have
failed over to the running system.
12 Verify that the registrations and reservations on the data disks are now for the
remaining system.
vxdisk list
vxfenadm -g /dev/rdsk/data_disk
. . .
Only one system's keys are shown on the data disks. The other system's
keys have been ejected.
13 When the system that rebooted is running, check the status of GAB and HAD.
gabconfig -a
14 Verify that the coordinator disks have registrations for the remaining system
only.
shutdown -y
b If you physically unplugged the Ethernet cables for the LLT links,
reconnect the cluster interconnects.
16 Verify that cluster membership has been established for both systems and both
systems are now registered with the coordinator disks.
gabconfig -a
17 Set the monitor interval for the NIC resource type back to 60.
haconf -makerw
hatype -modify NIC MonitorInterval 60
haconf -dump -makero
hastop -all
/etc/init.d/vxfen stop
4 From one system, import and remove the coordinator disk group.
5 Use the offline configuration procedure to set the UseFence cluster attribute to
the value NONE in the main.cf file and restart the cluster with the new
configuration.
Note: You cannot set UseFence dynamically while VCS is running.
cd /etc/VRTSvcs/conf/config
cp main.cf test
c Edit the main.cf file in the test directory on one system in the cluster to
set the value of UseFence to NONE.
# vi main.cf
cluster vcs (
UserNames = { admin = ElmElgLimHmmKumGlj }
ClusterAddress = "192.168.27.51"
Administrators = { admin }
CounterInterval = 5
UseFence=NONE
. . .
)
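A non-interactive way to make the step-c edit (the lab text uses vi; this is an equivalent sketch). It would run against the copy of main.cf in the test directory; a stand-in file is created here so the sketch runs anywhere:

```shell
# Flip UseFence from SCSI3 to NONE with sed. /tmp/main.cf.example is a
# stand-in for the test-directory copy of main.cf.
f=${MAINCF:-/tmp/main.cf.example}
printf 'cluster vcs (\nUseFence = SCSI3\n)\n' > "$f"   # stand-in content
sed 's/UseFence = SCSI3/UseFence = NONE/' "$f" > "$f.new" && mv "$f.new" "$f"
grep UseFence "$f"
```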
cp main.cf ..
8 Start the cluster from the system where you edited the configuration file.
hastart
9 Start the cluster in the stale state on the other system in the cluster (where the
configuration was not edited).
hastart -stale
hastatus -summary
WAIT States
ADMIN_WAIT: This state can occur under these circumstances:
A .stale flag exists and the main.cf file has a syntax problem.
The system is in local build and receives a disk error while reading
main.cf.
The system is in remote build and the last running system fails.
CURRENT_DISCOVER_WAIT: The system has joined a cluster and its
configuration file is valid.
CURRENT_PEER_WAIT: The system has a valid configuration file and
another system is building a configuration from disk.
BUILD States
LOCAL_BUILD: The system is building a configuration from disk.
REMOTE_BUILD: The system is building a configuration from a peer.
EXITING States
LEAVING: The system is leaving the cluster gracefully. When agents have
been stopped, the system transitions to the EXITING state.
EXITING: The system is leaving the cluster.
EXITED: The system has left the cluster.
EXITING_FORCIBLY: The hastop -local -force command has
caused the system to exit the cluster. Agents are stopped but applications
continue to run.
OTHER States
RUNNING: The system is an active member of the cluster.
FAULTED: The system is leaving the cluster unexpectedly (ungracefully).
INITING: The system has joined the cluster.
UNKNOWN: The system has no entry in the configuration and has not joined
the cluster.
(Diagram: agent resource states UP, DOWN, and UNKNOWN, with online, offline,
clean, and fault transitions between them)
(Flowchart: Add/Test Resource; if not successful, check logs and fix; set
optional attributes; if more resources remain, repeat; otherwise link
resources)
Use this procedure to create a service group.
Note: When you switch a service group to another system, keep the service group
running on that system for the duration of the OfflineMonitorInterval (the default
is five minutes) to ensure that the agents properly report all resources offline on
other systems.
Clusters
Attributes for global service groups are mismatched (Global Cluster option).
Severity: Major. The attributes ClusterList, AutoFailover, and Parallel are
mismatched for the same global service group on different clusters.
Remote cluster has faulted (Global Cluster option). Severity: Major. The trap
for this event includes information on how to take over the global service
groups running on the remote cluster before the cluster faulted.
Heartbeat is down. Severity: Warning. The connector on the local cluster has
lost its heartbeat connection to the remote cluster.
Remote cluster is in RUNNING state (Global Cluster option). Severity: Normal.
The local cluster has a complete snapshot of the remote cluster, indicating
the remote cluster is in the RUNNING state.
User has logged on to VCS. Severity: Information. A user logon has been
recognized because a user logged on through Cluster Manager, or because a
haxxx command was invoked.
Agents
Resource state is unknown. Severity: Warning. VCS cannot identify the state of
the resource.
Resource monitoring has timed out. Severity: Warning. The monitoring mechanism
for the resource has timed out.
Resource is not going offline. Severity: Warning. VCS cannot take the resource
offline.
Cluster resource health has declined. Severity: Warning. Used by agents to give
additional information on the state of a resource; the health of the resource
declined while it was online.
Resource went online by itself. Severity: Warning (not for first probe). The
resource was brought online on its own.
Resource is being restarted by agent. Severity: Information. The resource is
being restarted by its agent.
Cluster resource health has improved. Severity: Information. Used by agents to
give additional information on the state of a resource; the health of the
resource improved while it was online.
Systems
VCS has exited manually. Severity: Information. VCS has exited gracefully from
one node on which it was previously running.
VCS is up but is not in the cluster. Severity: Information. VCS is running on
one node but the node is not visible.
Service group has a concurrency violation. Severity: SevereError. A failover
service group has come online on more than one node in the cluster.
Service group has faulted and cannot be failed over anywhere. Severity:
SevereError. The specified service group has faulted on all nodes where the
group could be brought online, and there are no nodes to which the group can
fail over.
Service group is autodisabled. Severity: Information. VCS has autodisabled the
specified group because one node exited the cluster.
Service group is being switched. Severity: Information. The service group is
being taken offline on one node and brought online on another.
Description Monitors the configured NIC. If a network link fails, or if a problem arises with
the device card, the resource is marked OFFLINE. The NIC listed in the Device
attribute must have an administration IP address, which is the default IP address
assigned to the physical interface of a host on a network. This agent does not
configure network routes or administration IP addresses.
Entry Point
Monitor: Tests the network card and network link. Pings the network hosts or
the broadcast address of the interface to generate traffic on the network.
Counts the number of packets passing through the device before and after the
address is pinged. If the count decreases or remains the same, the resource is
marked OFFLINE.
Type Definition
type NIC (
static str ArgList[] = { Device, NetworkType,
NetworkHosts, PingOptimize }
NameRule = group.Name + "_" + resource.Device
static int OfflineMonitorInterval = 60
static str Operations = None
str Device
str NetworkType
int PingOptimize = 1
str NetworkHosts[]
)
Sample NIC Configurations
Sample 1: Without Network Hosts (Using Default Ping Mechanism)
NIC NIC_le0 (
Device = le0
PingOptimize = 1
)
Sample 2: With Network Hosts
NIC NIC_le0 (
Device = le0
NetworkHosts = { "166.93.2.1", "166.99.1.2" }
)
Description Brings online, takes offline, and monitors a file system mount point.
Entry Points
Online: Mounts a block device on the directory. If the mount process fails, the
agent attempts to run the fsck command on the raw device to remount the block
device.
Offline: Unmounts the file system.
Monitor: Determines whether the file system is mounted. Checks mount status
using the stat and statvfs commands.
Clean: See description on the following pages.
Info: See description on the following pages.
State Definitions
ONLINE: Indicates that the block device is mounted on the specified mount point.
OFFLINE: Indicates that the block device is not mounted on the specified mount
point.
UNKNOWN: Indicates that a problem exists with the configuration.
FsckOpt string-scalar Options for fsck command. "-y" or "-n" must be included as arguments
to fsck; otherwise, the resource cannot come online. VxFS file systems
will perform a log replay before a full fsck operation (enabled by "-y")
takes place. Refer to the manual page on the fsck command for more
information.
SnapUmount integer-scalar If set to 1, this attribute automatically unmounts VxFS snapshots when
the file system is unmounted.
Default is 0 (No).
Type Definition
type Mount (
static str ArgList[] = { MountPoint, BlockDevice, FSType,
MountOpt, FsckOpt, SnapUmount }
NameRule = resource.MountPoint
str MountPoint
str BlockDevice
str FSType
str MountOpt
str FsckOpt
)
Mount export1 (
MountPoint = "/export1"
BlockDevice = "/dev/dsk/c1t1d0s3"
FSType = "vxfs"
FsckOpt = "-n"
MountOpt = "ro"
)
PathName string-scalar Defines complete pathname to access an executable program. This path
includes the program name. If a process is controlled by a script, the
PathName defines the complete path to the shell.
PathName must not exceed 80 characters.
Arguments string-scalar Passes arguments to the process. If a process is controlled by a script, the
script is passed as an argument. Multiple arguments must be separated by
a single space. A string cannot accommodate more than one space
between arguments, nor allow for leading or trailing whitespace
characters. Arguments must not exceed 80 characters (total).
Type Definition
type Process (
static str ArgList[] = { PathName, Arguments }
NameRule = resource.PathName
str PathName
str Arguments
)
Process usr_lib_sendmail (
PathName = "/usr/lib/sendmail"
Arguments = "-bd -q1h"
)
cluster ProcessCluster (
.
.
.
group ProcessGroup (
SystemList = { sysa, sysb }
AutoStartList = { sysa }
)
Process Process1 (
PathName = "/usr/local/bin/myprog"
Arguments = "arg1 arg2"
)
Process Process2 (
PathName = "/bin/csh"
Arguments = "/tmp/funscript/myscript"
)
eject another; ejecting is final and atomic.
In the VCS implementation, a node registers the same key for all paths to the
device. A single preempt and abort command ejects a node from all paths to the
storage device.
Several important concepts are summarized below:
Only a registered node can eject another.
Because a node registers the same key down each path, ejecting a single key
blocks all I/O paths from the node.
After a node is ejected, it has no key registered, and it cannot eject others.
The SCSI-3 PR specification describes the method to control access to disks with
the registration and reservation mechanism. The method to determine who can
register with a disk and who is eligible to eject another node is implementation-
specific.
Shared Storage
Volume Resources
Volume resources are not required. They provide additional monitoring; however,
in environments with many volumes, the additional overhead of monitoring all the
volumes may be undesirable.
File Systems
Ensure that all file systems controlled by VCS resources are set to manual control
in the operating system configuration files. The operating system should not
perform any automatic mounts or unmounts.
SANs/Arrays
Shared disks on a SAN must reside in the same zone as all of the nodes in the
cluster.
Data residing on shared storage should be mirrored or protected by a hardware-
based RAID mechanism.
Use redundant storage and paths.
Use multiple single-port HBAs or SCSI controllers rather than multiport
interfaces to avoid single points of failure.
Include all cluster-controlled data in your backup planning and
implementation. Periodically test restoration of critical data to ensure that the
data can be restored.
Critical Resources
During configuration, consider initially setting all resources to non-critical. This
prevents service groups from failing over if you make errors when setting up a new
resource. Then set all resources to critical, which should cause a service group to
fault and fail over in the event the resource faults.
Proxy Resources
If you have multiple service groups that use the same network interface, you can
reduce monitoring overhead by using Proxy resources instead of NIC resources. If
you have many NIC resources, consider using Proxy resources to minimize any
potential performance impacts of monitoring.
Outside Services
Minimize reliance on services that are not within control of the cluster to ensure
high availability for your applications. Consider:
Network name resolution services
NFS mounts
NIS
In addition, ensure that external resources, such as DNS and gateways, are highly
available.
Testing
Test services on each failover target system before putting them under VCS
control.
Create a test cluster for performing the initial implementation and testing any
changes.
Test all possible failure scenarios.
Create and execute an acceptance/solution test plan before deploying a
cluster in a production environment and when making any changes.
JumpStart Compliance
VCS 4.1 is compliant with Solaris JumpStart technology.
VCS Simulator
VCS Simulator is a tool for simulating any cluster configuration and determining
how service groups will behave during cluster or system faults. With the simulator,
you can designate and fine-tune configuration parameters, view state transitions,
and evaluate complex, multinode configurations. The tool is especially valuable
because it enables you to design and evaluate a specific configuration without test
clusters or changes to existing production configurations.
I/O Fencing
VCS 4.0 provides a new capability, called I/O fencing, to arbitrate cluster
membership and ensure data integrity in the event of communication failure
among cluster members. The I/O fencing kernel module uses SCSI-3 Persistent
Reservations and designated coordinator disks, as described in the I/O Fencing
chapter of the VERITAS Cluster Server 4.0 User's Guide.
Fire Drill
Fire drill is a procedure for testing the fault readiness of a configuration. A fire
drill on a VCS-controlled application uses a separate fire drill service group that
contains a copy of the live application's resources. See the VERITAS Cluster
Server 4.0 User's Guide for more information.
Steward
The Steward mechanism minimizes chances of a wide-area split-brain in two-node
clusters. The steward process can run on any system outside of the clusters in a
Global Cluster configuration. See the VERITAS Cluster Server 4.0 User's Guide
for more information.
monitor cycle. See the VCS 4.0 User's Guide for more information.
New Attributes
Resource Type Attributes
ActionTimeout
FireDrill
InfoInterval
First system:
set-cluster
link
link
link
Cluster Configuration (main.cf)
Administrators
Optional Attributes
CounterInterval
System
Required Attributes
FailoverPolicy
SystemList
Optional Attributes
AutoStartList
OnlineRetryLimit
Resource Name
Resource Type
Required Attributes
Optional Attributes
Critical?
Enabled?
Resource Name
Resource Type
Required Attributes
Critical?
Enabled?
Resource Name
Resource Type
Required Attributes
Optional Attributes
Enabled?
Resource Name
Resource Type
Required Attributes
Optional Attributes
Critical?
Enabled?