VCS_3.5_Solaris_R3.5_20020915
10-2 Objectives
After completing this lesson, you will be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
10-3
[Flowchart: fault response decision — is a critical online resource in the faulted resource's path?]
10-4 Practice Exercise
[Diagram: a resource dependency tree (resources 1 through 7); Resource 4 faults. For each case, given which resources are marked NonCritical, determine whether the service group is taken offline due to the fault and started on another system.]
10-5 Practice Answers
[Diagram: answers to each case of the "Resource 4 Fails" practice exercise.]
10-6 Failover Policies
The AutoFailOver attribute indicates whether automatic
failover is enabled for the service group.
Default value is 1, enabled.
The FailOverPolicy attribute specifies how a target
system is selected:
Priority: The system with the lowest priority number in the list is
selected (default).
RoundRobin: The system with the fewest active service
groups is selected.
Load: The system with the greatest available capacity is selected.
Example configuration:
hagrp -modify group AutoFailOver 0
hagrp -modify group FailOverPolicy Load
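The three policies can be sketched as a small simulation. This is illustrative only, not VCS source code; the system names and attribute values are hypothetical:

```python
# Sketch of FailOverPolicy target selection (illustrative; not VCS code).
def select_target(policy, systems):
    """systems: list of dicts with hypothetical per-system attributes."""
    candidates = [s for s in systems if s["available"]]
    if policy == "Priority":
        # Lowest priority number in SystemList wins (the VCS default).
        return min(candidates, key=lambda s: s["priority"])["name"]
    if policy == "RoundRobin":
        # Fewest active service groups wins.
        return min(candidates, key=lambda s: s["active_groups"])["name"]
    if policy == "Load":
        # Greatest available capacity (Capacity - Load) wins.
        return max(candidates, key=lambda s: s["capacity"] - s["load"])["name"]
    raise ValueError("unknown policy: " + policy)

systems = [
    {"name": "svr1", "available": True, "priority": 1,
     "active_groups": 3, "capacity": 300, "load": 250},
    {"name": "svr2", "available": True, "priority": 2,
     "active_groups": 1, "capacity": 300, "load": 100},
]
```

With this data, Priority picks svr1 (priority 1), while RoundRobin and Load both pick svr2 (fewer groups; 200 available versus 50).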
10-7
AP1
Svr1
Svr2
Svr3
10-8
Svr1
Svr2
VCS_3.5_Solaris_R3.5_2002091
5
Svr3
Svr4
10-9
Capacity
300
300
300
150
- Load
= Available
VCS_3.5_Solaris_R3.5_2002091
5
10-10 Determining Load
[Diagram: iPlanet requires 100 units of Load, Sybase 125, Oracle 8i 150, and each of the three NFS shares 75. With the capacities above, AvailableCapacity = Capacity - Load:
System running iPlanet: 300 - 100 = 200
System running Sybase: 300 - 125 = 175
System running Oracle 8i and NFS1: 300 - (150 + 75) = 75
System running NFS2 and NFS3: 150 - (75 + 75) = 0]
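The Capacity/Load arithmetic on this slide can be sketched directly. The system names are assumptions (the slides only show the numbers); the loads and capacities are the slide's values:

```python
# AvailableCapacity = Capacity - sum of Load of groups online on the system.
# Values from the slide; system names svr1-svr4 are assumed for illustration.
capacity = {"svr1": 300, "svr2": 300, "svr3": 300, "svr4": 150}
group_load = {"iPlanet": 100, "Sybase": 125, "Oracle8i": 150,
              "NFS1": 75, "NFS2": 75, "NFS3": 75}
placement = {"svr1": ["iPlanet"],
             "svr2": ["Sybase"],
             "svr3": ["Oracle8i", "NFS1"],
             "svr4": ["NFS2", "NFS3"]}

def available_capacity(system):
    # Subtract the Load of every group online on the system from its Capacity.
    return capacity[system] - sum(group_load[g] for g in placement[system])
```

This reproduces the slide's results: 200, 175, 75, and 0 units available.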
10-11
Oracle 8i FAILS.
VCS brings Oracle 8i online on the server with 200
AvailableCapacity.
VCS recalculates AvailableCapacity based on new Load.
[Diagram: Oracle 8i is restarted on the system with 200 units of AvailableCapacity (the iPlanet system), which then has 300 - (100 + 150) = 50 available.]
10-12
[Diagram: after the failover, AvailableCapacity is recalculated — the system running iPlanet and Oracle 8i now has 300 - 250 = 50 available, and the other systems' values are updated similarly.]
10-13
[Diagram: a further failover moves an NFS share; AvailableCapacity is again recalculated for each affected system.]
10-14
10-15
[Diagram: dynamic load — AvailableCapacity = Capacity - DynamicLoad. A system with Capacity 300 and a measured DynamicLoad of 90 has 210 available; the figure shows utilization readings such as 40%, 75%, and 80%.]
10-16
[Diagram: main.cf excerpt for system Svr4: Capacity = 100, LoadWarningLevel = 90, LoadTimeThreshold = 600; AvailableCapacity = Capacity - DynamicLoad.]
10-17 System Limits
[Diagram, step 1: each system defines Limits — three systems with limit values 4 and 512, one with 1 and 128. Limits - Prerequisites = CurrentLimits.]
10-18
2.
4,512
4,512
4,512
1,128
- Prereq
1,184
1,328
1,212
3,304
1,96
1,300
1,208
1,32
= Current
VCS_3.5_Solaris_R3.5_2002091
5
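The Limits/Prerequisites accounting on these slides can be sketched as follows. The limit names and group placements are hypothetical (the slides show only numeric pairs such as 4,512 and 1,128):

```python
# Sketch of VCS Limits / Prerequisites accounting (names and data are hypothetical).
limits = {"svr1": {"GroupWeight": 4, "Memory": 512},
          "svr2": {"GroupWeight": 4, "Memory": 512}}
# Prerequisites of the service groups currently online on each system.
online_prereqs = {"svr1": [{"GroupWeight": 1, "Memory": 128}],
                  "svr2": []}

def current_limits(system):
    # CurrentLimits = Limits minus the Prerequisites of groups online there.
    cl = dict(limits[system])
    for prereq in online_prereqs[system]:
        for name, need in prereq.items():
            cl[name] -= need
    return cl

def meets_prerequisites(system, prereq):
    # A group can only be brought online where CurrentLimits cover its Prerequisites.
    cl = current_limits(system)
    return all(cl.get(name, 0) >= need for name, need in prereq.items())
```

With the data above, svr1's CurrentLimits drop to 3 and 384, so a group requiring 400 units of Memory can only go to svr2.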
10-19
10-20 Failover Zones
[Diagram: six-system cluster. Preferred failover zone for the database service group: sysa, sysb. Preferred failover zone for the Web service group: sysc, sysd, syse, sysf.]
The SystemList for both service groups includes all systems in the cluster.
10-21 SystemZones Attribute
Used to define the preferred failover zones for each service
group.
If the service group is online in a system zone, it fails over to other
systems in the same zone, selected by the FailOverPolicy, until
there are no systems available in that zone.
When there are no other systems for failover in the same zone,
VCS chooses a system in a new zone from the SystemList based
on the FailOverPolicy.
To define SystemZones:
Syntax:
hagrp -modify group_name SystemZones \
sys1 zone# sys2 zone# ... sysN zone#
Example:
hagrp -modify OracleSG SystemZones sysa \
0 sysb 0 sysc 1 sysd 1 syse 1 sysf 1
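The zone-first selection described above can be sketched as a simulation. This mirrors the OracleSG example's zone assignments but is not VCS code; within a zone the real engine applies the group's FailOverPolicy, which is simplified here to SystemList order:

```python
# Sketch of SystemZones-aware target selection (zone values from the OracleSG example).
system_zones = {"sysa": 0, "sysb": 0, "sysc": 1, "sysd": 1, "syse": 1, "sysf": 1}

def failover_target(failed_on, candidates):
    """Prefer systems in the same zone as the failed system; only when that
    zone is exhausted, consider systems in other zones."""
    home_zone = system_zones[failed_on]
    same_zone = [s for s in candidates if system_zones[s] == home_zone]
    pool = same_zone or candidates
    # Simplification: real VCS applies FailOverPolicy within the chosen pool.
    return pool[0] if pool else None
```

If the group fails on sysa and sysb is available, sysb (zone 0) is chosen; only when zone 0 is empty does the group cross into zone 1.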
10-22 ConfInterval
Determines the amount of time that a tolerance or restart
counter can be incremented
Default: 600 seconds
ToleranceLimit
Enables the monitor entry point to return OFFLINE several
times before the resource is declared FAULTED
Default: 0
10-23 Restart Example
RestartLimit=1
The resource can be restarted one time within
the ConfInterval timeframe.
ConfInterval=180
Resource can be restarted once within a three
minute interval.
MonitorInterval=60 seconds (default value)
Resource is monitored every 60 seconds.
[Timeline: the resource is online and monitored every MonitorInterval. It goes offline and is restarted once; when it goes offline again within the same ConfInterval, the RestartLimit is exhausted and the resource is marked Faulted.]
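The restart accounting in this example can be sketched as a small simulation. This is illustrative only, not the VCS agent framework; ToleranceLimit is included to show where it would apply before restarts are attempted:

```python
# Sketch of the restart/fault decision using RestartLimit, ToleranceLimit,
# and ConfInterval (values from the slide's example; times in seconds).
RESTART_LIMIT = 1
TOLERANCE_LIMIT = 0
CONF_INTERVAL = 180

def decide(offline_reports):
    """offline_reports: increasing times at which monitor returned OFFLINE.
    Returns the action taken for each report: 'tolerate', 'restart', or 'fault'."""
    actions = []
    restarts_in_window = []   # times of restarts within the current ConfInterval
    tolerated = 0
    for t in offline_reports:
        # Restarts older than ConfInterval no longer count against RestartLimit.
        restarts_in_window = [r for r in restarts_in_window
                              if t - r < CONF_INTERVAL]
        if tolerated < TOLERANCE_LIMIT:
            tolerated += 1
            actions.append("tolerate")
        elif len(restarts_in_window) < RESTART_LIMIT:
            restarts_in_window.append(t)
            actions.append("restart")
        else:
            actions.append("fault")
    return actions
```

Two offline reports 60 seconds apart give a restart and then a fault; if the second report comes after the 180-second ConfInterval has elapsed, the counter has reset and the resource is restarted again.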
10-24 Adjusting Monitoring
MonitorInterval:
Default value is 60 seconds for most resource
types.
Consider reducing to 10 or 20 seconds for testing.
Use caution when changing this value:
Load is increased on cluster systems.
Resources can fault if they cannot respond in the
interval specified.
OfflineMonitorInterval:
Default is 300 seconds for most resource types.
Consider reducing to 60 seconds for testing.
10-25
10-26 Preventing Failover
A frozen service group does not fail over when a critical
resource faults.
The service group must be unfrozen to enable failover.
To freeze a service group:
hagrp -freeze service_group [-persistent]
A persistent freeze:
Requires the cluster configuration to be open
Remains in effect even if VCS is stopped and restarted throughout
the cluster
10-27 Clearing Faults
Verify that the faulted resource is offline.
Fix the problem that caused the fault and clean
up any residual effects.
To clear a fault, type:
hares -clear resource_name [-sys system_name]
10-28 Probing Resources
Causes VCS to immediately monitor the
resource
To probe a resource, type:
hares -probe resource_name -sys system_name
10-29
10-30 Testing Failover
Use test resources, such as FileOnOff, when
applicable.
Set lower values for MonitorInterval,
OfflineMonitorInterval, and ConfInterval to detect
faults more quickly.
Manually online, offline, and switch the service group
among all systems.
Simulate failure of each resource in the service
group.
Simulate failover of the entire system.
10-31 Testing Examples
Force a resource to fault.
Reboot a system.
Halt and reboot a system.
Remove power from a system.
10-32 Summary
You should now be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
10-33
[Lab diagram: student service groups BlueNFSSG and RedNFSSG, with event triggers resfault, nofailover, and sysoffline.]
Triggers
resfault
nofailover
sysoffline
10-34