Troubleshooting VRRP

This article helps troubleshoot VRRP-related issues on Nokia Check Point
firewalls. One of the most common problems in Nokia VRRP implementations is that
interfaces on both the active and the standby firewall go into the Master state
(master/master). The main reason is that the individual VRIDs of the master and
backup firewalls cannot see each other's VRRP multicast advertisements.
The first step is to check the VRRP state of the interfaces:
PrimaryFW-A[admin]# iclid
PrimaryFW-A> show vrrp
VRRP State
Flags: On
6 interface enabled
6 virtual routers configured
0 in Init state
0 in Backup state
6 in Master state
PrimaryFW-A>
PrimaryFW-A> exit
Bye.
PrimaryFW-A[admin]#
SecondaryFW-B[admin]# iclid
SecondaryFW-B> sh vrrp
VRRP State
Flags: On
6 interface enabled
0 in Init state
4 in Backup state
2 in Master state
SecondaryFW-B>
SecondaryFW-B> exit
Bye.
SecondaryFW-B[admin]#
In the example shown, both firewalls report interfaces in the Master state: all
6 on the primary and 2 on the secondary. Those 2 interfaces are therefore Master
on both firewalls at the same time.
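The comparison above can be sketched in a few lines of Python. This is only an illustration that parses the summary text printed by show vrrp; the function names are hypothetical, and it assumes an active/standby pair in which one member is expected to be Master for every virtual router.

```python
import re

def parse_vrrp_summary(text):
    """Return the state counters from a 'show vrrp' summary as a dict."""
    counts = {}
    for match in re.finditer(r"(\d+) in (Init|Backup|Master) state", text):
        counts[match.group(2)] = int(match.group(1))
    return counts

def masters_on_both(primary_text, secondary_text):
    """True when both members report at least one virtual router as Master."""
    primary = parse_vrrp_summary(primary_text)
    secondary = parse_vrrp_summary(secondary_text)
    return primary.get("Master", 0) > 0 and secondary.get("Master", 0) > 0

# Summaries copied from the transcripts above.
PRIMARY = """VRRP State
Flags: On
6 interface enabled
6 virtual routers configured
0 in Init state
0 in Backup state
6 in Master state
"""
SECONDARY = """VRRP State
Flags: On
6 interface enabled
0 in Init state
4 in Backup state
2 in Master state
"""

print(parse_vrrp_summary(SECONDARY))
print("master/master suspected:", masters_on_both(PRIMARY, SECONDARY))
```

In a pair where Master roles are deliberately split between the members, this check would raise false alarms, so treat it as a first hint only.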
The next step is to run tcpdump to see whether the VRRP multicasts are reaching
the interface in question.
As the first troubleshooting measure, run tcpdump on the problematic interface
of both the master and the backup firewall. To find the problematic interface,
run echo sh vrrp int | iclid. It is the interface on the backup firewall that
is in the Master state.
PrimaryFW-A[admin]# tcpdump -i eth-s4p2c0 proto vrrp
tcpdump: listening on eth-s4p2c0
00:46:11.379961 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
The tcpdump on the primary firewall shows the VRRP multicast advertisement
leaving the interface (direction O).
Next, run tcpdump on the secondary firewall.
SecondaryFW-B[admin]# tcpdump -i eth-s4p2c0 proto vrrp
00:19:38.507294 O 192.168.1.2 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 95 [tos 0xc0]
Now you can see that the interfaces on both the primary and the secondary
firewall are sending VRRP multicasts, and neither capture shows any arriving
from the peer. The VRRP multicasts are not reaching the other firewall's
interface, which means there is a communication breakdown, possibly caused by
network issues.
Once the network issue is resolved, the firewalls can communicate again and the
interface with the lower priority will go into the Backup state.
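As a rough sketch of the check described above, the following Python parses captured advertisement lines and reports whether any inbound (I) advertisement was seen. The regular expression is written against the tcpdump line format shown in this article, with each packet on one line; the names Advert, parse_advert and hears_peer are made up for the example.

```python
import re
from typing import NamedTuple

# Matches lines such as:
# 00:46:11.379961 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
ADVERT = re.compile(
    r"^(?P<time>\S+) (?P<direction>[IO]) (?P<src>\S+) > 224\.0\.0\.18: "
    r"VRRPv2-adver \d+: vrid (?P<vrid>\d+) pri (?P<pri>\d+)"
)

class Advert(NamedTuple):
    direction: str  # "I" = inbound, "O" = outbound
    src: str
    vrid: int
    pri: int

def parse_advert(line):
    """Return an Advert for a VRRP advertisement line, or None for other lines."""
    m = ADVERT.match(line.strip())
    if m is None:
        return None
    return Advert(m["direction"], m["src"], int(m["vrid"]), int(m["pri"]))

def hears_peer(lines):
    """True if at least one inbound advertisement appears in the capture."""
    adverts = [a for a in map(parse_advert, lines) if a is not None]
    return any(a.direction == "I" for a in adverts)

# Capture from the primary firewall in the example above.
PRIMARY_CAPTURE = [
    "tcpdump: listening on eth-s4p2c0",
    "00:46:11.379961 O 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]",
]

print("peer advertisements seen:", hears_peer(PRIMARY_CAPTURE))
```

A capture that contains only O lines on both members matches the communication breakdown described above.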
Now consider another scenario in which the firewall interfaces are in the
master/master state.
Again, run tcpdump on both interfaces in question:
00:46:11.206994 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]
00:19:38.630075 I 10.10.10.2 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 100 [tos 0xc0]
In the above example, compare the VRIDs of the incoming and outgoing packets.
They do not match, which indicates that the cabling is not correct: the cables
going to the interfaces for VRID 102 and VRID 103 are connected the wrong way
round.
Swap the cables and the issue will be resolved. The firewall with the higher
priority will go into the Master state.
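The VRID comparison can be automated along these lines. This is a sketch under the assumption that you know which VRID is configured on the interface being captured (102 in this example); unexpected_vrids is a hypothetical helper, and the pattern follows the tcpdump format shown in this article.

```python
import re

# Direction flag, source address and VRID of a VRRPv2 advertisement line.
VRID = re.compile(r" ([IO]) (\S+) > 224\.0\.0\.18: VRRPv2-adver \d+: vrid (\d+)")

def unexpected_vrids(lines, expected_vrid):
    """Return the VRIDs seen inbound that differ from the configured VRID."""
    seen = set()
    for line in lines:
        m = VRID.search(line)
        if m and m.group(1) == "I" and int(m.group(3)) != expected_vrid:
            seen.add(int(m.group(3)))
    return seen

# Inbound advertisements from the mis-cabled example above.
CAPTURE = [
    "00:46:11.206994 I 10.10.10.1 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 95 [tos 0xc0]",
    "00:19:38.630075 I 10.10.10.2 > 224.0.0.18: VRRPv2-adver 20: vrid 103 pri 100 [tos 0xc0]",
]

print("foreign VRIDs on the VRID 102 interface:", unexpected_vrids(CAPTURE, expected_vrid=102))
```

A non-empty result means the interface is receiving advertisements meant for a different virtual router, which is consistent with swapped cables.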
A properly functioning firewall pair looks like this:
PrimaryFW-A[admin]# iclid
PrimaryFW-A> sh vrrp
VRRP State
Flags: On
6 interface enabled
0 in Init state
0 in Backup state
6 in Master state
PrimaryFW-A> exit
Bye.
PrimaryFW-A[admin]#
SecondaryFW-B[admin]# iclid
SecondaryFW-B> sh vrrp
VRRP State
Flags: On
6 interface enabled
0 in Init state
6 in Backup state
0 in Master state
SecondaryFW-B> exit
Bye.
If you were to tcpdump the healthy interface, this is how it would look:
^C
97 packets received by filter
0 packets dropped by kernel
PrimaryFW-A[admin]#
18:26:07.415446 I 192.168.1.1 > 224.0.0.18: VRRPv2-adver 20: vrid 102 pri 100 [tos 0xc0]
^C
VRRP Transitions can happen due to several causes:

The first (and most common) cause is that one or more of the monitored
interfaces loses link state.
The second cause is that, due to network issues, VRRP hello packets originating
from the master VRRP member are not seen on the backup.
The third cause is that one of the Check Point critical devices fails to check
in its state with the kernel within the specified timeout.

Solution

VRRP Transitions due to loss of link state

It is often difficult to determine whether a VRRP transition occurred due to a
loss of link state on one of the monitored interfaces. To isolate the failover
cause to a link transition on one of the monitored interfaces, do the following:
Gather switch statistics from the devices directly connected to the VRRP pair
and analyze them to determine whether a link transition occurred.
Run the following commands to determine which interface is losing link state
and causing the transition.

(NOTE: The Up to Down Transitions counter counts up-to-down transitions only. It
does not increment when the link state goes from down to up.)
ipso[admin]# clish -c show interfacemonitor
Interface Monitor
Interface eth1c0
Status up
Logical Name eth1c0
State PhysAvail,LinkAvail,Up,Broadcast,Multicast,AutoLink
MTU 1518
Up to Down Transitions 1
Interface eth2c0
Status up
Logical Name eth2c0
MTU 1518
Interface eth3c0
Status up
Logical Name eth3c0
MTU 1518
Interface eth4c0
Status down
Logical Name eth4c0
Interface loop0c0
Status up
Logical Name loop0c0
State PhysAvail,LinkAvail,Up,Loopback,Multicast
MTU 0
ipso[admin]# clish -c show vrrp interfaces
VRRP Interfaces
Interface eth1c0
Number of virtual routers: 1
Flags: MonitoredCircuitMode
Authentication: NoAuthentication
VRID 10
State: Master    Time since transition: 85236
BasePriority: 110    Effective Priority: 110
Master transitions: 3    Flags:
Advertisement interval: 1    Router Dead Interval: 3
VMAC Mode: VRRP    VMAC: 00:00:5e:00:01:0a
Primary address: 10.207.159.5
Next advertisement:
Number of Addresses: 1
10.207.159.88
Monitored circuits
eth3c0 (priority 10)
Interface eth3c0
Number of virtual routers: 1
Flags: MonitoredCircuitMode
Authentication: NoAuthentication
VRID 10
State: Master    Time since transition: 85236
BasePriority: 110    Effective Priority: 110
Master transitions: 3    Flags:
Advertisement interval: 1    Router Dead Interval: 3
VMAC Mode: VRRP    VMAC: 00:00:5e:00:01:0a
Primary address: 192.168.159.4
Next advertisement:
Number of Addresses: 1
192.168.159.88
Monitored circuits
eth1c0 (priority 10)
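To pick out the interface that is losing link, the interfacemonitor output can be scanned for non-zero Up to Down Transitions counters. The sketch below works on an excerpt of the output shown earlier; the function names are hypothetical, and it assumes that an interface without a counter line has had no transitions.

```python
def down_transitions(text):
    """Map each interface name to its 'Up to Down Transitions' count."""
    counts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Interface Monitor":  # section header, not an interface
            continue
        if line.startswith("Interface "):
            current = line.split()[1]
            counts[current] = 0  # assumption: no counter line means zero
        elif line.startswith("Up to Down Transitions") and current:
            counts[current] = int(line.split()[-1])
    return counts

def flapping(text):
    """Return the interfaces that have gone from up to down at least once."""
    return sorted(name for name, n in down_transitions(text).items() if n > 0)

# Excerpt of the 'clish -c show interfacemonitor' output above.
SAMPLE = """Interface Monitor
Interface eth1c0
Status up
Logical Name eth1c0
State PhysAvail,LinkAvail,Up,Broadcast,Multicast,AutoLink
MTU 1518
Up to Down Transitions 1
Interface eth2c0
Status up
Logical Name eth2c0
MTU 1518
Interface eth4c0
Status down
Logical Name eth4c0
"""

print("interfaces with link loss:", flapping(SAMPLE))
```

Because the counter only ever increases, comparing two snapshots taken before and after a failover gives a clearer picture than a single reading.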

VRRP Transitions due to not receiving VRRP hello packets

To determine whether VRRP hello packets from the master are seen on the backup,
run tcpdump on each interface configured for VRRP and look for the inbound hello
packets. The following command shows all VRRP hello packets:
ipso[admin]# tcpdump -vv -i eth1c0 proto vrrp
tcpdump: listening on eth1c0
18:18:20.605420 I 10.207.159.5 > 224.0.0.18: VRRPv2-adver 20: vrid 10 pri 110 int 1 sum 9684 naddrs 1 10.207.159.88 [tos 0xc0] (ttl 255, id 14906)

When analyzing a VRRP hello packet, check the following fields:
VRID: make sure that the packets you are looking at belong to the VRID in
question.
pri: this is the effective priority that is being announced to the other VRRP
member.
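The relationship between the base priority and the announced pri value can be worked through with a small calculation. This sketch assumes the usual monitored-circuit behaviour, in which the value listed next to a monitored interface (10 for eth3c0 in the output above) is subtracted from the base priority while that interface is down. The peer priority of 105 is a made-up value for illustration, and tie-breaking on equal priorities is ignored.

```python
def effective_priority(base_priority, monitored_deltas, link_up):
    """Base priority minus the delta of every monitored interface that is down.

    monitored_deltas maps interface name -> priority delta.
    link_up maps interface name -> True when the link is up (missing = down).
    """
    lost = sum(
        delta
        for name, delta in monitored_deltas.items()
        if not link_up.get(name, False)
    )
    return base_priority - lost

def should_be_master(local_effective, peer_advertised):
    """The member announcing the higher priority is expected to be Master."""
    return local_effective > peer_advertised

BASE = 110                   # BasePriority from 'show vrrp interfaces'
MONITORED = {"eth3c0": 10}   # 'eth3c0 (priority 10)' under Monitored circuits
PEER_PRI = 105               # hypothetical priority announced by the peer

healthy = effective_priority(BASE, MONITORED, {"eth3c0": True})
degraded = effective_priority(BASE, MONITORED, {"eth3c0": False})
print(healthy, should_be_master(healthy, PEER_PRI))
print(degraded, should_be_master(degraded, PEER_PRI))
```

If the pri value in a captured hello packet is lower than the configured base priority, a monitored interface on the sender is the first thing to check.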

VRRP Transitions due to a failure of a Check Point Critical Device

VRRP monitors the state of the Check Point processes only if FW Monitoring is
selected in the VRRP configuration. For troubleshooting purposes this can be
disabled from Voyager to rule out a critical device failure. Nokia does not
recommend that customers run with this setting disabled in a production
environment.
A Check Point critical device is a process that is monitored by the cpha daemon.
These devices must report their state to the kernel within the specified
timeout. If a device fails to do so, the kernel assumes that there is a problem
with the process and forces a VRRP failover.
Note: When FW Monitoring is enabled for VRRP, any backward clock move causes
fwd to go into a problem state, and as a result a VRRP failover occurs.
To obtain a list of the Check Point critical devices and their timeouts, run the
following command:
ipso[admin]# cphaprob -i list
Built-in Devices:
Device Name: IPSO member status
Current state: OK
Registered Devices:
Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 102563 sec
Device Name: Filter
Timeout: none
Current state: OK
Time since last report: 102548 sec
Device Name: cphad
Timeout: 5 sec
Current state: OK
Time since last report: 0.2 sec
Device Name: fwd
Timeout: 5 sec
Current state: OK
Time since last report: 0.6 sec
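The listing above can be checked mechanically for devices whose last report is older than their timeout. The following Python is a minimal sketch that parses the text format shown above; parse_devices, seconds and overdue are hypothetical names, and devices with a timeout of none are simply skipped.

```python
def parse_devices(text):
    """Parse 'cphaprob -i list' output into a list of per-device dicts."""
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Device Name":
            devices.append({"name": value})
        elif devices and key in ("Timeout", "Current state", "Time since last report"):
            devices[-1][key] = value
    return devices

def seconds(value):
    """Convert '5 sec' to 5.0; return None for 'none' or a missing field."""
    if value is None or value == "none":
        return None
    return float(value.split()[0])

def overdue(devices):
    """Names of devices whose last report is older than their timeout."""
    late = []
    for device in devices:
        timeout = seconds(device.get("Timeout"))
        last = seconds(device.get("Time since last report"))
        if timeout is not None and last is not None and last > timeout:
            late.append(device["name"])
    return late

# Excerpt of the listing above.
SAMPLE = """Registered Devices:
Device Name: Synchronization
Timeout: none
Current state: OK
Time since last report: 102563 sec
Device Name: cphad
Timeout: 5 sec
Current state: OK
Time since last report: 0.2 sec
Device Name: fwd
Timeout: 5 sec
Current state: OK
Time since last report: 0.6 sec
"""

print("overdue devices:", overdue(parse_devices(SAMPLE)))
```

A single snapshot rarely catches a device in the act, so this is more useful when the listing is sampled repeatedly during a period of high load.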

To enable debugging (which writes an event to the messages file and the console
upon a critical device failure), run the following command:
ipso[admin]# ipsctl -w net:log:partner:status:debug 1

This logs to the console and to /var/log/messages. To turn it off:
ipso[admin]# ipsctl -w net:log:sink:console 0

After enabling debugging, analyze the /var/log/messages file and look for lines
containing noksr. A log event looks like the following:
Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/cphad expired
Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/fwd expired
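To summarize which critical devices expired, the noksr_timeout events can be counted per device. The sketch below assumes each event sits on a single line in the format shown above; expired_devices is a hypothetical helper, and in practice the lines would be read from /var/log/messages rather than from a list.

```python
import re
from collections import Counter

# Captures the device name between 'noksr_timeout ..' and 'expired'.
EXPIRED = re.compile(r"noksr_timeout \.\. (\S+) expired")

def expired_devices(lines):
    """Count noksr_timeout events per critical device."""
    return Counter(m.group(1) for m in map(EXPIRED.search, lines) if m)

# Events from the example above, one per line.
LOG = [
    "Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/cphad expired",
    "Oct 12 18:55:28 IP650A [LOG_DEBUG] kernel: netlog:noksr_timeout .. Firewall-1/fwd expired",
]

for device, count in expired_devices(LOG).items():
    print(device, count)
```

A device that shows up far more often than the others is the natural candidate for a longer timeout.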

From this information you can determine exactly which critical device has
failed. Then check the timeout value for this critical device to determine
whether it is high enough.
In relatively high CPU usage situations, failover may occur because the critical
device does not get the CPU time required to check its state in with the kernel.
It is recommended to increase the timeout to 600 seconds if the machine is under
heavy load.
If the above does not improve the situation, use the following command to remove
fwd from the list of monitored devices completely:
ipso[admin]# cphaprob -d fwd unregister

Keep in mind that after this, failover will not occur if the fwd daemon crashes
during normal operation.
To raise a timeout value, use the following command:
ipso[admin]# cphaprob -d [device] -t [timeout] -s [state] -p register
Example:
ipso[admin]# cphaprob -d fwd -t 120 -s ok -p register
This command registers the fwd process with the state OK and a timeout value of
120 seconds.
(NOTE: This registration does not survive a reboot, so the command needs to be
added to the fwstart script or rc.local, with a 60-second sleep, to make it
persistent across reboots.)
show vrrp interfaces
Displays the detailed VRRP configuration, including priority, hello interval,
and VRID.
clish -c show interfacemonitor
Displays interface transitions.
cphaprob -i list
Displays Check Point critical processes and their timeouts.
To log critical process failures:
ipsctl -w net:log:partner:status:debug 1
This logs to the console and to /var/log/messages. To turn it off:
ipsctl -w net:log:sink:console 0
To change the timeout value of a monitored process:
cphaprob -d [device] -t [timeout] -s [state] -p register
