ian.bruce@ie.ibm.com
conormcnally@ie.ibm.com
#######################################
VCS findings
1. The issue started when the orb_IP resource failed outside VCS control. This is a critical resource, so SG "ORB_SG" started to fail over.
2. During the offline of the SG, an issue appeared with "orb_application".
3. The Application_A.log file contains messages that suggest problems with the application's monitor script.
4. Stopping orb_application failed because the mount point /opt/o3sis/implrep (resource orb_mnt) was busy.
5. The Unix admin manually unmounted /opt/o3sis/implrep and cleared the SG fault, but the SG was still showing STOPPING/PARTIAL.
6. We ran hastop -local -force and hastart to clear the SG state.
7. SG ORB_SG still reported that each resource needed to be probed. Even after probing, it would not start on node2 (still reported as not probed).
8. Finally, the SG was started on node1.
As per the logs, resource "orb_IP" went down outside VCS control, and VCS tried to offline and fail over the whole "ORB_SG". This was not possible because of the issue on "orb_application": its monitor script failed to complete within the expected time (60 sec).
blamsgappp02# hatype -display | grep Application | grep MonitorTimeout | grep -v Fault
Application MonitorTimeout 60
blamsgappp02#
Application_A.log shows that the "orb_application" monitor script failed to complete within the expected time (60 sec). That is the cause of the failure when VCS tried to fail over the whole SG: VCS could not determine the "orb_application" status, which is why "ORB_SG" remained in "STOPPING/PARTIAL" status. The "orb_mnt" and "orb_dg" resources had not been taken offline yet, which is why the filesystems were still mounted.
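One way to check whether a monitor script stays within the 60-second MonitorTimeout is to time it by hand outside the cluster. The following is a minimal Python sketch under stated assumptions: the function name time_monitor and the stand-in commands are hypothetical and are not part of VCS. It only measures how long a command takes and whether it finished before the limit.

```python
import subprocess
import sys
import time


def time_monitor(cmd, timeout_sec=60.0):
    """Run cmd and report whether it finished within timeout_sec.

    Returns (finished, exit_code, elapsed_seconds). exit_code is None
    when the command was killed because it exceeded the limit.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_sec,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True, result.returncode, time.monotonic() - start
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, None, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical stand-ins for a fast and a hanging monitor script.
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(time_monitor(quick, timeout_sec=10))
    print(time_monitor(slow, timeout_sec=0.5))
```

On a cluster node, the command list would be replaced by the real monitor script path and the limit by the configured MonitorTimeout value. Repeated runs that come close to the limit would suggest the script, not VCS, is the bottleneck.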
==============
2015/07/22 18:11:34 VCS ERROR V-16-2-13028 Thread(63) Resource(orb_application) - the last (60) invocations of the monitor procedure did not complete within the expected time.
2015/07/22 18:16:00 VCS ERROR V-16-2-13027 Thread(4) Resource(orb_application) monitor procedure did not complete within the expected time.
2015/07/22 18:26:19 VCS INFO V-16-2-13026 Thread(9) Resource(orb_application) monitor procedure finished successfully after failing to complete within the expected time for (6) consecutive times.
==================================
2015/07/22 16:15:35 VCS ERROR V-16-2-13064 (blamsgappp02) Agent is calling clean for resource(orb_application) because the resource is up even after offline completed.
2015/07/22 16:16:35 VCS ERROR V-16-2-13077 (blamsgappp02) Agent is unable to offline resource(orb_application). Administrative intervention may be required.
2015/07/22 16:20:07 VCS ERROR V-16-2-13027 (blamsgappp01) Resource(orb_application) - monitor procedure did not complete within the expected time.

If the monitor script does not complete within the expected time, VCS cannot determine whether a resource is up, down, or faulted. Because other resources depend on this application resource, the SG was not able to fail over. On node 2 it did not start properly because the resource state was still unclear at that time.
2015/07/22 16:15:35 VCS ERROR V-16-2-13027 Thread(4) Resource(orb_application) monitor procedure did not complete within the expected time.
2015/07/22 16:15:35 VCS ERROR V-16-2-13064 Thread(4) Agent is calling clean for resource(orb_application) because the resource is up even after offline completed.
2015/07/22 16:15:35 VCS ERROR V-16-2-13068 Thread(4) Resource(orb_application) clean completed successfully.
2015/07/22 16:16:34 VCS WARNING V-16-2-13139 Thread(2) Canceling thread (4)
2015/07/22 16:16:35 VCS ERROR V-16-2-13077 Thread(5) Agent is unable to offline resource(orb_application). Administrative intervention may be required.
2015/07/22 16:18:34 VCS WARNING V-16-2-13139 Thread(2) Canceling thread (5)
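
When reviewing a longer log, it can help to break each entry into its fields and count how often each message ID occurs. The following is a minimal Python sketch, assuming every entry is on one line and follows the "date time VCS severity message-ID text" layout seen in the excerpts here. The names parse_line and count_by_msg_id are illustrative, and the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the excerpts: "<date> <time> VCS <SEVERITY> <ID> <text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>[A-Z]+) (?P<msg_id>V-\d+-\d+-\d+) (?P<text>.*)$"
)
# Matches both "Resource(name)" and "resource(name)".
RESOURCE_RE = re.compile(r"[Rr]esource\((?P<name>[^)]+)\)")


def parse_line(line):
    """Split one log entry into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    resource = RESOURCE_RE.search(match.group("text"))
    return {
        "time": datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S"),
        "severity": match.group("severity"),
        "msg_id": match.group("msg_id"),
        "resource": resource.group("name") if resource else None,
        "text": match.group("text"),
    }


def count_by_msg_id(lines):
    """Count parsed entries per message ID, skipping lines that do not match."""
    entries = (parse_line(line) for line in lines)
    return Counter(entry["msg_id"] for entry in entries if entry is not None)


SAMPLE = [
    "2015/07/22 16:15:35 VCS ERROR V-16-2-13027 Thread(4) Resource(orb_application) monitor procedure did not complete within the expected time.",
    "2015/07/22 16:15:35 VCS ERROR V-16-2-13064 Thread(4) Agent is calling clean for resource(orb_application) because the resource is up even after offline completed.",
    "2015/07/22 16:15:35 VCS ERROR V-16-2-13068 Thread(4) Resource(orb_application) clean completed successfully.",
    "2015/07/22 16:16:34 VCS WARNING V-16-2-13139 Thread(2) Canceling thread (4)",
    "2015/07/22 16:16:35 VCS ERROR V-16-2-13077 Thread(5) Agent is unable to offline resource(orb_application). Administrative intervention may be required.",
    "2015/07/22 16:18:34 VCS WARNING V-16-2-13139 Thread(2) Canceling thread (5)",
]

if __name__ == "__main__":
    for msg_id, count in sorted(count_by_msg_id(SAMPLE).items()):
        print(msg_id, count)
```

Entries that wrap across several lines, as in the raw excerpts, would need to be joined before parsing.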

orb_application ArgListValues blamsgappp01
User o3sis
StartProgram 1 /opt/o3sis/tools/S98o3sisORBstart.sh
StopProgram /opt/o3sis/tools/S98o3sisORBstop.sh
CleanProgram ""
MonitorProgram 1 /opt/o3sis/tools/o3sis_checkorb.sh
PidFiles
MonitorProcesses "/opt/o3sis/bin/o3sis_ORBdaemon -ORBipaddress 10.137.200.200"
2036903