
Operations and Maintenance Manual

IntelliGate™ Alarm Lists

This material describing TOMIA is for general information purposes and may be modified by TOMIA at any time without notice. The information disclosed in
this document is disclosed on an “as is” basis, and TOMIA shall not be liable for the accuracy or completeness thereof. This material is the proprietary and/or
confidential information of TOMIA, and may not be disclosed, copied or transferred to any third party without TOMIA's prior express consent. All company
and brand products and service names are trademarks or registered trademarks of their respective holders.
© Copyright 2002-2020 TOMIA. All rights reserved.
84 532 1611-V710-100 GLR - IntelliGate™ Alarm Lists 5-Jan-2020
Proprietary and Confidential

Important Notice
This document is delivered subject to the following conditions and restrictions:
* This document contains proprietary and/or confidential information belonging to any of Telarix Inc.,
Starhome Ltd., Starhome Mach GMBH, or Starhome Mach Sarl (together “TOMIA”).
* Any unauthorized reproduction (electronic or mechanical), use, or disclosure of this material, or any part
thereof, is strictly prohibited. The contents of this document or any part thereof may be used solely for
the purpose for which they are provided.
* This document is intended solely for the use of entities expressly authorized by TOMIA.
* Material describing TOMIA is for general information purposes and may be modified by TOMIA at any
time without notice.
* The text and graphics are for the purpose of illustration and reference only. The specifications on which
they are based are subject to change without notice.
* Corporate and individual names and data used in examples herein are fictitious unless otherwise noted.
* The information disclosed in this document is disclosed on an “as is” basis, and TOMIA shall not be liable
for the accuracy or completeness thereof.
* The system and the software described in this guide are furnished under a license and may be used or
copied only in accordance with the terms of this license.
* Copyright © 2002-2020 TOMIA. All rights reserved.
All company and brand products and service names are trademarks or registered trademarks of their
respective holders.

Document name
O-M-IntelliGate Alarm Lists-with GLR-710-100.docx

Operations and Maintenance Manual Page 2



Table of Contents
1. Introduction
  1.1 Alarm Monitoring Agents
  1.2 List of System Units Referred to in this Document
2. Platform Agents and Alarms Overview
  2.1 Alarm Management
  2.2 Alarm Severities
3. System Fault Analysis
  3.1 Signaling Gateway Unit (SGU-C type) Faults
  3.2 Signaling Gateway Unit (SGU type) Faults
    3.2.1 MPU (Probe) Failure
  3.3 Signaling Gateway Unit (SGU-U type) Faults
  3.4 Application Unit (APU) and Database Server Faults
    3.4.1 APU Interface Components per Service
    3.4.2 System Behavior
    3.4.3 Application Unit and Database Server Faults (SGU/SGU-C)
  3.5 Provisioning Server Unit (PSU) Faults
  3.6 System LAN Failure (Redundant Configuration)
  3.7 System Power Failures (HP Servers)
  3.8 Generic Network Alarms
4. HP-based Signaling Gateway (SGU type and SGU-C type) Alarms
  4.1 CCS Platform Alarms
  4.2 CCS Call Control Infrastructure Alarms
  4.3 CCS MAP Alarms
    4.3.1 CCS MAP Operating System Alarms
    4.3.2 CCS MAP Application Alarms
  4.4 CCS ISUP Call Control Alarms
    4.4.1 CCS ISUP Call Control Operating System Alarms
    4.4.2 CCS ISUP Call Control Application Alarms
  4.5 CCS INAP CAMEL Alarms
    4.5.1 CCS INAP CAMEL Operating System Alarms
    4.5.2 CCS INAP CAMEL Application Alarms
  4.6 USSD CNF Alarms
    4.6.1 USSD CNF Application Alarms
    4.6.2 USSD CNF OS Alarms
  4.7 TCABRT CNF Alarms
  4.8 Preparing CCS Alarms Logs for Sending to GSOC
5. Ulticom Infrastructure Alarms
6. HP Servers
  6.1 Introduction
  6.2 HP Server Alarms – Generic
  6.3 HP Server Alarms – Standard X733
7. HP Open View Alarms
8. Remote Power Distribution Unit (RMPDU) Agents
  8.1 Sentry AC-RMPDU Alarms
  8.2 CyberPower RMPDU Alarms
  8.3 DC-RMPDU Alarms (Sentry 4820-XL-8)
  8.4 X733-Compliant RMPDU Alarms
9. Cisco LAN Switch Unit (LSU) and Router Agent
  9.1 Cisco System Alarms – Generic
  9.2 Cisco System Alarms X733-Compliant
10. 3-COM LAN Switch Unit (LSU) Alarms
11. TOMIA Platform Agent Application Related Alarms
  11.1 Generating Alarms
  11.2 Application Server Hardware Capacity Alarms
  11.3 Generic Platform Agent Alarms
  11.4 Tomcat Alarms
  11.5 Reporting Tool Offline Database Alarms
    11.5.1 Alarm Classification
  11.6 HP Server Serviceguard (Linux Cluster) Alarms
  11.7 HP Server Veritas (Linux Cluster) Alarms
  11.8 General Backup Alarms
    11.8.1 HP Data Protector
    11.8.2 Oracle
  11.9 TimesTen Database Alarms
  11.10 GTP Gateway Alarms
12. Application Agent Alarms
  12.1 Generating Alarms
  12.2 Application Module Alarm Tables
    12.2.1 Monitor Alarms
    12.2.2 GLR Alarms
    12.2.3 SIP Server Alarms
    12.2.4 Billing Alarms
    12.2.5 Capture Module Alarms
    12.2.6 CDI Module Alarms
    12.2.7 CDR Module Alarms
    12.2.8 Diameter Alarms
    12.2.9 Performance Alarms
    12.2.10 Gateway Alarms
    12.2.11 Load Balancer Alarms
    12.2.12 MAP Interface Alarms
    12.2.13 MAP Probe Interface (MPI) Alarms
    12.2.14 Network Trigger Originator (NETO) Alarms
    12.2.15 Notification Alarms
    12.2.16 Service Broker (Proxy) Alarms
    12.2.17 TOMIA IN Adapter Alarms
    12.2.18 UIR Alarms
    12.2.19 Service Module Infrastructure Alarms
    12.2.20 GTP Agent Alarms
13. Network Interface Module System Log Files
    13.1.1 FEP and MAU Error Log Structure
14. Service Module System Log Files
  14.1 Gateway Location Register Service Logging Environment
    14.1.1 Mechanism for Maintenance of Error and Alarm Log Files
    14.1.2 Log File Configuration
    14.1.3 Alarm Log (alarmer.txt) File
    14.1.4 Wrapper Log (wrapper.log) File
    14.1.5 Error Log (error_log.txt) File
    14.1.6 Redundancy Logs
    14.1.7 stack_log.txt


1. Introduction
This Operations and Maintenance Manual describes the IntelliGate monitoring agent architecture and contains the alarm lists for the application and platform alarms. It also describes the logging procedure for the service module.
IntelliGate monitoring and fault analysis are based on the SNMP traps sent by third-party equipment and by TOMIA servers.
An alarm indicates a fault or a problem that prevents an expected result. Fault management consists of detecting, isolating, and correcting the problem.
This document describes alarms generated by the following sources:
* Hardware (Equipment Alarms): Faults detected in the hardware or operating system, such as CPU, disk space, and memory utilization
* Communication faults due to the malfunction of:
▪ IntelliGate external interfaces (E1, SS7, Ethernet, WAN, SMPP, FTP)
▪ IntelliGate internal communication (UCMP, Oracle SQLNET, FTP)
* Environmental Alarms: Faults that occur due to a change in environmental parameters that might affect system behavior (for example, temperature)
* Processing Errors: Failures during input, output, or write processing
* Database Problems: Failures due to problems such as insufficient space
* Application Process Errors: Faults caused by an application bug or a system configuration problem
* Configuration Errors
Each alarm comprises two notifications: the first is transmitted when the fault is detected (up), and the second when the fault is cleared (down).

Note: In the case of a system restart, all alarms are canceled.
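The up/down pairing and the restart rule can be pictured as a table of active alarms. The following Python sketch is illustrative only: the `Notification` and `ActiveAlarmTable` names and the string alarm identifiers are hypothetical and do not correspond to any TOMIA or OSS interface.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Notification:
    alarm_id: str  # hypothetical identifier, e.g. "SGU-C:link-1"
    state: str     # "up" when the fault is detected, "down" when it is cleared


@dataclass
class ActiveAlarmTable:
    """Keeps each raised (up) alarm until its matching clear (down) arrives."""
    active: Set[str] = field(default_factory=set)

    def handle(self, n: Notification) -> None:
        if n.state == "up":
            self.active.add(n.alarm_id)
        elif n.state == "down":
            # A clear for an alarm that is not active is silently ignored.
            self.active.discard(n.alarm_id)
        else:
            raise ValueError(f"unknown notification state: {n.state!r}")

    def on_system_restart(self) -> None:
        # A system restart cancels all alarms.
        self.active.clear()


table = ActiveAlarmTable()
table.handle(Notification("SGU-C:link-1", "up"))
table.handle(Notification("PSU:disk", "up"))
table.handle(Notification("SGU-C:link-1", "down"))
print(sorted(table.active))  # ['PSU:disk']
```

Keying the table by alarm identifier means a repeated "up" for an already active alarm does not create a duplicate entry.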


1.1 Alarm Monitoring Agents


The TOMIA IntelliGate Service Mobility Platform (SMP) incorporates many monitoring agents, two of which are provided by TOMIA:
* The TOMIA platform agent: Handles alarms from the hardware elements that make up the IntelliGate platform (refer to section 2, Platform Agents and Alarms Overview).
* The TOMIA application agent: Handles alarms at the application level (refer to the Application Agent Alarms section).

1.2 List of System Units Referred to in this Document


System Unit | Full Name | Functionality
APU | Application Unit | Application Server
APU-M | Application Unit – MGU type | Media Gateway server
APU-D | Application Unit – Database type | Database server
APU-T | Application Unit – Telephony server type | Telephony server
SGU | Signaling Gateway Unit – CCS type | Signaling Gateway
SGU-C | Signaling Gateway Unit – Comverse type | Signaling Relay
SGU-U | Signaling Gateway Unit – Ulticom type | Signaling Relay
PSU | Provisioning Server Unit | Provisioning Server
LSU | LAN Switch Unit | Network LAN Switch
Router | Router | Network Routing Module
RMPDU | Remote Managed Power Distribution Unit | Power Distribution Unit
MPU | Mobility Probe Unit | Mobility Probe


2. Platform Agents and Alarms Overview


The IntelliGate product consists of several of the following servers and devices:
* APU (Application Unit)
* PSU (Provisioning Server Unit)
* SGU/SGU-C/SGU-U/SGU/S (Signaling Gateway Unit)
* RMPDU (Remote Managed Power Distribution Unit)
* Router
* LSU (LAN Switch Unit)
Each of these devices is equipped with at least one monitoring agent that enables system monitoring.

2.1 Alarm Management


The system raises the alarms, transmits them to the customer’s OSS (Operational Support System), and
displays them on the OSS monitor.
This document describes the alarms raised by the IntelliGate and provides the information needed to analyze and handle their causes.
Various monitoring agents generate the alarms. Table 1 lists the agents for each server/device:

Table 1: Agents on Servers/Devices

Server or Device | Monitoring Agents | Comments
HP-based APU | HP Insight Manager Agent for Servers; TOMIA Application Agent; TOMIA Platform Agent |
HP-based SGU | HP Insight Manager Agent for Servers; HP-based SGU (CCS) Monitoring Agent |
HP-based SGU-U (Ulticom) | HP Insight Manager Agent for Servers; Ulticom Agent; TimesTen Agent; FEP Application |
HP-based PSU | HP Insight Manager Agent for Servers; TOMIA Platform Agent |
HP-based APU-D | HP Insight Manager Agent for Servers |
Cisco Router | LAN Switch Unit and Router Agent |
RMPDU AC | AC RMPDU Agent | Note: For systems using a DC RMPDU instead of AC, a DC RMPDU Agent is installed.


2.2 Alarm Severities


There are five alarm levels, ranging in severity from Normal (for example, the heartbeat message used to declare that the system is operating normally), through Warning, up to Critical. The perceived severity of each level is as follows:
* Normal - situation is normal; no action is required
* Warning - unusual situation; no immediate action needs to be taken
* Minor alarms - indicate a fault or problem that does not require immediate handling
* Major alarms - indicate a problem with a process that has to be handled as soon as possible, though not necessarily immediately
* Critical alarms - indicate a severe problem that must be handled immediately (for example, critical downtime, in which a process fails and stays down). A critical alarm must be fixed as soon as possible because it interferes with the delivery of TOMIA services.
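Because the five levels are strictly ordered, an operator tool can compare them directly. This Python sketch assumes integer values chosen only to encode that ordering (they are not the ITU-T X.733 perceived-severity codes), and the helper names are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """The five alarm levels, ordered from least to most severe.

    The integer values only encode ordering; they are not X.733 codes.
    """
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def needs_immediate_handling(sev: Severity) -> bool:
    # Major alarms are handled as soon as possible, but only critical
    # alarms require immediate handling.
    return sev is Severity.CRITICAL


def most_severe(alarms: Iterable[Severity]) -> Severity:
    """Return the highest severity present, or NORMAL when there are none."""
    return max(alarms, default=Severity.NORMAL)


print(most_severe([Severity.MINOR, Severity.MAJOR, Severity.WARNING]).name)  # MAJOR
```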


3. System Fault Analysis


This section describes how critical faults in the following IntelliGate components affect network behavior:
* Signaling Gateway Unit (SGU-C type) Faults
* Signaling Gateway Unit (SGU type) Faults
* Signaling Gateway Unit (SGU-U type) Faults
* Application Unit (APU) and Database Server Faults
* Provisioning Server Unit (PSU) Faults
* System LAN Failure (Redundant Configuration)
* System Power Failures (HP Servers)
* Generic Network Alarms

3.1 Signaling Gateway Unit (SGU-C type) Faults


Table 2 describes SS7 network interface faults, for a system with SGU-C.

Table 2: SS7 Network Interface Faults (for System with SGU-C)

Fault | Processing and Notification | SS7 Network Behavior
MTP L1-3 link failure | SGU-C sends an SNMP trap reporting the problem. | Links/linksets are dropped. TSC activates an alternate route.
MTP DPC not available | SGU-C sends an SNMP trap reporting the problem. | TSC is updated that the DPC is not available.
SCCP GTT: IMSI or MSIN does not match the configuration | SGU-C sends an SNMP trap reporting the problem. | The SGU-C uses the default GTT rule. The GTT rule enables SCCP parameter translation (i.e., a specific TT value can be used in this case). The message is recovered. The rest of the incoming messages are not locked.
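The default-rule fallback in the last row can be sketched as a rule lookup. This is a minimal Python sketch under assumptions not stated in this manual: rules are keyed by IMSI prefix, the longest matching prefix wins, and `select_tt` and `send_trap` are hypothetical names. The actual SGU-C GTT configuration format is not described here.

```python
from typing import Callable, Dict, Optional, Tuple


def select_tt(imsi: str, rules: Dict[str, int], default_tt: int,
              send_trap: Callable[[str], None]) -> Tuple[int, bool]:
    """Return (translation type, matched) for an IMSI.

    Hypothetical rule format: IMSI prefix -> TT value. The longest matching
    prefix wins; if nothing matches, a trap is reported and the default
    rule's TT is used, so the message is still processed.
    """
    best: Optional[str] = None
    for prefix in rules:
        if imsi.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is not None:
        return rules[best], True
    send_trap(f"SCCP GTT: no rule matches IMSI {imsi}; default rule applied")
    return default_tt, False


traps = []
print(select_tt("425011234567890", {"42501": 10, "4250": 11}, 0, traps.append))  # (10, True)
```

Returning a value instead of rejecting the message mirrors the table: the unmatched message is recovered and later messages keep flowing.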


Table 3 describes the SGU-C-type faults.

Table 3: Signaling Gateway Unit (SGU-C) Faults

Fault | Processing and Notification | SS7 Network Behavior
SGU-C stops running due to a critical hardware or OS fault | APU detects a failure and sends an SNMP trap notifying that there is no connection to the SGU-C. | SS7 links are down. TSC activates an alternate route.
SGU-C application terminated abnormally | APU detects a failure and sends an SNMP trap notifying that there is no connection to the SGU-C. | SS7 links are down. TSC activates an alternate route.
Normal SGU-C shutdown | APU detects a failure and sends an SNMP trap notifying that there is no connection to the SGU-C. | SS7 links are down. TSC activates an alternate route.

3.2 Signaling Gateway Unit (SGU type) Faults


Table 4 describes the SGU-type faults.

Table 4: Signaling Gateway Unit (SGU) Faults

No. | Fault | Processing and Notification | SS7 Network Behavior
1 | SGU (non-redundant configuration) stops running due to a critical hardware or OS fault, or the SGM process terminated abnormally. | Signaling link terminated (goes down). Application server sends an SNMP trap. | SS7 links are down. TSC (switch) stops routing messages toward the SGU.
2 | Active SGU (redundant configuration) stops running due to a critical hardware or OS fault, or the SGM process terminated abnormally. | Signaling link of the active SGU goes down. Standby SGU detects the active SGU failure and performs switchover. APU sends an SNMP trap. APU performs switchover to the new active SGU. | Some SS7 links are down. TSC continues to work with the SGU using the unaffected SS7 links (load-share mechanism).
3 | Active SGU and standby SGU (redundant configuration) stop running due to a critical hardware or OS fault, or the SGU process terminated abnormally. | Signaling link terminated (goes down). APU sends an SNMP trap. | SS7 links are down. TSC (switch) stops routing messages to the SGU.
4 | APU goes down. | SGU sends SSP (Subsystem Prohibited) to the switch to indicate that its application has failed (SSN=x). | Upon receiving the SSP, the TSC (switch) stops routing messages to the SGU.
5 | APU re-initializes. | SGU sends SSA (Subsystem Allowed) to the switch to indicate that its application is restored (SSN=x). | Upon receiving the SSA, the TSC (switch) resumes routing messages to the SGU.
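Rows 4 and 5 describe a two-state behavior for the service SSN. The following Python sketch models it; the class name is hypothetical, SSN 6 in the usage line is an arbitrary example for "SSN=x", and suppressing a duplicate SSP when the APU is already reported down is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubsystemState:
    """SSN state that the SGU advertises to the TSC (switch)."""
    ssn: int                     # the SSN configured for the service ("SSN=x")
    allowed: bool = True
    sent: List[str] = field(default_factory=list)  # messages sent to the TSC

    def on_apu_down(self) -> None:
        if self.allowed:           # send SSP only on a state change
            self.allowed = False
            self.sent.append(f"SSP ssn={self.ssn}")  # TSC stops routing to the SGU

    def on_apu_reinitialized(self) -> None:
        if not self.allowed:
            self.allowed = True
            self.sent.append(f"SSA ssn={self.ssn}")  # TSC resumes routing


state = SubsystemState(ssn=6)
state.on_apu_down()
state.on_apu_down()            # repeated failure report: no second SSP
state.on_apu_reinitialized()
print(state.sent)              # ['SSP ssn=6', 'SSA ssn=6']
```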


3.2.1 MPU (Probe) Failure


Table 5 describes the Mobility Probe Unit faults.

Table 5: MPU (Probe) Failure (for System with SGU)

Fault | Processing and Notification | SS7 Network Behavior
MPU stops functioning | The probing cluster group fails over from the active APU to the standby APU. Other applications do not fail over. The DB does not fail over. An alarm is sent. | MAP traffic is not monitored during a failover.


3.3 Signaling Gateway Unit (SGU-U type) Faults


Table 6 describes the SGU-U server faults.

Table 6: Signaling Gateway Unit (SGU-U) Server Faults

No. | Fault | Processing and Notification | SS7 Network Behavior
1 | IRM application failed in the active server. | SNMP trap is sent. IRM application is restarted in the standby server. | The service is not affected.
2 | IRM application failed in both servers. | SNMP trap is sent. SSP (Subsystem Prohibited) on SSN 6 (HLR) is sent to the concerned point. | No messages destined for the HLR should be sent to the IRM. If such a message is received anyway, it is relayed to the SS7 network. Messages to SSN 7, 8, 149 (VLR, MSC, SGSN) are relayed. If an inbound roamer included in the subscriber DB has performed a second UL in the VPMN, MT calls and MT_SMS to that inbound roamer will not be established until the next update location procedure.
3 | FEP module fails in one server. | SNMP trap is sent. All traffic is internally routed to the second FEP module in the other server. | The service is not affected.
4 | Both FEP modules are down. | SNMP trap is sent. SSP (Subsystem Prohibited) messages on SSN 6, 7, 8, 145, 149 (HLR, VLR, MSC, GMLC, SGSN) are sent to the configured point. | No MAP message is routed to the IRM platform. MT calls and MT_SMS to inbound roamers included in the subscriber DB will not be established until the next update location procedure.
5 | Active SGU-U server stops running due to a critical hardware or OS fault. | SNMP trap is sent. Standby SGU-U server becomes active. | SS7 links connected to the failed server are down. The application works at 50% link capacity. The system is in a non-redundant state until the server is restored.
6 | Active SGU-U server goes down. | Signalware cluster manager sends an SNMP trap. Standby SGU-U server becomes active. | SS7 links connected to the failed server are down. The application works at 50% link capacity. The system is in a non-redundant state until the server is restored.
7 | Both IRM servers fail. | | SS7 links are deactivated. SS7 traffic is rerouted. MT calls and MT_SMS to inbound roamers included in the subscriber DB will not be established until the next update location procedure.
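Rows 3 and 4 of Table 6 amount to a routing decision between the two FEP modules. The Python sketch below is a hypothetical model, not the Ulticom or FEP implementation: server names are placeholders and the function only reports which SSNs would be declared prohibited; the SSN list itself is taken from row 4.

```python
from typing import Dict, List, Optional, Tuple

# SSNs from row 4 of Table 6: HLR, VLR, MSC, GMLC, SGSN.
SSNS_PROHIBITED_WITHOUT_FEP = [6, 7, 8, 145, 149]


def route_map_message(fep_up: Dict[str, bool],
                      local: str) -> Tuple[Optional[str], List[int]]:
    """Pick the server whose FEP handles a MAP message.

    Returns (server, prohibited_ssns). If the local FEP is down, traffic is
    routed internally to a surviving FEP; if none is up, no server is chosen
    and the listed SSNs are to be declared prohibited (SSP).
    """
    if fep_up.get(local, False):
        return local, []
    for server in sorted(fep_up):
        if fep_up[server]:
            return server, []
    return None, list(SSNS_PROHIBITED_WITHOUT_FEP)


print(route_map_message({"sgu-u-1": False, "sgu-u-2": True}, "sgu-u-1"))  # ('sgu-u-2', [])
```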


3.4 Application Unit (APU) and Database Server Faults

3.4.1 APU Interface Components per Service


Table 7 lists the network access components that connect to the APU per service.

Table 7: Access Components Connected to APU per Service

Component/Service | GLR | IPN | ICA | OVMD | Sparx | Roaming Control | VHE | RAF
SGU-C | - | √ | √ | √ | - | - | √ | √
SGU-U | √ | - | - | - | - | - | - | -
SGU | - | - | √ | √ | √ | - | √ | √
Probe | - | - | √ (opt) | √ (opt) | √ | √ | √ (opt) | √ (opt)
SMSC | - | - | √ (opt) | - | √ | √ | √ (opt) | √ (opt)


3.4.2 System Behavior


The IntelliGate can have a single, non-redundant application server.
For redundant systems, the IntelliGate may have two APUs. In this case, the APUs implement an active/hot-standby redundancy architecture based on HP Serviceguard or Veritas cluster software.
Database servers implement an active/hot-standby redundancy architecture based on the Oracle cluster.
An SGU/SGU-C/SGU-U is configured to do one of the following:
* Send SSP (Subsystem Prohibited) to stop traffic, followed by SSA (Subsystem Allowed) on completion of failover to the redundant server
* Keep the SSN always on, with one of the following options:
▪ Relay: Update Location or IDP to external SCP
▪ Default Call Handling (DCH): IDP for non-CAMEL subscriber, with one of the following behaviors:
• Continue
• Release
• Abort
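The configuration choices above form a small tree: a failover mode, an always-on option, and a DCH behavior. This Python sketch encodes that tree and rejects combinations the list does not allow. All class, member, and function names are hypothetical; this is not the SGU configuration syntax.

```python
from enum import Enum
from typing import Optional


class FailoverMode(Enum):
    SSP_THEN_SSA = "ssp_ssa"      # stop traffic with SSP, send SSA after failover
    SSN_ALWAYS_ON = "always_on"   # keep the SSN allowed throughout


class AlwaysOnOption(Enum):
    RELAY = "relay"   # Update Location or IDP to external SCP
    DCH = "dch"       # Default Call Handling for non-CAMEL IDP


class DchBehavior(Enum):
    CONTINUE = "continue"
    RELEASE = "release"
    ABORT = "abort"


def config_error(mode: FailoverMode,
                 option: Optional[AlwaysOnOption] = None,
                 dch: Optional[DchBehavior] = None) -> Optional[str]:
    """Return None for a valid combination, else a description of the problem."""
    if mode is FailoverMode.SSP_THEN_SSA:
        if option is not None or dch is not None:
            return "SSP/SSA mode takes no SSN-always-on option"
        return None
    if option is None:
        return "SSN-always-on mode requires Relay or DCH"
    if option is AlwaysOnOption.DCH and dch is None:
        return "DCH requires Continue, Release, or Abort"
    if option is AlwaysOnOption.RELAY and dch is not None:
        return "a DCH behavior applies only to the DCH option"
    return None
```

For example, `config_error(FailoverMode.SSN_ALWAYS_ON, AlwaysOnOption.DCH, DchBehavior.RELEASE)` returns `None`, while omitting the DCH behavior returns an error string.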

3.4.3 Application Unit and Database Server Faults (SGU/SGU-C)


Note: For SGU-U, an APU/Database Server fault does not affect service, but the operator cannot access the provisioning and reporting tools.
Table 8 describes APU and Database Server (Managed Roaming) faults, system processing, notification, and the SS7 network impact.

Note: Specific application fault behavior depends on the redundancy configuration.


Table 8: Application Unit and Database Server Faults

Fault | Processing and Notification | Network Behavior
The active APU/Database Server stops running due to critical hardware or OS faults. | The cluster detects the fault and activates the standby APU/Database Server. The activated APU/Database Server sends an SNMP trap notifying that it is activated. |
The application terminated abnormally. | The application watchdog tries to activate the application. If it fails, the cluster detects the fault and activates the standby APU/Database Server, and the activated APU/Database Server sends an SNMP trap notifying that it is activated. | Relay or send SSP/SSA (depending on the configured service behavior). Refer to Section 3.4.2.
Both the active and the hot-standby APUs/Database Servers are unavailable. | | Service is down.
The active APU cannot communicate with any SGU/SGU-C. | A failover occurs, activating the standby APU. |
The active Application Server restarts X times. | A failover occurs, activating the standby APU. |
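The watchdog-then-failover sequence in Table 8 can be sketched as a bounded retry loop. In this Python sketch, `supervise`, `start_app`, and `fail_over` are hypothetical names, and `max_restarts` stands in for the unspecified "X"; the real watchdog and cluster interfaces are not described in this manual.

```python
from typing import Callable


def supervise(start_app: Callable[[], bool],
              max_restarts: int,
              fail_over: Callable[[], None]) -> str:
    """Watchdog sketch: retry the application, then hand over to the cluster.

    max_restarts stands in for the unspecified "X" in Table 8.
    """
    for attempt in range(1, max_restarts + 1):
        if start_app():
            return f"running after {attempt} attempt(s)"
    fail_over()  # the cluster activates the standby APU
    return "failed over"
```

Keeping the failover as a callback separates the local retry policy from the cluster software (HP Serviceguard or Veritas) that actually moves the service.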


3.5 Provisioning Server Unit (PSU) Faults


Table 9 describes the Provisioning Server Unit (PSU) faults, system processing, notification, and SS7 network
impact.

Table 9: Provisioning Server Unit (PSU) Faults

Fault | Processing and Notification | SS7 Network Behavior
PSU stops running due to critical hardware or OS faults. | The operator cannot access the Web provisioning tool. | The service is not affected.
The provisioning application terminates abnormally. | The application watchdog tries to activate the application. If it fails, the operator cannot access the Web provisioning tool. | The service is not affected.

3.6 System LAN Failure (Redundant Configuration)


Optionally, the system LAN is redundant, based on two LAN Switch Units (LSUs). The switches are connected by two physical Fast Ethernet connections (hot standby).
Each server is connected to both switches in a teamed LAN configuration.
Table 10 describes the LAN faults, system processing, notification, and SS7 network impact when a fault
occurs in a redundant system.

Table 10: System LAN Failure

Fault: Disconnection of a server from the LSU
Processing and Notification:
* The LAN mechanism detects the fault and routes the traffic to the redundant LAN connection.
* The LSU generates an SNMP alarm.
SS7 Network Behavior: The service is not affected.

Fault: Disconnection of the cable connecting the two LSUs
Processing and Notification:
* The LSUs detect the problem and route the traffic to the standby port (cable).
* Both LAN switches generate an SNMP alarm.
SS7 Network Behavior: The service is not affected.


Fault: LSU stops running due to a critical hardware/software problem
Processing and Notification:
* The LAN mechanism detects the fault and routes the traffic to the redundant LAN connection.
* The other switches route the information through the alternative route in the ring.
* Two LSUs generate an SNMP alarm.
SS7 Network Behavior: The service is not affected.

Fault: APU cannot access the SGU/SGU-C
Processing and Notification:
* The APU detects the failure and sends an SNMP trap notifying that there is no connection to the SGU.
* The application tries to restore the connection.
SS7 Network Behavior: The SGU-C/SGU relays the message on timeout (configurable) until the connection is restored.

3.7 System Power Failures (HP Servers)


When the system is fed by a single power source, each server has only one power supply; Table 11 below describes the resulting failure behavior. A redundant system continues to provide uninterrupted service when one power source is down.

Table 11: System Power Failure

Fault: A single server loses its active power source
Processing and Notification: The system is down; no alarms are sent.
SS7 Network Behavior: SS7 links are down; the TSC activates an alternate route.


3.8 Generic Network Alarms


Table 12 lists and describes the generic network alarms.

Table 12: Generic Network Alarms

OID: .1.3.6.1.6.3.1.1.5.1 | Trap Name: coldStart | Severity: Warning
Description: A coldStart trap signifies that the SNMPv1 entity, acting in an agent role, is reinitializing itself and that its configuration may have been altered.

OID: .1.3.6.1.6.3.1.1.5.2 | Trap Name: warmStart | Severity: Warning
Description: A warmStart trap signifies that the SNMPv1 entity, acting in an agent role, is reinitializing itself such that its configuration is unaltered.

OID: .1.3.6.1.6.3.1.1.5.3 | Trap Name: linkDown | Severity: Critical
Description: A linkDown trap signifies that the SNMP entity, acting in an agent role, has detected that the ifOperStatus object for one of its communication links is about to enter the down state from some other state (but not from the notPresent state). This other state is indicated by the included value of ifOperStatus.

OID: .1.3.6.1.6.3.1.1.5.4 | Trap Name: linkUp | Severity: Clear / Normal
Description: A linkUp trap signifies that the SNMP entity, acting in an agent role, has detected that the ifOperStatus object for one of its communication links left the down state and transitioned into some other state (but not into the notPresent state). This other state is indicated by the included value of ifOperStatus.

OID: .1.3.6.1.6.3.1.1.5.5 | Trap Name: authenticationFailure | Severity: Warning
Description: An authenticationFailure trap signifies that the SNMPv1 entity, acting in an agent role, has received a protocol message that is not properly authenticated. While all implementations of SNMPv1 must be capable of generating this trap, the snmpEnableAuthenTraps object indicates whether this trap will be generated.
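An OSS-side trap receiver might classify these generic traps by OID. These are the standard notifications under snmpTraps (1.3.6.1.6.3.1.1.5). The Python sketch below simply encodes Table 12 as a dictionary. The function name and the "unknown" fallback are illustrative assumptions.

```python
from typing import Dict, Tuple

# Generic network traps from Table 12: OID -> (trap name, severity).
GENERIC_TRAPS: Dict[str, Tuple[str, str]] = {
    "1.3.6.1.6.3.1.1.5.1": ("coldStart", "Warning"),
    "1.3.6.1.6.3.1.1.5.2": ("warmStart", "Warning"),
    "1.3.6.1.6.3.1.1.5.3": ("linkDown", "Critical"),
    "1.3.6.1.6.3.1.1.5.4": ("linkUp", "Clear / Normal"),
    "1.3.6.1.6.3.1.1.5.5": ("authenticationFailure", "Warning"),
}


def classify_trap(oid: str) -> Tuple[str, str]:
    """Look up a trap OID, tolerating the leading dot used in the table."""
    key = oid.strip().lstrip(".")
    return GENERIC_TRAPS.get(key, ("unknown", "Unknown"))
```

For example, `classify_trap(".1.3.6.1.6.3.1.1.5.3")` returns `("linkDown", "Critical")`.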


4. HP-based Signaling Gateway (SGU type and SGU-C type) Alarms


The HP-based versions of the Signaling Gateway (SGU) and Signaling Relay (SGU-C) are based on the Comverse
Call Control Server (CCS), which consists of three logical layers:
* Computing - provides the Linux operating system functionality
* Communication - provides the OMNI software infrastructure, which manages SS7 signaling
* Application - provides the CCS application software, which handles operational integration with
other Comverse components such as the Multimedia Unit (MMU)
The tables in the following sections document the alarms that meet the operator's requirements for
interfacing with Operations Support Systems (OSS):
* CCS Platform Alarms
* CCS Call Control Infrastructure Alarms
* CCS MAP Alarms
* CCS ISUP Call Control Alarms
* CCS INAP CAMEL Alarms
* USSD CNF Alarms
* TCABRT CNF Alarms

Note: Alarm 1469110 (Table 14) informs that the CCS is down. When CCS activity is restored (CCS up), the alarm clears in the CCS, but no SNMP trap is sent.


4.1 CCS Platform Alarms


Table 13 lists the CCS Platform Alarms and includes alarms for the following:
* CC.4.4.02
Table 13: Platform Alarms

Dec ID: 4230001 | Hex ID: 408B71 | Severity: Major
Alarm Text: NTP daemon is not running.
Threshold: N/A
Possible Causes: This alarm is generated when the NTP daemon is not started.
Impact: The NTP service will not run, and time drift between machine clocks might occur.
Action: Start the NTP service by doing one of the following:
* For Linux, type: service ntpd start
* For Solaris, type: svcadm enable ntp

Dec ID: 4230002 | Hex ID: 408B72 | Severity: Minor
Alarm Text: ntpd configuration file does not contain the synchronized server, or does not contain any servers at all.
Threshold: N/A
Possible Causes: The alarm is generated when one of the following occurs:
* NTPD is synchronized to a server that does not appear in its configuration file.
* The NTPD configuration file does not contain any servers at all.
Impact: The NTP service will not work after the next reboot.
Action: Add the correct server entries to the ntp.conf file. Then restart the NTP service.


Dec ID: 4230003 | Hex ID: 408B73 | Severity: Major
Alarm Text: NTPD is running, but not synchronized with server.
Threshold: N/A
Possible Causes: The alarm is generated when one of the following occurs:
* NTP configuration error.
* Network problem.
Impact: The NTP service will not work properly, and time drift between servers might occur.
Action:
1. Ensure that the servers have connectivity (via ping), and that no firewalls are blocking the NTP traffic.
2. Ensure that the configuration is correct.
3. Ensure that the hostname and IP addresses in the /etc/hosts and /etc/ntp.conf files are correct.
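The three alarms above distinguish between three states: the daemon is not running, the configuration does not match the synchronized server, and the daemon is running but unsynchronized. The Python sketch below applies the same distinction to captured text. It reads `server` lines from ntp.conf and finds the peer that `ntpq -p` marks with the `*` tally code (the system peer). The function names are illustrative. Other directives such as `pool` are ignored. Names are compared literally, and real `ntpq -p` output can truncate or resolve hostnames, so treat this as an assumption-laden helper rather than the CCS's own check.

```python
from typing import List, Optional


def configured_servers(ntp_conf_text: str) -> List[str]:
    """Return the addresses from 'server' lines in ntp.conf text (comments ignored)."""
    servers = []
    for line in ntp_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "server":
            servers.append(fields[1])
    return servers


def system_peer(ntpq_output: str) -> Optional[str]:
    """Return the remote marked with the '*' tally code in `ntpq -p` output, if any."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            parts = line[1:].split()
            if parts:
                return parts[0]
    return None


def ntp_alarm(daemon_running: bool, ntp_conf_text: str, ntpq_output: str) -> Optional[int]:
    """Map the observed NTP state to the Table 13 alarm it resembles (None = healthy)."""
    if not daemon_running:
        return 4230001  # NTP daemon is not running
    servers = configured_servers(ntp_conf_text)
    peer = system_peer(ntpq_output)
    if peer is None:
        # No sync source: unsynchronized if servers exist, otherwise no servers at all.
        return 4230003 if servers else 4230002
    if peer not in servers:
        return 4230002  # synchronized to a server missing from ntp.conf
    return None
```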


4.2 CCS Call Control Infrastructure Alarms


Table 14 lists the CCS Call Control Infrastructure Alarms and includes alarms for the following:
* CC.7.5.1.1
* CC.8.0.0

Note: In the table, alarm IDs are listed with the relevant CC.
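Each alarm row gives its ID in decimal and in hexadecimal. In this manual the Hex ID appears to be the hexadecimal form of the Dec ID (for example, 4230001 = 0x408B71). The Python sketch below converts between the two forms. This helps when a trap or log reports one form and you need to find the row. The function names are illustrative.

```python
def dec_to_hex_id(dec_id: int) -> str:
    """Format a decimal alarm ID as the Hex ID column shows it (uppercase, no 0x prefix)."""
    return format(dec_id, "X")


def hex_to_dec_id(hex_id: str) -> int:
    """Parse a Hex ID value, ignoring whitespace left over from line wrapping."""
    return int("".join(hex_id.split()), 16)


print(dec_to_hex_id(1418001))  # 15A311
```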
Table 14: CCS Call Control Infrastructure Alarms

Dec ID: 1418001 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A311 | Severity: Major
Message Text: RSET <RSET name> (DPC=<DPC value>) LN (<logical name>) [<description>]
Threshold: FAIL
Possible Causes: The alarm is generated when a route set (RSET) failure occurs. The alarm is cleared when the RSET status changes to "in service".
System/Service Impact: Service will be down for the specific RSET.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418101 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A375 | Severity: Major, Minor, or Warning
Message Text: RSET <RSET name> (DPC=<DPC value>) LN (<logical name>) CONGESTED (LEVEL=<number>)
Threshold:
* Major = Level 3
* Minor = Level 2
* Warning = Level 1
Possible Causes:
* The alarm is generated when the RSET congestion level exceeds the threshold.
* The alarm is cleared either when the congestion level drops below the threshold or when no congestion messages are displayed for the RSET.
The alarm is not supported for the J7 protocol.
System/Service Impact: The application will handle only part of the messages.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.


Dec ID: 1418201 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A3D9 | Severity: Major
Message Text: LNK <link name> LN (<logical name>) [<description>]
Threshold: FAIL
Possible Causes: The alarm is generated when an SS7 link (LNK) fails. The alarm is cleared when the LNK status changes to "in service".
System/Service Impact: If other LNKs exist in the RSET, all traffic will be rerouted to them. Otherwise, the RSET will fail.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418401 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A4A1 | Severity: Major
Message Text: RPC <RPC value> LN (<logical name>) [<description>]
Threshold: PROH
Possible Causes:
* The alarm is generated when a remote point code (RPC) is prohibited.
* The alarm is cleared when the RPC status changes to "in service".
The alarm is not supported for the J7 protocol.
System/Service Impact: Service will be down for a specific RSSN, RPC, and RSET.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418601 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A569 | Severity: Major
Message Text: RSSN <RSSN name> (RPC=<RPC value>) LN (<logical name>) [<description>]
Threshold: PROH
Possible Causes:
* The alarm is generated when a remote subsystem number (RSSN) is prohibited.
* The alarm is cleared when the RSSN status becomes "in service".
The alarm is not supported for the J7 protocol.
System/Service Impact: Service will be down for a specific RSSN, RPC, and RSET.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.


Dec ID: 1418701 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A5CD | Severity: Major
Message Text: LSSN <LSSN name> LN (<logical name>) [<description>]
Threshold: PROH
Possible Causes:
* The alarm is generated when a local subsystem number (LSSN) is prohibited.
* The alarm is cleared when the LSSN status becomes "in service".
The alarm is not supported for the J7 protocol.
System/Service Impact: Service will be down for the specific LSSN.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418902 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A696 | Severity: Major
Message Text: ASSOC <association name> LN (<logical name>) [<description>]
Threshold: FAIL
Possible Causes:
* The alarm is generated when an SCTP association (ASSOC) fails.
* The alarm is cleared when the ASSOC status becomes "in service".
System/Service Impact: If other ASSOCs exist in the ASET, all traffic will be rerouted to them. Otherwise, the ASET will fail.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. Check that all local and remote association IPs are available (via ping).
4. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418903 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A697 | Severity: Major, Minor, or Warning
Message Text: ASSOC <association name> LN (<logical name>) HAS (<percent>) UNAVAILABLE ADDRESSE(S) (<list of unavailable addresse(s)>)
Threshold:
* Major = 70
* Minor = 45
* Warning = 20
Possible Causes: This alarm is generated when the local or remote IP addresses of an SCTP association (ASSOC) fail. It is cleared when all SCTP association IP addresses, both local and remote, become accessible.
System/Service Impact: If either all local or all remote IP addresses of any SCTP association (ASSOC) are unavailable, this ASSOC will fail.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.


Dec ID: 1418904 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A698 | Severity: Major
Message Text: LOC IP <ip> ASSOC (<association name>) LN (<logical name>) [<description>]
Threshold: INAC
Possible Causes: This alarm is generated when a local IP address of an SCTP association (ASSOC) fails. It is cleared when the address becomes available.
System/Service Impact: If either all local or all remote IP addresses of any SCTP association (ASSOC) are unavailable, this ASSOC will fail.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418905 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A699 | Severity: Major
Message Text: REM IP <ip> ASSOC (<association name>) LN (<logical name>) [<description>]
Threshold: INAC
Possible Causes: This alarm is generated when a remote IP address of an SCTP association (ASSOC) fails. It is cleared when the address becomes available.
System/Service Impact: If either all local or all remote IP addresses of any SCTP association (ASSOC) are unavailable, this ASSOC will fail.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.

Dec ID: 1418952 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 15A6C8 | Severity: Major
Message Text: ASET <ASET name> (DPC=<DPC value>) LN (<logical name>) [<description>]
Threshold: FAIL
Possible Causes:
* The alarm is generated when an SCTP association set (ASET) fails.
* The alarm is cleared when the ASET status becomes "in service".
System/Service Impact: Service will be down for the specific ASET.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. Check that all local and remote association IPs are available (via ping).
4. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.


Dec ID: 1465330 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 165BF2 | Severity: Minor
Message Text: Peer application is not accessible via SLAN
Threshold: One occurrence
Possible Causes: The alarm is generated when the "Peer application is not accessible via SLAN" event prints once during the predefined interval.
System/Service Impact: When both the 1465330 and 1465331 alarms occur, the system will switch over.
Action:
1. Check that the CCS application is running on all CEs in the cluster.
2. Type: slandisp. If the command returns failed statuses, check the physical SLAN connection via the COM ports.
3. Verify that the SLAN configuration in the /home/omni/conf/slan.cf file and the physical SLAN connection match.
4. Send the following data to GSOC:
▪ slandisp output on all CEs
▪ /home/omni/conf/slan.cf file from all CEs
▪ Description of the physical SLAN connection
▪ Access information


Dec ID: 1465331 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 165BF3 | Severity: Major
Message Text: Peer application is not accessible via UDP LAN
Threshold: Two occurrences
Possible Causes: The alarm is generated when the "Peer application is not accessible via UDP LAN" event prints twice during the predefined interval.
System/Service Impact: Partial service loss might occur. When both the 1465330 and 1465331 alarms occur, the system will switch over.
Action:
1. Check that the application is running on all CEs in the cluster and that each CE is accessible (via ping and SSH) from every other CE.
2. Type: ethtool <device>
3. Check that all relevant devices are configured as follows:
▪ Speed: 100 Mbps
▪ Duplex: Full
▪ Auto-negotiation: On
4. Check that the network cables are physically connected.
5. If the problem cannot be isolated, provide GSOC with the following data:
▪ ifconfig output on all CEs
▪ ethtool <device> output for all devices (eth0, etc.) on all CEs
▪ Description of the physical LAN connection
▪ Access information

Dec ID: 1465340 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 165BFC | Severity: Critical
Message Text: CONGESTION DETECTED IN THE SYSTEM
Threshold: One occurrence
Possible Causes: The alarm is generated when the "CONGESTION DETECTED IN THE SYSTEM" event prints once during the predefined interval. The alarm is supported for the A7 protocol only.
System/Service Impact: The application will handle only part of the messages.
Action:
1. Check that all relevant link cables are plugged in and are undamaged.
2. Check that the OMNI application is running on all CEs in the cluster.
3. If the problem persists, provide GSOC with CCS logs. Refer to Section 4.6.


Dec ID: 1469110 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 166AB6 | Severity: Major (Note: The severity for this alarm can be either Major or Critical, according to the site-specific settings.)
Message Text: CCS <string> SERVICE
Threshold: FAIL
Possible Causes:
* The alarm is generated when the OMNI application goes down.
* The alarm clears in the CCS when the OMNI application goes up, but no SNMP trap is sent.
System/Service Impact: CCS applications will be out of service.
Action:
1. Wait two to three minutes for the OMNI application autorestart.
2. If the OMNI application is still down, reboot it.
3. Send the /home/omni/<CE name>/tmp/Event.201.txt.<date>.<index> files and access information to GSOC.

Dec ID: 4085021 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 3E551D | Severity: Major
Message Text: PARTITION <string> EXCEEDED <number> PERCENT
Threshold: 0.9 (UNIQ)
Possible Causes:
* The alarm is generated when partition usage meets or exceeds the threshold.
* The alarm is cleared when partition usage falls below the threshold.
The alarm is not supported for the J7 protocol.
System/Service Impact: Service ability can be reduced, and the system can freeze.
Action:
1. Check the actual partition size by typing: df -k
2. Find large files by typing: find <problematic partition or /> -name "*" -size +50000
3. Send the output of these commands and the access information to GSOC.


Dec ID: 4087010 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 3E5CE2 | Severity: Major or Warning
Message Text: NTP daemon does not run
Threshold:
* Major = 0 instances
* Warning = 0.5 instances
Possible Causes:
* The alarm is generated when a process failure is detected.
* The alarm is cleared when the number of process instances in Linux is equal to 1.
The alarm is not supported for the J7 protocol.
System/Service Impact: Online time synchronization will not work.
Action:
1. Log in to the CCS as user root via SSH.
2. Restart the ntpd daemon by typing: service ntpd restart
3. Wait five minutes and check that the service is functioning by typing: ntpq -p
The output should show the table of configured time servers and peers with their statuses.
4. If errors are found, send the following information to GSOC:
▪ Command output
▪ /etc/ntp/keys, /etc/ntp/step-tickers, /etc/ntp.conf, and /etc/hosts files
▪ /var/log/messages file
▪ Access information


Dec ID: 4087000 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 3E5D3C | Severity: Major or Minor
Message Text: CE TIME DIFF WAS <number> SEC
Threshold:
* Major = 900 sec
* Minor = 600 sec
Possible Causes:
* The alarm is generated when the time difference between targets is above the threshold.
* The alarm is cleared when the difference is below the threshold.
The alarm is not supported for the J7 protocol.
System/Service Impact: Time synchronization between peer CCSs in the cluster will be lost.
Action:
1. Check the NTP synchronization status by typing: ntpq -p
2. Log in to the CCS as user root via SSH.
3. Try to restart ntpd on the nonsynchronized CE as user root by typing: service ntpd restart
4. Wait two to three minutes and check that the service is functioning by typing: ntpq -p
The output must show the table of configured servers and peers with their statuses.
5. If errors are found, provide the following information to GSOC:
▪ Command output
▪ /etc/ntp/keys, /etc/ntp/step-tickers, /etc/ntp.conf, and /etc/hosts files
▪ /var/log/messages file
▪ Access information


Dec ID: 4087504 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 3E5ED0 | Severity: Major
Message Text: Process L3MTP overloaded
Threshold: 1900 messages in queue
Possible Causes:
* The alarm is generated when the number of messages in the <$NODENAME>_L3MTP process queue exceeds the threshold during the predefined interval.
* The alarm is cleared when the number of messages in the queue is below the threshold during the predefined interval.
System/Service Impact: Message loss might occur.
Action: Provide GSOC with CCS logs. Refer to Section 4.6.
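The action for alarm 4085021 relies on `df -k` and `find ... -size +50000`. The Python sketch below approximates both, as an illustration and not as the CCS's own monitoring logic. `shutil.disk_usage` reports used and total bytes for the filesystem that contains a path. The file scan mirrors `find -size +50000`, which counts 512-byte blocks, so the cut-off is about 25.6 MB. Unlike the plain `find` command, the scan stays on one filesystem (as `find -xdev` would) so that scanning `/` does not descend into `/proc`. That choice, and the function names, are assumptions.

```python
import os
import shutil
import stat
from typing import Iterator, Tuple

PARTITION_THRESHOLD = 0.9        # alarm 4085021: fraction of the partition in use
LARGE_FILE_BYTES = 50000 * 512   # `find -size +50000` counts 512-byte blocks


def usage_fraction(path: str) -> float:
    """Return used/total for the filesystem that contains `path` (compare `df -k`)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def partition_alarm(fraction: float, threshold: float = PARTITION_THRESHOLD) -> bool:
    """True when usage meets or exceeds the threshold, the condition that raises 4085021."""
    return fraction >= threshold


def _on_device(path: str, dev: int) -> bool:
    """True if `path` lives on device `dev`; unreadable paths are skipped."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def large_files(root: str, min_bytes: int = LARGE_FILE_BYTES) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for regular files larger than min_bytes under root."""
    root_dev = os.stat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories on other filesystems (like find -xdev).
        dirnames[:] = [d for d in dirnames if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # vanished or unreadable
            if stat.S_ISREG(st.st_mode) and st.st_size > min_bytes:
                yield full, st.st_size
```

For example, `partition_alarm(usage_fraction("/var"))` reports whether `/var` would trip the 0.9 threshold. `sorted(large_files("/var"), key=lambda x: -x[1])[:10]` lists the biggest candidates to send to GSOC.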


4.3 CCS MAP Alarms


* CCS MAP Operating System Alarms
* CCS MAP Application Alarms

4.3.1 CCS MAP Operating System Alarms


Table 15 lists the CCS MAP Operating System Alarms and includes the CNF-GSM alarms for the following:
* CNF.0181.0
* CNF.0183.0
* CNF.0185.X
* CNF.0198.0
* CNF.0203.1 (CNF-GSM)

Note: In the table, alarm IDs are listed with the relevant CNF.


Table 15: CCS MAP Operating System Alarms

Dec ID: 4080120 (CNF.0203.1, CNF-GSM) | Hex ID: 3E41F8 | Severity: Major
Alarm Text: PROCESS GSM-<Instance Number> HAS 0 INSTANCES
Threshold: N/A
Probable Cause: The alarm is generated when a process with the logical name GSM-<Release Number>-1 <instance number> is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/GSM/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to GSOC for analysis.


Dec ID: 4080121 (CNF-GSM) | Hex ID: 3E41F9 | Severity: Critical, Major, Minor, or Warning
Alarm Text: PROCESS GSM-<Instance Number> HAS <memory size of process in KB> SIZE
Threshold:
* Critical = 1200000 KB
* Major = 1000000 KB
* Minor = 950000 KB
* Warning = 900000 KB
Probable Cause: The alarm is generated when the memory consumption of a process with the logical name GSM-<instance number> exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Call GSOC.
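Alarm 4080121 grades a GSM process's memory consumption against four KB thresholds. The Python sketch below maps a KB value to the corresponding severity. It can also read a value from Linux `/proc/<pid>/status` text. This manual does not say which memory figure the CCS measures, so using `VmRSS` is an assumption, and the function names are illustrative.

```python
from typing import Optional

# Alarm 4080121 thresholds in KB, highest first.
MEMORY_THRESHOLDS_KB = [
    (1200000, "Critical"),
    (1000000, "Major"),
    (950000, "Minor"),
    (900000, "Warning"),
]


def memory_severity(size_kb: int) -> Optional[str]:
    """Return the highest severity whose threshold is exceeded, or None if none is."""
    for limit, severity in MEMORY_THRESHOLDS_KB:
        if size_kb > limit:
            return severity
    return None


def read_status_kb(status_text: str, field: str = "VmRSS") -> int:
    """Extract the value of a 'Field:   1234 kB' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)
```

On a live system, one might read `/proc/<pid>/status` for the GSM process and pass its text through `read_status_kb` and then `memory_severity`.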


4.3.2 CCS MAP Application Alarms


Table 16 lists the CCS MAP Application Alarms and includes CNF-GSM alarms.

Note: In the table, alarm IDs are listed with the relevant GSM.
Table 16: CCS MAP Application Alarms

Dec ID: 1470300 (CNF-GSM) | Hex ID: 166F5C | Severity: Critical, Major, Minor, or Warning
Alarm Text: Session table is overloaded.
Threshold:
* Critical = 95%
* Major = 91%
* Minor = 88%
* Warning = 85%
Probable Cause: The alarm is generated when SS7 traffic is higher than expected, or when the average TCAP session duration is longer than the system targets.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/GSM/statistics/*EventCounterDump*
4. Type: /home/utu/<TCAP Release>/bin/edbomd GSM-<Instance Number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1470301 (CNF-GSM) | Hex ID: 166F5D | Severity: Major
Alarm Text: IMSI does not match.
Threshold: N/A
Probable Cause: The alarm is generated when the process with the specified logical name cannot route a message because the IMSI does not exist in the process routing tables.
Impact: Might imply an incorrect routing configuration, or a message received from an unexpected source.
Action: Call GSOC.

Dec ID: 1470302 (CNF-GSM) | Hex ID: 166F5E | Severity: Major
Alarm Text: E.214 does not match.
Threshold: N/A
Probable Cause: The alarm is generated when the process with the specified logical name cannot route a message by E.214, because the E.214 does not exist in the process routing tables.
Impact: Might imply an incorrect routing configuration, or a message received from an unexpected source.
Action: Call GSOC.

Dec ID: 1470303 (CNF-GSM) | Hex ID: 166F5F | Severity: Critical, Major, or Minor
Alarm Text: A capacity problem has occurred.
Threshold:
* Critical = 6000 (100% allowed in 30 seconds)
* Major = 5400 (90% allowed in 30 seconds)
* Minor = 4800 (80% allowed in 30 seconds)
Probable Cause: The alarm is generated when SS7 traffic is higher than the system targets.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/GSM/statistics/*EventCounterDump*
4. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1470304 (CNF-GSM) | Hex ID: 166F60 | Severity: Critical, Major, or Minor
Alarm Text: Threshold for discarded MSUs has been exceeded
Threshold:
* Critical = 6000 (100% allowed in 30 seconds)
* Major = 5400 (90% allowed in 30 seconds)
* Minor = 4800 (80% allowed in 30 seconds)
Probable Cause: The alarm is generated when SS7 traffic is higher than the system targets.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/GSM/statistics/*EventCounterDump*
4. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.
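Alarms 1470303 and 1470304 express their thresholds as counts within 30 seconds, with 6000 corresponding to 100%. One way to evaluate such thresholds is a sliding-window counter, sketched in Python below. This manual does not document how the CCS actually counts these events. The class name, the sliding (rather than fixed) window and the `>=` comparison are therefore all assumptions.

```python
from collections import deque
from typing import Deque, Optional


class WindowCounter:
    """Count events in a sliding time window and grade the count against thresholds."""

    # Thresholds from alarms 1470303/1470304: events per 30 seconds.
    LEVELS = [(6000, "Critical"), (5400, "Major"), (4800, "Minor")]

    def __init__(self, window_s: float = 30.0) -> None:
        self.window_s = window_s
        self.events: Deque[float] = deque()

    def record(self, t: float) -> None:
        """Register one event (for example, one message) at time t, in seconds."""
        self.events.append(t)

    def count(self, now: float) -> int:
        """Drop events that fell out of the window and return how many remain."""
        while self.events and self.events[0] <= now - self.window_s:
            self.events.popleft()
        return len(self.events)

    def severity(self, now: float) -> Optional[str]:
        """Return the highest severity whose threshold the current count reaches."""
        n = self.count(now)
        for limit, level in self.LEVELS:
            if n >= limit:
                return level
        return None
```

Because old events are discarded as the window slides, the severity drops back to `None` once traffic subsides. This mirrors an alarm that clears when the rate falls below the lowest threshold.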


4.4 CCS ISUP Call Control Alarms


* CCS ISUP Call Control Operating System Alarms
* CCS ISUP Call Control Application Alarms

4.4.1 CCS ISUP Call Control Operating System Alarms


Table 17 lists the CCS ISUP Call Control Operating System Alarms and includes alarms for the following:
* CC_7.5.1.1

Note: In the table, alarm IDs are listed with the relevant CNF.
Table 17: CCS ISUP Call Control Operating System Alarms

Dec ID: 4087004 (CNF.7.5.1.1) | Hex ID: 3E5CDC | Severity: Major, Minor, or Warning
Alarm Text: CCS logger (logd) was opened
Threshold:
* Major = Three instances
* Minor = Two instances
* Warning = One instance
Possible Causes: The alarm is generated when one or more instances of the logd utility are running.
Impact: Depending on the traffic level, service functionality might be reduced. If the logd utility runs for a long time on a high-load system, the system might freeze or the application might fail, which could cause full or partial loss of service.
Action: Call GSOC.


Dec ID: 4087005 (CNF.7.5.1.1) | Hex ID: 3E5CDD | Severity: Major or Warning
Alarm Text: PROCESS SIGH FAILED
Threshold:
* Major = 0 instances
* Warning = 0.5 instances
Possible Causes:
* The alarm is generated when a SIGH process failure is detected.
* The alarm is cleared when at least one instance of the process is running.
Impact: If the problem persists on one CE, it might cause a switchover. If the problem occurs on all CEs, the application might fail, which could cause full or partial loss of service.
Action:
1. Wait three to five minutes for the alarm to clear. If the alarm does not clear, reboot the CCS.
2. If the problem persists, provide CCS logs to GSOC. Refer to Section 4.6.

Dec ID: 4087006 (CNF.7.5.1.1) | Hex ID: 3E5CDE | Severity: Major or Warning
Alarm Text: PROCESS LAN FAILED
Threshold:
* Major = 0 instances
* Warning = 0.5 instances
Possible Causes:
* The alarm is generated when a LAN process failure is detected.
* The alarm is cleared when at least one instance of the process is running.
Impact:
* If the problem persists on one CE, it might cause a switchover.
* If it occurs on all CEs, the application might fail, which could cause full or partial loss of service.
Action: Proceed as follows:
1. Wait three to five minutes for the alarm to clear. If the alarm does not clear, reboot the CCS.
2. If the problem persists, provide CCS logs to GSOC. Refer to Section 4.6.


Dec ID: 4087501 (CNF.7.5.1.1) | Hex ID: 3E5ECD | Severity: Major
Alarm Text: Process SIGH overloaded
Threshold: 1900 messages in queue
Possible Causes:
* The alarm is generated when the number of messages in the SIGH0 queue exceeds the threshold during the predefined interval.
* The alarm is cleared when the number of messages in the queue is below the threshold during the predefined interval.
Impact: Message loss might occur.
Action: Proceed as follows:
1. Type ps -ef | grep logd to check whether the logd utility is running.
2. If logd is running, type pkill logd to kill the process.
3. If logd is not running, reboot the system.
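Alarm 4087004 grades the number of running logd instances. The action for alarm 4087501 begins with `ps -ef | grep logd`. The Python sketch below counts logd processes in captured `ps -ef` text and maps the count to the 4087004 severities. The function names are illustrative. Matching on the executable's basename is a simplification, chosen so that the `grep logd` process itself, which the shell pipeline would list, is not counted. Treating four or more instances as Major is also an assumption.

```python
import os
from typing import Optional

# Instance count -> severity, from alarm 4087004.
LOGD_SEVERITY = {1: "Warning", 2: "Minor", 3: "Major"}


def count_logd(ps_ef_output: str) -> int:
    """Count processes whose executable basename is 'logd' in `ps -ef` output."""
    count = 0
    for line in ps_ef_output.splitlines()[1:]:  # skip the UID PID PPID ... header
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        if os.path.basename(executable) == "logd":
            count += 1
    return count


def logd_severity(instances: int) -> Optional[str]:
    """Map an instance count to a severity; three or more is treated as Major."""
    if instances <= 0:
        return None
    return LOGD_SEVERITY[min(instances, 3)]
```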


4.4.2 CCS ISUP Call Control Application Alarms


Table 18 lists the CCS ISUP Call Control Application Alarms and includes alarms for the following:
* CC_7.5.1.1
* CC_8.0.0

Note: In the table, alarm IDs are listed with the relevant CC.
Table 18: CCS ISUP Call Control Application Alarms

Dec ID: 1465350 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 165C06 | Severity: Major
Alarm Text: Peak capacity reached, possible message lost
Threshold: One occurrence
Probable Cause: The alarm is generated when ISUP traffic reaches the maximum value, and the Flow Control Mechanism on the CCS starts to reject incoming calls. The Maximum Capacity value depends on the hardware platform, the number of CEs in the cluster, and the call type (WHC or regular "long" calls).
Impact: All new incoming calls will be released until the time interval expires.
Action: Provide CCS logs to GSOC. Refer to Section 4.6.


Dec ID: 1467902 (CC_7.5.1.1 and CC_8.0.0) | Hex ID: 1665FE | Severity: Critical, Major, Minor, or Warning
Alarm Text: CAPACITY REACHED <number> PERCENT
Threshold:
* Critical = 120%
* Major = 100%
* Minor = 90%
* Warning = 80%
Probable Cause: The alarm is generated when ISUP traffic exceeds the threshold during the predefined interval. The alarm is cleared when traffic falls below the threshold.
Impact: If the alarm is Critical, all new incoming calls will be released until the time interval expires (if FlowControl is enabled).
Action:
1. Check whether the ISUP traffic on the system is in accordance with the system characteristics.
2. If the problem persists, contact GSOC and provide the following data:
▪ Actual alarm text (that is, the reported capacity percentage)
▪ The ISUP traffic logs from the switch(es)
▪ The CCS logs. Refer to Section 4.6.


Dec ID: 1467950 (CC_7.5.1.1), 1467950 (CC_8.0.0)
Hex ID: 16662E
Severity: Minor
Alarm Text: <CCS name> HAS <number> UNEQUIPPED CICS
Threshold: 10%
Probable Cause: The alarm is generated when the number of unequipped CICs exceeds a predefined threshold. The problem usually occurs when the CIC configuration on the CCS does not match the CIC configuration on the CCSNet clients (for example, the MMU and CMS) and on the switch.
Impact: Partial service loss might occur.
Action:
1. Check whether the ISUP traffic on the system is in accordance with the system characteristics.
2. If the problem persists, contact GSOC and provide the following data:
▪ Actual alarm text (that is, the percent of unequipped CICs)
▪ The ISUP traffic logs from the switch(es)
▪ The CCS logs. Refer to Section 4.8.


Dec ID: 1467951 (CC_7.5.1.1), 1467951 (CC_8.0.0)
Hex ID: 16662F
Severity: Minor
Alarm Text: <CCS name> HAS <number> TRANSIENT CICS
Threshold: 10%
Probable Cause: The alarm is generated when the number of CICs in transient state exceeds a predefined threshold. The problem occurs due to ISUP protocol collisions when the CCS does not receive an appropriate message on the affected CICs from the switch, which is usually caused by a mismatch between the CIC status on the CCS and the status on the switch.
Impact: Partial service loss might occur.
Action: Provide the CCS logs to GSOC. Refer to Section 4.8.


Dec ID: 1467953 (CC_7.5.1.1), 1467953 (CC_8.0.0)
Hex ID: 166631
Severity: Major, Minor
Alarm Text: <CCS name> HAS <number> UNAVAILABLE CICS
Threshold: Major = 50%, Minor = 25%
Probable Cause: The alarm is generated when the number of CICs unavailable for calls is higher than a predefined threshold. This might occur due to high ISUP traffic or ISUP protocol collisions, when some ranges are unavailable for some reason. The alarm is cleared when the number of unavailable CICs falls below the threshold.
Impact: Partial service loss might occur.
Action: Provide the CCS logs to GSOC. Refer to Section 4.8.


Dec ID: 1467954 (CC_7.5.1.1), 1467954 (CC_8.0.0)
Hex ID: 166632
Severity: Minor
Alarm Text: <CCS name> HAS <number> CICS IN USE
Threshold: 90%
Probable Cause: The alarm is generated when the number of CICs in the "in use" state (call in progress) is above a predefined threshold. This might occur due to high ISUP traffic, a large number of non-released calls, or ISUP protocol collisions, when some ranges are not in use. The alarm is cleared when the number of "in use" CICs falls below the threshold.
Impact: Partial service loss might occur.
Action: Provide the CCS logs to GSOC. Refer to Section 4.8.
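The percentage thresholds in Table 18 reduce to a simple lookup. The Bash sketch below is a hypothetical helper, not part of the CCS software: the function names and alarm-type keywords are illustrative, and it assumes that a severity applies when the measured percentage is at or above its threshold, although the table wording ("exceeds", "higher than", "above") may mean strictly greater.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the percentage thresholds in Table 18.
# Assumption: a severity applies when the value is at or above its threshold.

# Usage: ccs_alarm_severity <alarm type> <percent>
#   alarm type: capacity | unequipped | transient | unavailable | in_use
ccs_alarm_severity() {
  local type=$1 pct=$2
  case $type in
    capacity)               # CAPACITY REACHED <number> PERCENT
      if   (( pct >= 120 )); then echo Critical
      elif (( pct >= 100 )); then echo Major
      elif (( pct >= 90 ));  then echo Minor
      elif (( pct >= 80 ));  then echo Warning
      else echo None; fi ;;
    unequipped|transient)   # single Minor threshold of 10%
      if (( pct >= 10 )); then echo Minor; else echo None; fi ;;
    unavailable)            # Major = 50%, Minor = 25%
      if   (( pct >= 50 )); then echo Major
      elif (( pct >= 25 )); then echo Minor
      else echo None; fi ;;
    in_use)                 # single Minor threshold of 90%
      if (( pct >= 90 )); then echo Minor; else echo None; fi ;;
    *)
      echo "unknown alarm type: $type" >&2
      return 1 ;;
  esac
}

# Integer percentage (rounded down) of CICs in a given state.
# Usage: cic_percent <count in state> <total CICs>
cic_percent() {
  local count=$1 total=$2
  (( total > 0 )) || { echo "total must be greater than 0" >&2; return 1; }
  echo $(( count * 100 / total ))
}
```

For example, with 30 of 100 CICs unavailable, `ccs_alarm_severity unavailable "$(cic_percent 30 100)"` prints Minor, since 30% lies between the 25% Minor and 50% Major thresholds.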

4.5 CCS INAP CAMEL Alarms

This section contains the following topics:
* CCS INAP CAMEL Operating System Alarms
* CCS INAP CAMEL Application Alarms

4.5.1 CCS INAP CAMEL Operating System Alarms


Table 19 lists the CCS INAP CAMEL Operating System Alarms and includes alarms for the following:
* CNF.0178.0
* CNF.0182.1
* CNF.0188.X
* CNF.0202.1
* CNF-INAP

Note: In the table, alarm IDs are listed with the relevant CNF.
Table 19: CCS INAP CAMEL Operating System Alarms

Dec ID: 4083120 (CNF.0178.0), 4083280 (CNF.0182.1), 4083520 (CNF.0188.X), 4080080 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB0 (CNF.0178.0), 3E4E50 (CNF.0182.1), 3E4F40 (CNF.0188.X), 3E41D0 (CNF.0202.1, CNF-INAP)
Severity: Major
Alarm Text: PROCESS ERC-0178-0-<instance number> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: Alarm is generated when a process with the logical name ERC-0178-0-<instance number> is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/CNF.0178.0/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to GSOC for review.


Dec ID: 4083121 (CNF.0178.0), 4083281 (CNF.0182.1), 4083521 (CNF.0188.X), 4080081 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB1 (CNF.0178.0), 3E4E51 (CNF.0182.1), 3E4F41 (CNF.0188.X), 3E41D1 (CNF.0202.1, CNF-INAP)
Severity: Major
Alarm Text: PROCESS NOK-0178-0-<instance number> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: Alarm is generated when a process with the logical name NOK-0178-0-<instance number> is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/CNF.0178.0/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to GSOC for review.


Dec ID: 4083122 (CNF.0178.0), 4083282 (CNF.0182.1), 4083522 (CNF.0188.X), 4080082 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB2 (CNF.0178.0), 3E4E52 (CNF.0182.1), 3E4F42 (CNF.0188.X), 3E41D2 (CNF.0202.1, CNF-INAP)
Severity: Major
Alarm Text: PROCESS SINAP-0178-0-<instance number> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: Alarm is generated when a process with the logical name SINAP-0178-0-<instance number> is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/CNF.0178.0/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to GSOC for analysis.

Dec ID: 4083123 (CNF.0178.0), 4083283 (CNF.0182.1), 4083523 (CNF.0188.X), 4080083 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB3 (CNF.0178.0), 3E4E53 (CNF.0182.1), 3E4F43 (CNF.0188.X), 3E41D3 (CNF.0202.1, CNF-INAP)
Severity: Major
Alarm Text: PROCESS CAP-0178-0-<instance number> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: Alarm is generated when a process with the logical name CAP-0178-0-<instance number> is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/CNF.0178.0/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to GSOC for review.


Dec ID: 4083124 (CNF.0178.0), 4083284 (CNF.0182.1), 4083524 (CNF.0188.X), 4080084 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB4 (CNF.0178.0), 3E4E54 (CNF.0182.1), 3E4F44 (CNF.0188.X), 3E41D4 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: PROCESS ERC-0178-0-<instance number> HAS <memory size of process in KB> SIZE
Threshold: Critical = 750000 KB, Major = 650000 KB, Minor = 550000 KB, Warning = 500000 KB
Possible Causes: Alarm is generated when the memory consumption of a process with the logical name ERC-0178-0-<instance number> exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Contact GSOC.

Dec ID: 4083125 (CNF.0178.0), 4083285 (CNF.0182.1), 4083525 (CNF.0188.X), 4080085 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB5 (CNF.0178.0), 3E4E55 (CNF.0182.1), 3E4F45 (CNF.0188.X), 3E41D5 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: PROCESS NOK-0178-0-<instance number> HAS <memory size of process in KB> SIZE
Threshold: Critical = 750000 KB, Major = 650000 KB, Minor = 550000 KB, Warning = 500000 KB
Possible Causes: This alarm is generated when the memory consumption of a process with the logical name NOK-0178-0-<instance number> exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Contact GSOC.


Dec ID: 4083126 (CNF.0178.0), 4083286 (CNF.0182.1), 4083526 (CNF.0188.X), 4080086 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB6 (CNF.0178.0), 3E4E56 (CNF.0182.1), 3E4F46 (CNF.0188.X), 3E41D6 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: PROCESS SINAP-0178-0-<instance number> HAS <memory size of process in KB> SIZE
Threshold: Critical = 750000 KB, Major = 650000 KB, Minor = 550000 KB, Warning = 500000 KB
Possible Causes: Alarm is generated when the memory consumption of a process with the logical name SINAP-0178-0-<instance number> exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Contact GSOC.

Dec ID: 4083127 (CNF.0178.0), 4083287 (CNF.0182.1), 4083527 (CNF.0188.X), 4080087 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4DB7 (CNF.0178.0), 3E4E57 (CNF.0182.1), 3E4F47 (CNF.0188.X), 3E41D7 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: PROCESS CAP-0178-0-<instance number> HAS <memory size of process in KB> SIZE
Threshold: Critical = 750000 KB, Major = 650000 KB, Minor = 550000 KB, Warning = 500000 KB
Possible Causes: Alarm is generated when the memory consumption of a process with the logical name CAP-0178-0-<instance number> exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Contact GSOC.


Dec ID: 4083288 (CNF.0182.1), 4083528 (CNF.0188.X), 4080088 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4E58 (CNF.0182.1), 3E4F48 (CNF.0188.X), 3E41D8 (CNF.0202.1, CNF-INAP)
Severity: Major
Alarm Text: PROCESS ECS1-<CNF-Number>-<instance number> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: Alarm is generated when a process with the logical name ECS1-<CNF-Number>-<instance number> is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/CNF.<CNF.Number>/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to GSOC for review.

Dec ID: 4083289 (CNF.0182.1), 4083529 (CNF.0188.X), 4080089 (CNF.0202.1, CNF-INAP)
Hex ID: 3E4E59 (CNF.0182.1), 3E4F49 (CNF.0188.X), 3E41D9 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: PROCESS ECS1-<CNF-Number>-<instance number> HAS <memory size of process in KB> SIZE
Threshold: Critical = 750000 KB, Major = 650000 KB, Minor = 550000 KB, Warning = 500000 KB
Possible Causes: Alarm is generated when the memory consumption of a process with the logical name ECS1-<CNF-Number>-<instance number> exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Contact GSOC.
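Steps 2 to 4 of the HAS 0 INSTANCES procedures can be wrapped in a function so that the same commands serve every CNF directory. This is a sketch under stated assumptions: the function name and the OMNI_HOME, UTU_HOME, and OUT_DIR overrides are hypothetical, `uname -n` stands in for `$(hostname)` (normally the same value), and the verbose `v` flag is dropped because it only lists the archived files.

```bash
#!/usr/bin/env bash
# Sketch of steps 2-4 for the "HAS 0 INSTANCES" alarms: archive the event
# files and the CNF runtime logs.
# OMNI_HOME, UTU_HOME and OUT_DIR are hypothetical overrides; by default the
# manual's paths are used and the archives go to the current directory.

# Usage: collect_process_failure_logs <CNF directory>, e.g. CNF.0178.0
collect_process_failure_logs() {
  local cnf_dir=$1
  local omni_home=${OMNI_HOME:-/home/omni}
  local utu_home=${UTU_HOME:-/home/utu}
  local out_dir=${OUT_DIR:-$PWD}
  local host
  local -a events runtime

  if [ -z "$cnf_dir" ]; then
    echo "usage: collect_process_failure_logs <CNF directory>" >&2
    return 2
  fi
  host=$(uname -n)   # normally the same value as $(hostname)

  # Expand the globs first so that a missing path gives a clear message.
  events=( "$omni_home/$host"/tmp/Event*txt* )
  runtime=( "$utu_home/$cnf_dir"/logs/runtime/* )
  [ -e "${events[0]}" ] || { echo "no Event*txt* files in $omni_home/$host/tmp" >&2; return 1; }
  [ -e "${runtime[0]}" ] || { echo "no runtime logs in $utu_home/$cnf_dir/logs/runtime" >&2; return 1; }

  # Same archives as steps 2 and 3 of the procedure.
  tar czf "$out_dir/omni_evt.tar.gz" "${events[@]}" || return 1
  tar czf "$out_dir/utu_runtime.tar.gz" "${runtime[@]}" || return 1
  echo "Created $out_dir/omni_evt.tar.gz and $out_dir/utu_runtime.tar.gz"
}
```

Run it as user omni, for example `collect_process_failure_logs CNF.0178.0`, and send the two archives as the relevant table row directs.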


4.5.2 CCS INAP CAMEL Application Alarms


Table 20 lists the CCS INAP CAMEL Application Alarms and includes alarms for the following:
* CNF.0178.0
* CNF.0182.1
* CNF.0188.X
* CNF.0202.1
* CNF-INAP

Note: In the table, alarm IDs are listed with the relevant CNF.


Table 20: CCS INAP CAMEL Application Alarms

Dec ID: 1477800 (CNF.0178.0), 1478200 (CNF.0182.1), 1478800 (CNF.0188.X), 1470200 (CNF.0202.1, CNF-INAP)
Hex ID: 168CA8 (CNF.0178.0), 168E38 (CNF.0182.1), 169090 (CNF.0188.X), 166EF8 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded.
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: This alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.0178.0/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.0178.0/logs/../bin/edbomd ERC-0178-0-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1477801 (CNF.0178.0), 1478201 (CNF.0182.1), 1478801 (CNF.0188.X), 1470201 (CNF.0202.1, CNF-INAP)
Hex ID: 168CA9 (CNF.0178.0), 168E39 (CNF.0182.1), 169091 (CNF.0188.X), 166EF9 (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded.
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: This alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.0178.0/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.0178.0/logs/../bin/edbomd ERC-0178-0-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1477802 (CNF.0178.0), 1478202 (CNF.0182.1), 1478802 (CNF.0188.X), 1470202 (CNF.0202.1, CNF-INAP)
Hex ID: 168CAA (CNF.0178.0), 168E3A (CNF.0182.1), 169092 (CNF.0188.X), 166EFA (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded.
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: This alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.0178.0/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.0178.0/logs/../bin/edbomd ERC-0178-0-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1477803 (CNF.0178.0), 1478203 (CNF.0182.1), 1478803 (CNF.0188.X), 1470203 (CNF.0202.1, CNF-INAP)
Hex ID: 168CAB (CNF.0178.0), 168E3B (CNF.0182.1), 169093 (CNF.0188.X), 166EFB (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded.
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: This alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.0178.0/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.0178.0/logs/../bin/edbomd ERC-0178-0-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1477804 (CNF.0178.0), 1478204 (CNF.0182.1), 1478804 (CNF.0188.X), 1470204 (CNF.0202.1, CNF-INAP)
Hex ID: 168CAC (CNF.0178.0), 168E3C (CNF.0182.1), 169094 (CNF.0188.X), 166EFC (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor
Alarm Text: An application capacity problem has occurred.
Threshold: Critical = 2250 (90% allowed in 30 seconds), Major = 2125 (85% allowed in 30 seconds), Minor = 2000 (80% allowed in 30 seconds)
Possible Causes: This alarm is generated when SS7 traffic is higher than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.0178.0/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.0178.0/logs/../bin/edbomd ERC-0178-0-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1477805 (CNF.0178.0), 1478205 (CNF.0182.1), 1478805 (CNF.0188.X), 1470205 (CNF.0202.1, CNF-INAP)
Hex ID: 168CAD (CNF.0178.0), 168E3D (CNF.0182.1), 169095 (CNF.0188.X), 166EFD (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor
Alarm Text: A process instance capacity problem occurred.
Threshold: Critical = ??? (90% allowed in 30 seconds), Major = ??? (85% allowed in 30 seconds), Minor = ??? (80% allowed in 30 seconds). The actual values depend on the number of instances and protocols.
Possible Causes: This alarm is generated when SS7 traffic is higher than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.0178.0/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.0178.0/logs/../bin/edbomd ERC-0178-0-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.


Dec ID: 1478206 (CNF.0182.1), 1478806 (CNF.0188.X), 1470206 (CNF.0202.1, CNF-INAP)
Hex ID: 168E3E (CNF.0182.1), 169096 (CNF.0188.X), 166EFE (CNF.0202.1, CNF-INAP)
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded. (ECS1)
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: This alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/CNF.<CNF.Number>/logs/../statistics/*EventCounterDump*
4. Type: /home/utu/CNF.<CNF.Number>/logs/../bin/edbomd ECS1-<CNF-Number>-<instance number>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.
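The collection steps of the "Session table is overloaded" procedures can be sketched in Bash as follows. The statistics path is written as `<CNF directory>/statistics`, which is what `<CNF directory>/logs/../statistics` resolves to. The function names and the UTU_HOME and OUT_DIR overrides are hypothetical. Feeding `num>edbomd.txt` and `q` to edbomd on standard input is an assumption: this manual documents edbomd only as an interactive tool, so verify that behavior on a lab system before relying on it.

```bash
#!/usr/bin/env bash
# Sketch of the "Session table is overloaded" collection steps.
# UTU_HOME and OUT_DIR are hypothetical overrides; OUT_DIR defaults to
# /home/omni, the directory entered in step 2.

# Step 3: archive the EventCounterDump files.
# Usage: collect_session_stats <CNF directory>, e.g. CNF.0178.0
collect_session_stats() {
  local cnf_dir=$1
  local utu_home=${UTU_HOME:-/home/utu}
  local out_dir=${OUT_DIR:-/home/omni}
  local -a dumps
  dumps=( "$utu_home/$cnf_dir"/statistics/*EventCounterDump* )
  if [ ! -e "${dumps[0]}" ]; then
    echo "no EventCounterDump files in $utu_home/$cnf_dir/statistics" >&2
    return 1
  fi
  tar czf "$out_dir/evt.tar.gz" "${dumps[@]}"
}

# Steps 4-6: run edbomd for one process and save its "num" output.
# ASSUMPTION: edbomd reads its prompt commands from standard input;
# if it does not, type the two commands by hand as in the procedure.
# Usage: dump_edbomd <CNF directory> <process name>,
#   e.g. dump_edbomd CNF.0178.0 ERC-0178-0-<instance number>
dump_edbomd() {
  local cnf_dir=$1 process=$2
  local utu_home=${UTU_HOME:-/home/utu}
  ( cd "${OUT_DIR:-/home/omni}" &&
    printf 'num>edbomd.txt\nq\n' | "$utu_home/$cnf_dir/bin/edbomd" "$process" )
}
```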


4.6 USSD CNF Alarms

This section contains the following topics:
* USSD CNF Application Alarms
* USSD CNF OS Alarms

4.6.1 USSD CNF Application Alarms


Table 21: USSD CNF Application Alarms

Dec ID: 1475000 (CNF.0150.2), 1478900 (CNF.0189.0), 1479500 (CNF.0195.X), 1470000 (CNF.0200.1, CNF-USSD)
Hex ID: 1681B8 (CNF.0150.2), 1690F4 (CNF.0189.0), 16934C (CNF.0195.X), 166E30 (CNF.0200.1, CNF-USSD)
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded.
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: This alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/<CNF Name>/statistics/*EventCounterDump*
4. Type: /home/utu/<CNF Name>/bin/edbomd <Process Name>
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to TOMIA customer support for review.


Dec ID: 1475001 (CNF.0150.2), 1478901 (CNF.0189.0), 1479501 (CNF.0195.X), 1470001 (CNF.0200.1, CNF-USSD)
Hex ID: 1681B9 (CNF.0150.2), 1690F5 (CNF.0189.0), 16934D (CNF.0195.X), 166E31 (CNF.0200.1, CNF-USSD)
Severity: Critical, Major, Minor
Alarm Text: A capacity problem has occurred.
Threshold:
* Critical = single process [8370], multiple processes [16200] (90% allowed in 30 seconds)
* Major = single process [7905], multiple processes [15300] (85% allowed in 30 seconds)
* Minor = single process [7440], multiple processes [14400] (80% allowed in 30 seconds)
Possible Causes: This alarm is generated when SS7 traffic is higher than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/<CNF Name>/statistics/*EventCounterDump*
4. Send the evt.tar.gz and edbomd.txt files to TOMIA customer support for review.


Dec ID: 1475002 (CNF.0150.2), 1478902 (CNF.0189.0), 1479502 (CNF.0195.X), 1470002 (CNF.0200.1, CNF-USSD)
Hex ID: 1681BA (CNF.0150.2), 1690F6 (CNF.0189.0), 16934E (CNF.0195.X), 166E32 (CNF.0200.1, CNF-USSD)
Severity: Major
Alarm Text: ClientId[<client id>] is down.
Threshold: N/A
Possible Causes: This alarm is generated when the connection to the SGAP 1.0.0 client is lost. This might be caused by a network problem or a client application failure.
Impact: If the client is the last one (based on the routing configuration in the CCS), service will be lost; if it is not, partial service loss might occur.
Action: Contact TOMIA customer support to determine the cause of the alarm.


Dec ID: 1475003 (CNF.0150.2), 1478903 (CNF.0189.0), 1479503 (CNF.0195.X), 1470003 (CNF.0200.1, CNF-USSD)
Hex ID: 1681BB (CNF.0150.2), 1690F7 (CNF.0189.0), 16934F (CNF.0195.X), 166E33 (CNF.0200.1, CNF-USSD)
Severity: Major
Alarm Text: ClientID[<client id>] ClientType[<client type>] is down.
Threshold: N/A
Possible Causes: This alarm is generated when the connection to the SGAP 2.0.0 client is lost. This might be caused by a network problem or a client application failure.
Impact: If the client is the last one (based on the routing configuration in the CCS), service will be lost; if it is not, partial service loss might occur.
Action: Contact TOMIA customer support to determine the cause of the alarm.


Dec ID: 1475004 (CNF.0150.2), 1478904 (CNF.0189.0), 1479504 (CNF.0195.X), 1470004 (CNF.0200.1, CNF-USSD)
Hex ID: 1681BC (CNF.0150.2), 1690F8 (CNF.0189.0), 169350 (CNF.0195.X), 166E34 (CNF.0200.1, CNF-USSD)
Severity: Major
Alarm Text: <ESME(system id)> is down.
Threshold: N/A
Possible Causes: This alarm is generated when the connection to the SMPP client is lost. This might be caused by a network problem or a client application failure.
Impact: If the ESME is the last one (based on the routing configuration in the CCS), service will be lost; if it is not, partial service loss might occur.
Action: Contact TOMIA customer support to determine the cause of the alarm.
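The 30-second capacity thresholds in Tables 20 and 21 are percentages of a base rate. The base rates are not stated in the tables; back-calculating from the Critical (90%) values gives 2500 for the application capacity alarm (2250 / 0.90), 9300 for a single USSD process (8370 / 0.90), and 18000 for multiple USSD processes (16200 / 0.90). The Bash sketch below, with hypothetical function names, recomputes the listed thresholds from those inferred bases using integer arithmetic and classifies a measured 30-second message count, assuming a severity applies at or above its threshold.

```bash
#!/usr/bin/env bash
# Derive 30-second capacity thresholds from a base rate and percentages.
# The base rates are back-calculated from the tables, not documented values.

# Usage: capacity_thresholds <base> <percent>...
# Prints one "<percent>%=<threshold>" line per percentage (rounded down).
capacity_thresholds() {
  local base=$1 pct
  shift
  for pct in "$@"; do
    printf '%d%%=%d\n' "$pct" $(( base * pct / 100 ))
  done
}

# Usage: capacity_severity_30s <count> <critical> <major> <minor>
capacity_severity_30s() {
  local count=$1 critical=$2 major=$3 minor=$4
  if   (( count >= critical )); then echo Critical
  elif (( count >= major ));    then echo Major
  elif (( count >= minor ));    then echo Minor
  else echo None; fi
}
```

For example, `capacity_thresholds 9300 90 85 80` prints 90%=8370, 85%=7905, and 80%=7440 on separate lines, matching the single-process values in Table 21. With those thresholds, `capacity_severity_30s 8000 8370 7905 7440` prints Major.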


4.6.2 USSD CNF OS Alarms


Table 22: USSD CNF OS Alarms

Dec ID: 4082000 (CNF.0150.2), 4083560 (CNF.0189.0), 4083800 (CNF.0195.X), 4080000 (CNF.0200.1), 4080010 (CNF-USSD)
Hex ID: 3E4950 (CNF.0150.2), 3E4F68 (CNF.0189.0), 3E5058 (CNF.0195.X), 3E4180 (CNF.0200.1), 3E418A (CNF-USSD)
Severity: Major
Alarm Text: <Process Name> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: This alarm is generated when the USSD application process is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/<CNF Name>/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to TOMIA customer support for review.


Dec ID: 4082001 (CNF.0150.2), 4083561 (CNF.0189.0), 4083801 (CNF.0195.X), 4080001 (CNF.0200.1, CNF-USSD)
Hex ID: 3E4951 (CNF.0150.2), 3E4F69 (CNF.0189.0), 3E5059 (CNF.0195.X), 3E4181 (CNF.0200.1, CNF-USSD)
Severity: Critical, Major, Minor, Warning
Alarm Text: <Process Name> HAS <memory size of process in KB> SIZE
Threshold:
* Up to CNF.0195.0: Critical = 750000 KB, Major = 650000 KB, Minor = 550000 KB, Warning = 500000 KB
* CNF.0200.1: Critical = 950000 KB, Major = 850000 KB, Minor = 750000 KB, Warning = 700000 KB
* From CNF-USSD: Critical = 1200000 KB, Major = 1100000 KB, Minor = 1000000 KB, Warning = 900000 KB
Possible Causes: This alarm is generated when the memory consumption of the USSD application process exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Contact TOMIA customer support to determine the cause of the alarm.
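The memory thresholds for the USSD process depend on the CNF release, which makes them a natural lookup table. The Bash sketch below uses hypothetical function names. It assumes that CNF.0150.2, CNF.0189.0, and every CNF.0195.X release use the "up to CNF.0195.0" set, that the CNF-USSD values are in KB like the others, and that a severity applies at or above its threshold.

```bash
#!/usr/bin/env bash
# USSD process memory thresholds (KB), keyed by CNF release.
# Assumption: CNF.0150.2, CNF.0189.0 and all CNF.0195.X use the first set.

# Prints "critical major minor warning" for a release.
ussd_memory_thresholds() {
  case $1 in
    CNF.0150.2|CNF.0189.0|CNF.0195.*) echo "750000 650000 550000 500000" ;;
    CNF.0200.1)                       echo "950000 850000 750000 700000" ;;
    CNF-USSD)                         echo "1200000 1100000 1000000 900000" ;;
    *) echo "unknown release: $1" >&2; return 1 ;;
  esac
}

# Usage: memory_severity <release> <process size in KB>
memory_severity() {
  local size=$2 t crit major minor warn
  t=$(ussd_memory_thresholds "$1") || return 1
  read -r crit major minor warn <<< "$t"
  if   (( size >= crit ));  then echo Critical
  elif (( size >= major )); then echo Major
  elif (( size >= minor )); then echo Minor
  elif (( size >= warn ));  then echo Warning
  else echo None; fi
}
```

To obtain a size for comparison, `ps -o rss= -p <pid>` reports the resident set size in KB on Linux; this manual does not state whether the alarm measures resident or virtual size.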


4.7 TCABRT CNF Alarms


Table 23: TCABRT CNF Alarms

Dec ID: 1471300
Hex ID: 167344
Severity: Critical, Major, Minor, Warning
Alarm Text: Session table is overloaded.
Threshold: Critical = 95%, Major = 91%, Minor = 88%, Warning = 85%
Possible Causes: Alarm is generated when SS7 traffic is higher than expected or the average TCAP session duration is longer than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/<CNF Name>/statistics/*EventCounterDump*
4. Type: /home/utu/<CNF Name>/bin/edbomd TCABRT-<Release>-1
5. At the prompt, type: num>edbomd.txt
6. At the prompt, type: q
7. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.

Dec ID: 1471301
Hex ID: 167345
Severity: Major
Alarm Text: IMSI does not match.
Threshold: N/A
Possible Causes: Alarm is generated when the process with the specified logical name cannot route a message because the IMSI does not exist in the process routing tables.
Impact: Might imply an incorrect routing configuration, or a message received from an unexpected source.
Action: Call GSOC.


Dec ID: 1471303
Hex ID: 167347
Severity: Critical, Major, Minor
Alarm Text: A capacity problem has occurred.
Threshold: Critical = 6000 (100% allowed in 30 seconds), Major = 5400 (90% allowed in 30 seconds), Minor = 4800 (80% allowed in 30 seconds)
Possible Causes: Alarm is generated when SS7 traffic is higher than the system is targeted for.
Impact: Partial service loss might occur when critical severity is reached.
Action: Proceed as follows:
1. Log in to the system as user omni.
2. Type: cd /home/omni
3. Type: tar czvf evt.tar.gz /home/utu/<CNF Name>/statistics/*EventCounterDump*
4. Send the evt.tar.gz and edbomd.txt files to GSOC for analysis.

Dec ID: 4080520
Hex ID: 3E4388
Severity: Major
Alarm Text: <Process Name> HAS 0 INSTANCES
Threshold: N/A
Possible Causes: This alarm is generated when the TCABRT application process is terminated because of an unexpected process failure.
Impact: Loss of redundancy (at the first occurrence) or loss of service (when triggered on all CEs) will occur.
Action:
1. Log in to the system as user omni.
2. Type: tar czvf omni_evt.tar.gz /home/omni/$(hostname)/tmp/Event*txt*
3. Type: tar czvf utu_runtime.tar.gz /home/utu/<CNF Name>/logs/runtime/*
4. Send the omni_evt.tar.gz and utu_runtime.tar.gz files to TOMIA customer support for review.

Dec ID: 4080521
Hex ID: 3E4389
Severity: Critical, Major, Minor, Warning
Alarm Text: <Process Name> HAS <memory size of process in KB> SIZE
Threshold: Critical = 1000000 KB, Major = 900000 KB, Minor = 800000 KB, Warning = 700000 KB
Possible Causes: This alarm is generated when the memory consumption of the TCABRT application process exceeds one of the thresholds because of an unexpected process memory problem.
Impact: Partial service loss might occur when critical severity is reached.
Action: Call GSOC.


4.8 Preparing CCS Alarms Logs for Sending to GSOC


If the CCS is accessible via a web browser (HTTP):
1. Type the CCS IP address into the browser's address field.
2. Log in to the Admin UI utility as user omni, root, or any other maintainer- or expert-level user.
3. From the menu, select System Status > Application Info.
4. When the output is displayed, select Tar File and save the dumpccs.<date>.tar file on your computer.
5. Send the file to GSOC.
If you have SSH access:
1. As user omni, type: 999. The Admin UI utility starts.
2. Log in to the Admin UI utility as user omni, root, or any other maintainer- or expert-level user.
3. From the menu, select System Status > Application Info.
4. When the output is ready, exit the utility by pressing CTRL+C.
5. Send the /home/omni/dumpccs/dumpccs.<date>.out and /home/omni/dumpccs/dumpccs.<date>.tar files to GSOC.

Note: Perform this procedure on all CEs in the cluster.
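With SSH access, the two files from step 5 can be bundled into one archive per CE before they are sent to GSOC. This Bash sketch assumes that the newest dumpccs.<date>.out and dumpccs.<date>.tar files (by modification time) are the ones just produced, and that the file names contain no spaces. The function names, the DUMP_DIR and OUT_DIR overrides, and the bundle name dumpccs_<host>.tar are hypothetical.

```bash
#!/usr/bin/env bash
# Bundle this CE's most recent dumpccs output into one archive.
# DUMP_DIR and OUT_DIR are hypothetical overrides.

# Prints the newest dumpccs.<date>.<ext> file, or nothing if none exists.
# Usage: latest_dumpccs out|tar
latest_dumpccs() {
  local dir=${DUMP_DIR:-/home/omni/dumpccs} ext=$1
  # ls -t sorts newest first; assumes the file names contain no spaces.
  ls -t "$dir"/dumpccs.*."$ext" 2>/dev/null | head -n 1
}

# Creates dumpccs_<host>.tar and prints its path.
bundle_dumpccs() {
  local out tarfile bundle
  out=$(latest_dumpccs out)
  tarfile=$(latest_dumpccs tar)
  if [ -z "$out" ] || [ -z "$tarfile" ]; then
    echo "no dumpccs.<date>.out/.tar pair found" >&2
    return 1
  fi
  bundle="${OUT_DIR:-$PWD}/dumpccs_$(uname -n).tar"
  tar cf "$bundle" "$out" "$tarfile" && echo "$bundle"
}
```

Running `bundle_dumpccs` on each CE after step 4 yields one archive per CE, which keeps the output of different CEs apart.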

Proprietary and Confidential Ulticom Infrastructure Alarms

5. Ulticom Infrastructure Alarms


To view the updated list of Ulticom infrastructure alarms (181 alarms), refer to the Ulticom Events List.xls file.

Proprietary and Confidential HP Servers

6. HP Servers
This section contains the following topics:
* Introduction
* HP Server Alarms – Generic
* HP Server Alarms – Standard X733

6.1 Introduction
The server hardware alarms are based on HP Insight Management Agent capabilities. The HP Insight
Management Agent for HP ProLiant systems uses the Simple Network Management Protocol (SNMP) to
manage, monitor, and report on server fans, power supplies, LEDs, and environmental parameters.
HP Insight Management Agents run on the managed devices and monitor each device's state in depth by collecting and measuring parameters. These parameters indicate the current state of subsystems either by counting the occurrences of particular events (for example, the number of read operations performed on a disk drive) or by monitoring the state of a critical function (for example, whether the cooling fan is operating). Insight Management Agents provide access to device management data through a Web browser over the industry-standard HTTP protocol, enabling access to the data from any location with network access.
Insight Management Agents provide:
* Performance monitoring: Agents proactively monitor server performance against predefined thresholds for memory, CPUs, NICs, and logical disks.
* System control: Agents monitor over 1,000 parameters and generate alerts in the event of a fault.
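Table 25 lists threshold traps in rising/falling pairs (for example cpqMeRisingAlarm and cpqMeFallingAlarm). The Python sketch below shows how such a pair typically behaves. It assumes RMON-style hysteresis (the alarm group of RFC 2819): a rising event fires when a sample reaches the rising threshold and cannot fire again until a sample has dropped to the falling threshold, and vice versa. The class, its field names, and the choice to start armed for a rising event are illustrative assumptions, not the HP agent's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAlarm:
    """Rising/falling threshold pair with hysteresis (assumed RMON-style semantics)."""

    rising: float
    falling: float
    armed_for: str = "rising"  # which event is allowed to fire next

    def __post_init__(self) -> None:
        if self.falling >= self.rising:
            raise ValueError("falling threshold must be below rising threshold")

    def sample(self, value: float) -> Optional[str]:
        """Feed one sampled value; return "rising", "falling", or None."""
        if self.armed_for == "rising" and value >= self.rising:
            self.armed_for = "falling"  # suppress repeats until the value drops back
            return "rising"
        if self.armed_for == "falling" and value <= self.falling:
            self.armed_for = "rising"
            return "falling"
        return None
```

For example, with ThresholdAlarm(rising=90, falling=70), the samples 50, 92, 95, 80, 69, 91 produce None, "rising", None, None, "falling", "rising". The gap between the two thresholds keeps a value that hovers around one threshold from generating a flood of traps.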
HP alarms number in the hundreds. A selection of the more prevalent alarms is listed in Section 6.2 (generic system alarms, where the TOMIA proprietary OSS monitoring mediator is not installed) and Section 6.3 (X733-compliant alarms, where the TOMIA proprietary OSS monitoring mediator is installed). For the complete list and more information on Insight Management Agents for Microsoft Windows on HP ProLiant servers, refer to the vendor documentation listed in Table 24 below.
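Every OID in Table 25 has the form .1.3.6.1.4.1.232.0.N. Here 1.3.6.1.4.1.232 is the Compaq private enterprise arc used by the HP ProLiant agents, and the .0.N suffix follows the SNMPv1-to-SNMPv2 notification mapping of RFC 3584, where N is the SNMPv1 specific-trap number. The Python sketch below splits an incoming trap OID on that basis and looks N up in a small, hand-copied excerpt of Table 25. TABLE_25_EXCERPT, HpTrap, and classify_hp_trap are hypothetical names; a real deployment would load the vendor MIBs rather than a hard-coded subset.

```python
from __future__ import annotations

from typing import NamedTuple, Optional

# Compaq/HP enterprise arc plus the ".0" of the RFC 3584 trap mapping.
HP_TRAP_PREFIX = (1, 3, 6, 1, 4, 1, 232, 0)

# Hand-copied excerpt of Table 25: specific-trap number -> (alarm name, severity).
TABLE_25_EXCERPT = {
    1001: ("cpqSeCpuThresholdPassed", "Minor"),
    6003: ("cpqHeThermalTempFailed", "Critical"),
    11006: ("cpqHo2NicStatusFailed", "Major"),
    18001: ("cpqNicConnectivityRestored", "Clear / Normal"),
    18002: ("cpqNicConnectivityLost", "Major"),
}


class HpTrap(NamedTuple):
    specific_trap: int
    name: str
    severity: str


def parse_oid(oid: str) -> tuple[int, ...]:
    """Convert ".1.3.6..." or "1.3.6..." into a tuple of integer arcs."""
    return tuple(int(part) for part in oid.strip().lstrip(".").split("."))


def classify_hp_trap(oid: str) -> Optional[HpTrap]:
    """Return the Table 25 entry for an HP trap OID, or None for non-HP OIDs."""
    arcs = parse_oid(oid)
    n = len(HP_TRAP_PREFIX)
    if len(arcs) != n + 1 or arcs[:n] != HP_TRAP_PREFIX:
        return None  # not in the .1.3.6.1.4.1.232.0.N form
    specific = arcs[n]
    name, severity = TABLE_25_EXCERPT.get(specific, ("unknown", "unknown"))
    return HpTrap(specific, name, severity)
```

Comparing integer arcs instead of raw strings accepts both the leading-dot and the plain spelling of an OID, and it keeps short numbers such as 1001 from being confused with longer ones such as 10001 that share a text prefix.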

Table 24: HP Insight Management Agents Documentation

Manual | Description | PDF Filename

Installation Guide | Step-by-step instructions for installing HP Insight Management Agents, and a reference for operation and troubleshooting | HP Insight Manager v4.2 Agent Installation Guide – April 2005
User Guide | Information about using HP Insight Management Agents for servers, including details about accessing and understanding the information provided | HP Insight Management Agents User Guide
Reference Guide | Listing of the Microsoft Windows NT and Windows 2000 Event Log messages associated with SNMP traps generated by the HP Insight Management Agents | Microsoft Windows Event ID and SNMP Traps
Release Notes | Latest updates | HP Insight Manager Release Notes
Help | Help for users | HP Insight Manager Help Guide
Repair | Configuring or repairing the agents | HP Insight Manager Configure or Repair Agents

6.2 HP Server Alarms – Generic


The following table lists the alarms as they are sent by the HP system (without the TOMIA proprietary OSS
monitoring mediator).

Note: For the alarms sent by an HP system configured to integrate with the TOMIA proprietary OSS monitoring mediator, see Section 6.3.
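Several traps in Table 25 form raise/clear pairs, for example cpqHo2NicStatusFailed (11006, Major) and cpqHo2NicStatusOk (11005, Clear / Normal), or cpqNicConnectivityLost (18002) and cpqNicConnectivityRestored (18001). If you consume these generic traps directly, one way to track which alarms are still open is sketched below in Python. The CLEAR_FOR pairing is a hand-picked subset of Table 25, and the ActiveAlarms class is hypothetical. For simplicity, alarms are keyed only by host and trap number, although a real correlator would also key on the adapter or drive index carried in the trap's variable bindings.

```python
from __future__ import annotations

# Hypothetical raise -> clear pairing, using Table 25 specific-trap numbers.
CLEAR_FOR = {
    11006: 11005,  # cpqHo2NicStatusFailed -> cpqHo2NicStatusOk
    14001: 14002,  # cpqIdeDriveDegraded -> cpqIdeDriveOk
    18002: 18001,  # cpqNicConnectivityLost -> cpqNicConnectivityRestored
    18004: 18003,  # cpqNicRedundancyReduced -> cpqNicRedundancyIncreased
}
RAISE_FOR = {clear: raised for raised, clear in CLEAR_FOR.items()}


class ActiveAlarms:
    """Track open raise-type traps per host until the matching clear arrives."""

    def __init__(self) -> None:
        self._open: set[tuple[str, int]] = set()

    def handle(self, host: str, specific_trap: int) -> str:
        if specific_trap in CLEAR_FOR:
            self._open.add((host, specific_trap))
            return "raised"
        if specific_trap in RAISE_FOR:
            key = (host, RAISE_FOR[specific_trap])
            if key in self._open:
                self._open.discard(key)
                return "cleared"
            return "clear-without-raise"
        return "unpaired"  # e.g. threshold or status-change traps

    def open_alarms(self) -> list[tuple[str, int]]:
        return sorted(self._open)
```

According to Table 25, cpqNicRedundancyIncreased is not sent when a logical adapter group recovers from the Failed condition; cpqNicConnectivityRestored is sent instead. A fuller correlator might therefore also close an open cpqNicRedundancyReduced alarm when cpqNicConnectivityRestored arrives, which this sketch does not do.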
Table 25: HP Server Alarms – Generic

OID | Alarm Name | Description | Severity

.1.3.6.1.4.1.232.0.10001 | cpqMeRisingAlarm | An alarm entry has crossed its rising threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Critical
.1.3.6.1.4.1.232.0.10002 | cpqMeFallingAlarm | An alarm entry has crossed its falling threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Critical
.1.3.6.1.4.1.232.0.10003 | cpqMe2RisingAlarm | An alarm entry has crossed its rising threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Critical
.1.3.6.1.4.1.232.0.10004 | cpqMe2FallingAlarm | An alarm entry has crossed its falling threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Critical
.1.3.6.1.4.1.232.0.10005 | cpqMeRisingAlarmExtended | An alarm entry has crossed its rising threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Major
.1.3.6.1.4.1.232.0.10006 | cpqMeFallingAlarmExtended | An alarm entry has crossed its falling threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Major
.1.3.6.1.4.1.232.0.10007 | cpqMeCriticalRisingAlarmExtended | An alarm entry has crossed its critical rising threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Critical
.1.3.6.1.4.1.232.0.10008 | cpqMeCriticalFallingAlarmExtended | An alarm entry has crossed its critical falling threshold. The instances of those objects contained within the variable list are those of the alarm entry that generated this trap. | Critical
.1.3.6.1.4.1.232.0.1001 | cpqSeCpuThresholdPassed | This trap is sent when an internal CPU error threshold has been passed on a particular CPU, causing it to become degraded. | Minor
.1.3.6.1.4.1.232.0.11001 | cpqHoGenericTrap | Generic trap. | Major
.1.3.6.1.4.1.232.0.11002 | cpqHoAppErrorTrap | An application has generated an exception. Specific error information is contained in the variable cpqHoSwPerfAppErrorDesc. | Major
.1.3.6.1.4.1.232.0.11003 | cpqHo2GenericTrap | Generic trap. | Major
.1.3.6.1.4.1.232.0.11004 | cpqHo2AppErrorTrap | An application has generated an exception. Specific error information is contained in the variable cpqHoSwPerfAppErrorDesc. | Major
.1.3.6.1.4.1.232.0.11005 | cpqHo2NicStatusOk | This trap will be sent any time the status of a NIC changes to the OK condition. | Clear / Normal
.1.3.6.1.4.1.232.0.11006 | cpqHo2NicStatusFailed | This trap will be sent any time the status of a NIC changes to the Failed condition. | Major
.1.3.6.1.4.1.232.0.11007 | cpqHo2NicSwitchoverOccurred | This trap will be sent any time the configured redundant NIC becomes the active NIC. | Major
.1.3.6.1.4.1.232.0.11008 | cpqHo2NicStatusOk2 | This trap will be sent any time the status of a NIC changes to the OK condition. | Clear / Normal
.1.3.6.1.4.1.232.0.11009 | cpqHo2NicStatusFailed2 | This trap will be sent any time the status of a NIC changes to the Failed condition. | Major
.1.3.6.1.4.1.232.0.11010 | cpqHo2NicSwitchoverOccurred2 | This trap will be sent any time the configured redundant NIC becomes the active NIC. | Major
.1.3.6.1.4.1.232.0.11011 | cpqHoProcessEventTrap | A monitored process has either started or stopped running. | Major
.1.3.6.1.4.1.232.0.11015 | cpqHoCrashDumpNotEnabledTrap | This trap is sent to notify the user that crash dump is not enabled. | Warning
.1.3.6.1.4.1.232.0.11016 | cpqHoBootPagingFileTooSmallTrap | This trap is sent when the paging file size of the boot volume or the target volume of the memory dump file is too small to hold a crash dump. | Warning
.1.3.6.1.4.1.232.0.14001 | cpqIdeDriveDegraded | An IDE drive status has been set to degraded. | Critical
.1.3.6.1.4.1.232.0.14002 | cpqIdeDriveOk | An IDE drive status has been set to OK. | Clear / Normal
.1.3.6.1.4.1.232.0.14005 | cpqIdeLogicalDriveStatusChange | This trap signifies that the agent has detected a change in the status of an IDE logical drive. The variable cpqIdeLogicalDriveStatus indicates the current logical drive status. | Critical
.1.3.6.1.4.1.232.0.15003 | cpqClusterNodeDegraded | This trap will be sent any time the condition of a node in the cluster becomes degraded. | Major
.1.3.6.1.4.1.232.0.15004 | cpqClusterNodeFailed | This trap will be sent any time the condition of a node in the cluster becomes failed. | Major
.1.3.6.1.4.1.232.0.15005 | cpqClusterResourceDegraded | This trap will be sent any time the condition of a cluster resource becomes degraded. | Major
.1.3.6.1.4.1.232.0.15006 | cpqClusterResourceFailed | This trap will be sent any time the condition of a cluster resource becomes failed. | Major
.1.3.6.1.4.1.232.0.15007 | cpqClusterNetworkDegraded | This trap will be sent any time the condition of a cluster network becomes degraded. | Major
.1.3.6.1.4.1.232.0.15008 | cpqClusterNetworkFailed | This trap will be sent any time the condition of a cluster network becomes failed. | Major
.1.3.6.1.4.1.232.0.16001 | cpqFcaLogDrvStatusChange | This trap signifies that the agent has detected a change in the status of an External Array logical drive. The variable cpqFcaLogDrvStatus indicates the current logical drive status. | Critical
.1.3.6.1.4.1.232.0.16002 | cpqFcaSpareStatusChange | This trap signifies that the agent has detected a change in the status of an External Array spare drive. The variable cpqFcaSpareStatus indicates the current spare drive status. The variable cpqFcaSpareBusNumber indicates the SCSI bus number associated with this drive. | Critical
.1.3.6.1.4.1.232.0.16003 | cpqFcaPhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a physical drive. The variable cpqFcaPhyDrvStatus indicates the current physical drive status. The variable cpqFcaPhyDrvBusNumber indicates the SCSI bus number associated with this drive. | Critical
.1.3.6.1.4.1.232.0.16004 | cpqFcaAccelStatusChange | This trap signifies that the agent has detected a change in the cpqFcaAccelStatus of an Array Accelerator Cache Board. The status is represented by the variable cpqFcaAccelStatus. | Critical
.1.3.6.1.4.1.232.0.16005 | cpqFcaAccelBadDataTrap | This trap signifies that the agent has detected an Array Accelerator Cache Board that has lost battery power. If data was being stored in the accelerator memory when the system lost power, that data has been lost. | Critical
.1.3.6.1.4.1.232.0.16006 | cpqFcaAccelBatteryFailed | This trap signifies that the agent has detected a battery failure associated with the Array Accelerator Cache Board. | Critical
.1.3.6.1.4.1.232.0.16007 | cpqFcaCntlrStatusChange | This trap signifies that the agent has detected a change in the status of an External Array Controller. The variable cpqFcaCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.16014 | cpqFcaCntlrActive | This trap signifies that the Storage Agent has detected that a backup array controller in a duplexed pair has switched over to the active role. The variable cpqFcaCntlrBoxIoSlot indicates the new active controller index. | Warning
.1.3.6.1.4.1.232.0.16015 | cpqFcaHostCntlrStatusChange | This trap signifies that the Insight Agent has detected a change in the status of a Fibre Channel Host Controller. The variable cpqFcaHostCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.16016 | cpqFca2PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a physical drive. The variable cpqFcaPhyDrvStatus indicates the current physical drive status. | Critical
.1.3.6.1.4.1.232.0.16017 | cpqFca2AccelStatusChange | This trap signifies that the agent has detected a change in the status of an Array Accelerator Cache Board. The status is represented by the variable cpqFcaAccelStatus. | Critical
.1.3.6.1.4.1.232.0.16018 | cpqFca2AccelBadDataTrap | This trap signifies that the agent has detected an Array Accelerator Cache Board that has lost battery power. If data was being stored in the accelerator memory when the system lost power, that data has been lost. | Critical
.1.3.6.1.4.1.232.0.16019 | cpqFca2AccelBatteryFailed | This trap signifies that the agent has detected a battery failure associated with the Array Accelerator Cache Board. | Critical
.1.3.6.1.4.1.232.0.16020 | cpqFca2CntlrStatusChange | This trap signifies that the agent has detected a change in the status of an External Array Controller. The variable cpqFcaCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.16021 | cpqFca2HostCntlrStatusChange | This trap signifies that the agent has detected a change in the status of a Fibre Channel Host Controller. The variable cpqFcaHostCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.16022 | cpqExtArrayLogDrvStatusChange | This trap signifies that the agent has detected a change in the status of an External Array logical drive. The variable cpqFcaLogDrvStatus indicates the current logical drive status. | Critical
.1.3.6.1.4.1.232.0.16028 | cpqFca3HostCntlrStatusChange | This trap signifies that the agent has detected a change in the status of a Fibre Channel Host Controller. The variable cpqFcaHostCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.18001 | cpqNicConnectivityRestored | This trap will be sent any time connectivity is restored to a logical adapter. This occurs when the physical adapter in a single adapter configuration returns to the OK condition or at least one physical adapter in a logical adapter group returns to the OK condition. This can be caused by replacement of a faulty cable or re-attaching a cable that was unplugged. | Clear / Normal
.1.3.6.1.4.1.232.0.18002 | cpqNicConnectivityLost | This trap will be sent any time the status of a logical adapter changes to the Failed condition. This occurs when the adapter in a single adapter configuration fails, or when the last adapter in a redundant configuration fails. This can be caused by loss of link due to a cable being removed from the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures can also cause this condition. | Major
.1.3.6.1.4.1.232.0.18003 | cpqNicRedundancyIncreased | This trap will be sent any time a previously failed physical adapter in a connected logical adapter group returns to the OK condition. This trap is not sent when a logical adapter group has connectivity restored from a Failed condition. The cpqNicConnectivityRestored trap is sent instead. This can be caused by replacement of a faulty cable or re-attaching a cable that was unplugged. | Clear / Normal
.1.3.6.1.4.1.232.0.18004 | cpqNicRedundancyReduced | This trap will be sent any time a physical adapter in a logical adapter group changes to the Failed condition, but at least one physical adapter remains in the OK condition. This can be caused by loss of link due to a cable being removed from the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures can also cause this condition. | Major
.1.3.6.1.4.1.232.0.18005 | cpqNic2ConnectivityRestored | This trap will be sent any time connectivity is restored to a logical adapter. This occurs when the physical adapter in a single adapter configuration returns to the OK condition or at least one physical adapter in a logical adapter group returns to the OK condition. This can be caused by replacement of a faulty cable or re-attaching a cable that was unplugged. | Clear / Normal
.1.3.6.1.4.1.232.0.18006 | cpqNic2ConnectivityLost | This trap will be sent any time the status of a logical adapter changes to the Failed condition. This occurs when the adapter in a single adapter configuration fails, or when the last adapter in a redundant configuration fails. This can be caused by loss of link due to a cable being removed from the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures can also cause this condition. | Major
.1.3.6.1.4.1.232.0.18007 | cpqNic2RedundancyIncreased | This trap will be sent any time a previously failed physical adapter in a connected logical adapter group returns to the OK condition. This trap is not sent when a logical adapter group has connectivity restored from a Failed condition. The cpqNicConnectivityRestored trap is sent instead. This can be caused by replacement of a faulty cable or re-attaching a cable that was unplugged. | Clear / Normal
.1.3.6.1.4.1.232.0.18008 | cpqNic2RedundancyReduced | This trap will be sent any time a physical adapter in a logical adapter group changes to the Failed condition, but at least one physical adapter remains in the OK condition. This can be caused by loss of link due to a cable being removed from the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures can also cause this condition. | Major
.1.3.6.1.4.1.232.0.18009 | cpqNicVirusLikeActivityDetected | This trap will be sent when the Virus Throttle Filter Driver detects virus-like activity. | Major
.1.3.6.1.4.1.232.0.18010 | cpqNicVirusLikeActivityStopped | This trap will be sent when the Virus Throttle Filter Driver no longer detects virus-like activity. | Clear / Normal
.1.3.6.1.4.1.232.0.18011 | cpqNic3ConnectivityRestored | This trap will be sent any time connectivity is restored to a logical adapter. | Clear / Normal
.1.3.6.1.4.1.232.0.18012 | cpqNic3ConnectivityLost | This trap will be sent any time the status of a logical adapter changes to the Failed condition. | Major
.1.3.6.1.4.1.232.0.18013 | cpqNic3RedundancyIncreased | This trap will be sent any time a previously failed physical adapter in a connected logical adapter group returns to the OK condition. | Clear / Normal
.1.3.6.1.4.1.232.0.18014 | cpqNic3RedundancyReduced | This trap will be sent any time a physical adapter in a logical adapter group changes to the Failed condition, but at least one physical adapter remains in the OK condition. | Major
.1.3.6.1.4.1.232.0.19001 | cpqOsCpuTimeDegraded | The Processor Time performance property is set to degraded. | Critical
.1.3.6.1.4.1.232.0.19002 | cpqOsCpuTimeFailed | The Processor Time performance property is set to Critical. | Critical
.1.3.6.1.4.1.232.0.19003 | cpqOsCacheCopyReadHitsDegraded | The Cache CopyReadHits performance property is set to degraded. | Critical
.1.3.6.1.4.1.232.0.19004 | cpqOsCacheCopyReadHitsFailed | The Cache CopyReadHits performance property is set to Critical. | Critical
.1.3.6.1.4.1.232.0.19005 | cpqOsPageFileUsageDegraded | The PagingFile Usage performance property is set to degraded. | Critical
.1.3.6.1.4.1.232.0.19006 | cpqOsPageFileUsageFailed | The PagingFile Usage performance property is set to Critical. | Critical
.1.3.6.1.4.1.232.0.19007 | cpqOsLogicalDiskBusyTimeDegraded | The LogicalDisk BusyTime performance property is set to degraded. | Critical
.1.3.6.1.4.1.232.0.19008 | cpqOsLogicalDiskBusyTimeFailed | The LogicalDisk BusyTime performance property is set to Critical. | Critical
.1.3.6.1.4.1.232.0.2001 | cpqSiHoodRemoved | The hood status has been set to removed. The system's hood is not in a properly installed state. This situation may result in improper cooling of the system due to airflow changes caused by the missing hood. | Major
.1.3.6.1.4.1.232.0.2008 | cpqSiHotPlugSlotBoardRemoved | Hot Plug Slot Board Removed. A Hot Plug Slot Board has been removed from the specified chassis and slot. | Warning
.1.3.6.1.4.1.232.0.2009 | cpqSiHotPlugSlotBoardInserted | A Hot Plug Slot Board has been inserted into the specified chassis and slot. | Clear / Normal
.1.3.6.1.4.1.232.0.3001 | cpqDa2LogDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array logical drive. The variable cpqDaLogDrvStatus indicates the current logical drive status. | Critical
.1.3.6.1.4.1.232.0.3002 | cpqDa2SpareStatusChange | This trap signifies that the agent has detected a change in the status of a drive array spare drive. The variable cpqDaSpareStatus indicates the current spare drive status. The variable cpqDaSpareBusNumber indicates the SCSI bus number associated with this drive. | Critical
.1.3.6.1.4.1.232.0.3003 | cpqDa2PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array physical drive. The variable cpqDaPhyDrvStatus indicates the current physical drive status. The variable cpqDaPhyDrvBusNumber indicates the SCSI bus number associated with this drive. | Critical
.1.3.6.1.4.1.232.0.3004 | cpqDa2PhyDrvThreshPassedTrap | This trap signifies that the agent has detected that a factory threshold associated with one of the physical drive objects on a drive array has been exceeded. The variable cpqDaPhyDrvBusNumber indicates the SCSI bus number associated with the drive. | Critical
.1.3.6.1.4.1.232.0.3005 | cpqDa2AccelStatusChange | This trap signifies that the Insight Agent has detected a change in the cpqDaAccelStatus of an array accelerator cache. The status is represented by the variable cpqDaAccelStatus. | Critical
.1.3.6.1.4.1.232.0.3006 | cpqDa2AccelBadDataTrap | This trap signifies that the agent has detected an array accelerator cache board that has lost battery power. If data was being stored in the accelerator memory when the server lost power, that data has been lost. | Critical
.1.3.6.1.4.1.232.0.3007 | cpqDa2AccelBatteryFailed | This trap signifies that the agent has detected a battery failure associated with the array accelerator cache board. The current battery status is indicated by the cpqDaAccelBattery variable. | Critical
.1.3.6.1.4.1.232.0.3008 | cpqDa3LogDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array logical drive. The variable cpqDaLogDrvStatus indicates the current logical drive status. | Critical
.1.3.6.1.4.1.232.0.3009 | cpqDa3SpareStatusChange | This trap signifies that the agent has detected a change in the status of a drive array spare drive. The variable cpqDaSpareStatus indicates the current spare drive status. The variable cpqDaSpareBusNumber indicates the SCSI bus number associated with this drive. | Critical
.1.3.6.1.4.1.232.0.3010 | cpqDa3PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array physical drive. The variable cpqDaPhyDrvStatus indicates the current physical drive status. The variable cpqDaPhyDrvBusNumber indicates the SCSI bus number associated with this drive. | Critical
.1.3.6.1.4.1.232.0.3011 | cpqDa3PhyDrvThreshPassedTrap | This trap signifies that the agent has detected that a factory threshold associated with one of the physical drive objects on a drive array has been exceeded. The variable cpqDaPhyDrvBusNumber indicates the SCSI bus number associated with the drive. | Critical
.1.3.6.1.4.1.232.0.3012 | cpqDa3AccelStatusChange | This trap signifies that the agent has detected a change in the cpqDaAccelStatus of an array accelerator cache board. The status is represented by the variable cpqDaAccelStatus. | Critical
.1.3.6.1.4.1.232.0.3013 | cpqDa3AccelBadDataTrap | This trap signifies that the agent has detected an array accelerator cache board that has lost battery power. If data was being stored in the accelerator memory when the server lost power, that data has been lost. | Critical
.1.3.6.1.4.1.232.0.3014 | cpqDa3AccelBatteryFailed | This trap signifies that the agent has detected a battery failure associated with the array accelerator cache board. The current battery status is indicated by the cpqDaAccelBattery variable. | Critical
.1.3.6.1.4.1.232.0.3015 | cpqDaCntlrStatusChange | This trap signifies that the agent has detected a change in the status of a drive array controller. The variable cpqDaCntlrBoardStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.3016 | cpqDaCntlrActive | This trap signifies that the agent has detected that a backup array controller in a duplexed pair has switched over to the active role. The variable cpqDaCntlrSlot indicates the active controller slot and cpqDaCntlrPartnerSlot indicates the backup. | Warning
.1.3.6.1.4.1.232.0.3017 | cpqDa4SpareStatusChange | This trap signifies that the agent has detected a change in the status of a drive array spare drive. The variable cpqDaSpareStatus indicates the current spare drive status. | Critical
.1.3.6.1.4.1.232.0.3018 | cpqDa4PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array physical drive. The variable cpqDaPhyDrvStatus indicates the current physical drive status. | Critical
.1.3.6.1.4.1.232.0.3019 | cpqDa4PhyDrvThreshPassedTrap | This trap signifies that the agent has detected that a factory threshold associated with one of the physical drive objects on a drive array has been exceeded. | Critical
.1.3.6.1.4.1.232.0.3022 | cpqDaTapeDriveStatusChange | This trap signifies that the agent has detected a change in the status of a tape drive. The variable cpqDaTapeDrvStatus indicates the current tape status. The variable cpqDaTapeDrvScsiIdIndex indicates the SCSI ID of the tape drive. | Critical
.1.3.6.1.4.1.232.0.3023 | cpqDaTapeDriveCleaningRequired | The agent has detected a tape drive that needs to have a cleaning tape inserted and run. This will cause the tape drive heads to be cleaned. | Major
.1.3.6.1.4.1.232.0.3025 | cpqDa5AccelStatusChange | This trap signifies that the agent has detected a change in the status of an array accelerator cache board. The status is represented by the variable cpqDaAccelStatus. | Critical
.1.3.6.1.4.1.232.0.3026 | cpqDa5AccelBadDataTrap | This trap signifies that the agent has detected an array accelerator cache board that has lost battery power. If data was being stored in the accelerator cache memory when the server lost power, that data has been lost. | Critical
.1.3.6.1.4.1.232.0.3027 | cpqDa5AccelBatteryFailed | This trap signifies that the agent has detected a battery failure associated with the array accelerator cache board. | Critical
.1.3.6.1.4.1.232.0.3028 | cpqDa5CntlrStatusChange | This trap signifies that the agent has detected a change in the status of a drive array controller. The variable cpqDaCntlrBoardStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.3029 | cpqDa5PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array physical drive. The variable cpqDaPhyDrvStatus indicates the current physical drive status. | Critical
.1.3.6.1.4.1.232.0.3030 | cpqDa5PhyDrvThreshPassedTrap | This trap signifies that the agent has detected that a factory threshold associated with one of the physical drive objects on a drive array has been exceeded. | Critical
.1.3.6.1.4.1.232.0.3032 | cpqDa2TapeDriveStatusChange | This trap signifies that the agent has detected a change in the status of a tape drive. The variable cpqDaTapeDrvStatus indicates the current tape status. The variable cpqDaTapeDrvScsiIdIndex indicates the SCSI ID of the tape drive. | Critical
.1.3.6.1.4.1.232.0.3033 | cpqDa6CntlrStatusChange | This trap signifies that the agent has detected a change in the status of a drive array controller. The variable cpqDaCntlrBoardStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.3034 | cpqDa6LogDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array logical drive. The variable cpqDaLogDrvStatus indicates the current logical drive status. | Critical
.1.3.6.1.4.1.232.0.3035 | cpqDa6SpareStatusChange | This trap signifies that the agent has detected a change in the status of a drive array spare drive. The variable cpqDaSpareStatus indicates the current spare drive status. | Critical
.1.3.6.1.4.1.232.0.3036 | cpqDa6PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array physical drive. The variable cpqDaPhyDrvStatus indicates the current physical drive status. | Critical
.1.3.6.1.4.1.232.0.3037 | cpqDa6PhyDrvThreshPassedTrap | This trap signifies that the agent has detected that a factory threshold associated with one of the physical drive objects on a drive array has been exceeded. | Critical
.1.3.6.1.4.1.232.0.3038 | cpqDa6AccelStatusChange | This trap signifies that the agent has detected a change in the status of an array accelerator cache board. The status is represented by the variable cpqDaAccelStatus. | Critical
.1.3.6.1.4.1.232.0.3039 | cpqDa6AccelBadDataTrap | This trap signifies that the agent has detected an array accelerator cache board that has lost battery power. If data was being stored in the accelerator cache memory when the server lost power, that data has been lost. | Critical
.1.3.6.1.4.1.232.0.3040 | cpqDa6AccelBatteryFailed | This trap signifies that the agent has detected a battery failure associated with the array accelerator cache board. | Critical
.1.3.6.1.4.1.232.0.3043 | cpqDa6TapeDriveStatusChange | This trap signifies that the agent has detected a change in the status of a tape drive. The variable cpqDaTapeDrvStatus indicates the current tape status. The variable cpqDaTapeDrvScsiIdIndex indicates the SCSI ID of the tape drive. | Critical
.1.3.6.1.4.1.232.0.3044 | cpqDa6TapeDriveCleaningRequired | The agent has detected a tape drive that needs to have a cleaning tape inserted and run. This will cause the tape drive heads to be cleaned. | Major
.1.3.6.1.4.1.232.0.3046 | cpqDa7PhyDrvStatusChange | This trap signifies that the agent has detected a change in the status of a drive array physical drive. The variable cpqDaPhyDrvStatus indicates the current physical drive status. | Critical
.1.3.6.1.4.1.232.0.3047 | cpqDa7SpareStatusChange | This trap signifies that the agent has detected a change in the status of a drive array spare drive. The variable cpqDaSpareStatus indicates the current spare drive status. | Critical
.1.3.6.1.4.1.232.0.5001 | cpqScsi2CntlrStatusChange | The agent has detected a change in the controller status of a SCSI Controller. The variable cpqScsiCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.5002 | cpqScsi2LogDrvStatusChange | The agent has detected a change in the Logical Drive Status of a SCSI logical drive. The current logical drive status is indicated by the cpqScsiLogDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5003 | cpqScsi2PhyDrvStatusChange | The agent has detected a change in the status of a SCSI physical drive. The current physical drive status is indicated in the cpqScsiPhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5004 | cpqTapePhyDrvStatusChange | The agent has detected a change in the status of a Tape drive. The current physical drive status is indicated in the cpqTapePhyDrvCondition variable. | Critical
.1.3.6.1.4.1.232.0.5005 | cpqScsi3CntlrStatusChange | The Insight Agent has detected a change in the controller status of a SCSI Controller. The variable cpqScsiCntlrStatus indicates the current controller status. | Critical
.1.3.6.1.4.1.232.0.5006 | cpqScsi3PhyDrvStatusChange | The agent has detected a change in the status of a SCSI physical drive. The current physical drive status is indicated in the cpqScsiPhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5007 | cpqTape3PhyDrvStatusChange | The agent has detected a change in the status of a Tape drive. The current physical drive status is indicated in the cpqTapePhyDrvCondition variable. | Critical
.1.3.6.1.4.1.232.0.5008 | cpqTape3PhyDrvCleaningRequired | The agent has detected a tape drive that needs to have a cleaning tape inserted and run. This will cause the tape drive heads to be cleaned. | Major
.1.3.6.1.4.1.232.0.5009 | cpqTape3PhyDrvCleanTapeReplace | The agent has detected that an autoloader tape unit has a cleaning tape that has been fully used and therefore needs to be replaced with a new cleaning tape. | Major
.1.3.6.1.4.1.232.0.5016 | cpqTape4PhyDrvStatusChange | The Storage Agent has detected a change in the status of a Tape drive. The current physical drive status is indicated in the cpqTapePhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5017 | cpqScsi4PhyDrvStatusChange | The Storage Agent has detected a change in the status of a SCSI physical drive. The current physical drive status is indicated in the cpqScsiPhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5019 | cpqTape5PhyDrvStatusChange | The Storage Agent has detected a change in the status of a tape drive. The current physical drive status is indicated in the cpqTapePhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5020 | cpqScsi5PhyDrvStatusChange | The Storage Agent has detected a change in the status of a SCSI physical drive. The current physical drive status is indicated in the cpqScsiPhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5021 | cpqScsi3LogDrvStatusChange | The Storage Agent has detected a change in the status of a SCSI logical drive. The current logical drive status is indicated in the cpqScsiLogDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5022 | cpqSasPhyDrvStatusChange | The Storage Agent has detected a change in the status of a SAS or SATA physical drive. The current physical drive status is indicated in the cpqSasPhyDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5023 | cpqSasLogDrvStatusChange | The Storage Agent has detected a change in the status of a SAS or SATA logical drive. The current logical drive status is indicated in the cpqSasLogDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.5024 | cpqSasTapeDrvStatusChange | The Storage Agent has detected a change in the status of a SAS tape drive. The current physical drive status is indicated in the cpqSasTapeDrvStatus variable. | Critical
.1.3.6.1.4.1.232.0.6001 | cpqHe2CorrectableMemoryError | The error has been corrected. The current number of correctable memory errors is reported in the variable cpqHeCorrMemTotalErrs. | Minor
.1.3.6.1.4.1.232.0.6002 | cpqHe2CorrectableMemoryLogDisabled | The frequency of errors is so high that the error tracking logic has been temporarily disabled. | Critical
.1.3.6.1.4.1.232.0.6003 | cpqHeThermalTempFailed | The temperature status has been set to fail. The system will be shut down due to this thermal condition. | Critical

.1.3.6.1.4.1.232.0.6004 cpqHeThermalTempDegraded The temperature status has been set to degraded. The server's temperature is outside of the normal operating range. Critical
.1.3.6.1.4.1.232.0.6005 cpqHeThermalTempOk The temperature status has been set to OK. The server's temperature has returned to the normal operating range. Clear / Normal
.1.3.6.1.4.1.232.0.6006 cpqHeThermalSystemFanFailed The system fan status has been set to fail. A required system fan is not operating normally. Critical
.1.3.6.1.4.1.232.0.6007 cpqHeThermalSystemFanDegraded The system fan status has been set to degraded. An optional system fan is not operating normally. Critical
.1.3.6.1.4.1.232.0.6008 cpqHeThermalSystemFanOk The system fan status has been set to OK. Any previously non-operational system fans have returned to normal operation. Clear / Normal
.1.3.6.1.4.1.232.0.6009 cpqHeThermalCpuFanFailed The CPU fan status has been set to fail. A processor fan is not operating normally. The server will be shut down. Critical
.1.3.6.1.4.1.232.0.6010 cpqHeThermalCpuFanOk The CPU fan status has been set to OK. Any previously non-operational processor fans have returned to normal operation. Clear / Normal
.1.3.6.1.4.1.232.0.6011 cpqHeAsrConfirmation The server was shut down by the Automatic Server Recovery (ASR) feature. It has become operational again. Minor
.1.3.6.1.4.1.232.0.6012 cpqHeThermalConfirmation The server was shut down due to a thermal anomaly. It has become operational again. Minor
.1.3.6.1.4.1.232.0.6013 cpqHePostError One or more POST errors occurred. Power On Self-Test (POST) errors occur during the server restart process. Minor
.1.3.6.1.4.1.232.0.6014 cpqHeFltTolPwrSupplyDegraded The fault tolerant power supply sub-system condition has been set to degraded. Critical
.1.3.6.1.4.1.232.0.6015 cpqHe3CorrectableMemoryError A correctable memory error occurred. The error has been corrected. The current number of correctable memory errors is reported in the variable cpqHeCorrMemTotalErrs. Minor
.1.3.6.1.4.1.232.0.6016 cpqHe3CorrectableMemoryLogDisabled The frequency of errors is so high that the error tracking logic has been temporarily disabled. Critical
.1.3.6.1.4.1.232.0.6017 cpqHe3ThermalTempFailed The temperature status has been set to fail. The system will be shut down due to this thermal condition. Critical
.1.3.6.1.4.1.232.0.6018 cpqHe3ThermalTempDegraded The server's temperature is outside of the normal operating range. The server will be shut down. Critical
.1.3.6.1.4.1.232.0.6019 cpqHe3ThermalTempOk The server's temperature has returned to the normal operating range. Clear / Normal
.1.3.6.1.4.1.232.0.6020 cpqHe3ThermalSystemFanFailed The system fan status has been set to fail. A required system fan is not operating normally. The system will be shut down. Critical
.1.3.6.1.4.1.232.0.6021 cpqHe3ThermalSystemFanDegraded The system fan status has been set to degraded. An optional system fan is not operating normally. Critical
.1.3.6.1.4.1.232.0.6022 cpqHe3ThermalSystemFanOk The system fan status has been set to OK. Any previously non-operational system fans have returned to normal operation. Clear / Normal
.1.3.6.1.4.1.232.0.6023 cpqHe3ThermalCpuFanFailed A processor fan is not operating normally. The server will be shut down. Critical
.1.3.6.1.4.1.232.0.6024 cpqHe3ThermalCpuFanOk Any previously non-operational processor fans have returned to normal operation. Clear / Normal
.1.3.6.1.4.1.232.0.6025 cpqHe3AsrConfirmation The server was shut down by the Automatic Server Recovery (ASR) feature. It has become operational again. Minor
.1.3.6.1.4.1.232.0.6026 cpqHe3ThermalConfirmation The server was shut down due to a thermal anomaly. It has become operational again. Minor
.1.3.6.1.4.1.232.0.6027 cpqHe3PostError One or more POST errors occurred. Power On Self-Test (POST) errors occur during the server restart process. Minor
.1.3.6.1.4.1.232.0.6028 cpqHe3FltTolPwrSupplyDegraded The fault tolerant power supply sub-system condition has been set to degraded. Critical
.1.3.6.1.4.1.232.0.6029 cpqHe3CorrMemReplaceMemModule The errors have been corrected, but the memory module should be replaced. Minor
.1.3.6.1.4.1.232.0.6032 cpqHe3FltTolPowerRedundancyLost The Fault Tolerant Power Supplies have lost redundancy for the specified chassis. Critical
.1.3.6.1.4.1.232.0.6033 cpqHe3FltTolPowerSupplyInserted A Fault Tolerant Power Supply has been inserted into the specified chassis and bay location. Clear / Normal
.1.3.6.1.4.1.232.0.6034 cpqHe3FltTolPowerSupplyRemoved A Fault Tolerant Power Supply has been removed from the specified chassis and bay location. Major
.1.3.6.1.4.1.232.0.6035 cpqHe3FltTolFanDegraded The Fault Tolerant Fan condition has been set to degraded for the specified chassis and fan. Critical
.1.3.6.1.4.1.232.0.6036 cpqHe3FltTolFanFailed The Fault Tolerant Fan condition has been set to fail for the specified chassis and fan. Critical
.1.3.6.1.4.1.232.0.6037 cpqHe3FltTolFanRedundancyLost The Fault Tolerant Fans have lost redundancy for the specified chassis. Critical
.1.3.6.1.4.1.232.0.6038 cpqHe3FltTolFanInserted A Fault Tolerant Fan has been inserted into the specified chassis and fan location. Clear / Normal
.1.3.6.1.4.1.232.0.6039 cpqHe3FltTolFanRemoved A Fault Tolerant Fan has been removed from the specified chassis and fan location. Major
.1.3.6.1.4.1.232.0.6040 cpqHe3TemperatureFailed The temperature status has been set to fail in the specified chassis and location. The system will be shut down due to this condition. Critical
.1.3.6.1.4.1.232.0.6041 cpqHe3TemperatureDegraded The server's temperature is outside of the normal operating range. The server will be shut down. Critical
.1.3.6.1.4.1.232.0.6042 cpqHe3TemperatureOk The server's temperature has returned to the normal operating range. Clear / Normal
.1.3.6.1.4.1.232.0.6043 cpqHe3PowerConverterDegraded The DC-DC Power Converter condition has been set to degraded for the specified chassis, slot and socket. Critical
.1.3.6.1.4.1.232.0.6044 cpqHe3PowerConverterFailed The DC-DC Power Converter condition has been set to fail for the specified chassis, slot and socket. Critical
.1.3.6.1.4.1.232.0.6045 cpqHe3PowerConverterRedundancyLost The DC-DC Power Converters have lost redundancy for the specified chassis. Critical
.1.3.6.1.4.1.232.0.6046 cpqHe3CacheAccelParityError A cache accelerator parity error indicates a cache module needs to be replaced. Critical
.1.3.6.1.4.1.232.0.6047 cpqHeResilientMemOnlineSpareEngaged The Advanced Memory Protection subsystem has detected a memory fault. The Online Spare Memory has been activated. Major
.1.3.6.1.4.1.232.0.6048 cpqHe4FltTolPowerSupplyOk The fault tolerant power supply condition has been set back to the OK state for the specified chassis and bay location. Clear / Normal
.1.3.6.1.4.1.232.0.6049 cpqHe4FltTolPowerSupplyDegraded The fault tolerant power supply condition has been set to degraded for the specified chassis and bay location. Critical
.1.3.6.1.4.1.232.0.6050 cpqHe4FltTolPowerSupplyFailed The fault tolerant power supply condition has been set to fail for the specified chassis and bay location. Critical
.1.3.6.1.4.1.232.0.6051 cpqHeResilientMemMirroredMemoryEngaged The Advanced Memory Protection subsystem has detected a memory fault. Mirrored Memory has been activated. Major
.1.3.6.1.4.1.232.0.6052 cpqHeResilientAdvancedECCMemoryEngaged The Advanced Memory Protection subsystem has detected a memory fault. Advanced ECC has been activated. Major
.1.3.6.1.4.1.232.0.6053 cpqHeResilientMemXorMemoryEngaged The Advanced Memory Protection subsystem has detected a memory fault. The XOR engine has been activated. Major
.1.3.6.1.4.1.232.0.6054 cpqHe3FltTolPowerRedundancyRestored The Fault Tolerant Power Supplies have returned to a redundant state for the specified chassis. Clear / Normal
.1.3.6.1.4.1.232.0.6055 cpqHe3FltTolFanRedundancyRestored The Fault Tolerant Fans have returned to a redundant state for the specified chassis. Clear / Normal
.1.3.6.1.4.1.232.0.6056 cpqHe4CorrMemReplaceMemModule The errors have been corrected, but the memory module should be replaced. Minor
.1.3.6.1.4.1.232.0.6057 cpqHeResMemBoardRemoved An Advanced Memory Protection sub-system board or cartridge has been removed from the system. Warning
.1.3.6.1.4.1.232.0.6058 cpqHeResMemBoardInserted An Advanced Memory Protection sub-system board or cartridge has been inserted into the system. Clear / Normal
.1.3.6.1.4.1.232.0.6059 cpqHeResMemBoardBusError An Advanced Memory Protection sub-system board or cartridge bus error has been detected. Critical
.1.3.6.1.4.1.232.0.6061 cpqHeManagementProcInReset The management processor is currently being reset because of a firmware update or some other event. Minor
.1.3.6.1.4.1.232.0.6062 cpqHeManagementProcReady The management processor has successfully reset and is now available again. Clear / Normal
.1.3.6.1.4.1.232.0.6063 cpqHeManagementProcFailedReset The management processor was not successfully reset and is not operational. Critical
.1.3.6.1.4.1.232.0.6069 cpqHe4FltTolPowerSupplyACpowerloss The fault tolerant power supply AC condition has been set to “failed” for the specified chassis and bay location. Critical
.1.3.6.1.4.1.232.0.8001 cpqSs2FanStatusChange The agent has detected a change in the Fan Status of a storage system. The variable cpqSsBoxFanStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8002 cpqSsTempFailed The agent has detected that a temperature status has been set to fail. The storage system will be shut down. Critical
.1.3.6.1.4.1.232.0.8003 cpqSsTempDegraded The agent has detected a temperature status that has been set to degraded. The storage system's temperature is outside of the normal operating range. Major
.1.3.6.1.4.1.232.0.8004 cpqSsTempOk The temperature status has been set to OK. The storage system's temperature has returned to the normal operating range. Clear / Normal
.1.3.6.1.4.1.232.0.8005 cpqSsSidePanelInPlace The side panel status has been set to in place. The storage system's side panel has returned to a properly installed state. Clear / Normal
.1.3.6.1.4.1.232.0.8006 cpqSsSidePanelRemoved The side panel status has been set to removed. The storage system's side panel is not in a properly installed state. This situation may result in improper cooling of the drives in the storage system due to airflow changes caused by the missing side panel. Major
.1.3.6.1.4.1.232.0.8007 cpqSsPwrSupplyDegraded A storage system power supply status has been set to degraded. Critical
.1.3.6.1.4.1.232.0.8008 cpqSs3FanStatusChange The agent has detected a change in the Fan Status of a storage system. The variable cpqSsBoxFanStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8009 cpqSs3TempFailed The agent has detected that a temperature status has been set to fail. The storage system will be shut down. Critical
.1.3.6.1.4.1.232.0.8010 cpqSs3TempDegraded The agent has detected a temperature status that has been set to degraded. The storage system's temperature is outside of the normal operating range. Major
.1.3.6.1.4.1.232.0.8011 cpqSs3TempOk The temperature status has been set to OK. The storage system's temperature has returned to the normal operating range. It may be reactivated by the administrator. Clear / Normal
.1.3.6.1.4.1.232.0.8012 cpqSs3SidePanelInPlace The side panel status has been set to in place. The storage system's side panel has returned to a properly installed state. Clear / Normal
.1.3.6.1.4.1.232.0.8013 cpqSs3SidePanelRemoved The side panel status has been set to removed. The storage system's side panel is not in a properly installed state. This situation may result in improper cooling of the drives in the storage system due to airflow changes caused by the missing side panel. Major
.1.3.6.1.4.1.232.0.8014 cpqSs3PwrSupplyDegraded A storage system power supply status has been set to degraded. Critical
.1.3.6.1.4.1.232.0.8015 cpqSs4PwrSupplyDegraded A storage system power supply status has been set to degraded. Critical
.1.3.6.1.4.1.232.0.8016 cpqSsExFanStatusChange The agent has detected a change in the Fan Module Status of a storage system. The variable cpqSsFanModuleStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8017 cpqSsExPowerSupplyStatusChange The agent has detected a change in the power supply status of a storage system. The variable cpqSsPowerSupplyStatus indicates the status. Critical
.1.3.6.1.4.1.232.0.8018 cpqSsExPowerSupplyUpsStatusChange The agent has detected a change in the status of a UPS attached to a storage system power supply. The variable cpqSsPowerSupplyUpsStatus indicates the status. Critical
.1.3.6.1.4.1.232.0.8019 cpqSsExTempSensorStatusChange The agent has detected a change in the status of a storage system temperature sensor. The variable cpqSsTempSensorStatus indicates the status. Critical
.1.3.6.1.4.1.232.0.8020 cpqSsEx2FanStatusChange The agent has detected a change in the fan module status of a storage system. The variable cpqSsFanModuleStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8021 cpqSsEx2PowerSupplyStatusChange The agent has detected a change in the power supply status of a storage system. The variable cpqSsPowerSupplyStatus indicates the status. Critical
.1.3.6.1.4.1.232.0.8022 cpqSsExBackplaneFanStatusChange The agent has detected a change in the fan status of a storage system. The variable cpqSsBackplaneFanStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8023 cpqSsExBackplaneTempStatusChange The agent has detected a change in the status of the temperature in a storage system. The variable cpqSsBackplaneTempStatus indicates the status. Critical
.1.3.6.1.4.1.232.0.8024 cpqSsExBackplanePowerSupplyStatusChange The agent has detected a change in the power supply status of a storage system. The variable cpqSsBackplaneFtpsStatus indicates the status. Critical
.1.3.6.1.4.1.232.0.8025 cpqSsExRecoveryServerStatusChange The agent has detected a change in the recovery server option status of a storage system. The variable cpqSsChassisRsoStatus indicates the status. Major
.1.3.6.1.4.1.232.0.8026 cpqSs5FanStatusChange The agent has detected a change in the Fan Status of a storage system. The variable cpqSsBoxFanStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8027 cpqSs5TempStatusChange The agent has detected a change in the temperature status of a storage system. The variable cpqSsBoxTempStatus indicates the current temperature status. Critical
.1.3.6.1.4.1.232.0.8028 cpqSs5PwrSupplyStatusChange The agent has detected a change in the power supply status of a storage system. The variable cpqSsBoxFltTolPwrSupplyStatus indicates the current power supply status. Critical
.1.3.6.1.4.1.232.0.8029 cpqSs6FanStatusChange The agent has detected a change in the Fan Status of a storage system. The variable cpqSsBoxFanStatus indicates the current fan status. Critical
.1.3.6.1.4.1.232.0.8030 cpqSs6TempStatusChange The agent has detected a change in the temperature status of a storage system. The variable cpqSsBoxTempStatus indicates the current temperature status. Critical
.1.3.6.1.4.1.232.0.8031 cpqSs6PwrSupplyStatusChange The agent has detected a change in the power supply status of a storage system. The variable cpqSsBoxFltTolPwrSupplyStatus indicates the current power supply status. Critical
.1.3.6.1.4.1.232.0.8032 cpqSsConnectionStatusChange Storage system connection status change. The agent has detected a change in the connection status of a storage system. Major
.1.3.6.1.4.1.232.0.9001 cpqSm2ServerReset The Remote Insight/Integrated Lights-Out firmware has detected a server reset. Critical
.1.3.6.1.4.1.232.0.9002 cpqSm2ServerPowerOutage The Remote Insight/Integrated Lights-Out firmware has detected a server power failure. Critical
.1.3.6.1.4.1.232.0.9003 cpqSm2UnauthorizedLoginAttempts The Remote Insight/Integrated Lights-Out firmware has detected unauthorized login attempts. Warning
.1.3.6.1.4.1.232.0.9004 cpqSm2BatteryFailed The Remote Insight battery has failed and needs to be replaced. Critical
.1.3.6.1.4.1.232.0.9005 cpqSm2SelfTestError The Remote Insight/Integrated Lights-Out firmware has detected a Remote Insight “self-test” error. Critical
.1.3.6.1.4.1.232.0.9006 cpqSm2InterfaceError The host OS has detected an error in the Remote Insight/Integrated Lights-Out interface. The firmware is not responding. Major
.1.3.6.1.4.1.232.0.9007 cpqSm2BatteryDisconnected The Remote Insight battery cable has been disconnected. Major
.1.3.6.1.4.1.232.0.9008 cpqSm2KeyboardCableDisconnected The Remote Insight keyboard cable has been disconnected. Major
.1.3.6.1.4.1.232.0.9009 cpqSm2MouseCableDisconnected The Remote Insight mouse cable has been disconnected. Major
.1.3.6.1.4.1.232.0.9010 cpqSm2ExternalPowerCableDisconnected The Remote Insight external power cable has been disconnected. Major
.1.3.6.1.4.1.232.0.9011 cpqSm2LogsFull The Remote Insight/Integrated Lights-Out firmware has detected that the logs are full. Clear / Normal
.1.3.6.1.4.1.232.0.9012 cpqSm2SecurityOverrideEngaged The Remote Insight/Integrated Lights-Out firmware has detected that the security override jumper has been toggled to the engaged position. Clear / Normal
.1.3.6.1.4.1.232.0.9013 cpqSm2SecurityOverrideDisengaged The Remote Insight/Integrated Lights-Out firmware has detected that the security override jumper has been toggled to the disengaged position. Clear / Normal
.1.3.6.1.4.1.232.0.9015 cpqSm2NicLinkDown The Remote Insight/Integrated Lights-Out firmware has detected the loss of network link. Major
.1.3.6.1.4.1.232.0.9016 cpqSm2NicLinkUp The Remote Insight/Integrated Lights-Out firmware has detected the presence of network link. Clear / Normal
1.3.6.1.4.1.8072.4.0.1 cpqNsNotifyStart NetSNMP: Agent started. Critical
1.3.6.1.4.1.8072.4.0.2 cpqNsNotifyShutdown NetSNMP: Agent shutting down. Critical
1.3.6.1.4.1.8072.4.0.3 cpqNsNotifyRestart NetSNMP: Agent restarted. Critical
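
Every OID in the table above has the same shape: an enterprise prefix (.1.3.6.1.4.1.232 for the HP/Compaq Insight traps, 1.3.6.1.4.1.8072.4 for the NetSNMP notifications), then .0, then the trap number. For SNMPv1 traps this is the standard SNMPv1-to-SNMPv2 trap OID mapping described in RFC 3584. The following Python sketch assumes a receiver that sees the trap OID as a dotted string; it splits the OID into enterprise and trap number and looks it up in a small dictionary transcribed from a few rows of the table. The names TrapRow, HP_TRAPS, split_trap_oid and lookup are hypothetical and are not part of any TOMIA or HP tool.

```python
from typing import NamedTuple, Optional


class TrapRow(NamedTuple):
    """One row of the alarm table: alarm name and severity."""
    name: str
    severity: str


# Hypothetical lookup table: a few rows transcribed from the table above, not the full list.
HP_TRAPS = {
    ".1.3.6.1.4.1.232.0.6003": TrapRow("cpqHeThermalTempFailed", "Critical"),
    ".1.3.6.1.4.1.232.0.6005": TrapRow("cpqHeThermalTempOk", "Clear / Normal"),
    ".1.3.6.1.4.1.232.0.9015": TrapRow("cpqSm2NicLinkDown", "Major"),
}


def split_trap_oid(oid: str) -> tuple[str, int]:
    """Split '<enterprise>.0.<specific-trap>' into (enterprise, specific-trap)."""
    # The trap number is the last component, so the last '.0.' is the separator.
    enterprise, sep, specific = oid.rpartition(".0.")
    if not sep or not specific.isdigit():
        raise ValueError(f"not of the form <enterprise>.0.<n>: {oid!r}")
    return enterprise, int(specific)


def lookup(oid: str) -> Optional[TrapRow]:
    """Find a row by OID; the tables print OIDs both with and without a leading dot."""
    key = oid if oid.startswith(".") else "." + oid
    return HP_TRAPS.get(key)


if __name__ == "__main__":
    enterprise, number = split_trap_oid(".1.3.6.1.4.1.232.0.6003")
    print(enterprise, number)                 # enterprise prefix and specific-trap number
    print(lookup("1.3.6.1.4.1.232.0.9015"))   # row found despite the missing leading dot
```

Normalizing the leading dot matters here because the HP Servers table writes OIDs with a leading dot while the NetSNMP rows and the X733 table do not.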


6.3 HP Server Alarms – Standard X733


The following table lists the alarms as they are sent for a system that uses the TOMIA proprietary OSS monitoring mediator tool.
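
Because this section follows the X.733 alarm model, a consumer of these traps may want each Severity value expressed as an X.733 perceived severity. The Python sketch below assumes the integer values of the ItuPerceivedSeverity textual convention from RFC 3877 (ITU-ALARM-TC-MIB), and it assumes that the "Normal" and "Clear / Normal" labels used in these tables correspond to cleared. It is an illustration only, not the mediator's actual mapping, and to_itu_severity is a hypothetical helper.

```python
from enum import IntEnum


class ItuPerceivedSeverity(IntEnum):
    """Perceived-severity values of the ItuPerceivedSeverity TC (RFC 3877)."""
    CLEARED = 1
    INDETERMINATE = 2
    CRITICAL = 3
    MAJOR = 4
    MINOR = 5
    WARNING = 6


# Assumption: "Normal" and "Clear / Normal" in these tables mean "cleared".
_SEVERITY_BY_LABEL = {
    "critical": ItuPerceivedSeverity.CRITICAL,
    "major": ItuPerceivedSeverity.MAJOR,
    "minor": ItuPerceivedSeverity.MINOR,
    "warning": ItuPerceivedSeverity.WARNING,
    "normal": ItuPerceivedSeverity.CLEARED,
    "clear / normal": ItuPerceivedSeverity.CLEARED,
}


def to_itu_severity(label: str) -> ItuPerceivedSeverity:
    """Map a Severity column value to X.733; unknown labels become INDETERMINATE."""
    # Normalize case and spacing so "Clear/ Normal" and "Clear / Normal" match.
    key = " ".join(label.lower().replace("/", " / ").split())
    return _SEVERITY_BY_LABEL.get(key, ItuPerceivedSeverity.INDETERMINATE)


if __name__ == "__main__":
    for label in ("Critical", "Major", "Clear/ Normal", "Normal"):
        print(f"{label!r} -> {to_itu_severity(label).name}")
```

Falling back to INDETERMINATE rather than raising keeps an unexpected label from dropping an alarm; a stricter integration might prefer to reject it.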
Table 26: HP Server Alarms - X733

OID Alarm Name Description Severity

1.3.6.1.4.1.8161.200.4.0.1 shqNsNotifyStart NetSNMP: Agent started. Critical
1.3.6.1.4.1.8161.200.4.0.10001 shqMeRisingAlarm An alarm entry has crossed its rising threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Critical
1.3.6.1.4.1.8161.200.4.0.10002 shqMeFallingAlarm An alarm entry has crossed its falling threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Critical
1.3.6.1.4.1.8161.200.4.0.10003 shqMe2RisingAlarm An alarm entry has crossed its rising threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Critical
1.3.6.1.4.1.8161.200.4.0.10004 shqMe2FallingAlarm An alarm entry has crossed its falling threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Critical
1.3.6.1.4.1.8161.200.4.0.10005 shqMeRisingAlarmExtended An alarm entry has crossed its rising threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Major
1.3.6.1.4.1.8161.200.4.0.10006 shqMeFallingAlarmExtended An alarm entry has crossed its falling threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Major
1.3.6.1.4.1.8161.200.4.0.10007 shqMeCriticalRisingAlarmExtended An alarm entry has crossed its critical rising threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Critical
1.3.6.1.4.1.8161.200.4.0.10008 shqMeCriticalFallingAlarmExtended An alarm entry has crossed its critical falling threshold. The object instances contained in the variable list are those of the alarm entry that generated this trap. Critical
1.3.6.1.4.1.8161.200.4.0.1001 shqSeCpuThresholdPassed This trap is sent when an internal CPU error threshold has been passed on a particular CPU, causing it to go DEGRADED. Minor
1.3.6.1.4.1.8161.200.4.0.11001 shqHoGenericTrap Generic trap. Major
1.3.6.1.4.1.8161.200.4.0.11002 shqHoAppErrorTrap An application has generated an exception. Specific error information is contained in the variable cpqHoSwPerfAppErrorDesc. Major
1.3.6.1.4.1.8161.200.4.0.11003 shqHo2GenericTrap Generic trap. Major
1.3.6.1.4.1.8161.200.4.0.11004 shqHo2AppErrorTrap An application has generated an exception. Specific error information is contained in the variable cpqHoSwPerfAppErrorDesc. Major
1.3.6.1.4.1.8161.200.4.0.11005 shqHo2NicStatusOk This trap will be sent any time the status of a NIC changes to the OK condition. Normal
1.3.6.1.4.1.8161.200.4.0.11006 shqHo2NicStatusFailed This trap will be sent any time the status of a NIC changes to the Failed condition. Major
1.3.6.1.4.1.8161.200.4.0.11007 shqHo2NicSwitchoverOccurred This trap will be sent any time the configured redundant NIC becomes the active NIC. Major
1.3.6.1.4.1.8161.200.4.0.11008 shqHo2NicStatusOk2 This trap will be sent any time the status of a NIC changes to the OK condition. Normal
1.3.6.1.4.1.8161.200.4.0.11009 shqHo2NicStatusFailed2 This trap will be sent any time the status of a NIC changes to the Failed condition. Major
1.3.6.1.4.1.8161.200.4.0.11010 shqHo2NicSwitchoverOccurred2 This trap will be sent any time the configured redundant NIC becomes the active NIC. Major
1.3.6.1.4.1.8161.200.4.0.11011 shqHoProcessEventTrap A monitored process has either started or stopped running. Major
1.3.6.1.4.1.8161.200.4.0.11015 shqHoCrashDumpNotEnabledTrap This trap is sent to notify the user that Crash Dump is not enabled. Warning
1.3.6.1.4.1.8161.200.4.0.11016 shqHoBootPagingFileTooSmallTrap This trap is sent when the paging file size of the boot volume or the target volume of the memory dump file is too small to hold a crash dump. Warning
1.3.6.1.4.1.8161.200.4.0.14001 shqIdeDriveDegraded An IDE drive status has been set to DEGRADED. Critical
1.3.6.1.4.1.8161.200.4.0.14002 shqIdeDriveOk An IDE drive status has been set to OK. Normal
1.3.6.1.4.1.8161.200.4.0.14005 shqIdeLogicalDriveStatusChange This trap signifies that the agent has detected a change in the status of an IDE logical drive. The variable cpqIdeLogicalDriveStatus indicates the current logical drive status. Critical
1.3.6.1.4.1.8161.200.4.0.16001 shqFcaLogDrvStatusChange This trap signifies that the agent has detected a change in the status of an External Array logical drive. The variable cpqFcaLogDrvStatus indicates the current logical drive status. Critical
1.3.6.1.4.1.8161.200.4.0.16002 shqFcaSpareStatusChange This trap signifies that the agent has detected a change in the status of an External Array spare drive. The variable cpqFcaSpareStatus indicates the current spare drive status. The variable cpqFcaSpareBusNumber indicates the SCSI bus number associated with this drive. Critical
1.3.6.1.4.1.8161.200.4.0.16003 shqFcaPhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a physical drive. The variable cpqFcaPhyDrvStatus indicates the current physical drive status. The variable cpqFcaPhyDrvBusNumber indicates the SCSI bus number associated with this drive. Critical
1.3.6.1.4.1.8161.200.4.0.16004 shqFcaAccelStatusChange This trap signifies that the agent has detected a change in the status of an Array Accelerator Cache Board. The status is represented by the variable cpqFcaAccelStatus. Critical
1.3.6.1.4.1.8161.200.4.0.16005 shqFcaAccelBadDataTrap This trap signifies that the agent has detected an Array Accelerator Cache Board that has lost battery power. If data was being stored in the accelerator memory when the system lost power, that data has been lost. Critical
1.3.6.1.4.1.8161.200.4.0.16006 shqFcaAccelBatteryFailed This trap signifies that the agent has detected a battery failure associated with the Array Accelerator Cache Board. Critical
1.3.6.1.4.1.8161.200.4.0.16007 shqFcaCntlrStatusChange This trap signifies that the agent has detected a change in the status of an External Array Controller. The variable cpqFcaCntlrStatus indicates the current controller status. Critical
1.3.6.1.4.1.8161.200.4.0.16014 shqFcaCntlrActive This trap signifies that the Storage Agent has detected that a backup array controller in a duplexed pair has switched over to the active role. The variable cpqFcaCntlrBoxIoSlot indicates the new active controller index. Warning
1.3.6.1.4.1.8161.200.4.0.16015 shqFcaHostCntlrStatusChange This trap signifies that the Insight Agent has detected a change in the status of a Fibre Channel Host Controller. The variable cpqFcaHostCntlrStatus indicates the current controller status. Critical
1.3.6.1.4.1.8161.200.4.0.16016 shqFca2PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a physical drive. The variable cpqFcaPhyDrvStatus indicates the current physical drive status. Critical
1.3.6.1.4.1.8161.200.4.0.16017 shqFca2AccelStatusChange This trap signifies that the agent has detected a change in the status of an Array Accelerator Cache Board. The status is represented by the variable cpqFcaAccelStatus. Critical
1.3.6.1.4.1.8161.200.4.0.16018 shqFca2AccelBadDataTrap This trap signifies that the agent has detected an Array Accelerator Cache Board that has lost battery power. If data was being stored in the accelerator memory when the system lost power, that data has been lost. Critical
1.3.6.1.4.1.8161.200.4.0.16019 shqFca2AccelBatteryFailed This trap signifies that the agent has detected a battery failure associated with the Array Accelerator Cache Board. Critical
1.3.6.1.4.1.8161.200.4.0.16020 shqFca2CntlrStatusChange This trap signifies that the agent has detected a change in the status of an External Array Controller. The variable cpqFcaCntlrStatus indicates the current controller status. Critical
1.3.6.1.4.1.8161.200.4.0.16021 shqFca2HostCntlrStatusChange This trap signifies that the agent has detected a change in the status of a Fibre Channel Host Controller. The variable cpqFcaHostCntlrStatus indicates the current controller status. Critical
1.3.6.1.4.1.8161.200.4.0.16022 shqExtArrayLogDrvStatusChange This trap signifies that the agent has detected a change in the status of an External Array logical drive. The variable cpqFcaLogDrvStatus indicates the current logical drive status. Critical
1.3.6.1.4.1.8161.200.4.0.16028 shqFca3HostCntlrStatusChange This trap signifies that the agent has detected a change in the status of a Fibre Channel Host Controller. The variable cpqFcaHostCntlrStatus indicates the current controller status. Critical
1.3.6.1.4.1.8161.200.4.0.18001 shqNicConnectivityRestored This trap will be sent any time connectivity is restored to a logical adapter. This occurs when the physical adapter in a single adapter configuration returns to the OK condition or at least one physical adapter in a logical adapter group returns to the OK condition. This can be caused by replacement of a faulty cable or re-attaching a cable that was unplugged. Normal
1.3.6.1.4.1.8161.200.4.0.18002 shqNicConnectivityLost This trap will be sent any time the status of a logical adapter changes to the FAILED condition. This occurs when the adapter in a single adapter configuration fails, or when the last adapter in a redundant configuration fails. This can be caused by loss of link due to a cable being removed from the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures can also cause this condition. Major
1.3.6.1.4.1.8161.200.4.0.18003 shqNicRedundancyIncreased This trap will be sent any time a previously failed physical adapter in a connected logical adapter group returns to the OK condition. This can be caused by replacement of a faulty cable or re-attaching a cable that was unplugged. This trap is not sent when a logical adapter group has connectivity restored from a FAILED condition; the cpqNicConnectivityRestored trap is sent instead. Normal
1.3.6.1.4.1.8161.200.4.0.18004 shqNicRedundancyReduced This trap will be sent any time a physical adapter in a logical adapter group changes to the FAILED condition, but at least one physical adapter remains in the OK condition. This can be caused by loss of link due to a cable being removed from the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures can also cause this condition. Major
Operations and Maintenance Manual Page 107


Proprietary and Confidential HP Servers

OID Alarm Name Description Severity

1.3.6.1.4.1.8161.200.4.0.18005 shqNic2ConnectivityRestored This trap will be sent any time connectivity is restored to a logical adapter. Normal
This occurs when the physical adapter in a single adapter configuration
returns to the OK condition or at least one physical adapter in a logical
adapter group returns to the OK condition. This can be caused by
replacement of a faulty cable or re-attaching a cable that was unplugged.
1.3.6.1.4.1.8161.200.4.0.18006 shqNic2ConnectivityLost This trap will be sent any time the status of a logical adapter changes to the Major
FAILED condition. This occurs when the adapter in a single adapter
configuration fails, or when the last adapter in a redundant configuration
fails. This can be caused by loss of link due to a cable being removed from
the adapter or the Hub or Switch. Internal adapter, Hub, or Switch failures
can also cause this condition.
1.3.6.1.4.1.8161.200.4.0.18007 shqNic2RedundancyIncreased This trap will be sent any time a previously failed physical adapter in a Normal
connected logical adapter group returns to the OK condition. This trap is
not sent when a logical adapter group has connectivity restored from a
FAILED condition.
The cpqNicConnectivityRestored trap is sent instead. This can be caused
by replacement of a faulty cable or re-attaching a cable that was
unplugged.
1.3.6.1.4.1.8161.200.4.0.18008 shqNic2RedundancyReduced This trap will be sent any time a physical adapter in a logical adapter group Major
changes to the FAILED condition, but at least one physical adapter
remains in the OK condition. This can be caused by loss of link due to a
cable being removed from the adapter or the Hub or Switch. Internal
adapter, Hub, or Switch failures can also cause this condition.
1.3.6.1.4.1.8161.200.4.0.18009 shqNicVirusLikeActivityDetected This trap will be sent when the Virus Throttle Filter Driver detects virus-like Critical
activity.
1.3.6.1.4.1.8161.200.4.0.18010 shqNicVirusLikeActivityStopped This trap will be sent when the Virus Throttle Filter Driver no longer detects Normal
virus-like activity.

1.3.6.1.4.1.8161.200.4.0.18011 shqNic3ConnectivityRestored This trap will be sent any time connectivity is restored to a logical adapter. Clear/Normal
This occurs when the physical adapter in a single adapter configuration
returns to the OK condition or at least one physical adapter in a logical
adapter group returns to the OK condition.
1.3.6.1.4.1.8161.200.4.0.18012 shqNic3ConnectivityLost This trap will be sent any time the status of a logical adapter changes to the Major
Failed condition. This occurs when the adapter in a single adapter
configuration fails, or when the last adapter in a redundant configuration
fails.
1.3.6.1.4.1.8161.200.4.0.18013 shqNic3RedundancyIncreased This trap will be sent any time a previously failed physical adapter in a Clear/Normal
connected logical adapter group returns to the OK condition.
1.3.6.1.4.1.8161.200.4.0.18014 shqNic3RedundancyReduced This trap will be sent any time a physical adapter in a logical adapter group Major
changes to the Failed condition, but at least one physical adapter remains
in the OK condition.
1.3.6.1.4.1.8161.200.4.0.19001 shqOsCpuTimeDegraded The Processor Time performance property is set to DEGRADED. Critical
1.3.6.1.4.1.8161.200.4.0.19002 shqOsCpuTimeFailed The Processor Time performance property is set to CRITICAL. Critical
1.3.6.1.4.1.8161.200.4.0.19003 shqOsCacheCopyReadHitsDegraded The Cache CopyReadHits performance property is set to DEGRADED. Critical
1.3.6.1.4.1.8161.200.4.0.19004 shqOsCacheCopyReadHitsFailed The Cache CopyReadHits performance property is set to CRITICAL. Critical
1.3.6.1.4.1.8161.200.4.0.19005 shqOsPageFileUsageDegraded The PagingFile Usage performance property is set to DEGRADED. Critical
1.3.6.1.4.1.8161.200.4.0.19006 shqOsPageFileUsageFailed The PagingFile Usage performance property is set to CRITICAL. Critical
1.3.6.1.4.1.8161.200.4.0.19007 shqOsLogicalDiskBusyTimeDegraded The LogicalDisk BusyTime performance property is set to DEGRADED. Critical
1.3.6.1.4.1.8161.200.4.0.19008 shqOsLogicalDiskBusyTimeFailed The LogicalDisk BusyTime performance property is set to CRITICAL. Critical
1.3.6.1.4.1.8161.200.4.0.2 shqNsNotifyShutdown NetSNMP: Agent shutting down. Critical

1.3.6.1.4.1.8161.200.4.0.2001 shqSiHoodRemoved The hood status has been set to REMOVED. The system's hood is not in Major
a properly installed state. This situation may result in improper cooling of
the system due to airflow changes caused by the missing hood.
1.3.6.1.4.1.8161.200.4.0.2008 shqSiHotPlugSlotBoardRemoved A Hot Plug Slot Board has been removed from the specified chassis and Warning
slot.
1.3.6.1.4.1.8161.200.4.0.2009 shqSiHotPlugSlotBoardInserted A Hot Plug Slot Board has been inserted into the specified chassis and Normal
slot.
1.3.6.1.4.1.8161.200.4.0.3 shqNsNotifyRestart NetSNMP: Agent restarted. Critical
1.3.6.1.4.1.8161.200.4.0.3001 shqDa2LogDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array logical drive. The variable cpqDaLogDrvStatus indicates the
current logical drive status.
1.3.6.1.4.1.8161.200.4.0.3002 shqDa2SpareStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array spare drive. The variable cpqDaSpareStatus indicates the
current spare drive status. The variable cpqDaSpareBusNumber indicates
the SCSI bus number associated with this drive.
1.3.6.1.4.1.8161.200.4.0.3003 shqDa2PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array physical drive. The variable cpqDaPhyDrvStatus indicates the
current physical drive status. The variable cpqDaPhyDrvBusNumber
indicates the SCSI bus number associated with this drive.
1.3.6.1.4.1.8161.200.4.0.3004 shqDa2PhyDrvThresHPassedTrap This trap signifies that the agent has detected that a factory threshold Critical
associated with one of the physical drive objects on a drive array has been
exceeded. The variable cpqDaPhyDrvBusNumber indicates the SCSI bus
number associated with the drive.
1.3.6.1.4.1.8161.200.4.0.3005 shqDa2AccelStatusChange This trap signifies that the Insight Agent has detected a change in the status of an Critical
array accelerator cache board. The status is represented by the variable
cpqDaAccelStatus.

1.3.6.1.4.1.8161.200.4.0.3006 shqDa2AccelBadDataTrap This trap signifies that the agent has detected an array accelerator cache Critical
board that has lost battery power. If data was being stored in the
accelerator memory when the server lost power, that data has been lost.
1.3.6.1.4.1.8161.200.4.0.3007 shqDa2AccelBatteryFailed This trap signifies that the agent has detected a battery failure associated Critical
with the array accelerator cache board. The current battery status is
indicated by the cpqDaAccelBattery variable.
1.3.6.1.4.1.8161.200.4.0.3008 shqDa3LogDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array logical drive.
The variable cpqDaLogDrvStatus indicates the current logical drive status.
1.3.6.1.4.1.8161.200.4.0.3009 shqDa3SpareStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array spare drive.
The variable cpqDaSpareStatus indicates the current spare drive status.
The variable cpqDaSpareBusNumber indicates the SCSI bus number
associated with this drive.
1.3.6.1.4.1.8161.200.4.0.3010 shqDa3PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array physical drive.
The variable cpqDaPhyDrvStatus indicates the current physical drive
status.
The variable cpqDaPhyDrvBusNumber indicates the SCSI bus number
associated with this drive.
1.3.6.1.4.1.8161.200.4.0.3011 shqDa3PhyDrvThresHPassedTrap This trap signifies that the agent has detected that a factory threshold Critical
associated with one of the physical drive objects on a drive array has been
exceeded. The variable cpqDaPhyDrvBusNumber indicates the SCSI bus
number associated with the drive.
1.3.6.1.4.1.8161.200.4.0.3012 shqDa3AccelStatusChange This trap signifies that the agent has detected a change in the status of an Critical
array accelerator cache board. The status is represented by the variable
cpqDaAccelStatus.

1.3.6.1.4.1.8161.200.4.0.3013 shqDa3AccelBadDataTrap This trap signifies that the agent has detected an array accelerator cache Critical
board that has lost battery power. If data was being stored in the
accelerator memory when the server lost power, that data has been lost.
1.3.6.1.4.1.8161.200.4.0.3014 shqDa3AccelBatteryFailed This trap signifies that the agent has detected a battery failure associated Critical
with the array accelerator cache board. The current battery status is
indicated by the cpqDaAccelBattery variable.
1.3.6.1.4.1.8161.200.4.0.3015 shqDaCntlrStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array controller.
The variable cpqDaCntlrBoardStatus indicates the current controller status.
1.3.6.1.4.1.8161.200.4.0.3016 shqDaCntlrActive This trap signifies that the agent has detected that a backup array Warning
controller in a duplexed pair has switched over to the active role. The
variable cpqDaCntlrSlot indicates the active controller slot and
cpqDaCntlrPartnerSlot indicates the backup.
1.3.6.1.4.1.8161.200.4.0.3017 shqDa4SpareStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array spare drive. The variable cpqDaSpareStatus indicates the
current spare drive status.
1.3.6.1.4.1.8161.200.4.0.3018 shqDa4PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array physical drive. The variable cpqDaPhyDrvStatus indicates the
current physical drive status.
1.3.6.1.4.1.8161.200.4.0.3019 shqDa4PhyDrvThresHPassedTrap This trap signifies that the agent has detected that a factory threshold Critical
associated with one of the physical drive objects on a drive array has been
exceeded.
1.3.6.1.4.1.8161.200.4.0.3022 shqDaTapeDriveStatusChange This trap signifies that the agent has detected a change in the status of a Critical
tape drive.
The variable cpqDaTapeDrvStatus indicates the current tape status. The
variable cpqDaTapeDrvScsiIdIndex indicates the SCSI ID of the tape drive.

1.3.6.1.4.1.8161.200.4.0.3023 shqDaTapeDriveCleaningRequired The agent has detected a tape drive that needs to have a cleaning tape Major
inserted and run.
This will cause the tape drive heads to be cleaned.
1.3.6.1.4.1.8161.200.4.0.3025 shqDa5AccelStatusChange This trap signifies that the agent has detected a change in the status of an Critical
array accelerator cache board. The status is represented by the variable
cpqDaAccelStatus.
1.3.6.1.4.1.8161.200.4.0.3026 shqDa5AccelBadDataTrap This trap signifies that the agent has detected an array accelerator cache Critical
board that has lost battery power. If data was being stored in the
accelerator cache memory when the server lost power, that data has been
lost.
1.3.6.1.4.1.8161.200.4.0.3027 shqDa5AccelBatteryFailed This trap signifies that the agent has detected a battery failure associated Critical
with the array accelerator cache board.
1.3.6.1.4.1.8161.200.4.0.3028 shqDa5CntlrStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array controller. The variable cpqDaCntlrBoardStatus indicates the
current controller status.
1.3.6.1.4.1.8161.200.4.0.3029 shqDa5PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array physical drive. The variable cpqDaPhyDrvStatus indicates the
current physical drive status.
1.3.6.1.4.1.8161.200.4.0.3030 shqDa5PhyDrvThresHPassedTrap This trap signifies that the agent has detected that a factory threshold Critical
associated with one of the physical drive objects on a drive array has been
exceeded.
1.3.6.1.4.1.8161.200.4.0.3032 shqDa2TapeDriveStatusChange This trap signifies that the agent has detected a change in the status of a Critical
tape drive.
The variable cpqDaTapeDrvStatus indicates the current tape status.
The variable cpqDaTapeDrvScsiIdIndex indicates the SCSI ID of the tape
drive.

1.3.6.1.4.1.8161.200.4.0.3033 shqDa6CntlrStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array controller.
The variable cpqDaCntlrBoardStatus indicates the current controller status.
1.3.6.1.4.1.8161.200.4.0.3034 shqDa6LogDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array logical drive. The variable cpqDaLogDrvStatus indicates the
current logical drive status.
1.3.6.1.4.1.8161.200.4.0.3035 shqDa6SpareStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array spare drive. The variable cpqDaSpareStatus indicates the
current spare drive status.
1.3.6.1.4.1.8161.200.4.0.3036 shqDa6PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array physical drive. The variable cpqDaPhyDrvStatus indicates the
current physical drive status.
1.3.6.1.4.1.8161.200.4.0.3037 shqDa6PhyDrvThresHPassedTrap This trap signifies that the agent has detected that a factory threshold Critical
associated with one of the physical drive objects on a drive array has been
exceeded.
1.3.6.1.4.1.8161.200.4.0.3038 shqDa6AccelStatusChange This trap signifies that the agent has detected a change in the status of an Critical
array accelerator cache board. The status is represented by the variable
cpqDaAccelStatus.
1.3.6.1.4.1.8161.200.4.0.3039 shqDa6AccelBadDataTrap This trap signifies that the agent has detected an array accelerator cache Critical
board that has lost battery power. If data was being stored in the
accelerator cache memory when the server lost power, that data has been
lost.
1.3.6.1.4.1.8161.200.4.0.3040 shqDa6AccelBatteryFailed This trap signifies that the agent has detected a battery failure associated Critical
with the array accelerator cache board.

1.3.6.1.4.1.8161.200.4.0.3043 shqDa6TapeDriveStatusChange This trap signifies that the agent has detected a change in the status of a Critical
tape drive.
The variable cpqDaTapeDrvStatus indicates the current tape status.
The variable cpqDaTapeDrvScsiIdIndex indicates the SCSI ID of the tape
drive.
1.3.6.1.4.1.8161.200.4.0.3044 shqDa6TapeDriveCleaningRequired The agent has detected a tape drive that needs to have a cleaning tape Major
inserted and run.
This will cause the tape drive heads to be cleaned.
1.3.6.1.4.1.8161.200.4.0.3046 shqDa7PhyDrvStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array physical drive.
The variable cpqDaPhyDrvStatus indicates the current physical drive
status.
1.3.6.1.4.1.8161.200.4.0.3047 shqDa7SpareStatusChange This trap signifies that the agent has detected a change in the status of a Critical
drive array spare drive.
The variable cpqDaSpareStatus indicates the current spare drive status.
1.3.6.1.4.1.8161.200.4.0.5001 shqScsi2CntlrStatusChange The agent has detected a change in the controller status of a SCSI Critical
Controller.
The variable cpqScsiCntlrStatus indicates the current controller status.
1.3.6.1.4.1.8161.200.4.0.5002 shqScsi2LogDrvStatusChange The agent has detected a change in the Logical Drive Status of a SCSI Critical
logical drive.
The current logical drive status is indicated by the cpqScsiLogDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5003 shqScsi2PhyDrvStatusChange The agent has detected a change in the status of a SCSI physical drive. Critical
The current physical drive status is indicated in the cpqScsiPhyDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5004 shqTapePhyDrvStatusChange The agent has detected a change in the status of a Tape drive. The current Critical
physical drive status is indicated in the cpqTapePhyDrvCondition variable.

1.3.6.1.4.1.8161.200.4.0.5005 shqScsi3CntlrStatusChange The Insight Agent has detected a change in the controller status of a Critical
SCSI Controller.
The variable cpqScsiCntlrStatus indicates the current controller status.
1.3.6.1.4.1.8161.200.4.0.5006 shqScsi3PhyDrvStatusChange The agent has detected a change in the status of a SCSI physical drive. Critical
The current physical drive status is indicated in the cpqScsiPhyDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5007 shqTape3PhyDrvStatusChange The agent has detected a change in the status of a Tape drive. The current Critical
physical drive status is indicated in the cpqTapePhyDrvCondition variable.
1.3.6.1.4.1.8161.200.4.0.5008 shqTape3PhyDrvCleaningRequired The agent has detected a tape drive that needs to have a cleaning tape Major
inserted and run.
This will cause the tape drive heads to be cleaned.
1.3.6.1.4.1.8161.200.4.0.5009 shqTape3PhyDrvCleanTapeReplace The agent has detected that an autoloader tape unit has a cleaning tape Major
that has been fully used and therefore needs to be replaced with a new
cleaning tape.
1.3.6.1.4.1.8161.200.4.0.5016 shqTape4PhyDrvStatusChange The Storage Agent has detected a change in the status of a Tape drive. Critical
The current physical drive status is indicated in the cpqTapePhyDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5017 shqScsi4PhyDrvStatusChange The Storage Agent has detected a change in the status of a SCSI physical Critical
drive.
The current physical drive status is indicated in the cpqScsiPhyDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5019 shqTape5PhyDrvStatusChange The Storage Agent has detected a change in the status of a tape drive. Critical
The current physical drive status is indicated in the cpqTapePhyDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5020 shqScsi5PhyDrvStatusChange The Storage Agent has detected a change in the status of a SCSI Critical
physical drive. The current physical drive status is indicated in the
cpqScsiPhyDrvStatus variable.

1.3.6.1.4.1.8161.200.4.0.5021 shqScsi3LogDrvStatusChange The Storage Agent has detected a change in the status of a SCSI logical Critical
drive. The current logical drive status is indicated in the
cpqScsiLogDrvStatus variable.
1.3.6.1.4.1.8161.200.4.0.5022 shqSasPhyDrvStatusChange The Storage Agent has detected a change in the status of a SAS or SATA Critical
physical drive.
The current physical drive status is indicated in the cpqSasPhyDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5023 shqSasLogDrvStatusChange The Storage Agent has detected a change in the status of a SAS or SATA Critical
logical drive.
The current logical drive status is indicated in the cpqSasLogDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.5024 shqSasTapeDrvStatusChange The Storage Agent has detected a change in the status of a SAS tape Critical
drive.
The current physical drive status is indicated in the cpqSasTapeDrvStatus
variable.
1.3.6.1.4.1.8161.200.4.0.6001 shqHe2CorrectableMemoryError A correctable memory error occurred. The error has been corrected. The Minor
current number of correctable memory errors is reported in the variable cpqHeCorrMemTotalErrs.
1.3.6.1.4.1.8161.200.4.0.6002 shqHe2CorrectableMemoryLogDisabled The frequency of errors is so high that the error tracking logic has been Critical
temporarily disabled.
1.3.6.1.4.1.8161.200.4.0.6003 shqHeThermalTempFailed The temperature status has been set to fail. Critical
The system will be shut down due to this thermal condition.
1.3.6.1.4.1.8161.200.4.0.6004 shqHeThermalTempDegraded The temperature status has been set to degrade. The server's temperature Critical
is outside of the normal operating range.
1.3.6.1.4.1.8161.200.4.0.6005 shqHeThermalTempOk The temperature status has been set to OK. The server's temperature has Normal
returned to the normal operating range.
1.3.6.1.4.1.8161.200.4.0.6006 shqHeThermalSystemFanFailed The system fan status has been set to failed. Critical
A required system fan is not operating normally.

1.3.6.1.4.1.8161.200.4.0.6007 shqHeThermalSystemFanDegraded The system fan status has been set to degrade. An optional system fan is Critical
not operating normally.
1.3.6.1.4.1.8161.200.4.0.6008 shqHeThermalSystemFanOk The system fan status has been set to OK. Any previously non-operational Normal
system fans have returned to normal operation.
1.3.6.1.4.1.8161.200.4.0.6009 shqHeThermalCpuFanFailed The CPU fan status has been set to fail. Critical
A processor fan is not operating normally. The server will be shut down.
1.3.6.1.4.1.8161.200.4.0.6010 shqHeThermalCpuFanOk The CPU fan status has been set to OK. Any previously non-operational Normal
processor fans have returned to normal operation.
1.3.6.1.4.1.8161.200.4.0.6011 shqHeAsrConfirmation The server has previously been shut down by the Automatic Server Minor
Recovery (ASR) feature and has just become operational again.
1.3.6.1.4.1.8161.200.4.0.6012 shqHeThermalConfirmation The server has previously been shut down due to a thermal anomaly on Minor
the server and has just become operational again.
1.3.6.1.4.1.8161.200.4.0.6013 shqHePostError One or more POST errors occurred. Power On Self-Test (POST) errors Minor
occur during the server restart process.
1.3.6.1.4.1.8161.200.4.0.6014 shqHeFltTolPwrSupplyDegraded The fault tolerant power supply sub-system condition has been set to Critical
degrade.
1.3.6.1.4.1.8161.200.4.0.6015 shqHe3CorrectableMemoryError A correctable memory error occurred. The error has been corrected. The Minor
current number of correctable memory errors is reported in the variable
cpqHeCorrMemTotalErrs.
1.3.6.1.4.1.8161.200.4.0.6016 shqHe3CorrectableMemoryLogDisabled The frequency of errors is so high that the error tracking logic has been Critical
temporarily disabled.
1.3.6.1.4.1.8161.200.4.0.6017 shqHe3ThermalTempFailed The temperature status has been set to FAILED. The system will be shut Critical
down due to this thermal condition.
1.3.6.1.4.1.8161.200.4.0.6018 shqHe3ThermalTempDegraded The server's temperature is outside of the normal operating range. The Critical
server will be shut down.
1.3.6.1.4.1.8161.200.4.0.6019 shqHe3ThermalTempOk The server's temperature has returned to the normal operating range. Normal

1.3.6.1.4.1.8161.200.4.0.6020 shqHe3ThermalSystemFanFailed The system fan status has been set to FAILED. A required system fan is Critical
not operating normally. The system will be shut down.
1.3.6.1.4.1.8161.200.4.0.6021 shqHe3ThermalSystemFanDegraded The system fan status has been set to DEGRADED. An optional system Critical
fan is not operating normally.
1.3.6.1.4.1.8161.200.4.0.6022 shqHe3ThermalSystemFanOk The system fan status has been set to OK. Any previously non-operational Normal
system fans have returned to normal operation.
1.3.6.1.4.1.8161.200.4.0.6023 shqHe3ThermalCpuFanFailed A processor fan is not operating normally. The server will be shut down. Critical
1.3.6.1.4.1.8161.200.4.0.6024 shqHe3ThermalCpuFanOk Any previously non-operational processor fans have returned to normal Normal
operation.
1.3.6.1.4.1.8161.200.4.0.6025 shqHe3AsrConfirmation The server was shut down by the Automatic Server Recovery (ASR) Minor
feature. It is now operational again.
1.3.6.1.4.1.8161.200.4.0.6026 shqHe3ThermalConfirmation The server was shut down due to a thermal anomaly. It is now operational Minor
again.
1.3.6.1.4.1.8161.200.4.0.6027 shqHe3PostError One or more POST errors occurred. Power On Self-Test (POST) errors Minor
occur during the server restart process.
1.3.6.1.4.1.8161.200.4.0.6028 shqHe3FltTolPwrSupplyDegraded The Fault Tolerant Power Supply subsystem condition has been set to Critical
DEGRADED.
1.3.6.1.4.1.8161.200.4.0.6029 shqHe3CorrMemReplaceMemModule The errors have been corrected, but the memory module should be Minor
replaced.
1.3.6.1.4.1.8161.200.4.0.6032 shqHe3FltTolPowerRedundancyLost The Fault Tolerant Power Supplies have lost redundancy for the specified Critical
chassis.
1.3.6.1.4.1.8161.200.4.0.6033 shqHe3FltTolPowerSupplyInserted A Fault Tolerant Power Supply has been inserted into the specified chassis Normal
and bay location.
1.3.6.1.4.1.8161.200.4.0.6034 shqHe3FltTolPowerSupplyRemoved A Fault Tolerant Power Supply has been removed from the specified Major
chassis and bay location.

1.3.6.1.4.1.8161.200.4.0.6035 shqHe3FltTolFanDegraded The Fault Tolerant Fan condition has been set to DEGRADED for the Critical
specified chassis and fan.
1.3.6.1.4.1.8161.200.4.0.6036 shqHe3FltTolFanFailed The Fault Tolerant Fan condition has been set to FAILED for the specified Critical
chassis and fan.
1.3.6.1.4.1.8161.200.4.0.6037 shqHe3FltTolFanRedundancyLost The Fault Tolerant Fans have lost redundancy for the specified chassis. Critical
1.3.6.1.4.1.8161.200.4.0.6038 shqHe3FltTolFanInserted A Fault Tolerant Fan has been inserted into the specified chassis and fan Normal
location.
1.3.6.1.4.1.8161.200.4.0.6039 shqHe3FltTolFanRemoved A Fault Tolerant Fan has been removed from the specified chassis and fan Major
location.
1.3.6.1.4.1.8161.200.4.0.6040 shqHe3TemperatureFailed The temperature status has been set to FAILED in the specified chassis Critical
and location. The system will be shut down due to this condition.
1.3.6.1.4.1.8161.200.4.0.6041 shqHe3TemperatureDegraded The server's temperature is outside of the normal operating range. The Critical
server will be shut down.
1.3.6.1.4.1.8161.200.4.0.6042 shqHe3TemperatureOk The server's temperature has returned to the normal operating range. Normal
1.3.6.1.4.1.8161.200.4.0.6043 shqHe3PowerConverterDegraded The DC-DC Power Converter condition has been set to DEGRADED for Critical
the specified chassis, slot, and socket.
1.3.6.1.4.1.8161.200.4.0.6044 shqHe3PowerConverterFailed The DC-DC Power Converter condition has been set to FAILED for the Critical
specified chassis, slot, and socket.
1.3.6.1.4.1.8161.200.4.0.6045 shqHe3PowerConverterRedundancyLost The DC-DC Power Converters have lost redundancy for the specified Critical
chassis.
1.3.6.1.4.1.8161.200.4.0.6046 shqHe3CacheAccelParityError A cache accelerator parity error indicates a cache module needs to be Critical
replaced.
1.3.6.1.4.1.8161.200.4.0.6047 shqHeResilientMemOnlineSpareEngaged The Advanced Memory Protection subsystem has detected a memory fault. Major
The Online Spare Memory has been activated.
1.3.6.1.4.1.8161.200.4.0.6048 shqHe4FltTolPowerSupplyOk The Fault Tolerant Power Supply condition has been set back to the OK Normal
state for the specified chassis and bay location.

1.3.6.1.4.1.8161.200.4.0.6049 shqHe4FltTolPowerSupplyDegraded The Fault Tolerant Power Supply condition has been set to DEGRADED Critical
for the specified chassis and bay location.
1.3.6.1.4.1.8161.200.4.0.6050 shqHe4FltTolPowerSupplyFailed The Fault Tolerant Power Supply condition has been set to FAILED for the Critical
specified chassis and bay location.
1.3.6.1.4.1.8161.200.4.0.6051 shqHeResilientMemMirroredMemoryEngaged The Advanced Memory Protection subsystem has detected a memory fault. Major
Mirrored Memory has been activated.
1.3.6.1.4.1.8161.200.4.0.6052 shqHeResilientAdvancedECCMemoryEngaged The Advanced Memory Protection subsystem has detected a memory fault. Major
Advanced ECC has been activated.
1.3.6.1.4.1.8161.200.4.0.6053 shqHeResilientMemXorMemoryEngaged The Advanced Memory Protection subsystem has detected a memory fault. Major
The XOR engine has been activated.
1.3.6.1.4.1.8161.200.4.0.6054 shqHe3FltTolPowerRedundancyRestored The Fault Tolerant Power Supplies have returned to a redundant state for Normal
the specified chassis.
1.3.6.1.4.1.8161.200.4.0.6055 shqHe3FltTolFanRedundancyRestored The Fault Tolerant Fans have returned to a redundant state for the Normal
specified chassis.
1.3.6.1.4.1.8161.200.4.0.6056 shqHe4CorrMemReplaceMemModule The errors have been corrected, but the memory module should be Minor
replaced.
1.3.6.1.4.1.8161.200.4.0.6057 shqHeResMemBoardRemoved An Advanced Memory Protection subsystem board or cartridge has been Warning
removed from the system.
1.3.6.1.4.1.8161.200.4.0.6058 shqHeResMemBoardInserted An Advanced Memory Protection subsystem board or cartridge has been Normal
inserted into the system.
1.3.6.1.4.1.8161.200.4.0.6059 shqHeResMemBoardBusError An Advanced Memory Protection subsystem board or cartridge bus error Critical
has been detected.
1.3.6.1.4.1.8161.200.4.0.6061 shqHeManagementProcInReset The management processor is currently in the process of being reset Minor
because of a firmware update or some other event.

1.3.6.1.4.1.8161.200.4.0.6062 shqHeManagementProcReady The management processor was successfully reset. It is now available Normal
again.
1.3.6.1.4.1.8161.200.4.0.6063 shqHeManagementProcFailedReset The management processor was not successfully reset. It is not Critical
operational.
1.3.6.1.4.1.8161.200.4.0.6069 shqHe4FltTolPowerSupplyACpowerloss The fault tolerant power supply AC condition has been set to fail for the Critical
specified chassis and bay location.
1.3.6.1.4.1.8161.200.4.0.8001 shqSs2FanStatusChange The agent has detected a change in the Fan Status of a storage system. Critical
The variable cpqSsBoxFanStatus indicates the current fan status.
1.3.6.1.4.1.8161.200.4.0.8002 shqSsTempFailed The agent has detected that a temperature status has been set to fail. The Critical
storage system will be shut down.
1.3.6.1.4.1.8161.200.4.0.8003 shqSsTempDegraded The agent has detected a temperature status that has been set to degrade. Major
The storage system's temperature is outside of the normal operating range.
1.3.6.1.4.1.8161.200.4.0.8004 shqSsTempOk The temperature status has been set to OK. The storage system's Normal
temperature has returned to normal operating range.
1.3.6.1.4.1.8161.200.4.0.8005 shqSsSidePanelInPlace The side panel status has been set to in place. The storage system's side Normal
panel has returned to a properly installed state.
1.3.6.1.4.1.8161.200.4.0.8006 shqSsSidePanelRemoved The side panel status has been set to remove. The storage system's side Major
panel is not in a properly installed state. This situation may result in
improper cooling of the drives in the storage system due to airflow changes
caused by the missing side panel.
1.3.6.1.4.1.8161.200.4.0.8007 shqSsPwrSupplyDegraded A storage system power supply status has been set to degrade. Critical
1.3.6.1.4.1.8161.200.4.0.8008 shqSs3FanStatusChange The agent has detected a change in the Fan Status of a storage system. Critical
The variable cpqSsBoxFanStatus indicates the current Fan Status.
1.3.6.1.4.1.8161.200.4.0.8009 shqSs3TempFailed The agent has detected that a temperature status has been set to FAILED. Critical
The storage system will be shut down.
1.3.6.1.4.1.8161.200.4.0.8010 shqSs3TempDegraded The agent has detected a temperature status that has been set to Major
DEGRADED. The storage system's temperature is outside of the normal
operating range.
1.3.6.1.4.1.8161.200.4.0.8011 shqSs3TempOk The temperature status has been set to OK. The storage system's Normal
temperature has returned to the normal operating range. It may be reactivated
by the administrator.
1.3.6.1.4.1.8161.200.4.0.8012 shqSs3SidePanelInPlace The side panel status has been set to IN PLACE. The storage system's Normal
side panel has returned to a properly installed state.
1.3.6.1.4.1.8161.200.4.0.8013 shqSs3SidePanelRemoved The side panel status has been set to REMOVED. The storage system's Major
side panel is not in a properly installed state. This situation may result in
improper cooling of the drives in the storage system due to airflow changes
caused by the missing side panel.
1.3.6.1.4.1.8161.200.4.0.8014 shqSs3PwrSupplyDegraded A storage system power supply status has been set to degrade. Critical
1.3.6.1.4.1.8161.200.4.0.8015 shqSs4PwrSupplyDegraded A storage system power supply status has been set to degrade. Critical
1.3.6.1.4.1.8161.200.4.0.8016 shqSsExFanStatusChange The agent has detected a change in the Fan Module Status of a storage Critical
system. The variable cpqSsFanModuleStatus indicates the current fan status.
1.3.6.1.4.1.8161.200.4.0.8018 shqSsExPowerSupplyUpsStatusChange The agent has detected a change in the status of a UPS attached to a storage Critical
system power supply. The variable cpqSsPowerSupplyUpsStatus indicates the status.
1.3.6.1.4.1.8161.200.4.0.8019 shqSsExTempSensorStatusChange The agent has detected a change in the status of a storage system Critical
temperature sensor. The variable cpqSsTempSensorStatus indicates the status.
1.3.6.1.4.1.8161.200.4.0.8020 shqSsEx2FanStatusChange The agent has detected a change in the fan module status of a storage Critical
system. The variable cpqSsFanModuleStatus indicates the current fan
status.
1.3.6.1.4.1.8161.200.4.0.8021 shqSsEx2PowerSupplyStatusChange The agent has detected a change in the power supply status of a storage Critical
system. The variable cpqSsPowerSupplyStatus indicates the status.
1.3.6.1.4.1.8161.200.4.0.8022 shqSsExBackplaneFanStatusChange The agent has detected a change in the fan status of a storage system. Critical
The variable cpqSsBackplaneFanStatus indicates the current fan status.
1.3.6.1.4.1.8161.200.4.0.8023 shqSsExBackplaneTempStatusChange The agent has detected a change in the status of the temperature in a Critical
storage system. The variable cpqSsBackplaneTempStatus indicates the status.
1.3.6.1.4.1.8161.200.4.0.8024 shqSsExBackplanePowerSupplyStatusChange The agent has detected a change in the power supply status of a storage Critical
system. The variable cpqSsBackplaneFtpsStatus indicates the status.
1.3.6.1.4.1.8161.200.4.0.8025 shqSsExRecoveryServerStatusChange The agent has detected a change in the recovery server option status of a Major
storage system. The variable cpqSsChassisRsoStatus indicates the status.
1.3.6.1.4.1.8161.200.4.0.8026 shqSs5FanStatusChange The agent has detected a change in the Fan Status of a storage system. Critical
The variable cpqSsBoxFanStatus indicates the current fan status.
1.3.6.1.4.1.8161.200.4.0.8027 shqSs5TempStatusChange The agent has detected a change in the temperature status of a storage Critical
system. The variable cpqSsBoxTempStatus indicates the current
temperature status.
1.3.6.1.4.1.8161.200.4.0.8028 shqSs5PwrSupplyStatusChange The agent has detected a change in the power supply status of a storage Critical
system. The variable cpqSsBoxFltTolPwrSupplyStatus indicates the
current power supply status.
1.3.6.1.4.1.8161.200.4.0.8029 shqSs6FanStatusChange The agent has detected a change in the Fan Status of a storage system. Critical
The variable cpqSsBoxFanStatus indicates the current fan status.
1.3.6.1.4.1.8161.200.4.0.8030 shqSs6TempStatusChange The agent has detected a change in the temperature status of a storage Critical
system.
The variable cpqSsBoxTempStatus indicates the current temperature
status.
1.3.6.1.4.1.8161.200.4.0.8031 shqSs6PwrSupplyStatusChange The agent has detected a change in the power supply status of a storage Critical
system.
The variable cpqSsBoxFltTolPwrSupplyStatus indicates the current power
supply status.
1.3.6.1.4.1.8161.200.4.0.8032 shqSsConnectionStatusChange Storage system connection status change. The agent has detected a Major
change in the connection status of a storage system.
1.3.6.1.4.1.8161.200.4.0.9001 shqSm2ServerReset The Remote Insight/ Integrated Lights-Out firmware has detected a server Critical
reset.
1.3.6.1.4.1.8161.200.4.0.9002 shqSm2ServerPowerOutage The Remote Insight/ Integrated Lights-Out firmware has detected a server Critical
power failure.
1.3.6.1.4.1.8161.200.4.0.9003 shqSm2UnauthorizedLoginAttempts The Remote Insight/ Integrated Lights-Out firmware has detected Warning
unauthorized login attempts.
1.3.6.1.4.1.8161.200.4.0.9004 shqSm2BatteryFailed The Remote Insight battery has failed and needs to be replaced. Critical
1.3.6.1.4.1.8161.200.4.0.9005 shqSm2SelfTestError The Remote Insight/ Integrated Lights-Out firmware has detected a Critical
Remote Insight self-test error.
1.3.6.1.4.1.8161.200.4.0.9006 shqSm2InterfaceError The host OS has detected an error in the Remote Insight/ Integrated Major
Lights-Out interface. The firmware is not responding.
1.3.6.1.4.1.8161.200.4.0.9007 shqSm2BatteryDisconnected The Remote Insight battery cable has been disconnected. Major
1.3.6.1.4.1.8161.200.4.0.9008 shqSm2KeyboardCableDisconnected The Remote Insight keyboard cable has been disconnected. Major
1.3.6.1.4.1.8161.200.4.0.9009 shqSm2MouseCableDisconnected The Remote Insight mouse cable has been disconnected. Major
1.3.6.1.4.1.8161.200.4.0.9010 shqSm2ExternalPowerCableDisconnected The Remote Insight external power cable has been disconnected. Major
1.3.6.1.4.1.8161.200.4.0.9011 shqSm2LogsFull The Remote Insight/ Integrated Lights-Out firmware has detected that the logs Minor
are full.
1.3.6.1.4.1.8161.200.4.0.9012 shqSm2SecurityOverrideEngaged The Remote Insight/ Integrated Lights-Out firmware has detected that the Normal
security override jumper has been toggled to the engaged position.
1.3.6.1.4.1.8161.200.4.0.9013 shqSm2SecurityOverrideDisengaged The Remote Insight/ Integrated Lights-Out firmware has detected that the Normal
security override jumper has been toggled to the disengaged position.
1.3.6.1.4.1.8161.200.4.0.9015 shqSm2NicLinkDown The Remote Insight/ Integrated Lights-Out firmware has detected the loss Major
of the network link.
1.3.6.1.4.1.8161.200.4.0.9016 shqSm2NicLinkUp The Remote Insight/ Integrated Lights-Out firmware has detected the Clear/Normal
presence of the network link.
1.3.6.1.4.1.8161.200.4.0.15003 shqClusterNodeDegraded This trap is sent if a node in the cluster becomes degraded. Major
1.3.6.1.4.1.8161.200.4.0.15004 shqClusterNodeFailed This trap is sent if a node in the cluster fails. Major
1.3.6.1.4.1.8161.200.4.0.15005 shqClusterResourceDegraded This trap is sent if a cluster resource becomes degraded. Major
1.3.6.1.4.1.8161.200.4.0.15006 shqClusterResourceFailed This trap is sent if a cluster resource fails. Major
1.3.6.1.4.1.8161.200.4.0.15007 shqClusterNetworkDegraded This trap is sent if a cluster network becomes degraded. Major
1.3.6.1.4.1.8161.200.4.0.15008 shqClusterNetworkFailed This trap is sent if a cluster network fails. Major
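The trap OIDs in this table share the prefix 1.3.6.1.4.1.8161.200.4.0 and differ only in the trailing trap number. As a minimal sketch, assuming a receiving application gets the full trap OID as a string, that trailing number can serve as the key into a lookup built from the table. The names HP_TRAPS and lookup_hp_trap are hypothetical, and only a few rows are reproduced:

```python
# Hypothetical lookup built from a few rows of the HP Servers table above.
HP_PREFIX = "1.3.6.1.4.1.8161.200.4.0."

HP_TRAPS = {
    6063: ("shqHeManagementProcFailedReset", "Critical"),
    8004: ("shqSsTempOk", "Normal"),
    9015: ("shqSm2NicLinkDown", "Major"),
    9016: ("shqSm2NicLinkUp", "Clear/Normal"),
    15004: ("shqClusterNodeFailed", "Major"),
}


def lookup_hp_trap(oid: str):
    """Return (alarm name, severity) for an HP Servers trap OID, or None if unknown."""
    oid = oid.lstrip(".")  # tolerate a leading dot, as some tools print OIDs that way
    if not oid.startswith(HP_PREFIX):
        return None
    suffix = oid[len(HP_PREFIX):]
    if not suffix.isdigit():
        return None
    return HP_TRAPS.get(int(suffix))


print(lookup_hp_trap("1.3.6.1.4.1.8161.200.4.0.9015"))  # ('shqSm2NicLinkDown', 'Major')
```

Returning None for unknown or foreign OIDs lets the caller fall back to a generic handler instead of failing on traps that are not in the table.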

7. HP Open View Alarms


In a service using the TOMIA Mediator Software package, the alarms in Table 27 may be generated.
When the system Watchdog detects that a process is not responding, it sends the appropriate alarm from the
table below with the status “NOT_RUNNING”. The Watchdog then restarts the process. Once the process has
been restarted, the Watchdog sends the same alarm again, with the status “RUNNING”.
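The Watchdog behavior described above produces a NOT_RUNNING alarm followed, after the restart, by the same alarm with RUNNING. A minimal consumer-side sketch, assuming the alarm text arrives in the form shown in the Alarm Text column with a single status value, could track these cycles per process. The regular expression, parse_alarm and WatchdogTracker are hypothetical:

```python
import re

# Assumed alarm text shape, following the Alarm Text column of Table 27, with a
# single status value: "SHMON: NNM Process: <process> Status is: NOT_RUNNING."
ALARM_RE = re.compile(r"SHMON: NNM Process: (\S+) Status is: (NOT_RUNNING|RUNNING)\.?$")


def parse_alarm(text: str):
    """Return (process, status), or None if the text does not have the assumed shape."""
    match = ALARM_RE.match(text.strip())
    return (match.group(1), match.group(2)) if match else None


class WatchdogTracker:
    """Hypothetical tracker for the NOT_RUNNING -> RUNNING restart cycle."""

    def __init__(self):
        self.down = set()    # processes last reported as NOT_RUNNING
        self.restarts = {}   # process name -> completed restart cycles

    def handle(self, text: str):
        parsed = parse_alarm(text)
        if parsed is None:
            return
        process, status = parsed
        if status == "NOT_RUNNING":
            self.down.add(process)
        elif process in self.down:
            # The same alarm with RUNNING: the Watchdog has restarted the process.
            self.down.discard(process)
            self.restarts[process] = self.restarts.get(process, 0) + 1


tracker = WatchdogTracker()
tracker.handle("SHMON: NNM Process: netmon Status is: NOT_RUNNING.")
tracker.handle("SHMON: NNM Process: netmon Status is: RUNNING.")
print(tracker.restarts)  # {'netmon': 1}
```

The aggregate "SHMON: NNM Processes Status is:" alarm does not match this pattern and is ignored by the sketch.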
Table 27: HP OpenView NNM Alarms

Dec ID Hex ID Severity Alarm Text Description

50331904 3000100 Critical SHMON: NNM Process: OVsPMD Status is: ovspmd - Launches and manages all background services. ovspmd interacts
RUNNING/NOT_RUNNING. with the following user commands: ovstart, ovstop, ovstatus, ovpause, and
ovresume, and performs the appropriate actions on the background
services.
50331905 3000101 Major SHMON: NNM Process: ovwdb Status is: ovwdb - Controls the object database. The object database stores semantic
RUNNING/NOT_RUNNING. information about the objects.
50331906 3000102 Critical SHMON: NNM Process: pmd Status is: pmd - Receives and forwards events, and logs events to the event database.
RUNNING/NOT_RUNNING. pmd also forwards events from the network to other applications that have
connected to pmd using the SNMP API.
50331908 3000104 Critical SHMON: NNM Process: ovtrapd Status is: ovtrapd - Receives SNMP traps and forwards them to pmd.
RUNNING/NOT_RUNNING.
50331909 3000105 Minor SHMON: NNM Process: ovalarmsrv Status is: ovalarmsrv - Provides event information to Java-based Alarm Browsers.
RUNNING/NOT_RUNNING.
50331910 3000106 Minor SHMON: NNM Process: ovsessionmgr Status is: ovsessionmgr - Manages users’ web sessions.
RUNNING/NOT_RUNNING.
50331911 3000107 Critical SHMON: NNM Process: ovactiond Status is: ovactiond - Receives events from pmd and executes commands.
RUNNING/NOT_RUNNING.
50331912 3000108 Major SHMON: NNM Process: ovtopmd Status is: ovtopmd - Maintains the network topology database. The topology database
RUNNING/NOT_RUNNING. is a set of files that stores netmon polling status and information about
network objects, including their relationships and status. ovtopmd reads the
topology database at start-up.
50331913 3000109 Minor SHMON: NNM Process: snmpCollect Status is: snmpCollect - Collects MIB data and performs threshold monitoring.
RUNNING/NOT_RUNNING.
50331914 300010A Minor SHMON: NNM Process: ovrequestd Status is: ovrequestd - Executes the reports and data warehouse exports according to
RUNNING/NOT_RUNNING. a predefined schedule. Once a report is configured, ovrequestd starts
executing the exports and the reports.
50331915 300010B Minor SHMON: NNM Process: ovdbcheck Status is: ovdbcheck - A background process that maintains the NNM data warehouse
RUNNING/NOT_RUNNING. embedded database.
50331916 300010C Minor SHMON: NNM Process: ovcapsd Status is: ovcapsd - Listens for new nodes and checks them for remote DMI
RUNNING/NOT_RUNNING. capabilities, web-manageability, and web server capabilities.
50331917 300010D Minor SHMON: NNM Process: ovas Status is: ovas - Maintains topology and node status information for NNM Dynamic
RUNNING/NOT_RUNNING. Views.
50331918 300010E Major SHMON: NNM Process: netmon Status is: netmon - Polls SNMP agents to discover your network, and then detects
RUNNING/NOT_RUNNING. topology, configuration, and status changes in the network.
50331919 300010F Minor SHMON: NNM Process: ovuispmd Status is: ovuispmd - Manages the NNM user interface services and distributes
RUNNING/NOT_RUNNING. relevant ovspmd requests to each instance of ovw that is running. ovuispmd
must be running for ovw to be started, and should be running whenever
ovspmd is running.
50331920 3000110 Critical SHMON: NNM Process: icmpCheck Status is: icmpCheck - Performs availability checks on the NNM using ICMP echo
RUNNING/NOT_RUNNING. (ping). Updates the trapSender with availability status by sending an internal
alarm to the head of the “traps for sending” queue.
50331921 3000111 Critical SHMON: NNM Process: trapSender Status is: trapSender - Sends alarms to the target OSS. If there is no connectivity, it keeps
RUNNING/NOT_RUNNING. the alarms in a queue and sends them when the connection is resumed.
50331922 3000112 Critical SHMON: NNM Process: trapMapper Status is: trapMapper - Maps the original traps to new ones based on the alarms
RUNNING/NOT_RUNNING. template table.
50331923 3000113 Critical SHMON: NNM Processes Status is: This trap reports on the status of all the NNM processes.
RUNNING/NOT_RUNNING.
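In Table 27 the Hex ID column is the hexadecimal form of the Dec ID; for example, 50331904 decimal is 0x3000100. A short check over a few rows copied from the table (the helper name dec_to_hex_id is hypothetical):

```python
# (Dec ID, Hex ID) pairs copied from Table 27.
TABLE_27_IDS = [
    (50331904, "3000100"),  # ovspmd
    (50331908, "3000104"),  # ovtrapd
    (50331914, "300010A"),  # ovrequestd
    (50331920, "3000110"),  # icmpCheck
    (50331923, "3000113"),  # all NNM processes
]


def dec_to_hex_id(dec_id: int) -> str:
    """Format a decimal alarm ID the way the Hex ID column shows it (upper case, no 0x)."""
    return format(dec_id, "X")


for dec_id, hex_id in TABLE_27_IDS:
    assert dec_to_hex_id(dec_id) == hex_id
    assert int(hex_id, 16) == dec_id
```

Either column can therefore be used as the key when matching received alarms against the table.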

8. Remote Power Distribution Unit (RMPDU) Agents


The following sections list the Remote Power Distribution Unit (RMPDU) alarms:
* Sentry AC-RMPDU Alarms
* CyberPower RMPDU Alarms
* DC-RMPDU Alarms (Sentry 4820-XL-8)
* X733-Compliant RMPDU Alarms
Refer to the AC-RMPDU annex for additional information.

8.1 Sentry AC-RMPDU Alarms


The following table describes the alarms sent by the Sentry RMPDU in a generic system, without the TOMIA
proprietary OSS monitoring mediator tool.

Table 28: Sentry AC-RMPDU Alarms

OID Alarm Name Description Severity

.1.3.6.1.4.1.1718.3.100.0.1 TowerStatusEvent Tower status event. If enabled, this trap is sent when the towerStatus indicates an error Critical
state ('noComm'). This trap is repeated periodically while the towerStatus remains in an
error state. If the towerStatus returns to a "non-error" state ('normal'), this trap is sent
once more with the non-error towerStatus, and then stops being repeated. While the
towerStatus indicates an error state, all status and load traps are suppressed for input
feeds and outlets on the tower.
.1.3.6.1.4.1.1718.3.100.0.2 InfeedStatusEvent Input feed status event. If enabled, this trap is sent when the infeedStatus indicates an Critical
error state ('offError', 'onError', or 'noComm'). This trap is repeated periodically while the
infeedStatus remains in an error state. If the infeedStatus returns to a non-error state
('off' or 'on'), this trap is sent once more with the non-error infeedStatus, and then stops
being repeated.
While the infeedStatus indicates an error state, load traps are suppressed for the input
feed, and if the infeedCapabilities 'failSafe' bit is FALSE, all status and load traps are
suppressed for outlets on the input feed.
.1.3.6.1.4.1.1718.3.100.0.3 InfeedLoadEvent Input feed load event. If enabled, this trap is sent when the infeedLoadStatus indicates Critical
an error state ('loadLow', 'loadHigh', 'overLoad', 'readError', or 'noComm'). This trap is
repeated periodically while the infeedLoadStatus remains in an error state. If the
infeedLoadStatus returns to a non-error state ('normal' or 'notOn'), this trap is sent once
more with the non-error infeedLoadStatus, and then stops being repeated.
.1.3.6.1.4.1.1718.3.100.0.4 OutletStatusEvent Outlet status event. If enabled, this trap is sent when the outletStatus indicates an error Critical
state ('offError', 'onError', 'noComm', 'offFuse', or 'onFuse'). This trap is repeated
periodically while the outletStatus remains in an error state. If the outletStatus returns to
a non-error state ('off' or 'on'), this trap is sent once more with the non-error outletStatus,
and then stops being repeated.
While the outletStatus indicates an error state, load traps are suppressed for the outlet.
.1.3.6.1.4.1.1718.3.100.0.5 OutletLoadEvent Outlet load event. If enabled, this trap is sent when the outletLoadStatus indicates an Critical
error state ('loadLow', 'loadHigh', 'overLoad', 'readError', or 'noComm'). This trap is
repeated periodically while the outletLoadStatus remains in an error state. If the
outletLoadStatus returns to a non-error state ('normal' or 'notOn'), this trap is
sent once more with the non-error outletLoadStatus, and then stops being repeated.
.1.3.6.1.4.1.1718.3.100.0.6 OutletChangeEvent Outlet change event. If enabled, this trap is sent when the outletStatus indicates an error Minor
state ('offError', 'onError', 'noComm', 'offFuse', or 'onFuse'). This trap is repeated
periodically while the outletStatus remains in an error state. If the outletStatus returns to
a non-error state ('off' or 'on'), this trap is sent once more with the non-error outletStatus,
and then stops being repeated.
While the outletStatus indicates an error state, load traps are suppressed for the outlet.
.1.3.6.1.4.1.1718.3.100.0.13 ContactClosureEvent Contact closure event. If enabled, this trap is sent when the contactClosureStatus Critical
indicates an error state ('alarm'). This trap is repeated periodically while the
contactClosureStatus remains in an error state. If the contactClosureStatus returns to a
"non-error" state ('normal'), this trap is sent once more with the non-error
contactClosureStatus, and then stops being repeated.
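The repeat-and-clear rule shared by these traps can be summarized as: a trap is sent in every period in which the status variable is in an error state, and once more in the first period after it returns to a non-error state. The sketch below encodes the error states listed in Table 28 and, assuming one repetition per polling period (the actual repeat interval is not specified here), returns the states for which a trap would be sent. It ignores the "If enabled" setting and the suppression rules; ERROR_STATES and traps_emitted are hypothetical names:

```python
# Error states per status variable, as listed in Table 28.
ERROR_STATES = {
    "towerStatus": {"noComm"},
    "infeedStatus": {"offError", "onError", "noComm"},
    "infeedLoadStatus": {"loadLow", "loadHigh", "overLoad", "readError", "noComm"},
    "outletStatus": {"offError", "onError", "noComm", "offFuse", "onFuse"},
    "outletLoadStatus": {"loadLow", "loadHigh", "overLoad", "readError", "noComm"},
    "contactClosureStatus": {"alarm"},
}


def traps_emitted(variable: str, states: list) -> list:
    """Return the states for which a trap would be sent, one state per polling period.

    A trap is sent (and repeated) while the state is an error state, and sent once
    more in the first period after the state returns to a non-error state.
    """
    errors = ERROR_STATES[variable]
    sent = []
    previous_was_error = False
    for state in states:
        is_error = state in errors
        if is_error or previous_was_error:
            sent.append(state)
        previous_was_error = is_error
    return sent


print(traps_emitted("outletStatus", ["on", "onError", "onError", "on", "on"]))
# ['onError', 'onError', 'on']
```

On the receiving side, the final trap carrying a non-error value is the signal to clear the alarm.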

8.2 CyberPower RMPDU Alarms


The following table describes the alarms generated by the CyberPower RMPDU in a generic system, without
the TOMIA proprietary OSS monitoring mediator tool.
Table 29: CyberPower DC-RMPDU Alarms

ID Alarm Name Description Severity

1.3.6.1.4.3808.302 atsSourceFault A source fault has occurred. CRITICAL
1.3.6.1.4.3808.306 atsInputHighVoltage The upper voltage limit has been crossed. CRITICAL
1.3.6.1.4.3808.314 atsCommunicationLost ATS communication lost. CRITICAL
1.3.6.1.4.3808.320 atsPowerSupplyFault Power supply failure. CRITICAL
1.3.6.1.4.3808.322 atsDevHardwareFault Device hardware failure. CRITICAL
1.3.6.1.4.3808.328 atsOverload The load has exceeded the overload threshold. CRITICAL
1.3.6.1.4.3808.304 atsRedundancyFail Redundancy has failed. WARNING
1.3.6.1.4.3808.308 atsInputLowVoltage The lower voltage limit has been crossed. WARNING
1.3.6.1.4.3808.310 atsInputHighFrequency The upper frequency limit has been crossed. WARNING
1.3.6.1.4.3808.312 atsInputLowFrequency The lower frequency limit has been crossed. WARNING
1.3.6.1.4.3808.316 atsLCDCommunicationLost LCD communication lost. WARNING
1.3.6.1.4.3808.318 atsDB9CommunicationLost DB9 communication lost. WARNING
1.3.6.1.4.3808.330 atsNearOverload The load has exceeded the near-overload threshold. WARNING
1.3.6.1.4.3808.332 atsLowLoad The load has fallen below the low load threshold. WARNING

8.3 DC-RMPDU Alarms (Sentry 4820-XL-8)


The following table describes the alarms generated by the DC-RMPDU:
Table 30: Sentry DC-RMPDU Alarms

OID Trap Name Description Severity

1.3.6.1.4.1.1718.2.100.0.1 sentry2ChainStart This event is sent when the Sentry has completed the application code boot process. This can occur from Normal
either a power up or a resynchronization of the Sentry chain.
Correlation: N/A
1.3.6.1.4.1.1718.2.100.0.2 sentry2BoardTemperatureHighError This event is sent when the value from a temperature sensor attached to a Sentry board is above a Critical
pre-configured high threshold level. This trap is repeated periodically while the error condition exists.
Correlation: Cleared by trap sentry2BoardTemperatureNormal
1.3.6.1.4.1.1718.2.100.0.3 sentry2BoardTemperatureLowError This event is sent when the value from a temperature sensor attached to a Sentry board is below a Critical
pre-configured low threshold level. This trap is repeated periodically while the error condition exists.
Correlation: Cleared by trap sentry2BoardTemperatureNormal
1.3.6.1.4.1.1718.2.100.0.4 sentry2BoardTemperatureNormal This event is sent when the value from a temperature sensor attached to a Sentry board returns to the normal Normal
range within the pre-configured high and low threshold levels, after having been above or below the threshold levels.
Correlation: This trap clears:
* sentry2BoardTemperatureLowError
* sentry2BoardTemperatureHighError
based on the Board identification variables $1, $2, $3.
1.3.6.1.4.1.1718.2.100.0.5 sentry2PortControlStatusChange This event is sent if the control status of a Sentry port has changed one or more times since the last notification Normal
period. For example, a Sentry port has been turned on, off, shut down, or rebooted. The current control status at
the time of the notification is included.
Correlation: N/A
1.3.6.1.4.1.1718.2.100.0.6 sentry2PortModuleStatusError This event is sent when the module status of a Sentry port indicates an error condition. This trap is repeated Critical
periodically while the error condition exists.
Correlation: Cleared by trap sentry2PortModuleStatusNormal
1.3.6.1.4.1.1718.2.100.0.7 sentry2PortModuleStatusNormal This event is sent when the module status of a Sentry port returns to normal after being in an error condition. Normal
Correlation: This trap clears sentry2PortModuleStatusError based on the Board identification variables $1, $2, $3.
1.3.6.1.4.1.1718.2.100.0.8 sentry2PortDeviceLoadHighError This event is sent when the value from the load sensor of a Sentry port is above a pre-configured high threshold Critical
level. This trap is repeated periodically while the error condition exists.
Correlation: Cleared by trap sentry2PortDeviceLoadNormal
1.3.6.1.4.1.1718.2.100.0.9 sentry2PortDeviceLoadLowError This event is sent when the value from the load sensor of a Sentry port is below a pre-configured low threshold Major
level. This trap is repeated periodically while the error condition exists.
Correlation: Cleared by trap sentry2PortDeviceLoadNormal
1.3.6.1.4.1.1718.2.100.0.10 sentry2PortDeviceLoadNormal This event is sent when the value from the load sensor of a Sentry port returns to the normal range within the Normal
pre-configured high and low threshold levels, after having been above or below the threshold levels.
Correlation: This trap clears:
* sentry2PortDeviceLoadHighError
* sentry2PortDeviceLoadLowError
based on the Board identification variables $1, $2, $3.
1.3.6.1.4.1.1718.2.100.0.11 sentry2BoardInputLoadHighError This event is sent when the value from the input load sensor of a Sentry board is above a pre-configured high Critical
threshold level. This trap is repeated periodically while the error condition exists.
Correlation: Cleared by trap sentry2BoardInputLoadNormal
1.3.6.1.4.1.1718.2.100.0.12 sentry2BoardInputLoadLowError This event is sent when the value from the input load sensor of a Sentry board is below a pre-configured low Critical
threshold level. This trap is repeated periodically while the error condition exists.
Correlation: Cleared by trap sentry2BoardInputLoadNormal
1.3.6.1.4.1.1718.2.100.0.13 sentry2BoardInputLoadNormal This event is sent when the value from the input load sensor of a Sentry board returns to the normal range Normal
within the pre-configured high and low threshold levels, after having been above or below the threshold levels.
Correlation: This trap clears:
* sentry2BoardInputLoadLowError
* sentry2BoardInputLoadHighError
based on the Board identification variables $1, $2, $3.
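Per the Correlation entries in Table 30, each Normal trap clears the matching error traps that carry the same Board identification variables $1, $2 and $3. A minimal sketch of an active-alarm list built on that rule follows; it treats the three variables as opaque values (their meaning is not described here), and CLEARS and ActiveAlarms are hypothetical names:

```python
# Which error traps each Normal trap clears, per the Correlation entries in Table 30.
CLEARS = {
    "sentry2BoardTemperatureNormal": {"sentry2BoardTemperatureHighError", "sentry2BoardTemperatureLowError"},
    "sentry2PortModuleStatusNormal": {"sentry2PortModuleStatusError"},
    "sentry2PortDeviceLoadNormal": {"sentry2PortDeviceLoadHighError", "sentry2PortDeviceLoadLowError"},
    "sentry2BoardInputLoadNormal": {"sentry2BoardInputLoadHighError", "sentry2BoardInputLoadLowError"},
}
ERROR_TRAPS = set().union(*CLEARS.values())


class ActiveAlarms:
    """Hypothetical active-alarm list keyed on (trap name, ($1, $2, $3))."""

    def __init__(self):
        self.active = set()

    def handle(self, trap: str, varbinds: list):
        key = tuple(varbinds[:3])  # the Board identification variables $1, $2, $3
        if trap in ERROR_TRAPS:
            self.active.add((trap, key))  # repeated error traps do not create duplicates
        elif trap in CLEARS:
            for error_trap in CLEARS[trap]:
                self.active.discard((error_trap, key))
        # Other traps (for example sentry2ChainStart) have no correlation (N/A).


alarms = ActiveAlarms()
alarms.handle("sentry2PortDeviceLoadHighError", ["1", "1", "4"])
alarms.handle("sentry2PortDeviceLoadHighError", ["1", "1", "4"])  # periodic repeat
alarms.handle("sentry2PortDeviceLoadNormal", ["1", "1", "4"])
print(alarms.active)  # set()
```

Because the error traps are repeated periodically while the condition persists, keying the set on the trap name and the three variables absorbs the repetitions instead of creating duplicate entries.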

8.4 X733-Compliant RMPDU Alarms


The following table contains the alarms transmitted from the AC-RMPDU in a system compliant with the X.733
standard, for connectivity to the TOMIA proprietary OSS monitoring mediator tool.
Table 31: RMPDU Alarms List - X733-Compliant

OID Trap Name Description Severity

1.3.6.1.4.1.8161.200.2.0.1 shTowerStatusEvent Tower status event. If enabled, this trap is sent when the towerStatus indicates an error Critical
state ('noComm'). This trap is repeated periodically while the towerStatus remains in an
error state. If the towerStatus returns to a "non-error" state ('normal'), this trap is sent once
more with the non-error towerStatus, and then stops being repeated.
While the towerStatus indicates an error state,
all status and load traps are suppressed for input feeds and outlets on the tower.
1.3.6.1.4.1.8161.200.2.0.2 shInfeedStatusEvent Input feed status event. If enabled, this trap is sent when the infeedStatus indicates an error Critical
state ('offError', 'onError', or 'noComm'). This trap is repeated periodically while the
infeedStatus remains in an error state. If the infeedStatus returns to a non-error state ('off'
or 'on'), this trap is sent once more with the non-error infeedStatus, and then stops being
repeated.
While the infeedStatus indicates an error state, load traps are suppressed for the input feed,
and if the infeedCapabilities 'failSafe' bit is FALSE, all status and load traps are suppressed
for outlets on the input feed.
1.3.6.1.4.1.8161.200.2.0.3 shInfeedLoadEvent Input feed load event. If enabled, this trap is sent when the infeedLoadStatus indicates an Critical
error state ('loadLow', 'loadHigh', 'overLoad', 'readError', or 'noComm'). This trap is repeated
periodically while the infeedLoadStatus remains in an error state.
If the infeedLoadStatus returns to a non-error state ('normal' or 'notOn'), this trap is sent
once more with the non-error infeedLoadStatus, and then stops being repeated.
1.3.6.1.4.1.8161.200.2.0.4 shOutletStatusEvent Outlet status event. If enabled, this trap is sent when the outletStatus indicates an error Critical
state ('offError', 'onError', 'noComm', 'offFuse', or 'onFuse'). This trap is repeated
periodically while the outletStatus remains in an error state. If the outletStatus returns to a
non-error state ('off' or 'on'), this trap is sent once more with the non-error outletStatus, and
then stops being repeated.
While the outletStatus indicates an error state, load traps are suppressed for the outlet.
1.3.6.1.4.1.8161.200.2.0.5 shOutletLoadEvent Outlet load event. If enabled, this trap is sent when the outletLoadStatus indicates an error Critical
state ('loadLow', 'loadHigh', 'overLoad', 'readError', or 'noComm'). This trap is repeated
periodically while the outletLoadStatus remains in an error state.
If the outletLoadStatus returns to a non-error state ('normal' or 'notOn'), this trap is sent
once more with the non-error outletLoadStatus, and then stops being repeated.
1.3.6.1.4.1.8161.200.2.0.6 shOutletChangeEvent Outlet change event. If enabled, this trap is sent when the outletStatus indicates an error Minor
state ('offError', 'onError', 'noComm', 'offFuse', or 'onFuse').
This trap is repeated periodically while the outletStatus remains in an error state. If the
outletStatus returns to a non-error state ('off' or 'on'), this trap is sent once more with the
non-error outletStatus, and then stops being repeated.
While the outletStatus indicates an error state, load traps are suppressed for the outlet.
1.3.6.1.4.1.8161.200.2.0.13 shContactClosureEvent Contact closure event. If enabled, this trap is sent when the contactClosureStatus indicates Critical
an error state ('alarm'). This trap is repeated periodically while the contactClosureStatus
remains in an error state. If the contactClosureStatus returns to a "non-error" state
('normal'), this trap is sent once more with the non-error contactClosureStatus, and then
stops being repeated.
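Comparing Table 31 with Table 28, each X733-compliant trap appears to reuse the trailing trap number of the corresponding generic Sentry trap (.1.3.6.1.4.1.1718.3.100.0.N becomes 1.3.6.1.4.1.8161.200.2.0.N) and to prefix its name with "sh". Assuming that correspondence holds for all rows, the mapping could be sketched as follows; to_x733 and SENTRY_TRAPS are hypothetical names:

```python
SENTRY_PREFIX = "1.3.6.1.4.1.1718.3.100.0."
X733_PREFIX = "1.3.6.1.4.1.8161.200.2.0."

# Generic Sentry trap numbers and names from Table 28.
SENTRY_TRAPS = {
    1: "TowerStatusEvent",
    2: "InfeedStatusEvent",
    3: "InfeedLoadEvent",
    4: "OutletStatusEvent",
    5: "OutletLoadEvent",
    6: "OutletChangeEvent",
    13: "ContactClosureEvent",
}


def to_x733(generic_oid: str) -> tuple:
    """Map a generic Sentry trap OID to the assumed X733-compliant OID and trap name."""
    oid = generic_oid.lstrip(".")  # Table 28 prints the OIDs with a leading dot
    if not oid.startswith(SENTRY_PREFIX):
        raise ValueError(f"not a Sentry trap OID: {generic_oid}")
    number = int(oid[len(SENTRY_PREFIX):])
    if number not in SENTRY_TRAPS:
        raise ValueError(f"trap number {number} is not listed in Table 28")
    return X733_PREFIX + str(number), "sh" + SENTRY_TRAPS[number]


print(to_x733(".1.3.6.1.4.1.1718.3.100.0.13"))
# ('1.3.6.1.4.1.8161.200.2.0.13', 'shContactClosureEvent')
```

Raising an error for unlisted trap numbers keeps the sketch from inventing X733 OIDs that the tables do not define.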

9. Cisco LAN Switch Unit (LSU) and Router Agent


Cisco switches and routers have their own proprietary agent. The Cisco equipment agent generates SNMP
traps based on the MIB files provided in the Cisco_v2smi.zip file for both routers and switches.
This section contains the following topics:
* Cisco System Alarms – Generic: alarms that are generated by this agent in a generic system, without the
TOMIA proprietary OSS monitoring mediator tool.
* Cisco System Alarms X733-Compliant: alarms relevant to a system integrated with the TOMIA
proprietary OSS monitoring mediator tool.

9.1 Cisco System Alarms – Generic


The following table contains the alarms received from a Cisco agent in a generic system, without the
integration of the TOMIA proprietary OSS monitoring mediator tool.

Table 32: Generic Cisco Alarms

OID Trap Name Description Severity

.1.3.6.1.4.1.9.9.13.3.0.1 ciscoEnvMonShutdownNotification A ciscoEnvMonShutdownNotification is sent if the environmental monitor detects a Major
testpoint reaching a Critical state and is about to initiate a shutdown. This notification
contains no objects so that it may be encoded and sent in the shortest amount of time
possible. Even so, management applications should not rely on receiving such a
notification as it may not be sent before the shutdown completes.
.1.3.6.1.4.1.9.9.13.3.0.2 ciscoEnvMonVoltageNotification A ciscoEnvMonVoltageNotification is sent if the voltage measured at a given testpoint is Major
outside the normal range for the testpoint (i.e. is at the warning, critical, or shutdown
stage). Since such a notification is usually generated before the shutdown state is
reached, it can convey more data and has a better chance of being sent than does the
ciscoEnvMonShutdownNotification. This notification is deprecated in favor of
ciscoEnvMonVoltStatusChangeNotif.
.1.3.6.1.4.1.9.9.13.3.0.3 ciscoEnvMonTemperatureNotification A ciscoEnvMonTemperatureNotification is sent if the temperature measured at a given Critical
testpoint is outside the normal range for the testpoint (i.e., is at the warning, critical, or
shutdown stage). Since such a notification is usually generated before the shutdown
state is reached, it can convey more data and has a better chance of being sent than does
the ciscoEnvMonShutdownNotification. This notification is deprecated in favor of
ciscoEnvMonTempStatusChangeNotif.
.1.3.6.1.4.1.9.9.13.3.0.4 ciscoEnvMonFanNotification A ciscoEnvMonFanNotification is sent if any one of the fans in the fan array (where extant) Critical
fails. Since such a notification is usually generated before the shutdown state is reached,
it can convey more data and has a better chance of being sent than does the
ciscoEnvMonShutdownNotification. This notification is deprecated in favor of
ciscoEnvMonFanStatusChangeNotif.
.1.3.6.1.4.1.9.9.13.3.0.5 ciscoEnvMonRedundantSupplyNotification A ciscoEnvMonRedundantSupplyNotification is sent if the redundant power supply (where Major
extant) fails. Since such a notification is usually generated before the shutdown state is
reached, it can convey more data and has a better chance of being sent than does the
ciscoEnvMonShutdownNotification. This notification is deprecated in favor of
ciscoEnvMonSuppStatusChangeNotif.
.1.3.6.1.4.1.9.9.13.3.0.6 ciscoEnvMonVoltStatusChangeNotif A ciscoEnvMonVoltStatusChangeNotif is sent if there is a change in the state of a device Major
being monitored by ciscoEnvMonVoltageState_sh.
.1.3.6.1.4.1.9.9.13.3.0.7 ciscoEnvMonTempStatusChangeNotif A ciscoEnvMonTempStatusChangeNotif is sent if there is a change in the state of a device Critical
being monitored by ciscoEnvMonTemperatureState_sh.
.1.3.6.1.4.1.9.9.13.3.0.8 ciscoEnvMonFanStatusChangeNotif A ciscoEnvMonFanStatusChangeNotif is sent if there is a change in the state of a device Critical
being monitored by ciscoEnvMonFanState_sh.
.1.3.6.1.4.1.9.9.13.3.0.9 ciscoEnvMonSuppStatusChangeNotif A ciscoEnvMonSuppStatusChangeNotif is sent if there is a change in the state of a device Major
being monitored by ciscoEnvMonSupplyState_sh.
.1.3.6.1.4.1.9.9.26.2.0.1 demandNbrCallInformation This trap/inform is sent to the manager whenever a successful call clears, or a failed call Minor
attempt is determined to have ultimately failed. If call retry is active, this is after all retry
attempts have failed. However, only one such trap is sent between successful call
attempts; subsequent call attempts result in no trap.
.1.3.6.1.4.1.9.9.26.2.0.2 demandNbrCallDetails This trap/inform is sent to the manager whenever a call connects or clears, or a failed Major
call attempt is determined to have ultimately failed. If call retry is active, this is after all
retry attempts have failed. However, only one such trap is sent between successful call
attempts; subsequent call attempts result in no trap.
When a call connects, the demandNbrLastDuration_sh, demandNbrClearReason_sh,
and demandNbrClearCode_sh objects are not included in the trap.
.1.3.6.1.4.1.9.9.26.2.0.3 demandNbrLayer2Change This trap/inform is sent to the manager whenever the D-channel of an interface changes Warning
state.
.1.3.6.1.4.1.9.9.26.2.0.4 demandNbrCNANotification This trap/inform is sent to the manager whenever an incoming call request is rejected Minor
with cause 'requested circuit/channel not available' (CNA), code number 44.
isdnSignalingIfIndex_sh is the ifIndex value of the interface associated with this signaling
channel. ifIndex is the interface index of the requested bearer channel.
.1.3.6.1.4.1.9.9.43.2.0.1 ciscoConfigManEvent Notification of a configuration management event as recorded in ccmHistoryEventTable. Warning
.1.3.6.1.4.1.9.9.46.2.0.1 vtpConfigRevNumberError A configuration revision number error notification signifies that a device has incremented Warning
its vtpConfigRevNumberErrors counter.
Generation of this notification is suppressed if vtpNotificationsEnabled has the value 'false'.
The device must throttle the generation of consecutive vtpConfigRevNumberError
notifications so that there is at least a five-second gap between notifications of this type.
When notifications are throttled, they are dropped, not queued for sending at a future
time.
(Note that 'generating' a notification means sending it to all configured recipients.)
.1.3.6.1.4.1.9.9.46.2.0.2 vtpConfigDigestError A configuration digest error notification signifies that a device has incremented its Warning
vtpConfigDigestErrors counter.
Generation of this notification is suppressed if vtpNotificationsEnabled has the value 'false'.
The device must throttle the generation of consecutive vtpConfigDigestError notifications
so that there is at least a five-second gap between notifications of this type. When
notifications are throttled, they are dropped, not queued for sending at a future time.
(Note that 'generating' a notification means sending it to all configured recipients.)
.1.3.6.1.4.1.9.9.46.2.0.3 vtpServerDisabled A VTP Server disabled notification is generated when the local system is no longer able Minor
to function as a VTP Server because the number of defined VLANs is greater than
vtpMaxVlanStorage.
Generation of this notification is suppressed if vtpNotificationsEnabled has the value 'false'.
.1.3.6.1.4.1.9.9.46.2.0.4 vtpMtuTooBig A VTP MTU tooBig notification is generated when a VLAN's MTU size is larger than can Minor
be supported either:
- by one or more of its trunk ports: the included vtpVlanState has the value
'mtuTooBigForTrunk' and the included vlanTrunkPortManagementDomain is for the first
(or only) trunk port, or
- by the device itself: vtpVlanState has the value 'mtuTooBigForDevice' and any instance
of vlanTrunkPortManagementDomain is included.
Devices that have no trunk ports do not send vtpMtuTooBig notifications.
Generation of this notification is suppressed if vtpNotificationsEnabled has the value 'false'.
.1.3.6.1.4.1.9.9.46.2.0.6 vtpVersionOneDeviceDetected A VTP version one device detected notification is generated by a device when: Warning
a) a management domain has been put into version 2 mode (as accessed by
managementDomainVersionInUse),
b) 15 minutes have passed since a), and
c) a version 1 PDU is detected on a trunk on the device that is in that management
domain which has a lower revision number than the current configuration.
.1.3.6.1.4.1.9.9.46.2.0.7 vlanTrunkPortDynamicStatusChange A vlanTrunkPortDynamicStatusChange notification is generated by a device when the Warning
value of the vlanTrunkPortDynamicStatus object changes.
.1.3.6.1.4.1.9.9.46.2.0.8 vtpLocalModeChanged A vtpLocalModeChanged notification is generated by a device when the value of the Warning
object managementDomainLocalMode is changed.
.1.3.6.1.4.1.9.9.46.2.0.9 vtpVersionInUseChanged A vtpVersionInUseChanged notification is generated by a device when the value of the Warning
object managementDomainVersionInUse is changed.
.1.3.6.1.4.1.9.9.46.2.0.10 vtpVlanCreated A vtpVlanCreated notification is generated by a device when a VLAN is created. Warning
.1.3.6.1.4.1.9.9.46.2.0.11 vtpVlanDeleted A vtpVlanDeleted notification is generated by a device when a VLAN is deleted. Warning
.1.3.6.1.4.1.9.9.46.2.0.12 vtpVlanRingNumberConflict A VTP ring number configuration conflict notification is generated if, and only at the time Warning
when, a device learns of a conflict between:
a) the ring number (vtpVlanPortLocalSegment) being used on a token ring segment
attached to the port identified by ifIndex, and
b) the VTP-obtained ring number (vtpVlanRingNumber) for the VLAN identified by
vtpVlanIndex.
When such a conflict occurs, the bridge port is put into an administrative down position
until the conflict is resolved through local/network management intervention.
This notification is only applicable to VLANs of type 'tokenRing'.
.1.3.6.1.4.1.9.9.52.2.0.1 cieTestCompletion A cieTestCompletion trap is sent at the completion of a crypto session establishment if Warning
such a trap was requested when the sequence was initiated.
.1.3.6.1.4.1.9.9.61.2.0.1 caemTemperatureNotification A caemTemperatureNotification is sent if an over-temperature condition is detected in Critical
the managed system.
This is a replacement for the ciscoEnvMonTemperatureNotification trap because the
information 'ciscoEnvMonTemperatureStatusValue_sh' required by that trap is not
available in the managed system.
.1.3.6.1.4.1.9.9.61.2.0.2 caemVoltageNotification A caemVoltageNotification is sent if an over-voltage condition is detected and Critical
ciscoEnvMonVoltageState_sh is not set to 'notPresent' in the managed system. This is a
replacement for the ciscoEnvMonVoltageNotification trap because the information
'ciscoEnvMonVoltageStatusValue_sh' required by that trap is not available in the
managed system.
.1.3.6.1.4.1.9.9.87.2.0.1 c2900AddressViolation The addressViolation notification is generated when an address violation is detected on a Minor
secured port. The generation of the addressViolation notification can be enabled or
suppressed using the object c2900ConfigAddressViolationAction.
The particular secured port is indicated by the value of c2900PortIfIndex_sh.
.1.3.6.1.4.1.9.9.87.2.0.2 c2900BroadcastStorm The broadcastStorm notification is generated upon detecting that a port is receiving Minor
broadcast packets at a rate crossing the specified broadcast threshold. This trap is only for
the rising threshold. The particular port is indicated by the values of
c2900PortModuleIndex and c2900PortIndex, and the value of the threshold is given by
c2900PortBroadcastRisingThreshold_sh.
.1.3.6.1.4.1.9.9.87.2.0.3 c2900RpsFailed A redundant power system (RPS) is connected to the switch. The RpsFailed notification Major
is generated upon detecting RPS failure.
.1.3.6.1.4.1.9.9.96.2.1.1 ccCopyCompletion A ccCopyCompletion trap is sent at the completion of a config-copy request. The Minor
ccCopyFailCause is not instantiated, and hence not included in a trap, when
ccCopyState_sh indicates success.
.1.3.6.1.4.1.9.9.106.2.0.1 cHsrpStateChange A cHsrpStateChange notification is sent when a cHsrpGrpStandbyState_sh transitions to Major
either active or standby state, or leaves active or standby state. Only one notification is
issued when the state changes from standby to active, and vice versa.
.1.3.6.1.4.1.9.9.109.2.0.1 cpmCPURisingThreshold A cpmCPURisingThreshold notification is sent when the configured rising CPU utilization Major
threshold (cpmCPURisingThresholdValue_sh) is reached, CPU utilization remains above the
threshold for the configured interval (cpmCPURisingThresholdPeriod), and such a
notification is requested. The cpmProcExtUtil5SecRev_sh and cpmProcessTimeCreated
objects can be repeated multiple times in a notification, indicating the top users of CPU.
.1.3.6.1.4.1.9.9.109.2.0.2 cpmCPUFallingThreshold A cpmCPUFallingThreshold notification is sent when the configured falling threshold Clear / Normal
(cpmCPURisingThresholdValue_sh) is reached, CPU utilization remains under the
threshold for the configured interval (cpmCPUFallingThresholdPeriod), and such a
notification is requested.
.1.3.6.1.4.1.9.9.215.2.0.1 cmnMacChangedNotification cmnMacMoveNotification is generated when a MAC address is moved between two Minor
interfaces.
.1.3.6.1.4.1.9.9.215.2.0.2 cmnMacMoveNotification cmnMacMoveNotification is generated when a MAC address is moved between two Minor
interfaces.
.1.3.6.1.4.1.9.9.215.2.0.3 cmnMacThresholdExceedNotif cmnMacThresholdExceedNotif is sent when cmnUtilizationUtilization_sh exceeds or Minor
equals the cmnMACThresholdLimit_sh for a given entPhysicalIndex.
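
The five-second throttling rule for vtpConfigRevNumberError and vtpConfigDigestError (throttled notifications are dropped, not queued) can be illustrated with a minimal Python sketch. NotificationThrottle and try_send are hypothetical names, not part of any Cisco or TOMIA API. The sketch assumes the gap is measured per notification type from the last notification actually sent, and it takes an injectable clock so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Dict


class NotificationThrottle:
    """Hypothetical helper: enforce a minimum gap between notifications of the same type."""

    def __init__(self, min_gap: float = 5.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.min_gap = min_gap  # seconds; five seconds per the VTP trap descriptions
        self.clock = clock      # injectable so tests need not sleep
        self._last_sent: Dict[str, float] = {}
        self.dropped = 0

    def try_send(self, notif_type: str, send: Callable[[str], None]) -> bool:
        """Send now if allowed; otherwise drop the notification (it is never queued)."""
        now = self.clock()
        last = self._last_sent.get(notif_type)
        if last is not None and now - last < self.min_gap:
            self.dropped += 1   # throttled: discarded, not deferred to a later time
            return False
        self._last_sent[notif_type] = now
        send(notif_type)        # 'generating' = delivering to all configured recipients
        return True
```

Because dropped notifications are not remembered, a burst of errors within five seconds produces a single trap; the counters (vtpConfigRevNumberErrors, vtpConfigDigestErrors) remain the authoritative record of how many errors occurred.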

9.2 Cisco System Alarms X733-Compliant


The following table lists the alarms received from a Cisco agent in a system compliant with the X733
standard, for connectivity to the TOMIA proprietary OSS monitoring mediator tool.
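
Because these alarms are delivered as X733-compliant events, it may help to see how the Severity column could be normalized to the perceived-severity values that ITU-ALARM-TC-MIB (RFC 3877) defines for ITU-T X.733: cleared(1), indeterminate(2), critical(3), major(4), minor(5), warning(6). This is a sketch under stated assumptions: treating this manual's Normal (and Clear / Normal) as cleared, and reporting any unrecognized label as indeterminate, are choices made here, not documented TOMIA behavior. to_itu_severity is a hypothetical name.

```python
from typing import Dict

# ItuPerceivedSeverity values as defined in ITU-ALARM-TC-MIB (RFC 3877).
ITU_PERCEIVED_SEVERITY: Dict[str, int] = {
    "cleared": 1,
    "indeterminate": 2,
    "critical": 3,
    "major": 4,
    "minor": 5,
    "warning": 6,
}

# Assumption: this manual's "Normal" and "Clear / Normal" both mean the fault has cleared.
_ALIASES: Dict[str, str] = {"normal": "cleared", "clear": "cleared", "clear / normal": "cleared"}


def to_itu_severity(label: str) -> int:
    """Map a Severity cell from the alarm tables to an ItuPerceivedSeverity value."""
    key = " ".join(label.split()).lower()  # collapse stray whitespace, ignore case
    key = _ALIASES.get(key, key)
    # Design choice: anything unrecognized is reported as indeterminate rather than dropped.
    return ITU_PERCEIVED_SEVERITY.get(key, ITU_PERCEIVED_SEVERITY["indeterminate"])
```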

Table 33: X733-Compliant Cisco Alarms

OID Trap Name Description Severity

1.3.6.1.4.1.8161.200.5.0.1 ciscoColdStartSh A coldStart trap signifies that the SNMPv1 entity, acting in an agent role, is Warning
reinitializing itself and that its configuration may have been altered.
1.3.6.1.4.1.8161.200.5.0.2 ciscoWarmStartSh A warmStart trap signifies that the SNMPv1 entity, acting in an agent role, is Warning
reinitializing itself such that its configuration is unaltered.
1.3.6.1.4.1.8161.200.5.0.3 ciscoLinkDownSh A linkDown trap signifies that the SNMP entity, acting in an agent role, has Minor
detected that the ifOperStatus object for one of its communication links is about to enter
the down state from some other state (but not from the notPresent state). This other
state is indicated by the included value of ifOperStatus.
1.3.6.1.4.1.8161.200.5.0.4 ciscoLinkUpSh A linkUp trap signifies that the SNMP entity, acting in an agent role, has detected Normal
that the ifOperStatus object for one of its communication links left the down state
and transitioned into some other state (but not into the notPresent state). This
other state is indicated by the included value of ifOperStatus.
1.3.6.1.4.1.8161.200.5.0.5 ciscoAuthFailureSh An authenticationFailure trap signifies that the SNMPv1 entity, acting in an agent Warning
role, has received a protocol message that is not properly authenticated. While all
implementations of SNMPv1 must be capable of generating this trap, the
snmpEnableAuthenTraps object indicates whether this trap will be generated.
1.3.6.1.4.1.8161.200.5.0.6 ciscoModuleUpSh A moduleUp trap signifies that the agent entity has detected that the moduleStatus Normal
object in this MIB has transitioned to the OK(2) state for one of its modules. The
generation of this trap can be controlled by the sysEnableModuleTraps object in
this MIB.
1.3.6.1.4.1.8161.200.5.0.7 ciscoModuleDownSh A moduleDown trap signifies that the agent entity has detected that the Minor
moduleStatus object in this MIB has transitioned out of the OK(2) state for one of
its modules. The generation of this trap can be controlled by the
sysEnableModuleTraps object in this MIB.
1.3.6.1.4.1.8161.200.5.0.8 ciscoLS1010ChassisChangeNotifiSh Sent when the agent detects a hot-swap component change or another change in the chassis. Warning
1.3.6.1.4.1.8161.200.5.0.9 ciscoEnvMonshutdownNotifSh A ciscoEnvMonShutdownNotification is sent if the environmental monitor detects a Major
testpoint reaching a critical state and is about to initiate a shutdown. This
notification contains no objects so that it may be encoded and sent in the shortest
amount of time possible. Even so, management applications should not rely on
receiving such a notification as it may not be sent before the shutdown completes.
1.3.6.1.4.1.8161.200.5.0.10 ciscoEnvMonVoltageNotificationSh A ciscoEnvMonVoltageNotification is sent if the voltage measured at a given Major
testpoint is outside the normal range for the testpoint (i.e., is at the warning, critical, or
shutdown stage). Since such a notification is usually generated before the shutdown
state is reached, it can convey more data and has a better chance of being sent than
does the ciscoEnvMonShutdownNotification. This notification is deprecated in favor of
ciscoEnvMonVoltStatusChangeNotif.
1.3.6.1.4.1.8161.200.5.0.11 ciscoEnvMonTemperatureNotifSh A ciscoEnvMonTemperatureNotification is sent if the temperature measured at a Critical
given testpoint is outside the normal range for the testpoint (i.e., is at the warning,
critical, or shutdown stage). Since such a notification is usually generated before
the shutdown state is reached, it can convey more data and has a better chance of
being sent than does the ciscoEnvMonShutdownNotification. This notification is
deprecated in favor of ciscoEnvMonTempStatusChangeNotif.
1.3.6.1.4.1.8161.200.5.0.12 ciscoEnvMonFanNotificationSh ciscoEnvMonFanNotification is sent if any one of the fans in the fan array (where Critical
extant) fails. Since such a notification is usually generated before the shutdown
state is reached, it can convey more data and has a better chance of being sent
than does the ciscoEnvMonShutdownNotification. This notification is deprecated in
favor of ciscoEnvMonFanStatusChangeNotif.
1.3.6.1.4.1.8161.200.5.0.13 ciscoEnvMonRedundantSupplySh ciscoEnvMonRedundantSupplyNotification is sent if the redundant power supply Major
(where applicable) fails. Since such a notification is usually generated before the
shutdown state is reached, it can convey more data and has a better chance of
being sent than does the ciscoEnvMonShutdownNotification. This notification is
replaced by ciscoEnvMonSuppStatusChangeNotification.
1.3.6.1.4.1.8161.200.5.0.14 ciscoEnvMonVoltStatusChangeSh ciscoEnvMonVoltStatusChangeNotification is sent if there is a change in the state Major
of a device being monitored by ciscoEnvMonVoltageState_sh.
1.3.6.1.4.1.8161.200.5.0.15 ciscoEnvMonTempStatusChangeSh ciscoEnvMonTempStatusChangeNotification is sent if there is a change in the state Critical
of a device being monitored by ciscoEnvMonTemperatureState_sh.
1.3.6.1.4.1.8161.200.5.0.16 ciscoEnvMonFanStatusChangeSh ciscoEnvMonFanStatusChangeNotification is sent if there is a change in the state Critical
of a device being monitored by ciscoEnvMonFanState_sh.
1.3.6.1.4.1.8161.200.5.0.17 ciscoEnvMonSuppStatusChangeSh ciscoEnvMonSuppStatusChangeNotification is sent if there is a change in the state Major
of a device being monitored by ciscoEnvMonSupplyState_sh.
1.3.6.1.4.1.8161.200.5.0.18 ciscoDemandNbrCallInformationSh This trap/inform is sent to the manager whenever a successful call clears, or a Minor
failed call attempt is determined to have ultimately failed. If call retry is active, this
is after all retry attempts have failed. However, only one such trap is sent between
successful call attempts; subsequent call attempts result in no trap.
1.3.6.1.4.1.8161.200.5.0.19 ciscoDemandNbrCallDetailsSh This trap/inform is sent to the manager whenever a call connects or clears, or a Major
failed call attempt is determined to have ultimately failed. If call retry is active, this
is after all retry attempts have failed. However, only one such trap is sent between
successful call attempts; subsequent call attempts result in no trap. When a call
connects, the demandNbrLastDuration_sh, demandNbrClearReason_sh, and
demandNbrClearCode_sh objects are not included in the trap.
1.3.6.1.4.1.8161.200.5.0.20 ciscoDemandNbrLayer2ChangeSh This trap/inform is sent to the manager whenever the D-channel of an interface Warning
changes state.
1.3.6.1.4.1.8161.200.5.0.21 ciscoDemandNbrCNANotificationSh This trap/inform is sent to the manager whenever an incoming call request is Minor
rejected with cause 'requested circuit/channel not available' (CNA), code number 44.
isdnSignalingIfIndex_sh is the ifIndex value of the interface associated with this
signaling channel. ifIndex is the interface index of the requested bearer channel.
1.3.6.1.4.1.8161.200.5.0.22 ciscoConfigManEventSh Notification of a configuration management event as recorded in Warning
ccmHistoryEventTable.
1.3.6.1.4.1.8161.200.5.0.23 ciscoVtpConfigRevNumberErrorSh A configuration revision number error notification signifies that a device has Warning
incremented its vtpConfigRevNumberErrors counter. Generation of this notification
is suppressed if vtpNotificationsEnabled has the value 'false'. The device must
throttle the generation of consecutive vtpConfigRevNumberError notifications so
that there is at least a five-second gap between notifications of this type. When
notifications are throttled, they are dropped, not queued for sending at a future
time. (Note that 'generating' a notification means sending it to all configured
recipients.)
1.3.6.1.4.1.8161.200.5.0.24 ciscoVtpConfigDigestErrorSh A configuration digest error notification signifies that a device has incremented its Warning
vtpConfigDigestErrors counter. Generation of this notification is suppressed if
vtpNotificationsEnabled has the value 'false'. The device must throttle the
generation of consecutive vtpConfigDigestError notifications so that there is at least
a five-second gap between notifications of this type. When notifications are
throttled, they are dropped, not queued for sending at a future time. (Note that
'generating' a notification means sending it to all configured recipients.)
1.3.6.1.4.1.8161.200.5.0.25 ciscoVtpServerDisabledSh A vtpServerDisabled notification is generated when the local system is no longer Minor
able to function as a VTP Server because the number of defined VLANs is greater
than vtpMaxVlanStorage. Generation of this notification is suppressed if
vtpNotificationsEnabled has the value 'false'.
1.3.6.1.4.1.8161.200.5.0.26 ciscoVtpMtuTooBigSh A vtpMtuTooBig notification is generated when a VLAN's MTU size is larger than Minor
can be supported either:
a) by one or more of its trunk ports: the included vtpVlanState has the value
'mtuTooBigForTrunk' and the included vlanTrunkPortManagementDomain is for the
first (or only) trunk port, or
b) by the device itself: vtpVlanState has the value 'mtuTooBigForDevice' and any
instance of vlanTrunkPortManagementDomain is included.
Devices that have no trunk ports do not send vtpMtuTooBig notifications.
Generation of this notification is suppressed if vtpNotificationsEnabled has the
value 'false'.
1.3.6.1.4.1.8161.200.5.0.27 ciscoVtpVersionOneDeviceDetectedSh A vtpVersionOneDeviceDetected notification is generated by a device when: Warning
a) a management domain has been put into version two mode (as accessed by
managementDomainVersionInUse),
b) 15 minutes have passed since a), and
c) a version one PDU is detected on a trunk on the device that is in the
management domain with a lower revision number than the current configuration.
1.3.6.1.4.1.8161.200.5.0.28 ciscoVlanTrunkPortStatusChangeSh A vlanTrunkPortDynamicStatusChange notification is generated by a device when Warning
the value of the vlanTrunkPortDynamicStatus object changes.
1.3.6.1.4.1.8161.200.5.0.29 ciscoVtpLocalModeChangedSh A vtpLocalModeChanged notification is generated by a device when the value of Warning
the object managementDomainLocalMode is changed.
1.3.6.1.4.1.8161.200.5.0.30 ciscoVtpVersionInUseChangedSh A vtpVersionInUseChanged notification is generated by a device when the value of Warning
the object managementDomainVersionInUse is changed.
1.3.6.1.4.1.8161.200.5.0.31 ciscoVtpVlanCreatedSh A vtpVlanCreated notification is generated by a device when a VLAN is created. Warning
1.3.6.1.4.1.8161.200.5.0.32 ciscoVtpVlanDeletedSh A vtpVlanDeleted notification is generated by a device when a VLAN is deleted. Warning
1.3.6.1.4.1.8161.200.5.0.33 ciscoVtpVlanRingNumberConflictSh A vtpVlanRingNumberConfigurationConflict notification is generated if, and only at Warning
the time when, a device learns of a conflict between:
a) the ring number (vtpVlanPortLocalSegment) being used on a token ring segment
attached to the port identified by ifIndex, and
b) the VTP-obtained ring number (vtpVlanRingNumber) for the VLAN identified by
vtpVlanIndex.
When such a conflict occurs, the bridge port is put into an administrative down
position until the conflict is resolved through local/network management
intervention. This notification is only applicable to VLANs of type 'tokenRing'.
1.3.6.1.4.1.8161.200.5.0.34 ciscoCieTestCompletionSh A cieTestCompletion trap is sent at the completion of a crypto session Warning
establishment if such a trap was requested when the sequence was initiated.
1.3.6.1.4.1.8161.200.5.0.35 ciscoCaemTempeNotifSh A caemTemperatureNotification is sent if an over-temperature condition is Critical
detected in the managed system. This is a replacement for the
ciscoEnvMonTemperatureNotification trap because the information
'ciscoEnvMonTemperatureStatusValue_sh' required by that trap is not available in
the managed system.
1.3.6.1.4.1.8161.200.5.0.36 ciscoCaemVoltageNotificationSh A caemVoltageNotification is sent if an over-voltage condition is detected and Critical
ciscoEnvMonVoltageState_sh is not set to 'notPresent' in the managed system.
This is a replacement for the ciscoEnvMonVoltageNotification trap because the
information 'ciscoEnvMonVoltageStatusValue_sh' required by that trap is not
available in the managed system.
1.3.6.1.4.1.8161.200.5.0.37 ciscoC2900AddressViolationSh The addressViolation notification is generated when an address violation is Minor
detected on a secured port. The generation of the addressViolation notification can
be enabled or suppressed using the object c2900ConfigAddressViolationAction.
The particular secured port is indicated by the value of c2900PortIfIndex_sh.
1.3.6.1.4.1.8161.200.5.0.38 ciscoC2900BroadcastStormSh The broadcastStorm notification is generated upon detecting that a port is receiving Minor
broadcast packets at a rate crossing the specified broadcast threshold. This trap is
only for the rising threshold. The particular port is indicated by the values of
c2900PortModuleIndex and c2900PortIndex, and the value of the threshold is
given by c2900PortBroadcastRisingThreshold_sh.
1.3.6.1.4.1.8161.200.5.0.39 ciscoC2900RpsFailedSh A redundant power system (RPS) is connected to the switch. The RpsFailed Major
notification is generated upon detecting RPS failure.
1.3.6.1.4.1.8161.200.5.0.40 ciscoCcCopyCompletionSh A ccCopyCompletion trap is sent at the completion of a config-copy request. The Minor
ccCopyFailCause is not instantiated, and hence not included in a trap, when
ccCopyState_sh indicates success.
1.3.6.1.4.1.8161.200.5.0.41 ciscoHsrpStateChangeSh A ciscoHsrpStateChange notification is sent when a ciscoHsrpGrpStandbyState_sh Major
transitions to either active or standby state, or leaves active or standby state. There
will be only one notification issued when the state change is from standby to active
and vice versa.
1.3.6.1.4.1.8161.200.5.0.42 ciscoCpmCPURisingThresholdSh A cpmCPURisingThreshold notification is sent when the configured rising CPU Major
utilization threshold (cpmCPURisingThresholdValue_sh) is reached, CPU
utilization remains above the threshold for the configured interval
(cpmCPURisingThresholdPeriod), and such a notification is requested. The
cpmProcExtUtil5SecRev_sh and cpmProcessTimeCreated objects can be
repeated multiple times in a notification, indicating the top users of CPU.
1.3.6.1.4.1.8161.200.5.0.43 ciscoCpmCPUFallingThresholdSh A cpmCPUFallingThreshold trap is sent when the configured falling threshold Normal
(cpmCPURisingThresholdValue_sh) is reached, CPU utilization remains under the
threshold for the configured interval (cpmCPUFallingThresholdPeriod), and such a
notification is requested.
1.3.6.1.4.1.8161.200.5.0.44 ciscoCmnMacChangedNotificationSh A cmnMacChangeNotification is generated when a MAC address is moved Minor
between two interfaces.
1.3.6.1.4.1.8161.200.5.0.45 ciscoCmnMacMoveNotificationSh A cmnMacMoveNotification is generated when a MAC address is moved between Minor
two interfaces.
1.3.6.1.4.1.8161.200.5.0.46 ciscoCmnMacThresholdExceedNotification A cmnMacThresholdExceedNotification is sent when cmnUtilization_sh exceeds or Minor
is equal to the cmnMacThresholdLimit_sh for a given entPhysicalIndex.
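
The rising and falling CPU threshold notifications (cpmCPURisingThreshold and cpmCPUFallingThreshold, listed in both Cisco tables) behave like a hysteresis pair: a rising notification needs utilization to stay at or above the rising threshold for the rising period, and a falling notification needs it to stay at or below the falling threshold for the falling period. The sketch below is a hedged illustration, not Cisco's implementation. It assumes periods are counted in consecutive polling samples rather than seconds, and that a falling notification is only emitted after a rising one; CpuThresholdMonitor is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CpuThresholdMonitor:
    """Hypothetical rising/falling CPU threshold monitor with sample-count periods."""

    rising_value: float   # percent; plays the role of the rising threshold value
    rising_period: int    # consecutive samples at/above rising_value (assumption: samples, not seconds)
    falling_value: float  # percent; plays the role of the falling threshold value
    falling_period: int   # consecutive samples at/below falling_value
    _above: int = field(default=0, init=False)
    _below: int = field(default=0, init=False)
    _raised: bool = field(default=False, init=False)

    def sample(self, util: float) -> Optional[str]:
        """Feed one utilization sample; return the notification name to send, if any."""
        # A single sample on the wrong side of a threshold resets that streak.
        self._above = self._above + 1 if util >= self.rising_value else 0
        self._below = self._below + 1 if util <= self.falling_value else 0
        if not self._raised and self._above >= self.rising_period:
            self._raised = True
            return "cpmCPURisingThreshold"
        if self._raised and self._below >= self.falling_period:
            self._raised = False  # assumption: falling is only reported after a rising
            return "cpmCPUFallingThreshold"
        return None
```

With a rising threshold of 80% over three samples, a brief dip to 50% restarts the count, so short spikes do not raise an alarm; this is the point of requiring the utilization to remain past the threshold for the configured interval.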

10. 3-COM LAN Switch Unit (LSU) Alarms


The table below lists the 3-COM LSU switching alarm traps used by TOMIA.
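
The OIDs in this table, like those in the Cisco tables, come in two shapes: some end in .0.<n> (for example 1.3.6.1.4.1.43.10.2.6.2.0.4) and some do not (for example 1.3.6.1.4.1.43.45.1.5.25.19.2.1, or the Cisco .1.3.6.1.4.1.9.9.96.2.1.1). RFC 3584 defines how such a notification OID maps back to an SNMPv1 enterprise and specific-trap number: if the next-to-last sub-identifier is 0, the last two sub-identifiers are removed to obtain the enterprise; otherwise only the last one is. The following Python sketch applies that rule and tolerates the leading dot used in the first Cisco table. It covers enterprise-specific traps only; the standard generic traps (coldStart, linkDown, and so on) map differently and are out of scope. The function name is hypothetical.

```python
from typing import Tuple


def v2_trap_oid_to_v1(oid: str) -> Tuple[str, int]:
    """Split an enterprise-specific notification OID into (enterprise, specific-trap) per RFC 3584."""
    parts = oid.strip().lstrip(".").split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a numeric OID: {oid!r}")
    specific_trap = int(parts[-1])
    # RFC 3584: if the next-to-last sub-identifier is 0, remove the last two
    # sub-identifiers to get the enterprise; otherwise remove only the last one.
    cut = 2 if len(parts) > 2 and parts[-2] == "0" else 1
    return ".".join(parts[:-cut]), specific_trap
```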

Table 34: 3-COM Switch Alarm Traps

OID Trap Name Description Severity (Non-redundant/Redundant)

1.3.6.1.4.1.43.45.1.6.10.2.1 h3cCfgManEventlog A checksum of the current configuration is calculated every 10 minutes. Info/Info
If a trap has already been sent with the same checksum (even if the current configuration
differs from the saved configuration), the trap is not sent again until the checksum
changes.
1.3.6.1.4.1.43.45.1.5.25.19.2.1 h3cSysClockChangedNotification A clock changed notification is generated when the current local date and time of the Info/Info
system are changed manually. The value of h3cSysLocalClock reflects the new date and
time.
1.3.6.1.4.1.43.45.1.5.25.19.2.2 h3cSysReloadNotification h3cSysReloadNotification is sent before the corresponding entity is rebooted. It is also Critical/Minor
sent if the entity fails to reboot because the clock changed.
1.3.6.1.4.1.43.10.2.6.2.0.4 h3cEntityExtCpuUsageThresholdNotfication h3cEntityExtCpuUsageThresholdNotfication indicates that the entity is overloaded. Minor/Minor
1.3.6.1.4.1.43.10.2.6.2.0.5 h3cEntityExtMemUsageThresholdNotification h3cEntityExtMemUsageThresholdNotification indicates that the entity is overloaded. Minor/Minor
1.3.6.1.4.1.43.10.2.6.2.0.6 h3cEntityExtOperEnabled Indicates that the entity is currently operable. Info/Info
1.3.6.1.4.1.43.10.2.6.2.0.7 h3cEntityExtOperDisabled Indicates that the entity is currently not operable. Minor/Minor
1.3.6.1.4.1.43.5.25.25.2.2 hwAggPortInactiveNotification Triggered whenever any port in the aggregator is made inactive. Minor/Minor
1.3.6.1.4.1.43.10.2.26.1.3.2 h3cSecureViolation Sent for a security violation. The port on which the violation occurred is the first object, Info/Info
and the MAC address of the offending station is in the second object. ifAdminStatus
indicates whether the port has been disabled because of the violation. The
implementation may not send violation traps from the same port at intervals of less
than 5 seconds.
1.3.6.1.4.1.43.45.1.10.2.26.1.3.3 h3cSecureLoginFailure Sent when user network access authentication fails. The port on which the violation Info/Info
occurred is the first object, and the MAC address of the offending station is in the
second object. dot1xAuthSessionUserName is the identity supplied during user
authentication.
1.3.6.1.4.1.43.45.1.6.7.1.0.1 hgmpMemberfailure Sent as an SNMP trap to network management when a cluster member fails. NA/Minor
1.3.6.1.4.1.43.45.1.6.7.1.0.2 hgmpMemberRecover Sent as an SNMP trap to network management when a cluster member recovers. NA/Info
1.3.6.1.4.1.43.45.1.6.7.1.0.3 hgmpMemberStatusChange Sent as an SNMP trap to network management when a cluster member's status changes. NA/Info
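
The duplicate-suppression rule for h3cCfgManEventlog can be sketched as a small gate that remembers the checksum of the last configuration it reported. This is a hedged illustration: ConfigChecksumTrapGate is a hypothetical name, SHA-256 stands in for whatever checksum the switch actually computes, and deciding when a trap is otherwise warranted is left to the caller.

```python
import hashlib
from typing import Optional


class ConfigChecksumTrapGate:
    """Hypothetical gate: suppress repeat traps for an unchanged configuration checksum."""

    def __init__(self) -> None:
        self._last_trapped: Optional[str] = None

    @staticmethod
    def checksum(config_text: str) -> str:
        # Assumption: SHA-256; the algorithm the switch really uses is not documented here.
        return hashlib.sha256(config_text.encode("utf-8")).hexdigest()

    def on_poll(self, config_text: str) -> bool:
        """Call once per polling cycle (every 10 minutes per the table); True means send the trap."""
        digest = self.checksum(config_text)
        if digest == self._last_trapped:
            return False  # already reported, even if this still differs from the saved config
        self._last_trapped = digest
        return True
```

Keying the suppression on the checksum alone means that reverting to a previously reported configuration (sw2 back to sw1, say) still produces a new trap, because the checksum changed relative to the last one sent.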

11. TOMIA Platform Agent Application Related Alarms


This section describes alarms that the TOMIA platform generates because of an application-related issue.

11.1 Generating Alarms


When the platform detects a fault, it generates an alarm. Each alarm consists of two SNMP traps: the first is
transmitted when the fault is detected (up), and the second when the fault clears (down).
All of these alarms use the OID prefix 1.3.6.1.4.1.8161.100.1.0.
Traps that raise alarms have one of the following suffixes:
* 21, 23, 25, 27.
Traps that drop alarms have one of the following suffixes:
* 22, 24, 26, 28.
The definitions of the OIDs are located in the MIB file.
The igalarmNumber variable distinguishes the alarms.
The combination of the OID suffix, igalarmNumber (varbind 8), and source IP/managedObjectInstance enables
raise/drop correlation.
No trap is sent when clearing the alarms listed in Table 41 and Table 42.
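The raise/drop correlation described above can be sketched on the SNMP manager side as follows. This is a minimal illustration, not TOMIA code: it assumes each drop suffix is its raise suffix plus one (21/22, 23/24, 25/26, 27/28, as paired in Table 45), and the Trap and AlarmTracker names are hypothetical. Alarms from Table 41 and Table 42 never receive a drop trap, so they remain in the active set until cleared manually.

```python
from dataclasses import dataclass
from typing import Set, Tuple

OID_PREFIX = "1.3.6.1.4.1.8161.100.1.0."
RAISE_SUFFIXES = {21, 23, 25, 27}
DROP_SUFFIXES = {22, 24, 26, 28}  # assumption: each drop suffix = raise suffix + 1


@dataclass(frozen=True)
class Trap:
    """Hypothetical decoded trap holding only the fields used for correlation."""
    oid: str           # full trap OID, e.g. OID_PREFIX + "23"
    alarm_number: int  # igalarmNumber (varbind 8)
    source: str        # source IP address or managedObjectInstance


class AlarmTracker:
    """Keeps open alarms keyed by (raise suffix, igalarmNumber, source)."""

    def __init__(self) -> None:
        self.active: Set[Tuple[int, int, str]] = set()

    def handle(self, trap: Trap) -> str:
        if not trap.oid.startswith(OID_PREFIX):
            return "ignored"
        suffix = int(trap.oid[len(OID_PREFIX):])
        if suffix in RAISE_SUFFIXES:
            self.active.add((suffix, trap.alarm_number, trap.source))
            return "raised"
        if suffix in DROP_SUFFIXES:
            key = (suffix - 1, trap.alarm_number, trap.source)
            if key in self.active:
                self.active.remove(key)
                return "cleared"
            return "unmatched drop"
        return "ignored"  # e.g. heartbeat (suffix 13)
```

A drop from a different source than the raise is reported as unmatched, because the source is part of the correlation key.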
The following subsections list the alarms that the platform can generate in response to triggering events from
the service application:
* Application Server Hardware Capacity Alarms
* Generic Platform Agent Alarms
* Tomcat Alarms
* Reporting Tool Offline Database Alarms
* HP Server Serviceguard (Linux Cluster) Alarms
* HP Server Veritas (Linux Cluster) Alarms
* General Backup Alarms: Includes the following backup types:
▪ HP Data Protector

▪ Oracle
* TimesTen Database Alarms
* GTP Gateway Alarms


11.2 Application Server Hardware Capacity Alarms


Table 35 lists and describes the TOMIA hardware alarms.
Table 35: General Hardware Alarms

Dec ID | Hex ID | Severity | Alarm Text | Description | Action
50331700 | 3000034 | Major if local disk is > 85% full; Critical if local disk is > 90% full | SHMON: Service: local_disks State: $state Info: $info | The file system is about to become full. | Take measures according to file system content.
50331701 | 3000035 | Major if free memory < 10%; Critical if free memory < 5% | SHMON:Service: mem_usage State: $state Info: $info | The system is about to run out of memory. This is usually because one or more processes are consuming too much memory. Memory = (used swap + used real) / (total swap + total real). | Configure the system to avoid this problem occurring again.
50331702 | 3000036 | Major if average CPU over 30 minutes > 85%; Critical if average CPU over 30 minutes > 90% | SHMON:Service: cpu_usage State: $state Info: $info | Usually indicates that a process is using too much CPU. | Configure the system to avoid this problem occurring again.
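The thresholds in Table 35 can be expressed as a small classification sketch. It assumes the disk percentage is already computed, the memory figures come from the host's swap and real-memory statistics, and the CPU samples cover the last 30 minutes; the function names are hypothetical and do not describe how SHMON itself is implemented. Note that the memory formula yields the used fraction, so free memory is 100% minus that value.

```python
from statistics import mean
from typing import Optional, Sequence


def _above(value: float, major: float, critical: float) -> Optional[str]:
    """Severity for 'greater than' thresholds; None means no alarm."""
    if value > critical:
        return "Critical"
    if value > major:
        return "Major"
    return None


def disk_severity(used_percent: float) -> Optional[str]:
    return _above(used_percent, major=85, critical=90)


def memory_severity(used_swap: float, used_real: float,
                    total_swap: float, total_real: float) -> Optional[str]:
    # Memory = (used swap + used real) / (total swap + total real)
    used_percent = 100.0 * (used_swap + used_real) / (total_swap + total_real)
    free_percent = 100.0 - used_percent
    if free_percent < 5:
        return "Critical"
    if free_percent < 10:
        return "Major"
    return None


def cpu_severity(samples_last_30_min: Sequence[float]) -> Optional[str]:
    # Thresholds apply to the average over the 30-minute window.
    return _above(mean(samples_last_30_min), major=85, critical=90)
```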


11.3 Generic Platform Agent Alarms


Table 36 lists and describes the generic TOMIA platform agent alarms.
Table 36: Generic TOMIA Platform Agent Alarms

Dec ID | Hex ID | Severity | Alarm Message | Description | Action
50331705 | 3000039 | Critical | SHMON: Service: disk_errors | One or more errors on disk. | Call GSOC.
50331706 | 300003A | Critical | SHMON: Service: disk_errors | Errors on disk. | Call GSOC.
50331707 | 300003B | Critical | SHMON:Service: disk_not_present_HDD0 State: $state Info: $info | First HDD (HDD0) not present. | Insert disk.
50331708 | 300003C | Critical | SHMON:Service: disk_not_present_HDD1 State: $state Info: $info | Second HDD (HDD1) not present. | Insert disk.
50331711 | 300003F | Major | SHMON: SHMON has configuration error | SHMON self-test encountered a configuration error. | Open an SR and send the $INFRAROOT/load_alarm/log/load_alarm.log file to GSOC.
50331768 | 3000078 | Critical | OSS stopped on specific node | OSS stopped on specific node. | Call GSOC.
50331710 | 300003E | Major/Critical | SHMON: The storage unit MSA2000 has faulty component | DDU storage unit has a faulty component. | Open an SR and send the $INFRAROOT/load_alarm/log/load_alarm.log file to GSOC.
50331769 | 3000079 | Major | Oracle Hot Backup: Oracle Hot backup failed | Oracle backup failed for 3 consecutive days. | Contact GSOC.
50331770 | 300007A | Major | Snapshot Backup: Daily snapshot operation failed | Daily snapshot creation procedure failed. Snapshot file system cannot be backed up. | Contact GSOC.
50331771 | 300007B | Major | Snapshot Backup: Backup of snapshot file system failed | Backup software failed to back up the snapshot-based file system. | Contact GSOC.
50331717 | 3000045 | Major | SHMON:Failed to monitor storage | A fault occurred when trying to monitor the storage device. | If this alarm is raised multiple times, contact TOMIA GSOC.
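The Dec ID and Hex ID columns carry the same alarm number in decimal and hexadecimal. When importing these tables into an OSS alarm dictionary, a quick consistency check such as the following sketch can catch transcription errors; the helper names are hypothetical.

```python
def to_hex_id(dec_id: int) -> str:
    """Format a decimal alarm ID as the Hex ID column shows it (uppercase, no 0x)."""
    return format(dec_id, "X")


def ids_match(dec_id: int, hex_id: str) -> bool:
    """True when a row's Dec ID and Hex ID denote the same number."""
    return int(hex_id, 16) == dec_id
```

For example, to_hex_id(50331705) gives "3000039", matching the first row of Table 36.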


11.4 Tomcat Alarms


Table 37 lists and describes the Tomcat alarms, relevant for the Provisioning Server Unit (PSU).
Table 37: Tomcat Alarms

Dec ID | Hex ID | Severity | Alarm Message | Description | Action
50331767 | 3000077 | Critical | Tomcat: Package $PACKAGE_NAME stopped | Tomcat cluster stopped on specific node. | Call GSOC.


11.5 Reporting Tool Offline Database Alarms

11.5.1 Alarm Classification


Table 38 lists and describes the reporting alarm types.
Table 38: Reporting Alarm Types

Dec ID | Hex ID | Severity | Alarm Name | Description | Action
50331713 | 3000041 | Major | SHMON: ETL: Failure occurred during ETL process | A phase of the ETL process has failed. | Call GSOC.
50331714 | 3000042 | Major | SHMON: ETL: Failure occurred during aggregation calculation process. | The ETL section responsible for aggregation encountered a problem. | Call GSOC.

11.6 HP Server Serviceguard (Linux Cluster) Alarms


Note: This section is relevant to Serviceguard Linux clusters only. For a Veritas Linux cluster, refer to Section 11.7.
Table 39 lists and describes the Serviceguard Linux cluster alarms for use with HP servers.
Table 39: Serviceguard Linux Cluster Alarms

Dec ID | Hex ID | Severity | Alarm Message | Description | Action
50331660 | 300000C | Critical | SG: Package $PACKAGE_NAME [failed to start || stopped] (Normal: SG: Package $PACKAGE_NAME started) | Linux cluster package failed to start, or stopped on specific node. | Call GSOC.
50331661 | 300000D | Major | SG: Package $PACKAGE_NAME started on standby node (Normal: Package $PACKAGE_NAME started on primary node) | Linux cluster package started on standby node. | Fix the problem and move the resource group back to the primary node.
50331662 | 300000E | Critical | SG: Package $PACKAGE_NAME stopped | Linux cluster package stopped on specific node. | Investigate the cause and fix the problem.

11.7 HP Server Veritas (Linux Cluster) Alarms


Note: This section is relevant to Veritas Linux clusters only. For a Serviceguard Linux cluster, refer to Section 11.6.
Table 40 lists and describes the Veritas Linux cluster alarms for use with HP servers.
Table 40: Veritas Linux Cluster Alarms

Dec ID | Hex ID | Severity | Alarm Message | Description | Action
50331774 | 300007E | Critical | Oracle: Package $PACKAGE_NAME stopped | Oracle cluster package stopped running on a specific server. | If the package did not start on another server, contact GSOC.
50331775 | 300007F | Critical | SHMON:Storage volume manager status degraded | Fault detected in the Veritas volume manager in a site-aware configuration. | Contact GSOC.
50331776 | 3000080 | Critical | Campus cluster: one of the sites is down or unavailable | One site in the Veritas campus cluster lost connection with the other site. | Contact GSOC.


11.8 General Backup Alarms


In addition to the regular platform activity alarms, the system generates alarms when it detects an issue
with the backup mechanisms. These alarms are described in this section.

11.8.1 HP Data Protector


The HP Data Protector Backup alarms described in Table 41 are relevant to the IntelliGate Backup Utility. No
trap is sent when these alarms clear.

Table 41: HP Data Protector Backup Alarms

Dec ID | Hex ID | Severity | Alarm Message | Description | Action
50331650 | 3000002 | Major | DP: Replace backup cartridge | Data Protector backup cartridge full. | Replace the Data Protector backup cartridge.
50331703 | 3000037 | Major | SHMON: The Backup job $JOB_NAME hasn't run for the past 3 days | One of the backup jobs did not run in the past three days. | Open the Data Protector (backup software) GUI and refer to vendor documentation.
50331704 | 3000038 | Major | SHMON: There were errors in session(s) $JOB_NAME1 $JOB_NAME2 ... | Error in one of the backup jobs that ran on the previous day. | Open the Data Protector (backup software) GUI and refer to vendor documentation.
50331709 | 300003D | Major | SHMON: The File System $filesystem was not backed up in job $session_id on node $node | Error in one of the backup jobs. One of the file systems monitored by Data Protector was not backed up. | Open the Data Protector (backup software) GUI and refer to vendor documentation.


11.8.2 Oracle
Table 42 lists the backup alarms for systems running Oracle. No trap is sent when these alarms clear.
Table 42: Oracle Backup Alarms

Dec ID | Hex ID | Severity | Alarm Message | Description | Action
50331760 | 3000070 | Major | RMAN Backup System: performed emergency archivelog files deletion on $dest | Emergency archive deletion was performed due to very low disk space on Oracle archive file system $dest. The backup may no longer be viable. Repeated occurrences may indicate lack of disk space for archive files. | Contact GSOC.
50331761 | 3000071 | Major | RMAN Backup System: emergency backup started for $sid | Emergency backup has started due to very low disk space on the Oracle archive file systems. Repeated occurrences may indicate lack of disk space for archive files. | Contact GSOC.
50331762 | 3000072 | Major | RMAN Backup System: restore validation of backup id $backupId of $sid failed | The last backup set is unusable. It cannot be used to restore the database. Repeated occurrences may indicate a problem in the Oracle backup procedure. | Contact GSOC.
50331763 | 3000073 | Major | RMAN Backup System: backup id $backupId generated errors, log file: $backupLog | Backup with ID $backupId has errors in the $backupLog file. Repeated occurrences may indicate a problem in the Oracle backup procedure. | Contact GSOC.
50331764 | 3000074 | Major | export: there was an error in an export for instance $INSTANCE on host $HOST | Export backup for instance $INSTANCE produced an error. A recurring alert can indicate a problem in the export backup procedure. | Contact GSOC.
50331712 | 3000040 | Major/Critical | SHMON: Oracle: tablespace is about to become full | Oracle tablespace is almost full: Major if tablespace is > 85% full; Critical if tablespace is > 90% full. | Contact GSOC.
50331715 | 3000043 | Major | SHMON: Oracle: remote database unreachable | Relevant for Geo-redundancy. The remote Oracle database is unreachable. Indicates a problem in the Oracle database or in the connection between the server at the current site and the server at the remote site. | Contact GSOC.
50331716 | 3000044 | Major | SHMON: Oracle: a fault occurred in replication process | Relevant for Geo-redundancy. A fault occurred in the replication process between the current database and the remote database. | Contact GSOC.
50331780 | 3000084 | Major | DR: There was an error in a database export | Relevant for Disaster Recovery (DR). An error occurred during an export of a database used as part of a DR setup. | A recurring alert can indicate an issue in the DR export procedure. Call GSOC.


11.9 TimesTen Database Alarms


Table 43 lists and describes the TimesTen Database alarms.
Table 43: TimesTen Database Alarms

Dec ID | Hex ID | Module | Alarm Name | Severity | Description | Action
50331765 | 3000075 | TimesTen Database | TimesTen: running low on free memory | Major | The permanent or temporary space in a TimesTen datastore is running low. | Call GSOC.
50331766 | 3000076 | TimesTen Database | TimesTen: a datastore seems to be down | Critical | An essential TimesTen datastore appears to be down. | Call GSOC.
50331772 | 300007C | TimesTen Database | SHMON: TimesTen: remote datastore unreachable | Major | Relevant for Geo-redundancy. The remote TimesTen datastore is unreachable. Indicates a problem in the TimesTen datastore, or in the connection between the current server and the remote server. | Call GSOC.
50331773 | 300007D | TimesTen Database | SHMON: TimesTen: datastore needs recovery | Major | Relevant for Geo-redundancy. The TimesTen datastore is out of sync with its counterparts and requires recovery. | Call GSOC.


11.10 GTP Gateway Alarms


Table 44 lists and describes the GTP Gateway alarms.
Table 44: GTP Gateway Alarms

Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action
50331777 | 3000081 | GTP Gateway | Critical | GTP_IPGW: errors on interface | The GTP IP gateway monitoring script detected errors on the IP gateway interface. | Call GSOC.
50331778 | 3000082 | GTP Gateway | Critical | GTP_IPGW: interface is not running | The GTP IP gateway monitor detected a network failure. | Call GSOC.
50331779 | 3000083 | GTP Gateway | Critical | GTP_IPGW: config error | The GTP IP gateway monitor detected a configuration error. | Call GSOC.

Proprietary and Confidential Application Agent Alarms

1. Application Agent Alarms


When the application detects a fault, it generates an alarm. This section describes the alarms that are specific
to the GLR service.

1.1 Generating Alarms


Each alarm consists of two SNMP traps:
* One transmitted on detecting a fault (up)
* One transmitted on clearing the fault (down)
TOMIA uses the notificationIdentifier variable to correlate alarms.
Example:
The igAlarmUp trap reports that a fault was detected (alarm up), using the following OID:
1.3.6.1.4.1.8161.100.1.0.1
The igAlarmDown trap reports that the fault has cleared (alarm down), using the following OID:
1.3.6.1.4.1.8161.100.1.0.2

Note: Both traps have the same notification identifier number.


Both igAlarmUp and igAlarmDown have the same prefix:
1.3.6.1.4.1.8161.100.1.0.
Table 45 lists specific alarm ID pairs (OID suffixes) for TOMIA application agent and platform agent alarms.
Table 45: Specific Alarm ID Pairs for Application and Platform Agent Alarms

OID | Trap Type Suffix | Alarm Name | Purpose (Raise/Drop) | Opposite Alarm
1.3.6.1.4.1.8161.100.1.0 | 1 | igalarmUp | RAISE | igalarmDown
1.3.6.1.4.1.8161.100.1.0 | 2 | igalarmDown | DROP | igalarmUp
1.3.6.1.4.1.8161.100.1.0 | 3 | igProcessUp | DROP | igProcessDown
1.3.6.1.4.1.8161.100.1.0 | 4 | igProcessDown | RAISE | igProcessUp
1.3.6.1.4.1.8161.100.1.0 | 5 | igProcessCommUp | DROP | igProcessCommDown
1.3.6.1.4.1.8161.100.1.0 | 6 | igProcessCommDown | RAISE | igProcessCommUp
1.3.6.1.4.1.8161.100.1.0 | 7 | igSMSCommUp | DROP | igSMSCommDown
1.3.6.1.4.1.8161.100.1.0 | 8 | igSMSCommDown | RAISE | igSMSCommUp
1.3.6.1.4.1.8161.100.1.0 | 13 | iGalive (Heartbeat) | RAISE | No end notification
1.3.6.1.4.1.8161.100.1.0 | 21 | iGCriticalUp | RAISE | iGCriticalDown
1.3.6.1.4.1.8161.100.1.0 | 22 | iGCriticalDown | DROP | iGCriticalUp
1.3.6.1.4.1.8161.100.1.0 | 23 | iGMajorUp | RAISE | iGMajorDown
1.3.6.1.4.1.8161.100.1.0 | 24 | iGMajorDown | DROP | iGMajorUp
1.3.6.1.4.1.8161.100.1.0 | 25 | iGMinorUp | RAISE | iGMinorDown
1.3.6.1.4.1.8161.100.1.0 | 26 | iGMinorDown | DROP | iGMinorUp
1.3.6.1.4.1.8161.100.1.0 | 27 | iGWarningUp | RAISE | iGWarningDown
1.3.6.1.4.1.8161.100.1.0 | 28 | iGWarningDown | DROP | iGWarningUp
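Table 45 can be encoded as a lookup and combined with the notificationIdentifier rule to pair each RAISE trap with its opposite DROP trap. The following is a minimal manager-side sketch; the Correlator class and its return strings are hypothetical, and it assumes the suffix and notificationIdentifier have already been extracted from each received trap.

```python
from typing import Dict, Optional, Tuple

# Suffix -> (trap name, purpose, opposite trap name), transcribed from Table 45.
ALARM_PAIRS: Dict[int, Tuple[str, str, Optional[str]]] = {
    1: ("igalarmUp", "RAISE", "igalarmDown"),
    2: ("igalarmDown", "DROP", "igalarmUp"),
    3: ("igProcessUp", "DROP", "igProcessDown"),
    4: ("igProcessDown", "RAISE", "igProcessUp"),
    5: ("igProcessCommUp", "DROP", "igProcessCommDown"),
    6: ("igProcessCommDown", "RAISE", "igProcessCommUp"),
    7: ("igSMSCommUp", "DROP", "igSMSCommDown"),
    8: ("igSMSCommDown", "RAISE", "igSMSCommUp"),
    13: ("iGalive", "RAISE", None),  # heartbeat, no end notification
    21: ("iGCriticalUp", "RAISE", "iGCriticalDown"),
    22: ("iGCriticalDown", "DROP", "iGCriticalUp"),
    23: ("iGMajorUp", "RAISE", "iGMajorDown"),
    24: ("iGMajorDown", "DROP", "iGMajorUp"),
    25: ("iGMinorUp", "RAISE", "iGMinorDown"),
    26: ("iGMinorDown", "DROP", "iGMinorUp"),
    27: ("iGWarningUp", "RAISE", "iGWarningDown"),
    28: ("iGWarningDown", "DROP", "iGWarningUp"),
}
SUFFIX_BY_NAME = {name: suffix for suffix, (name, _, _) in ALARM_PAIRS.items()}


def opposite_suffix(suffix: int) -> Optional[int]:
    opposite = ALARM_PAIRS[suffix][2]
    return None if opposite is None else SUFFIX_BY_NAME[opposite]


class Correlator:
    """Pairs RAISE and DROP traps that share a notificationIdentifier."""

    def __init__(self) -> None:
        self.open: Dict[int, int] = {}  # notificationIdentifier -> raising suffix

    def on_trap(self, suffix: int, notification_id: int) -> str:
        _, purpose, opposite = ALARM_PAIRS[suffix]
        if opposite is None:
            return "heartbeat"
        if purpose == "RAISE":
            self.open[notification_id] = suffix
            return "open"
        # A DROP closes the alarm only if the open RAISE is its listed opposite.
        if self.open.get(notification_id) == opposite_suffix(suffix):
            del self.open[notification_id]
            return "closed"
        return "unmatched"
```

Note that for the process and communication traps the "Up" trap is the DROP, so igProcessDown (4) opens an alarm and igProcessUp (3) closes it.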

The following parameters describe the problem and severity of a trap:
* igalarmNumber (variable): Unique number, with specific text (specificProblem) that describes the problem.
* perceivedSeverity (variable): Alarm severity. The igAlarmUp and igAlarmDown traps contain the same severity.
The following parameters provide alarm source identification:
* Trap source IP address provided by the IP datagram
* managedObjectInstance variable: Describes the specific process, and has the format CCC-OOOOO-NN_PPP[PP].
Where
▪ CCC: Three letters for country code
▪ OOOOO: Five characters that describe the customer and server number
▪ NN: The process number
▪ PPP[PP]: The process name
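The managedObjectInstance format can be split into its fields with a regular expression. The character classes below are assumptions (letters for CCC, alphanumerics for OOOOO and PPP[PP], two digits for NN), and the sample value XYZ-AB012-03_MON used in the tests is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed character classes; adjust if your deployment emits other characters.
MOI_PATTERN = re.compile(r"([A-Za-z]{3})-([A-Za-z0-9]{5})-(\d{2})_([A-Za-z0-9]{3,5})")


class ManagedObjectInstance(NamedTuple):
    country_code: str     # CCC
    customer_server: str  # OOOOO
    process_number: int   # NN
    process_name: str     # PPP[PP]


def parse_moi(value: str) -> Optional[ManagedObjectInstance]:
    """Return the parsed fields, or None if the value does not fit the format."""
    match = MOI_PATTERN.fullmatch(value)
    if match is None:
        return None
    country, customer_server, number, name = match.groups()
    return ManagedObjectInstance(country, customer_server, int(number), name)
```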
The pduSequenceNo variable enumerates the PDUs (SNMP traps). Monitoring these numbers allows missing SNMP
traps to be detected.
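Gap detection on pduSequenceNo can be sketched as follows, assuming the number increases by one per trap sent to a given manager and that traps are processed in arrival order. How the counter behaves on agent restart or wraparound is not documented here, so the sketch simply treats a value that goes backwards as a new baseline; find_gaps is a hypothetical helper name.

```python
from typing import Iterable, List, Optional, Tuple


def find_gaps(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges of skipped numbers."""
    gaps: List[Tuple[int, int]] = []
    previous: Optional[int] = None
    for current in sequence_numbers:
        if previous is not None and current > previous + 1:
            gaps.append((previous + 1, current - 1))
        # previous is always updated, so a value that goes backwards
        # (for example after an agent restart) becomes the new baseline.
        previous = current
    return gaps
```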
A heartbeat message notifies that the system is up and running. Heartbeats use the same alarm number for three
separate messages:
1. Keep Alive: Periodic informative trap (sent, for example, every 15 minutes). The OSS can be configured to
respond when this trap is not received.
2. System is Up: Informs of system initialization.
3. System Going Down: Sent when an operator initiates a system shutdown.
Heartbeat messages have no end notification and use OID 1.3.6.1.4.1.8161.100.1.0.13.
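An OSS-side watchdog for the Keep Alive heartbeat might look like the following sketch. The 15-minute interval is only the example given above; the 1.5x tolerance, the class name, and the method names are assumptions. Distinguishing Keep Alive, System is Up, and System Going Down (which share one alarm number) is left to the caller.

```python
import time
from typing import Callable, Dict, List


class HeartbeatWatchdog:
    """Reports sources whose last heartbeat is older than interval * tolerance."""

    def __init__(self, interval_s: float = 15 * 60, tolerance: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.deadline_s = interval_s * tolerance
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, source: str) -> None:
        # Keep Alive or System is Up: (re)start monitoring this source.
        self.last_seen[source] = self.clock()

    def on_going_down(self, source: str) -> None:
        # System Going Down: operator-initiated shutdown, stop expecting heartbeats.
        self.last_seen.pop(source, None)

    def silent_sources(self) -> List[str]:
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.deadline_s)
```

Injecting the clock keeps the watchdog testable without waiting in real time.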
Alarm Synchronization:
Alarm synchronization is the process of synchronizing the agent's view of current, uncleared alarms with the
SNMP manager of the OSS.
If the SNMP manager goes down, or communication with IntelliGate is lost, some of the traps sent during this
period can be missed. When the connection is restored, the SNMP manager needs to be updated with the missing
information. The agent responds to a resend command (SNMP set) from the SNMP manager by sending all the new
alarms.
To enable the synchronization process, the agent provides a MIB variable named trapTableActivation, with
read/write access and type IpAddress.
When the SNMP manager recovers from a lost connection, it sets the trapTableActivation variable to its own IP
address by sending an SNMP set command with OID=1.3.6.1.4.1.8161.100.2.1.0.


Setting this variable to a specific IP address causes the agent to generate traps from the trap table and send
them only to the addressed SNMP manager.
The traps are sent in the order they were raised, oldest to newest.
Example:
snmpset -r 0 -t 10 -c public -v 1 10.135.60.181:9610 1.3.6.1.4.1.8161.100.2.1.0 I 1
While the missed traps are being resent, regular trap sending stops so that the monitoring system stays
consistent with the NMS status. After the resending process ends, the system resumes sending new traps,
including all the traps that occurred in the interim period.
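The resend behavior described above can be modeled as follows. This is an illustrative model of the documented ordering (trap table replayed oldest to newest to the requesting manager only, new traps held and sent afterwards), not the agent's implementation; the class and method names are hypothetical, and the trap table is treated as a plain list whose maintenance is not modeled.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class TrapResendModel:
    """Illustrative model of the documented resend ordering (not agent code)."""

    def __init__(self, managers: List[str], send: Callable[[str, str], None]) -> None:
        self.managers = list(managers)
        self.send = send                    # send(destination, trap)
        self.trap_table: List[str] = []     # uncleared alarms, oldest first
        self.pending: Deque[str] = deque()  # table entries still to replay
        self.held: Deque[str] = deque()     # new traps raised during the replay
        self.requester: Optional[str] = None
        self.resending = False

    def emit(self, trap: str) -> None:
        if self.resending:
            self.held.append(trap)          # regular sending paused
            return
        for manager in self.managers:
            self.send(manager, trap)

    def start_resync(self, requester_ip: str) -> None:
        # Triggered when a manager sets trapTableActivation to its own IP address.
        self.requester = requester_ip
        self.pending = deque(self.trap_table)
        self.resending = True

    def replay_next(self) -> bool:
        """Send one table entry to the requester; return True while more remain."""
        if self.pending:
            self.send(self.requester, self.pending.popleft())
        if self.pending:
            return True
        self.resending = False
        while self.held:                    # flush traps from the interim period
            self.emit(self.held.popleft())
        return False
```

Usage: after start_resync("nms1"), call replay_next() until it returns False; any trap emitted in the meantime reaches all managers only after the table has been replayed.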

Note: The application implements only the set command. SNMP browsers cannot query the status of variables.


1.2 Application Module Alarm Tables


The following tables list and describe the alarms for your service.

1.2.1 Monitor Alarms


Table 46 lists and describes the Monitor alarms.
Table 46: Monitor Alarms

Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action
19595288 | 12B0018 | MON | Critical | ERR_VIPMON_MON_OPEN_DB | There was a failure to open the specified database. | Check that the Oracle Listener service and the GIN service are running. If not, start the services. If the services are running, call GSOC.
19595318 | 12B0036 | MON | Major | ALM_MON_MIN_CRASH_PERIOD_EXCEEDED | The specified process crashed twice in less than the minimum configured time. | Call GSOC.
19595319 | 12B0037 | MON | Warning | ALM_MON_KEEP_ALIVE | Keep alive. | The alarm displays repeatedly to indicate that the system monitor is alive and operating correctly.
19595322 | 12B003A | MON | Critical | ALM_MON_PROCESS_DOES_NOT_STOP | The specified process did not stop after the termination event was set. | Call GSOC.
19595420 | 12B009C | MON | Minor | ALM_ACTIVE_SYSTEM_UP | The active system is up for the specified Monitor Group. | Call GSOC.
19595421 | 12B009D | MON | Critical | ALM_STAND_BY_SYSTEM_UP | The standby system is up for the specified Monitor Group. | Failover occurred due to a fault. Investigate the fault. After repair, fail back to the active system.
19595424 | 12B00A0 | MON | Major | ALM_MONITOR_PREF_COUNTERS_TO_FILE_MECHANISM | A failure occurred while writing counters to file. No data was written to the report file. | Call GSOC.
19595455 | 12B00BF | MON | Critical | ALM_MONITOR_PROCESS_NOT_ALL_PROCESSES_RUNNING | MONITOR_PROCESS : Not All Monitored processes are running - (%s) | Call GSOC.

1.2.2 GLR Alarms


Table 47 lists and describes the GLR alarms.
Table 47: GLR Alarms

Dec ID | Hex ID | Module | Alarm Name | Severity | Description | Action
33554505 | 2000049 | GLR | ALRTR_ALARM_TYPE_STARTUP | Info | The GLR has started. | None
33554506 | 200004A | GLR | ALRTR_ALARM_TYPE_KEEP_ALIVE | Info | GLR Keep Alive message. | Call GSOC.
33554507 | 200004B | GLR | ALRTR_ALARM_TYPE_GLR_IS_OFF | Warning | GLR OFF configuration was detected. | Check GLR configuration.
33554509 | 200004D | Database | ALRTR_ALARM_TYPE_SUBSCRIBERS_DB_IS_EMPTY | Warning | Subscriber table is empty. | Check that GLR receives traffic.
33554510 | 200004E | Database | ALRTR_ALARM_TYPE_SUBSCRIBERS_DB_NEAR_FULL | Warning | The subscriber database capacity is dangerously low. | Call GSOC.
33554511 | 200004F | Database | ALRTR_ALARM_TYPE_SUBSCRIBERS_DB_FULL | Critical | The subscriber database is full. | Call GSOC.
33554512 | 2000050 | CDR | ALRTR_ALARM_TYPE_CDR_TABLE_NEAR_FULL | Warning | CDR table capacity is dangerously low. | Call GSOC.
33554513 | 2000051 | CDR | ALRTR_ALARM_TYPE_CDR_TABLE_FULL | Major | CDR table is full. | Call GSOC.
33554514 | 2000052 | TDR | ALRTR_ALARM_TYPE_TDR_TABLE_NEAR_FULL | Warning | TDR table capacity is dangerously low. | Call GSOC.
33554515 | 2000053 | TDR | ALRTR_ALARM_TYPE_TDR_TABLE_FULL | Major | TDR table is full. | Call GSOC.
33554516 | 2000054 | GLR | ALRTR_ALARM_TYPE_COUNTER_TABLES_NEARLY_FULL | Warning | Counter tables capacity is dangerously low. | Call GSOC.
33554517 | 2000055 | GLR | ALRTR_ALARM_TYPE_COUNTER_TABLES_FULL | Major | Counter tables are full. | Call GSOC.
33554518 | 2000056 | GLR | ALRTR_ALARM_TYPE_COUNTERS_SP_ORACLE_ERROR | Major | Error on calling the counter software procedure. | Call GSOC.
33554519 | 2000057 | GLR | ALRTR_ALARM_TYPE_COUNTERS_SP_ORACLE_ERROR_FAILED | Major | General counter writing error. | Call GSOC.
33554520 | 2000058 | GLR | ALRTR_ALARM_TYPE_SUB_CLEARING_PARAMS_MISSING | Warning | Subscriber database clearing parameters are not defined. | Call GSOC.

33554521 | 2000059 | GLR | ALRTR_ALARM_TYPE_GT_DEFS_ARE_MISSING | Major | The default GT to the VLR, MSC, or SGSN is missing. | Call GSOC.
33554522 | 200005A | GLR | ALRTR_ALARM_TYPE_CONNECTION_2_ORCALE_DB_FAILED | Major | The connection to the Oracle database has failed. | Call GSOC.
33554523 | 200005B | GLR | ALRTR_ALARM_TYPE_CONNECTION_2_TIMESTEN_DISCONNECTED | Critical | The connection to the TimesTen subscriber database was lost. | Call GSOC.
33554524 | 200005C | GLR | ALRTR_ALARM_TYPE_APPCONTEXTNOTSUPPORTED_AFTER_NEGOTITATION | | Received appContextNotSupported after the phase negotiation procedure on the MSC. |
33554525 | 200005D | GLR | ALRTR_ALARM_TYPE_GEO_START_DETECTION | Warning | Geo start detection alarm: the active side of the Geo started to get traffic. | Call GSOC.
33554526 | 200005E | GLR | ALRTR_ALARM_TYPE_TT_REPLICATION_ERROR | Critical | TimesTen (TT) replication error. | Call GSOC.
33554527 | 200005F | GLR Manager | ALRTR_ALARM_TYPE_NR_GLR_OPCODES | Major | Unsupported MAP message. | Check the error log.
33554528 | 2000060 | GLR Manager | ALRTR_ALARM_TYPE_CSFB | Critical | The ratio of GLR CSFB failures to total CSFB exceeds the configured threshold. | Call GSOC.
33554616 | 20000B8 | MAU | Failed to allocate a new session ID. Mau Handler Session Ids pool is empty | Critical | Message load is too large; the system cannot send outgoing MAP messages. | Call GSOC.
33554618 | 20000BA | MAU | Mau Handler did not receive MAP events | Major | The system did not receive any MAP messages. | Call GSOC.

67108865 | 4000001 | FEP | FEP_CLIENT_LATE_RESPONSE | Major | The number of FEP client late response timeouts has passed the configured threshold. | Call GSOC.
67108866 | 4000002 | FEP | FEP_CLIENT_QUEUE_FULL | Major | Quality of Service: Overflow in FEP client message queue. | Call GSOC.
67108867 | 4000003 | FEP | FEP_NO_ACTIVE_CLIENTS | Critical | Communication subsystem failure. There are no active FEP clients. | Call GSOC.
67108995 | 4000083 | FEP | FEP_MON_DEACTIVATE_SLK_ALM | Critical | FEP process is down; the FEP monitor deactivates all SLKs. | Call GSOC.

1.2.3 SIP Server Alarms


Table 48 lists and describes the SIP Server alarms.
Table 48: SIP Server Alarms

Dec ID | Hex ID | Module | Alarm Name | Severity | Description | Action
1073743106 | 40000502 | SIP Server | ALM_CC_APP_DISCONNECTED | Critical | SIP Call Control Application (MSC Service) disconnected. | Call GSOC.
1073743107 | 40000503 | SIP Server | ALM_REG_APP_DISCONNECTED | Critical | SIP Call Control Application (VLR Service) disconnected. | Call GSOC.
1073743108 | 40000504 | SIP Server | ALM_SMS_APP_DISCONNECTED | Critical | SIP Call Control Application (SMS Service) disconnected. | Call GSOC.
1073743109 | 40000505 | SIP Server | ALM_CC_APP_NOT_RESPONDING | Major | SIP Call Control Application (MSC Service) not responding. | Call GSOC.
1073743110 | 40000506 | SIP Server | ALM_REG_APP_NOT_RESPONDING | Major | SIP Call Control Application (VLR Service) not responding. | Call GSOC.
1073743111 | 40000507 | SIP Server | ALM_SMS_APP_NOT_RESPONDING | Major | SIP Call Control Application (SMS Service) not responding. | Call GSOC.
1073743112 | 40000508 | SIP Server | ALM_MG_NOT_RESPONDING | Major | Media Gateway not responding. | Call GSOC.

1.2.4 Billing Alarms


Table 49 lists and describes the Billing Module alarms.
Table 49: Billing Module Alarms

Dec ID | Hex ID | Module | Alarm Name | Severity | Description | Action
19595315 | 12B0033 | BILLING | ALM_BIL_RMV_FILE_FAILURE | Critical | Failure to remove CDR file/directory. | If the alarm does not drop after 24 hours, call GSOC.
19595426 | 12B00A2 | BILLING | ALM_BIL_FAIL_WRITE_CDR_2_FILE | Critical | Billing fails to write CDRs to file. No CDRs are written to the file system. | Call GSOC.


1.2.5 Capture Module Alarms


Table 50 lists and describes the Capture Module alarms.
Table 50: Capture Module Alarms

Dec ID | Hex ID | Module | Alarm Name | Severity | Description | Action
19595312 | 12B0030 | CAPTURE | ALM_CAP_MAX_NON_MSG_REPLY | Critical | Capture has communication issues with other processes. | Wait three minutes for the alarm to drop and connectivity to be restored. If the alarm does not drop, call GSOC.
19595438 | 12B00AE | CAPTURE | ALM_CAP_COMMUNITY_GATE_MAP_MISSING | Critical | Capture community gate map is missing in the Capture Service table. | Call GSOC.
19595460 | 12B00C4 | CAPTURE | ALM_CAP_BILLING_DOWN_WHEN_REQUIRED | Major | Cannot create Billing CDR because the Billing process is down. | Call GSOC.
19595474 | 12B00D2 | CAPTURE | ALM_CAP_FAILED_CREATE_CAP_EVT_LOG_DIR | Major | Capture failed to create the Capture Event log directory. | Call GSOC.

1.2.6 CDI Module Alarms


Table 51 lists and describes the CDI module alarms.
Table 51: CDI Module Alarms

Dec ID | Hex ID | Module | Alarm Name | Severity | Description | Action
19595267 | 12B0003 | CDI | ERR_CDI_CAR_DEV_INT_MPA_FILE_TIME | Critical | No files were delivered to the output directory during the specified period. | Call GSOC.
19595268 | 12B0004 | CDI | ERR_CDI_CAR_DEV_INT_FLOW_CONTROL | Critical | The CDI process is in the FLOW CONTROL state. | Call GSOC.
19595436 | 12B00AC | CDI | ALM_CDI_CAR_DEV_INT_TOO_MANY_FILES | Major | The number of probe files in the input folder exceeded the maximum number allowed. | Check probe. Call GSOC.

1.2.7 CDR Module Alarms


Table 52 lists and describes the CDR Module alarms.
Table 52: CDR Module Alarms

Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action
19595298 | 12B0022 | CDR-Collector | Minor | ALM_BAD_CDR | CDRFWD: There is at least one invalid CDR. | Call GSOC.
19595269 | 12B0005 | CDR-FORWARD | Major | ERR_CDR_FORWARD_AUDIT_RSPNS_GOT_NACK | Received a negative response for the audit request for the specified date. | Call GSOC.
19595428 | 12B00A4 | CDR-FORWARD | Major | ALM_COMPRESS_FAILURE | Failed to compress CDRs into a CDR message (reason included). | Call GSOC.
19595429 | 12B00A5 | CDR-FORWARD | Major | ALM_UNCOMPRESS_FAILURE | Failed to decompress a CDR message (reason included). | Call GSOC.


1.2.8 Diameter Alarms


Table 53 lists and describes the Diameter Alarms.
Table 53: Diameter Alarms

Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action
19595468 | 12B00CC | DIAM | Critical | ALM_DIAMETER_CLIENTS_DOWN | All connections to the Diameter clients were lost. All Diameter services have been disabled. | Call GSOC.
19595469 | 12B00CD | DIAM | Major | ALM_DIAMETER_CCR_INIT_FOR_NOT_CONNECTED_CLIENT | Diameter: CCR INIT received for a client ID that is not connected. | Call GSOC.
19595470 | 12B00CE | DIAM | Minor | ALM_DIAMETER_LACK_OF_ACTIVITY | There was no activity in the specified period (in seconds). | Check the connections between the Diameter server and the clients.
19595478 | 12B00D6 | DIAM | Critical | ALM_DIAMETER_MSG_COPY_ARRIVED | DIAMETER: A CLR or ULR with an unexpected message copy arrived. | Check the configuration of the Diameter Edge Agent. It may be sending incorrect requests to the Diameter agent.
19595479 | 12B00D7 | DIAM | Critical | ALM_DIAMETER_JMS_FULL_QUEUE | DIAMETER: The number of messages in the '%s' JMS queue reached the maximum allowed. | Call GSOC.
19595522 | 12B0102 | DIAM | Critical | ALM_REQ_RES_COUNTER_SET | DA: Even though there are incoming messages from the network, there are no outgoing messages to the network from the DA. | Check the DA logs for internal errors, or check that the client and network are healthy.
19595523 | 12B0103 | DIAM | Critical | ALM_TO_MANY_3002_ERRORS | DA: All requests are coming back with error 3002. | Check for network issues or internal errors in the DA stack.
19595524 | 12B0104 | ECDM | Critical | ALM_ECDM_SLOW_CONSUMER | ECDM: The client (consumer) pulls messages slowly. | Check the client connection and configuration.


1.2.9 Performance Alarms


Table 54 lists performance alarms.
Table 54: Performance Alarms

DEC ID | HEX ID | Severity | Alarm Name | Description | Action
19595320 | 12B0038 | Critical | ALM_ALM_ALARM_MISSED | ALM: The alarm (%1) was not configured in the ALARM_MAP table. | Call GSOC.
19595425 | 12B00A1 | Major | ALM_SAVE_PERFv_COUNTERS_TO_DATA_BASE | PERF: Failed to store the performance counters in the database (Process Name = %1). | Call GSOC.


1.2.10 Gateway Alarms


Table 55 lists and describes the Gateway alarms.
Table 55: Gateway Alarms

Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action
19595270 | 12B0006 | GW | Critical | ERR_GTW_TRUNK_STATUS | Trunk status was changed on (AGSW#%1 TRUNK#%2). | Check connectivity with IntelliGate.
19595292 | 12B001C | GW | Critical | ERR_GTW_CANT_FIND_ALLOC | Failure to establish a connection to the specified process. | Call GSOC.
19595384 | 12B0078 | GW | Critical | ALM_IN_GW_INTERFACE_NO_RESOURCE | The gateway is out of resources and cannot handle incoming IN requests. New requests are rejected. | Call GSOC.
19595392 | 12B0080 | GW | Critical | ALM_GW_GWNOTIF_MGR_POST_GW_NOTIFICATION_JOB | The queue of requests for sending SMS is full. SMS is not sent. | Call GSOC.
19595453 | 12B00BD | GATEWAY | Major | ALM_GW_CLI_DLVRY_MT_THRESHOLD | Total failed CLI Delivery MT calls exceeded the specified number out of successful calls. | Call GSOC.
19595399 | 12B0087 | GATEWAY, SHINA (v3), DIAMETER (v3) | Major | ALM_DB_CHANGE_CAN_NOT_REFRESH | DB change in the specified table field was not refreshed. The process was not restarted for the changes to take effect. | Call GSOC.


1.2.11 Load Balancer Alarms


Table 56 lists and describes the Load Balancer alarms.
Table 56: Load Balancer Alarms

Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action
19595411 | 12B0093 | Load Balancer | Critical | ALM_LB_SMS_UNAVAILABLE_CRITICAL | The specified number of notification services is not available for sending SM to SMSC. | Call GSOC.
19595412 | 12B0094 | Load Balancer | Major | ALM_LB_SMS_UNAVAILABLE_MAJOR | The specified number of notification services is not available for sending SM to SMSC. | Call GSOC.
19595413 | 12B0095 | Load Balancer | Minor | ALM_LB_SMS_UNAVAILABLE_MINOR | The specified number of notification services is not available for sending SM to SMSC. | Call GSOC.
19595414 | 12B0096 | Load Balancer | Warning | ALM_LB_SMS_UNAVAILABLE_INFO | The specified number of notification services is not available for sending SM to SMSC. |

1.2.12 MAP Interface Alarms


Table 57 lists and describes the MAP Interface alarms.
Table 57: MAP Interface Alarms

| Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595299 | 12B0023 | MI | Critical | ALM_MI_LOST_CONN_TO_CCS | The connection between the MAP Interface and the Signaling Gateway Unit (SGU) was lost. | Call GSOC. |

1.2.13 MAP Probe Interface (MPI) Alarms


Table 58 lists and describes the MAP Probe Interface (MPI) alarms.
Table 58: MAP Probe Interface (MPI) Alarms

| Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595271 | 12B0007 | MPI | Critical | ERR_BM_ALARM_HNDLR_CONNECT_ALRM | The specified probe failed to establish a connection with the MPI. | Check connectivity with IntelliGate. If the probe was supplied by TOMIA, call GSOC. |
| 19595272 | 12B0008 | MPI | Critical | ERR_PROBE_INTRFC_EMPTY_BLOCK | No records found for processing. | Check probe, transmission and SS7 link. Call GSOC. |
| 19595273 | 12B0009 | MPI | Critical | ERR_SERVER_NOT_ENOUGH_CONNECTIONS | There are not enough connections available to the Agilent server. | Check probe. Call GSOC. |
| 19595303 | 12B0027 | MPI | Critical | ALM_TELESOFT_CLIENT_NO_CONNECTION | There was a failure to initialize the connection with the Telesoft probe (TDR socket). | Check probe and connection. Call GSOC. |
| 19595304 | 12B0028 | MPI | Critical | ALM_TELESOFT_CLIENT_NO_TRAP_CONNECTION | There was a failure to initialize the connection with the Telesoft probe (TRAP socket). | Check probe and connection. Call GSOC. |
| 19595305 | 12B0029 | MPI | Major | ALM_TELESOFT_PARSER_NON_OPERNATIONAL_LINK | There is a non-operational Telesoft link. | Check probe and connection. Call GSOC. |
| 19595306 | 12B002A | MPI | Major | ALM_TELESOFT_CLIENT_EXCEED_MAX_TDR_INTERVAL | No TDRs were received for the maximum configured TDR interval. | Check probe functionality. Call GSOC. |
| 19595307 | 12B002B | MPI | | ALM_TELESOFT_PARSER_THREAD_POOL_POST_JOB_FAILED | Out of resources. Probe transaction cannot be handled. | Check the MPI configuration. Call GSOC. |
| 19595308 | 12B002C | MPI | Critical | ALM_TELESOFT_PARSER_THREAD_POOL_SERVICE_DISABLE | The SMS service was disabled because encryption is not functioning. | Check encryption configuration and device. Call GSOC. |
| 19595324 | 12B003C | MPI | Major | ALM_TELESOFT_PARSER_TELESOFT_SOCKET | Telesoft error: There is congestion in the socket between the Telesoft process and the probe. | Check probe and connection. Call GSOC. |
| 19595388 | 12B007C | MPI | Major | ALM_TELESOFT_PARSER_MANAGMENT_SOCKET | Telesoft management socket trap. | Call GSOC. |
| 19595396 | 12B0084 | MPI | Major | ALM_TELESOFT_PARSER_MSU_ERROR | Telesoft MSU error. | Call GSOC. |
| 19595397 | 12B0085 | MPI | Major | ALM_TELESOFT_PARSER_MSU_SOCKET_OOS | Telesoft error: MSU socket alarm OOS. | Call GSOC. |
| 19595401 | 12B0089 | MPI | Critical | ALM_PROBE_CLIENT_NO_CONNECTION | Telesoft: There was a failure to establish a connection with the specified probe (TDR socket). | Check probe and connection. Call GSOC. |
| 19595415 | 12B0097 | MPI | Major | ALM_TELESOFT_PARSER_RX_OVERFLOW | MPI: Telesoft error: RX Overflow. | Call GSOC. |
| 19595416 | 12B0098 | MPI | Major | ALM_TELESOFT_PARSER_NO_SPACE | Telesoft error: Out of Space. | Call GSOC. |
| 19595427 | 12B00A3 | MPI | Major | ALM_TELESOFT_CARD_SOCKET_ERR | Telesoft error: There was a hardware failure in one (or more) of the probe cards. | Call GSOC. |
| 19595434 | 12B00AA | MPI | Critical | ALM_DATACAST_SERVER_NOT_ENOUGH_CONNECTIONS | DataCast: There are not enough connections to the DataCast probe, or no connections at all. | Check probe. Call GSOC. |
| 19595435 | 12B00AB | MPI | Critical | ALM_AGILENT_CDR_BUILDER_READER_FILE_TIME | There were no files transferred to the Reader directory during the specified period. | Check probe. Call GSOC. |
| 19595446 | 12B00B6 | MPI | Critical | ALM_INET_PROBE_INTERFACE_FILE_TIME | There were no files transferred to the input directory during the specified period. | Check probe. Call GSOC. |
| 19595451 | 12B00BB | MPI | Critical | ALM_NET_TEST_READER_FILE_TIME | There were no files transferred to the Net Test Reader directory during the specified period. | Check probe. Call GSOC. |
| 19595452 | 12B00BC | MPI | Warning | ALM_SHINA_WRONG_MESSAGES_FROM_GW | SHINA component: Incorrect messages were received from the specified gateway. | |
| 19595458 | 12B00C2 | MPI | Major | ALM_TELESOFT_PARSER_LAG_BUFF_DISCARD | Telesoft error: Telesoft Trap Lag Buffer Discard. | Call GSOC. |
| 19595472 | 12B00D0 | MPI | Warning | ALM_MPI_BUILD_RECORDS_FAILURE | There was a failure to compose the specified records during the specified period. | |
| 19595473 | 12B00D1 | MPI | Critical | ALM_MPI_JMS_CONNECTION_FAILURE | There was a failure to connect with the JMS brokers. | Call GSOC. |
| 19595475 | 12B00D3 | MPI | Warning | ALM_NETTEST_MAX_INPUT_TDR_FILES_EXCEEDED | The maximum allowed number of files for processing in the input directory has been exceeded. | |
| 19595476 | 12B00D4 | MPI | Warning | ALM_INET_MAX_INPUT_TDR_FILES_EXCEEDED | MPI: Inet client. The maximum allowed number of files for processing in the input directory has been exceeded. | |
| 19595477 | 12B00D5 | MPI | Critical | ALM_MPI_JMS_FULL_QUEUE | MPI: The number of messages in the JMS queue has reached the maximum allowed value [%1] | |
| 19595480 | 12B00D8 | MPI | Warning | ALM_SHPROBE_MAX_INPUT_TDR_FILES_EXCEEDED | MPI: TOMIA probe client. The maximum allowed number of files for processing in the input directory has been exceeded. | |
| 19595494 | 12B00E6 | MPI | Warning | ALM_LNX_TEKELEC_MAX_INPUT_TDR_FILES_EXCEEDED | MPI: The maximum allowed number of files for processing in the input directory has been exceeded. | Contact GSOC. |
| 19595495 | 12B00E7 | MPI | Critical | ALM_LNX_TEKELEC_READER_FILE_TIME | MPI: No files were delivered to the MPI directory during the specified period. | Contact GSOC. |
| 19595502 | 12B00EE | MPI | Critical | ALM_PROBE_JMS_NO_CONNECTION | Raised when the connection to the JMS broker is lost. Cleared on connection to the JMS broker. (Relevant only for probes supplied by TOMIA.) | Contact GSOC. |
| 19595503 | 12B00EF | MPI | Critical | ALM_PROBE_JMS_CLIENT_DISCONNECTED | Raised when the connection to the JMS broker is lost. Raised when an event is received from AMQ indicating that the client is disconnected. Cleared on connection to the client. (Relevant only for probes supplied by TOMIA.) | Contact GSOC. |
| 19595504 | 12B00F0 | MPI | Critical | ALM_PROBE_JMS_NO_TRAFFIC_IN_MAX_INTERVAL | Raised when there is no traffic in a predefined interval. The interval is configured by the NO_RECORDS_TIMEOUT parameter in the MPI.INI configuration file. | Contact GSOC. |
| 19595505 | 12B00F1 | MPI | Critical | ALM_JDSU_MAX_INPUT_TDR_FILES_EXCEEDED | The maximum number of files for processing in the input directory was exceeded. | Contact GSOC. |
| 19595506 | 12B00F2 | MPI | Critical | ALM_JDSU_READER_FILE_TIME | No files were delivered to the Reader directory during the specified period. | Contact GSOC. |
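Several MPI alarms are raised when nothing arrives within a configured interval and are cleared when traffic resumes. One example is ALM_PROBE_JMS_NO_TRAFFIC_IN_MAX_INTERVAL, whose interval comes from the NO_RECORDS_TIMEOUT parameter in MPI.INI. The Python sketch below illustrates that raise/clear logic. It is an illustration under stated assumptions, not the MPI implementation. The class name, the use of seconds as the unit, and the monotonic clock are all hypothetical.

```python
import time


class NoTrafficWatchdog:
    """Hypothetical raise/clear logic for a 'no traffic in interval' alarm."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumed to play the role of NO_RECORDS_TIMEOUT
        self.clock = clock           # injectable clock, so the logic can be tested
        self.last_record = clock()
        self.raised = False

    def on_record(self) -> None:
        """Call for every received record; clears the alarm if it is raised."""
        self.last_record = self.clock()
        if self.raised:
            self.raised = False
            print("alarm cleared")

    def check(self) -> bool:
        """Call periodically; raises the alarm once the interval passes with no records."""
        if not self.raised and self.clock() - self.last_record >= self.timeout_s:
            self.raised = True
            print("alarm raised")
        return self.raised
```

For example, with a 60-second timeout, calling `check()` 60 or more seconds after the last `on_record()` raises the alarm. The next `on_record()` clears it.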

1.2.14 Network Trigger Originator (NETO) Alarms


Table 59 lists and describes the NETO alarms.
Table 59: NETO Alarms

| Dec ID | Hex. ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595431 | 12B00A7 | NETO | Major | ALM_NETO_LACK_OF_ACTIVITY | Lack of activity for the specified number of seconds. | Check MPI and probe. Call GSOC. |
| 19595432 | 12B00A8 | NETO | Major | ALM_NETO_ERR_ISD_ACK_THRESHOLD | The percentage of ISD ACK errors per VLR, out of all SA ISDs sent, exceeded the specified threshold. | Compare the message parameters of ISDs leaving the SGU. Compare the message with the error against a successful message. |
| 19595433 | 12B00A9 | NETO | Minor | ALM_NETO_ERR_ACK_VS_SUCCESS_ACK_THRESHOLD | The percentage of ISD ACK errors, out of all SA ISDs sent, exceeded the specified threshold. | Compare the message parameters of ISDs leaving the SGU. Compare the message with the error against a successful message. |
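ALM_NETO_ERR_ISD_ACK_THRESHOLD compares, for each VLR, the share of ISD ACK errors among all SA ISDs sent against a configured percentage. The Python sketch below shows that comparison. The input format, function name, and threshold value are hypothetical, and this manual does not describe how NETO itself counts or windows the messages. "Exceeded" is read here as strictly greater than the threshold.

```python
from collections import Counter


def vlrs_over_threshold(events, threshold_pct):
    """Return {vlr: error_pct} for VLRs whose ISD ACK error percentage exceeds threshold_pct.

    events: iterable of (vlr, ack_ok) tuples, one per SA ISD sent (hypothetical format).
    """
    sent = Counter()
    errors = Counter()
    for vlr, ack_ok in events:
        sent[vlr] += 1
        if not ack_ok:
            errors[vlr] += 1
    result = {}
    for vlr, total in sent.items():
        pct = 100.0 * errors[vlr] / total
        if pct > threshold_pct:  # strictly greater: "exceeded the threshold"
            result[vlr] = pct
    return result


# VLR-A: 2 errors out of 10 ISDs (20%); VLR-B: no errors.
events = [("VLR-A", True)] * 8 + [("VLR-A", False)] * 2 + [("VLR-B", True)] * 10
print(vlrs_over_threshold(events, threshold_pct=10))
```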

1.2.15 Notification Alarms


Table 60 lists and describes the Notification alarms.

Note: Some of these alarms may not apply to your specific system architecture. For details, contact TOMIA Technical Support.
Table 60: Notification Alarms

| Dec. ID | Hex. ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595274 | 12B000A | NOTIF | Critical | ERR_NTF_SHORT_MSG_INTF_BIND_RESPOND_FAILER | Bind failure. No response received after three consecutive Bind requests. | Try to ping the SMSC. Try to telnet the SMSC address and port. If ping or Telnet fails, contact your IP Network administrator. If ping and Telnet succeed, check the SMSC thoroughly (look for changes in the external interface and firewall). Call GSOC. |
| 19595275 | 12B000B | NOTIF | Critical | ERR_NTF_SHORT_MSG_INTF_UCP_OPEN_SESSION_RESP | No response received after three consecutive UCP Open Session requests. | Try to ping the SMSC. Try to telnet the SMSC address and port. If ping or Telnet fails, contact your IP Network administrator. If ping and Telnet succeed, check the SMSC thoroughly (look for changes in the external interface and firewall). Call GSOC. |
| 19595276 | 12B000C | NOTIF | Critical | ERR_NTF_SHORT_MSG_INTF_SUBMIT_SM_RESPOND_FAILER | No response received after three consecutive Submit SM requests. | If routing of network MSCs is functioning correctly, call GSOC. |
| 19595277 | 12B000D | NOTIF | Critical | ERR_NTF_SHORT_MSG_INTF_BIND_RESPOND_ERROR | Bind rejection. There was a failure to bind to the SMSC after three consecutive attempts. | Try to ping the SMSC. Try to telnet the SMSC address and port. If ping or Telnet fails, contact your IP Network administrator. If ping and Telnet succeed, check the SMSC thoroughly (look for changes in the external interface and firewall). Call GSOC. |
| 19595278 | 12B000E | NOTIF | Critical | ERR_NTF_SHORT_MSG_INTF_CONNECT | There was a failure to open the connection between the Notification module and the SMSC (TCP/IP level). | Try to ping the SMSC. Try to telnet the SMSC address and port. If ping or Telnet fails, contact your IP Network administrator. If ping and Telnet succeed, check the SMSC thoroughly (look for changes in the external interface and firewall). Call GSOC. |
| 19595279 | 12B000F | NOTIF | Critical | ERR_NTF_SHORT_MSG_INTF_UCP_OPEN_SESS_FAILER | The UCP failed to open a session with the SMSC after three consecutive attempts. | Try to ping the SMSC. Try to telnet the SMSC address and port. If ping or Telnet fails, contact your IP Network administrator. If ping and Telnet succeed, check the SMSC thoroughly (look for changes in the external interface and firewall). Call GSOC. |
| 19595281 | 12B0011 | NOTIF | Critical | ERR_WWARE_X25_OPEN_CONNECT | There was a failure to open the X25 connection. | Call GSOC. |
| 19595282 | 12B0012 | NOTIF | Critical | ERR_WWARE_X25_MAKE_CALL | There was a failure to call via X25. | Check transmission. Call GSOC. |
| 19595291 | 12B001B | NOTIF | Major | ERR_NTF_SHORT_MSG_INTF_GREETING_NOT_RECEIVED | Greeting message not received. | Check SMSC configuration. Call GSOC. |
| 19595294 | 12B001E | NOTIF | Major | ALM_NTF_PROGENY_ENGINE_SOCK_ERROR | An error of the specified type occurred on the socket. | Call GSOC. |
| 19595311 | 12B002F | NOTIF | Critical | ALM_GENERIC_UCMP_CONNECTION_FAILURE | There was a failure to open the UCMP connection with the specified destination. | Wait several minutes for the alarm to drop and for connectivity to be restored. If the alarm does not drop, call GSOC. |
| 19595323 | 12B003B | NOTIF | Critical | ALM_SMS_FAILED_TO_CONNECT_TO_ORS | There was a failure to connect with the Outgoing (XML) Request Server. | Call GSOC. |
| 19595443 | 12B00B3 | NOTIF | Critical | ALM_NTF_FAIL_ADD_DB_RTI_INTERFACE | There was a failure to add DB to the Roaming Tariff (RTI) interface. | Call GSOC. |
| 19595444 | 12B00B4 | NOTIF | Critical | ALM_NTF_FAIL_RMV_DB_RTI_INTERFACE | There was a failure to remove DB from the Roaming Tariff (RTI) interface. | Call GSOC. |
| 19595445 | 12B00B5 | NOTIF | Critical | ALM_NTF_FAIL_QUERY_RTI_INTERFACE | There was a failure in the Send request to the Roaming Tariff (RTI) interface. | Call GSOC. |
| 19595456 | 12B00C0 | NOTIF | Critical | ALM_SMSC_CCS_NO_ACTIVITY | There were no requests transferred from the SMSC/Signaling Gateway Unit (SGU) during the specified period. | Call GSOC. |
| 19595461 | 12B00C5 | NOTIF | Major | ALM_NTF_BILLING_DOWN_WHEN_REQUIRED | The Billing process is down. A Billing CDR cannot be created. | Call GSOC. |
| 19595462 | 12B00C6 | NOTIF | Critical | ALM_NTF_FAIL_CONNECT_EMAIL_SERVER | There was a failure to connect to the specified email server. | Check the configured URL of the SMTP server. Try to connect manually to the SMTP server. |
| 19595463 | 12B00C7 | NOTIF | Critical | ALM_NTF_EMAIL_MAX_ERR_REACHED | Errors occurred during a Send email request. The alarm is raised when the error count reaches a predefined threshold. This alarm is raised per email server. | Check the SMTP server configuration. Check for errors in the log. |
| 19595464 | 12B00C8 | NOTIF | Critical | ALM_NTF_FAIL_DISCONNECT_EMAIL_SERVER | There was a failure to disconnect from the email server: '%1' | Check the SMTP server configuration. Check for errors in the log. |
| 19595471 | 12B00CF | NOTIF | Critical | ALM_EXT_SYS_NO_ACTIVITY | There were no requests transferred to the external system during the specified period. | Call GSOC. |
| 33554802 | 2000172 | NOTIF | Critical | NM_CONNECTION_FAILED | NM failed to connect to SMSC ? | Call GSOC. |
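Many of the Notification actions ask you to ping the SMSC and telnet to its address and port. ALM_NTF_FAIL_CONNECT_EMAIL_SERVER similarly asks for a manual connection to the SMTP server. Where telnet is not installed, the TCP part of that check can be reproduced with Python's standard `socket` module, as in the sketch below. The function name is illustrative. Note that a successful TCP connect only shows that the port is reachable. It does not validate the SMPP bind, the UCP session, or the SMTP dialogue.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (telnet-style check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # connection refused, host unreachable, name resolution failure, or timeout
        return False
```

Usage: call `tcp_port_reachable("<smsc-host>", <smsc-port>)` with the SMSC (or SMTP server) address and port configured for your system. If it returns False while ping succeeds, a firewall or a stopped SMSC listener is a likely cause.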

1.2.16 Service Broker (Proxy) Alarms


Table 61 lists and describes the Service Broker alarms.
Table 61: Service Broker (Proxy) Alarms

| Dec. ID | Hex. ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595439 | 12B00AF | PROXY | Critical | ALM_VSSP_MAX_POST_JOB_FAIL_EXCEEDED | Number of jobs exceeded; failed in post job. | Check machine CPU resources. The stress rate of IGU may be too high. |
| 19595440 | 12B00B0 | PROXY | Major | ALM_VSSP_LACK_OF_ACTIVITY | Lack of activity for the specified period (in seconds). | Call GSOC. |
| 19595441 | 12B00B1 | PROXY | Info | ALM_PROXY_MAX_OPEN_SESSION_EXCEEDED | Maximum number of allowed sessions exceeded. | Check the SGU. Check the connection between the SGU and the SSP. |
| 19595466 | 12B00CA | PROXY | Critical | ALM_VCSCF_MAX_POST_JOB_FAIL_EXCEEDED | Number of jobs exceeded; failed in post job. | Check machine CPU resources. The stress rate of IGU may be too high. |
| 19595484 | 12B00DC | PROXY | Major | ALM_EXTERNAL_VSSP_DOES_NOT_RESPOND_TO_IDP | No response for %d times with SCP (GT %s) | Check if the External SCP is up. |

1.2.17 TOMIA IN Adapter Alarms


Table 62 lists and describes the TOMIA IN Adapter alarms.
Table 62: TOMIA IN Adapter Alarms

| Dec. ID | Hex. ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595390 | 12B007E | SHINA | Critical | ALM_CCS_IN_LOST_CONNECTION | The connection with the Signaling Gateway Unit (SGU) has been lost. | Reset all existing sessions. Check the SGU. |
| 19595391 | 12B007F | SHINA | Major | ALM_SHINA_LACK_OF_ACTIVITY | There has been a lack of activity for the specified period (in seconds). | Check the connection between the SGU and the SSP. |
| 19595402 | 12B008A | SHINA | Critical | ALM_SHINA_MAX_POST_JOB_FAIL_EXCEEDED | The system is out of resources. | Check the CPU resources on the machine. The stress rate of IGU may be too high. |
| 19595410 | 12B0092 | SHINA | Info | ALM_CCS_IN_ACTIVE_PAUSE | The SGU IN reached the traffic capacity limit. Existing sessions will continue to be handled, but new sessions will be rejected. | Check the trace level of the SGU. Check the traffic capacity of the network. |
| 19595418 | 12B009A | SHINA | Major | ALM_UN_MATCH_FCI_BCSM_REQ | The system cannot send FCI on BCSM request. | Call GSOC. |
| 19595419 | 12B009B | SHINA | Critical | ALM_SHINA_CLIENTS_DOWN | All connections to the SHINA clients have been lost. All IN Services have been disabled. | Call GSOC. |
| 19595450 | 12B00BA | SHINA | Major | ALM_SHINA_IDP_FOR_NOT_CONNECTED_CLIENT | IDP for a client (Client ID) that is not connected. | Call GSOC. |

1.2.18 UIR Alarms


Table 63 lists and describes the UIR alarms.
Table 63: UIR Alarms

| Dec. ID | Hex. ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595289 | 12B0019 | UIR | Critical | ERR_SIR_MNGR_PURGE_FETCH | Failure in fetching purge parameters. Processing continues without purge handling. | Call GSOC. |
| 19595313 | 12B0031 | UIR | Critical | ALM_UIR_DELETED_JOBS | Message Queue is overloaded (specified number of messages). | Call GSOC. |
| 19595430 | 12B00A6 | UIR | Minor | ALM_UIR_TABLE_EXCEED_MAX_USERS | UIR table exceeds maximum capacity. | Call GSOC. |
| 19595449 | 12B00B9 | UIR | Critical | ALM_UIR_DB_ACCESS_FAILURE | UIR: DB Failure in access to <%s> Table | Contact the GSOC. |

1.2.19 Service Module Infrastructure Alarms


Table 64 lists and describes the service module Infrastructure alarms.
Table 64: Service Module Infrastructure Alarms

| Dec. ID | Hex. ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 33554432 | 2000000 | N/A | Critical | FAILED_TO_CONNECT_TO_THE_DATABASE_ID | One of the system components failed to connect to the database. | Check the status of the database. |
| 33554433 | 2000001 | ITCM | Critical | ITCM_EXCEEDED_NUMBER_OF_CONTEXTS_ID | The ITCM (Integrated Trigger Context Manager) alerts that there are too many contexts in the system. | Check the status of the Oracle service. |
| 33554434 | 2000002 | ITCM | Critical | ITCM_THREADPOOL_QUEUE_OVERLOAD_ID | The ITCM alerts that its processor is overloaded. | Call GSOC. |
| 33554437 | 2000005 | MON | Cleared | KEEP_ALIVE_ID: Keep Alive Trap (for CCCOOOOO_P[P0…P9]). | The Keep Alive alarm informs the OSS that the system is up and running. | No action required. |
| 33554442 | 200000A | UIR | Major | UIR_TIME_OUT_ID | The UIR (User Information Repository) does not provide the requested information on time (times out). | Relay the request. |
| 33554447 | 200000F | UMCL | Critical | UMCL_MAX_QUEUE_SIZE_ID | The UMCL queue reached the alarm's top watermark. (UMCL manages communication between modules.) | Call GSOC. |
| 33554435 | 2000003 | OTA | Critical | No connection to ?, where ? = XMLRpc: No connection to OTA SERVER [server name] | The integration framework adapter has lost connection with RPC. | Call GSOC. |
| 33554450 | 2000012 | MON | Warning | RESTART_ID: (?#). After ? more restarts a failover will occur. | This alarm is raised after IntelliGate restarts. The application does not drop the alarm. | Refer to logs. |
| 33554451 | 2000013 | N/A | Critical | FAILOVER ID: IntelliGate performed a failover! | The alarm is raised after IntelliGate starts up on the standby node. | Call GSOC. |
| 33554671 | 20000EF | Integration Module | Critical | NO_CONNECTION_NOT_CRITICAL_ID: Connection lost to ? | Connection lost with the specified module. | Call GSOC. |
| 33554755 | 2000143 | CLOVER_MODULE | Major | CLOVER_ETL_HANDSETS_PROCESS | CloverETL Handsets process failed: ? | No action required. |
| 33554798 | 200016E | INFRA | Critical | JMS_NO_CONSUMERS_LEFT_FOR | JMS No Consumers Left For Connection ? | Check that all entities in the system are up and running. |
| 33554799 | 200016F | INFRA | Critical | JMS_NO_PRODUCER_LEFT_FOR | JMS No Producers Left For Connection ? | Check that all entities in the system are up and running. |
| 33554800 | 2000170 | INFRA | Major | JMS_QUE_LEVEL_IS_UP_TO | JMS Queue Capacity Level is ? For QUE/TOPIC ? | Check the performance of the consumer of this queue. |
| 33554801 | 2000171 | INFRA | Critical | REDIS_CLUSTER_IN_FAULT_STATE | The Redis Cluster is in Fault State. | Check Redis Cluster Nodes. |
| 33554808 | 2000178 | INFRA | Critical | REDIS_OUT_OF_MEMORY | Redis is out of memory. More memory needs to be allocated. | No action required. |
| 33554809 | 2000179 | INFRA | Major | REDIS_HIGH_MEMORY_USAGE | Redis memory usage is high. More memory needs to be allocated. | No action required. |
| 33554811 | 200017B | INFRA | Major | JMS_NUM_OF_EXPIRED_MSGS_EXCEEDED_THRESHOLD | JMS: The number of expired messages exceeded the threshold for connection ? | No action required. |
| 33554812 | 200017C | INFRA | Critical | SOAP_DB_CONNECTION | SOAP-EI failed to connect to the database ? (?). | No action required. |
| 33554813 | 200017D | INFRA | Critical | SOAP_CONNECTION_STATUS | The SOAP-EI connection pool with the SOAP Access-GW had fluctuated more than ? times during the last ? minutes (?). | No action required. |
| 33554814 | 200017E | INFRA | Critical | SOAP_ERRED_TRANSACTION | The SOAP-EI received at least ? SOAP transaction errors in the last ? minutes. SOAP RC = ? (?). | No action required. |
| 33554815 | 200017F | INFRA | Critical | SOAP_TIMEDOUT_TRANSACTION | The SOAP-EI didn’t receive a reply for at least ? SOAP transaction submissions in the last ? minutes (?). | No action required. |
| 33554816 | 2000180 | INFRA | Critical | SOAP_FAILED_TRANSACTION | The SOAP-EI failed to send at least ? transactions in the last ? minutes. SOAP RC = ? (?). | No action required. |
| 33554817 | 2000181 | INFRA | Critical | NO_SOAP_CONNECTIONS | There are no active connections with the SOAP Access-GW. (?) | No action required. |
| 33554836 | 2000194 | INFRA | Critical | NO_CONNECTION_ALL_DA | No connection to all DAs | No action required. |

1.2.20 GTP Agent Alarms


Table 65 lists and describes the GTP Agent alarms.
Table 65: GTP Agent Alarms

| Dec ID | Hex ID | Module | Severity | Alarm Name | Description | Action |
|---|---|---|---|---|---|---|
| 19595507 | 12B00F3 | GTP AGENT | Critical | ALM_GTP_AGENT_CLIENT_NO_CONNECTION | GTP Agent Lost Connection to Client | Check client connection and configuration. |
| 19595508 | 12B00F4 | GTP AGENT | Critical | ALM_GTP_AGENT_T1_EXPIRED | GTP Agent: timer %s expired [%d] times, max timer expiration set to [%d] | Check client connection and configuration. |
| 19595509 | 12B00F5 | GTP AGENT | Critical | ALM_GTP_AGENT_T2_FROM_SGSN_EXPIRED | GTP Agent: timer %s expired [%d] times, max timer expiration set to [%d] | Check client connection and configuration. |
| 19595510 | 12B00F6 | GTP AGENT | Critical | ALM_GTP_AGENT_T2_FROM_GGSN_EXPIRED | GTP Agent: timer %s expired [%d] times, max timer expiration set to [%d] | Check client connection and configuration. |
| 19595511 | 12B00F7 | GTP AGENT | Critical | ALM_GTP_AGENT_T3_EXPIRED | GTP Agent: timer %s expired [%d] times, max timer expiration set to [%d] | Check client connection and configuration. |
| 19595512 | 12B00F8 | | | ALM_MAX_INPUT_TDR_FILES_EXCEEDED | MPI: %1 client. Maximum files for processing in input directory exceeded. | |
| 19595513 | 12B00F9 | MPI | | ALM_READER_FILE_TIME | MPI: No files were delivered to the %1 Reader directory during the specified period. | No action required. |
| 19595514 | 12B00FA | GTP AGENT | Critical | ALM_GTP_AGENT_T4_EXPIRED | GTP Agent: timer %s expired [%d] times, max timer expiration set to [%d] | Check client connection and configuration. |
| 19595515 | 12B00FB | | | ALM_COMPRESSION_FAILED | MPI: Compression FAILED for file [%s] | |
| 19595516 | 12B00FC | | | ALM_DECOMPRESSION_FAILED | MPI: Decompression FAILED for file [%s] | |
| 19595517 | 12B00FD | | | ALM_COMMPROVE_CLIENT_NO_CONNECTION | MPI: Failed to initialize connection with Commprove Probe | |
| 19595518 | 12B00FE | | | ALM_AXIAL_SERVER_NOT_ENOUGH_CONNECTIONS | MPI: Not Enough connections to the Axial Probe | |
| 19595519 | 12B00FF | MPI | Minor | ALM_FIRST_CAPACITY_THRESHOLD_MET | MPI: FIRST incoming capacity Threshold reached | Check capacity. |
| 19595520 | 12B0100 | MPI | Major | ALM_SECOND_CAPACITY_THRESHOLD_MET | MPI: SECOND incoming capacity Threshold reached!! | Check capacity. |
| 19595521 | 12B0101 | MPI | Critical | ALM_LAST_CAPACITY_THRESHOLD_MET | MPI: LAST incoming capacity Threshold reached!! | Check capacity. |
| 19595798 | 12B0216 | ALM | Minor | ALM_DIAMETER_CLIENT_LATE_RESPONDING | Diameter: Client is not responding for more than configured consecutive times | Contact the GSOC. |
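The three MPI capacity alarms (ALM_FIRST_, ALM_SECOND_ and ALM_LAST_CAPACITY_THRESHOLD_MET) escalate from Minor to Major to Critical as incoming load crosses successive thresholds. This manual does not state the threshold values or how incoming capacity is measured. The percentages in the Python sketch below are therefore hypothetical. The sketch only illustrates mapping a load figure to the highest threshold it reaches.

```python
# Hypothetical thresholds (percent of incoming capacity), checked from highest to lowest.
THRESHOLDS = [
    (95.0, "Critical", "ALM_LAST_CAPACITY_THRESHOLD_MET"),
    (85.0, "Major", "ALM_SECOND_CAPACITY_THRESHOLD_MET"),
    (70.0, "Minor", "ALM_FIRST_CAPACITY_THRESHOLD_MET"),
]


def capacity_alarm(load_pct: float):
    """Return (severity, alarm_name) for the highest threshold reached, or None."""
    for limit, severity, name in THRESHOLDS:
        if load_pct >= limit:  # "threshold reached" taken as greater than or equal
            return severity, name
    return None
```

Checking from the highest threshold down means that only the most severe applicable alarm is reported for a given load.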

2. Network Interface Module System Log Files


The IntelliGate network interface (and platform core) generates the following types of log files.
* Error log file: Contains all the system errors. The file records the source of each error and the time it occurred, which enables problems to be tracked through the system.
Normally, the log files reside in the Debug directory.
The full path to the directory is determined during system installation.

2.1.1 FEP and MAU Error Log Structure


Table 66 lists and describes the structure of the FEP and MAU error log.
Table 66: FEP and MAU Error Log Structure

| Parameter | Example |
|---|---|
| Date and Time (µ-second resolution) | 21/10/2007 05:25:48.341372 |
| Process ID | 5259 |
| Thread ID | 3076389088 |
| Severity | S_MAJOR |
| Source code file from where the error originated | MultiKeyDma.cpp |
| Line in the source code that caused the error | 33 |
| Error description | The Rule Gt=306949999997x., Ssn=8, Np=1, Tt=300 already exists - ignore it |
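Table 66 lists the fields of an error log entry but not the delimiter used on disk. The Python sketch below therefore starts from the seven field values in table order and leaves splitting a raw line to the caller. The class and function names are hypothetical. The timestamp pattern DD/MM/YYYY HH:MM:SS.ffffff is taken from the example, and `datetime`'s `%f` directive reads it at microsecond resolution.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ErrorLogEntry:
    """One FEP/MAU error log entry, with the fields listed in Table 66."""
    timestamp: datetime
    process_id: int
    thread_id: int
    severity: str
    source_file: str
    source_line: int
    description: str


def parse_entry(fields):
    """Build an entry from the seven field values, in Table 66 order.

    How a raw log line splits into these fields depends on the actual file
    layout, which is not specified here, so that step is left to the caller.
    """
    ts, pid, tid, sev, src, line, desc = fields
    return ErrorLogEntry(
        timestamp=datetime.strptime(ts, "%d/%m/%Y %H:%M:%S.%f"),
        process_id=int(pid),
        thread_id=int(tid),
        severity=sev,
        source_file=src,
        source_line=int(line),
        description=desc,
    )


# Values taken from the Example column of Table 66.
entry = parse_entry([
    "21/10/2007 05:25:48.341372", "5259", "3076389088", "S_MAJOR",
    "MultiKeyDma.cpp", "33",
    "The Rule Gt=306949999997x., Ssn=8, Np=1, Tt=300 already exists - ignore it",
])
print(entry.timestamp.microsecond)  # 341372
```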

3. Service Module System Log Files


The logging environment of the Gateway Location Register service module generates several types of log files.
The log files have different purposes, but all monitor the current state and health of the system, and enable
effective troubleshooting when necessary.
The log files produced by the service module are as follows:
* alarmer.txt: A history of all the traps sent to the OSS. Refer to Section 3.1.3.
* wrapper.log: A history of system startup and shutdown. Refer to Section 3.1.4.
* cluster.log: (For redundant systems only) A list of cluster-related events. Refer to Section 3.1.6.
* redundancy_log.txt: (For redundant systems only) A list of events relating to redundancy activities. Refer to Section 3.1.6.
* db_log.txt: A list of database events, used by TOMIA personnel in advanced debugging.
* dmp_log.txt: A history of errors generated by the service module, used by TOMIA personnel in advanced debugging.
* stack_log.txt: A file used by TOMIA personnel in advanced debugging. Refer to Section 3.1.7.
* error_log.txt: A list of errors produced by the service module, used by TOMIA personnel in advanced debugging. Refer to Section 3.1.5.

3.1 Gateway Location Register Service Logging Environment


TOMIA uses the Log4j utility to configure and manage the following service module outputs:
* Error and alarm logs
* SDRs (not described in this document)
* Counters (not described in this document)
Log4j is an open source tool developed to put log statements into an application. Its speed and flexibility allow log statements to remain in shipped code, while logging can be controlled at runtime without modifying the application binary. The file structure can vary, based on the different professional services offered by TOMIA.
The main concepts of Log4j are as follows:

* Public Class Logger: This logger is responsible for handling the majority of log operations.
* Public Interface Appender: This appender is responsible for controlling the output of log operations.
* Public Abstract Class Layout: This layout is responsible for formatting the output for the appender.
The Log4j package enables statements to remain in shipped code without a significant loss in performance.
The user can control logging behavior via an editable configuration file.
Logging equips the developer with detailed context for application failures. On the other hand, testing
provides quality assurance and confidence in the application. Logging and testing are complementary. When
logging is correctly used, it can prove to be an essential tool. For this reason, we recommend that the user not
change any logging parameters without the knowledge and assistance of TOMIA personnel.
Inheritance in loggers is an important feature of log4j. Using a logger hierarchy, the user can control the type
of log statements that are output. This helps to reduce the volume of logged output and minimize the cost of
logging.
Examples of logging are trace statements, dumping of structures and the familiar System.out.println or printf
debug statements. Log4j offers a hierarchical method of inserting logging statements within a Java program.
Multiple output formats and multiple levels of logging information are available.
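Logger inheritance is easiest to see in a small example. The sketch below uses Python's standard `logging` module, which follows the same dotted-name hierarchy and level inheritance as Log4j. It is an analogue for illustration only, not the service module's Java configuration, and the logger names are hypothetical.

```python
import logging

# Parent logger for a hypothetical service; children inherit its level.
parent = logging.getLogger("glr")
parent.setLevel(logging.WARNING)

child = logging.getLogger("glr.notification")   # no level of its own
debug_child = logging.getLogger("glr.mpi")
debug_child.setLevel(logging.DEBUG)             # overrides the inherited level

print(child.getEffectiveLevel() == logging.WARNING)   # level inherited from "glr"
print(child.isEnabledFor(logging.INFO))               # INFO is below WARNING, so filtered
print(debug_child.isEnabledFor(logging.DEBUG))        # explicitly enabled on this branch
```

Raising the level on one parent quiets a whole branch of loggers, while a single child can still be opened up for detailed debugging. This is how the hierarchy reduces the volume of logged output.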

3.1.1 Mechanism for Maintenance of Error and Alarm Log Files


All the error and alarm log files reside in the Logs directory. The full path to this directory is determined
during system installation.
The default location is /starhome/igate/XXX-YYY-ZZZ/logs, where:
* XXX is the country name
* YYY is the Operator name
* ZZZ is the Service name
When the system boots up, it begins writing to each of the files. When a file reaches the maximum size of 10 MB, or when the system initializes after a shutdown, the system rotates the file as follows:
* Renames each existing historical file to the next number, starting with the oldest (for example, filename01 becomes filename02).
* Renames filename as filename01.
* Creates a new filename file.

Therefore, filename contains the most recent data, followed historically by filename01. filename99 contains
the oldest data.
The system can store 100 files of each type. When filename98 becomes filename99, the contents of
filename99 are lost.
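The rotation scheme in this section can be modeled in Python as follows. The current file becomes filename01, each older file moves up one number, and the contents of filename99 are discarded. This is a model of the described behavior for illustration only. The actual rotation is performed through the Log4j configuration. The function name and the two-digit suffix appended directly to the full file name are assumptions based on the names used in this section.

```python
import os


def rotate(log_path: str, max_backups: int = 99) -> None:
    """Rotate log_path so that log_path01 is the newest backup and log_pathNN the oldest."""
    oldest = f"{log_path}{max_backups:02d}"
    if os.path.exists(oldest):
        os.remove(oldest)  # the oldest backup's contents are discarded
    # Shift the remaining backups up by one, starting from the oldest,
    # so that no file is overwritten before it has been moved.
    for n in range(max_backups - 1, 0, -1):
        src = f"{log_path}{n:02d}"
        if os.path.exists(src):
            os.replace(src, f"{log_path}{n + 1:02d}")
    if os.path.exists(log_path):
        os.replace(log_path, f"{log_path}01")
    open(log_path, "w").close()  # start a new, empty current file
```

Moving the oldest backup first is what keeps filename01 through filename99 in order from newest to oldest.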

3.1.2 Log File Configuration


The application process reads the parameters for creating the log file from the Log4j configuration file in the CONF
directory.

Note: Only TOMIA personnel may change configuration parameters.


All configuration data reproduced here is for information purposes only. TOMIA will not be responsible for
damage incurred due to unauthorized changes to these or any system configuration files.

Table 67 lists and describes the log configuration parameters.


Table 67: Log Configuration Parameters

Parameter Name | Description | Value | Default
File | This parameter holds the file name. | Filename.sfx |
Append | This parameter indicates whether a new file will be started on system startup or whether to append to the existing file. | True/False | True
Threshold | | DEBUG/ |
MaxFileSize | This parameter indicates the maximum file size. | Size, in MB | 10MB
MaxBackupIndex | This parameter indicates the number of historical files to hold, from 1 to 99. | Number | 100
BufferedIO | This parameter indicates whether the log entries will be written directly to the file, or written first to a buffer, and then collectively to file. | True/False | False
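Only TOMIA personnel may change these parameters, but reading them can help when checking a site. The following read-only sketch assumes the Log4j file in the CONF directory uses the log4j 1.x properties syntax (log4j.appender.<name>.<Parameter>=value); the appender name ALARM, the sample values, and the helper names are hypothetical, and the real file may be XML or name its appenders differently.

```python
def parse_properties(text: str) -> dict:
    """Minimal key=value reader; skips blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def appender_settings(props: dict, appender: str) -> dict:
    """Collect the parameters (File, Append, MaxFileSize, ...) set for one appender."""
    prefix = f"log4j.appender.{appender}."
    return {key[len(prefix):]: value for key, value in props.items() if key.startswith(prefix)}


def max_file_size_bytes(value: str) -> int:
    """Convert a size such as '10MB' to bytes (log4j 1.x uses 1024-based units)."""
    units = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
    text = value.strip().upper()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)].strip()) * factor
    return int(text)


# Hypothetical appender block; the real Log4j file in CONF may differ.
SAMPLE = """
log4j.appender.ALARM=org.apache.log4j.RollingFileAppender
log4j.appender.ALARM.File=alarmer.txt
log4j.appender.ALARM.Append=true
log4j.appender.ALARM.MaxFileSize=10MB
log4j.appender.ALARM.BufferedIO=false
"""
```

A parameter missing from the file (MaxBackupIndex in this sample) simply does not appear in the result, so the table default applies.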

3.1.3 Alarm Log (alarmer.txt) File


The alarm log is the history of traps that the system sent to the OSS. The name of the alarm file is
alarmer.txt. Each trap has a separate row in the log, with commas separating the fields of information.
To access the alarmer.txt file:
1. Navigate to the following directory:
/starhome/igate/xxx-yyy-zzz/logs
2. At the command line, type more alarmer.txt, and then press Enter.
Table 68 lists and describes the parameters of the alarm log.
Table 68: Alarm Log Structure

Parameter | Description
Date and Time | YYYY/MM/DD HH:MM:SS
Action and Cause | Raise or drop of alarm, name or module/device
Severity | Critical, Major, Minor or Warning
Trap Description | Description of the trap
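Because each alarmer.txt row is comma-separated, a script can split rows into the Table 68 fields. This sketch assumes exactly four fields in the order shown in the table, with any extra commas belonging to the trap description; parse_alarm_line and AlarmRecord are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple

SEVERITIES = {"Critical", "Major", "Minor", "Warning"}


class AlarmRecord(NamedTuple):
    timestamp: datetime
    action_and_cause: str
    severity: str
    description: str


def parse_alarm_line(line: str) -> AlarmRecord:
    """Split one alarmer.txt row into the four Table 68 fields (assumed order)."""
    # maxsplit=3 keeps commas inside the trap description intact
    fields = [f.strip() for f in line.rstrip("\n").split(",", 3)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}")
    stamp, action, severity, description = fields
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    return AlarmRecord(when, action, severity, description)
```

Filtering the parsed records on severity == "Critical" then isolates the critical traps.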


3.1.4 Wrapper Log (wrapper.log) File


The wrapper is a watchdog that keeps all the processes running. If a process goes down, the wrapper sends
an alarm trap and then restarts the process.
The wrapper.log file lists the following actions of the wrapper:
* Detecting a terminating process
* Sending a trap
* Restarting the Java process
The following sample from a wrapper log includes line numbers (on the left) for informational purposes.
1. STATUS | wrapper | 2008/08/14 22:27:40 | INT trapped. Shutting down.
2. STATUS | wrapper | 2008/08/14 22:27:51 | <-- Wrapper Stopped
3. STATUS | wrapper | 2008/08/14 22:27:56 | --> Wrapper Started as Daemon
4. STATUS | wrapper | 2008/08/14 22:27:57 | Launching a JVM...
5. INFO | jvm 1 | 2008/08/14 22:27:57 | Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
6. INFO | jvm 1 | 2008/08/14 22:27:57 | Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
7. INFO | jvm 1 | 2008/08/14 22:27:57 |
8. INFO | jvm 1 | 2008/08/14 22:27:58 | IG version: 2.5.0.00195 (23/06/2008 19:04:25)
9. INFO | jvm 1 | 2008/08/14 22:27:58 | JVM version: 1.5.0_08-b03
10. INFO | jvm 1 | 2008/08/14 22:28:04 | Oracle cache is 160M
11. INFO | jvm 1 | 2008/08/14 22:28:11 | IG is ready
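Each wrapper.log line in the sample has four fields separated by vertical bars: level, source, timestamp, and message (the leading line numbers are not part of the log). A minimal parser built on that observed layout, with a helper that lists the times the wrapper launched a JVM, might look like this; the function names are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class WrapperEntry(NamedTuple):
    level: str      # STATUS, INFO, ...
    source: str     # "wrapper" or "jvm <n>"
    timestamp: datetime
    message: str


def parse_wrapper_line(line: str) -> Optional[WrapperEntry]:
    """Parse 'LEVEL | source | YYYY/MM/DD HH:MM:SS | message'; return None otherwise."""
    parts = [p.strip() for p in line.rstrip("\n").split("|", 3)]
    if len(parts) != 4:
        return None
    level, source, stamp, message = parts
    try:
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return None
    return WrapperEntry(level, source, when, message)


def jvm_launch_times(lines) -> list:
    """Timestamps of 'Launching a JVM...' entries, i.e. wrapper (re)starts of the Java process."""
    entries = (parse_wrapper_line(line) for line in lines)
    return [e.timestamp for e in entries if e and e.message.startswith("Launching a JVM")]
```

Several launch times close together suggest that the wrapper is repeatedly restarting a failing Java process.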


Table 69 lists and describes each line.


Table 69: Wrapper Log File Sample Description

Line Number | Description
1 | Wrapper detects the terminating process and sends a trap to the OSS.
2 | Wrapper stops.
3 | Wrapper restarts as a daemon (running in the background).
4 | Wrapper launches the Java process.
5 | Wrapper identifies itself (version and URL) for debugging purposes.
6 | Wrapper software copyright.
7 | Blank line.
8 | IntelliGate (IG) software version.
9 | Java Virtual Machine (JVM) version.
10 | Declaration that Oracle is ready, and the size of the available memory cache.
11 | Declaration that the service is up and running.

3.1.5 Error Log (error_log.txt) File


The error_log.txt file contains a list of errors produced by the Service Module for advanced debugging
purposes.
To access the error_log.txt file:
1. Navigate to the following directory:
/starhome/igate/xxx-yyy-zzz/logs
2. At the command line, type more error_log.txt, and then press Enter.
3. Review the file for errors (search for the word “ERROR”).
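Step 3 can also be scripted. The following sketch is the Python equivalent of grep -n ERROR error_log.txt; the function name find_errors is hypothetical, and the match is case-sensitive, like the search described above.

```python
def find_errors(path: str, keyword: str = "ERROR") -> list:
    """Return (line number, text) pairs for lines containing keyword, similar to grep -n."""
    matches = []
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if keyword in line:
                matches.append((number, line.rstrip("\n")))
    return matches
```

Call it with the full path, for example find_errors("/starhome/igate/xxx-yyy-zzz/logs/error_log.txt"), substituting the site's country, operator, and service names.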


3.1.6 Redundancy Logs


These logs exist only in a system with redundant architecture, for example, a system with redundant
Application Units (APUs).
To access the redundancy log files:
1. Navigate to the following directory:
/starhome/igate/xxx-yyy-zzz/logs
2. At the command line, type more cluster.log, and then press Enter.
cluster.log: The monitoring script writes cluster.log, which records the basic states of the cluster (start,
monitoring, halting). Review the log file to identify failovers.
The following sample log file records a transfer event in which the IntelliGate Service, IG, is moved to the
standby server.
Dec 5 00:12:06|CCC-OOO-m-n-AAA|Monitor|-------------------
Dec 5 00:12:06|CCC-OOO-m-n-AAA|Monitor|Starting IG monitor...
Dec 5 00:12:06|CCC-OOO-m-n-AAA|Monitor|IG moved to standby node: CCC-OOO-m-n-aps
Dec 5 00:12:19|CCC-OOO-m-n-AAA|Monitor|IG monitor is started
Dec 5 00:12:19|CCC-OOO-m-n-AAA|Monitor|Starting to monitor...
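To find failovers in cluster.log without reading it line by line, a script can look for the "moved to standby node" message shown in the sample. This sketch assumes every entry has the four |-separated fields seen above and that failovers always use that exact wording; the timestamp is kept as text because the sample shows no year. The names find_failovers and Failover are hypothetical.

```python
from typing import NamedTuple

MARKER = "moved to standby node:"  # wording taken from the sample log


class Failover(NamedTuple):
    stamp: str    # e.g. "Dec 5 00:12:06" (no year in the log)
    server: str   # server that wrote the entry
    standby: str  # node the service moved to


def find_failovers(lines) -> list:
    """Return one Failover per 'moved to standby node' entry in cluster.log lines."""
    events = []
    for line in lines:
        parts = [p.strip() for p in line.rstrip("\n").split("|", 3)]
        if len(parts) != 4:
            continue  # not a timestamp|server|component|action line
        stamp, server, _component, action = parts
        if MARKER in action:
            standby = action.split(MARKER, 1)[1].strip()
            events.append(Failover(stamp, server, standby))
    return events
```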

Each line starts with a timestamp, followed by the server name, the component that wrote the entry
(Monitor), and a description of the action, separated by vertical bars (|).
The server name is composed of the following fields:
CCC-OOO-m-n-AAA
Where:
▪ CCC is the country code.
▪ OOO is the operator code.
▪ m is the site number.
▪ n is the server number.
▪ AAA is the server type (aps = application server, APU).
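The server name can be split into its five fields with a simple hyphen split. This sketch assumes none of the fields contains a hyphen; the type table lists only aps, the one server type named above, and parse_server_name, ServerName, and the sample name ABC-XYZ-1-2-aps are hypothetical.

```python
from typing import NamedTuple

# Only the type documented in this section; other sites may use more.
SERVER_TYPES = {"aps": "application server (APU)"}


class ServerName(NamedTuple):
    country: str      # CCC
    operator: str     # OOO
    site: str         # m
    server: str       # n
    server_type: str  # AAA


def parse_server_name(name: str) -> ServerName:
    """Split CCC-OOO-m-n-AAA into its fields (assumes no hyphens inside a field)."""
    parts = name.strip().split("-")
    if len(parts) != 5:
        raise ValueError(f"expected CCC-OOO-m-n-AAA, got {name!r}")
    return ServerName(*parts)
```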


3. At the command line, type more redundancy_log.txt, and then press Enter.
redundancy_log.txt: Each time the IntelliGate system restarts, it logs the timestamp and removes old
timestamps that fall outside the failover time window. When the application initiates a failover, the
system empties this file. Review the file and look for new entries.
For example:
2012-12-10 08:03:31,319 INFO [com.starhome.resources.monitoring.redundancy.RedundancyHandler]
The Redundancy-Handler is up and waiting for alarms.
2012-12-10 14:43:21,373 WARN [com.starhome.resources.monitoring.redundancy.RedundancyHandler]
The Redundancy-Handler is shutting down...
2012-12-10 14:44:05,614 FATAL [com.starhome.resources.monitoring.redundancy.RedundancyHandler]
Initiating a system failover because the restarts number has exceeded the limit
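The sample entries can be parsed with a regular expression on the timestamp, level, and logger name. Because the extracted sample shows each message on the line after its header, this sketch treats any line that does not start with a timestamp as a continuation of the previous entry, so it also works if the real file keeps each entry on one line. The names parse_entries, failover_entries, and LogEntry are hypothetical.

```python
import re
from typing import NamedTuple

# Header: date, time with milliseconds, level, [logger], optional message text.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[([^\]]+)\]\s*(.*)$"
)


class LogEntry(NamedTuple):
    stamp: str
    level: str
    logger: str
    message: str


def parse_entries(lines) -> list:
    """Group header lines and their continuation lines into LogEntry records."""
    raw = []
    for line in lines:
        text = line.rstrip("\n")
        match = ENTRY_START.match(text)
        if match:
            raw.append(list(match.groups()))
        elif raw and text.strip():
            # A line without a timestamp continues the previous entry's message.
            raw[-1][3] = (raw[-1][3] + " " + text.strip()).strip()
    return [LogEntry(*fields) for fields in raw]


def failover_entries(lines) -> list:
    """FATAL entries that mention a failover."""
    return [e for e in parse_entries(lines)
            if e.level == "FATAL" and "failover" in e.message.lower()]
```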

Note: Do not delete redundancy_log.txt under any circumstances. To reset the file, clear its contents instead.
Check that the redundancy_log.txt file also exists on the secondary machine.
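The restart bookkeeping described for redundancy_log.txt (log each restart, drop timestamps outside the failover time window, and fail over once the restart count exceeds the limit) behaves like a sliding-window counter. The sketch below models it in memory; the class name RestartWindow, the window length, and the restart limit are hypothetical, since this section does not state the configured values.

```python
from collections import deque


class RestartWindow:
    """In-memory model of the restart list kept in redundancy_log.txt (hypothetical)."""

    def __init__(self, window_seconds: float, max_restarts: int) -> None:
        self.window_seconds = window_seconds  # failover time window (assumed value)
        self.max_restarts = max_restarts      # restart limit (assumed value)
        self.stamps = deque()                 # restart times, oldest first

    def record_restart(self, now: float) -> bool:
        """Log a restart at time `now`; return True if a failover should start."""
        self.stamps.append(now)
        # Drop timestamps that fall outside the failover time window.
        while self.stamps and now - self.stamps[0] > self.window_seconds:
            self.stamps.popleft()
        if len(self.stamps) > self.max_restarts:
            self.stamps.clear()  # the file is emptied when a failover starts
            return True
        return False
```

With a 600-second window and a limit of 3, a fourth restart within 10 minutes of the first triggers a failover and empties the list, while restarts spread further apart never accumulate.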

3.1.7 Stack Log (stack_log.txt) File
TOMIA personnel use this file for advanced debugging purposes.
To access the stack_log.txt file:
1. Navigate to the following directory:
/starhome/igate/xxx-yyy-zzz/logs
2. At the command line, type more stack_log.txt, and then press Enter.
The file is blank during normal operation. If it contains new content (as determined by the date), the
operator can send the complete file to the TOMIA GSOC for analysis.
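Since stack_log.txt is normally blank, a quick check is whether it is non-empty and was modified after a given date. This sketch assumes the file's modification time is a good proxy for "new content (as determined by the date)"; the function name has_new_content is hypothetical.

```python
import os
from datetime import datetime


def has_new_content(path: str, since: datetime) -> bool:
    """True if the file exists, is non-empty, and was modified after `since`."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return False
    return info.st_size > 0 and datetime.fromtimestamp(info.st_mtime) > since
```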
