Notices
This publication is provided "as is" without warranty of any kind, either expressed or implied. Use of this publication is at your own risk and Hewlett-Packard Company shall have no liability for damages of any kind. While reasonable precautions have been taken in the preparation of this document, Hewlett-Packard Company assumes no responsibility for errors or omissions. This document may contain technical inaccuracies or typographical errors. This document may be modified without notice. The names of products and services included herein are trademarks of their respective owners. The products described in this publication may also be protected by one or more US patents, foreign patents and/or pending applications, copyright and/or other intellectual property rights.
Intended Audience
This document is intended for the following audiences:
Anyone intending to develop event reduction beyond configuring the supplied Composer correlators or de-duplications should have the appropriate training. This white paper assumes the reader is familiar with the NNM product and has read the Managing Your Networks manual ($OV_WWW/htdocs/C/manuals/Managing_Your_Network.pdf), in particular the section on Event Reduction Capabilities, which introduces some of the newer product features. The HP OpenView Correlation Composer's Guide ($OV_WWW/htdocs/C/manuals/COMPOSER.pdf) should also be read to become familiar with the Correlation Composer concepts.
De-duplication
The purpose of de-duplication is simply to remove multiple occurrences of the same event from the alarm browsers. The most recent occurrence of an identical event appears in the browser, with all other occurrences correlated underneath it. De-duplication works well for removing unnecessary noise from the event browser; it also organizes events better by grouping identical events under the single most recent occurrence. All occurrences can be seen by drilling down from the topmost event, so all the event information remains accessible to the operator.

A good example of a de-duplication provided by NNM is OV_Node_Added. Most operators don't want to see all the nodes that are added during a discovery or polling cycle, particularly as those events get scattered throughout the browser, which makes finding a particular OV_Node_Added very difficult. With de-duplication, only the most recent OV_Node_Added appears in the alarm browsers and all the other OV_Node_Added events are correlated underneath, making it easier to find a particular one.

For de-duplication to work, the notion of event equality must be configured. Minimally, for two events to be considered identical they must have the same trap or notification OID. Additional qualifiers to event equality are source ($r) and any varbind ($NUM). The de-duplication configuration file is supplied as:

UNIX: $OV_CONF/dedup.conf
Windows: %OV_CONF%\dedup.conf
Each line in the file specifies the fields of the event to be compared for duplication. For more information on the format of the de-duplication file refer to the dedup.conf man page. For a detailed list of de-duplicated events provided with NNM please refer to Appendix A of this document.
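The notion of event equality described above can be illustrated with a short Python sketch. It is not the dedup.conf parser or the NNM implementation; the Event type, the equality_key function, and the OID value are hypothetical names chosen for illustration. It only shows how the OID, the optional source ($r), and optional varbinds ($NUM) could combine into one comparison key.

```python
from typing import NamedTuple, Sequence, Tuple


class Event(NamedTuple):
    oid: str                    # trap or notification OID
    source: str                 # originating node, the $r qualifier
    varbinds: Tuple[str, ...]   # varbind values; $1 is varbinds[0]


def equality_key(event: Event, use_source: bool = False,
                 varbind_numbers: Sequence[int] = ()) -> tuple:
    """Build the tuple that two events must share to count as identical."""
    key = [event.oid]                 # the OID is always compared
    if use_source:
        key.append(event.source)      # optional qualifier: source
    for number in varbind_numbers:    # optional qualifiers: 1-based varbinds
        key.append(event.varbinds[number - 1])
    return tuple(key)


# Hypothetical OID, used only for this example.
a = Event("1.3.6.1.4.1.99999.0.1", "router1", ("eth0",))
b = Event("1.3.6.1.4.1.99999.0.1", "router2", ("eth0",))

same_by_oid = equality_key(a) == equality_key(b)
same_by_oid_and_source = (equality_key(a, use_source=True)
                          == equality_key(b, use_source=True))
```

With only the OID compared, the two events are duplicates; adding the source qualifier makes them distinct, which is the trade-off each configuration line has to settle.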
Composer correlators
The premise of Correlation Composer is that many event reductions have the same general logic template (pattern) and fall into one of the following categories:
- Suppress
- Enhance
- Rate
- Repeated
- Transient
- Multiple Source
The event logic or flow aspects of these correlators can be generalized, so all that remains to implement a correlation is to configure one of the templates into a specific instance. An example from the correlators provided with NNM is Multiple Reboots. Managed devices may be rebooted several times by an administrator within a period of time; the only information relevant to the operator is whether the device continues to reboot and/or stays down. Multiple Reboots is a Composer correlator instance of the Rate template that is configured to receive coldstart and warmstart traps. If 4 such events arrive within a 5 minute period then a new reboot trap is issued; otherwise the coldstart and warmstart traps are ignored. The instance data in this case are the incoming event signatures plus the time interval and event count that trigger the new event.

Composer is implemented as an ECS super circuit that contains sub-circuits for Suppress, Rate, et al. Composer also provides a UI for creating and configuring the correlator instances. As with all other circuits, the Composer super circuit is managed (enabled/disabled) from the ECS Configuration window. To start the Composer UI for creating or modifying a correlator, select the Composer circuit in the ECS Configuration window and click on Modify.

For the complete details on how to use the Composer UI to create and modify correlators refer to the Composer manual ($OV_WWW/htdocs/C/manuals/COMPOSER.pdf). The on-line help of the Composer UI also provides information on how to configure each template. For all the details on the Composer correlators provided with the NNM product please refer to the Event Reduction Capabilities chapter of the Managing Your Networks manual.
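The Rate logic behind Multiple Reboots can be sketched in a few lines of Python. This is an illustrative model only, not Composer code: the RebootRate class and its method name are hypothetical, and tracking the count per source and resetting it after a trigger are assumptions. The threshold of 4 events and the 5 minute window come from the description above.

```python
from collections import defaultdict, deque


class RebootRate:
    """Toy model of a Rate-style correlator keyed by event source."""

    def __init__(self, threshold=4, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # source -> recent trap timestamps

    def on_reboot_trap(self, source, timestamp):
        """Record a coldstart/warmstart trap.

        Returns True when a new reboot event should be issued.
        """
        times = self._times[source]
        times.append(timestamp)
        # Forget traps that fell out of the sliding window.
        while timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # assumption: start counting afresh after a trigger
            return True
        return False


rate = RebootRate()
burst = [rate.on_reboot_trap("router1", t) for t in (0, 60, 120, 180)]
```

Four traps from the same device inside the window trigger on the fourth; traps spread more widely never do, so isolated reboots stay out of the browser.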
Before getting into the details of event reduction, it is important to have a basic understanding of the event flow within NNM and, in general, of how the various processes operate on events.
[Figure: high-level NNM event flow]
The above diagram illustrates the event flow at a high level. The Post Master Daemon (PMD) is the first process to receive events from the SNMP stack (ovtrapd). The event flow between the major components is as follows:
1. Events are first written (logged) to the Binary Event Store (BES) by OVEvent.
2. OVEvent sorts out the logonly/ignore events and sends all other events to ECS for correlation.
3. ECS performs the correlations on the event flow as defined by the circuits and Composer rules and releases the correlated events back to OVEvent.
4. OVEvent then supplies the correlated events to its subscribers (netmon, xnmevents, ovalarmsrv).
5. ovalarmsrv manages the window of events that the browsers present (i.e. the most recent 3500). On this window of events, ovalarmsrv performs de-duplication and processes the pattern delete action even
OVEvent
OVEvent and ECS are the two stacks in NNM's postmaster daemon process. For the most part these stacks function as separate processes and can be thought of as separate modules with high-bandwidth communication between them. OVEvent serves the following major roles in the event processing path:
- Logs events into the event database
- Writes the correlation entries into the correlation logs
- Acts as producer to all subscribers of the RAW, CORRELATED and ALL event streams
Most events are processed in two passes through OVEvent. In the first pass the events are sorted according to LOGONLY, IGNORE and NORMAL. LOGONLY and NORMAL events are written to the event database and then sent on to ECS for correlation. In the second pass the NORMAL and LOGONLY events are sorted for the subscribers and OVEvent processes the subscription filters and notifies all event subscribers. LOGONLY events are put on the CORRELATED flow but are not displayed by the browsers. OVEvent also performs the actual correlation logging requests (from ECS and ovalarmsrv) and notifies the subscribers when events are correlated.
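The first pass described above can be pictured with a small Python sketch. It is a simplified model, not OVEvent code: the Category enum, the two functions, and the event name noisy_trap are hypothetical, and treating IGNORE events as neither logged nor forwarded is an assumption drawn from the paragraph above.

```python
from enum import Enum


class Category(Enum):
    LOGONLY = "LOGONLY"
    IGNORE = "IGNORE"
    NORMAL = "NORMAL"


def first_pass(events):
    """Return (logged, sent_to_ecs) for a list of (name, category) pairs."""
    logged, sent_to_ecs = [], []
    for name, category in events:
        if category is Category.IGNORE:
            continue                  # assumption: neither logged nor forwarded
        logged.append(name)           # written to the event database
        sent_to_ecs.append(name)      # then passed on for correlation
    return logged, sent_to_ecs


def browser_visible(events):
    """LOGONLY events travel on the CORRELATED flow but are not displayed."""
    return [name for name, category in events if category is Category.NORMAL]


sample = [
    ("OV_Node_Down", Category.NORMAL),
    ("SNMP_Authen_Failure", Category.LOGONLY),
    ("noisy_trap", Category.IGNORE),  # hypothetical event name
]
logged, sent_to_ecs = first_pass(sample)
visible = browser_visible(sample)
```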
ECS
The ECS stack is the correlation engine that performs the correlation logic defined by the circuits and the Composer correlators. The following details the event flow through the ECS engine.
1. Events are first evaluated to see if they match the input signature for any of the active circuits or correlators.
2. Events that don't match any signature are returned immediately to OVEvent. Events that do match are held and, in the case of Composer, are evaluated against the Advanced filters.
3. Composer events that pass the advanced filter then have the logic of the correlator executed. All actions from all correlators for the matching event are executed.
4. After processing, the event is either held, released or dropped depending on what the correlator has specified.
5. If multiple correlators hold the event, the holding period becomes the longest period specified by those correlators.
6. Once the event is released, the callback actions are performed and the events are returned to OVEvent.
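The steps above can be sketched as a small Python model. It is not the ECS engine: the Correlator dataclass and the process function are hypothetical, dropping is left out, and returning an event that fails every advanced filter is an assumption the text does not confirm. The sketch focuses on steps 1, 2 and 5, in particular the rule that the longest holding period wins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Correlator:
    name: str
    signature: Callable[[dict], bool]        # input signature match
    advanced_filter: Callable[[dict], bool]  # Composer advanced filter
    hold_seconds: int                        # how long this correlator holds


def process(event: dict, correlators: List[Correlator]) -> Tuple[str, int]:
    """Return (decision, hold_seconds) for one incoming event."""
    matching = [c for c in correlators if c.signature(event)]
    if not matching:
        return ("returned", 0)               # step 2: no signature matched
    active = [c for c in matching if c.advanced_filter(event)]
    if not active:
        return ("returned", 0)               # assumption, see lead-in
    hold = max(c.hold_seconds for c in active)  # step 5: longest period wins
    return ("held", hold) if hold > 0 else ("released", 0)


correlators = [
    Correlator("short", lambda e: e["name"] == "coldStart", lambda e: True, 30),
    Correlator("long", lambda e: e["name"] == "coldStart", lambda e: True, 120),
]
held = process({"name": "coldStart"}, correlators)
passed = process({"name": "linkUp"}, correlators)
```

An event matched by both correlators is held for 120 seconds, not 30, which is why one long-window correlator can delay an event that several correlators share.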
ovalarmsrv
ovalarmsrv is the UI server that maintains the window of the currently viewable events. It subscribes to OVEvent to receive all events from the CORRELATED stream. Because ovalarmsrv manages the viewable window of events it was the appropriate point for doing de-duplication. ovalarmsrv reads the dedup.conf file to build the list of events that are to be de-duplicated.
All new events that come from the CORRELATED stream are checked to see if they are de-duplicate candidates. If the event is a candidate and there is already an active candidate in the viewable window, then ovalarmsrv builds a correlation request to have the most recent de-duplicated candidate suppress the currently active candidate.
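The candidate check can be modelled roughly as follows. This Python sketch is illustrative only: the AlarmWindow class is hypothetical, equality is reduced to the OID alone, and eviction from the viewable window is ignored. It shows the one behaviour described above, that the newest candidate suppresses the currently active one.

```python
class AlarmWindow:
    """Toy model of de-duplication over the viewable window."""

    def __init__(self, dedup_oids):
        self.dedup_oids = set(dedup_oids)  # stands in for the dedup.conf list
        self.top_level = {}                # oid -> event id shown at top level
        self.requests = []                 # (suppressor_id, suppressed_id)

    def on_event(self, event_id, oid):
        """Return the correlation request built for this event, if any."""
        if oid not in self.dedup_oids:
            return None                    # not a de-duplication candidate
        previous = self.top_level.get(oid)
        self.top_level[oid] = event_id     # newest occurrence moves to the top
        if previous is None:
            return None                    # first occurrence, nothing to suppress
        request = (event_id, previous)     # newest suppresses the active one
        self.requests.append(request)
        return request


window = AlarmWindow(dedup_oids={"OV_Node_Added"})
first = window.on_event(1, "OV_Node_Added")
second = window.on_event(2, "OV_Node_Added")
other = window.on_event(3, "OV_Node_Down")
```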
The following mechanisms are ranked by the complexity of developing an event reduction with them, simplest first.
1. Log Only or Ignore
2. De-duplicate
3. Composer correlator
4. ECS Circuit

Log only and de-duplication are mechanisms that operate on a single event type independent of other events. Composer correlators and ECS Circuits are more powerful in that they can be designed and developed to identify a pattern of events and reduce that pattern to a single root cause. The rationale for having this range of mechanisms is to provide some scale of effort for developing reductions (i.e. simple things should be simple to do).

If the events being considered for reduction are independent and of no use to the operators in real time, then the simplest and most efficient mechanism is to configure that event to be LOGONLY. A good example of this in NNM is SNMP_Authen_Failure. This trap is configured as a LOGONLY trap, and a report can be scheduled to run at various intervals to produce a list of hosts and frequencies of authentication failures for security monitoring.

If the event being considered for reduction is frequent but the operators do occasionally require real-time access to the event data, then de-duplication is the most appropriate. De-duplication will leave only the most recent occurrence of the event at the top level in the browser with all duplicates correlated underneath. This mechanism also organizes the events in the alarm browsers better, as the duplicate events are collected under one top-level event as opposed to appearing throughout the browser.

If the event(s) being considered for reduction are not independent and are symptomatic of a more fundamental problem, then a correlator is the most appropriate choice. The point at which ECS Circuits become more appropriate than Composer correlators is harder to define.
In general, ECS circuits will continue to be a part of complex solutions like managing FrameRelay or MPLS. This is mostly because ECS is more general, and complex solutions require that generality even at the expense of more development time. Correlation Composer is expected to be adopted by a wider audience of users than ECS Designer. The logic of the correlator being developed should also fit well into one or a combination of the Composer templates, which encapsulate the common logic use cases such as transient, rate, etc. If the correlation requires significant logic and state beyond the Composer templates then it is more of a candidate for an ECS circuit.

Practical experience in developing event reductions shows that a valuable design pattern for any correlator is to combine de-duplication with the correlator. The nature of a correlator is to hold onto one or more events for some period, do an analysis and then release the events correlated under some root cause. Often, using just a correlator produces a repeated pattern of root cause events in the browser, all basically indicating the same problem. Extending the window of time in the correlator can reduce the frequency of these patterns, but this can also slow down the event system by holding onto events. The better solution in this case is to have the suppressor event (root cause) be de-duplicated. This allows the correlator to release the correlations more frequently, and the browser is kept free from noise by having all occurrences of the root cause de-duplicated under the most recent. This type of solution also reduces the net amount of processing required by PMD and ovalarmsrv. An example of this technique is OV_IF_Intermittent, which is the root cause event of the OV_Connector_IntermittentStatus correlator and is also de-duplicated.
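The selection guidance in this section can be condensed into a small decision helper. This Python sketch is only a reading of the rules of thumb above; the function name and its three boolean inputs are hypothetical simplifications, and real decisions will involve more judgment than three flags can capture.

```python
def choose_mechanism(independent: bool,
                     needed_in_real_time: bool,
                     fits_composer_template: bool) -> str:
    """Pick the simplest mechanism that suits the event(s) being reduced."""
    if independent and not needed_in_real_time:
        return "LOGONLY"              # e.g. events only used in reports
    if independent:
        return "de-duplication"       # frequent, occasionally needed live
    if fits_composer_template:
        return "Composer correlator"  # pattern matches Rate, Transient, etc.
    return "ECS circuit"              # needs logic/state beyond the templates


report_only = choose_mechanism(True, False, False)
noisy_but_useful = choose_mechanism(True, True, False)
symptomatic = choose_mechanism(False, True, True)
complex_case = choose_mechanism(False, True, False)
```

The order of the checks mirrors the ranking by development effort: each later branch is reached only when the simpler mechanisms do not apply.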
Analyzing Events
Before investing any effort in developing a correlation it is extremely important to get an accurate big picture view of the events being processed by the NNM management system. To help in the analysis of events two scripts were developed (processEvents & processCorrEvents). These scripts are delivered with the product and are in the support directory. The procedure for analyzing events is as follows.
It is also recommended to get snapshot samples of the correlation logs. This will indicate how much event reduction is currently happening and will serve as a baseline for measuring any new or modified correlation developed.
Development Tips
2. Verify there are no clashes with existing correlators
Review the table in Appendix A to verify the new correlator will not interfere with any existing correlator, either by having the same input events or by releasing any new event that may be fed into an existing correlator.

3. Test in isolation first to validate functionality
Disable all other rules and circuits and test the functionality of the new correlator by sending the appropriate input events to it. See the <<ecsevgen.exe>> documentation for doing this. Validate the results of the correlator by using the browser. If the expected results are not being returned then you may need to turn on tracing; see the Trouble Shooting section for tracing ECS. If the new correlator has external functions or Perl scripts, a good practice is to put some tracing capability in the functions and scripts. This allows the developer to trace the progress of the new correlator without having to get too involved with the ECS tracing.

4. Test coexistence
Verify the new correlator still functions properly with the product correlators enabled. If there are coexistence problems, disable the product correlators one at a time to isolate the failure. Once the failure is isolated, careful inspection of the rules along with ECS tracing will most likely be required to understand the problem.

5. Test performance
Verify the new correlator does not seriously impact the system's ability to handle a storm of events while the new rule is enabled. There are various ways to do this, but repeatedly doing the following is a commonly practiced way to simulate a storm:

ovtopofix S down
sleep 120
ovtopofix S up

This should be done with all product correlators enabled.

6. Version all working copies of the Composer.fs to avoid losing work
Once the new correlator is developed and tested, save a copy of the test system's Composer fact store for versioning ($OV_CONF/ecs/circuits/Composer.fs). The only backup copy provided by the system is under $OV_NEW_CONF/OVEVENTMIN/ecsCircuits/Composer.fs. This backup copy contains just the product correlators.

7. Merge (csmerge) the new correlators with NNM product Composer.fs
If the new correlator was developed on top of the product Composer.fs then merging is not necessary. If new correlators are developed separately then they will need to be merged together to have a single fact store. The merge tool csmerge should be used when combining the rules of different fact stores.
Trouble Shooting
And also turn the tracing off in the ECS stack of PMD.

ecsmgr i 1 trace 0
pmdmgr Secss\;T0x0

The following is example output of Composer tracing the multiple reboot:

TRACE [interpreter]: Composer : 19700101000000.000000Z : "eventid(0:34237)" : OV_MultipleReboots : Incoming Alarm passed Alarm signature for this correlator
TRACE [interpreter]: Composer : 19700101000000.000000Z : "eventid(0:34237)" : OV_MultipleReboots : Alarm passed both primary and advanced filter for Correlator
TRACE [interpreter]: Composer : 19700101000000.000000Z : "eventid(0:34237)" : OV_MultipleReboots : Executing logic for the Correlator - starting
TRACE [interpreter]: Composer : 19700101000000.000000Z : "eventid(0:34237)" : OV_MultipleReboots : The Correlator has decided the following - :Event will be output.

As stated before, the output from PMD tracing is extremely verbose and quite a lot of it won't make sense in the context of tracing a correlator. To see just those trace messages relevant to a particular Composer correlator, grep the pmd.trc0 file for the lines that contain Composer as well as the name of the correlator. The above output was obtained by doing:

grep Composer pmd.trc0 | grep OV_MultipleReboots
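When the filtered trace needs further processing, the same selection can be done in a script. The Python sketch below mirrors the two grep filters and then splits each line on the " : " separator seen in the sample output. The function names are hypothetical, and the field layout is inferred from the example lines only, so it may not hold for every trace message.

```python
def composer_lines(lines, correlator):
    """Keep lines mentioning both Composer and the correlator name."""
    return [line for line in lines
            if "Composer" in line and correlator in line]


def trace_message(line):
    """Return (event_id, correlator, message) from one Composer trace line."""
    # Assumed layout: prefix : timestamp : "eventid(...)" : name : message
    parts = [part.strip() for part in line.split(" : ", 4)]
    return parts[2].strip('"'), parts[3], parts[4]


sample = [
    'TRACE [interpreter]: Composer : 19700101000000.000000Z : '
    '"eventid(0:34237)" : OV_MultipleReboots : '
    'Executing logic for the Correlator - starting',
    'TRACE [other]: unrelated line',
]
selected = composer_lines(sample, "OV_MultipleReboots")
event_id, correlator, message = trace_message(selected[0])
```

To run it against a real trace, read the file into a list of lines first, for example with open("pmd.trc0").read().splitlines().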
Additional Tips
If there are changes to the functions being called, or new ones added, then these will be the most likely places to look for the problem. To quickly determine any new Perl script callouts use the following command:

grep 'perl' $OV_CONF/ecs/circuits/Composer.fs | grep '^(1' | \
cut -f 2 -d ' '

There are no Perl scripts used in the Composer.fs provided with NNM, so the default results are empty.
Appendix A
The following table lists all events that currently participate in the NNM product correlators and/or de-duplication.

Columns: Event Name | De-Duplicated | ECS Suppressed | ECS Suppressor | Composer Suppressed | Composer Suppressor

Event Name
OV_IF_Up
OV_IF_Down
OV_IF_Unknown
OV_IF_Intermittent
OV_Node_Up
OV_Node_Down
OV_Node_Unknown
OV_Node_Added
OV_Segment_Normal
OV_Segment_Major
OV_Segment_Critical
OV_Network_Normal
OV_Network_Critical
OV_Station_Normal
OV_Station_Marginal
OV_Station_Major
OV_Station_Critical
OV_RemoteManager_Up
OV_RemoteManager_Down
coldStart
X X X X X X X X X X X X X X X X X X X X X X X X X X X X
X X X X
warmStart
OV_Multiple_Reboot
OV_HSRP_UP
OV_HSRP_State_Transition
OV_HSRP_Marginal
OV_HSRP_Warning
OV_HSRP_Unknown
OV_HSRP_Major
OV_HSRP_Down
OV_Chassis_Cisco
OV_Chassis_Temperature
OV_Chassis_FanFailure
OV_Chassis_PowerSupply
OV_Bad_Subnet_Mask
OV_Duplicate_IP_addr
OV_DuplicateIfAlias
OV_IPV6_addrUp
OV_IPV6_addrDown
OV_Lic* (All OV licensing traps)
RMON_Rise_Alarm
X X X X X X X
X X
X X X X