Bachelor's Project
Acknowledgement
I would like to thank my family, my friends and my colleagues for their insight, support and wisdom. I am truly grateful for being surrounded by such brilliant people.
Declaration
I hereby declare that I have completed this project independently and that I have
listed all the literature and publications used.
I have no objection to usage of this work in compliance with § 60 of Act
No. 121/2000 Sb. (copyright law), and with the rights connected with the copyright act
including the changes in the act.
In . . . . . . . . . . . . . . . . . . . . . . . on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Abstract (translated from the Czech original)
The aim of this bachelor's project is to analyze the available software products for systems management, both commercial and open source, to analyze the options for integration with servers made by Oracle (Sun), and to implement an integration solution in a selected tool.
The analysis also includes a theoretical part focused on the usefulness of systems management, the methods used for data acquisition, and the protocols used for monitoring and managing servers.
Abstract
The objective of this bachelor's project is to analyze available systems management
products, both commercial and open-source. It analyzes the possibilities of integration
with servers made by Oracle (Sun), and the result of the project is an integration
into a selected software product.
The analysis also includes a theoretical part focused on the benefits of systems management, the available methods of data acquisition and the protocols that are used for monitoring
and managing servers.
Contents
1 Introduction
7 Zenoss integration
7.1 Choosing an approach
7.2 Development environment
7.3 Important design decisions
7.3.1 Event classes
7.3.2 Per-trap mapping vs. default mapping
7.4 Development steps
7.4.1 Compiling MIBs
7.4.2 Creating Event classes
7.4.3 Creating Event mappings
7.4.4 Adding products
7.4.5 Final modifications
7.5 Testing
7.6 Future extension
Conclusion
CD Contents
1 Introduction
Systems management has become a very important topic in almost every organisation that depends on IT services. It encompasses the entire life cycle of IT infrastructure,
including, for example, tracking and documenting requirements, purchasing and renewing
equipment, license management, and fault and risk monitoring. While systems management has always been present in some form in the IT departments of mid-size to large
enterprises, the approach to it was often defined in a company-specific way, with no standardization.
However, many companies now span a number of countries or even continents.
For all but the biggest companies, it would be very inefficient to invest in the development of a complete in-house solution for systems management; these companies rely
on third-party solutions that offer a cheaper, well-tested and supported alternative.
Decentralization of IT resources is a very important factor driving the need for systems
management. It has become quite common to have more than one datacenter, often
in remote locations, possibly quite far apart from each other, so that in case of an
accident at or near one of them, the operations of the company can continue relatively
uninterrupted (by accident we mean here either a natural phenomenon such as
flooding, a storm or a fire, or an act of ill will such as a terrorist attack). Because IT
support staff may not always be present on site, advance warning of possible component
failures is very important. Some, albeit not all, systems management software
suites can even tie individual systems, groups of systems or even components to a
service, so when a failure is imminent, one can see which services are in jeopardy.
Businesses of today rely on IT more than ever before. Even a minute-long outage
can in effect cost thousands of dollars. Therefore, some companies (notably telecommunication companies, banks, etc.) build systems with a certain level of redundancy,
so that when one system fails, another system takes over within a reasonable amount
of time and the interruption is barely noticeable. Systems management is necessary in
this case, as it provides information about the nature of the failure and helps with selecting
and migrating to a different system.
Computing power (in the sense of CPU processing speeds, RAM and storage sizes,
etc.) keeps growing and its price is falling. However, workloads are so variable that
a processing node may not be loaded enough, so that the cost of the power it consumes
is actually higher than the value of the work it performs.
This led to the rebirth of one area of the IT industry: virtualization. To a certain degree, virtualization has been possible since 1967, at that time on the IBM CP-40.
However, the main reason back then was to enable various software to run unmodified or simultaneously (computers were batch oriented and most software was not
designed for any level of multitasking). Now, the reasons for virtualization are consolidation, reduction of power consumption and control of expenses.
The availability of relatively cheap but powerful commodity hardware has led to a
new architecture of IT: instead of renting a dedicated machine (although this is still
possible), one can rent virtual machines, possibly running on very different sets of
hardware. With a properly set up infrastructure (Fibre Channel or iSCSI disk arrays,
virtualization software supporting live migration, etc.), it is possible to achieve very
high availability and reliability.
However, cheaper systems are built from cheaper components that are
more prone to failure, thus the need for proper monitoring is high.
With proper software, the migration of virtual machines in case of a hardware malfunction can be automated.
Power consumption monitoring is a very important part of systems management.
With power becoming more expensive, careful monitoring of power consumption
in relation to the tasks performed is required to manage the costs of one's IT operations
or to properly bill the customers (the latter applies specifically to cloud computing
customers).
This bachelor's project will focus on one area of systems management: systems
health monitoring. With the above in mind, we can try to focus on a clear design that will
allow implementing the features described above or connecting with existing features
already in place.
The objective is to design and implement a Zenoss extension (also known as a ZenPack)
that will make it possible to discover, monitor and report to the user the system health status of some Oracle
Sun servers. Zenoss was chosen because it is a very advanced integration platform with advanced features such as graphing, so future extensions like recording
and analyzing power consumption trends can be implemented. The selection was done in
unpublished work by the author, available separately [1].
CA Unicenter NSM
HP Operations Manager
IBM Director
IBM Tivoli Enterprise Console
IBM Tivoli NetCool OMNIbus
All of these products can do passive monitoring: they listen for events received
via SNMP traps, system logs or some other mechanism (such as direct database entry
or command-line tool execution).
The Tivoli Enterprise Console, also known as TEC, is one of the oldest systems
management packages. It relies on the Tivoli Management Framework, which also provides
a way to install other extensions and patches. TEC itself has a rather simple
GUI written in Java, but the backend consists of many helper programs, usually written in C. TEC is used for passive monitoring only: it waits for events, and those
events are processed by an internal engine (some of its parts are based on the Prolog language). This software package, however, requires a preinstalled database system to be
present.
Nagios
OpenNMS
Zabbix
Zenoss
Figure 2.3
Nagios is the oldest and most mature open-source product. It is very scalable and well
documented, but its web GUI lacks some modern features, which on the one hand makes it
very fast, but on the other hand sometimes not very user friendly.
It is written mainly in C, which is another reason for its high speed. Monitoring data
can be obtained by running checks, either built-in or user-supplied scripts called
plugins, whose exit code and (optionally) any output are processed and evaluated by
Nagios.
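The plugin contract described above can be illustrated with a minimal sketch. It assumes the conventional Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); the check itself (disk usage of a path), the function names and the threshold values are hypothetical and chosen only for illustration.

```python
"""Minimal Nagios-style plugin sketch: reports disk usage of a path."""
import shutil
import sys

# Conventional plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an exit code (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/"):
    """Return (exit_code, one-line status message) for the given path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not be performed
        return UNKNOWN, "DISK UNKNOWN - %s" % exc
    used_percent = 100.0 * usage.used / usage.total
    code = evaluate(used_percent)
    message = "DISK %s - %.1f%% used on %s" % (
        STATUS_NAMES[code], used_percent, path)
    return code, message


def main():
    """Print the status line and return the exit code for sys.exit()."""
    code, message = check_disk()
    print(message)
    return code
```

When saved as a script, the plugin would end with `sys.exit(main())`, so that the monitoring system can read both the status line and the exit code.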
Checks can be run either locally or remotely using a tool called NRPE (Nagios
Remote Plugin Executor). In addition to having Nagios run a check actively (see
subsection 4.2.1 on page 21), one can also feed data into Nagios asynchronously (see
subsection 4.2.2 on page 22). For more information please see www.nagios.org or
[2].
OpenNMS is another network monitoring/management software package. While
Nagios achieves portability across different platforms by using C as its programming
language, OpenNMS is written in Java, which also makes it very portable. It requires
With the model built, you can use Zenoss' integrated availability and performance monitoring features to monitor and report on all aspects of your IT
infrastructure. Zenoss also provides events and fault management features
that tie into the CMDB. These features help drive operational efficiency and
productivity by automating many of the notification, alerts, escalation, and
remediation tasks you perform each day.
Zenoss is written in Python and is based on the Zope application platform. Like most of the
previously mentioned software products, it requires a database, specifically MySQL.
Figure 2.6
servers
routers
racks
switches
SNMP is a datagram protocol, and therefore there is a possibility of data being
lost en route. This is especially important when using passive monitoring: network
elements such as routers can cause UDP packets to be lost, and in the case of a fatal error
(by fatal error we mean an error that causes the monitored device to power off) the notification
may not be received at all. The error is then discovered only through some other malfunction
(typically a segment of the network being down, or possibly a service such as a database or web
server being inaccessible).
OID
varbind
table
scalar
index
Management information is viewed as a collection of managed objects, residing in a virtual information store, termed the Management Information Base
(MIB). Collections of related objects are defined in MIB modules. These modules are written using an adapted subset of OSI's Abstract Syntax Notation
One, ASN.1 [10]. It is the purpose of this document, the Structure of Management Information (SMI), to define that adapted subset, and to assign a set of
associated administrative values.
3.1.3.1 ASN.1
Abstract Syntax Notation One is one of many approaches to describing data structures. What makes it stand out is that it not only allows the specification of a structure, but
also describes its encoding and decoding into various formats (ranging from binary
formats to XML).
ASN.1 is an international standard adopted by the International Telecommunication
Union (ITU) and by ISO/IEC. It has been standardized as [10-13]. Due to their versatility, ASN.1 and its hierarchical data model are used in other application protocols as well,
including internet telephony (H.323) and directory services (LDAP).
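To make the idea of a standardized binary encoding more concrete, the following sketch encodes an object identifier using the Basic Encoding Rules (BER), the encoding SNMP uses on the wire. The function names are hypothetical, and the sketch handles only short-form lengths (content up to 127 bytes), which is an assumption made to keep it brief.

```python
def encode_base128(value):
    """Encode a non-negative integer in base 128.

    Every byte except the last one has its high bit set.
    """
    chunks = [value & 0x7F]
    value >>= 7
    while value:
        chunks.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(chunks))


def encode_oid(dotted):
    """Return the BER encoding (tag, length, content) of a dotted OID."""
    arcs = [int(part) for part in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    # The first two arcs are combined into a single value
    content = encode_base128(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        content += encode_base128(arc)
    if len(content) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    # 0x06 is the universal tag for OBJECT IDENTIFIER
    return bytes([0x06, len(content)]) + content


# sysDescr.0 from MIB-II
print(encode_oid("1.3.6.1.2.1.1.1.0").hex())
```

The hierarchical nature of the data model is visible here: each arc of the tree is encoded in turn, so a common prefix of two OIDs also yields a common prefix of their encodings.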
# ssh root@myhost
Password:
Waiting for daemons to initialize...
Daemons ready
Sun(TM) Integrated Lights Out Manager
Version 3.0.6.1.d r48331
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
-> show /SYS product_name
/SYS
Properties:
product_name = SPARC-Enterprise-T5220
Figure 3.1
Although the output is optimized for human reading and not for programmatic analysis, there are well-established tools that can parse this output (e.g. expect [17]) and feed
the resulting data to systems management software.
This technique applies not only to system controllers, but also to BIOSes and even operating system command-line utilities. There are a few Zenoss extensions (ZenPacks)
that use the technique of parsing text output to deliver information on processes, CPU
load, storage status and more.
# cat /proc/partitions
major minor  #blocks  name
   8     0  312571224 sda
   8     1  309917916 sda1
   8     2          1 sda2
   8     5    2650693 sda5
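As a minimal sketch of the text-parsing technique, the following function turns the output of `cat /proc/partitions` into structured records. The function name and the record layout are hypothetical; the sample text mirrors the listing above.

```python
def parse_partitions(text):
    """Parse `cat /proc/partitions` output into a list of dictionaries."""
    partitions = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields and start with a number;
        # this skips the header line and any blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        partitions.append({
            "major": int(major),
            "minor": int(minor),
            "blocks": int(blocks),
            "name": name,
        })
    return partitions


SAMPLE = """\
major minor  #blocks  name

   8     0  312571224 sda
   8     1  309917916 sda1
   8     2          1 sda2
   8     5    2650693 sda5
"""

for partition in parse_partitions(SAMPLE):
    print(partition["name"], partition["blocks"])
```

A parser of this kind is fragile by nature: any change in the layout of the human-oriented output may break it, which is one reason why structured interfaces such as SNMP are usually preferred where available.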
A description of these protocols is beyond the scope of this project; for further information please consult the references. In the case of proprietary software, details about the
usage of these protocols may not be fully known, therefore using them as a communication protocol with custom software may be very challenging.
local only
in-band communication
out-of-band communication
side-band communication
Local communication usually means non-network communication with the monitored system. This may involve manually connecting a serial console (e.g. a laptop with a serial line)
or a display, keyboard and mouse. Watching status LEDs in person can
also be used for a quick system status check. For the purpose of this project, we will
not consider this a viable method of system monitoring. All other communication
channels are described below.
This implies that the operating system on the monitored device has to support the handling of management traffic (usually, this is accomplished by running a so-called agent).
It also means that management traffic occupies (at least partially) useful bandwidth
and that the agent will use some CPU cycles.
On the other hand, this type of communication poses no additional requirements on the existing network infrastructure: no additional cabling is required and
no changes to network switches and routers need to be made. Especially when dealing with many servers, the savings on network infrastructure may be significant.
One significant drawback of this approach is that without the operating system running, management may not be possible (although servers with Wake-on-LAN capability can at least be turned on remotely).
therefore it is very important to develop and enforce security guidelines with the same
strictness as the guidelines applying to operating system and network security.
In conclusion, the drawback of this approach is higher network infrastructure costs,
but for setups requiring additional features such as storage redirection, this approach
is beneficial.
A huge advantage is that very little network traffic is generated, and this
method is also very CPU friendly (neither the agent/system controller nor the monitoring
station processes huge amounts of data).
This method may not be supported by all devices.
                             In-band           Out-of-band   Side-band
OS independent               no                yes           yes
Communication port           shared            separate      shared
                             yes               no            no
                             none              yes, cabling  yes, setup
Display/storage redirection  needs OS support  yes           yes
Power management             limited           yes           yes
Table 4.1
Feature          Active           Passive           Combination
Comm. initiator  management host  monitored device  both
Network traffic  high             low               medium
Reliability      high             lower             highest
                 yes              no                yes
medium
very high
very low
high
lower
present
absent
functioning
about to malfunction
malfunctioning
unknown
A very closely related term is sensor. Sensors are usually associated with components, although they may be associated with a whole system. There are fundamentally two types of sensors:
Among virtual sensors are those whose condition is based on the state of other sensors
(e.g. a power sensor reporting in watts is calculated from the appropriate voltage
and current sensors) or on a condition detected by software. For example:
non--critical
critical
non--recoverable
When a non-critical threshold is crossed, a notification is usually generated,
but the condition is not severe and will not impact the function of the system. Staying
beyond a critical threshold may affect the reliability and endurance of the system. Crossing a non-recoverable threshold usually signals that something has gone very
wrong, and the system is immediately shut down (although this behaviour can be modified and
sometimes disabled).
Thresholds can also be low and high. For example, a temperature sensor measuring ambient temperature has all six thresholds defined (freezing temperatures are
just as undesirable as high temperatures).
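The evaluation of a numeric sensor against its six thresholds can be sketched as follows. The names `Thresholds` and `classify`, as well as the threshold values of the example ambient temperature sensor, are hypothetical and serve only as an illustration.

```python
from collections import namedtuple

# Six thresholds of a numeric sensor, ordered from lowest to highest
Thresholds = namedtuple("Thresholds", [
    "lower_non_recoverable", "lower_critical", "lower_non_critical",
    "upper_non_critical", "upper_critical", "upper_non_recoverable",
])


def classify(reading, thr):
    """Return the most severe threshold level the reading has crossed."""
    if (reading <= thr.lower_non_recoverable
            or reading >= thr.upper_non_recoverable):
        return "non-recoverable"
    if reading <= thr.lower_critical or reading >= thr.upper_critical:
        return "critical"
    if reading <= thr.lower_non_critical or reading >= thr.upper_non_critical:
        return "non-critical"
    return "ok"


# Illustrative values in degrees Celsius (an assumption, not vendor data)
AMBIENT = Thresholds(-10, 0, 5, 35, 40, 45)

for reading in (22, 36, 41, 50, 2):
    print(reading, classify(reading, AMBIENT))
```

The checks are ordered from the most severe level to the least severe one, so a reading that lies beyond several thresholds is reported with the highest applicable severity.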
Discrete sensors can only be in one of a fixed set of states. Here is an incomplete list of discrete values certain sensors can have:
disabled
memory error detected
OK/fail
present/absent
Both kinds of sensors have so-called assertions and deassertions, which are opposites of each other. Assertion means that the sensor assumes some state (usually
an error state); deassertion means that the sensor leaves a state that was previously
asserted.
However, this may sometimes be tricky. Let us look at an example. We have a sensor
HDD0 (the names are usually longer, but for the sake of the example let us keep this one)
that has the following states:
Device Present
Device Absent
Hot Spare
Rebuild In Progress
and for all of them, both assertion and deassertion are enabled. In this particular example, having the sensor in Device Present Assert means that the particular device
is present. Similarly, Device Absent Assert means that the device has been removed.
There is, however, one more approach: reporting the device as Device Absent Deassert and Device Present Deassert. These mean
the same thing as the states in the previous paragraph: the device has been inserted (and is
no longer absent) and the device has been removed (and is no longer present), respectively. Any integration dealing with sensors must be aware of this and should preferably
translate incoming notifications into one common format and discard the less
common and more confusing one.
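The translation into one common format can be sketched as follows. The function name and the table of complementary states are hypothetical; the sketch assumes that only the Device Present / Device Absent pair is complementary, while states such as Hot Spare have no opposite and are passed through unchanged.

```python
# Pairs of states where leaving one implies entering the other
COMPLEMENT = {
    "Device Present": "Device Absent",
    "Device Absent": "Device Present",
}


def normalize(state, asserted):
    """Rewrite a deassertion of a complementary state as an assertion.

    Returns a (state, asserted) tuple in the common format.
    """
    if not asserted and state in COMPLEMENT:
        return COMPLEMENT[state], True
    return state, asserted


print(normalize("Device Absent", False))   # device has been inserted
print(normalize("Device Present", False))  # device has been removed
```

After this step, the rest of the integration only has to deal with assertions of Device Present and Device Absent, regardless of which of the two styles the system controller uses.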
The work will be done primarily on the latest available servers (i.e. not End-of-Life ones).
Although it may seem a waste of time to also target servers that are no longer in production,
it is the author's belief that these servers may still be present, especially in educational
institutions, where their performance is still sufficient and where having an open-source tool
for monitoring will be more than beneficial.
platforms and the latter being used on servers with the UltraSPARC T1 processor; these
processors have the ability to run several threads in parallel, also called Chip Multithreading, hence the abbreviation CMT).
ALOM has only a command-line interface and can send e-mail to the administrator in the event of a malfunction; newer versions of ALOM-CMT also support the SNMP
protocol. There is no web GUI, though. ALOM is primarily out-of-band (using a serial line or its own network port), but it can be configured from within Solaris using
the scadm(1M) command. The features are pretty much standard:
power control
serial console redirection
logical domains (on CMT machines, [27])
environment monitoring
listing, disabling and enabling components
eLOM, on the other hand, can be found only on older x86 platforms. It offers a command-line interface, an SNMP interface and a web interface. In addition to the features listed
for ALOM (except logical domains), eLOM has these additional features:
# ssh root@alom-server
Copyright 2008 Sun Microsystems, Inc.
Use is subject to license terms.
2009/10/29 16:06
Figure 6.1
serial line
telnet (may be disabled for security reasons)
secure shell
internally over OS tool (e.g. scadm(1M))
6.3 SNMP
The SNMP interface is arguably the most used interface for systems management. Both
eLOM and ILOM have supported SNMP from their very first versions; ALOM-CMT started
to support SNMP directly relatively late.
However, either due to the absence of an SNMP interface (ALOM-CMT prior to v1.4) or
due to a simple wish to monitor the system in-band, there are so-called agents. There
are currently two:
Monitoring Agent for Sun Fire and Netra Systems (MASF) [28]
# ssh root@elom-host
root@elom-host's password:
Sun(TM) Embedded Lights Out Manager
Copyright 2004-2006 Sun Microsystems, Inc. All rights reserved.
Version 2.91
Hostname: SUNSP0016365B97FB
IP address: 10.18.141.146
MAC address: 00:16:36:5B:97:FB
System serial number: 0624QC0029
/SP -> show /SP/SystemInfo/ProductInfo
/SP/SystemInfo/ProductInfo
Targets:
Properties:
ProductManufacturer = Sun Microsystems
ProductProductName = Sun Fire X2200 M2
ProductPartlNumber = 1S39U9ZST61
ProductSerialNumber = 0624QC0029
AssetTag =
Target Commands:
show
Figure 6.2
# ssh root@sparc-ilom
Password:
Waiting for daemons to initialize...
Daemons ready
Sun(TM) Integrated Lights Out Manager
Version 3.0.6.1.d r48331
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Warning: password is set to factory default.
-> show /SYS
...
Properties:
type = Host System
ipmi_name = /SYS
keyswitch_state = Normal
product_name = SPARC-Enterprise-T5220
product_part_number = 602-3821-08
product_serial_number = BEL07513TT
product_manufacturer = SUN MICROSYSTEMS
fault_state = OK
power_state = On
...
ENTITY-MIB
SUN-PLATFORM-MIB
SUN-ILOM-PET-MIB
SUN-HW-TRAP-MIB
SUN-HW-MONITORING-MIB
SUN-ASR-NOTIFICATION-MIB
In the following paragraphs, we will look into these MIBs in more detail.
various components of the server, including details about count and type of processors,
DIMM modules manufacturer etc.
SUN-PLATFORM-MIB is a MIB that extends ENTITY-MIB with details about
operational state. It also contains tables that identify and list system sensors, together with their thresholds and current values. In addition, this MIB defines
some notifications that can be used to dynamically modify the model of the monitored
system and/or can be translated and displayed to the user. However, these traps do not
carry all the information (such as the type of sensor issuing the warning), so additional
action is required to obtain it (typically, this is done using a regular expression that looks for a certain pattern in sensor names). Using regular expressions is
a quick and functional way, but the author believes the correct approach is to poll the agent
or system controller for the correct sensor type based on the OIDs present in the
received notifications. These two MIBs are supported in MASF (SPARC) and in all ILOMs and
eLOMs.
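The regular expression approach mentioned above can be sketched as follows. The sensor names and the naming patterns used here (a `V_`, `T_` or `I_` prefix of the last path element, a trailing `TACH`) are assumptions made for illustration only; actual names differ between platforms, which is exactly why this approach is fragile.

```python
import re

# (pattern, sensor type) pairs; the patterns are illustrative assumptions
SENSOR_PATTERNS = [
    (re.compile(r"/V_[^/]*$"), "Voltage"),
    (re.compile(r"/T_[^/]*$"), "Temperature"),
    (re.compile(r"/I_[^/]*$"), "Electrical Current"),
    (re.compile(r"/TACH$"), "Fan Speed"),
]


def sensor_type(name):
    """Guess the sensor type from the sensor name, 'Other' if unknown."""
    for pattern, kind in SENSOR_PATTERNS:
        if pattern.search(name):
            return kind
    return "Other"


for name in ("/SYS/MB/V_+12V", "/SYS/MB/T_AMB", "/SYS/FT0/F0/TACH"):
    print(name, "->", sensor_type(name))
```

Polling the agent for the sensor type avoids maintaining such a pattern table, at the cost of an extra request for each received notification.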
SUN-ILOM-PET-MIB is one of the MIBs that does not use the typical Sun (Oracle)
OID tree; instead it uses the tree wiredformgmt (Wired for Management). This
is an OID tree reserved by Intel [31] for so-called PETs (Platform Event Traps). These
largely correspond with IPMI and often carry similar data. However, such a trap carries a computed specific type (a number that identifies the type of trap or
notification being sent). Most NMSes cannot deal with dynamic specific types;
they expect these numbers to be assigned statically and defined in the MIB, and that
is the purpose of this MIB. However, if there is another PET MIB by a different vendor, the two will share the OID tree and the numbers will collide. Not only will
the names and descriptions of most or all notifications differ, but some may have
a totally different meaning.
SUN-HW-TRAP-MIB was designed relatively recently with a single purpose: to eliminate the need for regular expression matching or polling the agent when a trap is
received. Hence, a direct display of these traps is preferred.
SUN-HW-MONITORING-MIB was designed to remove the dependency on ENTITY-MIB
and to provide some more information about the monitored system. It features data
such as cumulative state, which is computed on the monitored host side. The main advantage
of this approach is the saving of network traffic: the NMS may poll only a few values in the MIB and fetch the full tree only in case something goes wrong. This MIB is
implemented only in the Hardware Management Agent.
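The traffic-saving polling strategy can be sketched as follows. The sketch is independent of any particular SNMP library: `get_scalar` and `walk_table` are hypothetical callables supplied by the caller, and `ok_value` is an assumption, since the value that represents a healthy cumulative status has to be taken from the MIB itself.

```python
def poll_host(get_scalar, walk_table, ok_value, tables):
    """Poll one cumulative scalar; walk the sensor tables only on trouble.

    get_scalar(name) returns the value of a scalar object,
    walk_table(name) returns all rows of a table (both hypothetical).
    """
    status = get_scalar("sunHwMonCumulativeSensorAlarmStatus")
    if status == ok_value:
        # Healthy system: a single request was enough
        return status, {}
    # Something is wrong: fetch the details
    details = {name: walk_table(name) for name in tables}
    return status, details


# Example with stand-in callables instead of real SNMP requests
status, details = poll_host(
    get_scalar=lambda name: "cleared",
    walk_table=lambda name: [],
    ok_value="cleared",
    tables=["sunHwNumericTemperatureSensorTable"],
)
print(status, details)
```

With many monitored hosts, the difference between one request per polling cycle and a full walk of all sensor tables adds up quickly.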
SUN-ASR-NOTIFICATION-MIB is currently implemented by the ASR agent. Description from [32]:
ASR is a secure, scalable, customer-installable software feature of warranty
and SunSpectrum support that provides auto-case generation when specific
hardware faults occur. ASR is designed to enable faster problem resolution by
eliminating the need to initiate contact with Sun for hardware failures, reducing both the number of phone calls needed and overall phone time required.
ASR also simplifies support operations by utilizing electronic diagnostic data.
When a hardware error is detected, the ASR agent sends details
about the error, together with a unique identifier of the system, to Oracle, where the
data is filtered and entered as a Service Request on behalf of the customer. This saves
time and communication effort. In addition, ASR generates an SNMP notification to
inform the customer that a Service Request has been created on their behalf.
6.3.1.2 Notifications
It is not feasible to describe every single notification declared in all MIBs, as that
would make this document excessively long and also very quickly outdated. In this
section, we will describe the basic principles behind notifications in Oracle (Sun)
MIBs.
notification. Its sole purpose is to inform the NMS that a configuration change has occurred and that it should reread all data.
SUN-PLATFORM-MIB has at present twelve notifications defined. These notifications were designed to work in cooperation with ENTITY-MIB, and as such each
notification carries an OID that points to the ENTITY-MIB and contains some additional information. However, this is not practical for integrations that only translate
notifications, so there is an additional varbind, sunPlatNotificationAdditionalInfo, that contains a human-readable text describing the event that occurred.
SUN-ILOM-PET-MIB was already briefly described. What is interesting about
its notifications is that they contain only one varbind, holding a string of encoded
binary data. Among this data there is also a sensor name, which is often decoded from
the trap while the rest is discarded, as the meaning of the notification is already given
by the specification.
SUN-HW-TRAP-MIB is the only MIB designed solely for the purpose of sending
traps. As of now, it has seventy-three notifications defined. The names of the notifications
contain both the type of sensor on which the event occurred and the threshold
that was crossed. The additional varbinds carry the full name of the sensor, the threshold
value and the current value. Example:
sunAsrSrCreatedTrap
sunAsrSrCreationInProgressTrap
sunAsrSrUpdatedTrap
sunAsrSrDelayedTrap
sunAsrSrFailureTrap
With these notifications, the NMS can display appropriate messages when a service request has been created, is being created, has been updated, is delayed or has failed, respectively.
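A translation of the ASR notifications into messages can be sketched as a simple lookup table. The trap names are those listed above; the function name and the exact wording of the messages are hypothetical.

```python
# Notification name -> message shown to the user (wording is illustrative)
ASR_MESSAGES = {
    "sunAsrSrCreatedTrap": "Service request has been created",
    "sunAsrSrCreationInProgressTrap": "Service request is being created",
    "sunAsrSrUpdatedTrap": "Service request has been updated",
    "sunAsrSrDelayedTrap": "Service request is delayed",
    "sunAsrSrFailureTrap": "Service request has failed",
}


def asr_summary(trap_name):
    """Return a human-readable summary for an ASR notification name."""
    return ASR_MESSAGES.get(
        trap_name, "Unknown ASR notification: %s" % trap_name)


print(asr_summary("sunAsrSrDelayedTrap"))
```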
entPhysicalTable
entLogicalTable
entLPMappingTable
entAliasMappingTable
entPhysicalContainsTable
sunHwMonInventoryTable
sunHwNumericVoltageSensorTable
sunHwDiscreteVoltageSensorTable
sunHwNumericCurrentSensorTable
sunHwDiscreteCurrentSensorTable
sunHwNumericPowerDeviceSensorTable
sunHwDiscretePowerDeviceSensorTable
sunHwNumericCoolingDeviceSensorTable
sunHwDiscreteCoolingDeviceSensorTable
sunHwNumericTemperatureSensorTable
sunHwDiscreteTemperatureSensorTable
sunHwNumericProcessorSensorTable
sunHwDiscreteProcessorSensorTable
sunHwNumericMemorySensorTable
sunHwDiscreteMemorySensorTable
sunHwNumericHardDriveSensorTable
sunHwDiscreteHardDriveSensorTable
sunHwNumericIOSensorTable
sunHwDiscreteIOSensorTable
sunHwNumericSlotOrConnectorSensorTable
sunHwDiscreteSlotOrConnectorSensorTable
sunHwNumericOtherSensorTable
sunHwDiscreteOtherSensorTable
sunHwMonIndicatorTable
sunHwMonTotalPowerConsumption
As one can see, this MIB is more fine-grained than ENTITY-MIB. In addition to these
tables, certain values of interest are also directly available as scalars, which radically
simplifies writing management extensions. There are quite a few scalars; only some
are listed below (for a full list and description see the MIB itself, which is well commented):
sunHwMonProductName
sunHwMonProductType
sunHwMonCumulativeSensorAlarmStatus
sunHwMonIndicatorServiceName
sunHwMonIndicatorServiceCurrentStatus
6.4 IPMI
IPMI is supported only in eLOM and ILOM. Utilities that access system controllers
over IPMI (e.g. ipmitool(1M), [33]) can use two connection methods:
However, the web interface is used quite often; it offers a quick way to check
server status and server components, and also to upgrade firmware remotely without having to run a TFTP server.
7 Zenoss integration
Now that all management protocols, approaches and available interfaces of Oracle Sun servers
have been described, we can start designing and implementing the Zenoss integration. The resource materials [36-41] were invaluable and provided all the information
needed for designing and implementing the integration.
Curry's [40] development tree was stored outside of Zenoss and versioned in a Mercurial repository.
$ zenmib -v 10
/Events/Oracle
/Events/Oracle/Voltage
/Events/Oracle/Temperature
/Events/Oracle/Electrical Current
/Events/Oracle/Fan Speed
/Events/Oracle/Other
/Events/Oracle/Power Supply
/Events/Oracle/Fan
/Events/Oracle/Processor
/Events/Oracle/Memory
/Events/Oracle/Hard Drive
/Events/Oracle/IO
/Events/Oracle/Slot or Connector
/Events/Oracle/Component
/Events/Oracle/FRU
/Events/Oracle/Power Consumption
These can be created from the GUI by following the Events menu item in the left navigation bar and then clicking Add New Organizer in the menu to the left of
Subclasses.
However, it is also possible to do this using the tool zendmd, which is essentially
a Python interpreter with preloaded Zenoss classes [44] (this is just a skeleton script;
the full version can be found on the CD in the directory scripts as the file createEventClasses.py):
import Globals
from transaction import commit
from Products.ZenUtils.ZenScriptBase import ZenScriptBase

# Connect to the Zenoss object database
dmd = ZenScriptBase(connect=True).dmd

event_classes = [
    '/Events/Oracle',
    '/Events/Oracle/Voltage',
    # ...
]

# Create each event class, then persist the changes
for ec in event_classes:
    dmd.Events.manage_addOrganizer(ec)
commit()
As a result, we now have all event classes we need in place and can proceed to
the event mappings creation.
However, if we do that for just one notification, we can observe that the following attributes are present (filled values are in parentheses) and the rest are to be filled in manually:
Name: An identifier for this event class mapping. Not important for matching events.
Event Class Key: Must match the incoming event's eventClassKey field
for this mapping to be considered as a match for events.
Sequence: Sequence number of this mapping, among mappings with an
identical event class key property. Go to the Sequence tab to alter its position.
Rule: Provides a programmatic secondary match requirement. It takes a
Python expression. If the expression evaluates to True for an event, this
mapping is applied.
Regex: The regular expression match is used only in cases where the rule
property is blank. It takes a Perl Compatible Regular Expression (PCRE).
If the regex matches an event's message field, then this mapping is applied.
Transform: Takes Python code that will be executed on the event only if
it matches this mapping. For more details on transforms, see the section
titled Event Class Transform.
Explanation: Free-form text field that can be used to add an explanation
field to any event that matches this mapping.
Resolution: Free-form text field that can be used to add a resolution field
to any event that matches this mapping.
Although we could possibly enter all mappings using the GUI, this would be error
prone and not very efficient. Luckily, as Zenoss is based on Zope, every GUI action
has a corresponding Python function that can be called.
To manipulate event classes, we first need to get the object that represents them.
This can be done with the following method:
dmd.Events.getOrganizer(name)
eventClassKey and id shall be set to the translated name of the SNMP notication.
example shall be set to snmp trap <name>.
transform shall contain Python code that will modify received event text, severity and possibly set other values so clearing will work.
explanation and resolution may contain text explaining nature of the
event.
The Transform field, corresponding to the transform attribute, will contain different
Python code for notifications from different MIBs. Some of them may be dropped
automatically:
Most of the traps from SUN-HW-TRAP-MIB will have processing similar to this
(please note that although MIBs do specify a user-friendly mapping of integers to
names, Zenoss does not use these mappings):
Other notifications will have similar processing. How do we put all this together?
Let us put together an algorithm:
1.
2.
3.
4.
When this is done, one may end up with the following script. Of course, this is not the
complete script; the full version is on the CD. First, we need to prepare a list of
notifications, together with their Event Classes:
definitions = []
# No /Events/Oracle needed, that is added automatically
# Sun HW Trap MIB - threshold notifications
for sensor_short, sensor_type, zen_group in [
        ('Voltage', 'Voltage', '/Voltage'),
        ('Temp', 'Temperature', '/Temperature'),
        # ...
        ]:
    for thr_value, severity, threshold_type in [
            ('Fatal', 5, 'non-recoverable'),
            ('Crit', 4, 'critical'),
            ('NonCrit', 3, 'non-critical')]:
        # Parentheses allow the expression to continue on the next line
        name = ('sunHwTrap' + sensor_short + thr_value +
                'ThresholdExceeded')
        organizer = zen_group
        # Fill the transform template with values for this notification
        transform = hw_thr_assert % {
            'severity': severity,
            'type': sensor_type,
            'threshold_type': threshold_type}
        d = {
            'name': name,
            'organizer': organizer,
            'transform': transform}
        definitions.append(d)
Here, hw_thr_assert and hw_thr_deassert are strings that contain the
template for the transformation script to be entered into Zenoss.
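To make the template mechanism concrete, the following sketch expands a template with Python's % string formatting. The template text shown here is hypothetical and was written only for this illustration; only the placeholder names (severity, type, threshold_type) are taken from the script above.

```python
# Hypothetical template text; only the placeholder names come from the script.
hw_thr_assert = (
    "evt.severity = %(severity)d\n"
    "evt.summary = '%(type)s sensor crossed %(threshold_type)s threshold'\n"
)

# Expanding the template yields plain Python source for one notification.
transform = hw_thr_assert % {
    'severity': 5,
    'type': 'Voltage',
    'threshold_type': 'non-recoverable',
}

# The result can be checked for syntax errors before it is stored anywhere.
compile(transform, '<transform>', 'exec')
```

Because the expanded text is ordinary Python source, compiling it first catches template mistakes before a mapping is created.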
When the definitions list is filled with transformation rules, we
can cycle through them and create mappings in Zenoss:
import Globals
from transaction import commit
from Products.ZenUtils.ZenScriptBase import ZenScriptBase
dmd = ZenScriptBase(connect=True).dmd
commit()
correctly. Hence, a walkthrough of the generated mappings is recommended, and modifying the generated code to make it more efficient for a given purpose is encouraged.
Small modifications were needed especially for the notifications that cover more
than one event (sunHwTrapHardDriveStatus) and for most SUN-PLATFORM-MIB
notifications.
7.5 Testing
The optimal approach to testing would be to create automation that simulates
failures on physical machines, which would in turn respond with notifications. Semi-manual checking would then be required to confirm that the integration works as
expected.
However, due to time constraints and the unavailability of all testing machines, a
different approach was chosen. One server (Oracle Sun SPARC Enterprise T5220
Server) was configured to send notifications from the system controller and the MASF agent
to the same IP address running Zenoss with this integration. Hard drives, power supplies and fans were then removed and reinstalled to verify that traps are received
and cleared.
functional polling and to function properly, a model will need to be updated anyway
from time to time, just to make sure that an SNMP notification wasn't lost en route.
Graphing and reporting. Based on data obtained by the previous two extensions, it
would be possible to implement graphing and reporting, showing for example temperature trends and, more importantly, power consumption.
8 Conclusion
This project was partially research and partially implementation oriented. As a result, a brief yet hopefully useful description of system management motivations, technologies and software was given.
In addition, a basic but functional integration into an open-source system management tool was developed and tested (albeit only in a limited way), by which this project
fulfilled its assignment.
The author implemented a new and previously unknown (or at least not publicly described) way to create Event Class mappings programmatically.
However, the original idea of a complete monitoring solution that would do
polling, graphing and notifications simultaneously was not realized. Nonetheless,
even though this solution does not use all features of Zenoss, there is room for
improvement, as described earlier.
A CD Contents
As a part of this project, a CD was created. It contains the following files and directories: