
Cisco Support Community

ASR9000/XR: Understanding SNMP and troubleshooting
Fri, 09/11/2015 - 14:41

Alexander Thuijs Mar 25th, 2013

● Introduction
● SNMP architecture in IOS-XR
❍ SNMP Packet flow inside the system
❍ XR SNMP Specifics
❍ SNMP performance improvements
❍ Caching
❍ Parallel vs Serialized processing
■ Enhancements in XR 4.1
■ Enhancements in XR 4.2
■ Example (performance) trace point logging
❍ SNMP process architecture
■ XR processes referenced
❍ SNMP tracepoints
● XR MIB implementation specifics
● Troubleshooting commands and what they do
❍ Show commands that are new to XR 4.2 onwards
● Troubleshooting PDU performance issues
❍ Troubleshooting Goal:
❍ Workarounds
❍ Determining Internal Timeout of a MIBd
■ Troubleshooting
● Examples and Recommendations
❍ Examples of row traversal
❍ Timeout and Retry Setting on NMS
● Related Information

Introduction
This document discusses the SNMP architecture as it is implemented in IOS-XR. The IOS to XR
migration guide (A starting point) already highlights some of the high-level differences
between IOS and XR.

Because IOS-XR is a highly distributed operating system that uses hardware forwarding, the
way SNMP retrieves counters and responds to requests differs from what you might be used
to. This article dives into the architecture of stats collection, how it operates, and which
show commands you can use to verify SNMP performance in IOS-XR, specifically on the
ASR9000 (though the article also applies to CRS and GSR running IOS-XR).
XR routers are highly distributed. Increasing capacity by distribution and replication comes
at a cost: in any scaled design where processing devices are replicated or multiplied, the
inter-process communication path between the processing components becomes a critical
additional component of the design.

This article originated from the fact that some of our customers have seen SNMP timeouts in
XR 4.2.3, which raised many questions about caching, stats collection and the way SNMP
operates. Hopefully this technote clears up some of the confusion.

SNMP architecture in IOS-XR


This section describes the symptoms of the problem and the main issue the document
resolves.

SNMP Packet flow inside the system

Depending on your configuration, SNMP packets can be received in band or out of band (as per
MPP definitions; see the article on LPTS and MPP for more info). After initial reception and
punting to the control plane (RSP), they are handed over to NETIO. NETIO is comparable to the
IP INPUT process in IOS, which deals with process-level switching.

If the SNMP requests are "for me", they are handed over to the SNMP-D process, which
evaluates the request and dispatches it to the next layer of processing.

XR SNMP Specifics

● Informs supported as of 4.1 (Inform proxy not supported)
● Full AES encryption support in 4.1 (V3 related)
● Full IPv6 support in 4.2 (snmp engine transport)
● VRF-aware support in 3.3 (snmp engine, some MIBs already available)
● Capability files are not well supported across Cisco; the ASR9K MIB guide was developed to
improve the situation
● Event/expression MIB support for extensibility as in IOS
● Warm standby on snmp agent
● Management plane protection (MPP) / snmp overload control to limit impact of snmp on
device
● Standards based MIB support (IETF & IEEE)

ENTITY-MIB
IF-MIB
IP MIBs support
Routing MIBs support (BGP, OSPF, ISIS, etc)
MPLS, Pseudowire, VPLS MIBs support
IEEE 802x (LAG, CFM, OAM)

SNMP performance improvements


● Asynchronous request processing / multithreading (4.2)
● Bulk processing (dedicated processing path for bulking) (4.2)
● Data Collection Manager – bulk MIB data collection and file push (4.2.0 & 4.2.1)
● Additional IPv6 / VRF aware MIB support (4.2 and after)
● Additional improvements with Async IPC and SysDB Backend infra (4.1)
● Overload Control Integration (4.0)

SNMP request processing is blocked during critical event periods (e.g. OSPF convergence).

Debuggability:
● Additional PDU performance monitoring support (4.2)
● MIB guide update (4.2)

Caching

Caching is an integral part of IOS XR SNMP processing. It allows SNMP to perform at its best
while maintaining the most accurate stats possible.

There are various levels of caching; some are configurable and some are not. Caching also
relieves the hardware from the burden of continuous requests, especially in WALK scenarios
that retrieve many objects, e.g. interface stats counters.

A process called STATS-D runs on the linecard. It periodically scrapes statistics from the
linecard's hardware and updates the interface counters and MIB stats.

This means that if you poll twice within the stats-D update interval, you will realistically
see the same counter value returned twice.

Show interface commands (depending on release) force a direct read from hardware to get the
most accurate reading, but the IF-MIB stats are cached.

For an IF-MIB request, processing proceeds as follows:

1. The SNMP UDP transport receives an SNMP GetRequest-PDU, GetNextRequest-PDU or
GetBulk-PDU and sends it to SNMPD.
2. The SNMP engine parses the PDU and dispatches the individual variable bindings. IF-MIB
objects are dispatched to the mibd_interface process and the IF-MIB DLL callbacks get invoked.
3. If the request is a getnext, the IF-MIB's cache of variable bindings (the look-ahead cache)
is checked for a cache hit. If there is one, the value is returned to the engine and the
response PDU is sent.
4. If there is no cache hit, the IF-MIB passes a message to the statsd_manager process to get
the information for the interface (and, in the getnext case, for the next 99 interfaces to
fill the cache). The IPC used is LWM; the sysdb direct EDM connection invokes the EDM for
statsd.
5. The statsd_manager gets the interface data from its cache and returns the statsd bags
for the interfaces to IF-MIB.
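
The getnext path in steps 3 to 5 can be sketched in a few lines of Python. This is only an illustrative model, not XR code: the classes StatsdManager and IfMibHandler, their methods and the ifIndex-keyed dictionaries are hypothetical names chosen for the sketch. The only value taken from the text is the block size of 100 (the requested interface plus the next 99).

```python
from dataclasses import dataclass, field

LOOKAHEAD_ROWS = 100  # requested interface plus the next 99, as in step 4


@dataclass
class StatsdManager:
    """Hypothetical stand-in for statsd_manager: serves counters from its own cache."""
    counters: dict    # ifIndex -> counter value
    fetches: int = 0  # number of bulk requests received

    def fetch_block(self, start_index, count):
        self.fetches += 1
        wanted = sorted(i for i in self.counters if i >= start_index)[:count]
        return {i: self.counters[i] for i in wanted}


@dataclass
class IfMibHandler:
    """Hypothetical stand-in for the IF-MIB DLL inside mibd_interface."""
    statsd: StatsdManager
    covered_from: int = 0                      # lowest ifIndex the cached block was fetched for
    block: dict = field(default_factory=dict)  # look-ahead cache: ifIndex -> value

    def getnext(self, after_index):
        """Return (ifIndex, value) of the first interface after after_index, or None."""
        hits = []
        if self.block and after_index + 1 >= self.covered_from:
            hits = [i for i in self.block if i > after_index]
        if not hits:
            # Cache miss: one bulk fetch refills the look-ahead cache.
            self.covered_from = after_index + 1
            self.block = self.statsd.fetch_block(self.covered_from, LOOKAHEAD_ROWS)
            hits = list(self.block)
        if not hits:
            return None  # walked off the end of the table
        index = min(hits)
        return index, self.block[index]
```

Walking a table this way costs one request to the statistics backend per block of 100 interfaces instead of one per interface, which is the point of the look-ahead cache.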

There are two caching mechanisms:

1. Statsd caching:
Used for interface related statistics (IF-MIB, IF-EXTENSION-MIB, etc.)
Statsd caching is configurable (via CLI).

2. Lookahead caching:
Conceptually a varbind cache.
Not configurable.
Not all MIBs use this cache.

Statsd cache:
– Use the command “snmp-server ifmib stats cache” to enable it.
– This is a periodic cache which is refreshed every 30 seconds for all interfaces.
– Statsd cache maintenance is done irrespective of this command. The command only
dictates where the stats are fetched from.
– Without the above command, stats are fetched from the linecard as real-time
counters (default behavior).
• This involves more processes and hence more CPU utilization and
latency: an additional tax for real-time counters.
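
To see what a 30 second refresh means for a poller, the following Python sketch simulates reading a counter through a periodically refreshed cache. It is a simplified model under stated assumptions: the refresh is assumed to happen exactly at multiples of 30 seconds, the traffic rate is constant, and the function names are made up for the illustration.

```python
REFRESH_S = 30  # statsd cache refresh period stated above


def cached_counter(t, true_rate):
    """Counter value a poller reads at time t (seconds), assuming the cache
    was last refreshed at the most recent multiple of REFRESH_S."""
    last_refresh = (t // REFRESH_S) * REFRESH_S
    return last_refresh * true_rate


def deltas(poll_interval, polls, true_rate=1000):
    """Differences between consecutive polled values."""
    samples = [cached_counter(i * poll_interval, true_rate) for i in range(polls + 1)]
    return [b - a for a, b in zip(samples, samples[1:])]
```

In this model, polling every 10 seconds yields deltas of 0, 0, 30000, 0, 0, 30000 for a steady 1000 units per second, while polling every 30 seconds yields a steady 30000 per poll. Polling faster than the refresh period does not produce fresher data, only repeated values.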

The system maintains a look-ahead cache:

– Stats are fetched for the next 100 rows (interfaces) in bulk and cached.
– Data for up to 500 interfaces is kept in the cache.
– Cache entries are kept for a maximum of 20 seconds.
– The oldest used block is reused to hold a new set of cached data.
– There is no configuration option to enable or disable this cache.
– Provides a good performance improvement if used along with the statsd cache.
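
The maintenance rules of the look-ahead cache (blocks of 100 rows, data for up to 500 interfaces, a 20 second lifetime, reuse of the oldest used block) can be modelled as below. This is a sketch under assumptions, not the XR implementation: "oldest used" is interpreted here as least recently used, five blocks is derived as 500 / 100, and the class and method names are hypothetical.

```python
from collections import OrderedDict

ROWS_PER_BLOCK = 100
MAX_BLOCKS = 500 // ROWS_PER_BLOCK  # data for up to 500 interfaces
MAX_AGE_S = 20


class LookaheadCache:
    """Hypothetical model of the block cache; keys are the first ifIndex of a block."""

    def __init__(self):
        self._blocks = OrderedDict()  # block_start -> (stored_at, rows)

    def put(self, block_start, rows, now):
        if block_start not in self._blocks and len(self._blocks) >= MAX_BLOCKS:
            # Assumption: the least recently used block is the one that gets reused.
            self._blocks.popitem(last=False)
        self._blocks[block_start] = (now, rows)
        self._blocks.move_to_end(block_start)

    def get(self, block_start, now):
        entry = self._blocks.get(block_start)
        if entry is None:
            return None
        stored_at, rows = entry
        if now - stored_at > MAX_AGE_S:
            del self._blocks[block_start]  # too old, caller has to fetch again
            return None
        self._blocks.move_to_end(block_start)
        return rows
```

The time is passed in explicitly (now) so the behaviour can be checked without waiting 20 seconds.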

Parallel vs Serialized processing

The following picture tries to explain what the serialized processing means:

SNMP requests are handled sequentially as they are received. If one request currently in
progress is "slow", subsequent requests wait to be handled and may time out.

The NMS station may then resend its SNMP request, building up the request queue and
potentially causing more trouble.

The good news is that as of XR 4.3.1 the system can detect duplicate requests and throw them
out of the queue, making sure it only deals with "NEW" requests.
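
The effect of duplicate detection on a serialized queue can be illustrated with a small Python model. How XR actually recognises a duplicate is not described here, so the sketch assumes a retransmission carries the same source IP and SNMP request-id as the original; the class name and fields are hypothetical.

```python
from collections import deque


class RequestQueue:
    """Hypothetical FIFO of pending requests that drops retransmissions."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()
        self.dropped = 0

    def enqueue(self, source_ip, request_id):
        # Assumption: (source IP, request-id) identifies a retransmitted request.
        key = (source_ip, request_id)
        if key in self._pending:
            self.dropped += 1
            return False
        self._pending.add(key)
        self._queue.append(key)
        return True

    def dequeue(self):
        key = self._queue.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._queue)
```

With this in place, an NMS that retries three times while a slow request is in progress adds one entry to the queue rather than four.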
Enhancements in XR 4.1
Enhancements in XR 4.2
Example (performance) trace point logging
SNMP process architecture
● All management interfaces (SNMP, XML, CLI) utilize the same core processing architecture
[sysdb].
● The SNMP processing architecture serializes PDU processing (pre-4.2).
● Request PDUs for all pollers affect the response rate seen for a single poller.
● The SNMP per-OID polling rate is very MIB specific (each MIB’s underlying data model
dictates the performance of MIB’s OID access)
● MIB request processing commonly involves the GSP IPC mechanism, sysDB (data store) and
statsd in some cases.
● In band and out of band SNMP requests are treated the same within SNMP.
● (In band means that the SNMP request can be received on an interface that is also
transporting customer/user traffic. Out of band interfaces, such as the MGMT interfaces on
the RSP are dedicated for management and carry management traffic only).
● The current SNMP architecture has an SNMP daemon enqueue requests and separate MIB
daemons process requests (requests are enqueued from transport layer receive fairly quickly)
● There are multiple MIB-specific caching mechanisms in place to improve performance which
also complicate the polling rate calculations.
● There is no queue size limit for SNMP requests (grows with memory).
XR processes referenced

StatsD is a process that collects statistics from various places (eg hardware) and updates
tables on the LC shared memory.

IPC is an inter process call or communication that is used by processes to talk to each other
to request data or send commands.

GSP is group services protocol, which is a process in IOS-XR that allows for one process to
communicate with multiple "nodes" at the same time (like a sort of multicast way that the RSP
can use to talk to multiple linecards, for instance to update a FIB route).

SNMP tracepoints

“show snmp trace requests” is a sliding window of logs indicating the above information about
PDU processing
XR MIB implementation specifics
Implementations of specific MIBs are packaged as individual DLLs. Each MIBd process “houses”
a group of MIB DLLs.

MIB DLLs are grouped according to the “type” of MIB: interface, entity, route, infra. At
runtime, grouping is determined via a config file in XR source control.

MIB DLLs handle the specifics of mapping the MIB defined data model to the XR data model. MIB
DLLs map the MIB namespace to XR data owner access APIs (sysdb EDM is most common).

Look-ahead caching: any support for look-ahead caching is done within the MIB DLL (there is no
generic support for all MIBs).

Non-look-ahead caching: some features may support access to cached managed data. These are
accessed via a separate data access point (i.e. a separate sysdb EDM path).

Troubleshooting commands and what they do

The following show and debug commands are very useful for verifying and tracking SNMP.

show snmp
Global agent counters: incoming, outgoing (request and trap), and error PDUs.
Technique: Periodically collect the output to determine the overall PDU response rate and
identify the error rate.

show snmp trace requests
Log of high level PDU processing tracepoints: Rx, Proc Start, Tx time.
Technique: Periodically collect this log. Decode and use the data to determine the following
per-PDU data:
1. Source IPs of pollers
2. Queue lengths of per-source IP PDU queues
3. Types of request PDUs being used
4. Timestamp when PDUs are enqueued into the queues for the source IPs
5. Duration of the PDU enqueued and waiting to be processed
6. Processing time of PDUs from pollers

show snmp mib access
Per-OID counters indicating the number of times an operation was done on that OID, i.e. GET,
GETN, SET.
Technique: Periodically collecting and diffing the output indicates what was polled during
the time period.

show snmp mib access time
Per-OID timestamp of the last operation on the OID.
Technique: Periodically collecting and diffing the output indicates whether any polling on
the OID was done in the time period.

debug snmp request (careful!!!)
Enable to log every OID being processed by every PDU to syslog. “debug snmp packet” needs to
be enabled as well to identify the source of the PDUs.
NOTE: Disable “logging trap debug” if “snmp trap syslog” is configured!!!

debug snmp packet (careful!!!)
Enable to log the same data as “sh snmp trace requests” to syslog.
NOTE: Disable “logging trap debug” if “snmp trap syslog” is configured!!!
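
The "collect periodically and diff" technique for the per-OID access counters can be sketched as follows. The exact output layout of “show snmp mib access” is not shown in this article, so the sketch assumes the two snapshots were already parsed into dictionaries mapping an OID to its total operation count; the function name is made up.

```python
def polled_between(before, after):
    """Return {oid: increase} for every OID whose access counter grew
    between two snapshots (dicts of OID -> total operation count)."""
    return {
        oid: count - before.get(oid, 0)
        for oid, count in after.items()
        if count > before.get(oid, 0)
    }
```

OIDs that appear in the result were polled during the interval; OIDs that are missing were not touched, which is also a quick way to confirm that a view exclusion has taken effect.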

Show commands that are new to XR 4.2 onwards

show snmp mib statistics
Per-OID statistics summarizing transaction times at the mibd level: count + min/max/avg.
Technique: Collect to determine if specific MIB objects are averaging high processing times
and/or show large variance (low min, high avg & max).

show snmp queue rx
Indicates the min/max/avg queue sizes for the PDU receive and pending queues. Real-time and
5min views.

show snmp queue trap
Indicates the min/max/avg queue sizes for the internal trap PDU queue.

(config)# snmp logging thresh oid
show snmp trace slow oid
Allows configuring a duration threshold for logging per-OID transactions exceeding the time
threshold. This is measured within the mibd process, beginning with the call to the MIB
specific handler for the OID and ending with the response from the same.

(config)# snmp logging thresh pdu
show snmp trace slow pdu
Allows configuring a duration threshold for logging per-PDU transactions exceeding the time
threshold. When logging, all OIDs within the PDU are also logged to this buffer. This is
measured within the snmpd process, beginning with the dequeue of the PDU from the receive
queue and ending when all the OIDs in the PDU have been processed and the response is
ready to be sent.
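
As an illustration of the “show snmp mib statistics” technique, the Python sketch below picks the slow OIDs out of collected output. It assumes the row layout seen in the comments at the end of this article (Object ID, COUNT, AVG[ms], MAX[ms], MIN[ms], optionally followed by a timestamp) and treats "<1" as 0 ms. The 1000 ms threshold and the function names are arbitrary choices for the example.

```python
import re

# OID, count, then avg/max/min in milliseconds; values may be printed as "<1".
ROW = re.compile(
    r"^\s*(?P<oid>\d+(?:\.\d+)+)\s+(?P<count>\d+)"
    r"\s+(?P<avg><?\d+)\s+(?P<max><?\d+)\s+(?P<min><?\d+)"
)


def to_ms(value):
    """Convert a column value to milliseconds, treating '<1' as 0."""
    return 0 if value.startswith("<") else int(value)


def slow_oids(text, avg_threshold_ms=1000):
    """Return (oid, count, avg_ms, max_ms) for rows at or above the threshold,
    slowest average first. Lines that are not statistics rows are ignored."""
    slow = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and to_ms(m["avg"]) >= avg_threshold_ms:
            slow.append((m["oid"], int(m["count"]), to_ms(m["avg"]), to_ms(m["max"])))
    return sorted(slow, key=lambda row: row[2], reverse=True)
```

Sorting by average keeps the objects that are consistently slow at the top; comparing the max and min columns of the same row shows the variance mentioned above.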

Troubleshooting PDU performance issues


Some MIBs don't have accelerated processing or caching. Because SNMP is processed serially in
certain releases, you may see timeouts on OID requests that normally work perfectly fine. An
example of a slow MIB is the SONET MIB. Because this MIB needs to talk from the SNMP process
all the way down to the SPA of the SIP-700 linecard (on the ASR9000), the response may not be
provided in a timely manner. At the same time, new requests for other OIDs may be sitting in
the holding or pending queue, causing timeouts and retries.

Retries to an already underperforming MIB may exacerbate the overall issue.

The vast majority of PDU performance issues are related to a poller polling a specific MIB
which is slow to process its OIDs.

This causes all other pollers to see some of their PDUs slowed due to queueing delays (waiting
on the slow MIB).

Troubleshooting Goal:
Identify the slow MIB/MIBs being polled.

Workarounds
● Use SNMP View Access Control to block access to the slow MIB tables / objects:
snmp-server view MyPollView <slow MIB OID> excluded
● Use ACLs to permit only “known” NMS devices/applications. In this case “known”
refers to the content of the requests issued from the app.
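
When the list of slow OIDs is long, the exclude lines can be generated rather than typed. The helper below is a hypothetical convenience, not a Cisco tool: it only formats the view command shown above for each OID subtree. The view still has to contain whatever you do want to poll and be applied to the community or group in use, which is outside this sketch.

```python
def view_exclude_lines(view_name, slow_oids):
    """Build one 'snmp-server view <name> <oid> excluded' line per OID subtree,
    skipping blanks and duplicates and dropping a leading dot."""
    seen = []
    for oid in slow_oids:
        oid = oid.strip().lstrip(".")
        if oid and oid not in seen:
            seen.append(oid)
    return [f"snmp-server view {view_name} {oid} excluded" for oid in seen]
```

For example, passing the view name MyPollView and the subtree 1.3.6.1.4.1.9.10.106 produces the line "snmp-server view MyPollView 1.3.6.1.4.1.9.10.106 excluded".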

Determining Internal Timeout of a MIBd


snmpd will time out a mibd process if it has not received a response to a request for one or
more OIDs within 10s (the default).
Once in timeout state, snmpd continues processing requests BUT it marks the mibd
as unavailable until it responds to the timed-out request.

● Getnext operations to any OIDs for MIBs in the timed out mibd will skip to the lexi-next OID
owned by a different mibd process.
● Get/Set operations to any OIDs for MIBs in the timed out mibd will be responded to with a
PDU error-code of “resourceUnavailable”.
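
The two rules above can be modelled in Python to make the poller-visible behaviour concrete. This is a toy model under assumptions: the mapping of OIDs to mibd processes is a plain dictionary, the process names in any example data are illustrative, and "ok" stands in for a normal response.

```python
def oid_key(oid):
    """Turn a dotted OID string into a tuple for lexicographic comparison."""
    return tuple(int(part) for part in oid.split("."))


def get(oid, owners, timed_out):
    """Get/Set rule: OIDs owned by a timed-out mibd answer resourceUnavailable."""
    return "resourceUnavailable" if owners[oid] in timed_out else "ok"


def getnext(oid, owners, timed_out):
    """Getnext rule: return the lexicographically next OID whose owning mibd
    is not timed out, or None at the end of the MIB view."""
    for candidate in sorted(owners, key=oid_key):
        if oid_key(candidate) > oid_key(oid) and owners[candidate] not in timed_out:
            return candidate
    return None
```

The practical consequence for a walk is that a whole MIB can silently disappear from the results while its mibd is marked unavailable, so a short walk is not proof that the objects do not exist.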

Troubleshooting

(in addition to normal “slow OID” techniques):

If you are able to catch the mibd in this state:

run attach_process -p <PID of mibd process> -i 5 -S

It may be possible to identify the MIB being polled by examining “show snmp lib group agent
ipc” for “request timeout”, which gives the timestamp of when the mibd timeout was detected.
Using the timeout timestamp, “sh snmp mib access time” may still have an OID
timestamp correlating to 10s prior.

Examples and Recommendations


For the purpose of clarification, the following is an example of an SNMP table. Each row
represents an instance (entity) and each column represents an object. In this case we have
3 instances (1, 2 and 3), and each instance has 3 objects: ifName, ifInOctets and ifMtu.

ifIndex   ifName        ifInOctets   ifMtu
1         Ethernet1/0   1234         1500
2         POS2/0        512          500
3         Serial3/0     235          600

The customer's current SNMP design uses snmpwalk. Snmpwalk works by performing a
sequence of get-nexts, on a column by column basis if the column object is specified as
the starting point.

An example of a column walk specifying the ifDescr from IF-MIB

[no-sense-1 68] ~ > snmpwalk -c public 10.66.70.87 IF-MIB::ifDescr

IF-MIB::ifDescr.1 = STRING: Loopback0

IF-MIB::ifDescr.2 = STRING: Bundle-POS1

IF-MIB::ifDescr.3 = STRING: Bundle-Ether1

IF-MIB::ifDescr.4 = STRING: TenGigE1/2/0/0

IF-MIB::ifDescr.5 = STRING: TenGigE1/2/0/1

IF-MIB::ifDescr.6 = STRING: SONET0/2/0/0

IF-MIB::ifDescr.7 = STRING: SONET0/2/0/1

IF-MIB::ifDescr.8 = STRING: SONET0/2/0/2


IF-MIB::ifDescr.9 = STRING: SONET0/2/0/3

IF-MIB::ifDescr.10 = STRING: SONET0/2/0/4

<cut>

Snmpwalk can also be used to get a single object only, for instance the object
IF-MIB::ifDescr.9. It does not support specifying more than one object in its request.
The example below shows two objects being requested, but only the first is returned.

[no-sense-1 69] ~ > snmpwalk -c public 10.66.70.87 IF-MIB::ifDescr.9

IF-MIB::ifDescr.9 = STRING: SONET0/2/0/3

[12:18 - 0.31]

[no-sense-1 70] ~ > snmpwalk -c public 10.66.70.87 IF-MIB::ifDescr.9 IF-MIB::ifDescr.10

IF-MIB::ifDescr.9 = STRING: SONET0/2/0/3

[12:18 - 0.36]

For efficiency row traversal is preferred, with multiple objects requested in a single snmp
transaction. This reduces unnecessary overhead on the XR system. For this reason snmpwalk is
not recommended.
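
A rough request count shows why row traversal is cheaper. The Python sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes a column walk issues one get-next per returned value plus one more per column to detect the end of that column, and that row traversal issues one multi-varbind get-next per instance plus one that runs off the end of the table. The function names are made up.

```python
def column_walk_requests(instances, objects):
    """Get-next PDUs for walking each object column separately."""
    return objects * (instances + 1)


def row_traversal_requests(instances):
    """Get-next PDUs when all objects of an instance travel in one PDU."""
    return instances + 1
```

For 188 interfaces and 11 objects per interface this gives 2079 PDUs for the column walks against 189 for row traversal, and every one of those PDUs passes through the same serialized processing path.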

Examples of row traversal

The customer is currently requesting via snmpwalk the following IF-MIB objects

ifDescr

ifHCInOctets

ifHCOutOctets

ifHCInUcastPkts

ifHCOutUcastPkts

ifInNUcastPkts
ifOutNUcastPkts

ifInOctets

ifOutOctets

ifInUcastPkts

ifOutUcastPkts

The preferred method is to specify all the objects required from an instance/entity in a single
command such as get-next or bulk-get. An example follows using snmpbulkget

[no-sense-1 115] ~ > snmpbulkget -v 2c -c public 10.66.70.87 IF-MIB::ifDescr \
    IF-MIB::ifHCInOctets IF-MIB::ifHCOutOctets IF-MIB::ifHCInUcastPkts IF-MIB::ifHCOutUcastPkts \
    IF-MIB::ifInNUcastPkts IF-MIB::ifOutNUcastPkts IF-MIB::ifInOctets IF-MIB::ifOutOctets \
    IF-MIB::ifInUcastPkts IF-MIB::ifOutUcastPkts

IF-MIB::ifDescr.1 = STRING: Loopback0

IF-MIB::ifHCInOctets.2 = Counter64: 0

IF-MIB::ifHCOutOctets.2 = Counter64: 7116596

IF-MIB::ifHCInUcastPkts.2 = Counter64: 0

IF-MIB::ifHCOutUcastPkts.2 = Counter64: 99611

IF-MIB::ifInDiscards.2 = Counter32: 0

IF-MIB::ifOutDiscards.2 = Counter32: 0

IF-MIB::ifInOctets.2 = Counter32: 0

IF-MIB::ifOutOctets.2 = Counter32: 7116596

IF-MIB::ifInUcastPkts.2 = Counter32: 0

IF-MIB::ifOutUcastPkts.2 = Counter32: 99611

IF-MIB::ifDescr.2 = STRING: Bundle-POS1

IF-MIB::ifHCInOctets.3 = Counter64: 38796828

IF-MIB::ifHCOutOctets.3 = Counter64: 66076323


IF-MIB::ifHCInUcastPkts.3 = Counter64: 331833

IF-MIB::ifHCOutUcastPkts.3 = Counter64: 402546

IF-MIB::ifInDiscards.3 = Counter32: 0

IF-MIB::ifOutDiscards.3 = Counter32: 0

IF-MIB::ifInOctets.3 = Counter32: 38796828

IF-MIB::ifOutOctets.3 = Counter32: 66076323

IF-MIB::ifInUcastPkts.3 = Counter32: 331833

IF-MIB::ifOutUcastPkts.3 = Counter32: 402546

IF-MIB::ifDescr.3 = STRING: Bundle-Ether1

<snip>

Note above that all the objects in a row are obtained, for every instance, with one
command. The same can be done with a get-next, at the added overhead of issuing one request
per instance, each specifying the instance.

[no-sense-1 120] ~ > snmpgetnext -v 2c -c public 10.66.70.87 IF-MIB::ifDescr.1 \
    IF-MIB::ifHCInOctets.1 IF-MIB::ifHCOutOctets.1 IF-MIB::ifHCInUcastPkts.1 IF-MIB::ifHCOutUcastPkts.1 \
    IF-MIB::ifInNUcastPkts.1 IF-MIB::ifInOctets.1 IF-MIB::ifOutOctets.1 IF-MIB::ifInUcastPkts.1 \
    IF-MIB::ifOutUcastPkts.1

IF-MIB::ifDescr.2 = STRING: Bundle-POS1

IF-MIB::ifHCInOctets.2 = Counter64: 0

IF-MIB::ifHCOutOctets.2 = Counter64: 7116596

IF-MIB::ifHCInUcastPkts.2 = Counter64: 0

IF-MIB::ifHCOutUcastPkts.2 = Counter64: 99611

IF-MIB::ifInDiscards.2 = Counter32: 0

IF-MIB::ifInOctets.2 = Counter32: 0

IF-MIB::ifOutOctets.2 = Counter32: 7116596

IF-MIB::ifInUcastPkts.2 = Counter32: 0
IF-MIB::ifOutUcastPkts.2 = Counter32: 99611

[13:03 - 0.35]

Although the examples are specific to IF-MIB, the same concept is relevant to all MIBs.
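
On the NMS side, the flat list of varbinds returned by a row traversal has to be folded back into one record per instance. The Python sketch below does that for text in the format printed by the tools in the examples above (for instance "IF-MIB::ifHCOutOctets.2 = Counter64: 7116596"). It is a minimal parser for illustration: it assumes a single numeric index, keeps values as strings, and the function name is made up.

```python
import re
from collections import defaultdict

# MIB::object.index = TYPE: value
VARBIND = re.compile(
    r"^(?P<mib>[\w-]+)::(?P<obj>\w+)\.(?P<idx>\d+) = (?P<type>\w+): (?P<val>.*)$"
)


def rows_by_instance(output):
    """Group varbind lines into {ifIndex: {object name: value string}}."""
    rows = defaultdict(dict)
    for line in output.splitlines():
        m = VARBIND.match(line.strip())
        if m:
            rows[int(m["idx"])][m["obj"]] = m["val"]
    return dict(rows)
```

Lines that are not varbinds, such as prompts, blank lines or <snip> markers, are skipped.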

Timeout and Retry Setting on NMS

Timeout recommendations:

1. Use dynamic timeout when available.
2. If dynamic timeout is not available, increase the timeout when several management
applications poll the SNMP agent on the ASR9K simultaneously: multiply the default timeout by
the number of applications that are polling at the same time.
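
The static timeout rule is simple arithmetic; as a worked example in Python (the function name and the 5 second default used in the note below are illustrative, use your NMS's own default):

```python
def scaled_timeout(default_timeout_s, simultaneous_pollers):
    """Default timeout multiplied by the number of applications polling
    the agent at the same time."""
    if simultaneous_pollers < 1:
        raise ValueError("at least one poller is needed")
    return default_timeout_s * simultaneous_pollers
```

With a 5 second default and three applications polling simultaneously, each application would be configured with a 15 second timeout.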

Retry recommendations:

1. Use dynamic retry when available.
2. If dynamic retry is not available, establish the number of retries based on testing.

For more details refer to IOS XR SNMP Best Practices.

Related Information
● Monitoring power supplies via SNMP, technote https://supportforums.cisco.com/docs/DOC-21667

Special thanks to the XR SNMP dev team for some of the amazing content used in this article,
most notably Timothy Swanson and Leon Zachary

Xander Thuijs CCIE #6775

Principal Engineer, ASR9000


Comments

Scott Ulmen Fri, 09/11/2015 - 14:41

Once again on the 'sh snmp mib statistics' COUNT field. Per Xander's reply below, "its
indicative of the number of queries it had received on that mibD:". In the most simple way,
what I am wondering is whether that count increments even tho there are now OIDs being
excluded via "snmp-server view view_name...".

My follow up question (as I assume the answer is yes): Is there a way to determine ONLY what
OIDs are actually replying to requests? After applying a quite lengthy list of OID excludes it
would be nice to have some sort of verification that things are working as expected from the
router POV.

Thanks again!

Scott Ulmen Fri, 09/11/2015 - 09:22
Good morning-

I'm looking for some advice/clarification on changes that I'm looking to make. Let me start with a little
background.

Our SNMP environment has multiple platforms (a mix of IOS and IOS XR systems) - unfortunately we
have little access to in regards to the configuration of the various SNMP management systems polling
the network. This is putting a strain on the IOS XR platforms such that much of the needed information
is not getting out (due to time outs). We have pulled 'sh snmp mib statistics' and made a list of higher
level OIDs that we would like to start with in an attempt reduce this strain (see attached). Our main
priority (a must have) is the IF MIB (access to IF statistics) we need this to manage the network. Our
study revealed 72 different MIB families based on grouping the polled OIDs by the first 5 or 6
descriptors. We find the requested information ranges from ATM to IPV4/6 to CFM... many of which are
outside of our primary objective (Interface stats).

My questions are as follows:

* Are any of the OIDs that are on my "no" list likely to be polled internally by the system? i.e. Is there
a chance / risk that I may break a service on the 9k by excluding any of the OIDs in question?
* Is there a way that we can quickly determine if any of the OIDs are NOT supported in the IOS-XR Mib
set?

As always, thanks in advance for any replies!


Attachment:

partialoid_review.xlsx

Alexander Thuijs Fri, 09/11/2015 - 09:56

hi scott: the system itself wont poll info, only an "external" snmp get will induce access to the
mib (dll's) inside.

as for the oid support, the mib list shows which mibs we are specifically testing against per
release:

ftp://ftp.cisco.com/pub/mibs/supportlists/asr9000/asr9000-supportlist.html

if a mib is not supported it wont return a value.


you can also use the mib"views" to restrict access to certain mibs if you like with the snmp-
view commands.

cheers

xander


Scott Ulmen Fri, 09/11/2015 - 12:48

Xander-

I want to thank you again for your prompt responses! That was just the type of response I
was looking for. Have a great weekend.

Scott Ulmen Thu, 09/10/2015 - 10:41
I am looking at the output from "show snmp mib statistics" and wonder about the "COUNT"
column. Does anyone know what timeframe this reflects? i.e. Is this since reboot, last 15min,
etc?

My output gives me a couple OIDs in the 1.3.6.1.4.1.9.10.106.1...group that show an avg
response time >100000ms, but only show a 1 for count.

Thanks in advance for the assist!

Alexander Thuijs Thu, 09/10/2015 - 10:52

hi scott! its indicative of the number of queries it had received on that mibD:

example:

RP/0/RSP0/CPU0:A9K-BNG#proc restart mibd_interface


Thu Sep 10 13:47:44.450 EDT
RP/0/RSP0/CPU0:Sep 10 13:47:44.479 : sysmgr_control[65900]: %OS-SYSMGR-
4-PROC_RESTART_NAME : User root (con0_RSP0_CPU0) requested a restart of process
mibd_interface at 0/RSP0/CPU0
RP/0/RSP0/CPU0:A9K-BNG#show snmp mib statistics
Thu Sep 10 13:47:48.901 EDT
Object ID COUNT AVG[ms] MAX[ms] MIN[ms] MAX_TS
.....

Group:interface <<< no queries now!


RP/0/RSP0/CPU0:A9K-BNG#

running a walk:

......

20260: ifNumber.0 (INTEGER) 188


20261: ifNumber.0 (INTEGER) 188
20262: ifNumber.0 (INTEGER) 188
20263: ifNumber.0 (INTEGER) 188
***** SNMP QUERY ABORTED *****

Group:interface
1.3.6.1.2.1.2.1 20264 <1 <1 <1 Sep 10 13:48:25.201

ok one off :) but good enough right :)

xander

Scott Ulmen Thu, 09/10/2015 - 11:18

WOW - thanks for the quick response!

OK, so it looks like that increment then continues to count up since the process was last started.
That helps a lot. Assuming that I have been running for some time, the following shouldn't be an
issue?

****SNIP****

1.3.6.1.4.1.9.10.106.1.2.1.3 1 100006 100006 100006


1.3.6.1.4.1.9.10.106.1.2.1.4 1 100007 100007 100007
1.3.6.1.4.1.9.10.106.1.2.1.5 1 100007 100007 100007
1.3.6.1.4.1.9.10.106.1.2.1.6 1 38913 38913 38913

****SNIP****


Alexander Thuijs Thu, 09/10/2015 - 11:59

10.106 is the pw mib.

this mib is known to have some performance issues and I think only recently we are adding it
back into service for limited scale.

the query time for this one event that it got was 100secs that is too much of course, but
depending on release you have you may see this mib not supported at all or some perf issues.

xander

Benjamin Bartsch Wed, 06/17/2015 - 13:28

Xander - great to see you at Cisco Live and I appreciate you taking the time to meet with us
to discuss BNG.

Quick question for you on SNMPv2c on v5.1.3. What is the command to show the 'hits' on the
ACL? I can't seem to find the ingress location to reference. My customer's SNMP server is
unable to poll the 9k, but he is receiving traps from the 9k. I suspect there is a configuration
issue on the SNMP server somewhere. We can ping between loopback0 on the 9k and the
A.B.C.D address on the server. I'm not seeing any packets coming in to the 9k from the SNMP
server.

Here is my config:

snmp-server host A.B.C.D traps version 2c [string]


snmp-server community [string] RO SNMP_ACL
snmp-server traps
snmp-server trap-source Loopback0

ipv4 access-list SNMP_ACL


10 remark SNMP_ACL
20 permit ipv4 host A.B.C.D any
1000 remark ** DENY EVERYTHING ELSE **
1010 deny ipv4 any any

RP/0/RSP0/CPU0:CORE-1#sh snmp
Wed Jun 17 15:21:27.144 CDT
Chassis:
0 SNMP packets input
0 Bad SNMP version errors
0 Unknown community name
0 Illegal operation for community name supplied
0 Encoding errors
0 Number of requested variables
0 Number of altered variables
0 Get-request PDUs
0 Get-next PDUs
0 Set-request PDUs
1265 SNMP packets output
0 Too big errors (Maximum packet size 1500)
0 No such name errors
0 Bad values errors
0 General errors
0 Response PDUs
1265 Trap PDUs

SNMP logging: Enabled


Logging to Notification host: A.B.C.D, udp-port: 162
Trap Statistics
---------------
Number of pkts in Trap Queue: 0
Maximum length of Trap Queue: 100
Number of pkts sent: 636
Number of pkts dropped: 629

Inform Statistics
-----------------
Number of Informs sent: 0
Number of Informs retries: 0
Number of Informs pending: 0
Number of Informs dropped: 0

Alexander Thuijs Wed, 06/17/2015 - 13:37

hey Ben!! yeah that was really awesome!! enjoyed our bng talk very much also!

say for this issue, do you have MPP configured? (management plane protection).

if so, it may be necessary to enable the interface that receives the snmp request to have
snmp allowed. (check the local packet transport services document for more detail or the cisco
live 2014 sanfran id 2904 for more details on LPTS).

show access-list should report it, but it sounds like the requests are not even coming in!

if the acl is removed do you see the requests coming in? if still not it must be MPP related, so
lets try to add the interface to the mpp section and enable that interface to allow inband snmp.

cheers
xander


Benjamin Bartsch Wed, 06/17/2015 - 14:18

You nailed it - this customer is doing in band management and I never opened up SNMP in
MPP. Once SNMP peer is added to the backbone interfaces in MPP we see ACL hits and SNMP
working. The security works too well on the 9k!

Thanks again, Xander.

Alexander Thuijs Wed, 06/17/2015 - 14:47
super! great we found it!

ha yeah, that lpts works pretty well!

cheers!

xander


Abraham Camacho Mon, 11/10/2014 - 10:49

Hi Xander,

We have a brand new ASR9K and I was wondering if we should monitor this device in the
same way we do with our 6500, 4900, 7500, etc. I mean, monitor the interfaces status, drop
packets counters, etc; or we should consider another important variable based on the
complexity added by this kind of device. Do you have any recommendations about this you
could provide me?

thanks

Abraham

Alexander Thuijs Mon, 11/10/2014 - 12:14

hey abraham,

you can, but you probably don't want to. Let me explain:
XR sw accounts for drops differently than IOS, so if you have a KPI that looks at
ifdrops to determine an issue, XR will erroneously generate trouble; in XR we accumulate ANY
drop under that SNMP counter (eg policer drop, acl drop etc).

Same as with cpu and memory, since XR manages mem the same way as Linux or OSX, the free
mem is not necessarily a point of concern, since some mem is marked as "ready for use",
even though not marked free as such.

Managing cpu and mem for IOS to XR is not the same, and that is most important. Best
practices? Hard to define, this is Linux based, so you need to "manage" that accordingly.

cheers

xander

Abraham Camacho Mon, 11/10/2014 - 13:27

Hi Xander,

Thanks a lot for your answer. One final question based on your previous response. Does
Monitoring Interfaces + Traffic also have the same "variation" for this platform in comparison
with regular Cisco Routers?

regards

Abraham


Alexander Thuijs Tue, 11/11/2014 - 04:49

It used to, but I want to say that in the recent codes such as 513 the implementation is rather
robust. There is a delta between the interface CLI counters and the snmp counters, this simply
because of the caching, the cli pulls it directly from the hardware, while snmp reads it out of
the cache that gets periodically updated.

regards!!

xander


amaged Mon, 11/10/2014 - 10:56


Hi Abraham,

There's a lot to monitor, some guidance in the below links. Basically you should first decide
what is important for you to monitor (a healthy system and services) then translate that to
OIDs.

Implementing SNMP on Cisco IOS XR Software

http://www.cisco.com/en/US/docs/routers/asr9000/software/asr9k_r4.2/system_management/configuration/guide/b_sysman_cg42asr9k_chapter_01100.html

SNMP Server Commands on Cisco IOS XR Software

http://www.cisco.com/en/US/docs/routers/asr9000/software/asr9k_r4.2/system_management/command_reference/b_sysman_cr42asr9k_chapter_01101.html

SNMP Object Navigator:

http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=

MIB Guide:

http://www.cisco.com/en/US/docs/routers/asr9000/mib/guide/asr9kmib.html

General guide to supported MIBs:

http://www.cisco.com/en/US/docs/routers/asr9000/mib/guide/asr9kmib3.html

Specific MIBs for 4.2.0:

ftp://ftp-sj.cisco.com/pub/mibs/supportlists/asr9000/asr900-
-supportlist.html#Supported_and_Verified_MIBs_XE_4_2_0

CISCO-PROCESS-MIB information:

ftp://ftp.cisco.com/pub/mibs/v2/CISCO-PROCESS-MIB.my
http://tools.cisco.com/Support/SNMP/do/BrowseMIB.do?local=en&step=2&submitClicked=true&mibName=CISCO-PROCESS-MIB

Collecting CPU utilisation on Cisco IOS devices using SNMP:

http://www.cisco.com/en/US/tech/tk648/tk362/technologies_tech_note09186a0080094a94.shtml

Determining free and the largest block of contiguous memory usage on Cisco IOS devices with
SNMP:

http://www.cisco.com/en/US/customer/tech/tk648/tk362/technologies_tech_note09186a0080094a95.shtml

Performance Monitoring feature offers CPU, memory, bgp, ldp and interface monitoring:

http://www.cisco.com/en/US/docs/routers/asr9000/software/asr9k_r4.2/system_monitoring/configuration/guide/b_sysmon_cg42asr9k_chapter_0110.html

Alexandr Gurbo Tue, 07/08/2014 - 03:46

Xander,

My question is about the periodic MIB data collection and transfer feature, also known as bulk
statistics, on the asr9k. More detail is under Bulk Statistics Transfer Options.

Is it possible to transfer the data to a server that is located in a VRF?


Alexander Thuijs Tue, 07/08/2014 - 03:59

hi there,

the destination location is in the form of a url, and in 41 we added the vrf capability to that in
this way:

ftp://<ftp-server-ip>;<vrf-name>//<pie-path>

but this was specifically for the install add, so I am not sure if this is going to work also for
the snmp bulk stat url. I haven't tried it myself though.

regards

xander

Alexandr Gurbo Tue, 07/08/2014 - 04:35

Xander,
Thank you for the quick reply. I tried it. The vrf-aware mechanism doesn't work for the snmp bulk stat url.
Maybe future IOS XR releases will support a vrf-aware feature.


Alexandr Gurbo Tue, 07/08/2014 - 06:04

Xander,

With your help I found that it works for tftp.

url primary tftp://192.168.x.x;Mgmt/

without a filename after the last slash. All other combinations don't work for tftp.

For ftp it doesn't work in any of my combinations: with anonymous ftp, with an authenticated ftp user,
with/without a filename after the last slash, with "ftp client vrf Mgmt username/password".

Alexander Thuijs Tue, 07/08/2014 - 10:58

do you see the connection attempt going out when using FTP, or is there nothing happening?
Looking at the ddts that provided support for that ;<vrf> it tells me this is in the base infra
that takes a url input, hence should also be applicable to this CLI in question.

With that a case/ddts is probably warranted.


Also see if it makes a difference using a mgmt interface on the RSP vs a fabric enabled
interface on a linecard.

thanks for the testing btw!

cheers

xander

davecs Sun, 01/05/2014 - 16:59

4.3.4 doesn't solve our snmp timeout issue.

here is the strange thing - we have 4 ASR's, a mix of 4.2.3 and 4.3.4, and every 30 mins or so
each ASR will stop responding to snmp queries.

and not at the same time, i.e. at *:15 and *:45 the first ASR will stop responding, then at *:20
and *:50 the second ASR will stop responding. I am querying both ASR's from multiple clients,
just in case it's a client issue, and the same thing occurs on both clients.

it's almost like there is some sort of 30 min garbage collection timer that blocks snmp queries
to IFMIB but not, say, the hostname mib (tested).

also, I haven't confirmed this, but the more interfaces the ASR has the longer the outage, e.g. the
4.3.4 ASR, which has maybe 100 interfaces, has an snmp outage of about 1 minute, whereas
the 4.2.3 ASR, which has 400 interfaces, seems to have a few minutes of outage time...

btw - icmp/ping to each is working properly (I have a script which runs every second, first
pings and then snmpget's; the ping always works, the snmpget stops every 30 mins or so)
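
For anyone chasing the same pattern, a per-second probe log like this can be reduced to outage windows and the spacing between them, which makes a fixed period (such as a 30 minute NMS job) stand out. The sketch below is only an illustration: it assumes you have already collected the timestamps of the failed snmpget attempts, and the function names and the 5 second grouping gap are made up for the example.

```python
from datetime import datetime, timedelta


def outage_windows(failures, max_gap=timedelta(seconds=5)):
    """Group failure timestamps into (start, end) windows.

    Failures closer together than max_gap are treated as one outage.
    """
    windows = []
    for ts in sorted(failures):
        if windows and ts - windows[-1][1] <= max_gap:
            windows[-1][1] = ts          # extend the current outage
        else:
            windows.append([ts, ts])     # start a new outage
    return [(start, end) for start, end in windows]


def start_intervals(windows):
    """Time between the starts of consecutive outages."""
    starts = [start for start, _ in windows]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Hypothetical log: one failed probe per second for a minute, twice, 30 minutes apart.
base = datetime(2014, 1, 5, 10, 15, 0)
failures = [base + timedelta(seconds=i) for i in range(60)]
failures += [base + timedelta(minutes=30, seconds=i) for i in range(60)]

windows = outage_windows(failures)
for start, end in windows:
    print(start, "->", end, "duration", end - start)
print("spacing between outages:", start_intervals(windows))
```

If the spacing comes out constant, the next step is to look for whatever runs on that schedule, on the router or on the NMS side.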

Alexander Thuijs Mon, 01/06/2014 - 04:14

thanks for that notification davecs, I did not expect that in 434 where we have put a lot of
improvements in place.

the "outage time to number of interfaces" relation seems suspicious and looks like a resurrection of a previous
problem that I thought was fixed.

I may need to ask you to file a tac case to have this investigated in depth asap so we can
identify the reason for this issue and address it if necessary.

regards
xander

davecs Mon, 01/06/2014 - 22:06

Yeah I logged a TAC case and they told me to use "snmp-server ifmib stats cache" (and a restart
of snmpd) which worked straight away on 4.3.4, but didn't work at all on the other three ASR's running
4.2.3.

BUT I just had another look and my cacti graphs started looking gappy again, and the log shows snmp
timeouts...

Something interesting: I also graph CPU. When I initially enabled stats cache (on 4.3.4) one of the four
CPUs basically went idle instead of peaking every 30 mins; now that the graph has returned to "gappy
mode" that CPU is back to its 30 min peaky behaviour.

Alexander Thuijs Tue, 01/07/2014 - 05:36

davecs,

yup leveraging the cache is key for good snmp performance.

the 30 min spikes you see need to be correlated to something; it could be the snmp process,
it could be something else periodic.

also it may help to increase the timeout on your snmp requests, especially when long mib
walks are done.

considering you have a tac case open, perhaps we can drive it through that channel, and we
can report back on this discussion when the situation is resolved for everyone's awareness.
otherwise we may be duplicating efforts.

regards

xander


davecs Tue, 01/07/2014 - 20:11

short answer = solarwinds topology mapping (occurs at a 30 min interval)

disable or change (ie max) the interval and it fixes the ifmib problems.

long answer = hopefully TAC/DE can work out what OID is causing snmp ifmib to hang

fjordan Sun, 01/19/2014 - 02:55

davecs,

We have this exact same problem; gaps in cacti graphs.

We opened a TAC case and found the policer was limiting the SNMP queries and dropping
packets.

Run the following command for each location:

show lpts pifib hardware police location 0/0/CPU0

If you have drops listed for SNMP you are being policed by the default policy.

The default of 300pps on the SNMP LPTS policer was causing our drops.

We are considering increasing this from 300pps to a higher value.
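
A rough way to see whether your pollers get anywhere near that rate is to bucket SNMP request arrival times per second, for example from a packet capture taken in front of the router. The sketch below assumes you already have those arrival times as plain numbers in seconds; the 300 pps figure is the default quoted above, and a fixed one-second window is only an approximation of how a hardware policer meters traffic.

```python
from collections import Counter


def seconds_over_limit(timestamps, limit_pps=300):
    """Return {second: packet_count} for every one-second bucket above limit_pps."""
    per_second = Counter(int(t) for t in timestamps)
    return {sec: n for sec, n in sorted(per_second.items()) if n > limit_pps}


# Hypothetical capture: a burst of 350 requests in second 10, then 100 in second 11.
capture = [10 + i / 350 for i in range(350)] + [11 + i / 100 for i in range(100)]
print(seconds_over_limit(capture))
```

Seconds that show up here are candidates for policer drops; the drop counters from the show command above remain the authoritative check.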

Would be interested in if this is your problem as well.

You can email me off line at fmjordan@fsu.edu

Thanks, and hope this helped.

davecs Sun, 01/19/2014 - 14:48

interesting and good to know about this command.

but no there are 0 drops for anything in that list

i believe the problem to be that solarwinds is trying to grab routing tables from the ASR, and
perhaps the ASR is blocking on this request...


eric.follos Fri, 10/25/2013 - 07:10

Hello Xander,

I have a problem where our NMS shows intermittent SNMP Poll failures with our 9ks. Issue
occurs every couple hours or so, at seemingly irregular intervals. Polling to other IOS device
types (6509, 3750, 7206) works ok. We are running XR 4.3.1.

When I try to ping the server from a 9k with a series of 10k pings I don’t see any packet loss.

Tried to explore slow oid but we don’t always have slow requests coincide with the poll fail.

'show snmp trace request' shows pdu requests with the same src ip/port and occasionally same
req_id. After an SNMP poll fail if I run the command 'show snmp trace dupdrop' it reflects Dup
Count increasing. Is there a way to disable on the 9k the ability to detect duplicate requests?

Regards,

Eric

Alexander Thuijs Fri, 10/25/2013 - 07:53

hi eric,

yeah there are some perf issues in 431 that we have uncovered. although there are
improvements in 431 over 423, we were not done there. So in 432, and especially in 434, you'll see
better results. we have put a lot of testing effort into the snmp operations and
discovered/implemented many improvements. I don't know right offhand if we smu'd things in
431.

the issue you are running into is likely not network connectivity or anything but just the snmp
processing inside that is the culprit.

you can't disable dupe request checking; that is one of the performance improvements. if we
are already working on a request and a retry comes in, it would only exacerbate the issue if
we enqueued the same request again.
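
The idea can be sketched as a toy model in Python. This is not the XR implementation: the class and method names are invented, and keying a request on source IP, source port and req_id is an assumption based on the fields that show up in the request trace.

```python
class RequestQueue:
    """Toy model of a request queue that drops retries of in-flight requests."""

    def __init__(self):
        self.in_flight = set()   # keys of requests received but not yet answered
        self.dup_count = 0       # retries dropped as duplicates

    def receive(self, src_ip, src_port, req_id):
        """Return True if the request is enqueued, False if dropped as a duplicate."""
        key = (src_ip, src_port, req_id)
        if key in self.in_flight:
            self.dup_count += 1
            return False
        self.in_flight.add(key)
        return True

    def respond(self, src_ip, src_port, req_id):
        """Mark the request as answered so the same key is accepted again later."""
        self.in_flight.discard((src_ip, src_port, req_id))


q = RequestQueue()
first = q.receive("192.0.2.10", 34885, 1)   # original request is enqueued
retry = q.receive("192.0.2.10", 34885, 1)   # NMS retry arrives while still processing
q.respond("192.0.2.10", 34885, 1)           # response goes out
later = q.receive("192.0.2.10", 34885, 1)   # same key is accepted again
print(first, retry, later, q.dup_count)
```

Without the check, every retry of a slow request would add more work to a queue that is already behind.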

for sure enable the caching if not already, and try to stay current on the smu's pertaining to
snmp.

when XR434 comes out in December consider that as a goto release if you're planning an
upgrade.

regards

xander

davecs Wed, 10/09/2013 - 19:07

thanks xander.

if you can let me know the bugid that relates to that optics issue that would be great!

davecs Wed, 10/09/2013 - 16:06

We do use RSP440's, but IOSXR-4.3.1 has the same problem, and I am guessing that 55767 is
already included in 4.3.1.

We do use transceiver permit pid all for some non-working Cisco GE-T's. Is there a fix coming for either
the SFP's or the IFMIB stats for these ports?

Also, with the above document, how do I give myself the correct privileges to run those commands? Do
I need to log in as cisco-support or create an account in that group or something?

Alexander Thuijs Wed, 10/09/2013 - 16:17

yeah the umbrella smu fixes are already in XR43.

I dont have the ddts handy for the optic situation I was referencing, but it is going to be a
smu for 432 and integrated in 434 (not out yet).

to run the commands, you need cisco-support privileges. in the username <name> config in
either admin or exec config you would need to add the group cisco-support, then log out and in
again to get that task group applied.

show user tasks should then give the full list and show user group will tell you the group
membership.

xander


davecs Tue, 10/08/2013 - 19:54

As mentioned:

4.3.1 some SMU's installed, not relating to SNMP. However CSM says no SNMP SMU's are
available.

4.2.3 with CSCuf51534 installed, again CSM says no other SNMP SMU's are available.

Unless there is another SMU i should install?

Alexander Thuijs Wed, 10/09/2013 - 05:45

Hi Dave, if you have an RSP440, you may need CSCug55767

it could be that the sonetmib is acting up on you if you do a walk. you could potentially exclude that
from your snmp view and see if that makes a difference. If you are targeting actual individual
OID's without a walk this may not do it for you.

it might be good to get the perf stats from snmp as per document in that case and file a tac
case.

I see there is another smu in the pipeline for snmp performance also, but in the absence of
that logging I cant say if you'll benefit from those changes either.

Another thing might be that if you have optics that are enabled with service unsupported
transceiver/transceiver permit pid all you may be running into another known issue (if you're
pulling IFMIB stats).

regards

xander

davecs Mon, 10/07/2013 - 22:55

I am not sure the retries and extended timeouts help.

Our Cacti box has "gaps" in the graphs. To find out what's going on I set up the following bash
script:

while [ $? -eq 0 ]; do sleep 1; date; snmpget -v 2c -c public -r 10 -t 5 10.1.2.130 .1.3.6.1.2.1.31.1.1.1.10.264; done

This script bombs out every 30 mins or so, across three of our four ASR9k's (a combination
of 4.2.3 unpatched, 4.3.1, 4.2.3 with some SMU's)

Increasing the timeout (-t) from 1 to 5 seconds helped a little, but increasing to 10 seconds
didn't do much.. Also the retries (-r) didn't do much...
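
For reference, the longest a single snmpget can block follows directly from those two flags. A rough sketch, assuming the usual net-snmp behaviour of one initial attempt plus -r retries, each waiting -t seconds with no backoff (the function name is made up for the example):

```python
def worst_case_wait(timeout_s, retries):
    """Seconds one request can block: the first attempt plus each retry waits timeout_s."""
    return timeout_s * (retries + 1)


print(worst_case_wait(5, 10))   # -r 10 -t 5, the settings used in the script
print(worst_case_wait(1, 10))   # the earlier 1 second timeout with the same retries
```

With -r 10 -t 5 that is 55 seconds before the command gives up, so an outage has to last longer than that to make the loop exit. Each retry is also another packet the router has to receive.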

It doesn't seem to matter how many items we poll for either (for example one ASR has 28 graphs
(so maybe 40 individual OID's) and runs fine, and another has maybe 10 OID's and has issues).

for the record, the management network that is being used services a number of other
devices (srx650's and nexus7k's) and none of those show snmp dropouts...

quite frustrating to have gappy graphs

Alexander Thuijs Tue, 10/08/2013 - 03:19

What version are you running Davecs? There have been a set of SNMP smu's out recently and
over time that fix a lot of these timeout issues.

The reason for the timeout I can't say with the info provided, some of the performance tricks
discussed in this article may help find out where the problem is, or you can try the SNMP
umbrella smu's that are out there now.
regards

xander


Ganesh Kondaveeti Mon, 07/01/2013 - 01:18

Hi Alexander,

Just curious: which RFC is the SNMP architecture of IOS-XR based on?

Regards

K. Lakshmi Ganesh

Alexander Thuijs Mon, 07/01/2013 - 05:51

I don't have a specific RFC that defines what architecture is followed. In general we always
follow the specifications and exceptions are documented.

Is there a particular functionality or (mib) implementation you are interested in?

We have a good overview of which mibs are specifically tested against asr9000 listed here:

ftp://ftp.cisco.com/pub/mibs/supportlists/asr9000/asr9000-supportlist.html

Some platform independent MIBs, that have no hardware dependency, such as the OSPF mib
are maybe not listed, but definitely supported. The mibs in this list have some sort of HW
dependency hence specifically tested against asr9000.

regards

xander

Ganesh Kondaveeti Mon, 07/01/2013 - 09:18

Hi Alexander,

Thank you for your reply.

Actually I was just going through the related RFCs, and additionally we (HTTS-WW-NMS) are
going to support SNMP K/W for XR, so I was just going through these.

For information, in IOS RFC 2571 was followed for the implementation of the SNMP architecture. In Nexus,
RFC 3411 (which obsoletes RFC 2571) is used. So I was just wondering which RFC is
followed in XR?

Any idea who can comment on this definitively?


Thanks and Regards

K. Lakshmi Ganesh


Alexander Thuijs Mon, 07/01/2013 - 10:46

We don't claim support for 3411 natively. But it is fully compliant with 2571.

Which precise changes proposed in 3411 are you interested in? It may be
natively supported already without calling out 3411 official compliance.

cheers

xander

Ganesh Kondaveeti Mon, 07/01/2013 - 12:26

Hi Alexander,

Thanks for your kind reply.

As I noted above, I am personally in study mode on SNMP K/W, so I am also
looking at the relevant RFCs to be thorough. As part of that I was interested to know
this for IOS-XR as well. For information, we support SNMP K/W for IOS and NX-OS at the moment
and are in transition to support XR also.

Regards

K. Lakshmi Ganesh

Purwo Hidayat Wed, 05/22/2013 - 03:52

Hi Xander,

I have a problem with implementing snmp on asr9k.

Is "snmp-server host" mandatory for implementing snmp on asr9k?

I've configured only snmp-server community and snmp-server traps.

but if I type the "sh snmp trace requests" command, it only shows Rx and Processing but no Tx.

May 22 17:06:29.981 snmp/requests 0/RSP0/CPU0 t10 Rx PDU from x.x.x.x,27129 len = 35 [Q = 1]

May 22 17:06:29.981 snmp/requests 0/RSP0/CPU0 t1 Processing PDU from x.x.x.x,27129 req_id = 1 (0ms on Q), type GETN

May 22 17:06:34.988 snmp/requests 0/RSP0/CPU0 t10 Rx PDU from x.x.x.x,27129 len = 35 [Q = 1]

Please advise.

Regards,

Purwo

amaged Sun, 06/02/2013 - 08:58

Hi Purwo,

You don't need:

snmp-server host

to be able to implement snmp. It is only used to specify SNMP trap notifications, the version
of SNMP to use, the security level of the notifications, and the recipient (host) of
the notifications.

What you are seeing in show snmp trace is not what we expect to see, you should see both:

RP/0/RSP0/CPU0:PE2#sh snmp trace requests

Sun Jun 2 08:57:13.652 PDT

Entering snmp_ltrace main....

2107 wrapping entries (2112 possible, 0 filtered, 4078884 total)

May 20 19:10:59.194 snmp/requests 0/RSP0/CPU0 2947497# t8 Tx PDU to 1.73.54.10,34458 len = 49

May 20 19:10:59.196 snmp/requests 0/RSP0/CPU0 3760763# t9 Rx PDU from 1.73.54.10,34885 len = 46 [Q = 1]
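
On a long trace it is easier to count Rx, Processing and Tx entries per requester than to read them by eye. Below is a minimal sketch in Python, assuming the trace has been saved to text with each entry on a single line in the format shown in this thread; the regular expression is derived only from these samples and may need adjusting for other releases.

```python
import re

# Matches the trace entries quoted in this thread; the "<number>#" field is optional.
TRACE_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+snmp/requests\s+(?P<node>\S+)\s+"
    r"(?:\d+#\s+)?t\d+\s+"
    r"(?P<kind>Rx PDU from|Tx PDU to|Processing PDU from)\s+"
    r"(?P<peer>[^,\s]+),(?P<port>\d+)"
)


def summarize(lines):
    """Count Rx / Processing / Tx entries per (peer address, peer port)."""
    counts = {}
    for line in lines:
        m = TRACE_RE.search(line)
        if not m:
            continue  # header lines and anything else that is not a PDU entry
        key = (m["peer"], int(m["port"]))
        kind = m["kind"].split()[0]  # "Rx", "Tx" or "Processing"
        counts.setdefault(key, {"Rx": 0, "Processing": 0, "Tx": 0})[kind] += 1
    return counts


def unanswered(counts):
    """Requesters that sent PDUs but never show up in a Tx entry."""
    return [key for key, c in counts.items() if c["Rx"] > 0 and c["Tx"] == 0]


trace = [
    "May 22 17:06:29.981 snmp/requests 0/RSP0/CPU0 t10 Rx PDU from x.x.x.x,27129 len = 35 [Q = 1]",
    "May 22 17:06:29.981 snmp/requests 0/RSP0/CPU0 t1 Processing PDU from x.x.x.x,27129 req_id = 1 (0ms on Q), type GETN",
    "May 22 17:06:34.988 snmp/requests 0/RSP0/CPU0 t10 Rx PDU from x.x.x.x,27129 len = 35 [Q = 1]",
    "May 20 19:10:59.194 snmp/requests 0/RSP0/CPU0 2947497# t8 Tx PDU to 1.73.54.10,34458 len = 49",
]
print(unanswered(summarize(trace)))
```

A requester that only ever appears in Rx and Processing entries is the pattern from the original question. On a short excerpt a request can look unanswered simply because its Tx falls outside the captured window.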

Are you sure this is still the case and you are seeing the requests going from the NMS?

Regards,

Ahmed

Purwo Hidayat Sun, 06/02/2013 - 20:51
Hi Ahmed,

Thanks for your kind response.

It's doing fine. It seems I had misconfigured the ACL.

Regards,

Purwo


https://supportforums.cisco.com/document/132706/asr9000xr-understanding-snmp--
nd-troubleshooting
