The Trouble With I

The trouble with I
v3
Dan Rautio
Copyright 2006 Juniper Networks, Inc.
Proprietary and Confidential
www.juniper.net

V1: initial version, 2/12/08
V2: added case studies 8 and 9, 3/5/08
V3: added case study 10, 12/5/08
Overview
Type of problems
Symptoms Very Important
Which block of the ASIC?
Case study of past problems
Deep Dive for each symptom
Ichip Performance
Ichip enhancements
Types of problems
Transit traffic loss
PFE-related problem
Check out the ASIC
RE-generated traffic is not affected
It is normal to see transit traffic affected while RE-generated packets show no problem.
RE-generated packets have the DT bit set to 1.
The RE doesn't need a next-hop lookup (Ir), since it already knows which interface to send the packet to.
The RE also forms the L2/L3 headers, so Iwo doesn't need to touch the packet either.
I chip packet flow
Symptoms: Very Important!

Typical symptoms
1. Interface flaps and the far-end router receives all garbage. Or the far-end DPC is reset, which causes a long FC to be sent to the other side.
10GE interface flow controlled for more than 200 msec. PR/231419, PR/250350, PR/103298, PR/104884, PR/103597, PR/103712
2. After a restart routing, all multicast traffic stops.
There were many similar examples in the past with Gimlet, but there they would only cause illegal nh size and SRAM parity errors. The Ichip will wedge.
Incorrect Iwo_key (Lout_key) pointing to RLDRAM (SRAM). PR/240012, PR/258760
Symptoms: Very Important! (continued)

3. A specific type of traffic (MPLS->IPv4) stops working.
Incorrect packet length calculation for MPLS->IPv4 nexthop traffic. PR/251042
4. A specific type of traffic (IPv6) stops working.
Incorrect packet length calculation for IPv6 nexthop traffic. PR/105266
5. Large tunnel traffic stops working.
Tunnel ingress with fragmentation. PR/237450
6. Complete packet loss between 2 PFEs.
I3.0 Ichip fabric output queue. PR/268274
7. DPC repeatedly crashes with JUNOS 8.5.
IA FPGA DMA corruption. PR/269699
Symptoms: Very Important! (continued)

8. Interface stops forwarding packets, or IP CRC packet errors are reported in syslog.
PR/277853, PR/27741
9. BIST memory error on Ichip RLDRAM.
PR/255204
10. Troubleshooting LINK (SF fabric ports) error messages to the proper FRU.
PR/407207
Which ASIC block?

1. 10GE interface flow controlled for more than 200 msec. PR/231419, PR/250350, PR/103298, PR/104884, PR/103597, PR/103712
Ipktrd (packet reader): aged cells are not detected.
2. Incorrect Iwo_key (Lout_key) pointing to RLDRAM (SRAM). PR/240012, PR/258760
Ir sends a bogus Lout_key. Iwo receives the bad Lout_key and sends feedback to Imq/Ipktrd, which will also be bogus. Ipktrd gets into a bad state.
3. Incorrect packet length calculation for MPLS->IPv4 nexthop traffic. PR/251042
Iwo data buffer and Iwo SPI microcode packet length calculation.
Which ASIC block? (continued)

4. Incorrect packet length calculation for IPv6 nexthop traffic. PR/105266
Iwo microcode packet length calculation is incorrect.
5. Tunnel ingress with fragmentation. PR/237450
Iwo microcode packet length calculation is incorrect.
6. I3.0 Ifo queue buffer. PR/268274
Ipktrd fabric icell buffer allocation is incorrect.
7. IA FPGA DMA corruption. PR/269699
Ichip DMA code is not protected from interrupts, which causes DMA corruption.
Which ASIC block? (continued)

8. Traffic sent to queues that are not configured. PR/277853, PR/27741
Iwo wedge or Iwo CRC errors.
9. Failed RLDRAM BIST: the memory is not initialized correctly and can end up with parity errors. PR/255204
BIST memory error on Ichip RLDRAM.
10. Troubleshooting LINK (SF fabric ports) error messages to the proper FRU. PR/407207
Case Study 1: 10GE interface flow controlled for more than 200 msec
PR/231419, PR/250350, PR/103298, PR/104884, PR/103597, PR/103712
A reboot of the far-end router asserts FC to the DUT. PR/103298
A cFPC 10GE PIC interface flap sends garbage to the other end. PR/250350
Flow control lasts for a few seconds. PR/104884
Some of the time, you get messages like these:
Apr 26 13:46:26 tomahawk fpc0 ICHIP(3): New crc errors in WO IP stream_id 0, iwo_ip_poll_stream_stats
These events are mostly silent. On the remote interface, the adjacency would simply time out.

If you have OSPF enabled, the wedged interface moves into the OSPF 1-way state ("neighbor is in one-way mode"), and if you have IS-IS enabled, the wedged interface moves into the IS-IS "init state" due to "Not Seenself".
On BGP, you will likely see "NOTIFICATION 6" from the remote BGP peer, with the session reset because the keepalive timer expired.
The following Ichip outputs show that the pktrd is wedged.
This state says that the PRQ and ICB buffers are not empty and the packet read is not done, but WO is not receiving any cells.

RFEB0(reno-re0 vty)# sh ichip 0 ipktrd qstatus
WAN Queue Status
WAN_PRQ_MPTY - 0xfffffffe
WAN_ICB_MPTY - 0xfffffffe
WAN_PRD_DONE - 0xfffffffe
FABRIC Queue Status
FAB_PRQ_MPTY[0] - 0xffffffff
FAB_ICB_MPTY[0] - 0xffffffff
FAB_PRD_DONE[0] - 0xffffffff
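As a reading aid for the status words above, the following Python sketch lists which bit positions are clear in a queue-status word. It assumes, based only on the examples in this deck, that a clear bit marks a stream whose PRQ or ICB is not empty (or whose packet read is not done) and that bit n corresponds to stream n. The helper name `clear_bits` is hypothetical and is not a vty command.

```python
def clear_bits(status_word, width=32):
    """Return the bit positions that are 0 in a queue-status word."""
    return [bit for bit in range(width) if not (status_word >> bit) & 1]


# Values copied from the "ipktrd qstatus" output above.
wan_status = {
    "WAN_PRQ_MPTY": 0xFFFFFFFE,
    "WAN_ICB_MPTY": 0xFFFFFFFE,
    "WAN_PRD_DONE": 0xFFFFFFFE,
}

for name, word in wan_status.items():
    # Assumption: a clear bit n means stream n is not empty / not done.
    print(name, "clear bits:", clear_bits(word))
```

With the values shown, each WAN word has only bit 0 clear, which is consistent with stream 0 being the wedged stream in this case study.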
RFEB0(reno-re0 vty)# show ichip 0 registers pktrd wan
...
(0xf0830400) pktrd.wan_prq_mpty:0xfffffffe
(0xf0832400) pktrd.wan_icb_mpty:0xfffffffe
(0xf0833500) pktrd.wan_prd_done:0xfffffffe
(0xf0834300) pktrd.wan_dbf_org[0]:0x80000000

RFEB0(reno-re0 vty)# show ichip 0 wo statistics ip wan_stream 0

Iwo Input Processor Statistics:
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
Stream(0):
Input packets  322996428  75722
output packets  322996212  75722
ssmcst packets
fragmented packets
input drops  215
output drops  62736  146
> 2 cell crc drops  5357  27
<= 2 cell crc drops  4315  10

The above symptoms uncovered a hardware bug in the Ichip.
The Ichip issue is caused by aged icells that are not detected as aged. If the packet memory write pointer wraps 4 times on an Ichip with 512MB*3 of packet memory, then an icell read request that was in the pipeline before the flow control started may not be detected as aged when the flow control is released.
The workaround for PR 231419 is to flush aged icells before they can become undetected aged icells. There are 3 "virtual address" bits in the packet memory address (2 bits with 512MB DIMMs). An aged icell can become an undetected aged icell if these 3 address bits are the same during the read as the 3 bits currently being written by the packet writer. That is, the packet memory has been overwritten a multiple of 8 times.
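The wrap condition described above can be sketched in a few lines of Python. This is only an illustration of the modulo arithmetic, not the hardware logic; the function name and the idea of counting wraps as an integer are assumptions made for the example.

```python
def is_undetected_aged(wraps_since_request, virtual_addr_bits=3):
    """True if an aged icell would look fresh to the age check.

    The age check compares only the "virtual address" bits, so an icell
    whose memory has been overwritten a multiple of 2**bits times has
    the same bits as the current write position.
    """
    if wraps_since_request == 0:
        return False  # memory not overwritten yet, so the icell is not aged
    return wraps_since_request % (1 << virtual_addr_bits) == 0


for wraps in (1, 4, 8, 16):
    print(wraps, is_undetected_aged(wraps), is_undetected_aged(wraps, 2))
```

With 3 bits the check is fooled after 8 or 16 overwrites; with 2 bits (512MB DIMMs) it is already fooled after 4, which matches the "wraps 4 times" statement above.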

The workaround is to sample traffic flow, and if traffic is blocked on a stream and the time is getting near the undetected icell aged time, then flush the stream. Detecting traffic flow is done using the flow state register in the WO_SPI block. If these bits say no traffic has flowed during the sample period and there was traffic pending, then traffic is blocked.
The delay bandwidth buffer is at least 200 msec, so overwriting 8 times requires at least 1.6 seconds of real time. The sample rate must be at least twice this rate to avoid sample aliasing. Pick a flow sample period of 0.5 seconds to detect blocked traffic.
You will see the following syslogs when auto recovery is successful:
[Nov 1 11:46:50.478 LOG: Info] ICHIP (0) one or more stream blocked detected: 0x00000001
[Nov 1 11:46:50.678 LOG: Err] imq_q_waiting_queue_empty: phy_q 0, telapsed:200ms, mu0:1491049, mu1:1491049, mu2:1491049, loop:42879
[Nov 1 11:46:50.678 LOG: Err] imq_q_disable_queue:failed, phy_q:0
[Nov 1 11:46:50.678 LOG: Err] ICHIP(0) imq_stream_disable_stream() failed to disable physical queue 0 in stream 32
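The timing argument behind the 0.5 second sample period can be checked with a short Python sketch. The numbers come from the slide; the variable names and the `stream_blocked` helper are hypothetical and only restate the rule "traffic pending but nothing flowed during the sample period".

```python
DELAY_BUFFER_S = 0.2     # delay-bandwidth buffer: at least 200 msec
VIRTUAL_ADDR_BITS = 3    # "virtual address" bits used for age detection
SAMPLE_PERIOD_S = 0.5    # flow sample period picked on the slide

# Packet memory must be overwritten 2**3 = 8 times before aging can alias.
overwrites_to_alias = 2 ** VIRTUAL_ADDR_BITS
undetected_age_s = overwrites_to_alias * DELAY_BUFFER_S   # 1.6 s

# Sampling at least twice as fast means a period of at most half that time.
max_sample_period_s = undetected_age_s / 2                # 0.8 s


def stream_blocked(traffic_pending, traffic_flowed):
    """Blocked: traffic was pending but none flowed in the sample period."""
    return traffic_pending and not traffic_flowed


print("undetected age time (s):", undetected_age_s)
print("max sample period (s):", max_sample_period_s)
print("0.5 s is fast enough:", SAMPLE_PERIOD_S <= max_sample_period_s)
```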
Case Study 2: Incorrect Iwo_key (Lout_key) pointing to RLDRAM (SRAM)
PR/240012, PR/258760, PR/228361

After a restart routing, multicast traffic stops working. PR/240012
After a restart routing, VPLS traffic stops working. PR/258760
You receive syslogs like the one below:
Aug 24 10:08:07 Poseidon-RE0 fpc1 ICHIP(3): New illegal link errors in WO DESRD lout_key 0x000008bc stream_id 11, iwo_desrd_poll_stream_stats

PR/240012, PR/258760
Here are the errors on the Ichip with this problem:
ADPC0(r16 vty)# show ichip 3 wo statistics
Iwo Statistics:
Summary  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
input packets  196629610  148810  149095
output packets  196629140  148810  149095
output bytes  20646461553  15625047  15654950
total drops  975  203

ADPC0(r16 vty)# show ichip 3 wo statistics desrd wan_stream 1

Iwo Descriptor Reader Statistics:
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
input packets  191808272  148810  149141
output packets  191808272  148810  149141
Stream(1):
id error drops  356  0  77
data error drops  0  0  0
oflow error drops  37
ADPC1(Poseidon-RE0 vty)# ... stream wan_stream 11 mu_mad

Stream 43 mas/mu/mad/hnq_ptr info:
queue  mas  mu  mad  hnq_ptr(0x)
------- -------- -------- -------- --------------
0  24132  24026  000b75:000b71

It has been observed that if the descriptor reader encounters an incorrect next-hop database, then WO can wedge. PR228361
The cause for this is not understood, and no test bench has been created that can reliably reproduce it.
Normally, when the next-hop database is corrupted, "illegal link" errors are reported.
While these errors are being reported, WO has occasionally been observed to wedge.

For PR228361, what we think is occurring is that the memory pointed to by an Lout key is modified by new next-hop entries. The memory is released, but there are still packets in flight that point at this memory. The memory is then reallocated to a new next-hop and overwritten with new data. When the in-flight packet reaches WO, the alignment is different, so the old Lout key points at random data.
You are correct that it may be helpful for debugging if memory is initialized with illegal descriptor tags (top 4 bits of zero). Also, memory allocation could be changed to allocate the oldest freed block first, probably hiding the bug forever.
Exactly what sequence of bad bits in the descriptors causes the wedge is unknown. But since there is no software control over the hardware state machine that sequences the descriptor reading, it is unlikely that any software workaround can be applied to prevent wedging.
Case Study 3: Incorrect packet length calculation for MPLS->IPv4 nexthop traffic
PR/251042, PR/240148
On M120 and MX-series routers and M320 Enhanced III FPCs, when an MPLS-encapsulated IPv4 packet that is padded to meet the minimum Layer 2 frame size (for example, 64 bytes for frames on Ethernet media) exits an LSP, the egress interface might stop forwarding packets. This can happen when the router is configured as a PE router in a VPN or is the penultimate node of an LSP. To recover, reboot the FPC (on MX-series or M320 routers) or the FEB (on the M120 platform) that houses the affected interface.
Root cause: don't trust the plen from the IP header, since it is not sanity checked in the MPLS->IPv4 case. Use the notification plen to calculate the dbuf size for requesting data from Iwo l23 to Iwo IP.
An incorrect DBUFSZ computed by the WO microcode can cause a wedge (PR251042). This is a microcode programming requirement.
Output packets are assembled in the WO output block (wo_spi) by pulling the headers and the first part of the payload from the L23 engine and the remainder of the payload from DBUF.
Microcode computes the remainder size and sets DBUFSZ to the number of bytes to pull from DBUF.
If there is data in DBUF but microcode incorrectly sets DBUFSZ to zero, then WO does not pull data from DBUF and WO can wedge.

The wedge can happen because DBUF can become full, so it stops sending grants to the packet reader.
WO drains its packets but does not drain DBUF, and waits for new packets from the packet reader, which never arrive.
Another possible way it can wedge, not confirmed by simulation, is that the incorrect DBUFSZ results in a temporarily incorrect "byte limit" back to the packet reader. The packet reader then waits for the bytes to drain, but they never do because they are stuck in DBUF.
The opposite DBUFSZ zero-value error can corrupt packets for an indefinite time, but this has not been observed to cause a wedge. That is, if the packet's plen is two cells or less, then the microcode *must* set DBUFSZ to zero.

If it does not, then the next packet can also become corrupted. It is speculated, but not confirmed by simulation, that this type of error can also result in WO CRC errors.
If DBUFSZ should be non-zero but is computed to be larger or smaller than it is supposed to be, then the error is detected by hardware, the current packet is sent out with an EOPE, and WO recovers. It is only when the zero/non-zero state is incorrect that WO has been observed to wedge.
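The zero/non-zero rule for DBUFSZ can be illustrated with a small Python sketch. It is not the WO microcode. It assumes 64-byte cells (inferred from "384 bytes (6 cells)" elsewhere in this deck) and assumes that exactly the first two cells come from the L23 engine, so the remainder is simply plen minus two cells; the slides only state the zero/non-zero requirement. The function names are hypothetical.

```python
CELL_BYTES = 64   # assumption: inferred from "384 bytes (6 cells)" in this deck
L23_CELLS = 2     # from the slide: plen of two cells or less => DBUFSZ is zero


def dbufsz(plen):
    """Bytes to pull from DBUF for a packet of length plen (sketch only)."""
    if plen < 0:
        raise ValueError("plen must be non-negative")
    return max(0, plen - L23_CELLS * CELL_BYTES)


def zero_state_mismatch(header_plen, notification_plen):
    """True if trusting the IP header plen flips the zero/non-zero state."""
    return (dbufsz(header_plen) == 0) != (dbufsz(notification_plen) == 0)


# A bogus header length of 0 on a 1500-byte packet gives DBUFSZ == 0 while
# data sits in DBUF: the case that has been observed to wedge WO.
print(dbufsz(1500), dbufsz(0), zero_state_mismatch(0, 1500))
```

Computing DBUFSZ from the notification plen, as the root-cause note recommends, avoids the mismatch because the header field is never consulted.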

Case Study 4: Incorrect packet length calculation for IPv6 nexthop traffic
PR/105266
An invalid length value in the IPv6 header may cause Iwo to wedge. So when calculating the DBUF size, use adj_plen instead of the plen in the IPv6 header.
Don't trust the packet length field in the IP header; instead, use the hardware-calculated ADJ_PLEN to calculate the DBUF size.
ADPC2(mercator-re1 vty)# sh nh in ge-2/2/1
ID     Type      Interface      Next Hop Addr            Protocol    Encap         MTU
-----  --------  -------------  -----------------------  ----------  ------------  ----
534    Unicast   ge-2/2/1.0     2001:668:0:2::1:662      IPv6        Ethernet      9194
548    Unicast   ge-2/2/1.0     fe80::217:cb00:aa1:7ff0  IPv6        Ethernet      9194

Here the killer packet has the wrong payload length (second row, right before 3B, changed from the correct payload length of 06 to 00):
00 19 E2 B1 61 73 00 00 04 00 01 00 86 DD 60 30
00 00 00 00 3B FF FE 80 00 00 00 00 00 00 02 00
04 FF FE 00 01 00 20 01 06 68 00 00 00 02 00 00
00 00 00 01 06 62 FF FF 00 00 00 00 CB 8D C8 9B
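To make the bad field easier to spot, the dump above can be decoded with a short Python sketch. It assumes the dump is a complete 64-byte Ethernet frame and that the last 4 bytes are the Ethernet FCS; that FCS assumption is not stated on the slide, but it is the reading under which the "correct payload length of 06" works out.

```python
import ipaddress
import struct

HEX_DUMP = """
00 19 E2 B1 61 73 00 00 04 00 01 00 86 DD 60 30
00 00 00 00 3B FF FE 80 00 00 00 00 00 00 02 00
04 FF FE 00 01 00 20 01 06 68 00 00 00 02 00 00
00 00 00 01 06 62 FF FF 00 00 00 00 CB 8D C8 9B
"""
frame = bytes.fromhex("".join(HEX_DUMP.split()))

ETH_HDR, IPV6_HDR = 14, 40
FCS = 4  # assumption: the trailing 4 bytes are the Ethernet FCS

ethertype = struct.unpack("!H", frame[12:14])[0]
payload_len_field = struct.unpack("!H", frame[18:20])[0]  # IPv6 payload length
next_header = frame[20]
dst_addr = ipaddress.IPv6Address(frame[38:54])
actual_payload = len(frame) - ETH_HDR - IPV6_HDR - FCS

print(f"ethertype=0x{ethertype:04x} next_header=0x{next_header:02x}")
print(f"payload length field={payload_len_field}, bytes present={actual_payload}")
print("destination:", dst_addr)
```

Under these assumptions the header claims a payload of 0 bytes while 6 bytes follow it, and the destination address matches the next hop shown in the `sh nh` output.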
ADPC2(mercator-re1 vty)# sh ich 1 re wo ip

(0xc4900b0c) wo.ip.inter_status: 0x00000002
(0xc4900b10) wo.ip.int_status_diag: 0x00000002
(0xc4900b14) wo.ip.inter_log: 0x00000880

ADPC2(mercator-re1 vty)# sh ich 1 ipktrd qs
WAN Queue Status
WAN_PRQ_MPTY - 0xfffffffb
WAN_ICB_MPTY - 0xffffffff
WAN_PRD_DONE - 0xffffffff
ADPC2(mercator-re1 vty)# sh ich 1 imq conf str wan 2 mu

queue  mas  mu  mad  hnq_ptr(0x)
------ ------- -------- -------- --------------
0  90456  90035  000692:00068e
ADPC2(mercator-re1 vty)# sh ich 1 wo stat ip wan 2

Iwo Input Processor Statistics:
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
Stream(2):
input packets  622446783  933309
output packets  622446759  933309
Case Study 5: Tunnel ingress with fragmentation
PR/237450
The issue is that the tag len is not correctly added to the L2 header len in the Iwo ucode. This causes an incorrect len and offset in the subsequent IP fragments. This is a day-one bug on Ichip platforms.
The tag len was not correctly extracted from $R_TC due to incorrect mask usage. Thus the tag len is not correctly added to the L2 header len, which causes an incorrect update of the IP total len and fragment offset field locations in subsequent IP fragments.
Case Study 6: I3.0 Ifo queue buffer
PR/268274
We found improper programming in the Ichip pktrd configuration.
The fab_cfg_icb was assigning a lookahead quota of 4 but only 5 entries to the buffer; there needs to be space for 2 additional entries for the current packet. So I dialed down the lookahead quota to 3.
The script does this for all 96 fabric destinations on all 4 Ichips on the DPC.
A quota of 3 is adequate for performance; this is what we used in our chip simulations.
Somehow, this value did not get propagated to the Junos software, which is using the value of 4 that was OK for I2.0 (but not I3.0). So this seems like a day-one bug.

The need for a large MTU to trigger the wedge now makes sense. A packet needs 2 non-lookahead icells when it is 3392 bytes or more.
When the icell lookahead limit is wrong by one, the extra lookahead icell overwrites only the second non-lookahead icell.
Thus, if the MTU is less than 3392 bytes, the second icell will never be overwritten.
There also need to be four 1-icell packets arriving during the large (2-icell) packet so that all 4 of the icell lookahead buffers are filled.
So one stream of low-rate 9K packets and a second stream with bursts of at least four single-icell (384 to 3391 byte) packets should show the problem.

The stream of smaller packets needs to be between 384 bytes (6 cells) and 3391 bytes (53 cells). I suspect a stream of 384-byte packets will cause the wedge the quickest.
These small packets need to arrive in a burst, that is, back to back, for at least 4 packets, and need to arrive just after the 9K packet.
I2 has 32 fabric streams; I3 has 96. I2 has 256 icell buffer pointers; I3 has 512. The streams were increased by a factor of 3, but the buffer pointers were increased only by a factor of 2. Thus the programming must be different.
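The buffer budget behind this case study can be restated as a short Python sketch. It assumes the icell buffer pointers are split evenly across fabric streams, which is consistent with the "only 5 entries" figure quoted earlier but is not stated outright; the function names are hypothetical.

```python
CELL_BYTES = 64              # consistent with "384 bytes (6 cells)"
CURRENT_PACKET_ENTRIES = 2   # space needed for the current packet


def cells(nbytes):
    """Number of 64-byte cells needed for nbytes (rounded up)."""
    return -(-nbytes // CELL_BYTES)


def entries_per_stream(buffer_pointers, fabric_streams):
    # Assumption: pointers are divided evenly across fabric streams.
    return buffer_pointers // fabric_streams


def quota_fits(lookahead_quota, buffer_pointers, fabric_streams):
    needed = lookahead_quota + CURRENT_PACKET_ENTRIES
    return needed <= entries_per_stream(buffer_pointers, fabric_streams)


print("I2 entries per stream:", entries_per_stream(256, 32))
print("I3 entries per stream:", entries_per_stream(512, 96))
print("quota 4 on I2 fits:", quota_fits(4, 256, 32))
print("quota 4 on I3 fits:", quota_fits(4, 512, 96))
print("quota 3 on I3 fits:", quota_fits(3, 512, 96))
print("cells for 384 and 3391 bytes:", cells(384), cells(3391))
```

Under these assumptions a quota of 4 needs 6 entries, which fits the 8 entries per stream on I2 but not the 5 on I3, while a quota of 3 fits exactly.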

ADPC1(A2-MX960-BOT vty)# sh ich 0 ipktrd qst
FABRIC Queue Status
FAB_PRQ_MPTY[0] - 0xfffcffff
FAB_PRQ_MPTY[1] - 0xffffffff
FAB_PRQ_MPTY[2] - 0xffffffff
FAB_ICB_MPTY[0] - 0xfffeffff
FAB_ICB_MPTY[1] - 0xffffffff
FAB_ICB_MPTY[2] - 0xffffffff
FAB_PRD_DONE[0] - 0xfffcffff
<<<
<<<
ADPC1(A2-MX960-BOT vty)# sh ich 0 imq conf str fab 8 mu

queue  mas  mu  mad  hnq_ptr(0x)
------- -------- -------- -------- ---------------
114684  114399  00010a:000106
114684  113178  00011c:000118

ADPC1(A2-MX960-BOT vty)# sh ich 0 ipktrd err fab 16
Cell-error Counters:
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
fab_strm[16] ecc
fab_strm[16] err  165
fab_strm[16] nt_age
fab_strm[16] ic_age
fab_strm[16] dc_age  55  13
ADPC1(A2-MX960-BOT vty)# sh ich 0 imq stat fab 8 qu 0

physical queue 272:
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
OUT_PKT_Q  26530802  776265
(BYTE)  2813438901  82355652

ADPC1(A2-MX960-BOT vty)# sh ich 0 imq stat fab 8 qu 0
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
[..]

RDROP_PKT_Q DP00  3157265045  129153  258849
(BYTE)  330653936819  13561141  27296088
Case Study 7: IA FPGA DMA corruption
PR/269699
Some critical sections of DMA code were not protected from interrupts, which made DMA data corruption possible if one I-chip write was interrupted by code that performed another I-chip write.
On MX platforms, if a lot of traffic is going to the Routing Engine while routes are changing, the DPC might trigger an assertion without producing a core dump.
Case Study 7: IA FPGA DMA corruption (continued)

See the "How to Repeat" section of PR/269699 for good ways to reproduce the problem:
Use the shell command route.new to install and delete many routes.
Use JIT to generate traffic through many interfaces.
On the interfaces with JIT-generated traffic, add a firewall filter that creates a lot of exception traffic: sample, log, count.
Use an Ixia tester to send exception traffic: send PIM traffic that causes an iif mismatch, send traffic to the internal loopback address, and send ARP traffic.
Case Study 8: Traffic sent to queues that are not configured
PR/277853, PR/27741
How to repeat: Make sure you get traffic flowing with bursts and some MPLS packets in Q1 and Q2. After flapping the interface on the upstream router, you should see the aging counter increasing, and after some time you should also see packet CRC errors reported.
If some traffic flows use queue 1 or queue 2 without class-of-service configured, this might trigger packet aging, which can corrupt packets.
This will be reported as an IP CRC packet error in syslog, or it can even stop traffic forwarding.
As a workaround, make sure you have a transmit-rate configured for every queue that is used for traffic.

Case Study 8: Traffic sent to queues that are not configured (continued)

The default configuration leaves several (6 of 8) queues at zero transmit-rate.
If traffic happens to go to these queues on an interface that is oversubscribed by traffic flowing through the other non-zero-rate queues, the zero-rate queues will accumulate pending traffic.
That traffic will age and then enter the realm of undetected aging.
When the oversubscription ends, the pending traffic will drain.
A lot of it will be detected as aged (with ~7/8 probability, if we use 3 bits for "aging" detection), but some will drain out undetected.
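The ~7/8 figure follows from the number of bits used for age detection. The Python sketch below shows the arithmetic, assuming that the age of long-pending traffic is effectively uniform over the 2^bits possible values of the compared bits; the slide quotes only the 3-bit result, so that uniformity and the function name are assumptions.

```python
from fractions import Fraction


def detected_fraction(age_bits):
    """Fraction of long-aged traffic still flagged as aged.

    Assumption: only one of the 2**age_bits bit patterns matches the
    current write position, and all patterns are equally likely.
    """
    return 1 - Fraction(1, 2 ** age_bits)


print("3 bits:", detected_fraction(3))
print("2 bits:", detected_fraction(2))
```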

Case Study 8: Traffic sent to queues that are not configured (continued)

Most of it will encounter packet CRC errors and get dropped before it leaves the chipset.
This problem exists in T/M/MX-series PFEs (Gimlet, Stoli, Ichip). However, the Ichip is particularly susceptible to it due to PR231419, where it can cause a wedge! PR277853 documents such a situation.
There are two ways to address this issue:
1. Change the default transmit-rate on all queues to a very small value (i.e. not zero); this value needs to be carved out of the overall credits that are assigned to the other queues on the interface.
It needs to be large enough to drain the minimum Mas of the queue in less than the undetected age time (e.g. ~4 seconds for Ichip).
A strict-priority queue configuration with improperly provisioned traffic can defeat this mechanism, so there are no 100% guarantees.
Configure a policer for the strict-high queue.

Case Study 8: Traffic sent to queues that are not configured (continued)

2. Enhance the workaround in PR231419 to detect a blocked queue in Imq as well:
Check that Mu is non-zero and that the queue output counter has not incremented for one second. If so,
disable enqueue into this queue,
write the RAMQ_START_PTR and HNQ_STE_REG for that queue (which effectively throws away all pending traffic in this queue), and then
re-enable enqueue into the queue.
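The proposed check and recovery sequence can be sketched as follows. This is a minimal illustration in Python, not the Junos implementation: `QueueSample`, `RecordingQueueOps` and its methods are hypothetical stand-ins, and the register writes are represented only by a recorded step because the slides do not give the values to write.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    mu: int            # pending traffic in the queue (Mu)
    out_counter: int   # queue output counter


class RecordingQueueOps:
    """Hypothetical stand-in for the driver; it only records the steps."""

    def __init__(self):
        self.steps = []

    def disable_enqueue(self):
        self.steps.append("disable_enqueue")

    def reset_queue_pointers(self):
        # Stands in for writing RAMQ_START_PTR and HNQ_STE_REG;
        # the values to write are not given in the slides.
        self.steps.append("reset_queue_pointers")

    def enable_enqueue(self):
        self.steps.append("enable_enqueue")


def is_blocked(previous, current):
    """Mu is non-zero and the output counter did not move between samples."""
    return current.mu != 0 and current.out_counter == previous.out_counter


def check_and_recover(previous, current, ops):
    """Run the recovery sequence if the queue looks blocked."""
    if not is_blocked(previous, current):
        return False
    ops.disable_enqueue()
    ops.reset_queue_pointers()
    ops.enable_enqueue()
    return True


ops = RecordingQueueOps()
# Two samples taken one second apart: traffic pending, counter unchanged.
print(check_and_recover(QueueSample(10, 500), QueueSample(10, 500), ops))
print(ops.steps)
```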
Case Study 9: Fail RLDRAM BIST PR/255204

Once RLDRAM BIST testing fails, the memory is not initialized correctly and we eventually end up with parity errors.
You need to reboot the DPC and make sure the BIST does not fail.
[Mar 4 20:04:01.139 LOG: Err] ICHIP(2): iifwo2 sram parity error
[Mar 4 20:04:01.139 LOG: Err] ICHIP(2): iifwo2 sram parity err count 1
[Mar 4 20:04:01.139 LOG: Err] ICHIP(2): iifwo2 parity err at word offset 0x4017e2(hashed 0x17e4)
[Mar 4 20:04:01.139 LOG: Err] ICHIP(2): iifwo2 parity err data0 0x8c400c24 data1 0x203000d
Case Study 9: Fail RLDRAM BIST (continued)

To perform a stress test of the RLDRAM, configure the following:
set chassis no-reset-on-timeout
set chassis fpc slot <x> command noboot
set chassis images rfeb slot <x> command noboot
Then cty to the DPC and reboot it. While it is rebooting it should not load the image, and you get the BOOT prompt.
Enter the following:
boot -m manufacturing 1 adpc.jbf
Case Study 9: Fail RLDRAM BIST (continued)

Once this is loaded, run:
set parser security 10
diagnostic ram_bist all stress
diagnostic ichip 0 register
and see if there are errors.
Case Study 10: Troubleshooting LINK (SF fabric ports) error messages to the proper FRU

Syslog error messages appear as shown below. The trouble is mapping these LINKs to the proper FRU. PR/407207 has been opened to enhance the error messages.
Note the link numbers in the output below: links 62 and 56. Which FRU do these map to?
Dec 2 18:29:05 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link 62 failed because of crc errors
Dec 2 18:29:05 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 1, ID 0): link 56 failed because of crc errors

Since there are many different commands that give channel and link output, it is difficult to know which command helps cross-reference the FRU with the LINK found in the messages log.
After much digging around, we found circumstantial evidence with a number of different commands, but had a very hard time identifying the command that matches the LINK error messages to a FRU.

Here are some of the commands we found that provide circumstantial evidence of which FRU might be causing problems.
See which DPC has a link error (only dpc0 in this case):
cli> show chassis fabric plane
Fabric management PLANE state
Plane 0
Plane state: SPARE
FPC 0
PFE 0 :Link error <<<
PFE 1 :Link error <<<
PFE 2 :Link error <<<
PFE 3 :Link error <<<
[..]

See which (DPC) slot reports CRC errors (0-5 for calypso). Only slot 0 (dpc0) shows CRC errors in this case.
cli> show chassis hsl channel asic-name CB[0,1]F[0,1] info slot [0-5]
cli> show chassis hsl channel asic-name CB0F0 info slot 0

CB0F0-chan-rx-105 : Down
Full with 8 links
HSL2_TYPE_T_RX
Flag: 0
reg: 0x2f000
first_link: CB0F0-hss_cu08-link-120
64b66b No-scramble No-plesio input-width:0
Cell received: 0
CRC errors: 393210 exceeded 0
Cell last : 0
CRC last : 65535 <<<

Finally, we found a hidden command that maps the LINK in the error messages to the FRU causing the problem.
Refer to the first slide of this case study for the log messages.
We are looking for the FRU that matches LINK numbers 62 and 56.
cli> show chassis fabric statistics detail totals 0
SF-chip statistics for plane#0
------------------------------
Total pio errors=0
<<< does not match
Statistics for input link#0 (DPC1PFE0->CB0F0_00_0): <<< dpc1 connects to link 0
Valid cells : 0
Request cells : 0
Grant cells : 0
Dropped cells : 0
[..]

[..]
Statistics for output link#56 (CB0F0_14_0->DPC0PFE0) 0: << match link 56 to dpc0
Valid cells : 0
Request cells : 0
Grant cells : 0
Dropped cells : 0
[..]
Statistics for output link#62 (CB0F0_15_2->DPC0PFE2) 0: << match link 62 to dpc0
Valid cells : 0
Request cells : 0
Grant cells : 0
Dropped cells : 0
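Because the link-to-FRU mapping is embedded in the "Statistics for ... link#N (...)" header lines, it can be pulled out of saved command output with a short script. The Python sketch below is an offline helper under stated assumptions: it only understands the header format shown on these slides, and the function name and the (direction, link) keying are choices made for the example.

```python
import re

# Matches header lines such as:
#   Statistics for output link#56 (CB0F0_14_0->DPC0PFE0) 0:
LINK_RE = re.compile(r"Statistics for (input|output) link#(\d+) \((\w+)->(\w+)\)")


def link_to_fru(text):
    """Map (direction, link number) to the DPC/PFE end of the link."""
    mapping = {}
    for direction, link, src, dst in LINK_RE.findall(text):
        # For input links the DPC is the source; for output links the sink.
        fru = src if direction == "input" else dst
        mapping[(direction, int(link))] = fru
    return mapping


SAMPLE = """
Statistics for input link#0 (DPC1PFE0->CB0F0_00_0):
Statistics for output link#56 (CB0F0_14_0->DPC0PFE0) 0:
Statistics for output link#62 (CB0F0_15_2->DPC0PFE2) 0:
"""

for key, fru in sorted(link_to_fru(SAMPLE).items()):
    print(key, "->", fru)
```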

Here we can artificially create CRC errors and check the output:
ADPC3(router vty)# test hsl2 corrupt ichip_0 0 100
We can see the following error messages in the log. In this case, we want to find which FRU matches link numbers 12 and 20.
Dec 5 10:57:19 honolulu-re0 chassisd[5941]: CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link 12 failed because of crc errors
Dec 5 10:57:19 honolulu-re0 chassisd[5941]: CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link 20 failed because of crc errors

This hidden command does not provide the link mapping, but it does show the CRC errors on dpc3 (slot 3) on CB0F0-chan-rx-24:
cli> show chassis hsl channel asic-name CB0F0 info slot 3
CB0F0-chan-rx-24 : up
Sub channel 1 of 4 with 2 links
HSL2_TYPE_T_RX
Flag: 0x23
reg: 0x23000
first_link: CB0F0-hss_cu08-link-24
64b66b No-scramble Plesio input-width:3
Cell received: 10272828104118
CRC errors: 100 exceeded 0
Cell last : 10105354 <<<
CRC last : 0

This command makes the proper mapping between the LINK reported in the log and the FRU causing the problem.
Again, we are looking for links 12 and 20.
SF-chip statistics for plane#0
------------------------------
Total pio errors=0
<<< no match
Statistics for input link#0 (DPC1PFE0->CB0F0_00_0): <<< link 0
Valid cells : 0
Request cells : 0
Grant cells : 0
Dropped cells : 0
[..]

This command makes the proper mapping between the LINK reported in the log and the FRU causing the problem. Below we find that links 12 and 20 map to DPC3:
<..>
Valid cells : 1
Request cells : 1
Grant cells : 0
Dropped cells : 0
<..>
Valid cells : 1
Request cells : 1
Grant cells : 0
Dropped cells : 0
<..>

Here is the fabric mapping for MX240
scb 0
scb 0
scb 1
scb 1
sf
sf
sf
sf
dpc slot
sf port
Ichip port
Ichip port
Ichip port
Ichip port
========
=======
==========
==========
==========
==========
15
n/a
n/a
14
n/a
n/a
13
11
9
8
0
4
1
5
2
6
3
7

PR/407207 was opened to enhance the error message logs to report the FRU instead of the SF fabric port LINK.
I chip performance
For a vanilla IPv4 packet, we have only one IP DA lookup. For uRPF, we have an SA IP lookup, plus an IIF tree lookup, plus a DA IP lookup.
So for the uRPF path we have doubled (or more than doubled) the number of SRAM accesses.
Basically, there is a limited budget of RLDRAM (aka SRAM, since that is what we used for previous chips) accesses per packet, due to the memory bandwidth limit.
The budget is smallest for the smallest packet sizes (highest pps rates).
The requirement for uRPF is significantly higher than for vanilla IPv4, so at minimum packet sizes there won't be adequate RLDRAM bandwidth; route lookups will back up and we'll drop in the Ichip iif block.
This is similar to the Rchip input tail-drops we've seen when large filters were configured (at other customers).
I chip performance
Each RLDRAM part can service at most 200*10^6 reads/sec. We lose some performance to RLDRAM refresh overhead and other small inefficiencies, so peak utilization is typically ~95% (i.e., ~190*10^6 reads/sec per part). For the two parts combined, the maximum rate should be ~380*10^6 reads/sec.
If the rlkp subsystem must perform 16 fetches from RLDRAM for every notification that it services, then the maximum throughput rate is ~380/16 * 10^6 notifs/sec.
If you observe RLDRAM utilization rates and system throughput rates significantly lower than the above, that is consistent with our suspicion of an RLDRAM hot-bank bottleneck (but isn't really strong enough to be evidence). It will provide us with some quantitative measurements of the level of activity seen at each of the two RLDRAM parts for rlkp.
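The budget arithmetic above works out as follows. The Python sketch only restates the slide's numbers; the variable names are chosen for the example.

```python
READS_PER_PART = 200e6          # peak reads/sec per RLDRAM part
UTILIZATION = 0.95              # typical peak utilization after refresh overhead
PARTS = 2                       # RLDRAM parts used by rlkp
FETCHES_PER_NOTIFICATION = 16   # example fetch count from the slide

per_part = READS_PER_PART * UTILIZATION           # ~190e6 reads/sec
combined = per_part * PARTS                       # ~380e6 reads/sec
max_notifs = combined / FETCHES_PER_NOTIFICATION  # ~23.75e6 notifs/sec

print(f"per part:  {per_part / 1e6:.2f} M reads/sec")
print(f"combined:  {combined / 1e6:.2f} M reads/sec")
print(f"max rate:  {max_notifs / 1e6:.2f} M notifs/sec")
```

So at 16 fetches per notification the lookup path tops out at roughly 23.75 million notifications per second, and every additional fetch per notification lowers that ceiling proportionally.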
I chip performance
The Ichip does not provide statistics on accesses to *each* RLDRAM bank. So if you have a hot-bank problem, you can only deduce it indirectly, by using jsim and counting the number of accesses to each bank (the 3 LSBs of the SRAM address), as you have started doing below. Of course, jsim can be run on only a handful of routes, so one might not get the full picture.
ADPC7(lab-brdr-01_re0 vty)# sh ich 0 iif stat
Discard counters:
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
WAN_DROP_CNTR  17021548790  4009502  7973163
FAB_DROP_CNTR  283905625  323074  1168296
I chip performance
ADPC7(lab-brdr-01_re0 vty)# sh ich 0 isr stat
RLDRAM accessing statistics
Counter Name  Total  Rate  Peak Rate
---------------------- ---------------- -------------- -------------
R1_SRM_RD  11890300277284  195280981  219119725
R2_SRM_RD  9481857885985  186351963  213945425
The SRAM bank is the 3 LSBs of the SRAM address.
For example, for the line below, SRAM bank 3 (decimal) is being used:
- [ 2] 9 sram 000153
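Counting accesses per bank from a list of SRAM addresses is a small exercise in bit masking. The Python sketch below shows the idea; only the address 0x000153 comes from the slide, the other addresses are made up for the example, and the function names are hypothetical.

```python
from collections import Counter


def sram_bank(address):
    """The bank is the 3 least significant bits of the SRAM address."""
    return address & 0x7


def bank_histogram(addresses):
    """Count how many accesses land in each bank."""
    return Counter(sram_bank(addr) for addr in addresses)


# 0x000153 is from the slide; 0x00015B and 0x000154 are made-up examples.
accesses = [0x000153, 0x00015B, 0x000154]

print("bank of 0x000153:", sram_bank(0x000153))
print("histogram:", dict(bank_histogram(accesses)))
```

A histogram that is heavily skewed toward one bank would be the indirect sign of a hot bank described above.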
I chip performance
21 SRAM reads on ingress PFE
- the breakdown
- bank - used
[ 2] 9 sram 000153 4005eb61 nh: multiple SER(no SE) hops=1 addr=0x10017a

segment extended reference
I chip enhancements
PR/231419 - PR/103712, PR/228781, PR/103298
The workaround for PR 231419 is to flush aged icells before they can become undetected aged icells. There are 3 "virtual address" bits in the packet memory address (2 bits with 512MB DIMMs). An aged icell can become an undetected aged icell if these 3 address bits are the same during the read as the 3 bits currently being written by the packet writer. That is, the packet memory has been overwritten a multiple of 8 times.
The workaround is to sample traffic flow, and if traffic is blocked on a stream and the time is getting near the undetected icell aged time, then flush the stream. Detecting traffic flow is done using the flow state register in the WO_SPI block. If these bits say no traffic has flowed during the sample period and there was traffic pending, then traffic is blocked.
The delay bandwidth buffer is at least 200 msec, so overwriting 8 times requires at least 1.6 seconds of real time. The sample rate must be at least twice this rate to avoid sample aliasing. Pick a flow sample period of 0.5 seconds to detect blocked traffic.
I chip enhancements
PR/260708 - Add more debug info in Iwo, as requested by customer
PR/257149
Although the software bug that triggered the wedge has been fixed, this PR will be moved into suspend mode and should reflect the hardware part. In future designs, hardware should prevent a wedge caused by a FIFO overflow and should have better protection and debugging tools to address such conditions quickly.
PR/264319 IQ2 - add a syslog when prolonged backpressure happens on the ET/XET FPGA to the upstream block
PR/264322 Iwo/Lout - add a syslog when prolonged backpressure happens on the ET/XET FPGA to the upstream block
I chip enhancements
PR/263909
There are many different ways in which an Ichip can get wedged. When there are many Ichips on one FRU that can be power reset (for example, the 4x10G DPC with 4 Ichips), then if one Ichip gets wedged, all of the Ichips must be power cycled to unwedge just that one. This makes customers very unhappy, as 3 other working 10G streams must be interrupted to fix an issue with the one Ichip.
This modification request is to see if it is possible to reset just one Ichip on a multi-Ichip FRU.
PR/407207
Change the syslog messages to map the LINK error messages to the FRU.
I chip enhancements
PR/263084
In the I2 and I3 chips, if an Lout key points to a corrupted Next Hop DB entry, it is possible for the WO block to wedge. The exact hardware reason for the wedge is currently unknown, but it has been observed experimentally (PR240012). The hardware does check certain invalid combinations and drops the packet. But some combinations of corrupted nhdb are not detected and can lead to wedging.
To improve robustness in the case of bad Lout keys or a corrupted Next Hop DB, it is desirable to initialize to zero all memory that can be addressed by an Lout key. This includes the multicast area, tag area, and L2 Descriptor area. A zero value allows the hardware to be much more robust in detecting a corrupted nhdb, as zero is always an illegal value. Whenever the hardware reads an entry with zeros in the top nibble, the result will be a syslog of "New illegal link errors in WO DESRD lout_key" and the packet is dropped, instead of a possible wedge.
Robustness can also be improved if the memory allocator zeros any freed nhdb and multicast table memory blocks. This results in stale Lout keys being reported as illegal link errors instead of possibly wedging.
I chip enhancements
PR/262802 - IPv6 jumbograms (RFC 2675) are not handled correctly
PR/105469
Ichip WAN stream summary stats.
When we capture the wo stats for wan_stream 0, only the following is listed. The Iwo desrd, hdrf and ip are all missing.
Also, the imq stream RED / tail drop is missing as well.
Can we include them in the "summary" capture for troubleshooting purposes? That would minimize "missing info" cases when we troubleshoot within a limited period of time.
Questions or Suggestions
If you have any questions regarding I chip troubleshooting, please contact mx-escalation@juniper.net
If you have any questions about this presentation, please contact drautio@juniper.net
