
MX Fabric Troubleshooting

Aditya Mahale
Advanced JTAC
July 2009

MX960
New fabric ASIC (SF)
SF uses a new high-speed link (HSL2) to connect to the forwarding engines
Fabric ASICs are located on the CB (no SIBs)
No SPMB (SF controlled by RE through CB FPGA)
2 spare planes
Fabric supports 2 levels of priority (low and high)
Fabric planes not restarted on GRES

MX960
12 FPCs + 2 SCBs, or 11 FPCs + 3 SCBs, connected over the backplane
I3 (on FPCs) connected to SF using HSL2
CB2 / DPC6 is a dual-use slot
The RE slot in CB2 should be left blank; an RE will be nonoperational if present
2 SF per CB
4 I3 per DPC

MX480/MX240
4 active planes
4 spare planes
6 FPCs + 2 SCBs connected over the backplane (4 FPCs for MX240)
I3 (on FPCs) connected to SF using HSL2
No dual-use slot
2 SF per CB
4 I3 per DPC

Commands
The following commands should show error messages that identify which CB, and which link to a DPC, has the issue:
show log messages
show log chassisd
show chassis alarms

The following commands show basic information about the fabric, such as its location, uptime and F-chip ASICs:
show fabric plane-location
show chassis fabric summary
show chassis hsl asic

The following commands show the status of the planes:
show chassis fabric plane
show chassis fabric fpcs

The following commands show the connections between a plane and a DPC:
show chassis fabric map
show chassis fabric map plane <>

The following commands show the statistics for a plane:
show chassis fabric statistics <plane> totals detail   <<<< for all DPCs on a plane; this also shows the link named in the error message
show chassis fabric statistics <plane> totals fpc      <<<< for a particular DPC

The following command shows the cell rate/drop for a particular plane:
show chassis fabric statistics rates <plane> [summary|detail|fpc]

The following command shows the CRC errors on the link to a particular DPC:
show chassis hsl channel asic-name CBXFY slot D errors

For MX960: X = 0, 1, 2; Y = 0, 1; D = 0-11
For MX480: X = 0, 1; Y = 0, 1; D = 0-5
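
The index ranges above can be checked before a command is typed on the box. The following is a minimal sketch, not a Juniper tool: the function name and the table layout are assumptions made for illustration, the ranges are copied from the slide, and the trailing keyword is taken to be "errors".

```python
# Valid index ranges as listed on the slide. The dictionary layout and
# the function name are illustrative assumptions, not part of Junos.
RANGES = {
    "MX960": {"cb": range(0, 3), "fchip": range(0, 2), "slot": range(0, 12)},
    "MX480": {"cb": range(0, 2), "fchip": range(0, 2), "slot": range(0, 6)},
}


def hsl_errors_command(platform, cb, fchip, slot):
    """Build the HSL channel command string for CB<cb>F<fchip> and a DPC slot.

    Raises ValueError if an index is outside the range listed for the platform.
    """
    limits = RANGES[platform]
    for name, value in (("cb", cb), ("fchip", fchip), ("slot", slot)):
        if value not in limits[name]:
            raise ValueError(f"{name}={value} is out of range for {platform}")
    return f"show chassis hsl channel asic-name CB{cb}F{fchip} slot {slot} errors"


if __name__ == "__main__":
    print(hsl_errors_command("MX960", 2, 0, 11))
```

The helper only assembles text; it does not contact a router, so it can be used to prepare commands for a maintenance window.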

Script
1. The errors will look something like this:
Mar 6 17:16:15 CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 2, ID 0): link 61 failed because of crc errors
2. show log chassisd (for the day the issue was reported)
3. show chassis fabric plane (from this command, note which plane shows the link in error = P, and which DPC = D)
Example:
lab@Lightning_re0MX480_J6# run show chassis fabric plane
Fabric management PLANE state
Plane 0
Plane state: ACTIVE
FPC 0
PFE 0 :Links ok
PFE 1 :Links ok
PFE 2 :Links ok
PFE 3 :Links ok

4. show chassis hsl channel asic-name CBXFY slot D errors (D is from the earlier command)
X = 0-2 on MX960
X = 0-1 on MX240 and MX480
Y = 0-1 on MX240, MX480 and MX960
D = number of the DPC slot on the box
Example:
{master}[edit]
lab@Lightning_re0MX480_J6# ...asic-name CB0F1 slot 1 extensive
CB0F1-chan-rx-0 : Down
Full with 8 links
HSL2_TYPE_T_RX reg: 0x20000 first_link: CB0F1-hss_cu08-link-0
Flag: 0 64b66b No-scramble No-plesio input-width:0
Cell received: 0
CRC errors: 0 exceeded 0
Cell last : 0
CRC last : 0
<snip>

5. show chassis fabric statistics totals P detail (P is the plane that showed the error in show chassis fabric plane)
This command shows the exact mapping between FPC and CB link; match the link number from the error message against this output.
Example:
show chassis fabric statistics totals 0 detail
SF-chip statistics for plane#0
------------------------------
Total pio errors=0
Statistics for input link#0 (DPC1PFE0->CB0F0_00_0):
  Valid cells   : 0
  Request cells : 0
  Grant cells   : 0
  Dropped cells : 0
Statistics for input link#1 (DPC1PFE1->CB0F0_00_1):
  Valid cells   : 0
  Request cells : 0
  Grant cells   : 0
  Dropped cells : 0
Statistics for input link#2 (DPC1PFE2->CB0F0_00_2):
  Valid cells   : 0
  Request cells : 0
  Grant cells   : 0
  Dropped cells : 0
Statistics for input link#3 (DPC1PFE3->CB0F0_00_3):
  Valid cells   : 0
  Request cells : 0
  Grant cells   : 0
  Dropped cells : 0
<snip>
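
Matching the link number from the log against the statistics output can be done by hand, but a small script helps on a busy box. The sketch below is an assumption-laden illustration: the regular expressions are written against the two sample outputs shown in this script only, the function names are hypothetical, and other Junos releases may format these lines differently.

```python
import re

# Patterns derived from the sample log line and sample statistics output
# shown above. They are assumptions and may need adjusting per release.
LOG_RE = re.compile(
    r"CHASSISD_FASIC_HSL_LINK_ERROR: Fchip \(CB (\d+), ID (\d+)\): link (\d+) failed"
)
STATS_RE = re.compile(
    r"Statistics for input link#(\d+) \(DPC(\d+)PFE(\d+)->([^)\s]+)\):"
)


def parse_link_error(line):
    """Return CB, F-chip ID and link number from a link-error log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    cb, fchip, link = (int(group) for group in match.groups())
    return {"cb": cb, "fchip": fchip, "link": link}


def link_map(stats_output):
    """Map each input link number to its DPC, PFE and CB-side endpoint label."""
    mapping = {}
    for match in STATS_RE.finditer(stats_output):
        link, dpc, pfe, endpoint = match.groups()
        mapping[int(link)] = {"dpc": int(dpc), "pfe": int(pfe), "endpoint": endpoint}
    return mapping


if __name__ == "__main__":
    log = ("Mar 6 17:16:15 CHASSISD_FASIC_HSL_LINK_ERROR: "
           "Fchip (CB 2, ID 0): link 61 failed because of crc errors")
    print(parse_link_error(log))
```

Looking up the parsed link number in the dictionary returned by link_map gives the DPC and PFE to use as D in step 4.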

Guidelines
a) If all the links on a particular plane show errors, replace the corresponding CB (reset the CB before replacing it).
b) If errors appear only on the links for a particular DPC, replace the DPC.
c) If errors appear only on one (or some) of the links of a DPC on a particular plane, swap that DPC with another DPC that has no errors.
c1) If the errors move with the DPC, replace the DPC.
c2) If the errors clear after reseating the DPC and do not return within a few days, conclude that the DPC was not seated properly.
c3) If the errors stay with the same slot, replace the CB.
c4) If the errors come back after replacing the CB and DPCs, replace the chassis.
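
The guidelines above amount to a small decision table. The sketch below encodes them as a lookup so the reasoning can be read in one place; the scope and outcome labels are invented for this illustration and do not correspond to any Junos output.

```python
def recommend(scope, swap_outcome=None):
    """Return the guideline action for an observed error pattern.

    scope: "plane"   - all links on a plane show errors (guideline a)
           "dpc"     - errors only on the links for one DPC (guideline b)
           "partial" - errors on one or some links of a DPC (guideline c)
    swap_outcome applies to "partial" only; None means the swap has not
    been tried yet. All labels are hypothetical names for this sketch.
    """
    if scope == "plane":
        return "Reset the CB, then replace the CB"
    if scope == "dpc":
        return "Replace the DPC"
    if scope == "partial":
        outcomes = {
            None: "Swap the DPC with a DPC that has no errors",
            "moved_with_dpc": "Replace the DPC",
            "cleared_after_reseat": "DPC was not seated properly",
            "stayed_with_slot": "Replace the CB",
            "returned_after_replacements": "Replace the chassis",
        }
        if swap_outcome not in outcomes:
            raise ValueError(f"unknown swap outcome: {swap_outcome!r}")
        return outcomes[swap_outcome]
    raise ValueError(f"unknown scope: {scope!r}")


if __name__ == "__main__":
    print(recommend("partial", "stayed_with_slot"))
```

A table keeps the order of escalation visible: DPC first, then CB, and the chassis only after both have been replaced.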

Caveats

PR408359
When an RE is rebooted while the DPCs are still powered on and sending reconnects, and that RE becomes the master RE, it mistakenly assumes, after getting the reconnects, that it is connecting to the DPCs as part of a GRES switchover instead of a reboot. It therefore attempts to program the SF chip on the assumption that the chip has already been initialized. This results in errors because the SF chip hardware is out of step with the software: the software does not know that the SF chip has not been initialized. In a classic GRES scenario with two REs this is not the case, because the SF chip has not been reset and hence does not need to be reinitialized.

Avoid the following operations:

1. Cold Boot Scenario
- With GRES enabled on the MX
- Remove RE1 from its slot
- Power cycle the chassis
2. Warm Boot Scenario
- With GRES enabled on the MX
- Halt RE1 (backup RE)
- Restart RE0

Caveats
PR448744
Chassisd does not raise an alarm if it cannot bring up the links to a DPC and the plane remains offline due to a hardware error. It logs the error message in the chassisd log but does not generate a chassis alarm.

PR291541
ISSU
Yanking out working RE/CB Pair
GRES issues
