Equipment
V200R003C10
Emergency Maintenance
Issue 02
Date 2014-04-30
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or representations
of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Purpose
This document describes how to handle emergency faults on the ATN, covering basic
concepts, the operation process, and operation guidelines. The last chapter also
provides a table for recording emergency troubleshooting operations and other related
information for future reference.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
Symbol Conventions
Symbol Description
Command Conventions
Convention Description
GUI Conventions
Convention Description
Change History
Updates between document issues are cumulative. Therefore, the latest document issue contains
all updates made in previous issues.
Contents
This chapter defines emergencies and describes the definition, sources, handling
guidelines, and precautions of emergency maintenance.
An emergency is an extreme situation that may be signaled in advance by abnormal alarms
and logs. You can confirm whether an emergency has occurred by checking alarms and logs
or by checking a customer's complaint.
NOTE
This section describes the roadmap for handling emergencies. For the roadmap for locating
common faults, refer to the Troubleshooting documentation.
l Customer's complaint
A customer's complaint is the main reason for the application of emergency maintenance.
When a fault reported by a customer or the Customer Service Center conforms to the
conditions in section "1.2 Definition of Emergencies", emergency maintenance needs to
be applied.
l Alarm messages
When the alarm messages output by the Network Management System (NMS) or displayed
on the terminal indicate a large-scale service interruption, emergency maintenance needs
to be applied.
l Natural disaster
When a natural disaster such as an earthquake, fire, or flood occurs, devices must be
temporarily powered off and emergency maintenance must be applied.
Emergencies easily cause network access failures of numerous users, device breakdown, and
service interruption, resulting in serious consequences. To improve the efficiency in handling
emergencies and minimize the losses, you must comply with the following guidelines when
maintaining devices:
l To keep a device running stably and reduce the likelihood of emergencies, refer to
Routine Maintenance.
l The core function of emergency maintenance is to recover system operation and service
provision as soon as possible. To efficiently handle emergencies, you need to set up schemes
to handle various emergencies according to the emergency maintenance manual. Managers
and maintenance personnel must be well-trained and familiar with these schemes.
l The maintenance personnel must attend the emergency maintenance training to learn
methods of identifying and handling emergencies.
l When an emergency occurs, keep calm and check whether the hardware and route of the
ATN are normal. Then, check whether the emergency is caused by the ATN. If so, handle
the emergency according to the predetermined emergency handling schemes or the
processing procedures in this manual.
l The CF card contains important data. When an emergency occurs, do not format the CF
card before consulting Huawei engineers.
l Contact the Customer Service Center or the local office of Huawei early for technical
support during troubleshooting.
l Once the emergency is solved, collect related alarm information and send the handling
report, device alarm files, and log files to Huawei for analysis. This can help Huawei to
improve the after-sales service.
To ensure the security of the device and safety of the operators, comply with the following
guidelines.
Static Electricity
Wear an ESD wrist strap before operating a board or the backplane, and comply with the
following rules:
l For precautions and procedures of board replacement, see "Replacing Boards" in Parts
Replacement.
l Place a board in an ESD bag before installing it.
l Place a removed board in an ESD bag.
Laser/LED
When you maintain a device with an optical module or optical interface, comply with the
following rules:
l When installing and maintaining the optical fiber, do not look into the optical fiber without
eye protection.
l When replacing the pluggable optical module, do not look into the connector of the optical
fiber without eye protection.
l Only personnel who attend the mandatory training can operate the optical module and
optical fiber on the ATN.
NOTE
When you install and maintain the optical fiber, keep the connector of the optical fiber clean, unfolded,
and straight.
During the equipment operation and maintenance, if a fault occurs and is difficult to locate or
rectify, or if you cannot rectify the fault by referring to the after-sales customer documentation,
contact Huawei for assistance (Huawei engineers will provide guidance remotely or on site on
troubleshooting).
l Call local Huawei branches or representative offices or contact Huawei Customer Service
Center.
l Contact Huawei Customer Service Center: support@huawei.com.
l During troubleshooting, maintain detailed records of operation procedures and results. These
records give Huawei technical support personnel a reference and help resolve the emergency sooner.
l When a fault persists, contact Huawei Customer Service Center. For contact information, see section
"1.6 Technical Support."
The main purpose of emergency maintenance is to restore system operation and service
provision as soon as possible. Figure 2-1 shows the flowchart of emergency maintenance.
[Figure 2-1 Flowchart of emergency maintenance. Steps recoverable from the figure: Collect fault
information; Service recovery? (No: Obtain help; Yes: Check the handling result); Record
information about emergency maintenance; End.]
NOTE
Even if you can independently complete emergency maintenance with the guidance of this manual, notify
Huawei of the emergency. Then, Huawei technical personnel maintain records of the fault to improve after-
sales services.
For details about fault information collection, see the chapter "3.1 Guide to Collecting Fault
Information".
[Flowchart. Checks recoverable from the figure, in order: Start; Can log in through the console
interface?; System starts normally?; Board status is normal?; Interface status is normal? Each
check has a Yes branch and a No branch.]
Item | Method
Login through the console interface | Connect the serial interface of a PC or terminal to the console interface of the ATN and set relevant parameters on the terminal. For details, refer to the ATN Multi-service Access Equipment Configuration Guide - Basic Configurations. Check that a terminal display is provided.
System startup | Check whether the system can be started normally and the command prompt such as <HUAWEI> is displayed.
Board status | Run the display device command on the terminal to check whether the status of all boards is Normal. In the case of a local fault, check the status of the service board connected to the customer who reports the fault.
Interface status | Run the display interface command on the terminal to check whether the status of the interface connected to the customer who reports the fault is Up and whether the number of packets received on the interface increases.
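The order of the four checks in the table can be sketched as a small decision function. This is only an illustrative sketch: the key names, the fault labels, and the choice to treat a missing check result as a failure are assumptions, not part of the product.

```python
from typing import Dict, Optional

# Checks in the order given in the table. The key names are hypothetical.
CHECKS = [
    ("console_login", "Cannot log in through the console interface"),
    ("system_startup", "System cannot start normally"),
    ("board_status", "Board status is abnormal"),
    ("interface_status", "Interface status is abnormal"),
]


def first_failed_check(results: Dict[str, bool]) -> Optional[str]:
    """Return the label of the first failed check, or None if all pass.

    A check that has no recorded result is treated as failed (assumption).
    """
    for key, fault_label in CHECKS:
        if not results.get(key, False):
            return fault_label
    return None


if __name__ == "__main__":
    observed = {
        "console_login": True,
        "system_startup": True,
        "board_status": False,
        "interface_status": True,
    }
    print(first_failed_check(observed))
```

The function stops at the first failed check because each later check depends on the earlier ones succeeding (for example, board status cannot be displayed without a working login).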
After the fault type is verified, apply emergency maintenance according to the description in the
Chapter "3.2 Guide to Handling Emergencies".
You are recommended to arrange technical personnel to monitor the system running during the
service peak time so that further problems can be handled immediately.
1 | Occurrence time | Record the time when the fault occurs. The value should be accurate to the minute.
2 | Fault symptom | Collect the fault symptom and maintain detailed records.
3 | Fault severity level | Record the fault severity level according to the range and the severity of the fault.
6 | Taken measures | Record the measures that have been taken and the results.
NOTE
When collecting fault information through command lines, you can copy the information displayed on the
console, such as the serial interface or the Telnet terminal, and save it to a .txt file as a record.
1 | Device information | Run the display device command to collect the device information.
3 | CPU usage | Run the display cpu-usage command to collect the CPU usage information.
5 | Log information | Run the display logbuffer command to collect the log information.
6 | Alarm information | Run the display trapbuffer command to collect the alarm information.
10 | Network connectivity information | Run the ping command to collect information about the network connectivity and record the results.
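The collection items above can be tracked with a short script that assembles the copied command output into one text record and lists what is still missing. This is a minimal sketch: it covers only the table rows shown above, and the function names and record layout are hypothetical. It does not connect to a device; the operator pastes in the captured output.

```python
from typing import Dict, List

# Item name -> command, taken from the table rows shown above.
COLLECTION_COMMANDS = {
    "Device information": "display device",
    "CPU usage": "display cpu-usage",
    "Log information": "display logbuffer",
    "Alarm information": "display trapbuffer",
    "Network connectivity information": "ping",
}


def missing_commands(outputs: Dict[str, str]) -> List[str]:
    """List the commands whose output has not been captured yet."""
    return [cmd for cmd in COLLECTION_COMMANDS.values() if cmd not in outputs]


def build_record(outputs: Dict[str, str]) -> str:
    """Join the captured outputs into one plain-text record.

    `outputs` maps a command to the text copied from the console.
    """
    sections = []
    for item, command in COLLECTION_COMMANDS.items():
        text = outputs.get(command, "(not collected)")
        sections.append(f"== {item} ({command}) ==\n{text}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    captured = {"display device": "<paste console output here>"}
    print(build_record(captured))
    print("Still to collect:", missing_commands(captured))
```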
NOTE
When a device runs normally, you are recommended to back up the historical traps and logs in the CF card
through the Trivial File Transfer Protocol (TFTP) or File Transfer Protocol (FTP).
Fault Symptom
After the serial interface of a PC or a terminal is connected to the console interface of the
ATN and the relevant parameters are set, nothing is displayed on the terminal.
Processing Procedure
Figure 3-1 Flowchart of solving the problem that users cannot log in to a system through the
serial interface
[Flowchart steps recoverable from the figure: Start; Is the cable in good condition? (No: replace
the cable); Are parameters for the serial interface correct? (No: modify the parameters); Does the
MPU/SRU work normally? (No: exchange or replace the MPU/SRU); after each corrective action, Is the
fault rectified? (Yes: End); otherwise, seek technical support.]
NOTICE
All the following steps are performed only when the customer's services are already interrupted,
and therefore have no adverse effect on services. If the customer's services are not interrupted,
collect fault information and provide it to Huawei engineers for further processing.
Procedure
Step 1 Check and repair the power supply system.
When you find that the indicators of all the boards are off and all the fans fail to work (which
you can identify by checking whether the fans are rotating), or the indicator of the power module
is abnormal, the power supply system of the device is possibly faulty and needs repair. The power
supply system consists of the following:
1. Check whether the power module is switched on. When there are multiple power modules,
ensure that at least one works normally.
2. When none of the preceding problems is found, but the power supply system fails to work,
seek Huawei technical support according to 1.6 Technical Support.
Check whether parameters set for the serial interface are identical with those for the console
interface on the ATN. If they are not identical, modify the parameters of the serial interface.
----End
Fault Symptom
The system fails to start, and one of the following symptoms may occur:
l "XXXXX selftest.........FAIL!", which indicates that the self-test of a certain module fails.
l The system remains in the phase of file decompression for a long time.
l The system is repeatedly restarted.
2 | Name of the startup file | Check the name of the startup file through the Basic Input/Output System (BIOS) menu.
Processing Procedure
Figure 3-2 Flow chart of solving the problem that the system cannot be started
[Flowchart steps recoverable from the figure: Start; Is the system repeatedly restarted? (Yes: make
the startup files of the master and slave system control boards identical); Is the fault rectified?
(Yes: End; No: seek technical support).]
NOTICE
All the following steps are performed only when the customer's services are already interrupted,
and therefore have no adverse effect on services. If the customer's services are not interrupted,
do not perform the following steps. Instead, collect fault information and feed it back to Huawei
engineers for further processing.
Procedure
Step 1 Check and repair the power supply system.
When you find that the indicators of all the boards are off and all the fans fail to work (which
you can identify by checking whether the fans are rotating), or the indicator of the power module
is abnormal, the power supply system of the device is possibly faulty and needs repair. The power
supply system consists of the following:
1. Check whether the power module is switched on. When there are multiple power modules,
ensure that at least one works normally.
2. When none of the preceding problems is found, but the power supply system fails to work,
seek Huawei technical support according to 1.6 Technical Support.
It is complicated to upload the startup file through BIOS. Contact Huawei technical support
personnel and perform the uploading under their guidance. For detailed operation procedures,
refer to Appendix B "3.4 Built-in System Software Is Incorrect or Does Not Exist."
----End
Fault Symptom
Hardware components refer to hardware modules including board modules, power supply
modules, and fan modules. The abnormality of hardware component status includes (one or
multiple items):
l When you run the display device command in any view to view information about a
hardware component where services are interrupted, the hardware component status is
Abnormal.
l When you run the display device command in any view to view information about a
hardware component where services are interrupted, the hardware component status is
Unregistered.
l The RUN or STATUS indicator of a hardware component blinks or is off, or the ALM
indicator of the hardware component is on.
2 | Detailed information about a hardware component | Run the display device slot-id command in any view to view detailed information about the specified hardware component.
3 | Status of a PIC channel | Run the display device pic-status command in any view to collect information about the status of a PIC channel.
Processing Procedure
Figure 3-3 Flow chart of solving the problem that the hardware component status is abnormal
NOTICE
All the following steps are performed only when the customer's services are already interrupted,
and therefore have no adverse effect on services. If the customer's services are not interrupted,
do not perform the following steps. Instead, collect fault information and feed it back to Huawei
engineers for further processing.
Procedure
Step 1 Reset the hardware component.
1. Check whether the power module is switched on. When there are multiple power modules,
ensure that at least one works normally.
2. When none of the preceding problems is found, but the power supply system fails to work,
seek Huawei technical support according to 1.6 Technical Support.
If the abnormality occurs on the fan modules, directly replace the fan modules.
If a board is abnormal and the situation is urgent, reset and replace the board. Relevant cause
location will be performed by Huawei technical support personnel.
You can reset a board by using the reset slot slot-id command in the user view, pressing the
RESET button on the panel, or pulling out and inserting the board.
NOTE
Avoid resetting a board by removing and reinserting it, because doing so may damage the
board.
Step 3 Cut over services on the board and seek Huawei technical support.
When the preceding methods cannot solve the problem, you can cut over services on the faulty
board to a normal board or a board in a vacant slot. For operation details, contact Huawei
technical support personnel or comply with your cutover scheme.
In addition, report the fault information to the local Huawei office for technical support.
----End
Fault Symptom
The abnormality of the interface status includes:
l When the display interface [ interface-type interface-number ] command is run in any
view to check an interface where services are interrupted, the interface status is Down.
l When the display interface [ interface-type interface-number ] command is run in any
view to check an interface where services are interrupted, the number of packets transmitted
on the interface remains unchanged.
l When the display interface [ interface-type interface-number ] command is run in any
view to check the interface where services are interrupted, a large number of packets with
CRC errors are received.
l The indicator status of an interface is abnormal. For example, the LINK indicator of the
interface is off.
4 | Brief information about all interfaces | Run the display interface brief command in any view to collect brief information about all interfaces.
Processing Procedure
Figure 3-4 Flow chart of solving the problem that the interface status is abnormal
[Flowchart steps recoverable from the figure: Start; Status of interface indicator normal? (Yes:
proceed to the flow for handling service faults); Interface status is Up?; Is manually shut down?
(Yes: bring the interface up, labeled "Shut up the interface" in the figure); Detect the link;
Fault rectified? (Yes: End).]
NOTICE
All the following steps are performed only when the customer's services are already interrupted,
and therefore have no adverse effect on services. If the customer's services are not interrupted,
do not perform the following steps. Instead, collect fault information and feed it back to Huawei
engineers for further processing.
NOTE
Usually, the interface fault is caused by the problem of the cable or optical module.
l When the cable is broken or the optical module is damaged, the interface fails to go Up.
l When the cable or optical module on an interface has been in use for many years, the signal
attenuation may be severe. In this case, although the interface is Up, a large number of packets are
discarded.
Replace the cable or optical module on the faulty interface. If the problem persists, perform the following
operations.
Procedure
Step 1 Start the interface manually.
Run the display this command in the interface view to check the configuration of the faulty
interface. If you find that the interface has been shut down through the shutdown command, run
the undo shutdown command in the interface view to start it.
If so, it indicates that the physical link is Up and you can detect the link as follows:
1. Run the display this interface command in the interface view to check whether the
interface parameters at both ends of the link, such as the duplex mode and rate, are identical.
2. For optical interfaces, use an optical power meter to check whether the received and
transmitted optical signals at both ends are normal. If an optical power meter is not
available, use the optical power detection function of the optical module: run the
display this interface command in the view of the faulty interface, compare the displayed
values with the optical module parameters, and check whether the transmit and receive
optical power is within the normal range. If an optical interface can only send or only
receive optical signals, the optical module is possibly faulty or the optical fiber does not
match the optical module. In this case, try replacing the optical module or the optical fiber.
3. For electrical interfaces, inspect the pinouts of the RJ-45 connectors at both ends of the
cable to check whether the cable is of the type (straight-through or crossover) required by
the interface.
DANGER
When you check the receiving and sending of optical signals, do not look into the optical fiber
without eye protection. You must use the optical power meter to measure the optical power.
When the LINK indicator of the interface is off, you can check the link as follows:
1. Perform a physical loopback test on the device. That is, connect the faulty interface to an
interface that is in the normal state through an optical fiber or cable in good condition.
Ensure that the types of the two interfaces match.
2. If the LINK indicator is on, the interface is normal. In this situation, check whether the
optical fiber or cable is damaged and whether the trunk link runs normally. Usually, you
need to check the optical fibers, cables, and trunk links at the neighboring
sites.
3. If the LINK indicator is off, it indicates that the interface hardware is faulty. When a
pluggable optical module is used, you can replace the optical module; otherwise, cut over
services on the faulty interface to other interfaces in the normal state.
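The outcome of the loopback test in the preceding steps maps to a next action. The following is a minimal sketch of that mapping; the function name, parameter names, and returned wording are illustrative assumptions that only restate the steps above.

```python
def loopback_next_action(link_indicator_on: bool, pluggable_optical_module: bool) -> str:
    """Return the next action after the physical loopback test.

    link_indicator_on: state of the LINK indicator during the loopback test.
    pluggable_optical_module: whether the faulty interface uses a pluggable
    optical module (only relevant when the LINK indicator stays off).
    """
    if link_indicator_on:
        # The interface itself works, so the fault is on the line side.
        return ("Interface is normal: check the optical fiber or cable and "
                "the trunk link, including the neighboring sites.")
    if pluggable_optical_module:
        return "Interface hardware is faulty: replace the optical module."
    return ("Interface hardware is faulty: cut over services to another "
            "interface in the normal state.")


if __name__ == "__main__":
    print(loopback_next_action(link_indicator_on=False, pluggable_optical_module=True))
```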
Step 3 Check and modify the data link layer or upper layer protocol.
If the interface still fails to send and receive packets in the local loopback test, check the data
link layer or upper layer protocol. For example, check whether the Point-to-Point Protocol (PPP)
configuration at both ends is identical and whether the routing protocol runs normally.
To reset the interface, run the shutdown command to disable it, and then run the undo shutdown
command to enable it again.
----End
NOTICE
l Restart the ATN with caution. If it is required to restart the ATN, go over the principles and
precautions described in Chapter 1, or perform the restart operation under the guidance of
Huawei technical personnel.
l Before the ATN is successfully restarted, all services on the ATN are interrupted unless the
dual-system hot backup networking is adopted.
When a critical fault occurs on the ATN during operation, the ATN restarts automatically.
After the restart, the ATN runs normally. Restart the ATN manually only in an emergency or
exceptional situation, for example, when services are interrupted because of a fault on the
ATN and the ATN can neither restart automatically nor be recovered by other
methods.
Before restarting the ATN, confirm whether configuration files of the ATN need to be backed
up. Configuration files should be backed up and executed automatically after the restart. In this
case, services can be automatically resumed.
NOTE
You are recommended not to restart the ATN remotely. Otherwise, once the restart operation fails, services
may be interrupted for a long period.
Enter the reboot command in the user view and press Y after the display to restart the ATN.
The operation example is as follows:
<HUAWEI> reboot
mpu 2:
Next startup system software: cfcard:/V200R003C10.cc
Paf: V200R003C10
License: V200R003C10
Next startup saved-configuration file: cfcard:/vrpcfg.zip
NOTE
The reboot command output varies with system versions. Take the command output of the current system
version as the standard.
l The schedule reboot delay command is used to enable the scheduled reboot function and
set the wait delay for the ATN.
The wait delay set for the scheduled restart of the ATN can be expressed in two formats:
"hour: minute" and "absolute minutes". The total minutes cannot be greater than 30 x 24 x
60 minutes.
l The schedule reboot at command is used to enable the scheduled restart function and set
the specific restart date and time for the ATN. Note that the specified date cannot be 30
days later than the current date.
If the schedule reboot at command sets a specific date (yyyy/mm/dd) and the date is a
future date, the ATN is restarted at the set time and the error is within 1 minute. If no specific
date is set, the following situations occur:
– If the set time is later than the current time, the ATN is restarted at this time that day.
– If the set time is earlier than the current time, the ATN is restarted at this time the next
day.
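The delay limit and the same-day or next-day rule described above can be checked offline before the commands are entered. The following sketch only mirrors the rules stated in this section; the function names are hypothetical, it does not reproduce the device's own validation, and the handling of a set time exactly equal to the current time (treated here as the next day) is an assumption.

```python
from datetime import datetime, timedelta

# Upper limit stated above: 30 x 24 x 60 minutes.
MAX_DELAY_MINUTES = 30 * 24 * 60


def parse_delay(text: str) -> int:
    """Convert a delay in "hour:minute" or absolute-minutes form to minutes."""
    if ":" in text:
        hours, minutes = text.split(":", 1)
        total = int(hours) * 60 + int(minutes)
    else:
        total = int(text)
    if total < 0 or total > MAX_DELAY_MINUTES:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_MINUTES} minutes")
    return total


def resolve_restart_time(now: datetime, hhmm: str) -> datetime:
    """Resolve a time given without a date to the actual restart moment.

    Later than the current time: same day. Earlier than the current time:
    next day. Equal to the current time: next day (assumption).
    """
    hour, minute = (int(part) for part in hhmm.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(parse_delay("1:30"))
    print(resolve_restart_time(datetime(2014, 4, 30, 10, 0), "09:00"))
```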
After the schedule reboot delay or schedule reboot at command is run, the system prompts
you to confirm the restart. Enter Y or y, and the configuration takes effect. If the related
configuration exists, the latest configuration overrides the previous one.
NOTE
If you adjust the system time through the clock command after running the schedule reboot delay or
schedule reboot at command, the parameter set through the schedule reboot delay or schedule reboot
at command becomes invalid.
You can run the undo schedule reboot command to remove the parameter set through the
schedule reboot delay or schedule reboot at command.
You can run the display schedule reboot command to view the parameter set through the
schedule reboot delay or schedule reboot at command.
NOTICE
After the ATN is restarted, check that the configuration data is recovered correctly and
completely. The loss of configuration data will result in the service interruption and you are
therefore required to manually add the configuration data and save it.
The preceding display shows that the device is restarted successfully. You can press Enter and
enter the user view.
The preceding command output shows the Versatile Routing Platform (VRP) version, host
version, and patch version. You can check whether the version number before and after the
restart is identical.
Context
When you load the system software package in BIOS mode, only FTP is allowed and the
operation terminal (a PC) must be connected to the device through a serial port, network port,
or ATN. The PC and device communicate with each other using hyper terminal.
NOTE
In this section, the active and standby system control boards refer to the ones that are working in the system
before the software package is loaded. After the system software package is loaded, the active/standby
status of the system control boards will change.
For the ATN, the system software package must be loaded separately to the active and standby system
control boards. First remove the standby system control board and load the software package to the
active system control board. Then load the software package to the standby system control board in
the same way as you load it to the active system control board.
Procedure
Step 1 Connect the system control board's console port to the PC's COM port and configure the hyper
terminal.
The FTP server and operation terminal can be the same PC.
The following uses an example of configuring the hyper terminal on Windows XP to illustrate
how to configure hyper terminal.
1. Run Windows XP and choose Start > Accessories > Communications > Hyper
Terminal. In the window that is displayed, enter a name in the Name field.
2. Click OK. The following window is displayed. In the window, select a COM port.
3. Click OK. The following window is displayed. In the window, set Bits per second (the baud
rate) to 38400 and retain the default settings for other parameters.
Step 2 Run the FTP server on the PC and create an FTP user.
NOTE
The FTP setting display depends on the FTP software.
Set the FTP server parameters, including the home directory, user name, and password.
Step 3 Run the reboot command to restart ATN. The device startup information is displayed in the
hyper terminal window. The device startup information includes [LAN2]TCP Server Recv
Task Begin! and Boot pkg check begin.... If [DMM] Beat Timer Proc Begin! is displayed,
the system control board has successfully entered the BIOS state.
NOTE
If the system control board fails to enter the BIOS state, contact Huawei technical support engineers.
Step 4 When [DMM] Beat Timer Proc Begin! is displayed, press Enter and enter devs to check
directories. Ensure that there is a cfcard: directory.
drv name
  0 /null
  1 /tyCo/0
  1 /tyCo/1
  5 bootHost:
  8 /vio
  3 cfcard:
  3 ofs1
  3 ofs2
  3 mfs
value = 25 = 0x19
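When the terminal session is logged to a file, the presence of the cfcard: directory in the devs output can be confirmed with a short script. This is a minimal sketch that assumes the output has the shape shown above (a "drv name" header, driver and name pairs, then a "value = ..." line); the function names are hypothetical.

```python
from typing import List, Tuple


def parse_devs(output: str) -> List[Tuple[int, str]]:
    """Parse devs output into (driver number, device name) pairs.

    Splitting on whitespace makes the parser work whether the listing is
    on one line or on several lines.
    """
    tokens = output.split()
    start = tokens.index("name") + 1
    end = tokens.index("value") if "value" in tokens else len(tokens)
    body = tokens[start:end]
    return [(int(body[i]), body[i + 1]) for i in range(0, len(body) - 1, 2)]


def has_cfcard(output: str) -> bool:
    """Return True if the listing contains a cfcard: directory."""
    return any(name == "cfcard:" for _, name in parse_devs(output))


SAMPLE = (
    "drv name 0 /null 1 /tyCo/0 1 /tyCo/1 5 bootHost: 8 /vio "
    "3 cfcard: 3 ofs1 3 ofs2 3 mfs value = 25 = 0x19"
)

if __name__ == "__main__":
    print(has_cfcard(SAMPLE))
```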
In the preceding information, "Internet address: 129.10.6.32" refers to the IP address of the ATN
NE. The ATN NE's IP address and PC's IP address must be in the same network segment. If
they are in different network segments, a login to the ATN NE will fail.
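Whether the two addresses are in the same network segment can be verified before the FTP login is attempted. The sketch below uses the Python standard ipaddress module. The subnet mask and the PC address in the example are assumptions, because this section states only the ATN NE address.

```python
import ipaddress


def same_segment(device_ip: str, pc_ip: str, netmask: str) -> bool:
    """Return True if both addresses fall in the same network for the mask."""
    # strict=False lets ip_network accept a host address and mask it down
    # to the network address.
    device_net = ipaddress.ip_network(f"{device_ip}/{netmask}", strict=False)
    pc_net = ipaddress.ip_network(f"{pc_ip}/{netmask}", strict=False)
    return device_net == pc_net


if __name__ == "__main__":
    # 129.10.6.32 is the ATN NE address shown above. The PC address and the
    # mask are hypothetical values chosen for illustration.
    print(same_segment("129.10.6.32", "129.10.6.100", "255.255.255.0"))
```

Use the mask that is actually configured on the PC network port; a different mask changes the result.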
Step 6 On Windows XP, choose Start > Run, enter the FTP address (for example, ftp://129.10.6.32),
and click OK to log in to the FTP server.
Step 7 Copy the bcf.txt, configuration file, and system software that are stored on the PC to the
cfcard: directory by FTP. After that, shut down FTP.
If the following message is displayed, the system has successfully started up.
Recover configuration...OK!
Press ENTER to get started.
<HUAWEI>
For a device with a single system control board, loading the software package is complete after
the startup. Then go to the next step.
For a device with two system control boards, remove the active system control board and insert
the standby system control board. Repeat the preceding steps to load the software package to
the standby system control board.
If "ok" is displayed, the upgrade is successful. If other information is displayed, contact Huawei
technical support engineers.
----End
This chapter describes how to fill in the emergency maintenance record table.
You can fax an emergency maintenance notice to Huawei. The format of the notice is shown in
Table 4-1:
Result (attachment):
Handled by: Date:
Known anomalies:
Site:__________________Date: MM/DD/YY_________________.
Emergency occurred at: Emergency handled at:
Person on duty: Emergency handled by:
Routine
maintenance
Alarms
Other sources
Fault symptom: